
Learning AI Agents with Hugging Face:
Notes from a front-end web developer

I am a front-end developer who has been working with HTML/CSS/JavaScript (and some PHP) for many years, building production-ready web apps with various frameworks and libraries.

I recently learned Python to start playing with the APIs of LLMs like Google's Gemini and OpenAI's GPT-4o. I have also built a few multi-agent systems using CrewAI, deploying them with a simple API wrapper made with Flask.

This space is a place where I'm taking notes for the Agents Course by Hugging Face. 🤗

The course content is great, and the Python syntax isn't a problem – but the Hugging Face platform is super confusing for a developer like me who is used to GitHub, CodePen, etc.

The UX makes me feel like an idiot, and there are a lot of assumptions in the course about things that are very specific to Hugging Face or the Machine Learning community – that's why I'm documenting my learnings here.

The course

Note: the course is not hosted on a normal Learning Management System, but on a Git-based wiki – so it doesn't track your progress. Make sure to remember where you left off!

Jupyter notebooks and Google Colab

Jupyter notebooks are interactive Python notebooks, where you can combine markdown syntax and executable Python code in a single file. Apparently they're used a lot by data scientists, researchers, and educators.

Google Colab is a hosted Jupyter notebook service. Notebook files can be saved to Google Drive or GitHub.

The Hugging Face Agents Course provides some of the code examples as Jupyter notebooks that you can open directly in Google Colab via a small "Open in Colab" button. Save a copy of the provided notebooks to your own Google Drive or GitHub account.

Executing code blocks

The code blocks can be executed either by clicking the little "play" icon next to each one, or by putting keyboard focus inside the code block and pressing SHIFT + ENTER.

Each code block should be run in order.

Installing dependencies

Unlike in a local environment, in Jupyter notebooks we have to prefix pip install with a ! (which tells the notebook to run the line as a shell command).

Example: !pip install smolagents.

Managing secrets

Secrets on Google Colab are stored in the "Secrets" tab in the left sidebar ("key" icon). To run the course notebooks, you'll need a Hugging Face token:

  • read access if you just want to use the Hugging Face LLMs (inference)
  • write access if you also want to publish your notebook to Hugging Face Spaces as a working app.

When you want to use these variables, you'll need to add something like this to your notebook code block:

from google.colab import userdata
hf_token = userdata.get("THE_NAME_OF_MY_TOKEN")

(which is similar to what we'd do in a local environment with os.getenv("THE_NAME_OF_MY_TOKEN").)
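For comparison, here's a minimal local equivalent (a sketch assuming you keep the token in a .env file and use the python-dotenv package):

import os
from dotenv import load_dotenv

load_dotenv()  # reads variables from a local .env file into the environment
hf_token = os.getenv("THE_NAME_OF_MY_TOKEN")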

Using the Hugging Face LLM endpoints (the Inference API)

The notebooks have a code block that imports a login form. Make sure to run this block, and log in using a write token.
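The login block looks something like this (the exact snippet in the course notebooks may differ slightly):

from huggingface_hub import notebook_login

notebook_login()  # renders a small form where you paste your Hugging Face token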

If you get 402 Client Error: Payment Required errors when running the code blocks that call the LLMs, try an alternative endpoint:

Example using the InferenceClient:

from huggingface_hub import InferenceClient
client = InferenceClient("https://jc26mwg228mkj8dw.us-east-1.aws.endpoints.huggingface.cloud") 

Example using the HfApiModel:

from smolagents import CodeAgent, HfApiModel
agent = CodeAgent(
  tools=[],
  model=HfApiModel("https://pflgm2locj2t89co.us-east-1.aws.endpoints.huggingface.cloud")
)

Using other LLM providers (outside Hugging Face)

If you prefer to use other LLM providers, you can use LiteLLMModel.

Example using the free Gemini models:

!pip install 'smolagents[litellm]'

from smolagents import LiteLLMModel
from google.colab import userdata
model = LiteLLMModel(
    model_id="gemini/gemini-2.0-flash-exp",
    api_key=userdata.get('GOOGLE_API_KEY') # (replace `userdata.get` with `os.getenv` if working in a normal local environment)
)
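The resulting model object plugs into smolagents the same way as HfApiModel, for example:

from smolagents import CodeAgent

agent = CodeAgent(tools=[], model=model)
agent.run("In one sentence, why is the sky blue?")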

Hugging Face Spaces

Hugging Face Spaces are a Git-based hosting solution for deploying small apps and demos – kind of like a mashup of GitHub, CodePen, and a container deployment service.

Spaces have built-in support for Streamlit and Gradio (two Python frameworks for building UIs quickly), as well as Docker containers and plain HTML/JavaScript apps.

Navigation

The main pages that interest us are:

  • App: renders the functional application (like CodePen)
  • Files: contains the source code (like GitHub)
  • Settings: contains secrets (and lots of other stuff)

You can view build logs by clicking a tiny icon displayed next to the build status (near the space title).

Dependencies in requirements.txt are automatically installed.

The README.md file is not just human-readable documentation – its YAML front matter is used for Space configuration! Sometimes you might see a button at the top of the README to upgrade SDK dependencies like Gradio.

Publish a smolagents notebook to Hugging Face Spaces

The notebooks include example smolagents agents that get published as working apps to Spaces. If you try to run the publishing step without passing your token, you'll get a 403 Client Error: Forbidden error.

  1. First create a new Hugging Face Space.
  2. Then pass your token to the push_to_hub call.
from google.colab import userdata  # needed for Jupyter notebooks on Google Colab

space = "your_real_username/The_Space_You_Just_Created"
token = userdata.get("YOUR_HF_TOKEN")

agent.push_to_hub(space, token=token)

The push_to_hub call does some magic and creates a functional app, splitting everything into multiple files that can be run in the "App" view. I haven't looked at the source code yet to understand how it does that.

If you choose Gradio when creating your space, the app will deploy itself as a Gradio chat app, where you can chat to your agent. Example: Alfred party planner

Smolagents

smolagents is a minimalist Python AI agent framework by Hugging Face. It uses a code-first approach where agents write Python actions directly. It's fun for quick experimentation and simple applications, but I quickly ran into limitations due to the way it's designed (e.g. it was great for mathematical and computational tasks, but I struggled to build a researcher/writer/reviewer multi-agent workflow).

Unit 2.1 – Importing a Tool from the Hub

In Unit 2.1 there is an example of importing a smolagents tool directly from Hugging Face Spaces, but I was getting 402 Client Error: Payment Required errors.

So I duplicated the tool to my own Space, and reimplemented the image generation using Gemini instead (free of charge):

from smolagents import Tool

from google import genai
from google.genai import types
from PIL import Image # pillow
from io import BytesIO
import os

class TextToImageTool(Tool):
    description = "This tool creates an image according to a prompt, which is a text description."
    name = "image_generator"
    inputs = {
        "prompt": {
            "type": "string", 
            "description": "The image generator prompt. Don't hesitate to add details in the prompt to make the image look better, like 'high-res, photorealistic', etc."
        },
        "api_key": {
            "type": "string", 
            "description": "The Google Gemini API key. This is super important for the tool to work!",
            "default": None, 
            "nullable": True,
        }
    }
    output_type = "image"

    def forward(self, prompt, api_key=os.getenv("GEMINI_API_KEY")):
        client = genai.Client(api_key=api_key)

        response = client.models.generate_content(
            model="gemini-2.0-flash-exp-image-generation",
            contents=f"{prompt}",
            config=types.GenerateContentConfig(
                response_modalities=['Text', 'Image']
            )
        )

        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                return Image.open(BytesIO(part.inline_data.data))

Then in my Jupyter notebook I pass the API key as an additional arg:

from google.colab import userdata
from smolagents import load_tool, CodeAgent, HfApiModel

image_generation_tool = load_tool(
    "skymaiden/text-to-image",
    trust_remote_code=True
)

agent = CodeAgent(
    tools=[image_generation_tool],
    model=HfApiModel("https://pflgm2locj2t89co.us-east-1.aws.endpoints.huggingface.cloud")
)

agent.run(
    "Generate an image of a luxurious superhero-themed party at Wayne Manor with made-up superheros.",
    additional_args={"api_key": userdata.get('GOOGLE_API_KEY')}
)

The images are terrifying, but it works!

Unit 2.1 – Importing Hugging Face Spaces as tools

In Unit 2.1 there is an example of importing a Gradio app hosted on a Hugging Face Space as a tool to use in smolagents.

I had issues where it would correctly load the space as an API, but it kept getting errors: Error in generating model output: (ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')), '(Request ID: f455c4f2-46bb-43d1-beae-db299f733f7a)').

This error might have been caused by the Hugging Face Inference API endpoint I was using, or network instability, server-side issues, or exceeding rate limits on the remote server. Retrying it a couple of times worked, so in production I would implement a retry mechanism.
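Here's a minimal retry sketch (my own wrapper, not from the course – run_with_retries is a made-up helper name):

import time

def run_with_retries(agent, prompt, max_attempts=3, delay_seconds=5):
    # retry flaky remote calls a few times with a simple exponential backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return agent.run(prompt)
        except Exception as error:
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt} failed ({error}), retrying in {delay_seconds}s...")
            time.sleep(delay_seconds)
            delay_seconds *= 2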

MCP server tools

In Unit 2 there is an example of importing tools from an MCP server, but the code contains errors: it's missing the model (required by the CodeAgent), and the command should be uvx.

The MCP server example is not in the Jupyter notebook, but it's easy to get a local version up and running to adapt the example code:

uv init && uv add mcp smolagents "smolagents[mcp]"

Then adapt the example code, for instance in a main.py file:

import os
from smolagents import ToolCollection, CodeAgent, HfApiModel # or LiteLLMModel if you prefer
from mcp import StdioServerParameters

def main():
    server_parameters = StdioServerParameters(
        command="uvx", # invokes a tool without installing it
        args=["pubmedmcp@0.1.3"],
        env={"UV_PYTHON": "3.12", **os.environ},
    )

    with ToolCollection.from_mcp(server_parameters) as tool_collection:
        agent = CodeAgent(
            tools=[*tool_collection.tools],
            add_base_tools=True,
            model=HfApiModel("https://pflgm2locj2t89co.us-east-1.aws.endpoints.huggingface.cloud"),
        )

        agent.run("Please find a remedy for hangover.")

if __name__ == "__main__":
    main()
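Then run it with uv run main.py (uv will execute the script inside the project environment created by uv init).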

Unit 2.1 – Vision agents using Helium/Selenium

In Unit 2.1 there is a code example provided in a Python file to run locally.

Since the file was written, the smolagents CLI made a change to the python_executor, which breaks the execution (see this answer in the Hugging Face forum). We need to remove the 2nd argument:

- agent.python_executor("from helium import *", agent.state)
+ agent.python_executor("from helium import *")

Then the script should run correctly – but it uses an OpenAI key by default (oops, I accidentally spent $0.03).

Running the script with Gemini

To use the script with Gemini, I made some changes to the parse_arguments function:

parser.add_argument(
    "--model-id",
    type=str,
-   default="gpt-4o",
+   default="gemini/gemini-2.0-flash-exp",
    help="The model ID to use for the specified model type",
)
+ parser.add_argument(
+     "--api-key",
+     type=str,
+     default=os.getenv("GEMINI_API_KEY"),
+     help="The API key to use for the specified model type",
+ )

Then I also updated the call to load_model:

- model = load_model(args.model_type, args.model_id)
+ model = load_model(args.model_type, args.model_id, api_key=args.api_key)

Unit 2.1 – Final coding quiz

⚠️ The quiz is missing some important specifications that the LLM reviewer uses to evaluate your solutions. The LLM reviewer also does NOT seem to have knowledge from the official documentation or source code, and ONLY compares against its reference solution (which is valid but not the only solution).

Things to know (not spoilers)
  1. It expects you to explicitly specify the Qwen/Qwen2.5-Coder-32B-Instruct model, even though that's the default model.
  2. It expects a specific max_steps value and some unrelated-to-web-search authorized imports.
  3. Its reference solution is incorrect (the documentation and source code show there is no sandbox property on a code agent, and E2BSandbox does not exist). It also expects unrelated authorized imports.

LlamaIndex

LlamaIndex is a Python framework that connects LLMs with custom data sources. It provides tools to ingest, structure, and retrieve information from various formats through optimized indexes. It's great for building RAG applications and context-aware AI assistants.

I found the course module on LlamaIndex super confusing and ordered in a way that didn't make sense to me. I didn't bother using the Jupyter notebooks this time, and only wrote my scripts in my local environment.

But the official LlamaIndex documentation is really great.

Using a local LLM with Ollama

Until now I hadn't had any issues using Gemini models, which can be used for free with pretty good rate limits. But I hit the limits while playing with LlamaIndex, so I had to try another provider.

Ollama is a tool that lets you run open-source LLMs on your computer or a home server. It's especially useful for working with sensitive data, because your data and all your LLM interactions stay local and don't leave your network.

Performance

Ideally you need a performant machine with a good chip and GPU, but there are smaller quantized models that can run on older machines too.

On my work computer models run 100% on the GPU, meaning I can keep using the computer to do other things. On my personal computer models run 100% on the CPU, so it heats up when it needs to do harder work and it can't handle memory-intensive tasks.

On both computers I stick to 3–4B variants. The models provided by Ollama are quantized, which makes them smaller and easier to run on less powerful machines – but also a bit less precise (roughly comparable to image compression).

Running an LLM

When you run the app, it starts a server but no models. To run an LLM you'll need to pull and run a model with something like ollama run llama3.2. This will also start a chat session in the terminal, but we don't need that to use the API, so you can quit the chat.

You can see all the currently running models with ollama ps. You can stop them individually with something like ollama stop llama3.2.
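Once the server is running you can also hit the local REST API directly – handy as a quick sanity check before wiring it into a framework (assuming the default port 11434 and the llama3.2 model pulled above):

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Say hello in one short sentence.", "stream": False},
)
print(response.json()["response"])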

Using the LLM in LlamaIndex

Replacing Gemini with an Ollama model in LlamaIndex is easy:

# LLM
- from llama_index.llms.google_genai import GoogleGenAI
- gemini_api_key = os.getenv("GEMINI_API_KEY")
- llm = GoogleGenAI(
-     model="gemini-2.0-flash",
-     api_key=gemini_api_key,
- )
+ from llama_index.llms.ollama import Ollama
+ llm = Ollama(
+     model="llama3.2",
+     request_timeout=60,
+ )

# Embeddings
- from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
- gemini_api_key = os.getenv("GEMINI_API_KEY")
- Settings.embed_model = GoogleGenAIEmbedding(
-     model="models/embedding-001",
-     api_key=gemini_api_key,
- )
+ from llama_index.embeddings.ollama import OllamaEmbedding
+ Settings.embed_model = OllamaEmbedding(
+     model_name="llama3.2",
+ )

Why is Ollama so slow?

If you notice that Ollama models seem to run fine via the CLI chat but extremely slowly via the API, know that the CLI chat uses a small context window as well as streaming responses.

When you use the API in the same way, perceived performance is much better! Waiting 20 seconds while looking at a black terminal window is definitely not the same experience as waiting 3 seconds then seeing each word printed progressively.
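With the LlamaIndex Ollama wrapper from above, streaming looks something like this (a small sketch, not course code):

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.2", request_timeout=60)

# print tokens as they arrive instead of waiting for the full completion
for chunk in llm.stream_complete("Why is the sky blue?"):
    print(chunk.delta, end="", flush=True)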

Verbose output can also make it easier to see if it's stuck somewhere or just slowly chugging through.

My agents keep saying they can't do things

Something important to keep in mind is that not all models and variants have the same capabilities – so if you try and get an LLM to do something it can't (e.g. tool calling, prompting with images, embedding), it will fail miserably.

The Ollama website makes it quite easy to see which models have which capabilities – but be sure to check whether they only apply to certain variants (e.g. Gemma 3 has vision only at 4B and up).

You can install multiple LLMs, though, and use different ones in different cases. One thing you can't do is create vector embeddings with one model and read them with another – embeddings from different models live in different vector spaces, so they're not interchangeable.
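For example, you can point LlamaIndex at one model for generation and a dedicated embedding model for indexing (a sketch assuming both llama3.2 and nomic-embed-text have been pulled in Ollama) – just don't mix embeddings from different models in the same index:

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="llama3.2", request_timeout=60)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")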

LangGraph

LangGraph is a Python AI agent framework that builds on the LangChain ecosystem. It provides a very structured approach for creating AI agents and orchestrating complex workflows.

I found the first practical exercise in the course too dense for a starter project (I like to start small, then add features one by one). I tried to follow the LangGraph Quickstart in the official documentation, but quickly got bogged down in the details. So instead I watched the AI Agents in LangGraph course on DeepLearning.AI, which I found super helpful!

Exercise: Building your first LangGraph

When I came back to do the HF exercise, I found it had multiple errors:

  1. EmailState contains duplicate keys.
  2. draft_response is used both as a state and as a node name, so one has to be renamed.
  3. The classify_email node is not robust enough to handle real-life fuzzy LLM output.

That 3rd issue is annoying because regardless of how the word "spam" might be used in the LLM response, if it's there then the is_spam state is set to true:

is_spam = "spam" in response_text and "not spam" not in response_text

So if the LLM writes stuff like this and decides that the email is legitimate, it will still be categorized as spam because it wrote that naughty word:

"This is not necessarily a red flag for spam: firstly..."


If the LLM response contains both the words "spam" and "reason", then spam_reason is set to whatever text comes after the word "reason".

if is_spam and "reason:" in response_text:
    spam_reason = response_text.split("reason:")[-1].strip()

So if the LLM writes things like this, it will get a spam_reason with various amounts of random text as the value:

"This suggests a legitimate connection, and a reason for mr. smith to be..."

As a quick fix for the 3rd point, I added an expected output format to the end of the classify_email prompt, and updated the conditions:

Your response should be in the following format:
is_spam: true/false
spam_reason: "reason" (if spam)
email_category: "category" (if legitimate)
    
Example response for a spam email:
is_spam: true
spam_reason: "This email is a phishing attempt."

Example response for a legitimate email:
is_spam: false
email_category: "thank you"

Please provide a clear and concise response.
- is_spam = "spam" in response_text and "not spam" not in response_text
+ is_spam = "is_spam: true" in response_text

- spam_reason = response_text.split("reason:")[1].strip()
+ spam_reason = response_text.split("spam_reason:")[1].strip()

Not the most robust or production-ready, but at least it gives more consistent results.
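If I wanted to push it a bit further, the same idea could live in a small parsing helper (my own sketch with a made-up function name, not the course code):

import re

def parse_classification(response_text):
    # parse the structured format requested in the prompt above
    is_spam = re.search(r"is_spam:\s*true", response_text, re.IGNORECASE) is not None
    spam_reason = None
    email_category = None
    if is_spam:
        match = re.search(r'spam_reason:\s*"?(.+?)"?\s*$', response_text, re.IGNORECASE | re.MULTILINE)
        spam_reason = match.group(1).strip() if match else None
    else:
        match = re.search(r'email_category:\s*"?(.+?)"?\s*$', response_text, re.IGNORECASE | re.MULTILINE)
        email_category = match.group(1).strip() if match else None
    return {"is_spam": is_spam, "spam_reason": spam_reason, "email_category": email_category}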

Exercise: Document analysis graph

So far I've sometimes just been using Gemini's OpenAI-compatible API, to make it easier to switch between providers and compare results from different models (they kind of have different personalities!).

But it's still in beta and didn't work for this exercise; I kept getting this error:

openai.BadRequestError: Error code: 400 - [{'error': {'code': 400, 'message': 'Expected string or list of content parts, got: null', 'status': 'INVALID_ARGUMENT'}}]

So I switched back to using the Gemini API instead, via LangChain's Google GenAI module:

- from langchain_openai import ChatOpenAI
- llm = ChatOpenAI(
-     model="gemini-2.0-flash",
-     api_key=os.getenv("GOOGLE_API_KEY"),
-     base_url="https://generativelanguage.googleapis.com/v1beta/openai",
- )
- llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)
+ from langchain_google_genai import ChatGoogleGenerativeAI
+ llm = ChatGoogleGenerativeAI(
+     model="gemini-2.0-flash",
+     google_api_key=os.getenv("GOOGLE_API_KEY"),
+ )
+ llm_with_tools = llm.bind_tools(tools)