Learning AI Agents with Hugging Face: Notes from a front-end web developer
I am a front-end developer who has been working with HTML/CSS/Javascript (and some PHP) for many years, building production-ready web apps using various frameworks and libraries.
I recently learned Python to start playing with the APIs of LLMs like Google's Gemini and OpenAI's GPT-4o.
I have also built a few multi-agent systems using CrewAI, deploying them with a simple API wrapper made with Flask.
This space is a place where I'm taking notes for the Agents Course by Hugging Face. 🤗
The course content is great, and the Python syntax isn't a problem – but the Hugging Face platform is super confusing for a developer like me who is used to GitHub, CodePen, etc.
The UX makes me feel like an idiot, and there are a lot of assumptions in the course about things that are very specific to Hugging Face or the Machine Learning community – that's why I'm documenting my learnings here.
The course
Note: the course is not hosted on a normal Learning Management System, but a Git-based wiki – so it doesn't track your progress. Make sure to remember where you left off!
Jupyter notebooks are interactive Python notebooks, where you can combine markdown syntax and executable Python code in a single file.
Apparently they're used a lot by data scientists, researchers, and educators.
Google Colab is a hosted Jupyter notebook service. Notebook files can be saved to Google Drive or GitHub.
The Hugging Face Agents Course provides some of the code examples as Jupyter notebooks, which you can open in Google Colab directly via a small "Open in Colab" button.
Save a copy of the provided notebooks to your own Google Drive or Github account.
Executing code blocks
The code blocks can be executed either by clicking the little "play" icon next to each one, or by putting keyboard focus inside the code block and hitting SHIFT + ENTER keys.
Each code block should be run in order.
Installing dependencies
Unlike in a local environment, in Jupyter notebooks we have to prefix pip install with a !.
Example: !pip install smolagents.
Managing secrets
Secrets on Google Colab are stored in the "Secrets" tab in the left sidebar ("key" icon).
To run the course notebooks, you'll need a Hugging Face token:
read access if you just want to use the Hugging Face LLMs (inference)
write access if you also want to publish your notebook to Hugging Face Spaces as a working app.
When you want to use these variables, you'll need to add something like this to your notebook code block:
from smolagents import LiteLLMModel
from google.colab import userdata

model = LiteLLMModel(
    model_id="gemini/gemini-2.0-flash-exp",
    api_key=userdata.get('GOOGLE_API_KEY')  # replace `userdata.get` with `os.getenv` if working in a normal local environment
)
Hugging Face Spaces is a Git-based hosting solution for deploying small apps and demos – like a mashup of GitHub, CodePen, and a container deployment service.
It has built-in support for Streamlit and Gradio (two Python frameworks for building UIs quickly), as well as Docker containers and plain HTML/JavaScript apps.
Navigation
The main pages that interest us are:
App: renders the functional application (like Codepen)
Files: contains the source code (like Github)
Settings: contains secrets (and lots of other stuff)
You can view build logs by clicking a tiny icon displayed next to the build status (near the space title).
Dependencies in requirements.txt are automatically installed.
The README.md file is not just a human-readable file – its YAML frontmatter is used for Space configuration!
Sometimes you might see a button at the top of the README, to upgrade SDK dependencies like Gradio.
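For reference, the top of a Space's README.md contains a YAML block like this (the exact fields depend on your Space; the values here are just illustrative):

---
title: Alfred Party Planner
emoji: 🎩
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
---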
Publish a smolagents notebook to Hugging Face Spaces
The notebooks include example smolagents agents that can be published as working apps to Spaces. If you try to push one without passing your token, you'll get a 403 Client Error: Forbidden error.
from google.colab import userdata  # needed in Jupyter notebooks on Google Colab

space = 'your_real_username/The_Space_You_Just_Created'
token = userdata.get('YOUR_HF_TOKEN')

agent.push_to_hub(space, token=token)
push_to_hub does some magic and creates a functional app, with everything split into multiple files, that can be run in the "App" view. I haven't looked at the source code yet to understand how it does that.
If you choose Gradio when creating your space, the app will deploy itself as a Gradio chat app, where you can chat to your agent. Example: Alfred party planner
smolagents is a minimalist Python AI agent framework by Hugging Face. It uses a code-first approach where agents write Python actions directly. It's fun for quick experiments and simple applications, but I quickly ran into limitations due to the way it was designed (e.g. it was great for mathematical and computational tasks, but I struggled to make a researcher/writer/reviewer multi-agent workflow).
So I duplicated the course's text-to-image tool to my own Space, and reimplemented the image generation using Gemini image generation instead (free of cost):
from smolagents import Tool
from google import genai
from google.genai import types
from PIL import Image  # pillow
from io import BytesIO
import os

class TextToImageTool(Tool):
    description = "This tool creates an image according to a prompt, which is a text description."
    name = "image_generator"
    inputs = {
        "prompt": {
            "type": "string",
            "description": "The image generator prompt. Don't hesitate to add details in the prompt to make the image look better, like 'high-res, photorealistic', etc."
        },
        "api_key": {
            "type": "string",
            "description": "The Google Gemini API key. This is super important for the tool to work!",
            "default": None,
            "nullable": True,
        }
    }
    output_type = "image"

    def forward(self, prompt, api_key=os.getenv("GEMINI_API_KEY")):
        client = genai.Client(api_key=api_key)
        response = client.models.generate_content(
            model="gemini-2.0-flash-exp-image-generation",
            contents=f"{prompt}",
            config=types.GenerateContentConfig(
                response_modalities=['Text', 'Image']
            )
        )
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                return Image.open(BytesIO(part.inline_data.data))
Then in my Jupyter notebook I pass the API key as an additional arg:
from smolagents import load_tool, CodeAgent, HfApiModel
from google.colab import userdata  # for the API key stored in Colab secrets

image_generation_tool = load_tool(
    "skymaiden/text-to-image",
    trust_remote_code=True
)

agent = CodeAgent(
    tools=[image_generation_tool],
    model=HfApiModel("https://pflgm2locj2t89co.us-east-1.aws.endpoints.huggingface.cloud")
)

agent.run(
    "Generate an image of a luxurious superhero-themed party at Wayne Manor with made-up superheros.",
    additional_args={"api_key": userdata.get('GOOGLE_API_KEY')}
)
The images are terrifying, but it works!
Unit 2.1 – Importing Hugging Face Spaces as tools
In Unit 2.1 there is an example of importing a Gradio app hosted on a Hugging Face Space as a tool to use in smolagents.
I had issues where it would correctly load the space as an API, but it kept getting errors: Error in generating model output: (ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')), '(Request ID: f455c4f2-46bb-43d1-beae-db299f733f7a)').
This error might have been caused by the Hugging Face Inference API endpoint I was using, network instability, server-side issues, or exceeding rate limits on the remote server.
Retrying it a couple of times worked, so in production I would implement a retry mechanism.
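Something along these lines would do (a rough sketch, not code from the course – the helper name and backoff values are just placeholders):

import time

def run_with_retries(agent, prompt, max_attempts=3, base_delay=2):
    # Retry agent.run() a few times with a growing delay, since the
    # connection errors above were transient.
    for attempt in range(1, max_attempts + 1):
        try:
            return agent.run(prompt)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * attempt)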
MCP server tools
In Unit 2 there is an example of importing tools from an MCP server, but the code contains errors: it's missing the model (required by the CodeAgent), and the command should be uvx.
The MCP server example is not in the Jupyter notebook, but it's easy to get a local version up and running to adapt the example code:
import os
from smolagents import ToolCollection, CodeAgent, HfApiModel  # or LiteLLMModel if you prefer
from mcp import StdioServerParameters

def main():
    server_parameters = StdioServerParameters(
        command="uvx",  # invokes a tool without installing it
        args=["[email protected]"],
        env={"UV_PYTHON": "3.12", **os.environ},
    )

    with ToolCollection.from_mcp(server_parameters) as tool_collection:
        agent = CodeAgent(
            tools=[*tool_collection.tools],
            add_base_tools=True,
            model=HfApiModel("https://pflgm2locj2t89co.us-east-1.aws.endpoints.huggingface.cloud"),
        )
        agent.run("Please find a remedy for hangover.")

if __name__ == "__main__":
    main()
Unit 2.1 – Vision agents using Helium/Selenium
In Unit 2.1 there is a code example provided in a Python file to run locally.
Since that file was written, smolagents changed the python_executor API, which breaks the script (see this answer in the Hugging Face forum). We need to remove the 2nd argument:
Then the script should run correctly – but it uses an OpenAI key by default (oops, I accidentally spent $0.03).
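If I remember right, the line in question looks roughly like this, and the fix is simply dropping the second argument (double-check against the forum answer and your copy of the script):

- agent.python_executor("from helium import *", agent.state)
+ agent.python_executor("from helium import *")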
Running the script with Gemini
To use the script with Gemini, I made some changes to the parse_arguments function:
  parser.add_argument(
      "--model-id",
      type=str,
-     default="gpt-4o",
+     default="gemini/gemini-2.0-flash-exp",
      help="The model ID to use for the specified model type",
  )
+ parser.add_argument(
+     "--api-key",
+     type=str,
+     default=os.getenv("GEMINI_API_KEY"),
+     help="The API key to use for the specified model type",
+ )
Then I also updated the call to load_model:
- model = load_model(args.model_type, args.model_id)
+ model = load_model(args.model_type, args.model_id, api_key=args.api_key)
Unit 2.1 – Final coding quiz
⚠️ The quiz is missing some important specifications that the LLM reviewer uses to evaluate your solutions. The LLM reviewer also does NOT seem to have knowledge of the official documentation or source code, and ONLY compares against its reference solution (which is valid but not the only solution).
Things to know (not spoilers)
It expects you to explicitly specify the Qwen/Qwen2.5-Coder-32B-Instruct model, even though that's the default model.
It expects a specific max_steps value and some authorized imports that are unrelated to web search.
Its reference solution is incorrect (the documentation and source code show there is no sandbox property on a code agent, and E2BSandbox does not exist). It also expects unrelated authorized imports.
LlamaIndex is a Python framework that connects LLMs with custom data sources.
It provides tools to ingest, structure, and retrieve information from various formats through optimized indexes.
It's great for building RAG applications and context-aware AI assistants.
I found the course module on LlamaIndex super confusing and ordered in a way that didn't make sense to me. I didn't bother using the Jupyter notebooks this time, and only wrote my scripts in my local environment.
But the official LlamaIndex documentation is really great.
The Event-driven agentic workflows course on DeepLearning.AI is also short and super clear. The notebooks on DeepLearning.AI work directly on the platform (no need to duplicate them to Google Colab).
Using a local LLM with Ollama
Until now I hadn't had any issues using Gemini models, which can be used for free with pretty good rate limits. But I hit the limits while playing with LlamaIndex, so I had to try another provider.
Ollama is a tool that lets you run open-source LLMs on your computer or a home server. It's especially useful for working with sensitive data, because your data and all your LLM interactions stay local and don't leave your network.
Performance
Ideally you need a performant machine with a good chip and GPU, but there are smaller quantized models that can run on older machines too.
On my work computer models run 100% on the GPU, meaning I can keep using the computer to do other things. On my personal computer models run 100% on the CPU, so it heats up when it needs to do harder work and it can't handle memory-intensive tasks.
On both computers I stick to 3-4B variants. The models provided by Ollama are quantized, which makes them smaller and easier to run on less powerful machines – but also a bit less precise (roughly comparable to image compression).
Running an LLM
When you run the Ollama app, it starts a server but doesn't load any models. To run an LLM you'll need to pull and run a model with something like ollama run llama3.2. This will also start a chat session in the terminal, but you don't need that to use the API, so you can quit the chat.
You can see all the currently running models with ollama ps. You can stop them individually with something like ollama stop llama3.2.
Using the LLM in LlamaIndex
Replacing Gemini with an Ollama model in LlamaIndex is easy:
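Something like this (a minimal sketch; it assumes the llama-index-llms-ollama package is installed, the Ollama server is running, and llama3.2 has been pulled):

from llama_index.llms.ollama import Ollama

# Point LlamaIndex at the local Ollama server instead of a hosted provider.
llm = Ollama(model="llama3.2", request_timeout=120.0)
print(llm.complete("Say hello in one short sentence."))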
If you notice that Ollama models seem to run fine via the CLI chat but extremely slowly via the API, know that the CLI chat uses a small context window as well as streaming responses.
When you use the API in the same way, perceived performance is much better! Waiting 20 seconds while looking at a black terminal window is definitely not the same experience as waiting 3 seconds then seeing each word printed progressively.
Verbose output can also make it easier to see if it's stuck somewhere or just slowly chugging through.
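To get the same perceived speed-up in your own scripts, you can stream the response instead of waiting for the full completion (a sketch, using the same Ollama setup as above):

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.2", request_timeout=120.0)

# stream_complete yields partial responses; printing each .delta shows words as they arrive.
for chunk in llm.stream_complete("Explain what an AI agent is in two sentences."):
    print(chunk.delta, end="", flush=True)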
My agents keep saying they can't do things
Something important to keep in mind is that not all models and variants have the same capabilities – so if you try to get an LLM to do something it can't do (e.g. tool calling, prompting with images, embedding), it will fail miserably.
The Ollama website makes it quite easy to see which models have which capabilities, but be sure to check whether they only apply to certain variants (e.g. Gemma 3 has vision only at 4B and up).
You can install multiple LLMs, though, and use different ones in different cases. One thing you can't do is create vector embeddings with one model and read them with another! Embeddings seem to be specific to the model that created them, at least for now.
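For example, you can configure one Ollama model for generation and a dedicated embedding model for indexing – just remember to query an index with the same embedding model that built it (a sketch; assumes the llama-index-embeddings-ollama package is installed and both models are pulled in Ollama):

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# One model for chat/generation, another for creating vector embeddings.
Settings.llm = Ollama(model="llama3.2", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")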
LangGraph is a Python AI agent framework that builds on the LangChain ecosystem.
It provides a very structured approach for creating AI agents and orchestrating complex workflows.
I found the first practical exercise in the course too dense for a starter project (I like to start small then add features one by one).
I tried to follow the LangGraph Quickstart on the official documentation, but quickly got bogged down in the details.
So instead I watched the AI Agents in LangGraph course on Deeplearning.ai, which I found super helpful!
Exercise: Building your first LangGraph
When I came back to do the HF exercise, I found it had multiple errors:
EmailState contains duplicate keys.
draft_response is used both as a state and as a node name, so one has to be renamed.
The classify_email node is not robust enough to handle real-life fuzzy LLM output.
That 3rd issue is annoying because, regardless of how the word "spam" is actually used in the LLM response, if it appears anywhere the is_spam state gets set to true (see the original condition in the diff further down).
So if the LLM writes stuff like this and decides that the email is legitimate, it will still be categorized as spam because it wrote that naughty word:
"This is not necessarily a red flag for spam: firstly..."
If the LLM response contains both the words "spam" and "reason", then spam_reason is set to whatever text comes after the word "reason".
So if the LLM writes things like this, it will get a spam_reason with various amounts of random text as the value:
"This suggests a legitimate connection, and a reason for mr. smith to be..."
As a quickfix to the 3rd point, I added an expected output to the end of the classify_email prompt, and updated the conditions:
Your response should be in the following format:
is_spam: true/false
spam_reason: "reason" (if spam)
email_category: "category" (if legitimate)
Example response for a spam email:
is_spam: true
spam_reason: "This email is a phishing attempt."
Example response for a legitimate email:
is_spam: false
email_category: "thank you"
Please provide a clear and concise response.
- is_spam = "spam" in response_text and "not spam" not in response_text
+ is_spam = "is_spam: true" in response_text
- spam_reason = response_text.split("reason:")[1].strip()
+ spam_reason = response_text.split("spam_reason:")[1].strip()
Not the most robust or production ready, but at least it gives more consistent results.
Exercise: Document analysis graph
So far I've sometimes just been using Gemini's OpenAI-compatible API, to make it easier to switch between providers and see the results of different models (they kind of have different personalities!).
But it's still in beta and didn't work for this exercise; I kept getting this error:
openai.BadRequestError: Error code: 400 - [{'error': {'code': 400, 'message': 'Expected string or list of content parts, got: null', 'status': 'INVALID_ARGUMENT'}}]
So I just switched back to using the Gemini API instead, via LangChain's Google GenAI module:
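Something like this (a minimal sketch; it assumes the langchain-google-genai package is installed and GOOGLE_API_KEY is set in the environment – the model name is just an example):

from langchain_google_genai import ChatGoogleGenerativeAI

# LangChain wrapper around the native Gemini API (no OpenAI compatibility layer).
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
response = llm.invoke("Summarize this document in one sentence: ...")
print(response.content)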