GSoC 2025 @ Google DeepMind: Porting cookbook to js/ts and open-source Gemini example applications

GSoC 2025 Final Report

Contributor: @FallenDeity
Mentor: @dynamicwebpaige
Organization: Google DeepMind
Project: Open-source Gemini Example Applications

Overview

The Gemini Cookbook is a collection of sample applications and tutorials illustrating the functionality of the Gemini APIs in Python. During Google Summer of Code (GSoC), my work centered on the Cookbook and the latest unified genai SDKs.

The goal of this project was to modernize the existing tutorials and documentation around the new unified Gemini SDKs for JavaScript/TypeScript and Python. The work was roughly divided into three phases:

  • Cookbook Migration: Porting Python quickstarts/ and examples/ to JS/TS based notebooks
  • New Tutorials and AI Studio Applets: Create Google AI Studio applets showcasing various Gemini capabilities
  • Open Source Library Updates: Identify open-source libraries that use outdated versions of the Gemini APIs and update/raise issues to use the latest SDKs.

What I Did

Cookbook Migration

You can browse all quickstart guides and examples in the cookbook on the site here. You can also view all the notebooks and commits in the source repository here.

  • Rebuilt the Gemini Cookbook from scratch as TypeScript/JavaScript notebooks using Jupyter Notebook with a TypeScript kernel (tslab).
  • Set up CI/CD workflows to lint and format notebooks and scripts, build the project, and deploy it to GitHub Pages via Quarto.
  • Ported all major quickstarts and examples:
    • Get Started guides (audio, TTS, embeddings, caching, code execution, OpenAI compatibility, etc.) ~30 notebooks
    • Examples (entity extraction, anomaly detection with embeddings, story writing, search grounding, charts/graphs, etc.) ~60 notebooks
  • Advanced Integrations:

New Tutorials & Google AI Studio Applets

  • AI-based image editor/transformer This applet demonstrates how to apply image filters using the Gemini API. It allows users to upload an image and apply various artistic filters to it, such as Renaissance, Cyberpunk, Watercolor and more.
  • Silent Video Dubber / Synesthetic Video Director This applet uses the Gemini API to understand context from user-uploaded videos, on a frame-by-frame basis, and generate a dub or background music that matches the mood and content of the scene.
  • Recipe Generator from fridge images This applet allows users to upload an image of their fridge or pantry, and the Gemini API will analyze the ingredients present. It then suggests a recipe based on the available items, taking into account dietary preferences and restrictions.
  • Social media emoji-caption & soundtrack generator This applet allows users to upload an image and generate a caption with emojis, as well as suggest a soundtrack that fits the mood of the image. It uses the Gemini API to analyze the image and create engaging content for social media posts.
  • Notebook Playground A TypeScript/JavaScript Jupyter notebook environment running in Google AI Studio that lets users experiment with the Google JS GenAI SDK. Load existing notebooks or create new ones to test and prototype with Google’s Generative AI APIs directly in your browser.
  • Head to the cookbook showcase section here to view working demos.

MCP SDK Development

In the second half of GSoC I began working on MCP-based applications and examples, which led me deep into the existing Anthropic MCP Python SDK (python-sdk). While developing MCP servers and applications with the SDK, I quickly ran into several of its limitations and drawbacks:

  • Inconsistent lifespan and operation contexts (state) of the MCP server between the STDIO and HTTP transports
  • Incomplete detail and metadata parsing from annotations for MCP primitives, which currently requires explicit pydantic Field annotations
  • Context injection supported only for tools (no state or context availability for resources or prompts, which really hurts when working with resource templates)
  • All MCP primitives must be defined in a single file because they need the MCP instance, which prevents modularisation and distribution and leaves one huge monolithic file of code
  • No middleware, rate limits, targeted autocomplete support for the completion/complete MCP request, etc.

To resolve these drawbacks, I extended the MCP Python SDK with a combination of quality-of-life and essential features, such as context injection and a plugin system. Below are some of the features the SDK now supports:

Context Injection

Previously there was no way to access context in resources and prompts, which meant we couldn't access the request data or lifespan state.

@mcp.resource("resource://system-status")
async def get_system_status(ctx: DiscordMCPContext) -> dict[str, t.Any]:  # type: ignore
    """Provides system status information."""
    return {
        "status": "operational",
        "request_id": ctx.request_id,
        "uptime": f"{ctx.request_context.lifespan_context.bot.uptime.total_seconds()} seconds",
    }
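The injection mechanism itself can be sketched with the standard library alone. The sketch below is a hypothetical stand-in for the SDK's actual machinery (`call_with_context`, the dict-based context, and `status` are all illustrative names, not SDK APIs): the framework inspects a callback's signature and only passes a context if the callback declares one.

```python
import contextvars
import inspect
import typing as t

# Hypothetical sketch: the active request context lives in a ContextVar,
# and the framework inspects each callback's signature to decide whether
# to inject it. None of these names come from the real SDK.
_current_ctx: contextvars.ContextVar[t.Any] = contextvars.ContextVar("mcp_ctx")


def call_with_context(func: t.Callable[..., t.Any], ctx: t.Any, **kwargs: t.Any) -> t.Any:
    """Invoke a resource/prompt callback, injecting `ctx` if it asks for one."""
    token = _current_ctx.set(ctx)
    try:
        params = inspect.signature(func).parameters
        if "ctx" in params:  # callback declared a context parameter
            return func(ctx=ctx, **kwargs)
        return func(**kwargs)  # plain callback, no injection
    finally:
        _current_ctx.reset(token)


def status(ctx: dict) -> dict:
    return {"status": "operational", "request_id": ctx["request_id"]}


print(call_with_context(status, {"request_id": "abc-123"}))
# {'status': 'operational', 'request_id': 'abc-123'}
```

The same signature-inspection trick is what lets the real SDK support both context-aware and context-free callbacks without changing their call sites.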

Plugin System

Currently the SDK requires one large file containing all MCP primitives, either as decorated functions or as overridden callbacks, both of which lead to huge single-file codebases.

To address this, I collaborated with Snipy7374 on designing and implementing a plugin system inspired by frameworks such as FastAPI Routers, Flask Blueprints, and Django Routers. This system allows plugins to be defined in separate files and registered with the MCP instance prior to initialization, enabling a more modular and scalable architecture.

user_tools_manager = MCPPluginManager(name="user-tools")

@user_tools_manager.register_tool
async def get_current_user(ctx: DiscordMCPContext) -> DiscordUser:
    """Get the current bot user."""
    return DiscordUser.from_discord_user(ctx.bot.user)

This is already a common concept in popular frameworks like FastAPI and Django, whose router abstractions are widely used to build applications at scale.
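The core of such a plugin manager is small enough to sketch with the standard library. The classes below (`PluginManager`, `Server`, `include`) are hypothetical simplifications, not the SDK's real API; they show only the registration-then-merge pattern, analogous to FastAPI's `include_router`:

```python
import typing as t


class PluginManager:
    """Hypothetical sketch: each module builds its own manager, and the
    server merges them before startup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: dict[str, t.Callable[..., t.Any]] = {}

    def register_tool(self, func: t.Callable[..., t.Any]) -> t.Callable[..., t.Any]:
        # Decorator: record the callback under its name, return it unchanged.
        self.tools[func.__name__] = func
        return func


class Server:
    def __init__(self) -> None:
        self.tools: dict[str, t.Callable[..., t.Any]] = {}

    def include(self, manager: PluginManager) -> None:
        # Merge the plugin's primitives into the server's registry.
        self.tools.update(manager.tools)


user_tools = PluginManager("user-tools")


@user_tools.register_tool
def get_current_user() -> str:
    return "bot-user"


server = Server()
server.include(user_tools)
print(sorted(server.tools))  # ['get_current_user']
```

Because registration happens on the manager rather than the server instance, each plugin file stays importable on its own and the wiring happens once at startup.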

Targeted Autocompletes

MCP defines a completion/complete request with which the client can ask the MCP server for parameter or input autocompletions for resources and prompts; the server returns a list of possible options for the given query.

At the moment the only way to handle autocompletion is to define a single global completion handler like the following:

@mcp.completion()
async def handle_completion(
    ref: PromptReference | ResourceTemplateReference,
    argument: CompletionArgument,
    context: CompletionContext,
) -> Completion | None:
    if isinstance(ref, ResourceTemplateReference):
        # Return completions based on ref, argument, and context
        return Completion(values=["option1", "option2"])
    return None

As a result, handling completions for every prompt or resource template requires a big if/elif chain or a long match/case ladder. To solve this, we extended our plugin system to handle autocompletes for each primitive in a targeted, elegant manner:

@user_tools_manager.register_resource("resource://discord/user/{user_id}")
async def get_user_resource(ctx: DiscordMCPContext, user_id: str) -> DiscordUser:
    """Get a user resource by their ID."""
    user = ctx.bot.get_user(int(user_id)) or await ctx.bot.fetch_user(int(user_id))
    return DiscordUser.from_discord_user(user)

# Current completion/complete request handler requires a global handler, this could help with argument and callback specific handlers
@get_user_resource.autocomplete("user_id")
async def autocomplete_user_id(
    ctx: DiscordMCPContext, ref: DiscordMCPResourceTemplate, query: str, context_args: dict[str, t.Any] | None = None
) -> list[str]:
    """Autocomplete user IDs."""
    if not query:
        return []
    users = ctx.bot.users
    return [str(user.id) for user in users if query.lower() in user.name.lower() or query in str(user.id)][:10]

This allows us to define autocomplete callbacks on a per resource/prompt and on a per argument basis for that primitive.
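Under the hood, targeted dispatch can be reduced to a registry keyed by (template, argument). The sketch below is a hypothetical stdlib-only model of that routing (the decorator, registry, and `handle_completion` names are illustrative, not the SDK's real internals):

```python
import typing as t

# Hypothetical sketch: autocomplete handlers keyed by (template, argument),
# so one completion/complete request routes directly to the right callback
# instead of walking a global if/elif ladder.
AutocompleteFn = t.Callable[[str], list[str]]
_handlers: dict[tuple[str, str], AutocompleteFn] = {}


def autocomplete(template: str, argument: str) -> t.Callable[[AutocompleteFn], AutocompleteFn]:
    def decorator(func: AutocompleteFn) -> AutocompleteFn:
        _handlers[(template, argument)] = func  # register under its key
        return func
    return decorator


def handle_completion(template: str, argument: str, query: str) -> list[str]:
    # O(1) lookup replaces the isinstance/if-elif chain of the global handler.
    handler = _handlers.get((template, argument))
    return handler(query) if handler else []


@autocomplete("resource://discord/user/{user_id}", "user_id")
def complete_user_id(query: str) -> list[str]:
    known = ["1001", "1002", "2001"]  # stand-in for ctx.bot.users
    return [u for u in known if u.startswith(query)][:10]


print(handle_completion("resource://discord/user/{user_id}", "user_id", "10"))
# ['1001', '1002']
```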

Checks and Ratelimits

Frameworks like Django and FastAPI allow defining checks and cooldowns for endpoints. This is a powerful feature: such checks perform access control and other predicate logic to ensure secure, robust access to endpoints without redefining the logic for each callback individually. The SDK provided no support for this, so any such logic had to be written inside each callback and invoked manually. To overcome this, we extended our plugin system to support both checks and rate limits on a per-primitive basis.

def has_bot_user(ctx: MiddlewareContext[CallToolRequest]) -> bool:
    return ctx.context.bot.user is not None

@user_tools_manager.register_tool
@user_tools_manager.check(has_bot_user)  # predicate checks, can be control logic or access control, you can stack multiple such predicates
async def get_current_user(ctx: DiscordMCPContext) -> DiscordUser:
    """Get the current bot user."""
    return DiscordUser.from_discord_user(ctx.bot.user)

@user_tools_manager.register_tool
@user_tools_manager.limit(RateLimitType.FIXED_WINDOW, rate=1, per=180) # possible ratelimit, or cooldown logic
async def get_latency(ctx: DiscordMCPContext) -> float:
    """Get the latency of the bot."""
    latency = ctx.bot.latency * 1000
    return latency

Related Resources

Open Source Library Updates

Migrated multiple libraries to the latest google-genai SDK:

Final Cookbook Site

(Screenshot of the final cookbook site.)

Usage statistics from Google Analytics indicate strong community engagement with the Gemini TypeScript Cookbook, with 92% growth in both active and new users over the last 30 days. Notably, this engagement came without any active promotion or advertising, purely through organic search.

Related Issues

Future Work

Follow up on pending PRs and issues raised in community projects (some are still awaiting maintainer review), and add new tutorials and guides as DeepMind releases additional Gemini features.

Challenges and Learnings

A major challenge during this project was navigating inconsistencies in the js-genai SDK compared to the Python SDK and official API references. For example, issues with function calling, automatic function checks, and tool arrays required careful debugging, creative workarounds, and raising detailed GitHub issues. Working through these problems gave me a much stronger understanding of SDK internals and the importance of contributing feedback upstream to improve developer experience.

Another significant area of learning came from exploring the Model Context Protocol (MCP). I extended the Python MCP SDK with features such as structured logging, middleware for rate-limiting and access control, and a modular plugin system for dynamically registering tools and resources. These additions taught me how to design clean abstractions and developer-friendly APIs, similar to frameworks like FastAPI or Django. I also worked with the internals of Pydantic typing, forward references, and resource template handling, which deepened my understanding of how protocol-driven systems manage complexity.

On the application side, I had to rethink many Python-centric workflows in the TypeScript ecosystem. This led me to adopt libraries like danfo-node (as a pandas equivalent), fluent-ffmpeg for audio, and TensorFlow.js for embeddings and clustering. I also built interactive visualization pipelines with Plotly and t-SNE, which helped me showcase advanced multimodal use cases for Gemini. Combining these with applets in Google AI Studio, I learned how to engineer prompts, design safe interactions, and create engaging end-to-end demos for the community.

Overall, the project strengthened my ability to debug SDK-level issues, work with complex protocol internals, and bridge gaps between different ecosystems.

Acknowledgements

I would like to thank my mentor, Paige Bailey, for her guidance and support throughout the summer.

I would also like to thank the developers and community members I interacted with across the Gemini and open-source ecosystem. Their feedback, collaboration, and willingness to help made this project much more enjoyable and productive.

Finally, I would like to thank the Google Summer of Code program for accepting my proposal and giving me the opportunity to work on such an impactful project. The resources and community support provided through GSoC were invaluable in helping me explore, build, and share new ideas with developers worldwide.
