@raveeshbhalla
Created December 8, 2025 01:23
Video input support for DSPy
BigBuckBunny.mp4 url = http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4
ForBiggerJoyrides.mp4 url = http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerJoyrides.mp4
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Video Understanding with DSPy and Gemini\n",
"\n",
"This notebook demonstrates how to use `dspy.Video` for video understanding tasks with Google's Gemini models.\n",
"\n",
"## Prerequisites\n",
"\n",
"1. A Google AI Studio API key (get one at [aistudio.google.com](https://aistudio.google.com))\n",
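"2. This version of DSPy installed in editable mode (the next cell runs `pip install -e ..`, which assumes this notebook lives one directory below the repo root)"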
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Install from the local repo (run this first!)\n%pip install -e .."
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import dspy\n",
"\n",
"# Set your Gemini API key (an existing GEMINI_API_KEY environment variable is kept)\n",
"os.environ.setdefault(\"GEMINI_API_KEY\", \"your-api-key-here\")  # Replace with your actual key\n",
"\n",
"# Configure DSPy with Gemini\n",
"lm = dspy.LM(\"gemini/gemini-2.0-flash\")\n",
"dspy.configure(lm=lm)\n",
"\n",
"print(f\"DSPy configured with: {lm}\")\n",
"print(f\"dspy.Video available: {hasattr(dspy, 'Video')}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Basic Video Creation\n",
"\n",
"Let's explore different ways to create `dspy.Video` objects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# From a YouTube URL (no download needed; Gemini accepts YouTube URLs natively)\n",
"youtube_video = dspy.Video.from_youtube(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n",
"print(f\"YouTube video: {repr(youtube_video)}\")\n",
"\n",
"# From a remote URL (placeholder: replace with a publicly accessible video file)\n",
"url_video = dspy.Video.from_url(\"https://example.com/sample.mp4\")\n",
"print(f\"URL video: {repr(url_video)}\")\n",
"\n",
"# From a Files API file ID (for videos uploaded earlier; \"files/abc123\" is a placeholder)\n",
"file_id_video = dspy.Video.from_file_id(\"files/abc123\", mime_type=\"video/mp4\")\n",
"print(f\"File ID video: {repr(file_id_video)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The constructor also accepts a URL (or a local file path) as a single positional argument\n",
"video1 = dspy.Video(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n",
"print(f\"Simple constructor: {repr(video1)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Video Question-Answering\n",
"\n",
"Let's create a simple video Q&A system."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class VideoQA(dspy.Signature):\n",
" \"\"\"Answer questions about a video.\"\"\"\n",
" video: dspy.Video = dspy.InputField(desc=\"The video to analyze\")\n",
" question: str = dspy.InputField(desc=\"Question about the video\")\n",
" answer: str = dspy.OutputField(desc=\"Answer based on the video content\")\n",
"\n",
"qa = dspy.Predict(VideoQA)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test with a short public YouTube video (later cells reuse `test_video`)\n",
"test_video = dspy.Video.from_youtube(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n",
"\n",
"result = qa(\n",
" video=test_video,\n",
" question=\"What is happening in this video? Describe the main visual elements.\"\n",
")\n",
"\n",
"print(f\"Answer: {result.answer}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Video Summarization with Chain of Thought"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class VideoSummary(dspy.Signature):\n",
" \"\"\"Generate a comprehensive summary of video content.\"\"\"\n",
" video: dspy.Video = dspy.InputField(desc=\"The video to summarize\")\n",
" summary: str = dspy.OutputField(desc=\"Detailed summary of what happens in the video\")\n",
"\n",
"summarize = dspy.ChainOfThought(VideoSummary)\n",
"\n",
"result = summarize(video=test_video)\n",
"\n",
"print(f\"Reasoning: {result.reasoning}\\n\")\n",
"print(f\"Summary: {result.summary}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Video Classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Literal\n",
"\n",
"class VideoClassification(dspy.Signature):\n",
" \"\"\"Classify the type of video content.\"\"\"\n",
" video: dspy.Video = dspy.InputField(desc=\"The video to classify\")\n",
" category: Literal[\"music\", \"tutorial\", \"entertainment\", \"news\", \"sports\", \"other\"] = dspy.OutputField(\n",
" desc=\"The category of the video\"\n",
" )\n",
" confidence: Literal[\"high\", \"medium\", \"low\"] = dspy.OutputField(\n",
" desc=\"Confidence level of the classification\"\n",
" )\n",
"\n",
"classify = dspy.Predict(VideoClassification)\n",
"\n",
"result = classify(video=test_video)\n",
"print(f\"Category: {result.category}\")\n",
"print(f\"Confidence: {result.confidence}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Analyzing Multiple Videos"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class MultiVideoAnalysis(dspy.Signature):\n",
" \"\"\"Analyze and compare multiple videos.\"\"\"\n",
" videos: list[dspy.Video] = dspy.InputField(desc=\"List of videos to analyze\")\n",
" analysis_type: str = dspy.InputField(desc=\"What aspect to analyze\")\n",
" analysis: str = dspy.OutputField(desc=\"Analysis results\")\n",
"\n",
"analyze = dspy.Predict(MultiVideoAnalysis)\n",
"\n",
"# Example list of YouTube videos\n",
"videos = [\n",
" dspy.Video.from_youtube(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\"),\n",
" # Add more videos here for comparison\n",
"]\n",
"\n",
"result = analyze(\n",
" videos=videos,\n",
" analysis_type=\"visual style and content themes\"\n",
")\n",
"print(f\"Analysis: {result.analysis}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Video + Text Context"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class VideoWithContext(dspy.Signature):\n",
" \"\"\"Analyze video with additional context information.\"\"\"\n",
" video: dspy.Video = dspy.InputField(desc=\"The video to analyze\")\n",
" context: str = dspy.InputField(desc=\"Background information about the video\")\n",
" question: str = dspy.InputField(desc=\"Question to answer\")\n",
" answer: str = dspy.OutputField(desc=\"Answer considering both video and context\")\n",
"\n",
"contextual_qa = dspy.Predict(VideoWithContext)\n",
"\n",
"result = contextual_qa(\n",
" video=test_video,\n",
" context=\"This is a famous music video from the 1980s that became an internet meme.\",\n",
" question=\"How does the video's style reflect the era it was made in?\"\n",
")\n",
"print(f\"Answer: {result.answer}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Content Moderation Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ContentModeration(dspy.Signature):\n",
" \"\"\"Analyze video for content policy compliance.\"\"\"\n",
" video: dspy.Video = dspy.InputField(desc=\"Video to moderate\")\n",
" \n",
" is_safe: bool = dspy.OutputField(desc=\"Whether the content is safe for general audiences\")\n",
" category: Literal[\"safe\", \"violence\", \"adult\", \"hate_speech\", \"other\"] = dspy.OutputField(\n",
" desc=\"Content category\"\n",
" )\n",
" explanation: str = dspy.OutputField(desc=\"Brief explanation of the assessment\")\n",
"\n",
"moderate = dspy.ChainOfThought(ContentModeration)\n",
"\n",
"result = moderate(video=test_video)\n",
"print(f\"Is Safe: {result.is_safe}\")\n",
"print(f\"Category: {result.category}\")\n",
"print(f\"Explanation: {result.explanation}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Working with Local Video Files\n",
"\n",
"If you have a local video file under 20MB, you can pass its path directly; it is base64-encoded and sent inline with the request."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment and modify the path to test with a local file\n",
"# local_video = dspy.Video(\"./path/to/your/video.mp4\")\n",
"# result = qa(video=local_video, question=\"What is shown in this video?\")\n",
"# print(result.answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Uploading Large Videos (>20MB)\n",
"\n",
"For videos larger than 20MB, call `upload()` to send the file to Gemini's Files API, then pass the returned video object to your program."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example of uploading a large video (uncomment to test)\n",
"# large_video = dspy.Video.from_path(\"./large_video.mp4\")\n",
"# uploaded_video = large_video.upload() # Uploads to Gemini Files API\n",
"# print(f\"Uploaded video: {repr(uploaded_video)}\")\n",
"# \n",
"# # Now use the uploaded video\n",
"# result = qa(video=uploaded_video, question=\"What happens in this video?\")\n",
"# print(result.answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Inspecting the LM History\n",
"\n",
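"`dspy.inspect_history(n=...)` prints the last *n* LM calls, including the messages sent and the response, so you can check exactly what reached the model."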
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show the last LM call\n",
"dspy.inspect_history(n=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"In this notebook, we demonstrated:\n",
"\n",
"1. **Creating Video objects** from YouTube URLs, remote URLs, local files, and file IDs\n",
"2. **Video Q&A** - asking questions about video content\n",
"3. **Video Summarization** - generating summaries with Chain of Thought reasoning\n",
"4. **Video Classification** - categorizing video content\n",
"5. **Multi-video Analysis** - comparing multiple videos\n",
"6. **Contextual Analysis** - combining video with text context\n",
"7. **Content Moderation** - checking videos for policy compliance\n",
"8. **Local Files** - working with local video files\n",
"9. **Large Videos** - uploading videos >20MB via Files API\n",
"\n",
"### Key Points\n",
"\n",
"- `dspy.Video` works best with **Gemini models** (gemini-2.0-flash, gemini-1.5-pro, etc.)\n",
"- **YouTube URLs** are natively supported - no download needed\n",
"- **Local files under 20MB** are automatically base64-encoded\n",
"- **Large files >20MB** require using `video.upload()` first\n",
"- Videos are sampled at **~1 FPS** and consume roughly **300 tokens per second of video**"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
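Section 1 (Basic Video Creation) only shows `dspy.Video.from_youtube` with a standard `watch?v=` URL; whether it also accepts short links or Shorts URLs is not stated. As a hedge, a hypothetical helper (not part of DSPy) can normalize common YouTube URL forms to the watch form before calling it:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL shapes, or return None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return path.lstrip("/").split("/")[0] or None

    # youtube.com and its subdomains (www., m., music.)
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if path.startswith(prefix):
                return path[len(prefix):].split("/")[0] or None
    return None


def canonical_youtube_url(url: str) -> str:
    """Rewrite a YouTube URL into the https://www.youtube.com/watch?v=<id> form."""
    video_id = youtube_video_id(url)
    if video_id is None:
        raise ValueError(f"Not a recognized YouTube URL: {url}")
    return f"https://www.youtube.com/watch?v={video_id}"


print(canonical_youtube_url("https://youtu.be/dQw4w9WgXcQ"))
# https://www.youtube.com/watch?v=dQw4w9WgXcQ
```

In the notebook this would be used as `dspy.Video.from_youtube(canonical_youtube_url(some_url))`; URLs that are not recognized fail early with a `ValueError` instead of reaching the model.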
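In section 7 (Content Moderation Example), `is_safe` and `category` are separate output fields, so nothing forces them to agree; a model could return `is_safe=True` together with `category="violence"`. A minimal post-check in plain Python, assuming that "safe" must pair with `is_safe=True`, the three harmful categories with `is_safe=False`, and "other" with either (the function name and these rules are illustrative assumptions, not part of DSPy):

```python
from typing import Literal, get_args

# Same labels as the ContentModeration signature in section 7.
ModerationCategory = Literal["safe", "violence", "adult", "hate_speech", "other"]
HARMFUL = {"violence", "adult", "hate_speech"}


def moderation_issues(is_safe: bool, category: str) -> list[str]:
    """Return the inconsistencies between the two outputs (empty list if none)."""
    issues = []
    if category not in get_args(ModerationCategory):
        issues.append(f"unexpected category {category!r}")
    if is_safe and category in HARMFUL:
        issues.append(f"marked safe but categorized as {category!r}")
    if not is_safe and category == "safe":
        issues.append("marked unsafe but categorized as 'safe'")
    return issues


print(moderation_issues(True, "safe"))      # []
print(moderation_issues(True, "violence"))  # ["marked safe but categorized as 'violence'"]
```

After `result = moderate(video=test_video)`, calling `moderation_issues(result.is_safe, result.category)` gives a non-empty list whenever the two fields contradict each other, which is a reasonable trigger for a retry or human review.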
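Sections 8 and 9 split local files at 20MB: smaller files are base64-encoded inline, larger ones go through `upload()`. Base64 turns every 3 bytes into 4 characters, so a file just under the threshold grows by about a third once encoded. A sketch of a pre-flight check, assuming "20MB" means 20 MiB and that the limit applies to the encoded payload (both are assumptions; `needs_upload` and `to_data_uri` are hypothetical helpers, not `dspy.Video` methods):

```python
import base64
import mimetypes
import os
import tempfile

# Assumption: "20MB" interpreted as 20 MiB of encoded request payload.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def base64_length(n_bytes: int) -> int:
    """Length of standard padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def needs_upload(path: str, limit: int = INLINE_LIMIT_BYTES) -> bool:
    """True if the file's base64 encoding would exceed the inline limit."""
    return base64_length(os.path.getsize(path)) > limit


def to_data_uri(path: str) -> str:
    """Base64-encode a local file as a data URI, guessing the MIME type from its extension."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{payload}"


# A 16 MiB raw file already exceeds 20 MiB once base64-encoded.
print(base64_length(16 * 1024 * 1024) > INLINE_LIMIT_BYTES)  # True

# Demo with a tiny stand-in file (3 bytes, not a real video).
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp.write(b"abc")
print(needs_upload(tmp.name))  # False
print(to_data_uri(tmp.name))   # data:video/mp4;base64,YWJj
os.remove(tmp.name)
```

Under these assumptions the effective cutoff for the raw file is 15 MiB, since 15 MiB encodes to exactly 20 MiB.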
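The key points put the cost at roughly 300 tokens per second of video. Treating that figure as an approximation (actual counts depend on the model and media settings), a small hypothetical budgeting helper shows how quickly long videos, or several videos in one call as in section 5, add up:

```python
# Approximate rate quoted in the notebook's key points; not a measured constant.
TOKENS_PER_SECOND = 300


def estimate_video_tokens(duration_seconds: float, tokens_per_second: float = TOKENS_PER_SECOND) -> int:
    """Rough prompt-token estimate for one video of the given duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return round(duration_seconds * tokens_per_second)


def estimate_total_tokens(durations_seconds: list[float]) -> int:
    """Rough estimate for several videos passed together in one call."""
    return sum(estimate_video_tokens(d) for d in durations_seconds)


print(estimate_video_tokens(3 * 60 + 30))    # 63000 (a 3.5-minute clip)
print(estimate_total_tokens([210, 15, 60]))  # 85500
print(estimate_video_tokens(60 * 60))        # 1080000 (one hour)
```

At this rate an hour of video lands around a million tokens, so checking duration before sending long or multiple videos is worthwhile.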