Created
December 8, 2025 01:23
raveeshbhalla/e64d2bf401f75b01a7af2591b53fface
Video input support for DSPy
| BigBuckBunny.mp4 url = http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4 | |
| ForBiggerJoyrides.mp4 url = http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerJoyrides.mp4 |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Video Understanding with DSPy and Gemini\n", | |
| "\n", | |
| "This notebook demonstrates how to use `dspy.Video` for video understanding tasks with Google's Gemini models.\n", | |
| "\n", | |
| "## Prerequisites\n", | |
| "\n", | |
| "1. A Google AI Studio API key (get one at [aistudio.google.com](https://aistudio.google.com))\n", | |
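| "2. This version of DSPy installed from the local repo (the install cell below uses `..`, so it assumes the notebook sits one directory below the repo root)" | |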
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": "# Install from the local repo (run this first!)\n%pip install -e .." | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Setup" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import os\n", | |
| "import dspy\n", | |
| "\n", | |
| "# Set your Gemini API key\n", | |
| "os.environ[\"GEMINI_API_KEY\"] = \"your-api-key-here\" # Replace with your actual key\n", | |
| "\n", | |
| "# Configure DSPy with Gemini\n", | |
| "lm = dspy.LM(\"gemini/gemini-2.0-flash\")\n", | |
| "dspy.configure(lm=lm)\n", | |
| "\n", | |
| "print(f\"DSPy configured with: {lm}\")\n", | |
| "print(f\"dspy.Video available: {hasattr(dspy, 'Video')}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Basic Video Creation\n", | |
| "\n", | |
| "Let's explore different ways to create `dspy.Video` objects." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# From a YouTube URL (no download needed - Gemini natively supports YouTube)\n", | |
| "youtube_video = dspy.Video.from_youtube(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n", | |
| "print(f\"YouTube video: {repr(youtube_video)}\")\n", | |
| "\n", | |
| "# From a remote URL\n", | |
| "url_video = dspy.Video.from_url(\"https://example.com/sample.mp4\")\n", | |
| "print(f\"URL video: {repr(url_video)}\")\n", | |
| "\n", | |
| "# From a file_id (for pre-uploaded videos)\n", | |
| "file_id_video = dspy.Video.from_file_id(\"files/abc123\", mime_type=\"video/mp4\")\n", | |
| "print(f\"File ID video: {repr(file_id_video)}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# You can also pass the URL directly to the constructor as a positional argument\n", | |
| "video1 = dspy.Video(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n", | |
| "print(f\"Simple constructor: {repr(video1)}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. Video Question-Answering\n", | |
| "\n", | |
| "Let's create a simple video Q&A system." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "class VideoQA(dspy.Signature):\n", | |
| "    \"\"\"Answer questions about a video.\"\"\"\n", | |
| "    video: dspy.Video = dspy.InputField(desc=\"The video to analyze\")\n", | |
| "    question: str = dspy.InputField(desc=\"Question about the video\")\n", | |
| "    answer: str = dspy.OutputField(desc=\"Answer based on the video content\")\n", | |
| "\n", | |
| "qa = dspy.Predict(VideoQA)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Test with a short public YouTube video\n", | |
| "test_video = dspy.Video.from_youtube(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n", | |
| "\n", | |
| "result = qa(\n", | |
| "    video=test_video,\n", | |
| "    question=\"What is happening in this video? Describe the main visual elements.\"\n", | |
| ")\n", | |
| "\n", | |
| "print(f\"Answer: {result.answer}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 3. Video Summarization with Chain of Thought" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "class VideoSummary(dspy.Signature):\n", | |
| "    \"\"\"Generate a comprehensive summary of video content.\"\"\"\n", | |
| "    video: dspy.Video = dspy.InputField(desc=\"The video to summarize\")\n", | |
| "    summary: str = dspy.OutputField(desc=\"Detailed summary of what happens in the video\")\n", | |
| "\n", | |
| "summarize = dspy.ChainOfThought(VideoSummary)\n", | |
| "\n", | |
| "result = summarize(video=test_video)\n", | |
| "\n", | |
| "print(f\"Reasoning: {result.reasoning}\\n\")\n", | |
| "print(f\"Summary: {result.summary}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 4. Video Classification" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "from typing import Literal\n", | |
| "\n", | |
| "class VideoClassification(dspy.Signature):\n", | |
| "    \"\"\"Classify the type of video content.\"\"\"\n", | |
| "    video: dspy.Video = dspy.InputField(desc=\"The video to classify\")\n", | |
| "    category: Literal[\"music\", \"tutorial\", \"entertainment\", \"news\", \"sports\", \"other\"] = dspy.OutputField(\n", | |
| "        desc=\"The category of the video\"\n", | |
| "    )\n", | |
| "    confidence: Literal[\"high\", \"medium\", \"low\"] = dspy.OutputField(\n", | |
| "        desc=\"Confidence level of the classification\"\n", | |
| "    )\n", | |
| "\n", | |
| "classify = dspy.Predict(VideoClassification)\n", | |
| "\n", | |
| "result = classify(video=test_video)\n", | |
| "print(f\"Category: {result.category}\")\n", | |
| "print(f\"Confidence: {result.confidence}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 5. Analyzing Multiple Videos" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "class MultiVideoAnalysis(dspy.Signature):\n", | |
| "    \"\"\"Analyze and compare multiple videos.\"\"\"\n", | |
| "    videos: list[dspy.Video] = dspy.InputField(desc=\"List of videos to analyze\")\n", | |
| "    analysis_type: str = dspy.InputField(desc=\"What aspect to analyze\")\n", | |
| "    analysis: str = dspy.OutputField(desc=\"Analysis results\")\n", | |
| "\n", | |
| "analyze = dspy.Predict(MultiVideoAnalysis)\n", | |
| "\n", | |
| "# List of YouTube videos to analyze (only one is included here)\n", | |
| "videos = [\n", | |
| "    dspy.Video.from_youtube(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\"),\n", | |
| "    # Add more videos here for comparison\n", | |
| "]\n", | |
| "\n", | |
| "result = analyze(\n", | |
| "    videos=videos,\n", | |
| "    analysis_type=\"visual style and content themes\"\n", | |
| ")\n", | |
| "print(f\"Analysis: {result.analysis}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 6. Video + Text Context" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "class VideoWithContext(dspy.Signature):\n", | |
| "    \"\"\"Analyze video with additional context information.\"\"\"\n", | |
| "    video: dspy.Video = dspy.InputField(desc=\"The video to analyze\")\n", | |
| "    context: str = dspy.InputField(desc=\"Background information about the video\")\n", | |
| "    question: str = dspy.InputField(desc=\"Question to answer\")\n", | |
| "    answer: str = dspy.OutputField(desc=\"Answer considering both video and context\")\n", | |
| "\n", | |
| "contextual_qa = dspy.Predict(VideoWithContext)\n", | |
| "\n", | |
| "result = contextual_qa(\n", | |
| "    video=test_video,\n", | |
| "    context=\"This is a famous music video from the 1980s that became an internet meme.\",\n", | |
| "    question=\"How does the video's style reflect the era it was made in?\"\n", | |
| ")\n", | |
| "print(f\"Answer: {result.answer}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 7. Content Moderation Example" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "class ContentModeration(dspy.Signature):\n", | |
| "    \"\"\"Analyze video for content policy compliance.\"\"\"\n", | |
| "    video: dspy.Video = dspy.InputField(desc=\"Video to moderate\")\n", | |
| "\n", | |
| "    is_safe: bool = dspy.OutputField(desc=\"Whether the content is safe for general audiences\")\n", | |
| "    category: Literal[\"safe\", \"violence\", \"adult\", \"hate_speech\", \"other\"] = dspy.OutputField(\n", | |
| "        desc=\"Content category\"\n", | |
| "    )\n", | |
| "    explanation: str = dspy.OutputField(desc=\"Brief explanation of the assessment\")\n", | |
| "\n", | |
| "moderate = dspy.ChainOfThought(ContentModeration)\n", | |
| "\n", | |
| "result = moderate(video=test_video)\n", | |
| "print(f\"Is Safe: {result.is_safe}\")\n", | |
| "print(f\"Category: {result.category}\")\n", | |
| "print(f\"Explanation: {result.explanation}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 8. Working with Local Video Files\n", | |
| "\n", | |
| "A local video file under 20MB can be passed directly; it is base64-encoded and sent inline with the request." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Uncomment and modify the path to test with a local file\n", | |
| "# local_video = dspy.Video(\"./path/to/your/video.mp4\")\n", | |
| "# result = qa(video=local_video, question=\"What is shown in this video?\")\n", | |
| "# print(result.answer)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 9. Uploading Large Videos (>20MB)\n", | |
| "\n", | |
| "For videos larger than 20MB, call `upload()` to send the file to Gemini's Files API, then pass the returned video to your module." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Example of uploading a large video (uncomment to test)\n", | |
| "# large_video = dspy.Video.from_path(\"./large_video.mp4\")\n", | |
| "# uploaded_video = large_video.upload() # Uploads to Gemini Files API\n", | |
| "# print(f\"Uploaded video: {repr(uploaded_video)}\")\n", | |
| "# \n", | |
| "# # Now use the uploaded video\n", | |
| "# result = qa(video=uploaded_video, question=\"What happens in this video?\")\n", | |
| "# print(result.answer)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 10. Inspecting the LM History\n", | |
| "\n", | |
| "`dspy.inspect_history` prints the most recent LM calls, so you can check what was sent to the model." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Show the last LM call\n", | |
| "dspy.inspect_history(n=1)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Summary\n", | |
| "\n", | |
| "In this notebook, we demonstrated:\n", | |
| "\n", | |
| "1. **Creating Video objects** from YouTube URLs, remote URLs, local files, and file IDs\n", | |
| "2. **Video Q&A** - asking questions about video content\n", | |
| "3. **Video Summarization** - generating summaries with Chain of Thought reasoning\n", | |
| "4. **Video Classification** - categorizing video content\n", | |
| "5. **Multi-video Analysis** - comparing multiple videos\n", | |
| "6. **Contextual Analysis** - combining video with text context\n", | |
| "7. **Content Moderation** - checking videos for policy compliance\n", | |
| "8. **Local Files** - working with local video files\n", | |
| "9. **Large Videos** - uploading videos >20MB via Files API\n", | |
| "\n", | |
| "### Key Points\n", | |
| "\n", | |
| "- `dspy.Video` works best with **Gemini models** (gemini-2.0-flash, gemini-1.5-pro, etc.)\n", | |
| "- **YouTube URLs** are natively supported - no download needed\n", | |
| "- **Local files under 20MB** are automatically base64-encoded\n", | |
| "- **Large files >20MB** must be uploaded with `video.upload()` first\n", | |
| "- Gemini samples video at **~1 FPS**, and each second of video costs roughly **300 tokens**" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.12.0" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 4 | |
| } |
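The notebook's key points give two rules of thumb: files under 20MB go inline while larger ones need an upload first, and video costs roughly 300 tokens per second. The sketch below turns those two figures into a planning helper that runs without DSPy or an API key. The names `plan_video_input` and `VideoPlan` are hypothetical and are not part of `dspy.Video`. Reading "20MB" as 20 × 1024 × 1024 bytes, and routing a file of exactly that size to the upload path, are both assumptions.

```python
from dataclasses import dataclass

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024
# The notebook's approximate figure of ~300 tokens per second of video.
TOKENS_PER_VIDEO_SECOND = 300


@dataclass
class VideoPlan:
    route: str             # "inline" (base64 in the request) or "upload" (Files API first)
    estimated_tokens: int  # rough token estimate for the video alone


def plan_video_input(size_bytes: int, duration_seconds: float) -> VideoPlan:
    """Pick a delivery route and estimate token cost from the notebook's rules of thumb."""
    if size_bytes < 0 or duration_seconds < 0:
        raise ValueError("size_bytes and duration_seconds must be non-negative")
    # Files strictly under the limit go inline; everything else is uploaded first.
    route = "inline" if size_bytes < INLINE_LIMIT_BYTES else "upload"
    estimated_tokens = round(duration_seconds * TOKENS_PER_VIDEO_SECOND)
    return VideoPlan(route=route, estimated_tokens=estimated_tokens)


if __name__ == "__main__":
    # A 5MB, 60-second clip versus a 25MB, 212-second clip.
    print(plan_video_input(5 * 1024 * 1024, 60))
    print(plan_video_input(25 * 1024 * 1024, 212))
```

The token figure is only an estimate for budgeting; actual usage depends on the model and its media settings.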