πŸš€ Deploy FLUX.2-Dev as MCP Server on Modal (Step-by-Step Guide)

Deploy FLUX.2-Dev image generation on Modal with MCP support, so AI agents (Claude, Cursor, VS Code Copilot) can generate images for you!

What you'll get:

  • 🎨 FLUX.2-Dev running on H200/H100 GPU
  • 🌐 Web UI at your-url.modal.run
  • πŸ€– MCP endpoint for AI agents
  • ⚑ ~15-20s per image generation

πŸ“‹ Step 1: Prerequisites

1.1 Create Accounts

  1. Modal Account β†’ modal.com/signup
  2. Hugging Face Account β†’ huggingface.co/join

1.2 Accept FLUX.2 License

⚠️ IMPORTANT: You MUST accept the model license first!

  1. Go to huggingface.co/black-forest-labs/FLUX.2-dev
  2. Click "Agree and access repository"
  3. Wait for approval (usually instant)

1.3 Get HuggingFace Token

  1. Go to huggingface.co/settings/tokens
  2. Click "New token"
  3. Name: modal-flux (or anything)
  4. Type: Read
  5. Click "Generate"
  6. Copy the token (starts with hf_...)
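
Optionally, sanity-check the token before handing it to Modal (a quick sketch, assuming huggingface_hub is installed locally; swap in your real token):

from huggingface_hub import whoami

# Prints your account info if the token is valid; raises an error otherwise
print(whoami(token="hf_your_token_here"))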

πŸ”§ Step 2: Setup Modal

2.1 Install Modal CLI

pip install modal

2.2 Login to Modal

modal setup

This opens your browser β†’ log in β†’ done!
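
To confirm the CLI is authenticated, any listing command should succeed, e.g.:

modal app list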

2.3 Create HuggingFace Secret in Modal

  1. Go to modal.com/secrets
  2. Click "Create new secret"
  3. Choose "Custom"
  4. Name: huggingface
  5. Add key-value:
    • Key: HF_TOKEN
    • Value: hf_your_token_here (paste your token)
  6. Click "Create"
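
Alternatively, create the same secret from the terminal with the Modal CLI (the secret name and key must match what the code expects):

modal secret create huggingface HF_TOKEN=hf_your_token_here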

πŸ“ Step 3: Create the Code

3.1 Create a new file called flux_2_api.py

mkdir -p flux-mcp && cd flux-mcp
touch flux_2_api.py

3.2 Copy this entire code into flux_2_api.py:

import time
from io import BytesIO
import modal

# --- 1. Container Images ---

# GPU Backend Image
flux_image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04", add_python="3.11")
    .apt_install("git", "libglib2.0-0", "libsm6", "libxrender1", "libxext6", "ffmpeg", "libgl1")
    .pip_install(
        "invisible_watermark>=0.2.0",
        "huggingface_hub",
        "hf_transfer",
        "safetensors",
        "sentencepiece",
        "numpy<2",
        "torch==2.5.0",
        "git+https://github.com/huggingface/transformers.git",
        "git+https://github.com/huggingface/diffusers.git@e6d46123091afd58281dc7487c0f6b67055683b9",
        "git+https://github.com/huggingface/peft.git",
        "git+https://github.com/huggingface/accelerate.git",
        "gradio_client",
    )
    .env({
        "HF_HUB_ENABLE_HF_TRANSFER": "1",
        "HF_HUB_CACHE": "/cache",
    })
)

# Web UI Image
web_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "fastapi[standard]",
        "gradio[mcp]>=5.0.0", 
        "pillow"
    )
    .env({"GRADIO_MCP_SERVER": "True"})
)

app = modal.App("flux-mcp-app")

# Imports
with flux_image.imports():
    import os
    import torch
    from huggingface_hub import login
    from diffusers import Flux2Pipeline, Flux2Transformer2DModel

with web_image.imports():
    import gradio as gr
    from fastapi import FastAPI
    from PIL import Image
    import os

MINUTES = 60

# --- 2. GPU Backend ---

@app.cls(
    image=flux_image,
    gpu=["H200", "H100"],  # H200 first, fallback to H100
    scaledown_window=20 * MINUTES,
    timeout=60 * MINUTES,
    secrets=[modal.Secret.from_name("huggingface")],
    volumes={
        "/cache": modal.Volume.from_name("hf-hub-cache", create_if_missing=True),
    },
)
class Model:
    @modal.enter()
    def enter(self):
        hf_token = os.environ.get("HF_TOKEN")
        if hf_token:
            login(token=hf_token)
            print("βœ… Logged in to HuggingFace")
        
        repo_id = "black-forest-labs/FLUX.2-dev"
        print(f"⏳ Loading {repo_id}...")
        
        self.pipe = Flux2Pipeline.from_pretrained(
            repo_id, torch_dtype=torch.bfloat16, token=hf_token
        )
        self.pipe.to("cuda")
        
        try:
            self.pipe.transformer.fuse_qkv_projections()
            self.pipe.vae.fuse_qkv_projections()
            print("βœ… QKV projections fused")
        except AttributeError:
            pass
        
        self.device = "cuda"
        print("βœ… Model loaded!")

    @modal.method()
    def generate_image(self, prompt: str, width: int, height: int, steps: int, guidance: float, seed: int):
        print(f"🎨 Generating: {prompt}")
        start_time = time.time()
        
        generator = torch.Generator(device=self.device).manual_seed(seed)
        out = self.pipe(
            prompt=prompt, width=width, height=height,
            num_inference_steps=steps, guidance_scale=guidance,
            generator=generator
        ).images[0]
        
        print(f"βœ… Generated in {time.time() - start_time:.1f}s")
        
        byte_stream = BytesIO()
        out.save(byte_stream, format="JPEG", quality=95)
        return byte_stream.getvalue()

# --- 3. Web UI & MCP Server ---

RESOLUTION_PRESETS = {
    "1:1 Square (1024Γ—1024)": (1024, 1024),
    "16:9 Landscape (1360Γ—768)": (1360, 768),
    "9:16 Portrait (768Γ—1360)": (768, 1360),
    "4:3 Standard (1152Γ—896)": (1152, 896),
    "3:4 Portrait (896Γ—1152)": (896, 1152),
    "3:2 Photo (1216Γ—832)": (1216, 832),
    "2:3 Portrait Photo (832Γ—1216)": (832, 1216),
    "21:9 Ultrawide (1536Γ—640)": (1536, 640),
    "2K HD (1920Γ—1080)": (1920, 1080),
    "2K Vertical (1080Γ—1920)": (1080, 1920),
}

QUALITY_PRESETS = {
    "⚑ Fast (20 steps)": 20,
    "πŸ”„ Balanced (28 steps)": 28,
    "✨ Quality (35 steps)": 35,
    "🎨 Maximum (50 steps)": 50,
}

@app.function(image=web_image, max_containers=1)
@modal.concurrent(max_inputs=100)
@modal.asgi_app()
def ui():
    os.environ["GRADIO_MCP_SERVER"] = "True"

    def generate_flux_image(
        prompt: str, 
        aspect_ratio: str = "1:1 Square (1024Γ—1024)",
        quality_preset: str = "πŸ”„ Balanced (28 steps)",
        guidance: str = "3.5", 
        seed: str = "42",
        progress=gr.Progress()
    ):
        """
        Generate high-quality images using Flux.2-Dev model on Modal H200/H100.
        
        Args:
            prompt (str): Detailed text description of the image.
            aspect_ratio (str): Image aspect ratio preset.
            quality_preset (str): Quality/speed preset. Default: Balanced.
            guidance (str): Guidance scale (1.0-10.0). Default: 3.5.
            seed (str): Random seed for reproducibility. Default: 42.
        """
        sd = int(seed)
        g = float(guidance)
        s = QUALITY_PRESETS.get(quality_preset, 28)
        w, h = RESOLUTION_PRESETS.get(aspect_ratio, (1024, 1024))

        progress(0.2, desc=f"Generating ({s} steps)...")
        image_bytes = Model().generate_image.remote(prompt, w, h, s, g, sd)
        
        progress(1.0, desc="Done!")
        return Image.open(BytesIO(image_bytes))

    demo = gr.Interface(
        fn=generate_flux_image,
        inputs=[
            gr.Textbox(label="Prompt", lines=3, placeholder="A cat holding a sign that says 'Hello FLUX.2'"),
            gr.Dropdown(choices=list(RESOLUTION_PRESETS.keys()), value="1:1 Square (1024Γ—1024)", label="Aspect Ratio"),
            gr.Dropdown(choices=list(QUALITY_PRESETS.keys()), value="πŸ”„ Balanced (28 steps)", label="Quality"),
            gr.Slider(1.0, 10.0, 3.5, step=0.5, label="Guidance Scale"),
            gr.Number(42, label="Seed")
        ],
        outputs=gr.Image(label="Result"),
        title="🎨 FLUX.2-Dev MCP Server",
        description="Generate images with FLUX.2-Dev on H200/H100. MCP enabled for AI agents.",
        api_name="generate"
    )
    
    demo.queue()
    return gr.mount_gradio_app(FastAPI(), demo, path="/")

πŸš€ Step 4: Deploy to Modal

4.1 Deploy the app

modal deploy flux_2_api.py
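
Tip: while you're still iterating on the code, modal serve runs a temporary hot-reloading deployment instead (it prints a temporary URL and reloads on every file save):

modal serve flux_2_api.py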

4.2 Wait for deployment

You'll see output like:

βœ“ Created objects.
β”œβ”€β”€ πŸ”¨ Created web function ui => https://YOUR-USERNAME--flux-mcp-app-ui.modal.run
└── πŸ”¨ Created function Model.*.
βœ“ App deployed! πŸŽ‰

4.3 Copy your URL

Your app is now live at:

https://YOUR-USERNAME--flux-mcp-app-ui.modal.run

βœ… Step 5: Test Your Deployment

5.1 Test the Web UI

  1. Open your URL in browser: https://YOUR-USERNAME--flux-mcp-app-ui.modal.run
  2. Enter a prompt: A cat holding a sign that says "Hello World"
  3. Click Submit
  4. Wait ~30-60s for first image (cold start), then ~15-20s for subsequent images

5.2 Test the MCP Endpoint

curl https://YOUR-USERNAME--flux-mcp-app-ui.modal.run/gradio_api/mcp/schema

Should return JSON with tool definitions.
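
You can also call the generation endpoint programmatically. A minimal sketch using gradio_client (the endpoint name follows from api_name="generate" in the code above; the return value is a local path to the downloaded image):

from gradio_client import Client

client = Client("https://YOUR-USERNAME--flux-mcp-app-ui.modal.run/")
result = client.predict(
    "A cat holding a sign that says 'Hello FLUX.2'",  # prompt
    "1:1 Square (1024Γ—1024)",                         # aspect_ratio
    "πŸ”„ Balanced (28 steps)",                         # quality_preset
    3.5,                                              # guidance
    42,                                               # seed
    api_name="/generate",
)
print(result)  # local file path of the generated image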


πŸ€– Step 6: Connect to AI Agents

Option A: Claude Desktop

  1. Find config file:

    • Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
    • Linux: ~/.config/Claude/claude_desktop_config.json
  2. Add this (replace YOUR-USERNAME):

{
  "mcpServers": {
    "flux-generator": {
      "url": "https://YOUR-USERNAME--flux-mcp-app-ui.modal.run/gradio_api/mcp/sse"
    }
  }
}
  3. Restart Claude Desktop

  4. Ask Claude: "Generate an image of a sunset over mountains using the flux generator"

Option B: VS Code / Cursor

Add to your MCP settings (.vscode/mcp.json or settings):

{
  "servers": {
    "flux-generator": {
      "url": "https://YOUR-USERNAME--flux-mcp-app-ui.modal.run/gradio_api/mcp/sse"
    }
  }
}

🎯 Step 7: Use It!

Example Prompts for FLUX.2-Dev

FLUX.2-Dev excels at text in images! Try these:

A coffee shop storefront with a neon sign that says "OPEN 24/7"

A movie poster for "The Last Galaxy" featuring a spaceship and stars

A birthday card with "Happy Birthday Sarah!" in elegant script

Professional headshot of a business woman, studio lighting

Anime style character holding a sword with "HERO" written on it

πŸ’° Cost Estimate

GPU     Price/hour   Cost per image
H100    ~$4/hr       ~$0.02
H200    ~$5/hr       ~$0.02

The container stays warm for 20 minutes after the last request (the scaledown_window in the code), so back-to-back generations avoid cold starts and stay cheap!
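
As a sanity check on the per-image figure (a rough estimate, assuming ~18 s per image at the H100 rate above):

# rough per-image cost = hourly GPU rate / 3600 s * seconds per image
rate_per_hour = 4.00       # ~$4/hr for H100 (assumed)
seconds_per_image = 18     # midpoint of the 15-20 s range
print(f"${rate_per_hour / 3600 * seconds_per_image:.3f}")  # -> $0.020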


πŸ”§ Troubleshooting

"401 Unauthorized" Error

"CUDA out of memory" Error

  • This shouldn't happen on H100/H200
  • If you're using a smaller GPU, swap in enable_model_cpu_offload() as sketched below
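
A sketch of that change in Model.enter() (diffusers then moves submodules between CPU and GPU as needed, trading speed for memory):

# in Model.enter(), replace:
#   self.pipe.to("cuda")
# with:
self.pipe.enable_model_cpu_offload()  # keeps idle submodules in CPU RAM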

MCP not working

  • Make sure URL ends with /gradio_api/mcp/sse
  • Check that GRADIO_MCP_SERVER=True is set
  • Restart your AI agent after config change

Slow first generation

  • First request takes ~60s (cold start + model loading)
  • Subsequent requests: ~15-20s
  • Keep the container warm by generating regularly, or pin one resident (see the sketch below)
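
If cold starts matter more than idle cost, Modal can keep a container resident at all times. A hedged sketch (the parameter is min_containers in recent Modal releases, keep_warm in older ones; note that an always-on H100/H200 bills even while idle):

@app.cls(
    image=flux_image,
    gpu=["H200", "H100"],
    min_containers=1,  # keep one GPU container warm at all times
    # ... remaining arguments unchanged ...
)
class Model:
    ...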

πŸŽ‰ Done!

You now have:

  • βœ… FLUX.2-Dev running on H200/H100
  • βœ… Web UI for manual generation
  • βœ… MCP server for AI agents
  • βœ… 10 resolution presets
  • βœ… 4 quality presets

Your URLs:

  • Web UI: https://YOUR-USERNAME--flux-mcp-app-ui.modal.run
  • MCP: https://YOUR-USERNAME--flux-mcp-app-ui.modal.run/gradio_api/mcp/sse

Happy generating! 🎨
