MLX-VLM Quickstart: Latest Qwen3-VL Models (2025-10-14)

🚀 Reproduce Latest Vision-Language Model Inference on Apple Silicon

This guide shows how to use the latest Qwen3-VL models (released October 14, 2025) to describe images on macOS with Apple Silicon, using MLX.

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.9+
  • uvx (from uv package manager)
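To sanity-check the environment before downloading any models, here is a minimal stdlib-only sketch; the Darwin/arm64 checks are assumptions drawn from the prerequisites above:

#!/usr/bin/env python3
# Verify the prerequisites above: Apple Silicon macOS and Python 3.9+.
import platform
import sys

assert platform.system() == "Darwin", "MLX runs on macOS"
assert platform.machine() == "arm64", "MLX requires Apple Silicon (arm64)"
assert sys.version_info >= (3, 9), "Python 3.9+ required"
print("Environment looks OK for MLX")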

Installation

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# uvx will automatically install mlx-vlm when needed

Latest Models (Released 2025-10-14)

All models are available from the mlx-community organization on Hugging Face:

Qwen3-VL-8B Series

  • mlx-community/Qwen3-VL-8B-Instruct-{bf16,8bit,6bit,5bit,4bit}
  • mlx-community/Qwen3-VL-8B-Thinking-{bf16,8bit,6bit,5bit,4bit}

Qwen3-VL-4B Series (Recommended for Quick Start)

  • mlx-community/Qwen3-VL-4B-Instruct-{bf16,8bit,6bit,5bit,4bit} ✨
  • mlx-community/Qwen3-VL-4B-Thinking-{bf16,8bit,6bit,5bit,4bit}
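The {bf16,8bit,...} notation above is shell-style shorthand for five separate repositories per model. A small sketch (the expansion helper is ours, not part of mlx-vlm) that turns it into full Hugging Face repo IDs:

from itertools import product

BASES = ["Qwen3-VL-8B-Instruct", "Qwen3-VL-8B-Thinking",
         "Qwen3-VL-4B-Instruct", "Qwen3-VL-4B-Thinking"]
QUANTS = ["bf16", "8bit", "6bit", "5bit", "4bit"]

# e.g. "mlx-community/Qwen3-VL-4B-Instruct-4bit"
repo_ids = [f"mlx-community/{b}-{q}" for b, q in product(BASES, QUANTS)]
print(len(repo_ids), "repos")  # 20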

Quick Start: Single Image

# Describe a single image using the latest 4-bit quantized model
uvx --from mlx-vlm mlx_vlm.generate \
  --model mlx-community/Qwen3-VL-4B-Instruct-4bit \
  --image /path/to/your/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 150 \
  --temperature 0.7
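The same generation is available from Python without shelling out. A minimal sketch following the load/generate pattern from the mlx-vlm README (argument names can shift between releases, so treat this as a template rather than a fixed API):

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
model, processor = load(model_path)  # downloads the weights on first use
config = load_config(model_path)

image = ["/path/to/your/image.jpg"]
formatted = apply_chat_template(processor, config,
                                "Describe this image in detail.",
                                num_images=len(image))
output = generate(model, processor, formatted, image,
                  max_tokens=150, verbose=False)
print(output)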

Batch Processing: 17 Random Images

Create a Python script to process multiple images:

#!/usr/bin/env python3
"""
Batch image description using Qwen3-VL-4B-Instruct-4bit
Seed: 1069 (balanced ternary: [+1, -1, -1, +1, +1, +1, +1])
"""
import subprocess
import json
from pathlib import Path
import random

MODEL = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
SEED = 1069

def find_random_images(directory: str, count: int = 17) -> list:
    """Find random images under a directory (recursive)."""
    image_extensions = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.heic'}
    root = Path(directory).expanduser()  # expand "~" before globbing

    # Match extensions case-insensitively (e.g. IMG_0001.JPG)
    all_images = [p for p in root.rglob('*')
                  if p.suffix.lower() in image_extensions]

    random.seed(SEED)
    return random.sample(all_images, min(count, len(all_images)))

def describe_image(image_path: Path, index: int) -> dict:
    """Describe image using mlx-vlm"""
    print(f"\nβ–½ Processing {index}: {image_path.name}")

    try:
        result = subprocess.run(
            [
                "uvx", "--from", "mlx-vlm", "mlx_vlm.generate",
                "--model", MODEL,
                "--image", str(image_path),
                "--prompt", "Describe this image in detail.",
                "--max-tokens", "150",
                "--temperature", "0.7"
            ],
            capture_output=True,
            text=True,
            timeout=120
        )

        # Crude extraction: the CLI prints "=" separator lines around the
        # generated text, so grab everything after the last "=" character.
        output = result.stdout
        if "=" in output:
            description = output.split("=")[-1].strip()
        else:
            description = output.strip()

        return {
            "index": index,
            "filename": image_path.name,
            "path": str(image_path),
            "description": description,
            "model": MODEL,
            "seed": SEED
        }
    except Exception as e:
        return {
            "index": index,
            "filename": image_path.name,
            "path": str(image_path),
            "description": f"[ERROR: {e}]",
            "model": MODEL,
            "seed": SEED
        }

def main():
    # Find 17 random images from Desktop
    images = find_random_images("~/Desktop", 17)

    print(f"β—¬ Processing {len(images)} images with {MODEL}")
    print(f"β—¬ Seed: {SEED}")
    print("=" * 80)

    results = []
    for i, image_path in enumerate(images, 1):
        result = describe_image(image_path, i)
        results.append(result)
        print(f"  β†’ {result['description'][:100]}...")

    # Save results
    output_file = f"image_descriptions_{SEED}.json"
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2)

    print("\n" + "=" * 80)
    print(f"βœ“ Saved descriptions to: {output_file}")
    print(f"βœ“ Model: {MODEL}")
    print(f"βœ“ Seed: {SEED}")

if __name__ == "__main__":
    main()
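Note that each uvx call spawns a fresh process and reloads the model, so the script above pays the model-load cost for every image. For larger batches, loading the model once via the Python API sketched in the Quick Start section and looping over images inside one process should be noticeably faster.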

Usage

# Save the script
chmod +x describe_images.py

# Run it
./describe_images.py

# Or with Python directly
python3 describe_images.py

Model Selection Guide

| Model                     | Size   | Speed   | Quality   | Use Case                        |
|---------------------------|--------|---------|-----------|---------------------------------|
| Qwen3-VL-4B-Instruct-4bit | ~2.5GB | Fastest | Good      | Quick testing, batch processing |
| Qwen3-VL-4B-Instruct-8bit | ~4GB   | Fast    | Better    | Balanced performance            |
| Qwen3-VL-8B-Instruct-4bit | ~5GB   | Medium  | Best      | High-quality descriptions       |
| Qwen3-VL-8B-Instruct-bf16 | ~16GB  | Slower  | Excellent | Maximum quality                 |
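A rough rule of thumb is that the weights should fit comfortably in unified memory alongside everything else you have open. A sketch that picks a model from total RAM (the thresholds are our assumptions, not official guidance; stdlib only):

import os

def total_ram_gb() -> float:
    """Total physical memory via sysconf (macOS and Linux)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (ValueError, OSError):
        return 0.0  # unknown platform; fall through to the smallest model

def pick_model() -> str:
    ram = total_ram_gb()
    if ram >= 32:
        return "mlx-community/Qwen3-VL-8B-Instruct-bf16"
    if ram >= 16:
        return "mlx-community/Qwen3-VL-8B-Instruct-4bit"
    return "mlx-community/Qwen3-VL-4B-Instruct-4bit"

print(pick_model())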

Advanced Options

# Custom prompt
uvx --from mlx-vlm mlx_vlm.generate \
  --model mlx-community/Qwen3-VL-4B-Instruct-4bit \
  --image image.jpg \
  --prompt "What objects are visible? List them." \
  --max-tokens 100

# Lower temperature for more deterministic output
uvx --from mlx-vlm mlx_vlm.generate \
  --model mlx-community/Qwen3-VL-4B-Instruct-4bit \
  --image image.jpg \
  --prompt "Describe this image." \
  --temperature 0.3 \
  --max-tokens 200

# Multiple images
uvx --from mlx-vlm mlx_vlm.generate \
  --model mlx-community/Qwen3-VL-4B-Instruct-4bit \
  --image image1.jpg image2.jpg image3.jpg \
  --prompt "Describe each image." \
  --max-tokens 200
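For the multi-image case in Python, the chat template needs to know how many images accompany the prompt. A hedged sketch using the same interface as the Quick Start example:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = ["image1.jpg", "image2.jpg", "image3.jpg"]
formatted = apply_chat_template(processor, config, "Describe each image.",
                                num_images=len(images))
print(generate(model, processor, formatted, images,
               max_tokens=200, verbose=False))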

Troubleshooting

Model Download Issues

# First run will download ~4GB, be patient
# Check download progress in terminal output

Memory Issues

# Use 4-bit quantized models for lower memory usage
# Close other applications to free RAM

Timeout Issues

# Increase timeout in Python script
timeout=300  # 5 minutes

# Or run without batch processing

ServiceNow StarVector (Bonus)

While exploring the latest models, we also came across ServiceNow's StarVector:

  • Multimodal LLM for SVG generation from images/text
  • Accepted at CVPR 2025
  • Available on Hugging Face

Performance Notes

On Apple M-series chips:

  • First run: ~5-10 minutes (model download)
  • Subsequent runs: ~2-5 seconds per image (4-bit model)
  • Memory usage: ~3-4GB (4-bit model)
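To check these figures on your own machine, here is a minimal timing sketch around the CLI call; note that it measures the whole process, including per-invocation model load, so it will read higher than pure per-image inference time:

import subprocess
import time

cmd = [
    "uvx", "--from", "mlx-vlm", "mlx_vlm.generate",
    "--model", "mlx-community/Qwen3-VL-4B-Instruct-4bit",
    "--image", "/path/to/your/image.jpg",
    "--prompt", "Describe this image.",
    "--max-tokens", "150",
]

start = time.perf_counter()
subprocess.run(cmd, check=True)
print(f"wall time: {time.perf_counter() - start:.1f}s")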

Credits

Generated with seed 1069 (balanced ternary: [+1, -1, -1, +1, +1, +1, +1])


Last updated: 2025-10-14. Models released: 2025-10-14T18:13-18:29 UTC.
