This guide shows how to use the Qwen3-VL models (released October 14, 2025) for image description on macOS with Apple Silicon using MLX.

Prerequisites:
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.9+
- uvx (from the uv package manager)
```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
# uvx will automatically install mlx-vlm when needed
```
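After installing, a quick sanity check that `uvx` is on your PATH (restart your shell first if this prints `None`):

```python
# Sanity check: confirm uvx is on PATH after installing uv.
# Prints None if your shell hasn't picked up the new PATH yet.
import shutil

print(shutil.which("uvx"))
```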
All models are available from mlx-community on Hugging Face:

- mlx-community/Qwen3-VL-8B-Instruct-{bf16,8bit,6bit,5bit,4bit}
- mlx-community/Qwen3-VL-8B-Thinking-{bf16,8bit,6bit,5bit,4bit}
- mlx-community/Qwen3-VL-4B-Instruct-{bf16,8bit,6bit,5bit,4bit}
- mlx-community/Qwen3-VL-4B-Thinking-{bf16,8bit,6bit,5bit,4bit}
```bash
# Describe a single image using the 4-bit quantized model
uvx --from mlx-vlm mlx_vlm.generate \
  --model mlx-community/Qwen3-VL-4B-Instruct-4bit \
  --image /path/to/your/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 150 \
  --temperature 0.7
```
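If you would rather call the model from Python than shell out through uvx, mlx-vlm also exposes a Python API. Below is a minimal sketch based on the `load`/`generate` helpers shown in the mlx-vlm README; exact signatures can vary between releases, so treat it as a starting point rather than a definitive recipe:

```python
# Minimal sketch of mlx-vlm's Python API (per its README); signatures
# can vary between mlx-vlm releases, so check the project docs.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
model, processor = load(model_path)  # downloads the weights on first use
config = load_config(model_path)

images = ["/path/to/your/image.jpg"]
prompt = apply_chat_template(
    processor, config, "Describe this image in detail.", num_images=len(images)
)
output = generate(model, processor, prompt, images, max_tokens=150, verbose=False)
print(output)
```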
Create a Python script to process multiple images:

```python
#!/usr/bin/env python3
"""
Batch image description using Qwen3-VL-4B-Instruct-4bit
Seed: 1069 (balanced ternary, least-significant digit first: [+1, -1, -1, +1, +1, +1, +1])
"""
import json
import random
import subprocess
from pathlib import Path

MODEL = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
SEED = 1069


def find_random_images(directory: str, count: int = 17) -> list:
    """Find a reproducible random sample of images under directory."""
    image_extensions = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.heic'}
    base = Path(directory).expanduser()  # expand "~" so ~/Desktop works
    all_images = []
    for ext in image_extensions:
        all_images.extend(base.rglob(f'*{ext}'))
    random.seed(SEED)
    return random.sample(all_images, min(count, len(all_images)))


def describe_image(image_path: Path, index: int) -> dict:
    """Describe one image by shelling out to mlx-vlm."""
    print(f"\nProcessing {index}: {image_path.name}")
    try:
        result = subprocess.run(
            [
                "uvx", "--from", "mlx-vlm", "mlx_vlm.generate",
                "--model", MODEL,
                "--image", str(image_path),
                "--prompt", "Describe this image in detail.",
                "--max-tokens", "150",
                "--temperature", "0.7",
            ],
            capture_output=True,
            text=True,
            timeout=120,
        )
        # Extract the description from the CLI output (heuristic: take
        # the text after the last "=" separator, else the whole output)
        output = result.stdout
        if "=" in output:
            description = output.split("=")[-1].strip()
        else:
            description = output.strip()
        return {
            "index": index,
            "filename": image_path.name,
            "path": str(image_path),
            "description": description,
            "model": MODEL,
            "seed": SEED,
        }
    except Exception as e:
        return {
            "index": index,
            "filename": image_path.name,
            "path": str(image_path),
            "description": f"[ERROR: {e}]",
            "model": MODEL,
            "seed": SEED,
        }


def main():
    # Find 17 random images from Desktop
    images = find_random_images("~/Desktop", 17)
    print(f"Processing {len(images)} images with {MODEL}")
    print(f"Seed: {SEED}")
    print("=" * 80)
    results = []
    for i, image_path in enumerate(images, 1):
        result = describe_image(image_path, i)
        results.append(result)
        print(f"  -> {result['description'][:100]}...")
    # Save results
    output_file = f"image_descriptions_{SEED}.json"
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2)
    print("\n" + "=" * 80)
    print(f"Saved descriptions to: {output_file}")
    print(f"Model: {MODEL}")
    print(f"Seed: {SEED}")


if __name__ == "__main__":
    main()
```

```bash
# Save the script
chmod +x describe_images.py
# Run it
./describe_images.py
# Or with Python directly
python3 describe_images.py
```

| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| Qwen3-VL-4B-Instruct-4bit | ~2.5GB | Fastest | Good | Quick testing, batch processing |
| Qwen3-VL-4B-Instruct-8bit | ~4GB | Fast | Better | Balanced performance |
| Qwen3-VL-8B-Instruct-4bit | ~5GB | Medium | Best | High-quality descriptions |
| Qwen3-VL-8B-Instruct-bf16 | ~16GB | Slower | Excellent | Maximum quality |
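To pick a tier automatically, you can key off total RAM. A hypothetical helper follows; the thresholds are rough guesses derived from the sizes in the table above, not tested cutoffs:

```python
# Hypothetical helper: choose a quantization tier from total RAM.
# Thresholds are rough guesses based on the table above.
import os

def pick_model() -> str:
    total_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    if total_gb >= 32:
        return "mlx-community/Qwen3-VL-8B-Instruct-bf16"  # ~16GB of weights
    if total_gb >= 16:
        return "mlx-community/Qwen3-VL-8B-Instruct-4bit"  # ~5GB of weights
    return "mlx-community/Qwen3-VL-4B-Instruct-4bit"      # ~2.5GB of weights

print(pick_model())
```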
A few more usage examples:

```bash
# Custom prompt
uvx --from mlx-vlm mlx_vlm.generate \
--model mlx-community/Qwen3-VL-4B-Instruct-4bit \
--image image.jpg \
--prompt "What objects are visible? List them." \
--max-tokens 100
# Lower temperature for more deterministic output
uvx --from mlx-vlm mlx_vlm.generate \
--model mlx-community/Qwen3-VL-4B-Instruct-4bit \
--image image.jpg \
--prompt "Describe this image." \
--temperature 0.3 \
--max-tokens 200
# Multiple images
uvx --from mlx-vlm mlx_vlm.generate \
--model mlx-community/Qwen3-VL-4B-Instruct-4bit \
--image image1.jpg image2.jpg image3.jpg \
--prompt "Describe each image." \
  --max-tokens 200
```

Troubleshooting:

- The first run downloads ~4GB of model weights; be patient and watch the download progress in the terminal output.
- For lower memory usage, prefer the 4-bit quantized models and close other applications to free RAM.
- If a run times out, increase the timeout in the Python script (e.g. `timeout=300` for 5 minutes), retry with a longer timeout as sketched below, or process images one at a time instead of in a batch.
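A minimal retry sketch for the timeout case, wrapping the same subprocess call used in `describe_image()` above; the escalating (120, 300) schedule is an arbitrary choice:

```python
# Retry the mlx-vlm subprocess call with escalating timeouts instead of
# failing on the first TimeoutExpired. The schedule is arbitrary.
import subprocess

def run_with_retries(cmd: list, timeouts=(120, 300)) -> str:
    for timeout in timeouts:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout)
            return result.stdout
        except subprocess.TimeoutExpired:
            print(f"Timed out after {timeout}s, retrying...")
    raise RuntimeError("all attempts timed out")
```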
While exploring the latest models, we also discovered ServiceNow's StarVector:
- Multimodal LLM for SVG generation from images/text
- Accepted at CVPR 2025
- Available on Hugging Face
On Apple M-series chips:
- First run: ~5-10 minutes (model download)
- Subsequent runs: ~2-5 seconds per image (4-bit model)
- Memory usage: ~3-4GB (4-bit model)
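To sanity-check these numbers on your own hardware, you can time a single call to the `describe_image()` helper from the batch script above (the image path here is a placeholder):

```python
# Time one call to describe_image() from describe_images.py (the batch
# script above); the image path is a placeholder -- use a real file.
import time
from pathlib import Path

from describe_images import describe_image  # the script saved earlier

start = time.perf_counter()
result = describe_image(Path("~/Desktop/test.jpg").expanduser(), 1)
print(f"{time.perf_counter() - start:.1f}s for one image")
```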
- MLX-VLM: https://github.com/Blaizzy/mlx-vlm
- Qwen3-VL Models: https://huggingface.co/mlx-community
- MLX Framework: https://ml-explore.github.io/mlx/
Generated with seed 1069 (balanced ternary, least-significant digit first: [+1, -1, -1, +1, +1, +1, +1]).
Last updated: 2025-10-14. Models released: 2025-10-14T18:13-18:29 UTC.