"Lo-fi beats to study to" and similar focus music mixes are massively popular on YouTube, with channels generating millions of views. Creating these mixes traditionally requires:
- Sourcing or producing hours of music
- Audio engineering (mixing, normalization, transitions)
- Thumbnail design
- Video production
- SEO-optimized titles, descriptions, and tags
- Manual upload and scheduling
This is time-intensive and doesn't scale.
Suno Mixer is a fully automated pipeline that generates complete, YouTube-ready focus music videos from a single command. It orchestrates multiple AI services to handle every step of the content creation process.
suno-mixer generate --mood FOCUS --genre dark_synthwave --tracks 10
One command. One hour of original music. Ready for YouTube.
┌─────────────────────────────────────────────────────────────────┐
│ INPUT │
│ Mood: "FOCUS" + Genre: "dark_synthwave" │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AI GENERATION LAYER │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Gemini │ │ Suno API │ │ Gemini │ │
│ │ │ │ │ │ │ │
│ │ Track Titles│ │ 10 Tracks │ │ Thumbnail │ │
│ │ "Neon │ │ (parallel) │ │ (or asset) │ │
│ │ Highway │ │ │ │ │ │
│ │ Dreams" │ │ ~3 min each │ │ 1280x720 │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ MEDIA PROCESSING LAYER │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FFmpeg Pipeline │ │
│ │ │ │
│ │ Audio: Normalize → Transition → Mix → Export MP3 │ │
│ │ Video: Static Image + Audio → H.264 MP4 │ │
│ │ Image: Add text overlay with glow/shadow effects │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ METADATA GENERATION │
│ │
│ Title: "FOCUS Flow: Deep Work Music for Intense..." │
│ Description: Track listing with timestamps, CTAs, hashtags │
│ Tags: 16+ SEO-optimized tags │
│ Thumbnail: Text overlay auto-sized to fit │
│ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT │
│ │
│ 📁 output/FOCUS_dark_synthwave_20241202/ │
│ ├── mix.mp4 (1hr video, ready for upload) │
│ ├── mix.mp3 (audio only) │
│ ├── thumbnail.png (YouTube thumbnail with text) │
│ └── metadata.json (title, description, tags) │
│ │
│ Optional: Auto-upload to YouTube via OAuth │
│ │
└─────────────────────────────────────────────────────────────────┘
| Service | Role | Why |
|---|---|---|
| Suno API | Music generation | Generates original instrumental tracks from text prompts. No licensing issues. |
| Google Gemini | Title & thumbnail generation | Creates unique, evocative track names and thumbnail images matching the aesthetic. |
| YouTube Data API | Publishing | Automated upload with metadata, thumbnail, and scheduling. |
Async-First Architecture
- 10 tracks generate in parallel (not sequentially)
- Downloads happen concurrently
- Total generation time: ~10-15 minutes for 1 hour of content
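The parallel generation described above can be sketched with `asyncio.gather`. This is a minimal illustration, not the project's actual code: `generate_track` is a hypothetical stand-in that sleeps instead of calling the Suno API, and the `max_concurrent` cap is an assumption added to show how concurrency could be bounded.

```python
import asyncio
import time


async def generate_track(index: int, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one music-generation request."""
    await asyncio.sleep(delay)  # simulates waiting on a remote API
    return f"track_{index:02d}.mp3"


async def generate_all(count: int, max_concurrent: int = 10) -> list[str]:
    """Run `count` generation requests concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(index: int) -> str:
        async with semaphore:
            return await generate_track(index)

    # gather preserves input order, so results line up with track numbers.
    return await asyncio.gather(*(limited(i) for i in range(1, count + 1)))


if __name__ == "__main__":
    start = time.perf_counter()
    tracks = asyncio.run(generate_all(10))
    elapsed = time.perf_counter() - start
    print(f"{len(tracks)} tracks in {elapsed:.2f}s")
```

Because the requests overlap, total wall-clock time is close to the slowest single request rather than the sum of all of them.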
Graceful Fallbacks
- Pre-generated thumbnail assets as primary (faster, consistent quality)
- AI generation as fallback
- Word-bank title generation if Gemini unavailable
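A fallback chain like the one above might look roughly like this. The word lists, function names, and the `ai_title` / `ai_thumbnail` callables are all hypothetical placeholders for illustration; the project's real word bank and Gemini calls are not shown in this document.

```python
import random
from pathlib import Path
from typing import Callable, Optional

# Hypothetical word bank; the project's actual lists are not shown.
ADJECTIVES = ["Neon", "Digital", "Velvet"]
NOUNS = ["Highway", "Horizon", "Midnight"]


def word_bank_title(rng: random.Random) -> str:
    """Build a title locally, with no network dependency."""
    return f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"


def pick_title(ai_title: Callable[[], str], rng: random.Random) -> str:
    """Try the AI generator first; fall back to the word bank on failure."""
    try:
        title = ai_title().strip()
        if title:
            return title
    except Exception:
        pass  # any failure (network, quota, bad response) triggers the fallback
    return word_bank_title(rng)


def pick_thumbnail(
    asset_dir: Path, ai_thumbnail: Callable[[], Optional[Path]]
) -> Optional[Path]:
    """Prefer a pre-generated asset; only call the AI generator if none exist."""
    assets = sorted(asset_dir.glob("*.png")) if asset_dir.is_dir() else []
    if assets:
        return assets[0]
    return ai_thumbnail()
```

Note the two chains run in opposite directions: titles try AI first, while thumbnails try local assets first, matching the priorities listed above.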
Audio Engineering
- Loudness normalization to -14 LUFS (the level YouTube and Spotify normalize playback to)
- Clean cuts between tracks (crossfades often sound worse with AI music)
- 320kbps MP3 output
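One way to express these audio steps as FFmpeg invocations is sketched below. It only builds the command lines and the concat list; it does not run FFmpeg. The function names and file paths are illustrative, and using FFmpeg's `loudnorm` filter (which targets integrated loudness in LUFS) and the concat demuxer is an assumption about how the pipeline could be implemented, not a description of the project's code.

```python
from pathlib import Path


def normalize_cmd(src: Path, dst: Path, target_lufs: float = -14.0) -> list[str]:
    """Command to loudness-normalize one track with FFmpeg's loudnorm filter."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-af", f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11",
        "-ar", "44100",  # loudnorm upsamples internally, so set the rate explicitly
        str(dst),
    ]


def concat_list(tracks: list[Path]) -> str:
    """Contents of a concat-demuxer list file: tracks joined with clean cuts."""
    lines = []
    for track in tracks:
        escaped = str(track).replace("'", "'\\''")  # escape single quotes
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_cmd(list_file: Path, dst: Path) -> list[str]:
    """Command to join the listed tracks and encode a 320 kbps MP3."""
    return [
        "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file),
        "-c:a", "libmp3lame", "-b:a", "320k",
        str(dst),
    ]


if __name__ == "__main__":
    print(" ".join(normalize_cmd(Path("track_01.mp3"), Path("track_01.wav"))))
    print(concat_list([Path("track_01.wav"), Path("track_02.wav")]), end="")
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once FFmpeg is installed.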
Video Production
- H.264 encoding for universal compatibility
- 2-second fade-in from black
- Text overlay with auto-resize to prevent edge overflow
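The static-image video step could be expressed as a single FFmpeg command along these lines. This is a sketch under assumptions: the function name, file names, and the AAC audio codec are illustrative choices, and the project's real encoding settings are not shown here.

```python
from pathlib import Path


def video_cmd(
    image: Path, audio: Path, dst: Path, fade_seconds: float = 2.0
) -> list[str]:
    """Command to render a looping still image plus audio as an H.264 MP4."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(image),             # repeat the still image
        "-i", str(audio),
        "-vf", f"fade=t=in:st=0:d={fade_seconds}",  # fade in from black
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",                      # widest player compatibility
        "-c:a", "aac", "-b:a", "320k",
        "-shortest",                                # stop when the audio ends
        str(dst),
    ]


if __name__ == "__main__":
    cmd = video_cmd(Path("thumbnail.png"), Path("mix.mp3"), Path("mix.mp4"))
    print(" ".join(cmd))
```

`-shortest` matters here because `-loop 1` makes the image input endless; without it the encode would never finish.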
The system is designed to run as a scheduled job:
# Cron job: Generate a new mix every day at 2 AM
0 2 * * * cd /path/to/suno-mixer && ./run-daily.sh
Each run produces unique content:
- Random mood selection from curated list
- AI-generated titles make duplicate names unlikely
- Thumbnail variation through asset rotation or AI generation
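As a rough sketch of how a scheduled run might pick its inputs, the snippet below chooses a random mood and derives a dated output folder in the `FOCUS_dark_synthwave_20241202` style shown in the pipeline diagram. The `MOODS` list and the `plan_run` helper are hypothetical; the project's curated list is not reproduced in this document.

```python
import random
from datetime import date
from pathlib import Path

# Hypothetical curated list; the real one is not shown in this README.
MOODS = ["FOCUS", "CALM", "ENERGY"]


def plan_run(
    today: date,
    rng: random.Random,
    genre: str = "dark_synthwave",
    root: Path = Path("output"),
) -> tuple[str, Path]:
    """Pick a mood at random and build the dated output directory path."""
    mood = rng.choice(MOODS)
    return mood, root / f"{mood}_{genre}_{today:%Y%m%d}"


if __name__ == "__main__":
    mood, out_dir = plan_run(date.today(), random.Random())
    print(mood, out_dir)
```

Passing the date and random generator in as arguments keeps the helper deterministic under test while the cron job supplies the real values.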
Generated Title:
FOCUS Flow: Deep Work Music for Intense Programming Sessions | 1 Hour Synthwave Mix
Generated Description:
Deep work music engineered for flow states and intense focus.
Tracks:
00:00 - Neon Highway Dreams
03:24 - Digital Horizon Pulse
06:48 - Velvet Midnight Echo
...
#deepwork #focusmusic #synthwave #programming #studymusic
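The timestamps in the description follow directly from the track durations: each track starts at the running sum of the durations before it. A minimal sketch, assuming durations are known in whole seconds (the function names are illustrative):

```python
def format_timestamp(seconds: int) -> str:
    """Format an offset as MM:SS, or H:MM:SS once it passes one hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def track_listing(tracks: list[tuple[str, int]]) -> str:
    """Build the track list from (title, duration_in_seconds) pairs."""
    lines = []
    offset = 0
    for title, duration in tracks:
        lines.append(f"{format_timestamp(offset)} - {title}")
        offset += duration  # the next track starts where this one ends
    return "\n".join(lines)


if __name__ == "__main__":
    print(track_listing([
        ("Neon Highway Dreams", 204),
        ("Digital Horizon Pulse", 204),
        ("Velvet Midnight Echo", 204),
    ]))
```

With three 204-second tracks this reproduces the `00:00`, `03:24`, `06:48` offsets from the example description.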
Python 3.11+
├── aiohttp / aiofiles (async I/O)
├── pydub (audio processing)
├── Pillow (image manipulation)
├── google-genai (Gemini API)
├── google-api-python-client (YouTube API)
├── click + rich (CLI)
└── pydantic (configuration)
External:
└── FFmpeg (video/audio encoding)
- Near-zero marginal cost - Each video costs only API calls (~$0.50-1.00)
- Infinite variety - AI ensures every mix is unique
- No licensing - Suno-generated music is commercially usable
- SEO-optimized - Metadata generation follows YouTube best practices
- Fully hands-off - Schedule it and forget it
Built as an experiment in end-to-end AI content automation.