| name | description |
|---|---|
| demoscene-webgl-demo | Build audio-synchronized visual demos in a single HTML file using WebGL2/GLSL raymarching. Covers SDF-based 3D scenes, multi-pass rendering with bloom, phase-based timeline choreography, and real-time audio-reactive visuals via Web Audio API. Use when asked to create demoscene demos, WebGL raymarching, audio-reactive graphics, music visualizers, or choreographed visual experiences. |
Single index.html containing all shaders, WebGL setup, audio integration, and
overlays. Self-contained by design — no build step, no dependencies.
- Scene pass — Raymarching with SDFs, lighting, volumetrics, materials
- Bright extract — Threshold filter isolating bloom-worthy pixels
- Gaussian blur — Two-pass separable blur on the bright extract
- Composite — Combines scene + bloom, applies post-effects (glitch, fade, color grading)
Each pass renders to its own framebuffer/texture. Final composite goes to screen.
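For orientation, the bright-extract pass can be a single threshold shader. A minimal GLSL sketch (the uniform and varying names here are assumptions, not the skill's actual interface):

```glsl
// Bright extract: keep only pixels above a luminance threshold
uniform sampler2D sceneTex;  // output of the scene pass
uniform float threshold;     // e.g. 0.8
in vec2 uv;
out vec4 fragColor;

void main() {
  vec3 c = texture(sceneTex, uv).rgb;
  float lum = dot(c, vec3(0.2126, 0.7152, 0.0722)); // Rec. 709 luma
  fragColor = vec4(c * step(threshold, lum), 1.0);
}
```

The blur passes then operate only on this texture, which is what keeps bloom cheap: most pixels are black after the extract.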
This is the most important part of making a demo. The process turns any audio file into a precise choreography map — timestamps, phases, energy curves — before writing any shader code. Bad choreography makes a technically impressive demo feel lifeless.
Before anything else, build a complete picture of the track's energy and structure.
```sh
# Get track duration
ffprobe -v error -show_entries format=duration -of csv=p=0 track.wav

# Generate a waveform image — the single most useful visual reference
# Width of 3000px means ~1px per 100ms for a 5min track
ffmpeg -i track.wav -filter_complex "showwavespic=s=3000x200:colors=white" -frames:v 1 waveform.png

# Generate RMS energy over time (one value per audio frame)
ffmpeg -i track.wav -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level -f null - 2>&1 | grep RMS_level > energy.txt

# Generate a spectrogram — shows frequency content over time
# Useful for distinguishing builds (rising freq) from drops (full spectrum)
ffmpeg -i track.wav -lavfi showspectrumpic=s=3000x400 spectrogram.png
```

Open waveform.png and spectrogram.png side by side. The waveform shows you when energy changes. The spectrogram shows you what kind of energy changes (bass drop vs. hi-hat entrance vs. pad swell).
```js
async function analyzeTrack(file) {
  const arrayBuffer = await file.arrayBuffer();
  const audioCtx = new AudioContext();
  const buffer = await audioCtx.decodeAudioData(arrayBuffer);
  const samples = buffer.getChannelData(0); // mono or left channel
  const sampleRate = buffer.sampleRate;
  const hopSize = Math.floor(sampleRate * 0.1); // 100ms windows
  const events = [];
  for (let i = 0; i < samples.length; i += hopSize) {
    const end = Math.min(i + hopSize, samples.length);
    let rms = 0, peak = 0, zeroCrossings = 0;
    for (let j = i; j < end; j++) {
      const s = Math.abs(samples[j]);
      rms += samples[j] * samples[j];
      if (s > peak) peak = s;
      if (j > i && (samples[j] >= 0) !== (samples[j - 1] >= 0)) zeroCrossings++;
    }
    rms = Math.sqrt(rms / (end - i));
    const brightness = zeroCrossings / (end - i); // proxy for spectral centroid
    events.push({
      time: i / sampleRate,
      rms,        // overall loudness
      peak,       // transient detection
      brightness  // high = hi-hats/cymbals, low = bass/pads
    });
  }
  audioCtx.close();
  return { events, duration: buffer.duration, sampleRate };
}
```

```sh
pip install librosa numpy
```

```python
import librosa
import numpy as np

y, sr = librosa.load('track.wav', sr=22050, mono=True)
duration = librosa.get_duration(y=y, sr=sr)

# Onset detection — finds percussive hits
onset_frames = librosa.onset.onset_detect(y=y, sr=sr, units='frames')
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

# Beat tracking
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# RMS energy over time
rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)[0]
rms_times = librosa.frames_to_time(range(len(rms)), sr=sr, hop_length=512)

# Spectral centroid (brightness)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=512)[0]

# Chromagram (harmonic content — useful for detecting key changes)
chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=512)

print(f"Duration: {duration:.1f}s")
print(f"BPM: {tempo:.1f}")
print(f"Beats: {len(beat_times)}")
print(f"Onsets: {len(onset_times)}")
print(f"\nFirst 20 onset times:")
for t in onset_times[:20]:
    print(f"  {t:.2f}s")
```

From the analysis, segment the track into sections. Look for these patterns in the waveform/energy data:
| Pattern in waveform | What it means | Musical term |
|---|---|---|
| Near-zero amplitude → gradual rise | Energy building | Intro / Build |
| Sudden jump from low to high | Energy release | Drop |
| Sustained high amplitude | Full arrangement | Chorus / Main section |
| Brief dip in sustained energy | Tension/release cycle | Breakdown |
| High → gradually decreasing | Energy winding down | Outro / Fade |
| Texture change (spectrogram shift) | Instrument swap | Transition |
| Isolated spike in low energy | Percussive accent | Hit |
| Flat near-silence | Space/rest | Void / Silence |
For each section boundary, note the exact timestamp to the nearest 0.1s. These
become your `smoothstep` parameters.
Create a plain-text document — this is the single source of truth for all visual timing. Write it before any shader code.
```
CHOREOGRAPHY MAP — track: [filename] ([duration]s, [BPM] BPM)
═══════════════════════════════════════════════════════════════

SECTIONS:
  [start] - [end]   [NAME] — [description of what happens visually]
  [start] - [end]   [NAME] — [description]
  ...

KEY HITS (one-shot events):
  [time] — [what happens in the music] → [visual response]
  [time] — [what happens] → [visual response]
  ...

ENERGY ARC:
  [prose description of overall energy shape, e.g.
   "slow build → explosive drop → sustained intensity → brief calm →
    second build → climactic transformation → fade"]

MOOD TRANSITIONS:
  [time]: [mood A] → [mood B] (e.g. "warm/organic → cold/digital")
  ...
```
Rules for a good map:
- Every section needs a distinct visual identity (if two sections look the same, merge them)
- Drops must have a visual event — a drop with no visual impact is a wasted moment
- Builds must have visible escalation — if nothing changes over 20s, the audience loses interest
- Quiet sections must actually be quiet visually — contrast makes loud sections louder
- The map should read like a story arc, not a flat list
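For illustration only, a filled-in map for a hypothetical three-minute track (all names and timestamps invented, with boundaries snapped to 124 BPM bar lines) might read:

```
CHOREOGRAPHY MAP — track: example.wav (180.0s, 124 BPM)
═══════════════════════════════════════════════════════════════

SECTIONS:
    0.0 -  30.9   EMERGE    — dark void, single form fades in
   30.9 -  61.9   BUILD     — camera approaches, colors warm, bloom rises
   61.9 - 123.9   MAIN      — full scene, orbiting camera, rich palette
  123.9 - 154.8   BREAKDOWN — desaturate, glitch, camera drifts
  154.8 - 180.0   OUTRO     — pull back, fade to black

KEY HITS (one-shot events):
   61.9 — drop lands → white flash + camera snap
  123.9 — arrangement thins → scanline glitch kicks in

ENERGY ARC:
  slow build → drop → sustained intensity → collapse → fade

MOOD TRANSITIONS:
  123.9: warm/organic → cold/digital
```

Note how every section boundary lands on a bar multiple (1 bar at 124 BPM is about 1.935s, so 30.9s is 16 bars, 61.9s is 32, 123.9s is 64), which is exactly what the bar-snapping step below produces.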
If the track has a steady beat, snapping phase boundaries to bar lines makes choreography feel musical rather than arbitrary.
```js
// Estimate BPM by autocorrelating the RMS energy envelope
function detectBPM(events, minBPM = 60, maxBPM = 180) {
  const energies = events.map(e => e.rms);
  const dt = events[1].time - events[0].time; // analysis hop in seconds
  const minLag = Math.floor(60 / (maxBPM * dt));
  const maxLag = Math.floor(60 / (minBPM * dt));
  let bestLag = minLag, bestCorr = -1;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i < energies.length - lag; i++) {
      corr += energies[i] * energies[i + lag];
    }
    corr /= energies.length - lag; // normalize so longer lags aren't penalized
    if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
  }
  return 60 / (bestLag * dt);
}
```

With BPM known, compute the bar grid:
```
BPM: 128 → 1 beat = 0.469s, 1 bar (4/4) = 1.875s
BPM:  80 → 1 beat = 0.750s, 1 bar (4/4) = 3.000s
BPM: 140 → 1 beat = 0.429s, 1 bar (4/4) = 1.714s

Common section lengths:
   4 bars = intro/outro, breakdown
   8 bars = verse, build
  16 bars = chorus, main section
  32 bars = extended section
```
Snap your section boundaries to the nearest bar line. Drops almost always land on beat 1 of a bar. Phase transitions sound best on bar boundaries.
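The bar arithmetic above is easy to script. A minimal JS sketch (`barGrid` and `snapToBar` are hypothetical helper names, not part of the skill's code):

```javascript
// Beat/bar durations from BPM (assumes 4/4 unless told otherwise)
function barGrid(bpm, beatsPerBar = 4) {
  const beat = 60 / bpm;          // seconds per beat
  const bar = beat * beatsPerBar; // seconds per bar
  return { beat, bar };
}

// Snap a raw section-boundary timestamp to the nearest bar line
function snapToBar(t, bpm, beatsPerBar = 4) {
  const { bar } = barGrid(bpm, beatsPerBar);
  return Math.round(t / bar) * bar;
}

console.log(barGrid(128));          // { beat: 0.46875, bar: 1.875 }
console.log(snapToBar(33.1, 128));  // 33.75
```

Run this over every boundary in the choreography map before hardcoding timestamps into shaders.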
For demos where choreography must perfectly track the audio energy — beyond what real-time FFT can provide — bake the energy envelope into a 1D texture:
```js
const energyData = new Uint8Array(events.map(e => Math.min(255, e.rms * 512)));
const energyTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, energyTex);
gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1); // single-channel rows are tightly packed
gl.texImage2D(gl.TEXTURE_2D, 0, gl.R8, energyData.length, 1, 0,
              gl.RED, gl.UNSIGNED_BYTE, energyData);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
```

Sample in the shader:

```glsl
uniform sampler2D energyTex;
uniform float trackDuration;

float bakedEnergy = texture(energyTex, vec2(t / trackDuration, 0.5)).r;
```

This is deterministic and frame-rate independent — no FFT latency, no device variance. Use it for geometry and camera (must be rock-solid). Layer real-time FFT on top for materials (benefits from organic variance).
You can bake multiple channels — pack bass/mid/treble/overall into RGBA for a full 4-band deterministic energy texture.
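A sketch of that packing step, assuming four equal-length envelope arrays already normalized to 0..1 (`packBands` is a hypothetical helper name):

```javascript
// Interleave four band envelopes into RGBA bytes, one texel per
// analysis window: R=bass, G=mid, B=treble, A=overall.
function packBands(bass, mid, treble, overall) {
  const n = bass.length;
  const out = new Uint8Array(n * 4);
  const q = v => Math.max(0, Math.min(255, Math.round(v * 255)));
  for (let i = 0; i < n; i++) {
    out[i * 4 + 0] = q(bass[i]);
    out[i * 4 + 1] = q(mid[i]);
    out[i * 4 + 2] = q(treble[i]);
    out[i * 4 + 3] = q(overall[i]);
  }
  return out;
}

// Upload with gl.RGBA8 / gl.RGBA / gl.UNSIGNED_BYTE instead of the
// single-channel R8 format shown above.
```

In the shader, `texture(energyTex, vec2(t / trackDuration, 0.5))` then yields all four bands as one `vec4`.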
Each section in the choreography map becomes a phase field. Each phase is a
smoothstep that ramps 0→1 over the transition window:
```glsl
// Pattern: ramp in, hold, ramp out
ph.sectionName = smoothstep(startTime, startTime + fadeIn, t)
              * (1.0 - smoothstep(endTime - fadeOut, endTime, t));
```

The `fadeIn` / `fadeOut` durations control transition sharpness:
- `0.1s` — near-instant snap (drops, impacts)
- `0.5-1.0s` — quick but smooth (normal transitions)
- `2.0-5.0s` — gradual blend (mood shifts, slow builds)
- `10.0-30.0s` — glacial evolution (entire build sections)
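Because GLSL's `smoothstep` has a one-line JS equivalent, phase windows can be previewed and unit-tested outside the shader before timestamps are committed. A sketch (`phaseWindow` is a hypothetical helper mirroring the pattern above):

```javascript
// JS port of GLSL smoothstep: clamp, then cubic Hermite
function smoothstep(edge0, edge1, x) {
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  return t * t * (3 - 2 * t);
}

// Ramp in over fadeIn, hold at 1, ramp out over fadeOut
function phaseWindow(t, start, end, fadeIn, fadeOut) {
  return smoothstep(start, start + fadeIn, t)
       * (1 - smoothstep(end - fadeOut, end, t));
}

// e.g. a breakdown from 60s to 75s, 0.5s ramp in, 2s ramp out
console.log(phaseWindow(59, 60, 75, 0.5, 2)); // 0 (not started)
console.log(phaseWindow(67, 60, 75, 0.5, 2)); // 1 (fully active)
console.log(phaseWindow(76, 60, 75, 0.5, 2)); // 0 (ended)
```

Sweeping `t` over the whole track and printing each phase as a row makes overlap and dead zones obvious at a glance.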
Each key hit becomes an exp() pulse:
float hit = exp(-abs(t - hitTime) * sharpness);
// sharpness 4-6: wide pulse, ~0.5s visible
// sharpness 8-12: tight spike, ~0.2s visible
// sharpness 15-20: near-instant flashEach section type has proven visual approaches:
| Section | Camera | Geometry | Materials | Post-processing |
|---|---|---|---|---|
| Void/Intro | Static or very slow drift | Minimal or absent | Dark, monochrome | Clean, maybe subtle fog |
| Build | Slow approach | Growing, emerging | Warming colors | Increasing bloom |
| Drop | Snap to new angle | Burst/expansion | Bright flash, saturated | Bloom spike, screen shake |
| Main/Chorus | Smooth orbit | Full scene visible | Rich, layered | Moderate bloom |
| Breakdown | Drift, lose focus | Fracture, glitch | Desaturated | UV displacement, scanlines |
| Storm/Peak | Fast movement, close | Maximum complexity | Hot, intense | Heavy bloom, shake |
| Transition | Dolly or whip pan | Morph/transform | Palette swap | Color grading shift |
| Outro/Fade | Slow pull back | Simplifying | Cooling/dimming | Fade to black |
- Play the demo with the track
- Where visuals feel early/late — adjust timestamps ±0.1-0.5s
- Where energy feels flat — add audio multipliers or one-shot pulses
- Where transitions feel abrupt — widen
smoothstepranges - Where transitions feel mushy — narrow
smoothstepranges - Repeat until every musical moment has a visual counterpart
- Every drop has a visual impact (flash, snap, bloom spike)
- Every build has visible escalation (growth, approach, brightening)
- Every breakdown has a visual shift (glitch, desaturation, fracture)
- Camera never stays static for more than ~15s
- Adjacent sections have visual contrast
- Quiet moments are actually quiet (dim, sparse, slow)
- The finale feels like a finale (biggest visual moment, then resolution)
- Mood transitions in the music have corresponding color/material shifts
- One-shot hits land on the beat, not between beats
- The demo still looks good without audio (base values are reasonable)
The sync system is hybrid: hardcoded timestamps from the choreography map provide structure, real-time FFT adds organic responsiveness.
The shader time uniform must come from `audioElement.currentTime`, NOT from a JS
accumulator or `performance.now()`. This locks visuals to the audio playback position
even when frames drop or the browser throttles.

```js
const time = audioEl ? audioEl.currentTime : 0;
gl.uniform1f(timeLoc, time);
```

Extract frequency bands each frame and pass as a vec4 uniform:
```js
analyser.fftSize = 512;                // 256 bins, good latency/resolution balance
analyser.smoothingTimeConstant = 0.55; // responsive but not jittery

const freq = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(freq);

let bass = 0;    for (let i = 0; i < 4; i++)    bass += freq[i];    bass /= 4 * 255;
let mid = 0;     for (let i = 4; i < 16; i++)   mid += freq[i];     mid /= 12 * 255;
let treble = 0;  for (let i = 16; i < 64; i++)  treble += freq[i];  treble /= 48 * 255;
let overall = 0; for (let i = 0; i < 128; i++)  overall += freq[i]; overall /= 128 * 255;

gl.uniform4f(audioLoc, bass, mid, treble, overall); // all normalized 0–1
```

Tuning:
- `fftSize`: 512 is a good default. 256 = snappier but coarser. 1024+ = more frequency detail but more latency.
- `smoothingTimeConstant`: 0.55 is balanced. 0.3 = twitchy (good for percussion-heavy tracks). 0.8 = sluggish (good for ambient/drone).
| Band | Freq range | Visual target | Example |
|---|---|---|---|
| `bass` (audio.x) | Sub/kicks | Camera shake, bloom pulse, ring glow | `col += glow * audio.x * 0.4` |
| `mid` (audio.y) | Melody/pads | Color shifts, surface detail | `col *= 1.0 + audio.y * 0.15` |
| `treble` (audio.z) | Hi-hats/shimmer | Scanline intensity, sparkle | `br *= 0.3 + audio.z * 0.7` |
| `overall` (audio.w) | Full energy | Bloom intensity, global glow | `bloom *= 1.2 + audioE * 0.4` |
Positions, sizes, angles, and shapes must be deterministic from time. Audio
reactivity only affects materials, brightness, glow, and post-processing. This
prevents visual jitter and ensures the demo looks correct even with audio analysis
variance across devices.
Always use audio as a multiplier on a base value, never as the sole driver:

```glsl
// CORRECT — works without audio, enhanced with it
col *= 1.0 + audio.x * 0.2;
bloom *= 1.2 + audioE * 0.4;

// WRONG — black screen when audio is silent
col *= audio.x;
```

Square the bass energy so only strong hits register:

```glsl
float shake = audio.x * audio.x * 0.003;
```

Define a GLSL struct encoding the current act. Each field is a smoothstep blend
(0→1). The struct fields match sections from your choreography map:
```glsl
struct Phase {
  float emerge;     // opening — geometry fades in
  float build;      // tension rising
  float ignite;     // the drop
  float system;     // full scene, cruising
  float storm;      // intensity peak
  float breakdown;  // momentary collapse
  float transform;  // mood/texture shift
  float reveal;     // resolution, finale
  float pulse;      // audio-reactive multiplier (1.0 + audio)
};
```

Adapt the field names and count to your track. A 2-minute ambient piece might have 3 phases. A 5-minute EDM track might have 10+.
```glsl
// Snap transition (drop): 0.2s fade
ph.drop = smoothstep(dropTime - 0.1, dropTime + 0.1, t);

// Smooth transition (mood shift): 2s crossfade
ph.cold = smoothstep(shiftTime - 1.0, shiftTime + 1.0, t);

// Section with start and end (breakdown): ramp in, hold, ramp out
ph.breakdown = smoothstep(bdStart, bdStart + 0.2, t)
            * (1.0 - smoothstep(bdEnd - 0.3, bdEnd, t));

// Gradual build over 25s
ph.build = smoothstep(buildStart, buildEnd, t);

// One-shot key hit
float pulse = exp(-abs(t - hitTime) * sharpness);
```

Geometry grows/shrinks by adding phase-gated terms:
```glsl
radius += ph.build * 0.8;              // grows during build
radius += ph.storm * (0.5 + i * 0.2);  // expands per-instance in storm
radius *= 1.0 - ph.breakdown * 0.3;    // shrinks during breakdown
```

Each term is gated by its phase so effects don't leak into other sections.
Camera position as a sum of phase-gated terms:

```glsl
float camDist = initialDist
  - approach * smoothstep(tA, tB, t)   // move in
  + pullBack * smoothstep(tC, tD, t)   // move out
  - rushIn   * smoothstep(tE, tF, t);  // final approach
```

Each line is one camera move. Easy to read, reorder, and adjust independently. The camera should tell the same story as the music — approaching during builds, snapping on drops, drifting during calm sections, pulling back for the finale.
When objects must not intersect (e.g., rings around a star), compute the inner object's radius dynamically and clamp:
```glsl
float innerR = getInnerRadius() + clearance;
outerR = max(outerR, innerR + index * gap);
```

Layer noise at different frequencies for organic surfaces:
```glsl
float detail = noise(p * 5.0 + t * 0.3) * noise(p * 7.0 - t * 0.5);
col *= 0.75 + detail * 0.5;
```

Animate noise offsets with `t` at different speeds per layer for convincing motion.
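The snippets here assume a `noise()` helper is defined in the shader. One common option is hash-based value noise; a 2D sketch (one of many possible implementations, extend the same pattern to `vec3` for volumetric surfaces):

```glsl
// Pseudo-random value per lattice point
float hash(vec2 p) {
  return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453);
}

// 2D value noise: smooth interpolation between lattice hashes
float noise(vec2 p) {
  vec2 i = floor(p), f = fract(p);
  vec2 u = f * f * (3.0 - 2.0 * f); // smoothstep fade curve
  return mix(mix(hash(i),                hash(i + vec2(1.0, 0.0)), u.x),
             mix(hash(i + vec2(0.0, 1.0)), hash(i + vec2(1.0, 1.0)), u.x),
             u.y);
}
```

The sine-hash is cheap but shows artifacts at large coordinates; swap in an integer hash if surfaces band or repeat.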
Bright extract → separable Gaussian blur → additive composite:
```glsl
vec3 bloom = texture(blurTex, uv).rgb;
col += bloom * (1.2 + audioE * 0.4);
```

UV displacement gated to timestamp windows from the choreography map:
```glsl
uv.x += step(0.99, sin(uv.y * 400.0 + t * 50.0)) * glitchAmt * 0.02;          // scanline tears
uv.x += (floor(sin(uv.y * 8.0 + t * 3.0) * 4.0) / 4.0) * glitchAmt * 0.03;    // block shifts
```

Fade out at the end of the track:

```glsl
float fade = smoothstep(endTime - fadeDuration, endTime, t);
col *= 1.0 - fade;
```

- Audio context requires user gesture — always gate on a click-to-start overlay
- Shader compilation errors are silent — check `gl.getShaderInfoLog()` and log it
- `smoothstep` needs margin — `smoothstep(10.0, 10.5, t)` not `step(10.0, t)` for clean cuts
- Phase overlaps are features — two phases at 50/50 during crossfade = natural transition
- Audio uniform fallback — if `analyser` is null, return `[0, 0, 0, 0]` so shaders still work
- Use `onended`, not timeouts — actual playback may differ from expected track length
- Test without audio — the multiplier pattern ensures visuals degrade gracefully
- Choreography map first, code second — changing timestamps in code without updating the map leads to drift between intent and implementation