
@lovelaced
Last active March 6, 2026 23:54
demo skill

name: demoscene-webgl-demo
description: Build audio-synchronized visual demos in a single HTML file using WebGL2/GLSL raymarching. Covers SDF-based 3D scenes, multi-pass rendering with bloom, phase-based timeline choreography, and real-time audio-reactive visuals via Web Audio API. Use when asked to create demoscene demos, WebGL raymarching, audio-reactive graphics, music visualizers, or choreographed visual experiences.

Audio-Synced WebGL Demo Skill

Architecture

Single index.html containing all shaders, WebGL setup, audio integration, and overlays. Self-contained by design — no build step, no dependencies.

Multi-Pass Rendering Pipeline

  1. Scene pass — Raymarching with SDFs, lighting, volumetrics, materials
  2. Bright extract — Threshold filter isolating bloom-worthy pixels
  3. Gaussian blur — Two-pass separable blur on the bright extract
  4. Composite — Combines scene + bloom, applies post-effects (glitch, fade, color grading)

Each pass renders to its own framebuffer/texture. Final composite goes to screen.
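Each offscreen pass needs a framebuffer/texture pair. A minimal WebGL2 sketch of creating one such target (the `createTarget` name is illustrative; in practice the bloom passes often use half-resolution targets):

```javascript
// Create one render target: a texture plus a framebuffer that draws into it.
// Call once per pass (scene, bright extract, blur H, blur V) at init time.
function createTarget(gl, width, height) {
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, width, height, 0,
                gl.RGBA, gl.UNSIGNED_BYTE, null);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                          gl.TEXTURE_2D, tex, 0);
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  return { fbo, tex };
}
```

Per frame: bind each pass's `fbo`, draw, then bind the previous pass's `tex` as input to the next pass; bind framebuffer `null` for the final composite to screen.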


Computing Choreography from Audio

This is the most important part of making a demo. The process turns any audio file into a precise choreography map — timestamps, phases, energy curves — before writing any shader code. Bad choreography makes a technically impressive demo feel lifeless.

Step 1: Analyze the Audio File

Before anything else, build a complete picture of the track's energy and structure.

Option A: ffmpeg (fast, works on any machine)

# Get track duration
ffprobe -v error -show_entries format=duration -of csv=p=0 track.wav

# Generate a waveform image — the single most useful visual reference
# Width of 3000px means ~1px per 100ms for a 5min track
ffmpeg -i track.wav -filter_complex "showwavespic=s=3000x200:colors=white" -frames:v 1 waveform.png

# Generate RMS energy over time (one value per audio frame)
ffmpeg -i track.wav -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level -f null - 2>&1 | grep RMS_level > energy.txt

# Generate a spectrogram — shows frequency content over time
# Useful for distinguishing builds (rising freq) from drops (full spectrum)
ffmpeg -i track.wav -lavfi showspectrumpic=s=3000x400 spectrogram.png

Open waveform.png and spectrogram.png side by side. The waveform shows you when energy changes. The spectrogram shows you what kind of energy changes (bass drop vs. hi-hat entrance vs. pad swell).

Option B: Web Audio OfflineAudioContext (programmatic)

async function analyzeTrack(file) {
  const arrayBuffer = await file.arrayBuffer();
  const audioCtx = new AudioContext();
  const buffer = await audioCtx.decodeAudioData(arrayBuffer);
  const samples = buffer.getChannelData(0); // mono or left channel
  const sampleRate = buffer.sampleRate;

  const hopSize = Math.floor(sampleRate * 0.1); // 100ms windows
  const events = [];

  for (let i = 0; i < samples.length; i += hopSize) {
    const end = Math.min(i + hopSize, samples.length);
    let rms = 0, peak = 0, zeroCrossings = 0;

    for (let j = i; j < end; j++) {
      const s = Math.abs(samples[j]);
      rms += samples[j] * samples[j];
      if (s > peak) peak = s;
      if (j > i && (samples[j] >= 0) !== (samples[j-1] >= 0)) zeroCrossings++;
    }

    rms = Math.sqrt(rms / (end - i));
    const brightness = zeroCrossings / (end - i); // proxy for spectral centroid

    events.push({
      time: i / sampleRate,
      rms,           // overall loudness
      peak,          // transient detection
      brightness     // high = hi-hats/cymbals, low = bass/pads
    });
  }

  audioCtx.close();
  return { events, duration: buffer.duration, sampleRate };
}

Option C: Python with librosa (most precise)

pip install librosa numpy

import librosa
import numpy as np

y, sr = librosa.load('track.wav', sr=22050, mono=True)
duration = librosa.get_duration(y=y, sr=sr)

# Onset detection — finds percussive hits
onset_frames = librosa.onset.onset_detect(y=y, sr=sr, units='frames')
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

# Beat tracking
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
tempo = float(np.atleast_1d(tempo)[0])  # newer librosa returns an ndarray here
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# RMS energy over time
rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)[0]
rms_times = librosa.frames_to_time(range(len(rms)), sr=sr, hop_length=512)

# Spectral centroid (brightness)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=512)[0]

# Chromagram (harmonic content — useful for detecting key changes)
chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=512)

print(f"Duration: {duration:.1f}s")
print(f"BPM: {tempo:.1f}")
print(f"Beats: {len(beat_times)}")
print(f"Onsets: {len(onset_times)}")
print(f"\nFirst 20 onset times:")
for t in onset_times[:20]:
    print(f"  {t:.2f}s")

Step 2: Identify Structural Sections

From the analysis, segment the track into sections. Look for these patterns in the waveform/energy data:

| Pattern in waveform | What it means | Musical term |
|---|---|---|
| Near-zero amplitude → gradual rise | Energy building | Intro / Build |
| Sudden jump from low to high | Energy release | Drop |
| Sustained high amplitude | Full arrangement | Chorus / Main section |
| Brief dip in sustained energy | Tension/release cycle | Breakdown |
| High → gradually decreasing | Energy winding down | Outro / Fade |
| Texture change (spectrogram shift) | Instrument swap | Transition |
| Isolated spike in low energy | Percussive accent | Hit |
| Flat near-silence | Space/rest | Void / Silence |

For each section boundary, note the exact timestamp to the nearest 0.1s. These become your smoothstep parameters.
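One way to get candidate boundaries from the Step 1 analysis is to flag large jumps in windowed RMS. A rough sketch, assuming `events` has the shape produced by `analyzeTrack` above (the `jumpRatio` threshold and the 2s debounce are tuning knobs, not fixed values):

```javascript
// Flag candidate section boundaries where windowed RMS jumps sharply.
// `events` is [{ time, rms }, ...] at ~100ms hops, as from analyzeTrack().
function findBoundaries(events, windowSize = 10, jumpRatio = 1.8) {
  const boundaries = [];
  const mean = arr => arr.reduce((a, e) => a + e.rms, 0) / arr.length;
  for (let i = windowSize; i < events.length - windowSize; i++) {
    const before = mean(events.slice(i - windowSize, i));
    const after = mean(events.slice(i, i + windowSize));
    // A jump up (drop) or down (breakdown) by jumpRatio marks a boundary
    if (after > before * jumpRatio || before > after * jumpRatio) {
      const last = boundaries[boundaries.length - 1];
      if (last === undefined || events[i].time - last > 2.0) {
        boundaries.push(events[i].time); // debounce: one boundary per 2s
      }
    }
  }
  return boundaries; // timestamps in seconds; round to 0.1s by hand
}
```

Treat the output as candidates to verify against the waveform and spectrogram, not as ground truth.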

Step 3: Build the Choreography Map

Create a plain-text document — this is the single source of truth for all visual timing. Write it before any shader code.

CHOREOGRAPHY MAP — track: [filename] ([duration]s, [BPM] BPM)
═══════════════════════════════════════════════════════════════

SECTIONS:
  [start] - [end]   [NAME]     — [description of what happens visually]
  [start] - [end]   [NAME]     — [description]
  ...

KEY HITS (one-shot events):
  [time]  — [what happens in the music] → [visual response]
  [time]  — [what happens] → [visual response]
  ...

ENERGY ARC:
  [prose description of overall energy shape, e.g.
   "slow build → explosive drop → sustained intensity → brief calm →
    second build → climactic transformation → fade"]

MOOD TRANSITIONS:
  [time]: [mood A] → [mood B]  (e.g. "warm/organic → cold/digital")
  ...

Rules for a good map:

  • Every section needs a distinct visual identity (if two sections look the same, merge them)
  • Drops must have a visual event — a drop with no visual impact is a wasted moment
  • Builds must have visible escalation — if nothing changes over 20s, the audience loses interest
  • Quiet sections must actually be quiet visually — contrast makes loud sections louder
  • The map should read like a story arc, not a flat list

Step 4: Compute BPM and Beat Grid

If the track has a steady beat, snapping phase boundaries to bar lines makes choreography feel musical rather than arbitrary.

function detectBPM(events, minBPM = 60, maxBPM = 180) {
  const energies = events.map(e => e.rms);
  const dt = events[1].time - events[0].time;
  const minLag = Math.floor(60 / (maxBPM * dt));
  const maxLag = Math.floor(60 / (minBPM * dt));

  let bestLag = minLag, bestCorr = -1;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i < energies.length - lag; i++) {
      corr += energies[i] * energies[i + lag];
    }
    if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
  }
  return 60 / (bestLag * dt);
}

With BPM known, compute the bar grid:

BPM: 128 → 1 beat = 0.469s, 1 bar (4/4) = 1.875s
BPM: 80  → 1 beat = 0.750s, 1 bar (4/4) = 3.000s
BPM: 140 → 1 beat = 0.429s, 1 bar (4/4) = 1.714s

Common section lengths:
  4 bars  = intro/outro, breakdown
  8 bars  = verse, build
  16 bars = chorus, main section
  32 bars = extended section

Snap your section boundaries to the nearest bar line. Drops almost always land on beat 1 of a bar. Phase transitions sound best on bar boundaries.
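The arithmetic above can be wrapped in a small helper that also snaps raw timestamps to the nearest bar line (the `barGrid` name and shape are illustrative):

```javascript
// Beat/bar lengths for a 4/4 track, plus bar-line snapping for boundaries.
function barGrid(bpm, beatsPerBar = 4) {
  const beat = 60 / bpm;          // seconds per beat
  const bar = beat * beatsPerBar; // seconds per bar
  return {
    beat,
    bar,
    snap: t => Math.round(t / bar) * bar, // nearest bar line
  };
}

const grid = barGrid(128);
// 128 BPM: beat = 0.46875s, bar = 1.875s
// grid.snap(23.4) → 22.5 (the 12th bar line)
```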

Step 5: Bake Energy Envelope (Optional, for Tight Sync)

For demos where choreography must perfectly track the audio energy — beyond what real-time FFT can provide — bake the energy envelope into a 1D texture:

const energyData = new Uint8Array(events.map(e => Math.min(255, e.rms * 512)));
const energyTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, energyTex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.R8, energyData.length, 1, 0,
              gl.RED, gl.UNSIGNED_BYTE, energyData);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);

Sample in the shader:

uniform sampler2D energyTex;
uniform float trackDuration;
float bakedEnergy = texture(energyTex, vec2(t / trackDuration, 0.5)).r;

This is deterministic and frame-rate independent — no FFT latency, no device variance. Use it for geometry and camera (must be rock-solid). Layer real-time FFT on top for materials (benefits from organic variance).

You can bake multiple channels — pack bass/mid/treble/overall into RGBA for a full 4-band deterministic energy texture.
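A sketch of the packing step for that 4-band variant, assuming per-window `bass`/`mid`/`treble`/`rms` values have already been computed during analysis (the ×512 scale matches the single-channel example above and may need tuning per track):

```javascript
// Pack four energy bands into one RGBA8 row: R=bass, G=mid, B=treble, A=overall.
function packEnergyRGBA(events) {
  const data = new Uint8Array(events.length * 4);
  const q = v => Math.min(255, Math.round(v * 512)); // quantize, clamp to byte
  events.forEach((e, i) => {
    data[i * 4 + 0] = q(e.bass);
    data[i * 4 + 1] = q(e.mid);
    data[i * 4 + 2] = q(e.treble);
    data[i * 4 + 3] = q(e.rms);
  });
  return data; // upload with gl.RGBA8 / gl.RGBA instead of gl.R8 / gl.RED
}
```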

Step 6: Translate Map to Phase Code

Each section in the choreography map becomes a phase field. Each phase is a smoothstep that ramps 0→1 over the transition window:

// Pattern: ramp in, hold, ramp out
ph.sectionName = smoothstep(startTime, startTime + fadeIn, t)
               * (1.0 - smoothstep(endTime - fadeOut, endTime, t));

The fadeIn / fadeOut durations control transition sharpness:

  • 0.1s — near-instant snap (drops, impacts)
  • 0.5-1.0s — quick but smooth (normal transitions)
  • 2.0-5.0s — gradual blend (mood shifts, slow builds)
  • 10.0-30.0s — glacial evolution (entire build sections)

Each key hit becomes an exp() pulse:

float hit = exp(-abs(t - hitTime) * sharpness);
// sharpness 4-6: wide pulse, ~0.5s visible
// sharpness 8-12: tight spike, ~0.2s visible
// sharpness 15-20: near-instant flash
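These GLSL timing primitives can be mirrored in JS to sanity-check the choreography map offline before touching shader code (a sketch; `smoothstep` here re-implements the GLSL builtin, and the other names are illustrative):

```javascript
// JS mirror of the GLSL timing primitives, for previewing choreography offline.
function smoothstep(e0, e1, x) {
  const t = Math.min(1, Math.max(0, (x - e0) / (e1 - e0)));
  return t * t * (3 - 2 * t); // same Hermite curve as the GLSL builtin
}

// Ramp in, hold, ramp out — same shape as the ph.sectionName pattern above
function section(t, start, end, fadeIn, fadeOut) {
  return smoothstep(start, start + fadeIn, t)
       * (1 - smoothstep(end - fadeOut, end, t));
}

// One-shot hit — same shape as the exp() pulse above
function hit(t, hitTime, sharpness) {
  return Math.exp(-Math.abs(t - hitTime) * sharpness);
}
```

Print these at 0.1s steps across the track and compare against the choreography map before porting the timestamps into the shader.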

Step 7: Map Sections to Visual Strategies

Each section type has proven visual approaches:

| Section | Camera | Geometry | Materials | Post-processing |
|---|---|---|---|---|
| Void/Intro | Static or very slow drift | Minimal or absent | Dark, monochrome | Clean, maybe subtle fog |
| Build | Slow approach | Growing, emerging | Warming colors | Increasing bloom |
| Drop | Snap to new angle | Burst/expansion | Bright flash, saturated | Bloom spike, screen shake |
| Main/Chorus | Smooth orbit | Full scene visible | Rich, layered | Moderate bloom |
| Breakdown | Drift, lose focus | Fracture, glitch | Desaturated | UV displacement, scanlines |
| Storm/Peak | Fast movement, close | Maximum complexity | Hot, intense | Heavy bloom, shake |
| Transition | Dolly or whip pan | Morph/transform | Palette swap | Color grading shift |
| Outro/Fade | Slow pull back | Simplifying | Cooling/dimming | Fade to black |

Step 8: Iterate

  1. Play the demo with the track
  2. Where visuals feel early/late — adjust timestamps ±0.1-0.5s
  3. Where energy feels flat — add audio multipliers or one-shot pulses
  4. Where transitions feel abrupt — widen smoothstep ranges
  5. Where transitions feel mushy — narrow smoothstep ranges
  6. Repeat until every musical moment has a visual counterpart

Choreography Quality Checklist

  • Every drop has a visual impact (flash, snap, bloom spike)
  • Every build has visible escalation (growth, approach, brightening)
  • Every breakdown has a visual shift (glitch, desaturation, fracture)
  • Camera never stays static for more than ~15s
  • Adjacent sections have visual contrast
  • Quiet moments are actually quiet (dim, sparse, slow)
  • The finale feels like a finale (biggest visual moment, then resolution)
  • Mood transitions in the music have corresponding color/material shifts
  • One-shot hits land on the beat, not between beats
  • The demo still looks good without audio (base values are reasonable)

Audio-Visual Sync System

The sync system is hybrid: hardcoded timestamps from the choreography map provide structure, real-time FFT adds organic responsiveness.

Time Source: Always Use audio.currentTime

The shader time uniform must come from audioElement.currentTime, NOT from a JS accumulator or performance.now(). This locks visuals to audio playback position even when frames drop or the browser throttles.

const time = audioEl ? audioEl.currentTime : 0;
gl.uniform1f(timeLoc, time);

FFT Data Pipeline

Extract frequency bands each frame and pass as a vec4 uniform:

analyser.fftSize = 512;                  // 256 bins, good latency/resolution balance
analyser.smoothingTimeConstant = 0.55;   // responsive but not jittery

const freq = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(freq);

let bass = 0;    for (let i=0;  i<4;  i++) bass    += freq[i]; bass    /= 4*255;
let mid = 0;     for (let i=4;  i<16; i++) mid     += freq[i]; mid     /= 12*255;
let treble = 0;  for (let i=16; i<64; i++) treble  += freq[i]; treble  /= 48*255;
let overall = 0; for (let i=0;  i<128;i++) overall += freq[i]; overall /= 128*255;

gl.uniform4f(audioLoc, bass, mid, treble, overall); // all normalized 0–1

Tuning:

  • fftSize: 512 is a good default. 256 = snappier but coarser. 1024+ = more frequency detail but more latency.
  • smoothingTimeConstant: 0.55 is balanced. 0.3 = twitchy (good for percussion-heavy tracks). 0.8 = sluggish (good for ambient/drone).
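The per-band loops above can be collapsed into one helper — a refactor sketch with the same bin ranges, assuming fftSize = 512 (256 bins):

```javascript
// Average a bin range from getByteFrequencyData output, normalized to 0–1.
function bandAverage(freq, from, to) {
  let sum = 0;
  for (let i = from; i < to; i++) sum += freq[i];
  return sum / ((to - from) * 255);
}

// Same bands as the code above, for fftSize = 512
const bands = freq => [
  bandAverage(freq, 0, 4),    // bass
  bandAverage(freq, 4, 16),   // mid
  bandAverage(freq, 16, 64),  // treble
  bandAverage(freq, 0, 128),  // overall
];
```

Usage per frame: `gl.uniform4f(audioLoc, ...bands(freq));`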

What Each Band Should Drive

| Band | Freq range | Visual target | Example |
|---|---|---|---|
| bass (audio.x) | Sub/kicks | Camera shake, bloom pulse, ring glow | `col += glow * audio.x * 0.4` |
| mid (audio.y) | Melody/pads | Color shifts, surface detail | `col *= 1.0 + audio.y * 0.15` |
| treble (audio.z) | Hi-hats/shimmer | Scanline intensity, sparkle | `br *= 0.3 + audio.z * 0.7` |
| overall (audio.w) | Full energy | Bloom intensity, global glow | `bloom *= 1.2 + audioE * 0.4` |

Critical Rule: Audio Never Drives SDF Geometry

Positions, sizes, angles, and shapes must be deterministic from time. Audio reactivity only affects materials, brightness, glow, and post-processing. This prevents visual jitter and ensures the demo looks correct even with audio analysis variance across devices.

Multiplier Pattern

Always use audio as a multiplier on a base value, never as the sole driver:

// CORRECT — works without audio, enhanced with it
col *= 1.0 + audio.x * 0.2;
bloom *= 1.2 + audioE * 0.4;

// WRONG — black screen when audio is silent
col *= audio.x;

Camera Shake

Square the bass energy so only strong hits register:

float shake = audio.x * audio.x * 0.003;

Choreography: Phase-Based Timeline

Phase Struct

Define a GLSL struct encoding the current act. Each field is a smoothstep blend (0→1). The struct fields match sections from your choreography map:

struct Phase {
  float emerge;     // opening — geometry fades in
  float build;      // tension rising
  float ignite;     // the drop
  float system;     // full scene, cruising
  float storm;      // intensity peak
  float breakdown;  // momentary collapse
  float transform;  // mood/texture shift
  float reveal;     // resolution, finale
  float pulse;      // audio-reactive multiplier (1.0 + audio)
};

Adapt the field names and count to your track. A 2-minute ambient piece might have 3 phases. A 5-minute EDM track might have 10+.

Phase Transitions

// Snap transition (drop): 0.2s fade
ph.drop = smoothstep(dropTime - 0.1, dropTime + 0.1, t);

// Smooth transition (mood shift): 2s crossfade
ph.cold = smoothstep(shiftTime - 1.0, shiftTime + 1.0, t);

// Section with start and end (breakdown): ramp in, hold, ramp out
ph.breakdown = smoothstep(bdStart, bdStart + 0.2, t)
             * (1.0 - smoothstep(bdEnd - 0.3, bdEnd, t));

// Gradual build over 25s
ph.build = smoothstep(buildStart, buildEnd, t);

One-Shot Events

float pulse = exp(-abs(t - hitTime) * sharpness);

Dynamic Geometry with Phases

Geometry grows/shrinks by adding phase-gated terms:

radius += ph.build * 0.8;              // grows during build
radius += ph.storm * (0.5 + i * 0.2); // expands per-instance in storm
radius *= 1.0 - ph.breakdown * 0.3;   // shrinks during breakdown

Each term is gated by its phase so effects don't leak into other sections.

Camera Choreography

Camera position as a sum of phase-gated terms:

float camDist = initialDist
  - approach   * smoothstep(tA, tB, t)     // move in
  + pullBack   * smoothstep(tC, tD, t)     // move out
  - rushIn     * smoothstep(tE, tF, t);    // final approach

Each line is one camera move. Easy to read, reorder, and adjust independently. The camera should tell the same story as the music — approaching during builds, snapping on drops, drifting during calm sections, pulling back for the finale.
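The same pattern in JS is handy for plotting the camera path against the choreography map before committing it to GLSL (a sketch; the timestamps and move amounts here are placeholders, and `smoothstep` re-implements the GLSL builtin):

```javascript
// Camera distance as a sum of phase-gated moves, mirroring the GLSL above.
function smoothstep(e0, e1, x) {
  const t = Math.min(1, Math.max(0, (x - e0) / (e1 - e0)));
  return t * t * (3 - 2 * t);
}

function camDist(t) {
  const initialDist = 10.0;
  return initialDist
    - 4.0 * smoothstep(5, 15, t)    // move in during the build
    + 2.0 * smoothstep(40, 50, t)   // pull back for the breakdown
    - 5.0 * smoothstep(60, 70, t);  // final approach
}
```

Because each move is an independent term, reordering or retiming one never disturbs the others.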


SDF Raymarching Patterns

Preventing Geometry Intersection

When objects must not intersect (e.g., rings around a star), compute the inner object's radius dynamically and clamp:

float innerR = getInnerRadius() + clearance;
outerR = max(outerR, innerR + index * gap);

Surface Detail via Noise

Layer noise at different frequencies for organic surfaces:

float detail = noise(p * 5.0 + t * 0.3) * noise(p * 7.0 - t * 0.5);
col *= 0.75 + detail * 0.5;

Animate noise offsets with t at different speeds per layer for convincing motion.


Post-Processing

Bloom

Bright extract → separable Gaussian blur → additive composite:

vec3 bloom = texture(blurTex, uv).rgb;
col += bloom * (1.2 + audioE * 0.4);

Glitch Effects

UV displacement gated to timestamp windows from the choreography map:

uv.x += step(0.99, sin(uv.y * 400.0 + t * 50.0)) * glitchAmt * 0.02; // scanline tears
uv.x += (floor(sin(uv.y * 8.0 + t * 3.0) * 4.0) / 4.0) * glitchAmt * 0.03; // block shifts

Fade to Black

float fade = smoothstep(endTime - fadeDuration, endTime, t);
col *= 1.0 - fade;

Common Pitfalls

  1. Audio context requires user gesture — always gate on a click-to-start overlay
  2. Shader compilation errors are silent — check gl.getShaderInfoLog() and log it
  3. smoothstep needs margin — smoothstep(10.0, 10.5, t), not step(10.0, t), for clean cuts
  4. Phase overlaps are features — two phases at 50/50 during crossfade = natural transition
  5. Audio uniform fallback — if analyser is null, return [0,0,0,0] so shaders still work
  6. Use onended not timeouts — actual playback may differ from expected track length
  7. Test without audio — the multiplier pattern ensures visuals degrade gracefully
  8. Choreography map first, code second — changing timestamps in code without updating the map leads to drift between intent and implementation
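For pitfall 2, a small helper that surfaces compile errors instead of letting them fail silently (a sketch using standard WebGL calls; the `compileShader` name is illustrative):

```javascript
// Compile a shader and throw with the driver's info log on failure.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader);
    gl.deleteShader(shader);
    throw new Error('Shader compile failed:\n' + log);
  }
  return shader;
}
```

The thrown log includes line numbers into the shader source, which is the fastest way to locate GLSL typos in a single-file demo.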