WhisperX Installation Guide for Windows

A comprehensive guide to installing and configuring WhisperX on Windows with GPU acceleration and speaker diarization.

Last tested: January 2026
Environment: Windows 11, NVIDIA GPU (CUDA 12.6), Miniconda, Python 3.10


Table of Contents

  1. Prerequisites
  2. Conda Environment Setup
  3. WhisperX Installation
  4. Hugging Face Configuration
  5. Wrapper Scripts
  6. Usage
  7. Troubleshooting
  8. Known Warnings

Prerequisites

1. Miniconda or Anaconda

Install via Scoop (recommended) or download from conda.io:

scoop install miniconda3

2. NVIDIA GPU Drivers & CUDA

Ensure you have recent NVIDIA drivers. The CUDA libraries PyTorch needs are bundled with the PyTorch packages, so a separate CUDA toolkit installation is typically not required.

Verify GPU is detected:

nvidia-smi

3. Hugging Face Account

Create an account at huggingface.co — required for speaker diarization models.


Conda Environment Setup

Create a dedicated environment with Python 3.10 (recommended for compatibility):

conda create -n whisperx python=3.10 -y
conda activate whisperx

WhisperX Installation

Step 1: Install PyTorch with CUDA

Install PyTorch with CUDA support. Check pytorch.org for the latest command, but typically:

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

Or for CUDA 12.1:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Step 2: Install WhisperX

pip install whisperx

Step 3: Install Missing Dependencies

WhisperX has some dependencies that may not be automatically installed:

pip install requests

Step 4: Verify Installation

python -c "import whisperx; print('WhisperX OK')"
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Hugging Face Configuration

Speaker diarization requires access to gated models on Hugging Face. This is the most common source of errors.

Step 1: Create Access Token

  1. Go to huggingface.co/settings/tokens
  2. Create a new token with a descriptive name (e.g., "WhisperX")
  3. Enable: "Read access to contents of all public gated repos you can access"
  4. Save the token securely

Step 2: Accept Model Licenses

You must visit each of these pages while logged in and accept the user agreement:

  • Speaker Diarization 3.1: huggingface.co/pyannote/speaker-diarization-3.1
  • Segmentation 3.0: huggingface.co/pyannote/segmentation-3.0

Look for "Gated model - You have been granted access to this model" after accepting.

Step 3: (Optional) Login via CLI

For persistent authentication without passing tokens:

conda activate whisperx
huggingface-cli login

Paste your token when prompted. This caches credentials in ~/.cache/huggingface/token.
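
You can verify that the cached login works with a quick one-liner (whoami is part of huggingface_hub and reports which account the cached token belongs to):

python -c "from huggingface_hub import whoami; print(whoami()['name'])"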


Wrapper Scripts

WhisperX requires patches to work with modern PyTorch (2.6+) and huggingface_hub versions. Create these two files in your scripts directory (e.g., C:\dev\scripts\).

File 1: run_whisperx_safe.py

This Python wrapper applies necessary compatibility patches:

import sys
import os
import torch
import functools
import huggingface_hub

# === PATCH 1: Fix PyTorch 2.6+ Security Change ===
# Force "weights_only=False" globally to allow loading older models
_original_load = torch.load

@functools.wraps(_original_load)
def robust_load(*args, **kwargs):
    kwargs['weights_only'] = False
    return _original_load(*args, **kwargs)

torch.load = robust_load
# ================================================

# === PATCH 2: Fix Hugging Face "use_auth_token" Deprecation ===
# The library changed argument 'use_auth_token' to 'token'.
# We intercept the call and rename the argument on the fly.
_original_hf_download = huggingface_hub.hf_hub_download

@functools.wraps(_original_hf_download)
def robust_hf_download(*args, **kwargs):
    if 'use_auth_token' in kwargs:
        kwargs['token'] = kwargs.pop('use_auth_token')
    return _original_hf_download(*args, **kwargs)

huggingface_hub.hf_hub_download = robust_hf_download
# ================================================

# === PATCH 3: Set HF_TOKEN environment variable ===
# pyannote.audio reads from environment as fallback
if len(sys.argv) > 1:
    for i, arg in enumerate(sys.argv):
        if arg == '--hf_token' and i + 1 < len(sys.argv):
            os.environ['HF_TOKEN'] = sys.argv[i + 1]
            break
# ================================================

# Import WhisperX AFTER applying patches
from whisperx.__main__ import cli

if __name__ == "__main__":
    sys.exit(cli())

File 2: transcribe_WhisperX.bat

Batch script for easy transcription with drag-and-drop support:

@echo off
setlocal

:: ================= CONFIGURATION =================
:: PASTE YOUR HUGGING FACE TOKEN BELOW
set "HF_TOKEN=hf_YOUR_TOKEN_HERE"

:: NAME OF YOUR CONDA ENVIRONMENT
set "CONDA_ENV=whisperx"

:: SUPPRESS SYMLINK WARNINGS ON WINDOWS
set "HF_HUB_DISABLE_SYMLINKS_WARNING=1"
:: =================================================

:: Check if input file is provided
if "%~1"=="" (
    echo [ERROR] No input file provided.
    echo Usage: transcribe "path\to\audio.mp3"
    goto :EOF
)

:: Get absolute path of the input file
set "INPUT_FILE=%~f1"

:: Get directory of the input file
set "OUTPUT_DIR=%~dp1"
:: Remove trailing backslash to prevent quote escaping bugs
if "%OUTPUT_DIR:~-1%"=="\" set "OUTPUT_DIR=%OUTPUT_DIR:~0,-1%"

echo.
echo ----------------------------------------------------------------
echo  Source: %INPUT_FILE%
echo  Target: %OUTPUT_DIR%
echo ----------------------------------------------------------------
echo.

:: Activate the Conda environment
call conda activate %CONDA_ENV%

:: === RUN THE WRAPPER SCRIPT ===
:: We use "%~dp0" to find the python script in the same folder as this batch file.
python "%~dp0run_whisperx_safe.py" "%INPUT_FILE%" --model medium --diarize --hf_token %HF_TOKEN% --output_dir "%OUTPUT_DIR%" --device cuda --compute_type int8 --batch_size 4

echo.
echo ----------------------------------------------------------------
echo  Transcription Complete!
echo ----------------------------------------------------------------

:: Deactivate
call conda deactivate

endlocal

Note: Replace hf_YOUR_TOKEN_HERE with your actual Hugging Face token.

Optional: Add to PATH

Add your scripts directory to your system PATH, or create a shortcut/alias for easy access.


Usage

Basic Usage

C:\dev\scripts\transcribe_WhisperX.bat "C:\path\to\audio.mp3"

Drag and Drop

Create a shortcut to the batch file on your desktop. Drag audio files onto it to transcribe.

Output Files

Transcription creates multiple output formats in the same directory as the input file (a sketch for reading the JSON output programmatically follows this list):

  • .json — Full transcript with timestamps and speaker labels
  • .srt — Subtitle format
  • .vtt — WebVTT subtitle format
  • .txt — Plain text transcript
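
The .json file is the most convenient starting point for post-processing. The exact schema can differ between WhisperX versions, but the diarized output generally contains a segments list with start/end times, text, and a speaker label, so a sketch along these lines should work (treat the field names as assumptions and adjust them to what your version actually writes):

import json
from pathlib import Path

# Hypothetical path; point this at the .json file WhisperX wrote next to your audio
json_path = Path(r"C:\path\to\audio.json")

with json_path.open(encoding="utf-8") as f:
    data = json.load(f)

# One line per segment: [start-end] SPEAKER: text
for seg in data.get("segments", []):
    speaker = seg.get("speaker", "UNKNOWN")
    print(f"[{seg['start']:.1f}-{seg['end']:.1f}] {speaker}: {seg['text'].strip()}")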

Command-Line Options

Common options you can modify in the batch file:

  • --model: Whisper model size (tiny, base, small, medium, large-v2, large-v3)
  • --device: Compute device (cuda, cpu)
  • --compute_type: Precision (float16, int8, float32)
  • --batch_size: Batch size for GPU (1-32, depends on VRAM)
  • --diarize: Enable speaker diarization (flag, no value)
  • --language: Force language (en, nl, de, etc.; auto-detect if omitted)
  • --min_speakers: Minimum number of speakers (integer)
  • --max_speakers: Maximum number of speakers (integer)

Troubleshooting

Error: 'NoneType' object has no attribute 'to'

Cause: Hugging Face authentication failed for gated models.

Solution:

  1. Verify you accepted the licenses at huggingface.co/pyannote/speaker-diarization-3.1 and huggingface.co/pyannote/segmentation-3.0
  2. Check that your token has the "gated repos" read permission
  3. Ensure the token is correctly set in the batch file (a quick access check is sketched after this list)
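
To narrow this down, you can try loading the diarization pipeline directly. This is a minimal check, assuming pyannote.audio is available in the environment (WhisperX pulls it in); when access to the gated model is missing, from_pretrained typically returns None instead of a pipeline object, which is exactly what triggers the 'NoneType' error inside WhisperX:

from pyannote.audio import Pipeline

# Placeholder; use your real Hugging Face token
token = "hf_YOUR_TOKEN_HERE"

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=token,
)
print("Pipeline loaded:", pipeline is not None)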

Error: ModuleNotFoundError: No module named 'requests'

Cause: Missing dependency not installed by WhisperX.

Solution:

conda activate whisperx
pip install requests

Error: weights_only or pickle-related errors

Cause: PyTorch 2.6+ changed default security settings for torch.load().

Solution: Use the run_whisperx_safe.py wrapper which patches this automatically.
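
For reference, this is what the wrapper's Patch 1 works around. If you ever hit the error when loading a checkpoint yourself, passing the argument explicitly has the same effect (the file name below is just a placeholder, and weights_only=False should only be used for checkpoints you trust, since it allows arbitrary pickled objects to load):

import torch

# Hypothetical checkpoint path, for illustration only
ckpt = torch.load("some_checkpoint.bin", map_location="cpu", weights_only=False)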

Error: use_auth_token deprecation / token not working

Cause: huggingface_hub renamed the parameter from use_auth_token to token.

Solution: The wrapper script handles this. Also ensure HF_TOKEN environment variable is set.

Slow first run

Cause: Models are downloaded on first use (~3-5 GB total).

Solution: Wait for downloads to complete. Subsequent runs use cached models from:

  • ~/.cache/huggingface/
  • ~/.cache/torch/

CUDA out of memory

Solution: Reduce batch size or use a smaller model:

python ... --batch_size 2 --model small

Known Warnings

These warnings are safe to ignore — they don't affect functionality:

  • "torchaudio._backend.list_audio_backends deprecated": future API change in torchaudio
  • "Model was trained with pyannote.audio 0.0.1": version mismatch, but backward compatible
  • "TensorFloat-32 (TF32) has been disabled": reproducibility safeguard
  • "speechbrain.pretrained deprecated": auto-redirects to the new API
  • "symlinks not supported": Windows limitation; uses more disk space
  • "Lightning upgraded your checkpoint": automatic checkpoint format update
  • "std(): degrees of freedom <= 0": edge case in the speaker embedding calculation
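
If you would rather not see them at all, you can optionally add a filter near the top of run_whisperx_safe.py. This is a purely cosmetic sketch using Python's standard warnings module; the message patterns below are examples and can be adjusted to whatever actually shows up in your console:

import warnings

# Silence a few of the known, harmless warnings (patterns are examples)
warnings.filterwarnings("ignore", message=".*list_audio_backends.*")
warnings.filterwarnings("ignore", message=".*degrees of freedom.*")
warnings.filterwarnings("ignore", category=UserWarning, module="pyannote.*")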

To suppress the symlink warning, set in your batch file:

set "HF_HUB_DISABLE_SYMLINKS_WARNING=1"

Quick Reference

# Create environment
conda create -n whisperx python=3.10 -y
conda activate whisperx

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install whisperx requests

# Login to Hugging Face (one-time)
huggingface-cli login

# Accept model licenses (visit in browser while logged in):
# - https://huggingface.co/pyannote/speaker-diarization-3.1
# - https://huggingface.co/pyannote/segmentation-3.0

# Run transcription
python run_whisperx_safe.py "audio.mp3" --model medium --diarize --hf_token YOUR_TOKEN --output_dir . --device cuda


Created after extensive troubleshooting. May your transcriptions be swift and your speakers correctly identified.
