A comprehensive guide to installing and configuring WhisperX on Windows with GPU acceleration and speaker diarization.
Last tested: January 2026
Environment: Windows 11, NVIDIA GPU (CUDA 12.6), Miniconda, Python 3.10
- Prerequisites
- Conda Environment Setup
- WhisperX Installation
- Hugging Face Configuration
- Wrapper Scripts
- Usage
- Troubleshooting
- Known Warnings
## Prerequisites

Install Miniconda via Scoop (recommended) or download it from conda.io:
```
scoop install miniconda3
```

Ensure you have recent NVIDIA drivers. The CUDA toolkit is bundled with PyTorch, so no separate installation is typically needed.
Verify GPU is detected:
```
nvidia-smi
```

Create an account at huggingface.co; this is required for the speaker diarization models.
## Conda Environment Setup

Create a dedicated environment with Python 3.10 (recommended for compatibility):
```
conda create -n whisperx python=3.10 -y
conda activate whisperx
```

Install PyTorch with CUDA support. Check pytorch.org for the latest command, but typically:
```
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
```

Or, for CUDA 12.1 via pip:
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

## WhisperX Installation

```
pip install whisperx
```

WhisperX has some dependencies that may not be automatically installed:
```
pip install requests
```

Verify the installation:

```
python -c "import whisperx; print('WhisperX OK')"
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

## Hugging Face Configuration

Speaker diarization requires access to gated models on Hugging Face. This is the most common source of errors.
Create an access token:

- Go to huggingface.co/settings/tokens
- Create a new token with a descriptive name (e.g., "WhisperX")
- Enable: "Read access to contents of all public gated repos you can access"
- Save the token securely
You must visit each of these pages while logged in and accept the user agreement:
| Model | URL |
|---|---|
| Speaker Diarization 3.1 | huggingface.co/pyannote/speaker-diarization-3.1 |
| Segmentation 3.0 | huggingface.co/pyannote/segmentation-3.0 |
Look for "Gated model - You have been granted access to this model" after accepting.
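To verify programmatically that your token can actually reach both gated models, a small check with huggingface_hub (installed alongside WhisperX) is sketched below; it only fetches the model metadata, and a missing license acceptance or permission shows up as an exception:

```
# Minimal sketch: check that a token can access the gated pyannote models.
# Paste your own token below; any failure (401/403, license not accepted, typo) is reported.
from huggingface_hub import HfApi

token = "hf_YOUR_TOKEN_HERE"
api = HfApi(token=token)

for repo in ("pyannote/speaker-diarization-3.1", "pyannote/segmentation-3.0"):
    try:
        api.model_info(repo)
        print(f"OK: access granted to {repo}")
    except Exception as exc:
        print(f"FAILED for {repo}: {exc}")
```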
For persistent authentication without passing tokens:
```
conda activate whisperx
huggingface-cli login
```

Paste your token when prompted. This caches the credentials in ~/.cache/huggingface/token.
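To confirm the cached login works, you can ask the Hub who the stored token belongs to; the sketch below uses huggingface_hub, which is already present in the environment as a WhisperX dependency:

```
# Minimal sketch: verify that the credentials cached by `huggingface-cli login` are valid.
# whoami() picks up the stored token automatically when none is passed explicitly.
from huggingface_hub import whoami

try:
    info = whoami()
    print(f"Logged in as: {info.get('name', '<unknown>')}")
except Exception as exc:
    print(f"Not logged in or token invalid: {exc}")
```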
## Wrapper Scripts

WhisperX requires patches to work with modern PyTorch (2.6+) and huggingface_hub versions. Create these two files in your scripts directory (e.g., C:\dev\scripts\).
The first file, run_whisperx_safe.py, is a Python wrapper that applies the necessary compatibility patches:
```
import sys
import os
import torch
import functools
import huggingface_hub

# === PATCH 1: Fix PyTorch 2.6+ Security Change ===
# Force "weights_only=False" globally to allow loading older models
_original_load = torch.load

@functools.wraps(_original_load)
def robust_load(*args, **kwargs):
    kwargs['weights_only'] = False
    return _original_load(*args, **kwargs)

torch.load = robust_load
# ================================================

# === PATCH 2: Fix Hugging Face "use_auth_token" Deprecation ===
# The library changed argument 'use_auth_token' to 'token'.
# We intercept the call and rename the argument on the fly.
_original_hf_download = huggingface_hub.hf_hub_download

@functools.wraps(_original_hf_download)
def robust_hf_download(*args, **kwargs):
    if 'use_auth_token' in kwargs:
        kwargs['token'] = kwargs.pop('use_auth_token')
    return _original_hf_download(*args, **kwargs)

huggingface_hub.hf_hub_download = robust_hf_download
# ================================================

# === PATCH 3: Set HF_TOKEN environment variable ===
# pyannote.audio reads from environment as fallback
if len(sys.argv) > 1:
    for i, arg in enumerate(sys.argv):
        if arg == '--hf_token' and i + 1 < len(sys.argv):
            os.environ['HF_TOKEN'] = sys.argv[i + 1]
            break
# ================================================

# Import WhisperX AFTER applying patches
from whisperx.__main__ import cli

if __name__ == "__main__":
    sys.exit(cli())
```

The second file, transcribe_WhisperX.bat, is a batch script for easy transcription with drag-and-drop support:
```
@echo off
setlocal
:: ================= CONFIGURATION =================
:: PASTE YOUR HUGGING FACE TOKEN BELOW
set "HF_TOKEN=hf_YOUR_TOKEN_HERE"
:: NAME OF YOUR CONDA ENVIRONMENT
set "CONDA_ENV=whisperx"
:: SUPPRESS SYMLINK WARNINGS ON WINDOWS
set "HF_HUB_DISABLE_SYMLINKS_WARNING=1"
:: =================================================
:: Check if input file is provided
if "%~1"=="" (
    echo [ERROR] No input file provided.
    echo Usage: transcribe "path\to\audio.mp3"
    goto :EOF
)
:: Get absolute path of the input file
set "INPUT_FILE=%~f1"
:: Get directory of the input file
set "OUTPUT_DIR=%~dp1"
:: Remove trailing backslash to prevent quote escaping bugs
if "%OUTPUT_DIR:~-1%"=="\" set "OUTPUT_DIR=%OUTPUT_DIR:~0,-1%"
echo.
echo ----------------------------------------------------------------
echo Source: %INPUT_FILE%
echo Target: %OUTPUT_DIR%
echo ----------------------------------------------------------------
echo.
:: Activate the Conda environment
call conda activate %CONDA_ENV%
:: === RUN THE WRAPPER SCRIPT ===
:: We use "%~dp0" to find the python script in the same folder as this batch file.
python "%~dp0run_whisperx_safe.py" "%INPUT_FILE%" --model medium --diarize --hf_token %HF_TOKEN% --output_dir "%OUTPUT_DIR%" --device cuda --compute_type int8 --batch_size 4
echo.
echo ----------------------------------------------------------------
echo Transcription Complete!
echo ----------------------------------------------------------------
:: Deactivate
call conda deactivate
endlocal
```

Note: Replace `hf_YOUR_TOKEN_HERE` with your actual Hugging Face token.
## Usage

Add your scripts directory to your system PATH, or create a shortcut/alias for easy access.
```
C:\dev\scripts\transcribe_WhisperX.bat "C:\path\to\audio.mp3"
```

Alternatively, create a shortcut to the batch file on your desktop and drag audio files onto it to transcribe them.
Transcription creates multiple output formats in the same directory as the input file:
- `.json` — Full transcript with timestamps and speaker labels
- `.srt` — Subtitle format
- `.vtt` — WebVTT subtitle format
- `.txt` — Plain text transcript
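If you want to post-process the results, the .json file is easy to consume from Python. The sketch below assumes the layout WhisperX typically writes (a top-level "segments" list whose entries carry start, end, text and, when diarization is enabled, a speaker label); adjust the keys if your version differs, and the filename audio.json is just a placeholder:

```
# Minimal sketch: print a speaker-labelled transcript from the WhisperX JSON output.
# Assumed layout: {"segments": [{"start": ..., "end": ..., "text": ..., "speaker": ...}, ...]}
import json

with open("audio.json", encoding="utf-8") as f:
    data = json.load(f)

for seg in data.get("segments", []):
    speaker = seg.get("speaker", "UNKNOWN")
    print(f"[{seg['start']:8.2f} - {seg['end']:8.2f}] {speaker}: {seg['text'].strip()}")
```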
Common options you can modify in the batch file:
| Option | Description | Values |
|---|---|---|
| `--model` | Whisper model size | tiny, base, small, medium, large-v2, large-v3 |
| `--device` | Compute device | cuda, cpu |
| `--compute_type` | Precision | float16, int8, float32 |
| `--batch_size` | Batch size for GPU | 1-32 (depends on VRAM) |
| `--diarize` | Enable speaker diarization | flag (no value) |
| `--language` | Force language | en, nl, de, etc. (auto-detect if omitted) |
| `--min_speakers` | Minimum speakers | integer |
| `--max_speakers` | Maximum speakers | integer |
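The batch wrapper drives the WhisperX command line, but the same pipeline can also be scripted directly in Python. The sketch below follows the load_model / align / DiarizationPipeline pattern used in recent WhisperX releases; exact names and module locations vary between versions (for example, DiarizationPipeline has moved between whisperx and whisperx.diarize), so treat it as a starting point rather than a drop-in script:

```
# Sketch of programmatic use; API names follow recent WhisperX releases and may differ in yours.
import whisperx

device = "cuda"
audio_file = "audio.mp3"          # placeholder input path
hf_token = "hf_YOUR_TOKEN_HERE"   # same token as in the batch file

# 1. Transcribe with the faster-whisper backend
model = whisperx.load_model("medium", device, compute_type="int8")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=4)

# 2. Align for word-level timestamps
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device, return_char_alignments=False)

# 3. Diarize and attach speaker labels (requires the gated pyannote models)
diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

for seg in result["segments"]:
    print(seg.get("speaker", "UNKNOWN"), seg["text"])
```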
## Troubleshooting

**Authentication errors for gated models**

Cause: Hugging Face authentication failed for gated models.
Solution:
- Verify you accepted the licenses at huggingface.co/pyannote/speaker-diarization-3.1 and huggingface.co/pyannote/segmentation-3.0
- Check token has "gated repos" read permission
- Ensure token is correctly set in batch file
**Missing Python module (e.g., requests)**

Cause: A dependency that WhisperX does not install automatically is missing.
Solution:
```
conda activate whisperx
pip install requests
```

**torch.load / weights_only errors**

Cause: PyTorch 2.6+ changed the default security settings for torch.load().
Solution: Use the run_whisperx_safe.py wrapper which patches this automatically.
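If you hit this error outside the wrapper (for example in your own scripts), the same effect can be achieved per call by passing weights_only=False explicitly; only do this for checkpoints you trust, since it re-enables full unpickling. The path below is just a placeholder:

```
# Manual workaround for a single trusted checkpoint (the wrapper applies this globally).
# Only pass weights_only=False for files you trust: it allows arbitrary pickled objects to load.
import torch

checkpoint = torch.load("path/to/model.bin", map_location="cpu", weights_only=False)
```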
**use_auth_token / token argument errors**

Cause: huggingface_hub renamed the parameter from use_auth_token to token.
Solution: The wrapper script handles this. Also ensure HF_TOKEN environment variable is set.
**First run takes a long time**

Cause: Models are downloaded on first use (~3-5 GB in total).
Solution: Wait for downloads to complete. Subsequent runs use cached models from:
- ~/.cache/huggingface/
- ~/.cache/torch/
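To see how much disk space the cached models are using (for example before clearing them), a small helper like the one below works; the paths simply mirror the defaults listed above and need adjusting if you have set HF_HOME or TORCH_HOME:

```
# Minimal sketch: report the size of the default model caches listed above.
from pathlib import Path

def dir_size_gb(path: Path) -> float:
    """Total size of all files under `path`, in gigabytes."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file()) / 1e9

for cache in (Path.home() / ".cache" / "huggingface", Path.home() / ".cache" / "torch"):
    if cache.exists():
        print(f"{cache}: {dir_size_gb(cache):.2f} GB")
    else:
        print(f"{cache}: not present")
```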
**Out-of-memory errors on the GPU**

Solution: Reduce the batch size or use a smaller model:
```
python ... --batch_size 2 --model small
```

## Known Warnings

These warnings are safe to ignore — they don't affect functionality:
| Warning | Explanation |
|---|---|
| `torchaudio._backend.list_audio_backends` deprecated | Future API change in torchaudio |
| `Model was trained with pyannote.audio 0.0.1` | Version mismatch, but backward compatible |
| `TensorFloat-32 (TF32) has been disabled` | Reproducibility safeguard |
| `speechbrain.pretrained` deprecated | Auto-redirects to new API |
| `symlinks not supported` | Windows limitation; uses more disk space |
| `Lightning upgraded your checkpoint` | Automatic checkpoint format update |
| `std(): degrees of freedom <= 0` | Edge case in speaker embedding calculation |
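If you prefer a quieter console, run_whisperx_safe.py could additionally filter some of these with Python's warnings module. The message patterns below are illustrative regexes based on the table above rather than exact strings, and some of the items are emitted through logging rather than warnings, so this only silences part of them:

```
# Optional, illustrative: silence some of the known-harmless warnings from the table above.
# Add near the top of run_whisperx_safe.py, before importing whisperx.
# `message` is a regex matched against the start of the warning text; adjust to what you actually see.
import warnings

for pattern in (
    r"torchaudio\._backend",
    r"Model was trained with pyannote\.audio",
    r"TensorFloat-32 \(TF32\) has been disabled",
    r"speechbrain\.pretrained",
    r"std\(\): degrees of freedom",
):
    warnings.filterwarnings("ignore", message=pattern)
```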
To suppress the symlink warning, set in your batch file:
```
set "HF_HUB_DISABLE_SYMLINKS_WARNING=1"
```

## Quick Reference

```
# Create environment
conda create -n whisperx python=3.10 -y
conda activate whisperx
# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install whisperx requests
# Login to Hugging Face (one-time)
huggingface-cli login
# Accept model licenses (visit in browser while logged in):
# - https://huggingface.co/pyannote/speaker-diarization-3.1
# - https://huggingface.co/pyannote/segmentation-3.0
# Run transcription
python run_whisperx_safe.py "audio.mp3" --model medium --diarize --hf_token YOUR_TOKEN --output_dir . --device cuda
```

Created after extensive troubleshooting. May your transcriptions be swift and your speakers correctly identified.