LVM Thin Pool Disaster: Lessons Learned from a Proxmox Media Server

TL;DR: Don't Set Up Thin Pools Like This

I spent hours fighting filesystem corruption on my Proxmox media server. The root cause? An LVM thin pool that left only 20MB of physical free space on the volume group. Here's what went wrong and how to avoid it.

The Setup

  • Hardware: Proxmox VE 8.4.1 with Seagate SlimBUP 931GB external drive
  • Storage: LVM thin pool (SlimBUP) with containers for Plex and media services
  • Problem: Thin pool allocated 931.5GB out of 931.51GB physical space (leaving only 20MB free)

The Symptoms

Initial Signs

  • Filesystem corruption errors: EXT4-fs error: Detected aborted journal
  • I/O errors: Buffer I/O error on dev dm-13, logical block 9258
  • Thin pool metadata failures: device-mapper: thin: dm_thin_find_block() failed: error = -5
  • Filesystem remounted read-only
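
If you suspect the same failure mode, the kernel log is the quickest way to confirm it. A minimal check (the grep patterns match the messages above):

# Scan the kernel ring buffer for thin pool and ext4 errors
dmesg -T | grep -Ei 'dm_thin|ext4-fs error|buffer i/o error'

# Or search the persistent kernel journal
journalctl -k | grep -i 'device-mapper: thin'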

What Triggered It

  1. Downloading media files (~14GB) pushed thin pool from 70% → 80%
  2. Plex transcoding - writing temp files triggered metadata operations
  3. Running fsck - filesystem repair operations write to physical volume
  4. Any write-heavy operation at high capacity

The Root Cause

Thin Pool Design Flaw

# Physical volume status
PV         VG      Fmt  Attr PSize   PFree
/dev/sdd1  SlimBUP lvm2 a--  931.51g 20.00m

# Thin pool allocation
[SlimBUP_tdata]   - 912.86g (data storage)
[SlimBUP_tmeta]   - 9.32g   (metadata)
[lvol0_pmspare]   - 9.32g   (spare metadata)
Total: 931.5g (leaving only 20MB PFree)

The Problem: Thin pools need physical free space for:

  • Metadata updates during write operations
  • Copy-on-write operations
  • Journal operations
  • Filesystem repairs

With only 20MB free, any operation that writes to the physical volume causes metadata allocation failures, triggering cascading corruption.
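
You can see how close to the edge a pool is with standard LVM reporting (these are stock pvs/lvs output fields):

# Physical free space left on the volume group
pvs -o pv_name,vg_name,pv_size,pv_free

# Data and metadata fill levels for the pool and its internal volumes
lvs -a -o lv_name,lv_size,data_percent,metadata_percent SlimBUP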

What Didn't Work

Attempt 1: Deleting Media

  • Deleted 100GB of media files
  • Thin pool usage dropped from 89% → 64%
  • BUT: Filesystem metadata was corrupt, so free space wasn't registered
  • Required: e2fsck to fix the metadata, then fstrim to tell the thin pool about the freed blocks (full commands in the Recovery Procedure below)

Attempt 2: Filesystem Repair

  • Ran e2fsck -f -y /dev/SlimBUP/vm-103-disk-0
  • Fixed corrupt metadata
  • BUT: fsck itself writes to the physical volume, which had only 20MB free
  • Result: Created MORE corruption while trying to fix corruption

Attempt 3: Just Using the System Normally

  • Thought we were safe at 73% thin pool usage (242GB free internally)
  • User played a video in Plex
  • Plex transcoder wrote temp files
  • Triggered the same corruption cascade

The Real Fix

Immediate Solution

Move Plex transcoding off the thin pool:

# Create transcoder directory on Proxmox host (not on thin pool)
mkdir -p /var/lib/plex-transcoder
chown 100000:100000 /var/lib/plex-transcoder

# Add mount point to Plex container
pct set 101 -mp2 /var/lib/plex-transcoder,mp=/transcode

# Restart Plex
pct stop 101 && pct start 101

Then configure Plex:

  1. Settings → Transcoder
  2. Set "Transcoder temporary directory" to /transcode
  3. Save

Why this works: Transcoding temp files now write to the Proxmox host's local storage (49GB free) instead of the thin pool, whose physical volume has only 20MB free.
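
A quick sanity check that the transcoder really lands on host storage (container ID and paths from the setup above):

# From the Proxmox host: confirm the bind mount inside the container
pct exec 101 -- df -h /transcode

# Watch temp files appear on the host during playback
watch -n 5 'du -sh /var/lib/plex-transcoder'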

Long-term Solutions

Option 1: Rebuild Thin Pool Correctly

WRONG WAY (what was done):

# Allocates entire physical volume
lvcreate -T -L 912.86g SlimBUP/SlimBUP

RIGHT WAY:

# Leave 10-15% physical space free
lvcreate -T -L 800g SlimBUP/SlimBUP
# Leaves roughly 110-130GB physical free (metadata and spare also allocate from the VG)

Option 2: Don't Use Thin Pools

Use regular LVM volumes or direct filesystem on partitions:

# Create regular logical volume instead
lvcreate -L 800g -n media SlimBUP
mkfs.ext4 /dev/SlimBUP/media

Pros: No thin pool metadata issues, simpler management
Cons: No snapshots, no overprovisioning

Option 3: Use ZFS Instead

Proxmox supports ZFS which handles thin provisioning better:

zpool create -o ashift=12 media /dev/sdd
zfs create -o compression=lz4 media/storage
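
For the pool to hold container and VM disks it also has to be registered as Proxmox storage; a sketch, with media-zfs as an assumed storage ID:

# Register the dataset as a Proxmox storage backend
pvesm add zfspool media-zfs --pool media/storage --content rootdir,images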

Recovery Procedure

When corruption happens:

# 1. Stop all containers using the thin pool
pct stop 101
pct stop 103

# 2. Deactivate and reactivate the volume
lvchange -an SlimBUP/vm-103-disk-0
lvchange -ay SlimBUP/vm-103-disk-0

# 3. Run filesystem check
e2fsck -f -y /dev/SlimBUP/vm-103-disk-0

# 4. Mount and trim to reclaim space
mount /dev/SlimBUP/vm-103-disk-0 /mnt
fstrim -v /mnt
umount /mnt

# 5. Check thin pool status
lvs SlimBUP

# 6. Restart containers
pct start 103
pct start 101

Key Lessons

1. Always Leave Physical Free Space

  • Minimum 10-15% of physical volume should remain unallocated
  • Thin pools need room for metadata operations
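
An easy way to enforce this at creation time is to size the pool as a percentage of the volume group rather than an absolute figure:

# Allocate at most 85% of the VG to the thin pool; the rest stays unallocated
lvcreate -T -l 85%VG SlimBUP/SlimBUP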

2. Separate Write-Heavy Operations

  • Transcoding, temp files, downloads → separate storage
  • Media library → can live on thin pool
  • Don't mix temp/volatile with permanent storage

3. Monitor Thin Pool Usage

# Check regularly
lvs -o lv_name,data_percent,metadata_percent SlimBUP

# Set up alerts at:
# - 80% data usage (warning)
# - 85% data usage (critical)
# - 50% metadata usage (warning)
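
The alert comments above can be made executable. A minimal cron-friendly sketch (pool name and thresholds from this setup; cron mails any output to root by default):

#!/bin/bash
# Warn when the thin pool crosses the thresholds noted above
POOL="SlimBUP/SlimBUP"
DATA=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ')
META=$(lvs --noheadings -o metadata_percent "$POOL" | tr -d ' ')

# Strip the decimal part for integer comparison
if [ "${DATA%.*}" -ge 80 ] || [ "${META%.*}" -ge 50 ]; then
    echo "WARNING: $POOL at ${DATA}% data / ${META}% metadata on $(hostname)"
fi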

4. The 20MB Physical Free is Normal... Until It's Not

  • Thin pools can run for weeks with 20MB PFree
  • Normal operations (downloads, streaming) write INSIDE the thin pool
  • Problems occur when operations touch the physical volume:
    • Filesystem checks (fsck)
    • LVM metadata updates
    • Heavy write operations at high capacity

5. Corruption Cascades

Once corruption starts:

  1. Filesystem goes read-only
  2. Can't fix filesystem without stopping containers
  3. Running fsck writes to physical volume (triggers more corruption)
  4. Each repair attempt can make it worse
  5. Need to stop EVERYTHING to break the cycle

Best Practices for Proxmox Media Servers

Storage Layout

Physical Drive (931GB)
├── LVM Physical Volume (931GB)
│   ├── Thin Pool (800GB max) ← Leave headroom!
│   │   ├── vm-101-disk-0 (16GB) - Plex OS
│   │   └── vm-103-disk-0 (820GB) - Media storage
│   └── Unallocated (131GB) ← Breathing room
│
Host Local Storage (for temp files)
├── /var/lib/plex-transcoder (transcoding temp)
└── /var/tmp/downloads (incomplete downloads)
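
The commands to build that layout, roughly (sizes are the ones shown; 16GB + 820GB of virtual size on an 800GB pool is deliberate overprovisioning, which thin pools allow):

# Thin pool capped below the PV size
lvcreate -T -L 800g SlimBUP/SlimBUP

# Thin volumes inside the pool (virtual sizes, allocated on demand)
lvcreate -V 16g -T SlimBUP/SlimBUP -n vm-101-disk-0
lvcreate -V 820g -T SlimBUP/SlimBUP -n vm-103-disk-0

In practice Proxmox creates the vm-*-disk volumes itself when you provision containers; the part that matters is capping the pool size.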

Container Configuration

Plex Container (/etc/pve/lxc/101.conf):

mp0: SlimBUP:vm-103-disk-0,mp=/shared_root  # Media access
mp1: /mnt/plex-usb,mp=/mnt/usb-media        # USB archive
mp2: /var/lib/plex-transcoder,mp=/transcode # Temp transcoding

Media Services Container (/etc/pve/lxc/103.conf):

  • Radarr, Sonarr, SABnzbd should run with PUID=1000 PGID=1000
  • Prevents permission issues with media files

Radarr Docker Setup

docker run -d \
  --name radarr \
  --restart=unless-stopped \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TZ=America/New_York \
  -v /mnt/media/config/radarr:/config \
  -v /mnt/media/downloads:/data/downloads \
  -v /mnt/media/movies-slimbup:/data/movies \
  --network=host \
  linuxserver/radarr

Signs You're About to Have a Bad Time

Watch for these warnings:

# Check thin pool usage
lvs SlimBUP

# If you see:
Data% > 80%  # Warning zone
Data% > 85%  # Danger zone
Meta% > 50%  # Metadata issues incoming

# Check physical free space
pvs

# If you see:
PFree < 100GB  # Concerning
PFree < 50GB   # Dangerous
PFree < 1GB    # You're gonna have a bad time

Related Issues to Watch For

Permission Problems

Files created by containers may have wrong ownership:

# Fix ownership (from host)
pct exec 103 -- chown -R 1000:1000 /mnt/media/movies-slimbup/

Bind Mount Visibility

If Container A bind-mounts Container B's filesystem, and Container B mounts additional drives after the bind mount is created, Container A won't see them. Solution: mount drives before starting containers; see the check below.
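
A simple guard is to refuse to start the container unless the host mount is actually present (paths and ID from the config above):

# Only start Plex once the USB drive is really mounted
if mountpoint -q /mnt/plex-usb; then
    pct start 101
else
    echo "/mnt/plex-usb is not mounted; not starting container 101" >&2
fi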

Conclusion

Don't create thin pools that use 100% of physical volume space. The 20MB physical free is a ticking time bomb. Leave 10-15% unallocated, move write-heavy operations off the thin pool, and monitor usage closely.

Your future self will thank you when you're not spending hours fighting filesystem corruption at 2 AM.


Lessons learned the hard way on 2025-12-07. Total downtime: ~4 hours. Data lost: 0 (got lucky).
