A comprehensive research document analyzing Tumblr's video infrastructure, embed patterns, stream formats, and optimal download strategies using modern tools
Authors: SERP Apps
Date: December 2024
Version: 1.0
This research document provides a comprehensive technical analysis of Tumblr's video delivery infrastructure, including embed URL patterns, content delivery networks (CDNs), stream formats, and optimal download methodologies. We examine Tumblr's video hosting architecture, the Neue Post Format (NPF) video blocks, external video embeds, and provide practical implementation guidance using industry-standard tools like yt-dlp, ffmpeg, gallery-dl, and alternative solutions for reliable video extraction and download.
- Introduction
- Tumblr Video Infrastructure Overview
- URL Patterns and Detection
- Stream Formats and CDN Analysis
- yt-dlp Implementation Strategies
- FFmpeg Processing Techniques
- Alternative Tools and Backup Methods
- Tumblr API Integration
- Implementation Recommendations
- Troubleshooting and Edge Cases
- Conclusion
Tumblr is a versatile microblogging platform that supports various media types including videos, images, GIFs, and text posts. Unlike dedicated video platforms, Tumblr serves as both a content host and an aggregator of embedded content from external platforms like YouTube and Vimeo. This dual nature presents unique challenges for video download implementations.
This document covers:
- Technical analysis of Tumblr's native video hosting infrastructure
- Comprehensive URL pattern recognition for posts and embedded videos
- Stream format analysis and codec specifications
- Practical implementation using open-source command-line tools
- API-based approaches for programmatic access
- Alternative tools and backup strategies for edge cases
Our research methodology includes:
- Network traffic analysis of Tumblr video playback
- API documentation review and testing
- URL pattern extraction and validation
- Testing with various video types (native, embedded, reblogged)
- Cross-platform validation across CDN endpoints
Tumblr utilizes its own CDN infrastructure for hosting native video content:
Primary Video CDN:
- Domain: va.media.tumblr.com
- Purpose: Direct video file hosting (MP4)
- Geographic Distribution: Global edge locations
Caption/Subtitle CDN:
- Domain: vtt.tumblr.com
- Purpose: WebVTT subtitle/caption file hosting
- Format: .vtt files
Static Assets CDN:
- Domain: 64.media.tumblr.com
- Purpose: Images, thumbnails, and static media
- Note: May also serve some video content
Tumblr's native video processing follows this pipeline:
- Upload: User uploads video file via web/mobile interface
- Validation: File size (≤500MB), duration (≤10 minutes), format (MP4/MOV) checks
- Transcoding: Video converted to H.264/AAC format if needed
- Resolution Optimization: Scaled to fit Tumblr's display constraints
- CDN Distribution: Files distributed to va.media.tumblr.com
- Metadata Extraction: Duration, dimensions, thumbnails generated
Tumblr handles three distinct video content types:
Native videos:
- Hosted directly on Tumblr's CDN
- Subject to upload limits and transcoding
- Direct MP4 download URLs available
External embeds:
- YouTube, Vimeo, Dailymotion embeds
- Rendered via iframe in posts
- Require separate extractor handling
Reblogged videos:
- Inherited from original posts
- May be native or external
- Maintain original source information
| Property | Specification |
|---|---|
| Maximum File Size | 500 MB |
| Maximum Duration | 10 minutes per video |
| Daily Upload Limit | 20 videos / 60 minutes total |
| Supported Formats | MP4, MOV |
| Video Codec | H.264 (AVC) |
| Audio Codec | AAC |
| Recommended Resolution | ≤720p (1280×720) |
| Display Dimensions | Max 500×700 pixels |
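As a rough illustration, the limits in the table above can be turned into a pre-upload check. This is a minimal sketch, not Tumblr's actual validation logic: the function name `check_upload_limits` is hypothetical, and treating "500 MB" as binary megabytes is an assumption.

```python
import os

# Limits copied from the specification table above.
MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024  # assumption: "MB" read as binary megabytes
MAX_DURATION_SECONDS = 10 * 60           # 10 minutes per video
SUPPORTED_EXTENSIONS = {".mp4", ".mov"}


def check_upload_limits(filename, size_bytes, duration_seconds):
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file larger than 500 MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("longer than 10 minutes")
    return problems


if __name__ == "__main__":
    print(check_upload_limits("clip.mp4", 10_000_000, 30.0))
    print(check_upload_limits("clip.avi", 600 * 1024 * 1024, 700.0))
```

The size and duration would normally come from the file system and a probe tool such as ffprobe; they are passed in directly here to keep the sketch self-contained.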
# Modern format (recommended)
https://www.tumblr.com/{blogname}/{postid}
https://www.tumblr.com/{blogname}/{postid}/{slug}
https://tumblr.com/{blogname}/{postid}
# Subdomain format (legacy but still active)
https://{blogname}.tumblr.com/post/{postid}
https://{blogname}.tumblr.com/post/{postid}/{slug}
# Short format
https://tmblr.co/{shortcode}
# Modern format
https://www.tumblr.com/exampleblog/123456789012
https://www.tumblr.com/exampleblog/123456789012/my-video-post
# Subdomain format
https://exampleblog.tumblr.com/post/123456789012
https://exampleblog.tumblr.com/post/123456789012/my-video-post
# Standard MP4 video
https://va.media.tumblr.com/tumblr_{video_id}.mp4
# With quality suffix
https://va.media.tumblr.com/tumblr_{video_id}_480.mp4
https://va.media.tumblr.com/tumblr_{video_id}_720.mp4
# Alternative pattern
https://ve.media.tumblr.com/{post_id}/{filename}.mp4
# Caption file (WebVTT)
https://vtt.tumblr.com/tumblr_{video_id}.vtt
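The URL shapes listed above can be recognised programmatically. The following is a minimal sketch under the assumption that the patterns shown in this section are complete; the function names `parse_post_url` and `parse_cdn_video_url` are hypothetical, and `tmblr.co` short links are not handled because they must be resolved by following the redirect first.

```python
import re

# Modern format: https://www.tumblr.com/{blogname}/{postid}[/{slug}]
MODERN_POST = re.compile(r"https?://(?:www\.)?tumblr\.com/([\w-]+)/(\d+)(?:/[\w-]*)?")
# Subdomain format: https://{blogname}.tumblr.com/post/{postid}[/{slug}]
SUBDOMAIN_POST = re.compile(r"https?://([\w-]+)\.tumblr\.com/post/(\d+)")
# Direct video file, with an optional quality suffix such as _480 or _720
CDN_VIDEO = re.compile(
    r"https?://va\.media\.tumblr\.com/tumblr_([a-zA-Z0-9]+)(?:_(\d+))?\.mp4"
)


def parse_post_url(url):
    """Return (blog_name, post_id) for a post URL, or None if it does not match."""
    for pattern in (MODERN_POST, SUBDOMAIN_POST):
        match = pattern.match(url)
        if match:
            return match.group(1), match.group(2)
    return None


def parse_cdn_video_url(url):
    """Return (video_id, quality or None) for a direct video URL, or None."""
    match = CDN_VIDEO.fullmatch(url)
    if not match:
        return None
    quality = int(match.group(2)) if match.group(2) else None
    return match.group(1), quality


if __name__ == "__main__":
    print(parse_post_url("https://www.tumblr.com/exampleblog/123456789012/my-video-post"))
    print(parse_cdn_video_url("https://va.media.tumblr.com/tumblr_abc123_720.mp4"))
```

The modern pattern is tried first so that `www.tumblr.com` is never mistaken for a blog subdomain.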
# Extract blog name and post ID from modern URL
https?://(?:www\.)?tumblr\.com/([\w-]+)/(\d+)(?:/[\w-]*)?
# Extract from subdomain URL
https?://([\w-]+)\.tumblr\.com/post/(\d+)
# Extract post ID only
/post/(\d+)|tumblr\.com/[\w-]+/(\d+)
# Extract direct video file
https?://va\.media\.tumblr\.com/tumblr_([a-zA-Z0-9]+)(?:_\d+)?\.mp4
Using grep for pattern extraction:
# Extract Tumblr post URLs from HTML files
# (grep -E has no \d or (?:...), so bracket expressions are used instead)
grep -oE "https?://(www\.)?tumblr\.com/[A-Za-z0-9_-]+/[0-9]+" input.html
# Extract video URLs from page source
grep -oE "https?://va\.media\.tumblr\.com/tumblr_[a-zA-Z0-9_]+\.mp4" input.html
# Extract from subdomain format
grep -oE "https?://[A-Za-z0-9_-]+\.tumblr\.com/post/[0-9]+" input.html
# Extract post IDs only
grep -oE "tumblr\.com/[A-Za-z0-9_-]+/[0-9]+" input.html | grep -oE "[0-9]+$"
Using yt-dlp for detection and metadata extraction:
# Test if URL contains downloadable video
yt-dlp --dump-json "https://www.tumblr.com/blogname/123456789" | jq '.id'
# List available formats
yt-dlp --list-formats "https://www.tumblr.com/blogname/123456789"
# Extract all video information
yt-dlp --dump-json "https://www.tumblr.com/blogname/123456789" > video_info.json
Using curl to inspect pages:
# Get page source and look for video URLs
curl -s "https://www.tumblr.com/blogname/123456789" | grep -oE "va\.media\.tumblr\.com[^\"']*"
# Check response headers
curl -I "https://va.media.tumblr.com/tumblr_example.mp4"
- Container: MP4
- Video Codec: H.264 (AVC)
- Audio Codec: AAC
- Profile: Main/High
- Quality Levels: 480p, 720p (varies by upload)
- Bitrates: Typically 1,000-4,000 kbps
| Quality | Resolution | Video Bitrate | Audio Bitrate | Typical Size |
|---|---|---|---|---|
| SD | 480p (848×480) | 1,000-2,000 kbps | 128 kbps | ~15 MB/min |
| HD | 720p (1280×720) | 2,500-4,000 kbps | 128-192 kbps | ~30 MB/min |
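The "Typical Size" column follows from the bitrates by simple arithmetic. The sketch below assumes decimal units (1 MB = 1,000 kB = 8,000 kilobits) and ignores container overhead; the function name `megabytes_per_minute` is hypothetical.

```python
def megabytes_per_minute(video_kbps, audio_kbps):
    """Approximate stream size in MB per minute from combined bitrates in kbps."""
    kilobits_per_minute = (video_kbps + audio_kbps) * 60
    return kilobits_per_minute / 8 / 1000  # kilobits -> kilobytes -> megabytes


if __name__ == "__main__":
    # Upper ends of the bitrate ranges in the table above
    print(round(megabytes_per_minute(2000, 128), 2))  # SD: 15.96
    print(round(megabytes_per_minute(4000, 192), 2))  # HD: 31.44
    # Lower ends of the same ranges
    print(round(megabytes_per_minute(1000, 128), 2))  # SD: 8.46
    print(round(megabytes_per_minute(2500, 128), 2))  # HD: 19.71
```

Under these assumptions the table's ~15 MB/min and ~30 MB/min figures correspond to the upper end of each bitrate range, so lower-bitrate uploads will be noticeably smaller.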
# Standard format
https://va.media.tumblr.com/tumblr_{video_id}.mp4
# With quality indicator
https://va.media.tumblr.com/tumblr_{video_id}_480.mp4
https://va.media.tumblr.com/tumblr_{video_id}_720.mp4
# With post reference
https://va.media.tumblr.com/{post_id}/{filename}.mp4
# Test primary video CDN
curl -I "https://va.media.tumblr.com/tumblr_{video_id}.mp4"
# Check content type and size
curl -sI "https://va.media.tumblr.com/tumblr_{video_id}.mp4" | grep -E "Content-Type|Content-Length"
# Test with different user agents
curl -I -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)" \
"https://va.media.tumblr.com/tumblr_{video_id}.mp4"
Tumblr supports external video embeds from major platforms:
| Platform | Detection Pattern | Extractor |
|---|---|---|
| YouTube | youtube.com/embed/ or youtu.be/ | yt-dlp native |
| Vimeo | player.vimeo.com/video/ | yt-dlp native |
| Dailymotion | dailymotion.com/embed/ | yt-dlp native |
| Native Tumblr | va.media.tumblr.com/ | Direct download |
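The detection patterns in the table above can be applied with a small classifier. This is a sketch only: the function name `classify_video_url` and the returned labels are hypothetical, and only the patterns listed in the table are recognised (for example, a YouTube `watch?v=` URL is deliberately not matched).

```python
from urllib.parse import urlparse


def _host_matches(host, domain):
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_video_url(url):
    """Map a video URL to a provider label from the table above, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    if host == "va.media.tumblr.com":
        return "tumblr-native"
    if _host_matches(host, "youtu.be"):
        return "youtube"
    if _host_matches(host, "youtube.com") and path.startswith("/embed/"):
        return "youtube"
    if host == "player.vimeo.com" and path.startswith("/video/"):
        return "vimeo"
    if _host_matches(host, "dailymotion.com") and path.startswith("/embed/"):
        return "dailymotion"
    return None


if __name__ == "__main__":
    for candidate in (
        "https://www.youtube.com/embed/dQw4w9WgXcQ",
        "https://player.vimeo.com/video/12345",
        "https://va.media.tumblr.com/tumblr_xyz.mp4",
    ):
        print(candidate, "->", classify_video_url(candidate))
```

Matching on the parsed hostname rather than a substring avoids false positives from look-alike domains.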
# Download from Tumblr post URL
yt-dlp "https://www.tumblr.com/blogname/123456789"
# Download from subdomain URL
yt-dlp "https://blogname.tumblr.com/post/123456789"
# Download with custom filename
yt-dlp -o "%(uploader)s - %(title)s.%(ext)s" "https://www.tumblr.com/blogname/123456789"
# List available formats
yt-dlp -F "https://www.tumblr.com/blogname/123456789"
# Download best quality MP4
yt-dlp -f "best[ext=mp4]" "https://www.tumblr.com/blogname/123456789"
# Download specific quality
yt-dlp -f "best[height<=720]" "https://www.tumblr.com/blogname/123456789"
# Best video + best audio merged
yt-dlp -f "bv+ba/best" "https://www.tumblr.com/blogname/123456789"
# Download with metadata
yt-dlp --write-info-json --write-thumbnail "https://www.tumblr.com/blogname/123456789"
# Download with subtitles if available
yt-dlp --write-subs --sub-langs all "https://www.tumblr.com/blogname/123456789"
# Rate limiting to avoid blocks
yt-dlp --limit-rate 1M "https://www.tumblr.com/blogname/123456789"
# With retry logic
yt-dlp --retries 5 --fragment-retries 5 "https://www.tumblr.com/blogname/123456789"
# Custom user agent
yt-dlp --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
"https://www.tumblr.com/blogname/123456789"
# Download from URL list file
yt-dlp -a tumblr_urls.txt
# With archive tracking (skip already downloaded)
yt-dlp --download-archive downloaded.txt -a tumblr_urls.txt
# With output directory
yt-dlp -o "./downloads/%(uploader)s/%(title)s.%(ext)s" -a tumblr_urls.txt
#!/bin/bash
# Batch download Tumblr videos with error handling
download_tumblr_batch() {
local url_file="$1"
local output_dir="${2:-./tumblr_downloads}"
mkdir -p "$output_dir"
while IFS= read -r url; do
[[ "$url" =~ ^#.*$ ]] && continue # Skip comments
[[ -z "$url" ]] && continue # Skip empty lines
echo "Downloading: $url"
yt-dlp \
--output "$output_dir/%(uploader)s - %(title)s.%(ext)s" \
--format "best[ext=mp4]/best" \
--write-info-json \
--retries 5 \
--rate-limit 1M \
--sleep-interval 2 \
"$url"
# Brief pause between downloads
sleep 3
done < "$url_file"
}
# Usage: download_tumblr_batch urls.txt ./my_downloads
# Download with comprehensive error handling
yt-dlp \
--retries 5 \
--fragment-retries 5 \
--retry-sleep linear=2:10:2 \
--socket-timeout 30 \
--ignore-errors \
"https://www.tumblr.com/blogname/123456789"
# Verbose output for debugging
yt-dlp -v "https://www.tumblr.com/blogname/123456789" 2>&1 | tee download.log
# Continue partial downloads
yt-dlp --continue "https://www.tumblr.com/blogname/123456789"
Create a configuration file at ~/.config/yt-dlp/config:
# yt-dlp configuration for Tumblr downloads
-o "%(uploader)s/%(title)s.%(ext)s"
--format "best[ext=mp4]/best"
--write-info-json
--write-thumbnail
--embed-metadata
--retries 5
--fragment-retries 5
--rate-limit 2M
--user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
# Analyze video stream details
ffprobe -v quiet -print_format json -show_format -show_streams \
"https://va.media.tumblr.com/tumblr_{video_id}.mp4"
# Get duration only
ffprobe -v quiet -show_entries format=duration -of csv="p=0" "video.mp4"
# Get codec information
ffprobe -v quiet -select_streams v:0 \
-show_entries stream=codec_name,width,height,bit_rate \
-of csv="s=x:p=0" "video.mp4"
# Full stream report
ffprobe -v error -show_streams -show_format "video.mp4"
#!/bin/bash
analyze_tumblr_video() {
local url="$1"
echo "=== Tumblr Video Analysis ==="
echo "URL: $url"
echo
# Get format info
echo "Format Information:"
ffprobe -v quiet -print_format json -show_format "$url" | jq '{
format_name: .format.format_name,
duration: .format.duration,
size: .format.size,
bit_rate: .format.bit_rate
}'
echo
echo "Video Stream:"
ffprobe -v quiet -print_format json -show_streams -select_streams v:0 "$url" | jq '.streams[0] | {
codec_name: .codec_name,
width: .width,
height: .height,
bit_rate: .bit_rate,
frame_rate: .r_frame_rate
}'
echo
echo "Audio Stream:"
ffprobe -v quiet -print_format json -show_streams -select_streams a:0 "$url" | jq '.streams[0] | {
codec_name: .codec_name,
sample_rate: .sample_rate,
channels: .channels,
bit_rate: .bit_rate
}'
}
# Download MP4 directly
ffmpeg -i "https://va.media.tumblr.com/tumblr_{video_id}.mp4" -c copy output.mp4
# Download with headers
ffmpeg -headers "User-Agent: Mozilla/5.0\r\n" \
-i "https://va.media.tumblr.com/tumblr_{video_id}.mp4" \
-c copy output.mp4
# Download with timeout
ffmpeg -timeout 30000000 \
-i "https://va.media.tumblr.com/tumblr_{video_id}.mp4" \
-c copy output.mp4
# Download HLS stream
ffmpeg -i "https://example.tumblr.com/video.m3u8" -c copy output.mp4
# With custom headers for restricted streams
ffmpeg -headers "Referer: https://www.tumblr.com/\r\nUser-Agent: Mozilla/5.0\r\n" \
-i "https://example.tumblr.com/video.m3u8" \
-c copy output.mp4
# Convert to different format
ffmpeg -i input.mp4 -c:v libx264 -crf 23 -c:a aac -b:a 128k output.mp4
# Convert WebM to MP4 (if needed)
ffmpeg -i input.webm -c:v libx264 -c:a aac output.mp4
# Create web-optimized MP4
ffmpeg -i input.mp4 -c:v libx264 -crf 23 -c:a aac -movflags +faststart web_optimized.mp4
# Compress for smaller file size
ffmpeg -i input.mp4 -c:v libx264 -crf 28 -c:a aac -b:a 96k compressed.mp4
# High quality re-encode
ffmpeg -i input.mp4 -c:v libx264 -preset slow -crf 18 -c:a aac -b:a 192k hq_output.mp4
# Scale to 720p height (-2 preserves aspect ratio and keeps the width even, as libx264 requires)
ffmpeg -i input.mp4 -vf "scale=-2:720" -c:v libx264 -c:a copy scaled_720p.mp4
# Extract audio as MP3
ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 2 audio.mp3
# Extract audio as AAC
ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k audio.m4a
# Extract audio as WAV
ffmpeg -i input.mp4 -vn -c:a pcm_s16le audio.wav
#!/bin/bash
# Batch convert Tumblr videos
batch_convert() {
local input_dir="$1"
local output_dir="$2"
mkdir -p "$output_dir"
for file in "$input_dir"/*.mp4; do
if [[ -f "$file" ]]; then
filename=$(basename "$file" .mp4)
echo "Processing: $filename"
ffmpeg -i "$file" \
-c:v libx264 -crf 23 \
-c:a aac -b:a 128k \
-movflags +faststart \
"$output_dir/${filename}_processed.mp4"
fi
done
}
# Usage: batch_convert ./raw_videos ./processed_videos
gallery-dl is a solid alternative for downloading media from Tumblr, particularly for batch operations.
# Install via pip
pip install -U gallery-dl
# Or use pipx for isolated environment
pipx install gallery-dl
# Download from Tumblr blog
gallery-dl "https://blogname.tumblr.com"
# Download specific post
gallery-dl "https://blogname.tumblr.com/post/123456789"
# Download with custom output
gallery-dl -D ./tumblr_downloads "https://blogname.tumblr.com"
Create configuration at ~/.config/gallery-dl/config.json:
{
"extractor": {
"tumblr": {
"api-key": "<your-oauth-consumer-key>",
"api-secret": "<your-secret-key>",
"access-token": "<your-access-token>",
"access-token-secret": "<your-access-token-secret>",
"filename": "{blog[name]}_{id}_{filename}.{extension}",
"directory": ["tumblr", "{blog[name]}"],
"posts": "all",
"reblogs": true,
"external": false
}
},
"downloader": {
"rate": "2M",
"retries": 5
}
}
# Get OAuth tokens interactively
gallery-dl oauth:tumblr
# This will open browser for authorization
# Copy the provided tokens to your config file
TumblThree is a dedicated Windows application for Tumblr blog backup.
- Bulk blog backup
- Video, image, and text support
- Queue management
- Resume interrupted downloads
- NSFW content support
- Metadata preservation
- Download from GitHub Releases
- Extract and run (portable application)
# TumblThree is primarily GUI-based
# For automation, use gallery-dl or yt-dlp instead
# Clone repository
git clone https://github.com/Thesharing/tumblr-downloader.git
cd tumblr-downloader
# Install dependencies
pip install -r requirements.txt
# Configure API credentials in config file
# Run downloader
python tumblr-downloader.py --blog blogname
# Using wget
wget -O "tumblr_video.mp4" "https://va.media.tumblr.com/tumblr_{video_id}.mp4"
# Using curl
curl -L -o "tumblr_video.mp4" "https://va.media.tumblr.com/tumblr_{video_id}.mp4"
# With custom headers
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
-H "Referer: https://www.tumblr.com/" \
-o "tumblr_video.mp4" \
"https://va.media.tumblr.com/tumblr_{video_id}.mp4"
#!/bin/bash
# Download Tumblr videos with fallback
download_tumblr_video() {
local video_url="$1"
local output_file="${2:-tumblr_video.mp4}"
# Try wget first
if wget -q --spider "$video_url"; then
echo "Downloading with wget..."
wget -O "$output_file" "$video_url"
return $?
fi
# Fallback to curl
echo "Trying curl..."
curl -L -o "$output_file" "$video_url"
return $?
}
# Usage: download_tumblr_video "https://va.media.tumblr.com/..." "output.mp4"
For manual or occasional downloads:
| Extension | Browser | Features |
|---|---|---|
| Video DownloadHelper | Firefox/Chrome | General video detection |
| SaveFrom.net Helper | Firefox/Chrome | Multiple site support |
| Video Downloader Plus | Chrome | Tumblr video support |
Tumblr provides official API v2 for programmatic access to posts and media.
# Base URL
https://api.tumblr.com/v2/
# Blog posts endpoint
https://api.tumblr.com/v2/blog/{blog-identifier}/posts
# Specific post endpoint
https://api.tumblr.com/v2/blog/{blog-identifier}/posts/{post-id}
- OAuth 1.0a: Required for most endpoints
- API Key: Available for public content access
- Rate Limits: 1,000 requests/hour (new apps), 5,000 requests/day
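To make the two rate limits above concrete, the following sketch converts each quota into a minimum average spacing between requests. It assumes the quotas are evenly spread over their windows, which is a simplification; the function name `min_interval_seconds` is hypothetical.

```python
def min_interval_seconds(requests_allowed, window_seconds):
    """Average spacing between requests that exactly uses up the quota."""
    return window_seconds / requests_allowed


# Quotas taken from the list above.
HOURLY_INTERVAL = min_interval_seconds(1000, 60 * 60)      # 3.6 seconds
DAILY_INTERVAL = min_interval_seconds(5000, 24 * 60 * 60)  # 17.28 seconds

# A job that runs all day must respect the stricter of the two limits.
SUSTAINED_INTERVAL = max(HOURLY_INTERVAL, DAILY_INTERVAL)

if __name__ == "__main__":
    print(f"hourly quota allows one request every {HOURLY_INTERVAL:.2f}s")
    print(f"daily quota allows one request every {DAILY_INTERVAL:.2f}s")
    print(f"sustained spacing for a day-long job: {SUSTAINED_INTERVAL:.2f}s")
```

In other words, short bursts can run at roughly one request every 3.6 seconds, but a crawl that runs continuously for a day would need to average about one request every 17 seconds to stay within the daily quota.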
{
"type": "video",
"media": {
"type": "video/mp4",
"url": "https://va.media.tumblr.com/tumblr_xyz.mp4",
"width": 640,
"height": 480,
"hd": true
},
"poster": [
{
"type": "image/jpeg",
"url": "https://64.media.tumblr.com/poster.jpg",
"width": 640,
"height": 480
}
],
"filmstrip": {...},
"attribution": {...}
}
{
"type": "video",
"provider": "youtube",
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"embed_html": "<iframe...>",
"embed_url": "https://www.youtube.com/embed/dQw4w9WgXcQ"
}
import requests


def get_tumblr_post(blog_name, post_id, api_key):
    """
    Fetch a Tumblr post and return the raw API JSON
    """
    url = f"https://api.tumblr.com/v2/blog/{blog_name}/posts/{post_id}"
    params = {
        "api_key": api_key,
        "npf": "true"  # Request NPF format (sent as lowercase "true")
    }
    response = requests.get(url, params=params, timeout=30)
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"API request failed: {response.status_code}")


def extract_video_urls(post_data):
    """
    Extract video URLs from NPF content blocks
    """
    video_urls = []
    for block in post_data.get("content", []):
        if block.get("type") == "video":
            # Native Tumblr video; accept "media" as one object or a list
            media_items = block.get("media") or []
            if isinstance(media_items, dict):
                media_items = [media_items]
            for media in media_items:
                if "url" in media:
                    video_urls.append({
                        "url": media["url"],
                        "type": media.get("type", "video/mp4"),
                        "width": media.get("width"),
                        "height": media.get("height"),
                        "native": True
                    })
            # External embed
            if "embed_url" in block:
                video_urls.append({
                    "url": block["embed_url"],
                    "provider": block.get("provider"),
                    "native": False
                })
    return video_urls
import requests
import os


def download_tumblr_video(video_url, output_path, headers=None):
    """
    Download video from direct URL
    """
    if headers is None:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
    response = requests.get(video_url, headers=headers, stream=True, timeout=30)
    response.raise_for_status()
    # Stream to disk in chunks so large files are not held in memory
    with open(output_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
    return output_path


def download_blog_videos(blog_name, api_key, output_dir="./downloads"):
    """
    Download all videos from a Tumblr blog
    """
    os.makedirs(output_dir, exist_ok=True)
    # Get posts (paginated)
    offset = 0
    limit = 20
    while True:
        url = f"https://api.tumblr.com/v2/blog/{blog_name}/posts"
        params = {
            "api_key": api_key,
            "npf": "true",
            "type": "video",
            "limit": limit,
            "offset": offset
        }
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        posts = data.get("response", {}).get("posts", [])
        if not posts:
            break
        for post in posts:
            # extract_video_urls is defined in the previous example
            video_urls = extract_video_urls(post)
            for i, video in enumerate(video_urls):
                if video["native"]:
                    filename = f"{post['id']}_{i}.mp4"
                    output_path = os.path.join(output_dir, filename)
                    try:
                        download_tumblr_video(video["url"], output_path)
                        print(f"Downloaded: {filename}")
                    except Exception as e:
                        print(f"Failed to download {filename}: {e}")
        offset += limit
import time
from functools import wraps

import requests


def rate_limit(calls_per_minute=30):
    """
    Rate limiting decorator for API calls
    """
    min_interval = 60.0 / calls_per_minute
    last_call = [0.0]  # mutable cell shared by all calls to the wrapped function

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_call[0]
            wait_time = min_interval - elapsed
            if wait_time > 0:
                time.sleep(wait_time)
            result = func(*args, **kwargs)
            last_call[0] = time.time()
            return result
        return wrapper
    return decorator


@rate_limit(calls_per_minute=30)
def api_request(url, params):
    """Rate-limited API request"""
    return requests.get(url, params=params, timeout=30)
Use a sequential approach with fallbacks:
#!/bin/bash
download_tumblr() {
local url="$1"
local output_dir="${2:-./downloads}"
echo "Attempting download of: $url"
# Method 1: yt-dlp (primary - handles most cases)
if yt-dlp --ignore-errors -o "$output_dir/%(title)s.%(ext)s" "$url"; then
echo "✓ Success with yt-dlp"
return 0
fi
# Method 2: Extract direct video URL and use ffmpeg
echo "Trying direct video extraction..."
video_url=$(curl -s "$url" | grep -oE "https://va\.media\.tumblr\.com/[^\"']+\.mp4" | head -1)
if [ -n "$video_url" ]; then
output_file="$output_dir/tumblr_video_$(date +%s).mp4"
if ffmpeg -i "$video_url" -c copy "$output_file"; then
echo "✓ Success with ffmpeg"
return 0
fi
fi
# Method 3: gallery-dl as fallback
if gallery-dl -D "$output_dir" "$url"; then
echo "✓ Success with gallery-dl"
return 0
fi
# Method 4: Direct wget/curl
if [ -n "$video_url" ]; then
if wget -O "$output_dir/tumblr_video.mp4" "$video_url"; then
echo "✓ Success with wget"
return 0
fi
fi
echo "✗ All methods failed"
return 1
}
# Check available formats first
check_formats() {
local url="$1"
echo "Checking available formats..."
yt-dlp -F "$url"
}
# Download with quality preference
download_with_quality() {
local url="$1"
local max_quality="${2:-720}"
local output_dir="${3:-./downloads}"
echo "Downloading with max quality: ${max_quality}p"
yt-dlp \
-f "best[height<=$max_quality][ext=mp4]/best[height<=$max_quality]/best" \
-o "$output_dir/%(uploader)s - %(title)s.%(ext)s" \
"$url"
}
#!/bin/bash
download_with_retry() {
local url="$1"
local max_retries=3
local delay=5
for attempt in $(seq 1 $max_retries); do
echo "Attempt $attempt of $max_retries..."
if yt-dlp --retries 2 --socket-timeout 30 "$url"; then
echo "✓ Download successful"
return 0
fi
if [ $attempt -lt $max_retries ]; then
echo "Waiting ${delay}s before retry..."
sleep $delay
delay=$((delay * 2)) # Exponential backoff
fi
done
echo "✗ All retries failed"
return 1
}
# Check URL accessibility
check_url_access() {
local url="$1"
# Test with curl
status_code=$(curl -o /dev/null -s -w "%{http_code}" "$url")
if [ "$status_code" = "200" ]; then
echo "✓ URL accessible (HTTP $status_code)"
return 0
else
echo "✗ URL not accessible (HTTP $status_code)"
return 1
fi
}
#!/bin/bash
# Setup logging
setup_logging() {
local log_dir="${1:-./logs}"
mkdir -p "$log_dir"
export DOWNLOAD_LOG="$log_dir/downloads_$(date +%Y%m%d).log"
export ERROR_LOG="$log_dir/errors_$(date +%Y%m%d).log"
}
# Log download activity
log_download() {
local status="$1"
local url="$2"
local message="$3"
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$timestamp] [$status] $url - $message" >> "$DOWNLOAD_LOG"
if [ "$status" = "ERROR" ]; then
echo "[$timestamp] $url - $message" >> "$ERROR_LOG"
fi
}
# Generate download report
generate_report() {
echo "=== Tumblr Download Report ==="
echo "Date: $(date)"
echo ""
echo "Downloads:"
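# grep -c already prints 0 when nothing matches, so only default when the log is missing
success_count=$(grep -c "\[SUCCESS\]" "$DOWNLOAD_LOG" 2>/dev/null)
echo "${success_count:-0}"
echo ""
echo "Errors:"
error_count=$(grep -c "\[ERROR\]" "$DOWNLOAD_LOG" 2>/dev/null)
echo "${error_count:-0}"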
echo ""
echo "Recent errors:"
tail -10 "$ERROR_LOG" 2>/dev/null
}
# Parallel downloads using GNU parallel
batch_download_parallel() {
local url_file="$1"
local max_jobs="${2:-4}"
local output_dir="${3:-./downloads}"
mkdir -p "$output_dir"
# Export function for parallel
export -f download_single_video
export output_dir
parallel -j $max_jobs download_single_video {} "$output_dir" :::: "$url_file"
}
download_single_video() {
local url="$1"
local output_dir="$2"
yt-dlp \
--output "$output_dir/%(title)s.%(ext)s" \
--format "best[ext=mp4]/best" \
--rate-limit 1M \
"$url"
}
# Private blog access - use cookies
yt-dlp --cookies-from-browser firefox "https://www.tumblr.com/privateblog/123456"
# Or export cookies to file
yt-dlp --cookies cookies.txt "https://www.tumblr.com/privateblog/123456"
# Test with different user agents
yt-dlp --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)" \
"https://www.tumblr.com/blogname/123456"
# Implement rate limiting in downloads
yt-dlp \
--rate-limit 500K \
--sleep-interval 3 \
--sleep-requests 1 \
"https://www.tumblr.com/blogname/123456"
# Batch with delays (read line by line so URLs are never word-split)
while IFS= read -r url; do
yt-dlp "$url"
sleep 5 # 5 second delay between downloads
done < urls.txt
# For YouTube/Vimeo embeds in Tumblr posts
# yt-dlp typically handles these automatically
# If embed detection fails, extract embed URL manually
embed_url=$(curl -s "https://www.tumblr.com/blogname/123456" | \
grep -oE "youtube\.com/embed/[a-zA-Z0-9_-]+" | head -1)
if [ -n "$embed_url" ]; then
yt-dlp "https://$embed_url"
fi
# Verify downloaded video integrity
verify_video() {
local file="$1"
# Check with ffprobe
if ffprobe -v error -select_streams v:0 \
-show_entries stream=codec_name \
-of csv="p=0" "$file" > /dev/null 2>&1; then
echo "✓ Video file is valid: $file"
return 0
else
echo "✗ Video file may be corrupted: $file"
return 1
fi
}
# Repair corrupted video
repair_video() {
local input="$1"
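# Default output sits next to the input, so paths with directories still work
local output="${2:-$(dirname "$input")/repaired_$(basename "$input")}"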
ffmpeg -err_detect ignore_err -i "$input" -c copy "$output"
}
# Check for audio stream
check_audio() {
local file="$1"
audio_streams=$(ffprobe -v error -select_streams a \
-show_entries stream=codec_type \
-of csv="p=0" "$file" | wc -l)
if [ "$audio_streams" -gt 0 ]; then
echo "Audio streams found: $audio_streams"
else
echo "No audio stream in file"
fi
}
# Reblogged videos may have different source URLs
# Use yt-dlp's --dump-json to inspect
yt-dlp --dump-json "https://www.tumblr.com/blogname/123456" | \
jq '.formats[] | {format_id, url, ext}'
# Check if content is still available
check_availability() {
local url="$1"
response=$(yt-dlp --dump-json "$url" 2>&1)
if echo "$response" | grep -q "ERROR"; then
echo "Content not available: $url"
echo "Error: $response"
return 1
else
echo "Content available: $url"
return 0
fi
}
# Some NSFW content may require authentication
# Use browser cookies for access
# Extract cookies from browser
yt-dlp --cookies-from-browser chrome "https://www.tumblr.com/nsfwblog/123456"
# Or use logged-in session cookies
yt-dlp --cookies tumblr_cookies.txt "https://www.tumblr.com/nsfwblog/123456"
#!/bin/bash
# Comprehensive diagnostic for Tumblr URL
diagnose_tumblr_url() {
local url="$1"
echo "=== Tumblr URL Diagnostics ==="
echo "URL: $url"
echo
# Test URL accessibility
echo "1. URL Accessibility:"
curl -sI "$url" | head -5
echo
# Check for video content
echo "2. Video Detection:"
yt-dlp -F "$url" 2>&1 | head -20
echo
# Extract video info
echo "3. Video Information:"
yt-dlp --dump-json "$url" 2>&1 | jq '{
title: .title,
uploader: .uploader,
duration: .duration,
format: .format,
url: .url
}' 2>/dev/null || echo "Could not extract video info"
echo
# Check for direct video URLs
echo "4. Direct Video URLs:"
curl -s "$url" | grep -oE "https://va\.media\.tumblr\.com/[^\"']+\.mp4" | head -5
}
This research analyzed Tumblr's video delivery infrastructure and found a straightforward MP4-based hosting system served from a dedicated CDN (va.media.tumblr.com). Unlike more complex streaming platforms, Tumblr primarily serves direct MP4 files, so downloads are relatively simple with the right tools.
Key Technical Findings:
- Tumblr hosts native videos on va.media.tumblr.com as direct MP4 files
- Videos are encoded in H.264/AAC format with typical quality up to 720p
- External embeds from YouTube, Vimeo, etc. are handled separately via iframes
- The Tumblr API v2 with NPF format provides programmatic access to video URLs
- Subtitle/caption files are available via vtt.tumblr.com in WebVTT format
Based on our research, we recommend a hierarchical download strategy:
- Primary Method: yt-dlp for standard Tumblr post URLs (90%+ success rate)
- Secondary Method: Direct video URL extraction + ffmpeg/wget
- Tertiary Method: gallery-dl with API authentication
- API Method: Tumblr API v2 for programmatic/bulk downloads
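The hierarchy above amounts to trying each method in order and stopping at the first success. A minimal, tool-agnostic sketch of that control flow follows; the function name `first_successful` and the stub strategies are hypothetical stand-ins for real wrappers around yt-dlp, ffmpeg/wget, gallery-dl, or the API.

```python
def first_successful(strategies, url):
    """Try (name, callable) pairs in order; return (name, result) of the first success.

    Each callable takes the URL and either returns a result or raises an exception.
    """
    errors = {}
    for name, strategy in strategies:
        try:
            return name, strategy(url)
        except Exception as exc:  # record the failure and fall through to the next method
            errors[name] = str(exc)
    raise RuntimeError(f"all strategies failed: {errors}")


if __name__ == "__main__":
    # Stub strategies standing in for real downloaders (assumptions for illustration).
    def primary(url):
        raise RuntimeError("extractor error")

    def secondary(url):
        return f"saved {url}"

    print(first_successful([("primary", primary), ("secondary", secondary)], "post-url"))
```

Collecting the per-method error messages makes it easier to see why every fallback failed when none succeeds.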
Essential Tools:
| Tool | Purpose | Priority |
|---|---|---|
| yt-dlp | Primary download tool | Required |
| ffmpeg | Stream processing, conversion | Required |
| curl/wget | Direct downloads, testing | Required |
Recommended Backup Tools:
| Tool | Purpose | Use Case |
|---|---|---|
| gallery-dl | Bulk blog downloads | Large archives |
| TumblThree | Windows GUI backup | Non-technical users |
| Python scripts | Custom automation | API integration |
- Concurrent Downloads: 3-4 simultaneous downloads recommended
- Rate Limiting: 30 requests/minute to avoid blocks
- Retry Logic: Exponential backoff with 3-5 retries
- Quality Selection: 720p provides best balance for most content
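The retry and rate-limit figures above can be expressed numerically. This sketch assumes a 5-second base delay doubled on each retry and capped at 60 seconds; those parameter values and the function name `backoff_delays` are illustrative assumptions rather than measured optima.

```python
def backoff_delays(retries, base_delay=5.0, factor=2.0, max_delay=60.0):
    """Delay in seconds before each retry, growing exponentially up to a cap."""
    return [min(base_delay * factor ** attempt, max_delay) for attempt in range(retries)]


# Spacing implied by the recommended 30 requests per minute.
MIN_REQUEST_INTERVAL = 60.0 / 30  # 2.0 seconds between requests

if __name__ == "__main__":
    print(backoff_delays(3))  # [5.0, 10.0, 20.0]
    print(backoff_delays(5))  # [5.0, 10.0, 20.0, 40.0, 60.0]
    print(MIN_REQUEST_INTERVAL)
```

With five retries the last uncapped delay would be 80 seconds, so the cap keeps the worst-case wait per retry bounded.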
Important Considerations:
- Respect Tumblr's Terms of Service and usage policies
- Implement appropriate rate limiting
- Only download content you have rights to access
- Consider privacy implications when archiving content
- Ensure compliance with applicable copyright laws
Areas for Continued Development:
- API Changes: Monitor Tumblr API v2 for updates or deprecations
- CDN Updates: Watch for new video hosting patterns
- Tool Updates: Keep yt-dlp and gallery-dl updated regularly
- Authentication: Prepare for potential OAuth requirement changes
| Frequency | Task |
|---|---|
| Weekly | Update yt-dlp (yt-dlp -U) |
| Monthly | Test URL patterns and CDN endpoints |
| Quarterly | Review API authentication and limits |
| Annually | Comprehensive strategy review |
Disclaimer: This research is provided for educational and legitimate archival purposes. Users must comply with applicable terms of service, copyright laws, and data protection regulations when implementing these techniques.
Last Updated: December 2024
Research Version: 1.0
Next Review: March 2025