Automated Postgres backups

🛡️ PostgreSQL Backup System v2.0

Enterprise-grade, configuration-driven PostgreSQL backup and disaster recovery system

✨ Features

  • 🔐 Zero Hardcoded Credentials - All configuration via files
  • 📁 Flexible Configuration - Config file OR environment variables
  • 🔒 AES-256 Encryption - GPG encrypted backups
  • ☁️ Multi-Provider S3 - Backblaze, Tebi, Hetzner, AWS, Custom
  • 🔔 Smart Notifications - Telegram, Email, or Both
  • ✅ Automated Verification - Weekly restore testing
  • 📊 Comprehensive Monitoring - Real-time health checks
  • 📖 Complete DR Runbook - Step-by-step recovery procedures
  • 🎯 Point-in-Time Recovery - Restore to any moment
  • 🚀 One-Command Setup - Interactive configuration wizard

🚀 Quick Start (3 Steps)

# 1. Configure (interactive wizard)
sudo pg-backup-configure

# 2. Install system
sudo pg-backup-install

# 3. Verify it works
sudo -u postgres pg-backup-test

That's it! Your database is now protected.


📂 File Structure

/etc/pg_backup/
├── backup.conf          # Main configuration (safe to commit)
├── .secrets             # Sensitive credentials (NEVER commit)
└── .env -> backup.conf  # Symlink for environment loading

/usr/local/bin/pg_backup/
├── lib/
│   └── common.sh       # Shared functions library
├── archive_wal.sh      # WAL archiving script
├── restore_wal.sh      # WAL restore script
├── base_backup.sh      # Base backup script
├── backup_monitor.sh   # Health monitoring script
└── backup_verify.sh    # Automated verification script

/var/log/postgresql/
├── wal_archive.log     # WAL archiving logs
├── wal_restore.log     # WAL restore logs
├── base_backup.log     # Base backup logs
├── backup_monitor.log  # Monitoring logs
└── backup_verify.log   # Verification logs
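The shared library `lib/common.sh` is not reproduced here; as a rough sketch (paths follow the layout above, everything else is an assumption about the real library), its `load_config` helper could merge `backup.conf` and `.secrets` like this:

```bash
#!/bin/bash
# Hypothetical excerpt of lib/common.sh -- illustrates one way load_config
# might work; the real library's behaviour may differ.
CONFIG_FILE="${PG_BACKUP_CONFIG:-/etc/pg_backup/backup.conf}"
SECRETS_FILE="$(dirname "${CONFIG_FILE}")/.secrets"

load_config() {
    # Load the main config first, then layer credentials from .secrets on top.
    if [ -f "${CONFIG_FILE}" ]; then source "${CONFIG_FILE}"; fi
    if [ -f "${SECRETS_FILE}" ]; then source "${SECRETS_FILE}"; fi

    # Fail fast if the essentials are missing.
    : "${S3_BUCKET:?S3_BUCKET must be set in backup.conf or the environment}"
    : "${S3_PROVIDER:?S3_PROVIDER must be set in backup.conf or the environment}"

    # Export direct credentials only when .secrets provided them.
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ]; then
        export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
    fi
}
```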

🔧 Configuration Options

Method 1: Interactive Wizard (Recommended)

sudo pg-backup-configure

Asks you questions and creates configuration automatically.

Method 2: Manual Configuration

Copy and edit the template:

sudo cp /usr/share/doc/pg_backup/backup.conf.template /etc/pg_backup/backup.conf
sudo nano /etc/pg_backup/backup.conf

Method 3: Environment Variables

export S3_PROVIDER="backblaze"
export S3_BUCKET="my-backups"
export AWS_PROFILE="backups"
export GPG_ENCRYPT="yes"
export GPG_RECIPIENT="[email protected]"
# ... etc

pg-backup-install --from-env

🔐 Security Best Practices

1. Configuration File Permissions

# Main config (can contain non-sensitive data)
chmod 640 /etc/pg_backup/backup.conf
chown postgres:postgres /etc/pg_backup/backup.conf

# Secrets file (contains credentials)
chmod 600 /etc/pg_backup/.secrets
chown postgres:postgres /etc/pg_backup/.secrets

2. Git Repository Safety

Add to .gitignore:

# PostgreSQL Backup System
/etc/pg_backup/.secrets
/etc/pg_backup/*.key
/etc/pg_backup/*.pem
*.private.asc

3. GPG Key Management

# Export public key (safe to share)
sudo -u postgres gpg --export -a [email protected] > backup-public.asc

# Export private key (CRITICAL - keep offline)
sudo -u postgres gpg --export-secret-keys -a [email protected] > backup-private.asc
chmod 600 backup-private.asc

# Store private key in:
# - Hardware security module (HSM)
# - Encrypted USB drive
# - Password manager vault
# - Bank safety deposit box

# Then SECURELY DELETE from server
shred -u backup-private.asc
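During disaster recovery the private key has to be re-imported on the restore host before any backup can be decrypted (the path below is a placeholder for wherever the offline copy lives):

```bash
# On the recovery host: import the offline private key, then confirm it is usable.
sudo -u postgres gpg --import /path/to/backup-private.asc
sudo -u postgres gpg --list-secret-keys
```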

4. AWS Credentials

Option A: Use AWS CLI Profile (Recommended)

sudo -u postgres aws configure --profile backups
# Credentials stored in /var/lib/postgresql/.aws/credentials
# File is readable only by postgres user

Option B: Use IAM Role (Best for EC2/ECS)

# No credentials needed in config!
# Set S3_PROVIDER and S3_BUCKET only
AWS_PROFILE=""  # Leave empty

Option C: Store in .secrets file

# /etc/pg_backup/.secrets
AWS_ACCESS_KEY_ID="your-key-id"
AWS_SECRET_ACCESS_KEY="your-secret-key"

📋 Configuration Reference

S3 Configuration

# Provider (auto-configures endpoint)
S3_PROVIDER="backblaze"  # backblaze, tebi, hetzner, aws, custom

# Region (provider-specific)
S3_REGION="us-west-000"

# Bucket and paths
S3_BUCKET="my-postgres-backups"
S3_BASE_PREFIX="base-backups"      # Base backup folder
S3_WAL_PREFIX="wal-archives"       # WAL archive folder

# Custom endpoint (for S3_PROVIDER="custom")
S3_ENDPOINT="https://s3.example.com"

Encryption Configuration

# Enable/disable encryption
GPG_ENCRYPT="yes"  # yes or no

# GPG recipient (email or key ID)
GPG_RECIPIENT="[email protected]"

PostgreSQL Configuration

PGHOST="localhost"
PGPORT="5432"
PGUSER="postgres"
PGDATABASE="postgres"
PG_DATA_DIR="/var/lib/postgresql/14/main"
PG_VERSION="14"

Backup Configuration

# Local WAL caching
KEEP_LOCAL_WAL_COPY="no"              # yes or no
LOCAL_WAL_DIR="/var/lib/postgresql/wal_archive"
LOCAL_WAL_RETENTION_DAYS="7"

# Base backup retention
BASE_BACKUP_RETENTION_DAYS="30"
BASE_BACKUP_COMPRESS="yes"
BASE_BACKUP_PARALLEL_JOBS="4"
BASE_BACKUP_MAX_RATE_MB="0"  # 0 = unlimited

Monitoring Configuration

# Thresholds
MAX_WAL_LAG_SECONDS="300"
MAX_ARCHIVE_FAILURES="3"
MIN_DISK_SPACE_GB="50"
MAX_BASE_BACKUP_AGE_DAYS="7"
MAX_ARCHIVE_GAP_MINUTES="60"

# Alert settings
ALERT_COOLDOWN_SECONDS="3600"  # Don't spam alerts

Notification Configuration

# Enable notifications
ENABLE_NOTIFICATIONS="yes"

# Method
NOTIFICATION_METHOD="telegram"  # telegram, email, both, none

# Telegram
TELEGRAM_BOT_TOKEN="123456:ABC..."
TELEGRAM_CHAT_ID="123456789"

# Email
EMAIL_TO="[email protected]"
EMAIL_FROM="[email protected]"
SMTP_SERVER="smtp.gmail.com:587"
SMTP_USER="[email protected]"
SMTP_PASSWORD="your-app-password"

Verification Configuration

# Automated restore testing
ENABLE_BACKUP_VERIFICATION="yes"
VERIFY_TEST_PORT="5433"
VERIFY_TEST_DATA_DIR="/var/lib/postgresql/backup_verify_test"
VERIFY_MAX_DURATION_SECONDS="7200"
VERIFY_CLEANUP_ON_SUCCESS="yes"
VERIFY_CLEANUP_ON_FAILURE="no"  # Keep for troubleshooting

# Tests to run
VERIFY_RUN_CHECKSUM_TESTS="yes"
VERIFY_RUN_QUERY_TESTS="yes"
VERIFY_RUN_CONSISTENCY_TESTS="yes"

# Tables to verify (space-separated)
VERIFY_SAMPLE_TABLES="users orders products"

🔄 Updating Configuration

Without Restart

Most settings can be changed without restarting PostgreSQL:

# Edit configuration
sudo nano /etc/pg_backup/backup.conf

# Configuration is automatically reloaded on next operation
# No restart needed!

Settings Requiring PostgreSQL Restart

Only these PostgreSQL settings require restart:

# In postgresql.conf
archive_mode = on
archive_command = '/usr/local/bin/pg_backup/archive_wal.sh %p %f'

After changing these:

sudo systemctl restart postgresql

🧪 Testing Your Configuration

Test Individual Components

# Test S3 connection
pg-backup-test --s3

# Test GPG encryption
pg-backup-test --gpg

# Test notifications
pg-backup-test --notify

# Test WAL archiving
pg-backup-test --archive

# Test restore
pg-backup-test --restore

# Run all tests
pg-backup-test --all

Manual Test

# 1. Force WAL switch
sudo -u postgres psql -c "SELECT pg_switch_wal();"

# 2. Check archiving status
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"

# 3. Verify file in S3
aws s3 ls s3://your-bucket/wal-archives/$(hostname)/ \
    --profile backups \
    --endpoint-url https://... \
    | tail -5

# 4. Test restore
RECENT_WAL=$(aws s3 ls ... | tail -1 | awk '{print $4}')
sudo -u postgres /usr/local/bin/pg_backup/restore_wal.sh \
    "${RECENT_WAL%.gpg}" \
    "/tmp/test_restore"

📊 Monitoring

Check System Status

# Quick status check
pg-backup-status

# Detailed health report
pg-backup-health

# View recent logs
pg-backup-logs

# Check last verification
pg-backup-verify-status

Cron Schedule

Default schedule (automatically configured):

# Every 15 minutes - Health monitoring
*/15 * * * * /usr/local/bin/pg_backup/backup_monitor.sh

# Sunday 2 AM - Full base backup
0 2 * * 0 /usr/local/bin/pg_backup/base_backup.sh

# Monday 3 AM - Verify last week's backup
0 3 * * 1 /usr/local/bin/pg_backup/backup_verify.sh

Metrics & Dashboards

Export metrics to Prometheus/Grafana:

# Enable metrics export
echo "ENABLE_METRICS_EXPORT=yes" >> /etc/pg_backup/backup.conf
echo "METRICS_PORT=9187" >> /etc/pg_backup/backup.conf

# Restart metrics exporter
sudo systemctl restart pg-backup-metrics

🚨 Disaster Recovery

Quick Recovery Commands

# 1. Stop PostgreSQL
sudo systemctl stop postgresql

# 2. Run recovery wizard (interactive)
sudo pg-backup-recover

# 3. Or use manual procedure
sudo pg-backup-recover --manual

Recovery Scenarios

Full Recovery (Complete Data Loss):

pg-backup-recover --type=full --backup=latest

Point-in-Time Recovery:

pg-backup-recover --type=pitr --time='2024-01-15 14:30:00'

Test Recovery (Non-Destructive):

pg-backup-recover --type=test --port=5433

See full DR runbook: /usr/share/doc/pg_backup/disaster-recovery.md
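If the `pg-backup-recover` wizard is unavailable, a condensed sketch of the manual steps behind point-in-time recovery on PostgreSQL 12+ looks like this (data directory, base backup path, and target time are placeholders to adapt):

```bash
# 1. Stop PostgreSQL and move the damaged data directory aside
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main.broken

# 2. Restore the decrypted base backup into a fresh data directory
#    (download and gpg --decrypt steps omitted; see the base backup section)
sudo -u postgres mkdir -m 700 /var/lib/postgresql/14/main
sudo -u postgres tar xzf /tmp/base.tar.gz -C /var/lib/postgresql/14/main

# 3. Point recovery at the WAL archive and the desired target time
cat <<'EOF' | sudo -u postgres tee -a /var/lib/postgresql/14/main/postgresql.conf
restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'
EOF
sudo -u postgres touch /var/lib/postgresql/14/main/recovery.signal

# 4. Start PostgreSQL; it replays WAL up to the target, then promotes
sudo systemctl start postgresql
```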


🔧 Troubleshooting

Archive Command Failing

# Check logs
sudo tail -f /var/log/postgresql/wal_archive.log

# Test manually
sudo -u postgres /usr/local/bin/pg_backup/archive_wal.sh \
    /path/to/wal/file \
    filename

# Check configuration
pg-backup-test --config

S3 Connection Issues

# Verify credentials
sudo -u postgres aws s3 ls --profile backups

# Check endpoint
curl -I https://your-s3-endpoint.com

# Test with debug
DEBUG_MODE=yes pg-backup-test --s3

GPG Errors

# List available keys
sudo -u postgres gpg --list-keys

# Re-import public key
sudo -u postgres gpg --import /path/to/public-key.asc

# Verify key trust
sudo -u postgres gpg --edit-key [email protected]
gpg> trust
gpg> 5 (ultimate)
gpg> quit
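The interactive trust prompt can also be scripted; one common non-interactive route is importing an ownertrust record for the key's fingerprint:

```bash
# Mark the backup key as ultimately trusted without the interactive dialog.
FPR=$(sudo -u postgres gpg --list-keys --with-colons "[email protected]" \
      | awk -F: '/^fpr:/ {print $10; exit}')
echo "${FPR}:6:" | sudo -u postgres gpg --import-ownertrust
```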

Notifications Not Working

# Test Telegram
curl -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    -d "chat_id=${TELEGRAM_CHAT_ID}" \
    -d "text=Test"

# Test Email
echo "Test" | mail -s "Test" [email protected]

# Check notification settings
pg-backup-test --notify

🔄 Migration & Upgrades

Upgrading from v1.0

# Backup old configuration
sudo cp -r /usr/local/bin/pg_backup /usr/local/bin/pg_backup.v1.backup

# Run migration tool
sudo pg-backup-migrate --from=v1

# Test new configuration
pg-backup-test --all

Migrating Between Servers

# On old server - export configuration
pg-backup-export --output=/tmp/backup-config.tar.gz

# Copy to new server
scp /tmp/backup-config.tar.gz newserver:/tmp/

# On new server - import configuration
sudo pg-backup-import --input=/tmp/backup-config.tar.gz

# Verify and test
pg-backup-test --all

📚 Advanced Usage

Custom Scripts

Create custom hooks:

# /etc/pg_backup/hooks/pre-backup.sh
#!/bin/bash
# Runs before each base backup
echo "Pre-backup hook executed"

# /etc/pg_backup/hooks/post-backup.sh
#!/bin/bash
# Runs after successful backup
echo "Backup completed at $(date)"

Enable in config:

ENABLE_HOOKS="yes"
HOOK_DIR="/etc/pg_backup/hooks"
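How the backup scripts call these hooks is not shown here; a plausible runner (the function and its behaviour are assumptions, the hook names match the examples above) could look like:

```bash
# Hypothetical helper: run every executable hook matching a prefix,
# e.g. run_hooks "pre-backup" before a base backup starts.
run_hooks() {
    local prefix="$1"
    [ "${ENABLE_HOOKS:-no}" = "yes" ] || return 0
    for hook in "${HOOK_DIR}"/"${prefix}"*.sh; do
        [ -x "${hook}" ] || continue
        "${hook}" || echo "WARNING: hook ${hook} exited non-zero"
    done
}
```

base_backup.sh could then invoke `run_hooks "pre-backup"` before starting and `run_hooks "post-backup"` after a successful upload.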

Multiple Databases

# Create separate configs for each database
sudo pg-backup-configure --output=/etc/pg_backup/db1.conf
sudo pg-backup-configure --output=/etc/pg_backup/db2.conf

# Use specific config
PG_BACKUP_CONFIG=/etc/pg_backup/db1.conf pg-backup-test --all
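The same `PG_BACKUP_CONFIG` override works from cron, so each database can keep its own schedule (times below are only examples):

```bash
# In the postgres user's crontab: one weekly base backup job per config file
0 2 * * 0 PG_BACKUP_CONFIG=/etc/pg_backup/db1.conf /usr/local/bin/pg_backup/base_backup.sh
0 4 * * 0 PG_BACKUP_CONFIG=/etc/pg_backup/db2.conf /usr/local/bin/pg_backup/base_backup.sh
```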

Encryption Key Rotation

# Generate new GPG key
sudo -u postgres gpg --gen-key

# Re-encrypt existing backups (use with caution!)
pg-backup-reencrypt [email protected] [email protected]

# Update configuration
sudo nano /etc/pg_backup/backup.conf
# Change GPG_RECIPIENT to new key

🤝 Contributing

Found a bug? Have a feature request?

  1. Check existing issues
  2. Create detailed bug report
  3. Submit pull request with tests

📄 License

MIT License - See LICENSE file


🆘 Support


⚡ Performance Tuning

High-Volume Databases

# Increase parallel streams
BASE_BACKUP_PARALLEL_JOBS="8"

# Enable WAL compression
# Add to postgresql.conf:
wal_compression = on

# Adjust checkpoint settings
checkpoint_timeout = 30min
max_wal_size = 5GB

Low-Volume Databases

# Reduce resources
BASE_BACKUP_PARALLEL_JOBS="2"
archive_timeout = 300  # 5 minutes

Network Optimization

# Rate limit uploads (MB/s)
BASE_BACKUP_MAX_RATE_MB="50"

# Compress before upload
BASE_BACKUP_COMPRESS="yes"

# Use multipart upload for large files
AWS_CLI_FILE_SIZE_MB="100"  # Automatic multipart above this
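`AWS_CLI_FILE_SIZE_MB` is this system's own setting; the AWS CLI additionally exposes its own multipart tuning per profile, for example:

```bash
# Tune the AWS CLI's multipart behaviour for the backups profile.
sudo -u postgres aws configure set s3.multipart_threshold 100MB --profile backups
sudo -u postgres aws configure set s3.multipart_chunksize 50MB --profile backups
sudo -u postgres aws configure set s3.max_concurrent_requests 4 --profile backups
```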

📈 Monitoring Integration

Prometheus Metrics

Exposed at :9187/metrics:

pg_backup_last_archive_seconds
pg_backup_failed_archives_total
pg_backup_last_base_backup_seconds
pg_backup_archive_size_bytes
pg_backup_verification_status

Grafana Dashboard

Import dashboard: pg-backup-dashboard.json

Prometheus Alerting Rules

groups:
  - name: pg_backup
    rules:
      - alert: BackupArchiveFailing
        expr: pg_backup_failed_archives_total > 3
        for: 15m
      - alert: BackupVerificationFailed
        expr: pg_backup_verification_status == 0
        for: 1h

Version: 2.0.0
Last Updated: 2024-01-15
Tested On: PostgreSQL 12, 13, 14, 15, 16

#!/bin/bash
#
# PostgreSQL WAL Archive Script with S3 Upload & Encryption
# Usage: archive_wal.sh %p %f
#
# This script:
# 1. Encrypts WAL files using GPG
# 2. Uploads to S3-compatible storage (Backblaze/Tebi/Hetzner)
# 3. Optionally keeps local encrypted copy if space permits
# 4. Sends notifications on failures
# 5. Validates integrity with checksums
set -euo pipefail
# ==================== CONFIGURATION ====================
# WAL file parameters (passed by PostgreSQL)
WAL_PATH="${1}" # %p - full path to WAL file
WAL_FILE="${2}" # %f - filename only
# S3 Configuration (choose one provider)
S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com" # Backblaze
# S3_ENDPOINT="https://s3.tebi.io" # Tebi
# S3_ENDPOINT="https://fsn1.your-objectstorage.com" # Hetzner
S3_BUCKET="your-postgres-backups"
S3_REGION="us-west-000"
S3_PREFIX="wal-archives/$(hostname)"
# AWS CLI profile (configure with: aws configure --profile backups)
AWS_PROFILE="backups"
# Encryption Configuration
GPG_RECIPIENT="[email protected]" # GPG key email/ID
GPG_ENCRYPT="yes" # Set to "no" to disable encryption
# Local storage configuration
LOCAL_ARCHIVE_DIR="/var/lib/postgresql/wal_archive"
KEEP_LOCAL_COPY="no" # Set to "yes" if you have space and want redundancy
LOCAL_RETENTION_DAYS=7 # Only if KEEP_LOCAL_COPY=yes
# Notification Configuration
ENABLE_NOTIFICATIONS="yes"
NOTIFICATION_METHOD="telegram" # "telegram", "email", or "both"
# Telegram settings
TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN"
TELEGRAM_CHAT_ID="YOUR_CHAT_ID"
# Email settings
EMAIL_TO="[email protected]"
EMAIL_FROM="[email protected]"
SMTP_SERVER="smtp.gmail.com:587"
# Logging
LOG_FILE="/var/log/postgresql/wal_archive.log"
MAX_LOG_SIZE=104857600 # 100MB
# ==================== FUNCTIONS ====================
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}
error_exit() {
log "ERROR: $1"
send_notification "❌ WAL Archive Failed" "$1\nFile: ${WAL_FILE}\nHost: $(hostname)"
exit 1
}
rotate_log() {
if [ -f "${LOG_FILE}" ] && [ "$(stat -f%z "${LOG_FILE}" 2>/dev/null || stat -c%s "${LOG_FILE}")" -gt "${MAX_LOG_SIZE}" ]; then
mv "${LOG_FILE}" "${LOG_FILE}.old"
gzip "${LOG_FILE}.old"
log "Log rotated"
fi
}
send_telegram() {
local title="$1"
local message="$2"
local text="${title}%0A%0A${message}"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${text}" \
-d "parse_mode=HTML" >/dev/null 2>&1 || true
}
send_email() {
local subject="$1"
local body="$2"
echo -e "${body}" | mail -s "${subject}" \
-a "From: ${EMAIL_FROM}" \
-S smtp="${SMTP_SERVER}" \
-S smtp-use-starttls \
"${EMAIL_TO}" >/dev/null 2>&1 || true
}
send_notification() {
[ "${ENABLE_NOTIFICATIONS}" != "yes" ] && return 0
local title="$1"
local message="$2"
case "${NOTIFICATION_METHOD}" in
telegram)
send_telegram "${title}" "${message}"
;;
email)
send_email "${title}" "${message}"
;;
both)
send_telegram "${title}" "${message}"
send_email "${title}" "${message}"
;;
esac
}
verify_prerequisites() {
# Check if WAL file exists
[ ! -f "${WAL_PATH}" ] && error_exit "WAL file not found: ${WAL_PATH}"
# Check required tools
command -v aws >/dev/null 2>&1 || error_exit "AWS CLI not installed"
if [ "${GPG_ENCRYPT}" = "yes" ]; then
command -v gpg >/dev/null 2>&1 || error_exit "GPG not installed"
gpg --list-keys "${GPG_RECIPIENT}" >/dev/null 2>&1 || error_exit "GPG key not found: ${GPG_RECIPIENT}"
fi
# Verify AWS credentials
aws configure get aws_access_key_id --profile "${AWS_PROFILE}" >/dev/null 2>&1 || \
error_exit "AWS profile not configured: ${AWS_PROFILE}"
}
create_checksum() {
local file="$1"
sha256sum "${file}" | awk '{print $1}'
}
encrypt_file() {
local input="$1"
local output="$2"
gpg --encrypt \
--recipient "${GPG_RECIPIENT}" \
--trust-model always \
--compress-algo ZLIB \
--cipher-algo AES256 \
--output "${output}" \
"${input}" || error_exit "Encryption failed"
}
upload_to_s3() {
local file="$1"
local s3_key="$2"
local checksum="$3"
# Upload with metadata
aws s3 cp "${file}" \
"s3://${S3_BUCKET}/${s3_key}" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--metadata "sha256=${checksum},original=${WAL_FILE},hostname=$(hostname)" \
--storage-class STANDARD \
--no-progress \
2>&1 | tee -a "${LOG_FILE}" || error_exit "S3 upload failed"
# Verify upload
aws s3 ls "s3://${S3_BUCKET}/${s3_key}" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" >/dev/null 2>&1 || error_exit "S3 verification failed"
}
cleanup_old_local_files() {
[ "${KEEP_LOCAL_COPY}" != "yes" ] && return 0
find "${LOCAL_ARCHIVE_DIR}" -type f -name "*.gpg" -mtime "+${LOCAL_RETENTION_DAYS}" -delete 2>/dev/null || true
}
# ==================== MAIN PROCESS ====================
main() {
rotate_log
log "Starting WAL archive: ${WAL_FILE}"
# Verify environment
verify_prerequisites
# Create working directory
WORK_DIR=$(mktemp -d) || error_exit "Failed to create temp directory"
trap "rm -rf ${WORK_DIR}" EXIT
# Calculate original checksum
ORIGINAL_CHECKSUM=$(create_checksum "${WAL_PATH}")
log "Original checksum: ${ORIGINAL_CHECKSUM}"
# Encryption
if [ "${GPG_ENCRYPT}" = "yes" ]; then
ENCRYPTED_FILE="${WORK_DIR}/${WAL_FILE}.gpg"
log "Encrypting ${WAL_FILE}..."
encrypt_file "${WAL_PATH}" "${ENCRYPTED_FILE}"
UPLOAD_FILE="${ENCRYPTED_FILE}"
UPLOAD_CHECKSUM=$(create_checksum "${ENCRYPTED_FILE}")
S3_KEY="${S3_PREFIX}/${WAL_FILE}.gpg"
else
# No encryption - upload original
UPLOAD_FILE="${WAL_PATH}"
UPLOAD_CHECKSUM="${ORIGINAL_CHECKSUM}"
S3_KEY="${S3_PREFIX}/${WAL_FILE}"
fi
# Upload to S3
log "Uploading to S3: ${S3_KEY}"
upload_to_s3 "${UPLOAD_FILE}" "${S3_KEY}" "${UPLOAD_CHECKSUM}"
# Keep local copy if configured
if [ "${KEEP_LOCAL_COPY}" = "yes" ]; then
mkdir -p "${LOCAL_ARCHIVE_DIR}"
cp "${UPLOAD_FILE}" "${LOCAL_ARCHIVE_DIR}/" || log "WARNING: Local copy failed"
cleanup_old_local_files
fi
# Success
log "Successfully archived ${WAL_FILE} (checksum: ${UPLOAD_CHECKSUM:0:16}...)"
# Send success notification (optional, only for first archive or errors)
# send_notification "✅ WAL Archive Success" "File: ${WAL_FILE}"
}
# Execute main function
main "$@"
exit 0
#!/bin/bash
#
# PostgreSQL WAL Archive Script v2.0
# Usage: archive_wal.sh %p %f (called by PostgreSQL)
#
# Features:
# - Configuration file driven (no hardcoded values)
# - GPG encryption with integrity verification
# - S3 upload with multiple provider support
# - Optional local caching
# - Comprehensive error handling and notifications
# - Detailed logging with rotation
set -euo pipefail
# Load common library
LIB_DIR="$(dirname "${BASH_SOURCE[0]}")/lib"
if [ -f "${LIB_DIR}/common.sh" ]; then
source "${LIB_DIR}/common.sh"
else
echo "ERROR: Cannot find common library at ${LIB_DIR}/common.sh"
exit 1
fi
# ==================== SCRIPT CONFIGURATION ====================
# WAL file parameters (passed by PostgreSQL)
WAL_PATH="${1}"
WAL_FILE="${2}"
# Set current log file for this script
CURRENT_LOG_FILE="${WAL_ARCHIVE_LOG}"
# ==================== MAIN FUNCTIONS ====================
verify_prerequisites() {
log_debug "Verifying prerequisites..."
# Check WAL file exists
if [ ! -f "${WAL_PATH}" ]; then
error_exit "WAL file not found: ${WAL_PATH}"
fi
# Check required tools
command -v aws >/dev/null 2>&1 || error_exit "AWS CLI not installed"
if [ "${GPG_ENCRYPT}" = "yes" ]; then
command -v gpg >/dev/null 2>&1 || error_exit "GPG not installed"
gpg --list-keys "${GPG_RECIPIENT}" >/dev/null 2>&1 || \
error_exit "GPG key not found: ${GPG_RECIPIENT}"
fi
# Verify S3 access
local aws_cmd=$(get_aws_command)
if ! ${aws_cmd} s3 ls "s3://${S3_BUCKET}/" >/dev/null 2>&1; then
error_exit "Cannot access S3 bucket: ${S3_BUCKET}"
fi
log_debug "Prerequisites verified"
}
archive_wal_file() {
log_info "Archiving WAL file: ${WAL_FILE}"
# Create working directory
local work_dir=$(mktemp -d -p "${TEMP_DIR:-/tmp}" pg_archive.XXXXXX)
trap "rm -rf ${work_dir}" EXIT
# Calculate original checksum
local original_checksum=$(calculate_checksum "${WAL_PATH}")
log_debug "Original checksum: ${original_checksum}"
local upload_file="${WAL_PATH}"
local upload_checksum="${original_checksum}"
local s3_key="${S3_WAL_PREFIX}/$(get_hostname)/${WAL_FILE}"
# Encrypt if enabled
if [ "${GPG_ENCRYPT}" = "yes" ]; then
local encrypted_file="${work_dir}/${WAL_FILE}.gpg"
log_debug "Encrypting WAL file..."
if ! gpg_encrypt "${WAL_PATH}" "${encrypted_file}"; then
error_exit "Encryption failed for ${WAL_FILE}"
fi
upload_file="${encrypted_file}"
upload_checksum=$(calculate_checksum "${encrypted_file}")
s3_key="${s3_key}.gpg"
log_debug "Encrypted checksum: ${upload_checksum}"
fi
# Upload to S3 with metadata
local metadata="sha256=${upload_checksum},original=${WAL_FILE},hostname=$(get_hostname),timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
log_info "Uploading to S3: ${s3_key}"
if ! retry_command s3_upload "${upload_file}" "${s3_key}" "${metadata}"; then
error_exit "S3 upload failed for ${WAL_FILE}"
fi
# Verify upload by checking existence
if ! s3_exists "${s3_key}"; then
error_exit "S3 verification failed for ${WAL_FILE}"
fi
# Keep local copy if configured
if [ "${KEEP_LOCAL_WAL_COPY}" = "yes" ]; then
save_local_copy "${upload_file}" "${upload_checksum}"
fi
log_info "Successfully archived: ${WAL_FILE} (checksum: ${upload_checksum:0:16}...)"
}
save_local_copy() {
local file="${1}"
local checksum="${2}"
ensure_directory "${LOCAL_WAL_DIR}" "postgres:postgres" "700"
local local_file="${LOCAL_WAL_DIR}/$(basename ${file})"
if cp "${file}" "${local_file}"; then
echo "${checksum}" > "${local_file}.sha256"
log_debug "Local copy saved: ${local_file}"
# Cleanup old local files
find "${LOCAL_WAL_DIR}" -type f -mtime "+${LOCAL_WAL_RETENTION_DAYS}" -delete 2>/dev/null || true
else
log_warn "Failed to save local copy"
fi
}
# ==================== MAIN EXECUTION ====================
main() {
# Load configuration
if ! load_config; then
echo "ERROR: Failed to load configuration"
exit 1
fi
# Setup logging
setup_logging "${CURRENT_LOG_FILE}"
rotate_log "${CURRENT_LOG_FILE}"
log_info "Starting WAL archive: ${WAL_FILE}"
log_debug "WAL path: ${WAL_PATH}"
log_debug "Configuration loaded from: ${CONFIG_FILE:-${ENV_FILE}}"
# Verify environment
verify_prerequisites
# Archive the WAL file
archive_wal_file
log_info "WAL archive completed successfully"
}
# Execute main function
main "$@"
exit 0
#!/bin/bash
#
# PostgreSQL Backup System - Configuration File
# Copy this to /etc/pg_backup/backup.conf and customize
#
# Security: chmod 600 /etc/pg_backup/backup.conf
# chown postgres:postgres /etc/pg_backup/backup.conf
# ==================== S3 STORAGE CONFIGURATION ====================
# S3 Provider Configuration
# Supported: backblaze, tebi, hetzner, aws, custom
S3_PROVIDER="backblaze"
# Provider-specific endpoints (auto-configured based on S3_PROVIDER)
# Override S3_ENDPOINT below for custom provider
S3_REGION="us-west-000"
# S3 Bucket Configuration
S3_BUCKET="my-postgres-backups"
S3_BASE_PREFIX="base-backups"
S3_WAL_PREFIX="wal-archives"
# Custom S3 Endpoint (only needed if S3_PROVIDER="custom")
S3_ENDPOINT=""
# ==================== AWS CREDENTIALS ====================
# NEVER commit this file with real credentials to git!
# Use environment variables or AWS profile instead
# Option 1: AWS CLI Profile (RECOMMENDED)
AWS_PROFILE="backups"
# Option 2: Direct credentials (LESS SECURE - use only if necessary)
# Leave empty to use AWS_PROFILE
AWS_ACCESS_KEY_ID=""
AWS_SECRET_ACCESS_KEY=""
# ==================== ENCRYPTION CONFIGURATION ====================
# GPG Encryption
GPG_ENCRYPT="yes"
GPG_RECIPIENT="[email protected]"
# GPG Key Location (for automated operations)
GPG_PUBLIC_KEY_PATH="/var/lib/postgresql/.gnupg/backup-public.asc"
GPG_PRIVATE_KEY_PATH="" # Leave empty - keep offline!
# ==================== POSTGRESQL CONFIGURATION ====================
# PostgreSQL Connection
PGHOST="localhost"
PGPORT="5432"
PGUSER="postgres"
PGDATABASE="postgres"
# PostgreSQL Data Directory
PG_DATA_DIR="/var/lib/postgresql/14/main"
PG_VERSION="14"
# ==================== BACKUP CONFIGURATION ====================
# WAL Archive Settings
KEEP_LOCAL_WAL_COPY="no"
LOCAL_WAL_DIR="/var/lib/postgresql/wal_archive"
LOCAL_WAL_RETENTION_DAYS="7"
# Base Backup Settings
BASE_BACKUP_RETENTION_DAYS="30"
BASE_BACKUP_COMPRESS="yes"
BASE_BACKUP_PARALLEL_JOBS="4"
BASE_BACKUP_MAX_RATE_MB="0" # 0 = unlimited
# ==================== MONITORING CONFIGURATION ====================
# Monitoring Thresholds
MAX_WAL_LAG_SECONDS="300"
MAX_ARCHIVE_FAILURES="3"
MIN_DISK_SPACE_GB="50"
MAX_BASE_BACKUP_AGE_DAYS="7"
MAX_ARCHIVE_GAP_MINUTES="60"
# Alert Cooldown (seconds between repeated alerts)
ALERT_COOLDOWN_SECONDS="3600"
# ==================== NOTIFICATION CONFIGURATION ====================
# Enable Notifications
ENABLE_NOTIFICATIONS="yes"
# Notification Method: telegram, email, both, none
NOTIFICATION_METHOD="telegram"
# Telegram Configuration
TELEGRAM_BOT_TOKEN=""
TELEGRAM_CHAT_ID=""
# Email Configuration
EMAIL_TO="[email protected]"
EMAIL_FROM="[email protected]"
SMTP_SERVER="smtp.gmail.com:587"
SMTP_USER=""
SMTP_PASSWORD=""
# ==================== VERIFICATION CONFIGURATION ====================
# Automated Backup Verification
ENABLE_BACKUP_VERIFICATION="yes"
VERIFY_TEST_PORT="5433"
VERIFY_TEST_DATA_DIR="/var/lib/postgresql/backup_verify_test"
VERIFY_MAX_DURATION_SECONDS="7200"
VERIFY_CLEANUP_ON_SUCCESS="yes"
VERIFY_CLEANUP_ON_FAILURE="no"
# Verification Tests
VERIFY_RUN_CHECKSUM_TESTS="yes"
VERIFY_RUN_QUERY_TESTS="yes"
VERIFY_RUN_CONSISTENCY_TESTS="yes"
# Sample tables to verify (space-separated)
VERIFY_SAMPLE_TABLES="pg_database pg_class pg_namespace"
# ==================== LOGGING CONFIGURATION ====================
# Log Paths
LOG_DIR="/var/log/postgresql"
WAL_ARCHIVE_LOG="${LOG_DIR}/wal_archive.log"
WAL_RESTORE_LOG="${LOG_DIR}/wal_restore.log"
BASE_BACKUP_LOG="${LOG_DIR}/base_backup.log"
BACKUP_MONITOR_LOG="${LOG_DIR}/backup_monitor.log"
BACKUP_VERIFY_LOG="${LOG_DIR}/backup_verify.log"
# Log Rotation
MAX_LOG_SIZE_MB="100"
KEEP_OLD_LOGS="3"
# ==================== PATHS CONFIGURATION ====================
# Script Installation Directory
SCRIPT_DIR="/usr/local/bin/pg_backup"
# State Files Directory
STATE_DIR="/var/lib/postgresql/monitor_state"
# Temporary Files Directory
TEMP_DIR="/tmp/pg_backup"
# ==================== ADVANCED SETTINGS ====================
# Performance Tuning
ARCHIVE_COMMAND_TIMEOUT="300"
RESTORE_COMMAND_TIMEOUT="300"
# Retry Configuration
MAX_RETRY_ATTEMPTS="3"
RETRY_DELAY_SECONDS="60"
# Debug Mode
DEBUG_MODE="no"
# ==================== PROVIDER PRESETS ====================
# These are auto-configured based on S3_PROVIDER
# You can override by setting S3_ENDPOINT manually
# Backblaze B2
# S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com"
# Tebi
# S3_ENDPOINT="https://s3.tebi.io"
# Hetzner
# S3_ENDPOINT="https://fsn1.your-objectstorage.com"
# AWS S3
# S3_ENDPOINT="" # Leave empty for AWS
# Custom
# S3_ENDPOINT="https://your-custom-s3.com"
#!/bin/bash
#
# PostgreSQL Backup Monitoring & Health Check
# Run via cron: */15 * * * * /usr/local/bin/pg_backup/backup_monitor.sh
#
# This script monitors:
# 1. WAL archiving lag
# 2. Archive failures
# 3. Disk space
# 4. Base backup age
# 5. S3 connectivity
# 6. Archive integrity
set -euo pipefail
# ==================== CONFIGURATION ====================
# PostgreSQL Configuration
PGHOST="${PGHOST:-localhost}"
PGPORT="${PGPORT:-5432}"
PGDATABASE="${PGDATABASE:-postgres}"
PGUSER="${PGUSER:-postgres}"
# S3 Configuration
S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com"
S3_BUCKET="your-postgres-backups"
S3_REGION="us-west-000"
S3_PREFIX="wal-archives/$(hostname)"
AWS_PROFILE="backups"
# Monitoring Thresholds
MAX_WAL_LAG_SECONDS=300 # 5 minutes
MAX_ARCHIVE_FAILURES=3 # Alert after 3 failures
MIN_DISK_SPACE_GB=50 # Minimum free space on data partition
MAX_BASE_BACKUP_AGE_DAYS=7 # Alert if base backup older than this
MAX_ARCHIVE_GAP_MINUTES=60 # Alert if no archives in this time
# Notification Configuration
ENABLE_NOTIFICATIONS="yes"
NOTIFICATION_METHOD="telegram"
TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN"
TELEGRAM_CHAT_ID="YOUR_CHAT_ID"
EMAIL_TO="[email protected]"
EMAIL_FROM="[email protected]"
SMTP_SERVER="smtp.gmail.com:587"
# State files
STATE_DIR="/var/lib/postgresql/monitor_state"
LAST_ALERT_FILE="${STATE_DIR}/last_alert"
METRICS_FILE="${STATE_DIR}/metrics.json"
# Logging
LOG_FILE="/var/log/postgresql/backup_monitor.log"
# Alert cooldown (seconds) - don't spam alerts
ALERT_COOLDOWN=3600 # 1 hour
# ==================== FUNCTIONS ====================
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}
send_telegram() {
local title="$1"
local message="$2"
local text="${title}%0A%0A${message}"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${text}" \
-d "parse_mode=HTML" >/dev/null 2>&1 || true
}
send_email() {
local subject="$1"
local body="$2"
echo -e "${body}" | mail -s "${subject}" \
-a "From: ${EMAIL_FROM}" \
-S smtp="${SMTP_SERVER}" \
-S smtp-use-starttls \
"${EMAIL_TO}" >/dev/null 2>&1 || true
}
send_alert() {
[ "${ENABLE_NOTIFICATIONS}" != "yes" ] && return 0
local severity="$1" # INFO, WARNING, CRITICAL
local title="$2"
local message="$3"
# Check alert cooldown
if [ -f "${LAST_ALERT_FILE}" ]; then
local last_alert=$(cat "${LAST_ALERT_FILE}")
local now=$(date +%s)
local diff=$((now - last_alert))
if [ ${diff} -lt ${ALERT_COOLDOWN} ] && [ "${severity}" != "CRITICAL" ]; then
log "Alert cooldown active, skipping notification"
return 0
fi
fi
local icon="ℹ️"
[ "${severity}" = "WARNING" ] && icon="⚠️"
[ "${severity}" = "CRITICAL" ] && icon="🚨"
local full_title="${icon} ${severity}: ${title}"
local full_message="${message}%0AHost: $(hostname)%0ATime: $(date '+%Y-%m-%d %H:%M:%S')"
case "${NOTIFICATION_METHOD}" in
telegram)
send_telegram "${full_title}" "${full_message}"
;;
email)
send_email "${full_title}" "${full_message}"
;;
both)
send_telegram "${full_title}" "${full_message}"
send_email "${full_title}" "${full_message}"
;;
esac
# Update last alert time
date +%s > "${LAST_ALERT_FILE}"
}
check_wal_archiving() {
log "Checking WAL archiving status..."
# Get current WAL file and last archived
local current_wal=$(psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" -t -c \
"SELECT pg_walfile_name(pg_current_wal_lsn());" | xargs)
local last_archived=$(psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" -t -c \
"SELECT last_archived_wal FROM pg_stat_archiver;" | xargs)
local failed_count=$(psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" -t -c \
"SELECT failed_count FROM pg_stat_archiver;" | xargs)
local last_archived_time=$(psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" -t -c \
"SELECT EXTRACT(EPOCH FROM (NOW() - last_archived_time))::int FROM pg_stat_archiver;" | xargs)
log "Current WAL: ${current_wal}, Last archived: ${last_archived}"
log "Archive lag: ${last_archived_time}s, Failed count: ${failed_count}"
# Check for failures
if [ "${failed_count}" -ge "${MAX_ARCHIVE_FAILURES}" ]; then
send_alert "CRITICAL" "WAL Archive Failures" \
"Failed archive attempts: ${failed_count}%0ALast archived: ${last_archived}%0AAgo: ${last_archived_time}s"
return 1
fi
# Check lag
if [ "${last_archived_time}" -gt "${MAX_WAL_LAG_SECONDS}" ]; then
send_alert "WARNING" "WAL Archive Lag" \
"Last archive was ${last_archived_time}s ago%0ALast archived WAL: ${last_archived}"
return 1
fi
return 0
}
check_disk_space() {
log "Checking disk space..."
# Check data directory
local data_dir=$(psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" -t -c \
"SHOW data_directory;" | xargs)
local free_gb=$(df -BG "${data_dir}" | awk 'NR==2 {print $4}' | sed 's/G//')
log "Free space on data partition: ${free_gb}GB"
if [ "${free_gb}" -lt "${MIN_DISK_SPACE_GB}" ]; then
send_alert "CRITICAL" "Low Disk Space" \
"Only ${free_gb}GB free on ${data_dir}%0AThreshold: ${MIN_DISK_SPACE_GB}GB"
return 1
fi
# Check WAL directory size
local wal_size=$(du -sh "${data_dir}/pg_wal" | awk '{print $1}')
log "pg_wal size: ${wal_size}"
return 0
}
check_base_backup_age() {
log "Checking base backup age..."
# List backups in S3
local backup_prefix="base-backups/$(hostname)" # mirrors the base-backup upload layout
local latest_backup=$(aws s3 ls "s3://${S3_BUCKET}/${backup_prefix}/" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--recursive | sort | tail -n 1 | awk '{print $1, $2}')
if [ -z "${latest_backup}" ]; then
send_alert "WARNING" "No Base Backup Found" \
"No base backups found in S3"
return 1
fi
local backup_date=$(echo "${latest_backup}" | awk '{print $1}')
local backup_epoch=$(date -d "${backup_date}" +%s 2>/dev/null || date -j -f "%Y-%m-%d" "${backup_date}" +%s)
local now=$(date +%s)
local age_days=$(( (now - backup_epoch) / 86400 ))
log "Latest base backup age: ${age_days} days"
if [ "${age_days}" -gt "${MAX_BASE_BACKUP_AGE_DAYS}" ]; then
send_alert "WARNING" "Base Backup Too Old" \
"Latest backup is ${age_days} days old%0AThreshold: ${MAX_BASE_BACKUP_AGE_DAYS} days"
return 1
fi
return 0
}
check_s3_connectivity() {
log "Checking S3 connectivity..."
# Try to list bucket
if ! aws s3 ls "s3://${S3_BUCKET}/${S3_PREFIX}/" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" >/dev/null 2>&1; then
send_alert "CRITICAL" "S3 Connectivity Failed" \
"Cannot connect to S3 bucket%0AEndpoint: ${S3_ENDPOINT}%0ABucket: ${S3_BUCKET}"
return 1
fi
log "S3 connectivity OK"
return 0
}
check_archive_gaps() {
log "Checking for archive gaps..."
# List recent archives
# S3 ls prints "YYYY-MM-DD HH:MM:SS", so the cutoff must use the same format for string comparison
local cutoff_time=$(date -u -d "${MAX_ARCHIVE_GAP_MINUTES} minutes ago" '+%Y-%m-%d %H:%M:%S' 2>/dev/null || \
date -u -v-${MAX_ARCHIVE_GAP_MINUTES}M '+%Y-%m-%d %H:%M:%S')
local recent_count=$(aws s3 ls "s3://${S3_BUCKET}/${S3_PREFIX}/" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--recursive | awk -v cutoff="${cutoff_time}" '$1" "$2 >= cutoff' | wc -l)
log "Archives in last ${MAX_ARCHIVE_GAP_MINUTES} minutes: ${recent_count}"
if [ "${recent_count}" -eq 0 ]; then
send_alert "WARNING" "No Recent Archives" \
"No WAL archives in the last ${MAX_ARCHIVE_GAP_MINUTES} minutes"
return 1
fi
return 0
}
verify_random_archive() {
log "Verifying random archive integrity..."
# Get a random recent archive
local random_file=$(aws s3 ls "s3://${S3_BUCKET}/${S3_PREFIX}/" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" | tail -n 10 | shuf -n 1 | awk '{print $4}')
if [ -z "${random_file}" ]; then
log "No archives to verify"
return 0
fi
# Download and verify checksum
local temp_file=$(mktemp)
trap "rm -f ${temp_file}" RETURN
local s3_key="${S3_PREFIX}/${random_file}"
if aws s3 cp "s3://${S3_BUCKET}/${s3_key}" "${temp_file}" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--no-progress >/dev/null 2>&1; then
local checksum=$(sha256sum "${temp_file}" | awk '{print $1}')
log "Verified archive: ${random_file} (${checksum:0:16}...)"
return 0
else
send_alert "CRITICAL" "Archive Verification Failed" \
"Failed to download/verify: ${random_file}"
return 1
fi
}
save_metrics() {
local status="$1"
mkdir -p "${STATE_DIR}"
cat > "${METRICS_FILE}" <<EOF
{
"timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"status": "${status}",
"checks": {
"wal_archiving": ${CHECK_WAL_ARCHIVING:-0},
"disk_space": ${CHECK_DISK_SPACE:-0},
"base_backup_age": ${CHECK_BASE_BACKUP_AGE:-0},
"s3_connectivity": ${CHECK_S3_CONNECTIVITY:-0},
"archive_gaps": ${CHECK_ARCHIVE_GAPS:-0}
}
}
EOF
}
generate_health_report() {
log "=== Health Check Summary ==="
log "WAL Archiving: $([ ${CHECK_WAL_ARCHIVING:-1} -eq 0 ] && echo "OK" || echo "FAILED")"
log "Disk Space: $([ ${CHECK_DISK_SPACE:-1} -eq 0 ] && echo "OK" || echo "FAILED")"
log "Base Backup Age: $([ ${CHECK_BASE_BACKUP_AGE:-1} -eq 0 ] && echo "OK" || echo "FAILED")"
log "S3 Connectivity: $([ ${CHECK_S3_CONNECTIVITY:-1} -eq 0 ] && echo "OK" || echo "FAILED")"
log "Archive Gaps: $([ ${CHECK_ARCHIVE_GAPS:-1} -eq 0 ] && echo "OK" || echo "FAILED")"
log "=============================="
}
# ==================== MAIN ====================
main() {
log "Starting backup health check..."
mkdir -p "${STATE_DIR}"
# Initialize status
CHECK_WAL_ARCHIVING=0
CHECK_DISK_SPACE=0
CHECK_BASE_BACKUP_AGE=0
CHECK_S3_CONNECTIVITY=0
CHECK_ARCHIVE_GAPS=0
# Run all checks (don't exit on failure, collect all issues)
check_s3_connectivity || CHECK_S3_CONNECTIVITY=1
check_wal_archiving || CHECK_WAL_ARCHIVING=1
check_disk_space || CHECK_DISK_SPACE=1
check_base_backup_age || CHECK_BASE_BACKUP_AGE=1
check_archive_gaps || CHECK_ARCHIVE_GAPS=1
verify_random_archive || true # Don't fail on this
# Generate summary
generate_health_report
# Calculate overall status
local failed_checks=$((CHECK_WAL_ARCHIVING + CHECK_DISK_SPACE + CHECK_BASE_BACKUP_AGE + CHECK_S3_CONNECTIVITY + CHECK_ARCHIVE_GAPS))
if [ ${failed_checks} -eq 0 ]; then
save_metrics "healthy"
log "All checks passed ✓"
else
save_metrics "degraded"
log "Health check completed with ${failed_checks} failures"
fi
}
main "$@"
exit 0
#!/bin/bash
#
# PostgreSQL Automated Backup Verification & Restore Testing
# Run via cron: 0 3 * * 1 /usr/local/bin/pg_backup/backup_verify.sh
#
# This script:
# 1. Provisions a temporary PostgreSQL instance
# 2. Restores latest base backup + WAL archives
# 3. Verifies data integrity with checksums
# 4. Runs sample queries to validate data
# 5. Tears down test instance
# 6. Reports results via notifications
#
# IMPORTANT: Requires separate disk space for test restoration
set -euo pipefail
# ==================== CONFIGURATION ====================
# Test Environment
TEST_DATA_DIR="/var/lib/postgresql/backup_verify_test"
TEST_PORT=5433 # Different from production
TEST_MAX_DURATION=7200 # 2 hours max
CLEANUP_ON_SUCCESS="yes"
CLEANUP_ON_FAILURE="no" # Keep for investigation
# Production PostgreSQL (for comparison)
PROD_PGHOST="${PGHOST:-localhost}"
PROD_PGPORT="${PGPORT:-5432}"
PROD_PGUSER="${PGUSER:-postgres}"
PROD_PGDATABASE="${PGDATABASE:-postgres}"
# S3 Configuration
S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com"
S3_BUCKET="your-postgres-backups"
S3_REGION="us-west-000"
S3_BASE_PREFIX="base-backups/$(hostname)"
S3_WAL_PREFIX="wal-archives/$(hostname)"
AWS_PROFILE="backups"
# Encryption
GPG_DECRYPT="yes"
# Verification Tests
RUN_CHECKSUM_TESTS="yes"
RUN_QUERY_TESTS="yes"
RUN_CONSISTENCY_TESTS="yes"
SAMPLE_TABLES=("pg_database" "pg_class" "pg_namespace") # Add your critical tables
# Notification Configuration
ENABLE_NOTIFICATIONS="yes"
NOTIFICATION_METHOD="telegram"
TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN"
TELEGRAM_CHAT_ID="YOUR_CHAT_ID"
EMAIL_TO="[email protected]"
EMAIL_FROM="[email protected]"
SMTP_SERVER="smtp.gmail.com:587"
# Logging
LOG_FILE="/var/log/postgresql/backup_verify.log"
REPORT_FILE="/var/lib/postgresql/monitor_state/last_verify_report.json"
# ==================== FUNCTIONS ====================
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}
error_exit() {
log "ERROR: $1"
send_notification "🚨 Backup Verification FAILED" "$1\nHost: $(hostname)\nSee logs: ${LOG_FILE}"
cleanup_test_instance
exit 1
}
send_telegram() {
local title="$1"
local message="$2"
local text="${title}%0A%0A${message}"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${text}" \
-d "parse_mode=HTML" >/dev/null 2>&1 || true
}
send_email() {
local subject="$1"
local body="$2"
echo -e "${body}" | mail -s "${subject}" \
-a "From: ${EMAIL_FROM}" \
-S smtp="${SMTP_SERVER}" \
-S smtp-use-starttls \
"${EMAIL_TO}" >/dev/null 2>&1 || true
}
send_notification() {
[ "${ENABLE_NOTIFICATIONS}" != "yes" ] && return 0
local title="$1"
local message="$2"
case "${NOTIFICATION_METHOD}" in
telegram)
send_telegram "${title}" "${message}"
;;
email)
send_email "${title}" "${message}"
;;
both)
send_telegram "${title}" "${message}"
send_email "${title}" "${message}"
;;
esac
}
check_disk_space() {
log "Checking available disk space..."
local parent_dir=$(dirname "${TEST_DATA_DIR}")
local available_gb=$(df -BG "${parent_dir}" | awk 'NR==2 {print $4}' | sed 's/G//')
# Get production database size
local prod_size_gb=$(psql -h "${PROD_PGHOST}" -p "${PROD_PGPORT}" -U "${PROD_PGUSER}" -d "${PROD_PGDATABASE}" -t -c \
"SELECT ROUND(pg_database_size('${PROD_PGDATABASE}') / 1024.0 / 1024.0 / 1024.0);" 2>/dev/null | xargs || echo "10")
local required_gb=$((prod_size_gb * 2)) # 2x for safety
log "Available: ${available_gb}GB, Required: ~${required_gb}GB"
if [ "${available_gb}" -lt "${required_gb}" ]; then
error_exit "Insufficient disk space. Need ${required_gb}GB, have ${available_gb}GB"
fi
}
get_latest_backup() {
log "Finding latest base backup..."
local latest=$(aws s3 ls "s3://${S3_BUCKET}/${S3_BASE_PREFIX}/" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" | \
grep "PRE" | awk '{print $2}' | sed 's#/##g' | sort -r | head -1)
if [ -z "${latest}" ]; then
error_exit "No base backups found in S3"
fi
log "Latest backup: ${latest}"
echo "${latest}"
}
download_and_extract_backup() {
local backup_name="$1"
log "Downloading base backup: ${backup_name}"
# Create temporary download directory
local download_dir=$(mktemp -d)
# Download all backup files
aws s3 sync "s3://${S3_BUCKET}/${S3_BASE_PREFIX}/${backup_name}/" "${download_dir}/" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--exclude "*" \
--include "*.gpg" \
--include "*.gpg.sha256" \
2>&1 | tee -a "${LOG_FILE}" || error_exit "Failed to download backup"
log "Decrypting backup files..."
cd "${download_dir}"
for file in *.gpg; do
[ ! -f "${file}" ] && continue
log "Decrypting ${file}..."
gpg --decrypt --batch --yes --output "${file%.gpg}" "${file}" >> "${LOG_FILE}" 2>&1 || \
error_exit "Failed to decrypt ${file}"
# Verify checksum if available
if [ -f "${file}.sha256" ]; then
local expected=$(cat "${file}.sha256" | awk '{print $1}')
local actual=$(sha256sum "${file}" | awk '{print $1}')
if [ "${expected}" != "${actual}" ]; then
error_exit "Checksum mismatch for ${file}"
fi
log "Checksum verified for ${file}"
fi
rm "${file}" # Remove encrypted file
done
log "Extracting backup to ${TEST_DATA_DIR}..."
# Create test data directory
rm -rf "${TEST_DATA_DIR}"
mkdir -p "${TEST_DATA_DIR}"
chmod 700 "${TEST_DATA_DIR}"
# Extract base backup
if [ -f "base.tar.gz" ]; then
tar xzf base.tar.gz -C "${TEST_DATA_DIR}" >> "${LOG_FILE}" 2>&1 || \
error_exit "Failed to extract base.tar.gz"
elif [ -f "base.tar" ]; then
tar xf base.tar -C "${TEST_DATA_DIR}" >> "${LOG_FILE}" 2>&1 || \
error_exit "Failed to extract base.tar"
else
error_exit "No base backup archive found"
fi
# Extract WAL backup if separate
if [ -f "pg_wal.tar.gz" ]; then
tar xzf pg_wal.tar.gz -C "${TEST_DATA_DIR}/pg_wal" >> "${LOG_FILE}" 2>&1 || \
error_exit "Failed to extract pg_wal.tar.gz"
elif [ -f "pg_wal.tar" ]; then
tar xf pg_wal.tar -C "${TEST_DATA_DIR}/pg_wal" >> "${LOG_FILE}" 2>&1 || \
error_exit "Failed to extract pg_wal.tar"
fi
# Cleanup download directory
rm -rf "${download_dir}"
log "Backup extraction completed"
}
configure_test_instance() {
log "Configuring test PostgreSQL instance..."
# Update port in postgresql.conf
sed -i "s/^#*port = .*/port = ${TEST_PORT}/" "${TEST_DATA_DIR}/postgresql.conf" || \
sed -i '' "s/^#*port = .*/port = ${TEST_PORT}/" "${TEST_DATA_DIR}/postgresql.conf"
# Disable archiving during test
sed -i "s/^archive_mode = .*/archive_mode = off/" "${TEST_DATA_DIR}/postgresql.conf" || \
sed -i '' "s/^archive_mode = .*/archive_mode = off/" "${TEST_DATA_DIR}/postgresql.conf"
# Configure restore command for recovery
cat >> "${TEST_DATA_DIR}/postgresql.conf" <<EOF
# Recovery configuration (added by backup_verify.sh)
restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
recovery_target = 'immediate'
EOF
# Create recovery signal
touch "${TEST_DATA_DIR}/recovery.signal"
# Set ownership
chown -R postgres:postgres "${TEST_DATA_DIR}"
log "Test instance configured"
}
start_test_instance() {
log "Starting test PostgreSQL instance..."
# Start PostgreSQL with test data directory
sudo -u postgres pg_ctl start \
-D "${TEST_DATA_DIR}" \
-l "${TEST_DATA_DIR}/postgresql.log" \
-o "-p ${TEST_PORT}" \
2>&1 >> "${LOG_FILE}" || error_exit "Failed to start test instance"
# Wait for PostgreSQL to be ready
local timeout=300
local elapsed=0
while [ ${elapsed} -lt ${timeout} ]; do
if pg_isready -p "${TEST_PORT}" -h localhost >/dev/null 2>&1; then
log "Test instance is ready"
return 0
fi
sleep 5
elapsed=$((elapsed + 5))
done
error_exit "Test instance failed to start within ${timeout}s"
}
wait_for_recovery() {
log "Waiting for recovery to complete..."
local timeout=3600 # 1 hour
local elapsed=0
while [ ${elapsed} -lt ${timeout} ]; do
local in_recovery=$(psql -h localhost -p "${TEST_PORT}" -U postgres -t -c \
"SELECT pg_is_in_recovery();" 2>/dev/null | xargs || echo "t")
if [ "${in_recovery}" = "f" ]; then
log "Recovery completed successfully"
return 0
fi
# Check for errors in log
if grep -q "FATAL\|PANIC" "${TEST_DATA_DIR}/postgresql.log"; then
error_exit "Recovery failed with errors. Check ${TEST_DATA_DIR}/postgresql.log"
fi
sleep 10
elapsed=$((elapsed + 10))
# Progress update every minute
if [ $((elapsed % 60)) -eq 0 ]; then
log "Recovery in progress... (${elapsed}s elapsed)"
fi
done
error_exit "Recovery timeout after ${timeout}s"
}
run_checksum_tests() {
[ "${RUN_CHECKSUM_TESTS}" != "yes" ] && return 0
log "Running checksum verification..."
# Get table checksums from test instance
local test_checksums=$(psql -h localhost -p "${TEST_PORT}" -U postgres -d "${PROD_PGDATABASE}" -t -c \
"SELECT schemaname, tablename, pg_relation_size(schemaname||'.'||tablename)
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schemaname, tablename;" 2>/dev/null | wc -l || echo "0")
log "Found ${test_checksums} user tables in restored database"
if [ "${test_checksums}" -eq 0 ]; then
error_exit "No user tables found in restored database"
fi
log "Checksum tests passed"
return 0
}
run_query_tests() {
[ "${RUN_QUERY_TESTS}" != "yes" ] && return 0
log "Running query validation tests..."
local failed_tests=0
# Test 1: Database connectivity
if ! psql -h localhost -p "${TEST_PORT}" -U postgres -d "${PROD_PGDATABASE}" -c "SELECT 1" >/dev/null 2>&1; then
log "FAILED: Database connectivity test"
failed_tests=$((failed_tests + 1))
else
log "PASSED: Database connectivity"
fi
# Test 2: Sample table queries
for table in "${SAMPLE_TABLES[@]}"; do
local count=$(psql -h localhost -p "${TEST_PORT}" -U postgres -d "${PROD_PGDATABASE}" -t -c \
"SELECT COUNT(*) FROM ${table};" 2>/dev/null | xargs || echo "0")
if [ "${count}" -gt 0 ]; then
log "PASSED: Table ${table} has ${count} rows"
else
log "FAILED: Table ${table} is empty or inaccessible"
failed_tests=$((failed_tests + 1))
fi
done
# Test 3: Check for invalid (broken) indexes
local corrupted=$(psql -h localhost -p "${TEST_PORT}" -U postgres -d "${PROD_PGDATABASE}" -t -c \
"SELECT COUNT(*) FROM pg_index WHERE NOT indisvalid;" \
2>/dev/null | xargs || echo "999")
if [ "${corrupted}" -eq 0 ]; then
log "PASSED: No corrupted indexes found"
else
log "FAILED: Found ${corrupted} potentially corrupted indexes"
failed_tests=$((failed_tests + 1))
fi
if [ ${failed_tests} -gt 0 ]; then
error_exit "Query tests failed: ${failed_tests} test(s)"
fi
log "All query tests passed"
return 0
}
run_consistency_tests() {
[ "${RUN_CONSISTENCY_TESTS}" != "yes" ] && return 0
log "Running consistency tests..."
# Compare row counts between production and restored database
local prod_tables=$(psql -h "${PROD_PGHOST}" -p "${PROD_PGPORT}" -U "${PROD_PGUSER}" -d "${PROD_PGDATABASE}" -t -c \
"SELECT COUNT(*) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema');" \
2>/dev/null | xargs || echo "0")
local test_tables=$(psql -h localhost -p "${TEST_PORT}" -U postgres -d "${PROD_PGDATABASE}" -t -c \
"SELECT COUNT(*) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema');" \
2>/dev/null | xargs || echo "0")
log "Production tables: ${prod_tables}, Restored tables: ${test_tables}"
# Allow some variance (tables created after backup)
local diff=$((prod_tables - test_tables))
if [ ${diff} -lt 0 ]; then diff=$((diff * -1)); fi
if [ ${diff} -gt 10 ]; then
log "WARNING: Significant table count difference: ${diff}"
else
log "PASSED: Table count consistency check"
fi
# Check for any error messages in logs
local errors=$(grep -c "ERROR\|FATAL" "${TEST_DATA_DIR}/postgresql.log" 2>/dev/null || echo "0")
log "Errors found in recovery log: ${errors}"
log "Consistency tests completed"
return 0
}
generate_verification_report() {
local status="$1"
local duration="$2"
local backup_name="$3"
local timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
mkdir -p "$(dirname ${REPORT_FILE})"
cat > "${REPORT_FILE}" <<EOF
{
"timestamp": "${timestamp}",
"status": "${status}",
"duration_seconds": ${duration},
"backup_verified": "${backup_name}",
"test_port": ${TEST_PORT},
"test_data_dir": "${TEST_DATA_DIR}",
"tests": {
"checksum_tests": "${RUN_CHECKSUM_TESTS}",
"query_tests": "${RUN_QUERY_TESTS}",
"consistency_tests": "${RUN_CONSISTENCY_TESTS}"
},
"next_verification": "$(date -u -d '7 days' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v+7d +%Y-%m-%dT%H:%M:%SZ)"
}
EOF
log "Verification report saved: ${REPORT_FILE}"
}
stop_test_instance() {
if pg_isready -p "${TEST_PORT}" -h localhost >/dev/null 2>&1; then
log "Stopping test PostgreSQL instance..."
sudo -u postgres pg_ctl stop -D "${TEST_DATA_DIR}" -m fast >> "${LOG_FILE}" 2>&1 || true
sleep 5
fi
}
cleanup_test_instance() {
# Default to the failure policy when called without an argument (e.g. from error_exit)
local cleanup="${1:-${CLEANUP_ON_FAILURE}}"
if [ "${cleanup}" != "yes" ]; then
log "Keeping test instance for investigation: ${TEST_DATA_DIR}"
return 0
fi
stop_test_instance
if [ -d "${TEST_DATA_DIR}" ]; then
log "Cleaning up test data directory..."
rm -rf "${TEST_DATA_DIR}"
log "Cleanup completed"
fi
}
# ==================== MAIN PROCESS ====================
main() {
log "========================================="
log "Starting Automated Backup Verification"
log "========================================="
local start_time=$(date +%s)
local backup_name=""
local status="failed"
# Set timeout for entire process
(
sleep ${TEST_MAX_DURATION}
log "Maximum duration reached (${TEST_MAX_DURATION}s), forcing cleanup..."
cleanup_test_instance "yes"
exit 1
) &
local timeout_pid=$!
trap "kill ${timeout_pid} 2>/dev/null || true" EXIT
# Verify prerequisites
check_disk_space
# Get latest backup
backup_name=$(get_latest_backup)
# Download and extract
download_and_extract_backup "${backup_name}"
# Configure test instance
configure_test_instance
# Start test instance
start_test_instance
# Wait for recovery
wait_for_recovery
# Run verification tests
run_checksum_tests
run_query_tests
run_consistency_tests
# Success!
status="success"
# Stop test instance
stop_test_instance
# Cleanup
if [ "${status}" = "success" ]; then
cleanup_test_instance "${CLEANUP_ON_SUCCESS}"
else
cleanup_test_instance "${CLEANUP_ON_FAILURE}"
fi
# Calculate duration
local end_time=$(date +%s)
local duration=$((end_time - start_time))
# Generate report
generate_verification_report "${status}" "${duration}" "${backup_name}"
# Send notification
if [ "${status}" = "success" ]; then
log "✅ Backup verification SUCCESSFUL"
send_notification "✅ Backup Verification Passed" \
"Backup: ${backup_name}%0ADuration: ${duration}s%0AHost: $(hostname)%0AAll tests passed!"
else
log "❌ Backup verification FAILED"
send_notification "❌ Backup Verification Failed" \
"Backup: ${backup_name}%0ADuration: ${duration}s%0ACheck logs: ${LOG_FILE}"
fi
kill ${timeout_pid} 2>/dev/null || true
log "========================================="
log "Backup verification completed: ${status}"
log "========================================="
}
# Execute main function
main "$@"
exit 0
# PostgreSQL Continuous Archiving Setup Guide
## 🎯 Overview
This is a production-ready PostgreSQL backup system with:
- ✅ **Encrypted WAL archiving** to S3-compatible storage
- ✅ **Automated monitoring** with Telegram/Email alerts
- ✅ **Point-in-time recovery** capability
- ✅ **Integrity verification** with SHA256 checksums
- ✅ **Disaster recovery** procedures
## 📋 Requirements & Recommendations
### Storage Strategy
**❌ Don't rely on local disk** for WAL archives if space is limited. Here's why:
- WAL files accumulate quickly (busy DB = GB/day)
- Local disk fills → PostgreSQL panics
- No off-site protection
**✅ Recommended approach:**
- Stream WAL directly to S3 (primary)
- Optional local cache for fast restores (if space permits)
- Separate base backups to S3
### Prerequisites
```bash
# Install required packages
sudo apt update
sudo apt install -y postgresql-client awscli gnupg2 curl mailutils
# Or for RHEL/CentOS
sudo yum install -y postgresql awscli gnupg2 curl mailx
```
## 🔐 Step 1: Set Up Encryption
### Generate GPG Key (one-time setup)
```bash
# Generate key for postgres user
sudo -u postgres gpg --full-generate-key
# Choose:
# - RSA and RSA (default)
# - 4096 bits
# - Key does not expire (or set expiry)
# - Real name: "PostgreSQL Backup"
# - Email: [email protected]
# Export public key for backup
sudo -u postgres gpg --export -a "[email protected]" > /var/lib/postgresql/backup-public-key.asc
# CRITICAL: Backup private key securely (offline/vault)
sudo -u postgres gpg --export-secret-keys -a "[email protected]" > /tmp/backup-private-key.asc
# Move this to secure offline storage immediately!
# You'll need it for disaster recovery
```
**🔥 CRITICAL: Store private key offline/securely!**
Without it, you cannot decrypt backups!
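For unattended provisioning, the same key can also be generated non-interactively with GPG batch mode (name, email, and the decision to skip a passphrase are placeholders to adapt):

```bash
# Generate the backup key without prompts. %no-protection skips a passphrase,
# which is only acceptable when the keyring itself is tightly protected.
sudo -u postgres gpg --batch --gen-key <<'EOF'
%no-protection
Key-Type: RSA
Key-Length: 4096
Subkey-Type: RSA
Subkey-Length: 4096
Name-Real: PostgreSQL Backup
Name-Email: backup@example.com
Expire-Date: 0
%commit
EOF
```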
## 🗄️ Step 2: Configure S3 Storage
### Choose Provider (pick one):
**Backblaze B2:**
- Endpoint: `https://s3.us-west-000.backblazeb2.com`
- Cheapest: $6/TB/month storage
- Great for backups
**Tebi:**
- Endpoint: `https://s3.tebi.io`
- No egress fees
- Good for frequent restores
**Hetzner Storage Box:**
- Endpoint: `https://fsn1.your-objectstorage.com`
- EU-based, GDPR compliant
### Configure AWS CLI
```bash
# Configure for postgres user
sudo -u postgres aws configure --profile backups
# Enter:
# - AWS Access Key ID: [your-key-id]
# - AWS Secret Access Key: [your-secret]
# - Default region: us-west-000 (or your region)
# - Default output format: json
# Test connectivity
sudo -u postgres aws s3 ls --profile backups \
--endpoint-url https://s3.us-west-000.backblazeb2.com
```
### Create S3 Bucket
```bash
BUCKET_NAME="your-postgres-backups"
S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com"
# Create bucket
sudo -u postgres aws s3 mb "s3://${BUCKET_NAME}" \
--profile backups \
--endpoint-url "${S3_ENDPOINT}"
# Enable versioning (protects against accidental deletes)
sudo -u postgres aws s3api put-bucket-versioning \
--bucket "${BUCKET_NAME}" \
--versioning-configuration Status=Enabled \
--profile backups \
--endpoint-url "${S3_ENDPOINT}"
# Set lifecycle policy (optional - auto-delete old archives)
cat > /tmp/lifecycle.json <<'EOF'
{
"Rules": [
{
"Id": "DeleteOldWAL",
"Status": "Enabled",
"Filter": {
"Prefix": "wal-archives/"
},
"Expiration": {
"Days": 30
}
}
]
}
EOF
sudo -u postgres aws s3api put-bucket-lifecycle-configuration \
--bucket "${BUCKET_NAME}" \
--lifecycle-configuration file:///tmp/lifecycle.json \
--profile backups \
--endpoint-url "${S3_ENDPOINT}"
```
## 📱 Step 3: Configure Notifications
### Option A: Telegram (Recommended)
```bash
# 1. Create bot: Talk to @BotFather on Telegram
# Send: /newbot
# Get your BOT_TOKEN
# 2. Get your CHAT_ID:
# - Start chat with your bot
# - Visit: https://api.telegram.org/bot<BOT_TOKEN>/getUpdates
# - Find "chat":{"id": YOUR_CHAT_ID}
# 3. Update scripts with your tokens
TELEGRAM_BOT_TOKEN="123456789:ABCdefGHIjklMNOpqrsTUVwxyz"
TELEGRAM_CHAT_ID="987654321"
```
### Option B: Email
```bash
# Configure for Gmail (example)
sudo apt install -y mailutils
# Edit /etc/mail.rc or ~/.mailrc
cat >> ~/.mailrc <<EOF
set smtp=smtp://smtp.gmail.com:587
set smtp-use-starttls
set smtp-auth=login
set [email protected]
set smtp-auth-password=your-app-password
set from="[email protected]"
EOF
# Test
echo "Test email" | mail -s "Test" [email protected]
```
## 🚀 Step 4: Install Scripts
```bash
# Create directories
sudo mkdir -p /usr/local/bin/pg_backup
sudo mkdir -p /var/lib/postgresql/monitor_state
sudo mkdir -p /var/log/postgresql
sudo chown -R postgres:postgres /var/lib/postgresql/monitor_state
sudo chown postgres:postgres /var/log/postgresql
# Download scripts (or copy from artifacts above)
sudo cp archive_wal.sh /usr/local/bin/pg_backup/
sudo cp restore_wal.sh /usr/local/bin/pg_backup/
sudo cp backup_monitor.sh /usr/local/bin/pg_backup/
# Make executable
sudo chmod +x /usr/local/bin/pg_backup/*.sh
sudo chown postgres:postgres /usr/local/bin/pg_backup/*.sh
# Update configuration in each script:
# - S3_ENDPOINT, S3_BUCKET, S3_REGION
# - GPG_RECIPIENT
# - TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID
# - AWS_PROFILE
```
### Edit Each Script Configuration Section
```bash
# Edit archive_wal.sh
sudo -u postgres nano /usr/local/bin/pg_backup/archive_wal.sh
# Update: S3_ENDPOINT, S3_BUCKET, GPG_RECIPIENT, TELEGRAM_BOT_TOKEN, etc.
# Edit restore_wal.sh
sudo -u postgres nano /usr/local/bin/pg_backup/restore_wal.sh
# Update: Same settings as archive script
# Edit backup_monitor.sh
sudo -u postgres nano /usr/local/bin/pg_backup/backup_monitor.sh
# Update: Same settings + monitoring thresholds
```
## ⚙️ Step 5: Configure PostgreSQL
### Edit postgresql.conf
```bash
sudo -u postgres nano /etc/postgresql/*/main/postgresql.conf
# (Path may vary: /var/lib/pgsql/data/postgresql.conf on RHEL)
```
Add/modify these settings:
```ini
# WAL Settings
wal_level = replica
archive_mode = on
archive_command = '/usr/local/bin/pg_backup/archive_wal.sh %p %f'
archive_timeout = 300 # Force WAL switch every 5 minutes
# For better performance during backups
max_wal_senders = 10
wal_keep_size = 1GB # PostgreSQL 13+
# wal_keep_segments = 64 # PostgreSQL 12 and earlier
# Checkpoint settings (tune based on workload)
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
```
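If you prefer not to edit the file by hand, the same settings can be applied with `ALTER SYSTEM` (written to `postgresql.auto.conf`); this is an equivalent alternative, not an extra step, and `wal_level`/`archive_mode` still require the restart below:
```bash
# Alternative to manual editing: ALTER SYSTEM writes these values to postgresql.auto.conf
sudo -u postgres psql <<'SQL'
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = '/usr/local/bin/pg_backup/archive_wal.sh %p %f';
ALTER SYSTEM SET archive_timeout = '300';
SQL

# After restarting (next step), confirm the values took effect
sudo -u postgres psql -c "SHOW archive_mode;" -c "SHOW archive_command;"
```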
### Restart PostgreSQL
```bash
# Ubuntu/Debian
sudo systemctl restart postgresql
# RHEL/CentOS
sudo systemctl restart postgresql-14 # adjust version
# Verify it's running
sudo systemctl status postgresql
# Check archive command is working
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"
```
## 📦 Step 6: Take Initial Base Backup
```bash
# Create base backup directory in S3
BACKUP_NAME="base-$(date +%Y%m%d-%H%M%S)"
S3_BASE_PATH="s3://your-postgres-backups/base-backups/$(hostname)"
# Take base backup
sudo -u postgres pg_basebackup \
-D /tmp/base_backup \
-Ft -z -Xs -P \
-l "${BACKUP_NAME}"
# Encrypt and upload
cd /tmp/base_backup
for file in *.tar.gz; do
gpg --encrypt --recipient [email protected] \
--output "${file}.gpg" "${file}"
aws s3 cp "${file}.gpg" "${S3_BASE_PATH}/${BACKUP_NAME}/${file}.gpg" \
--profile backups \
--endpoint-url "https://s3.us-west-000.backblazeb2.com"
done
# Cleanup
rm -rf /tmp/base_backup
# Verify upload
aws s3 ls "${S3_BASE_PATH}/${BACKUP_NAME}/" \
--profile backups \
--endpoint-url "https://s3.us-west-000.backblazeb2.com"
```
## 📊 Step 7: Set Up Monitoring
```bash
# Add to postgres user's crontab
sudo -u postgres crontab -e
# Add these lines:
# Monitor every 15 minutes
*/15 * * * * /usr/local/bin/pg_backup/backup_monitor.sh >> /var/log/postgresql/backup_monitor.log 2>&1
# Take weekly base backup (Sunday 2 AM)
0 2 * * 0 /usr/local/bin/pg_backup/base_backup.sh >> /var/log/postgresql/base_backup.log 2>&1
# Daily health report (8 AM)
0 8 * * * /usr/local/bin/pg_backup/backup_monitor.sh | mail -s "PostgreSQL Backup Health Report" [email protected]
```
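After saving the crontab, confirm the entries were installed and that the monitor log starts growing after the first 15-minute interval:
```bash
# List the postgres user's crontab, filtered to the backup jobs
sudo -u postgres crontab -l | grep pg_backup

# The monitor log should exist and grow after the first scheduled run
ls -lh /var/log/postgresql/backup_monitor.log
```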
## 🧪 Step 8: Test Your Setup
### Test Archive Command
```bash
# Force a WAL switch
sudo -u postgres psql -c "SELECT pg_switch_wal();"
# Check archiving status
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"
# Verify file in S3
aws s3 ls "s3://your-postgres-backups/wal-archives/$(hostname)/" \
--profile backups \
--endpoint-url "https://s3.us-west-000.backblazeb2.com" \
| tail -5
```
### Test Restore Command
```bash
# Get a recent WAL file name
RECENT_WAL=$(aws s3 ls "s3://your-postgres-backups/wal-archives/$(hostname)/" \
--profile backups \
--endpoint-url "https://s3.us-west-000.backblazeb2.com" \
| tail -1 | awk '{print $4}' | sed 's/\.gpg$//')
# Test restore
sudo -u postgres /usr/local/bin/pg_backup/restore_wal.sh \
"${RECENT_WAL}" \
"/tmp/test_restore_${RECENT_WAL}"
# Verify file was restored
ls -lh /tmp/test_restore_${RECENT_WAL}
rm /tmp/test_restore_${RECENT_WAL}
```
### Test Monitoring
```bash
# Run monitor manually
sudo -u postgres /usr/local/bin/pg_backup/backup_monitor.sh
# Check logs
tail -f /var/log/postgresql/backup_monitor.log
# Verify you received notification
```
## 🔥 Step 9: Disaster Recovery Procedure
### Full Recovery Scenario
```bash
# 1. Stop PostgreSQL if running
sudo systemctl stop postgresql
# 2. Backup existing data (if any)
sudo mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main.old
# 3. Create new data directory
sudo mkdir -p /var/lib/postgresql/14/main
sudo chown postgres:postgres /var/lib/postgresql/14/main
sudo chmod 700 /var/lib/postgresql/14/main
# 4. Download and restore base backup
BACKUP_NAME="base-20240115-020000" # Your backup name
S3_BASE_PATH="s3://your-postgres-backups/base-backups/$(hostname)"
cd /tmp
aws s3 sync "${S3_BASE_PATH}/${BACKUP_NAME}/" . \
--profile backups \
--endpoint-url "https://s3.us-west-000.backblazeb2.com"
# 5. Decrypt base backup
for file in *.gpg; do
gpg --decrypt --output "${file%.gpg}" "${file}"
done
# 6. Extract to data directory
sudo tar xzf /tmp/base.tar.gz -C /var/lib/postgresql/14/main
sudo tar xzf /tmp/pg_wal.tar.gz -C /var/lib/postgresql/14/main/pg_wal   # WAL tarball extracts into pg_wal/
# 7. Fix permissions
sudo chown -R postgres:postgres /var/lib/postgresql/14/main
# 8. Configure recovery
sudo -u postgres nano /var/lib/postgresql/14/main/postgresql.conf
# Add/update:
restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
recovery_target_time = '2024-01-15 14:30:00' # Optional: point-in-time
# Or: recovery_target = 'immediate' # For latest possible
# 9. Create recovery signal
sudo -u postgres touch /var/lib/postgresql/14/main/recovery.signal
# 10. Start PostgreSQL
sudo systemctl start postgresql
# 11. Monitor recovery
sudo -u postgres tail -f /var/log/postgresql/postgresql-14-main.log
# 12. Once recovery complete, verify database
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
# Should return: f (false) when recovery is complete
# 13. Remove recovery configuration
# PostgreSQL will automatically remove recovery.signal
# Clean up restore_command from postgresql.conf if needed
```
## 🛡️ Security Checklist
- ✅ GPG private key backed up offline
- ✅ S3 credentials secured (never commit to git)
- ✅ Archive directory permissions: 700 (postgres only)
- ✅ S3 bucket versioning enabled
- ✅ MFA enabled on S3 provider account
- ✅ Monitoring alerts configured
- ✅ Recovery procedure tested
- ✅ Base backups scheduled weekly
- ✅ Lifecycle policy set for old archives
## 📈 Performance Tuning
### For High-Volume Databases
```ini
# postgresql.conf adjustments
archive_timeout = 60 # Shorter timeout for busy systems
wal_compression = on # Reduce WAL size (PostgreSQL 9.5+)
checkpoint_timeout = 30min # Longer checkpoints
max_wal_size = 5GB # Allow more WAL before checkpoint
```
### For Low-Volume Databases
```ini
archive_timeout = 300 # 5 minutes
checkpoint_timeout = 10min
max_wal_size = 1GB
```
## 🔍 Troubleshooting
### Archive command failing?
```bash
# Check logs
tail -f /var/log/postgresql/wal_archive.log
# Test manually
sudo -u postgres /usr/local/bin/pg_backup/archive_wal.sh \
/var/lib/postgresql/14/main/pg_wal/000000010000000000000001 \
000000010000000000000001
# Check S3 permissions
aws s3 ls s3://your-postgres-backups/ \
--profile backups \
--endpoint-url "https://s3.us-west-000.backblazeb2.com"
```
### pg_wal directory filling up?
```bash
# Check archive status
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"
# If archive_command is failing, fix it first!
# Then PostgreSQL will archive and clean up automatically
```
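To quantify how far behind archiving is, compare the archiver statistics with the number of segments still waiting in `archive_status` (the data directory path below is the Debian/Ubuntu default used throughout this guide):
```bash
# Segments still waiting to be archived (each has a corresponding .ready file)
sudo -u postgres ls /var/lib/postgresql/14/main/pg_wal/archive_status/ | grep -c '\.ready$'

# Last archived segment, last failure (if any), and the failure count
sudo -u postgres psql -c \
  "SELECT last_archived_wal, last_archived_time, last_failed_wal, failed_count FROM pg_stat_archiver;"
```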
## 📝 Maintenance Tasks
### Weekly
- Review backup monitoring dashboard
- Verify base backup completed
- Check S3 storage costs
### Monthly
- Test restore procedure (in staging)
- Review and prune old backups
- Update lifecycle policies if needed
### Quarterly
- Full disaster recovery drill
- Review and update documentation
- Rotate GPG keys if policy requires
## 🎓 Additional Resources
- [PostgreSQL PITR Documentation](https://www.postgresql.org/docs/current/continuous-archiving.html)
- [AWS S3 CLI Reference](https://docs.aws.amazon.com/cli/latest/reference/s3/)
- [GPG Documentation](https://gnupg.org/documentation/)
---
**Questions? Issues?** Open a ticket or contact your DBA team.
#!/bin/bash
#
# PostgreSQL Backup System - Core Library
# Shared functions and utilities for all backup scripts
#
# Source this file in all scripts: source /usr/local/bin/pg_backup/lib/common.sh
# ==================== CONFIGURATION LOADING ====================
# Default configuration paths
CONFIG_FILE="${PG_BACKUP_CONFIG:-/etc/pg_backup/backup.conf}"
ENV_FILE="${PG_BACKUP_ENV:-.env}"
load_config() {
local config_loaded=0
# Try loading from environment file first
if [ -f "${ENV_FILE}" ]; then
set -a
source "${ENV_FILE}"
set +a
config_loaded=1
[ "${DEBUG_MODE:-no}" = "yes" ] && echo "Loaded config from: ${ENV_FILE}"
fi
# Try loading from config file
if [ -f "${CONFIG_FILE}" ]; then
source "${CONFIG_FILE}"
config_loaded=1
[ "${DEBUG_MODE:-no}" = "yes" ] && echo "Loaded config from: ${CONFIG_FILE}"
fi
if [ ${config_loaded} -eq 0 ]; then
echo "ERROR: No configuration file found!"
echo "Looked for: ${ENV_FILE} and ${CONFIG_FILE}"
echo "Run: sudo pg-backup-configure to create configuration"
return 1
fi
# Auto-configure S3 endpoint based on provider
configure_s3_endpoint
# Validate required configuration
validate_config
return 0
}
configure_s3_endpoint() {
# Auto-configure S3 endpoint if not set
if [ -z "${S3_ENDPOINT}" ] || [ "${S3_ENDPOINT}" = "auto" ]; then
case "${S3_PROVIDER,,}" in
backblaze|b2)
S3_ENDPOINT="https://s3.${S3_REGION}.backblazeb2.com"
;;
tebi)
S3_ENDPOINT="https://s3.tebi.io"
;;
hetzner)
S3_ENDPOINT="https://${S3_REGION}.your-objectstorage.com"
;;
aws)
S3_ENDPOINT="" # AWS uses default
;;
custom)
if [ -z "${S3_ENDPOINT}" ]; then
echo "ERROR: S3_PROVIDER=custom requires S3_ENDPOINT to be set"
return 1
fi
;;
*)
echo "ERROR: Unknown S3_PROVIDER: ${S3_PROVIDER}"
echo "Supported: backblaze, tebi, hetzner, aws, custom"
return 1
;;
esac
fi
}
validate_config() {
local errors=0
# Required S3 configuration
if [ -z "${S3_BUCKET}" ]; then
echo "ERROR: S3_BUCKET is not configured"
errors=$((errors + 1))
fi
# Check AWS credentials method
if [ -z "${AWS_PROFILE}" ] && [ -z "${AWS_ACCESS_KEY_ID}" ]; then
echo "ERROR: Either AWS_PROFILE or AWS_ACCESS_KEY_ID must be set"
errors=$((errors + 1))
fi
# Check notification configuration
if [ "${ENABLE_NOTIFICATIONS}" = "yes" ]; then
case "${NOTIFICATION_METHOD}" in
telegram)
if [ -z "${TELEGRAM_BOT_TOKEN}" ] || [ -z "${TELEGRAM_CHAT_ID}" ]; then
echo "ERROR: Telegram credentials not configured"
errors=$((errors + 1))
fi
;;
email)
if [ -z "${EMAIL_TO}" ]; then
echo "ERROR: EMAIL_TO not configured"
errors=$((errors + 1))
fi
;;
both)
if [ -z "${TELEGRAM_BOT_TOKEN}" ] || [ -z "${TELEGRAM_CHAT_ID}" ] || [ -z "${EMAIL_TO}" ]; then
echo "ERROR: Notification credentials not fully configured"
errors=$((errors + 1))
fi
;;
esac
fi
if [ ${errors} -gt 0 ]; then
echo "Configuration validation failed with ${errors} error(s)"
return 1
fi
return 0
}
# ==================== LOGGING FUNCTIONS ====================
setup_logging() {
local log_file="${1}"
# Create log directory if needed
local log_dir=$(dirname "${log_file}")
mkdir -p "${log_dir}" 2>/dev/null || sudo mkdir -p "${log_dir}"
# Set permissions
if [ -w "${log_dir}" ]; then
touch "${log_file}"
chmod 640 "${log_file}"
fi
}
log() {
local level="${1:-INFO}"
shift
local message="$*"
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
local caller="${BASH_SOURCE[2]##*/}:${BASH_LINENO[1]}"
echo "[${timestamp}] [${level}] [${caller}] ${message}" | tee -a "${CURRENT_LOG_FILE:-/tmp/pg_backup.log}"
# Also log to syslog if available
if command -v logger >/dev/null 2>&1; then
logger -t pg_backup -p "user.${level,,}" "${message}"
fi
}
log_info() {
log "INFO" "$@"
}
log_warn() {
log "WARN" "$@"
}
log_error() {
log "ERROR" "$@"
}
log_debug() {
[ "${DEBUG_MODE:-no}" = "yes" ] && log "DEBUG" "$@"
}
rotate_log() {
local log_file="${1}"
local max_size_bytes=$((${MAX_LOG_SIZE_MB:-100} * 1024 * 1024))
if [ -f "${log_file}" ]; then
local size=$(stat -f%z "${log_file}" 2>/dev/null || stat -c%s "${log_file}" 2>/dev/null || echo "0")
if [ "${size}" -gt "${max_size_bytes}" ]; then
local keep_logs=${KEEP_OLD_LOGS:-3}
# Rotate old logs
for i in $(seq $((keep_logs - 1)) -1 1); do
[ -f "${log_file}.${i}" ] && mv "${log_file}.${i}" "${log_file}.$((i + 1))"
done
# Move current log
mv "${log_file}" "${log_file}.1"
gzip "${log_file}.1" 2>/dev/null || true
# Remove old logs
find "$(dirname ${log_file})" -name "$(basename ${log_file}).*" -type f | \
sort -rn | tail -n +$((keep_logs + 1)) | xargs rm -f 2>/dev/null || true
log_info "Log rotated: ${log_file}"
fi
fi
}
# ==================== NOTIFICATION FUNCTIONS ====================
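# Usage example (values illustrative; dispatch depends on NOTIFICATION_METHOD from the config):
#   send_notification "WARNING" "WAL archive lag" "42 segments pending upload"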
send_notification() {
[ "${ENABLE_NOTIFICATIONS:-yes}" != "yes" ] && return 0
local severity="${1}" # INFO, WARNING, CRITICAL
local title="${2}"
local message="${3}"
case "${NOTIFICATION_METHOD:-none}" in
telegram)
send_telegram_notification "${severity}" "${title}" "${message}"
;;
email)
send_email_notification "${severity}" "${title}" "${message}"
;;
both)
send_telegram_notification "${severity}" "${title}" "${message}"
send_email_notification "${severity}" "${title}" "${message}"
;;
esac
}
send_telegram_notification() {
local severity="${1}"
local title="${2}"
local message="${3}"
[ -z "${TELEGRAM_BOT_TOKEN}" ] && return 1
[ -z "${TELEGRAM_CHAT_ID}" ] && return 1
local icon="ℹ️"
case "${severity}" in
WARNING) icon="⚠️" ;;
CRITICAL) icon="🚨" ;;
SUCCESS) icon="✅" ;;
esac
local text="${icon} <b>${title}</b>%0A%0A${message}%0A%0AHost: $(hostname)%0ATime: $(date '+%Y-%m-%d %H:%M:%S')"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${text}" \
-d "parse_mode=HTML" \
--max-time 10 \
>/dev/null 2>&1 || log_warn "Failed to send Telegram notification"
}
send_email_notification() {
local severity="${1}"
local title="${2}"
local message="${3}"
[ -z "${EMAIL_TO}" ] && return 1
local subject="[${severity}] ${title} - $(hostname)"
local body="$(echo -e "${message}")\n\nHost: $(hostname)\nTime: $(date '+%Y-%m-%d %H:%M:%S')"
if [ -n "${SMTP_USER}" ] && [ -n "${SMTP_PASSWORD}" ]; then
echo "${body}" | mail -s "${subject}" \
-a "From: ${EMAIL_FROM}" \
-S smtp="${SMTP_SERVER}" \
-S smtp-use-starttls \
-S smtp-auth=login \
-S smtp-auth-user="${SMTP_USER}" \
-S smtp-auth-password="${SMTP_PASSWORD}" \
"${EMAIL_TO}" >/dev/null 2>&1 || log_warn "Failed to send email notification"
else
echo "${body}" | mail -s "${subject}" \
-a "From: ${EMAIL_FROM}" \
"${EMAIL_TO}" >/dev/null 2>&1 || log_warn "Failed to send email notification"
fi
}
check_alert_cooldown() {
local cooldown_file="${STATE_DIR}/last_alert_$(echo ${1} | tr ' ' '_')"
local cooldown_seconds=${ALERT_COOLDOWN_SECONDS:-3600}
mkdir -p "${STATE_DIR}"
if [ -f "${cooldown_file}" ]; then
local last_alert=$(cat "${cooldown_file}")
local now=$(date +%s)
local elapsed=$((now - last_alert))
if [ ${elapsed} -lt ${cooldown_seconds} ]; then
log_debug "Alert cooldown active: $((cooldown_seconds - elapsed))s remaining"
return 1
fi
fi
date +%s > "${cooldown_file}"
return 0
}
# ==================== AWS/S3 FUNCTIONS ====================
get_aws_command() {
local cmd="aws"
# Add profile if configured
if [ -n "${AWS_PROFILE}" ]; then
cmd="${cmd} --profile ${AWS_PROFILE}"
fi
# Add endpoint if configured
if [ -n "${S3_ENDPOINT}" ]; then
cmd="${cmd} --endpoint-url ${S3_ENDPOINT}"
fi
# Add region if configured
if [ -n "${S3_REGION}" ]; then
cmd="${cmd} --region ${S3_REGION}"
fi
echo "${cmd}"
}
s3_upload() {
local source="${1}"
local s3_key="${2}"
local metadata="${3:-}"
local aws_cmd=$(get_aws_command)
local s3_path="s3://${S3_BUCKET}/${s3_key}"
local upload_cmd="${aws_cmd} s3 cp ${source} ${s3_path} --no-progress"
if [ -n "${metadata}" ]; then
upload_cmd="${upload_cmd} --metadata ${metadata}"
fi
log_debug "Uploading: ${source} -> ${s3_path}"
if ${upload_cmd} 2>&1 | tee -a "${CURRENT_LOG_FILE}"; then
log_info "Upload successful: ${s3_key}"
return 0
else
log_error "Upload failed: ${s3_key}"
return 1
fi
}
s3_download() {
local s3_key="${1}"
local destination="${2}"
local aws_cmd=$(get_aws_command)
local s3_path="s3://${S3_BUCKET}/${s3_key}"
log_debug "Downloading: ${s3_path} -> ${destination}"
if ${aws_cmd} s3 cp "${s3_path}" "${destination}" --no-progress 2>&1 | tee -a "${CURRENT_LOG_FILE}"; then
log_info "Download successful: ${s3_key}"
return 0
else
log_error "Download failed: ${s3_key}"
return 1
fi
}
s3_exists() {
local s3_key="${1}"
local aws_cmd=$(get_aws_command)
${aws_cmd} s3 ls "s3://${S3_BUCKET}/${s3_key}" >/dev/null 2>&1
}
s3_list() {
local prefix="${1}"
local aws_cmd=$(get_aws_command)
${aws_cmd} s3 ls "s3://${S3_BUCKET}/${prefix}" 2>/dev/null
}
# ==================== ENCRYPTION FUNCTIONS ====================
gpg_encrypt() {
local input="${1}"
local output="${2}"
[ "${GPG_ENCRYPT}" != "yes" ] && return 1
log_debug "Encrypting: ${input} -> ${output}"
gpg --encrypt \
--recipient "${GPG_RECIPIENT}" \
--trust-model always \
--compress-algo ZLIB \
--cipher-algo AES256 \
--batch \
--yes \
--output "${output}" \
"${input}" 2>&1 | tee -a "${CURRENT_LOG_FILE}"
}
gpg_decrypt() {
local input="${1}"
local output="${2}"
log_debug "Decrypting: ${input} -> ${output}"
gpg --decrypt \
--batch \
--yes \
--output "${output}" \
"${input}" 2>&1 | tee -a "${CURRENT_LOG_FILE}"
}
# ==================== CHECKSUM FUNCTIONS ====================
calculate_checksum() {
local file="${1}"
sha256sum "${file}" | awk '{print $1}'
}
verify_checksum() {
local file="${1}"
local expected="${2}"
local actual=$(calculate_checksum "${file}")
if [ "${expected}" = "${actual}" ]; then
log_debug "Checksum verified: ${file}"
return 0
else
log_error "Checksum mismatch: ${file}"
log_error "Expected: ${expected}"
log_error "Actual: ${actual}"
return 1
fi
}
# ==================== POSTGRESQL FUNCTIONS ====================
pg_exec() {
local query="${1}"
PGPASSWORD="${PGPASSWORD:-}" psql \
-h "${PGHOST}" \
-p "${PGPORT}" \
-U "${PGUSER}" \
-d "${PGDATABASE}" \
-t \
-c "${query}" 2>&1 | xargs
}
pg_is_running() {
pg_isready -h "${PGHOST}" -p "${PGPORT}" >/dev/null 2>&1
}
pg_get_wal_position() {
pg_exec "SELECT pg_current_wal_lsn();"
}
pg_switch_wal() {
pg_exec "SELECT pg_switch_wal();"
}
# ==================== UTILITY FUNCTIONS ====================
ensure_directory() {
local dir="${1}"
local owner="${2:-postgres:postgres}"
local perms="${3:-750}"
if [ ! -d "${dir}" ]; then
mkdir -p "${dir}" 2>/dev/null || sudo mkdir -p "${dir}"
sudo chown "${owner}" "${dir}"
sudo chmod "${perms}" "${dir}"
log_debug "Created directory: ${dir}"
fi
}
cleanup_temp() {
local temp_dir="${1}"
if [ -d "${temp_dir}" ]; then
log_debug "Cleaning up: ${temp_dir}"
rm -rf "${temp_dir}"
fi
}
format_bytes() {
local bytes="${1}"
numfmt --to=iec-i --suffix=B "${bytes}" 2>/dev/null || echo "${bytes} bytes"
}
get_hostname() {
hostname -s 2>/dev/null || hostname
}
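# Usage example (illustrative): retry a flaky operation with the configured delay, e.g.
#   retry_command "s3_upload '/tmp/000000010000000000000042.gpg' 'wal-archives/$(get_hostname)/000000010000000000000042.gpg'"
# Note: the command string is re-run with eval, so quote embedded arguments carefully.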
retry_command() {
local max_attempts="${MAX_RETRY_ATTEMPTS:-3}"
local delay="${RETRY_DELAY_SECONDS:-60}"
local attempt=1
local cmd="$*"
while [ ${attempt} -le ${max_attempts} ]; do
log_debug "Attempt ${attempt}/${max_attempts}: ${cmd}"
if eval "${cmd}"; then
return 0
fi
if [ ${attempt} -lt ${max_attempts} ]; then
log_warn "Command failed, retrying in ${delay}s..."
sleep ${delay}
fi
attempt=$((attempt + 1))
done
log_error "Command failed after ${max_attempts} attempts"
return 1
}
# ==================== ERROR HANDLING ====================
error_exit() {
local message="${1}"
local exit_code="${2:-1}"
log_error "${message}"
send_notification "CRITICAL" "Backup Operation Failed" "${message}"
exit ${exit_code}
}
# ==================== INITIALIZATION ====================
# Auto-load configuration if sourced
if [ "${BASH_SOURCE[0]}" != "${0}" ]; then
load_config || echo "Warning: Failed to load configuration"
fi

🚨 PostgreSQL Disaster Recovery Runbook

Quick Reference Card

Print this page and keep it accessible!

Recovery Type       Use When                   RTO          RPO
Full Recovery       Complete data loss         30-60 min    ~5 min
Point-in-Time       Accidental DELETE/DROP     30-60 min    Exact moment
Partial Recovery    Corrupted tables           20-40 min    ~5 min

Emergency Contacts:


📋 Pre-Recovery Checklist

BEFORE you start recovery, verify:

  • Backup availability: Can you access S3 bucket?
  • GPG private key: Do you have the decryption key?
  • Disk space: 2x current database size available?
  • Network access: Can reach S3 endpoint?
  • Team notification: Alert stakeholders about downtime
  • Change freeze: Stop all deployments/changes
  • Documentation: Have backup name/timestamp?

Critical Files Checklist:

# Verify these exist before starting (each check prints OK or MISSING)
[ -f /var/lib/postgresql/backup-private-key.asc ] && echo "OK: GPG key" || echo "MISSING: GPG key"
[ -f ~/.aws/credentials ] && echo "OK: S3 credentials" || echo "MISSING: S3 credentials"
[ -x /usr/local/bin/pg_backup/restore_wal.sh ] && echo "OK: restore script" || echo "MISSING: restore script"

🎯 Scenario 1: Full Database Recovery (Complete Data Loss)

When to use: Server crashed, disk failed, ransomware, accidental DROP DATABASE

Step-by-Step Procedure

1. Initial Assessment (5 minutes)

# Document the incident
INCIDENT_ID="DR-$(date +%Y%m%d-%H%M%S)"
INCIDENT_LOG="/tmp/recovery_${INCIDENT_ID}.log"
START_TIME=$(date +%s)   # epoch seconds; used later to report total recovery duration

echo "Incident: ${INCIDENT_ID}" | tee -a "${INCIDENT_LOG}"
echo "Started: $(date)" | tee -a "${INCIDENT_LOG}"
echo "Performed by: $(whoami)" | tee -a "${INCIDENT_LOG}"

# Check what's available
df -h | tee -a "${INCIDENT_LOG}"
psql --version | tee -a "${INCIDENT_LOG}"
aws --version | tee -a "${INCIDENT_LOG}"

2. Stop PostgreSQL (if running)

# Stop the service
sudo systemctl stop postgresql

# Verify it's stopped
sudo systemctl status postgresql

# Kill any remaining connections (if needed)
sudo killall -9 postgres || true

3. Backup Current State (if possible)

# If data directory still exists, back it up
CURRENT_DATA_DIR="/var/lib/postgresql/14/main"
BACKUP_OLD_DIR="/var/lib/postgresql/14/main.backup.${INCIDENT_ID}"

if [ -d "${CURRENT_DATA_DIR}" ]; then
    sudo mv "${CURRENT_DATA_DIR}" "${BACKUP_OLD_DIR}"
    echo "Old data backed up to: ${BACKUP_OLD_DIR}" | tee -a "${INCIDENT_LOG}"
fi

4. Find the Backup to Restore

# List available backups
aws s3 ls s3://your-postgres-backups/base-backups/$(hostname)/ \
    --profile backups \
    --endpoint-url https://s3.us-west-000.backblazeb2.com

# Choose the backup (usually the latest)
# Format: base-YYYYMMDD-HHMMSS
BACKUP_TO_RESTORE="base-20240115-020000"  # ← CHANGE THIS

echo "Restoring backup: ${BACKUP_TO_RESTORE}" | tee -a "${INCIDENT_LOG}"

5. Download Base Backup

# Create download directory
DOWNLOAD_DIR="/tmp/restore_${INCIDENT_ID}"
mkdir -p "${DOWNLOAD_DIR}"
cd "${DOWNLOAD_DIR}"

# Download backup files
echo "Downloading backup from S3..." | tee -a "${INCIDENT_LOG}"

aws s3 sync \
    "s3://your-postgres-backups/base-backups/$(hostname)/${BACKUP_TO_RESTORE}/" \
    "${DOWNLOAD_DIR}/" \
    --profile backups \
    --endpoint-url https://s3.us-west-000.backblazeb2.com \
    --region us-west-000 | tee -a "${INCIDENT_LOG}"

# Verify download
ls -lh "${DOWNLOAD_DIR}" | tee -a "${INCIDENT_LOG}"

6. Decrypt Backup Files

# Import GPG key if not already done
gpg --import /var/lib/postgresql/backup-private-key.asc

# Decrypt all files
echo "Decrypting backup files..." | tee -a "${INCIDENT_LOG}"

for file in *.gpg; do
    echo "Decrypting ${file}..." | tee -a "${INCIDENT_LOG}"
    
    gpg --decrypt \
        --batch \
        --yes \
        --output "${file%.gpg}" \
        "${file}" 2>&1 | tee -a "${INCIDENT_LOG}"
    
    # Verify decryption
    if [ -f "${file%.gpg}" ]; then
        echo "${file%.gpg} decrypted successfully" | tee -a "${INCIDENT_LOG}"
        rm "${file}"  # Remove encrypted file
    else
        echo "✗ Failed to decrypt ${file}" | tee -a "${INCIDENT_LOG}"
        exit 1
    fi
done

7. Verify Checksums

# Verify checksums if available
for file in *.sha256; do
    [ ! -f "${file}" ] && continue
    
    target="${file%.sha256}"
    expected=$(cat "${file}" | awk '{print $1}')
    actual=$(sha256sum "${target}" | awk '{print $1}')
    
    if [ "${expected}" = "${actual}" ]; then
        echo "✓ Checksum verified: ${target}" | tee -a "${INCIDENT_LOG}"
    else
        echo "✗ CHECKSUM MISMATCH: ${target}" | tee -a "${INCIDENT_LOG}"
        echo "Expected: ${expected}" | tee -a "${INCIDENT_LOG}"
        echo "Actual: ${actual}" | tee -a "${INCIDENT_LOG}"
        exit 1
    fi
done

8. Extract Base Backup

# Create new data directory
sudo mkdir -p "${CURRENT_DATA_DIR}"
sudo chown postgres:postgres "${CURRENT_DATA_DIR}"
sudo chmod 700 "${CURRENT_DATA_DIR}"

echo "Extracting base backup..." | tee -a "${INCIDENT_LOG}"

# Extract base backup
if [ -f "base.tar.gz" ]; then
    sudo tar xzf base.tar.gz -C "${CURRENT_DATA_DIR}"
elif [ -f "base.tar" ]; then
    sudo tar xf base.tar -C "${CURRENT_DATA_DIR}"
else
    echo "ERROR: No base backup file found!" | tee -a "${INCIDENT_LOG}"
    exit 1
fi

# Extract WAL backup if separate
if [ -f "pg_wal.tar.gz" ]; then
    sudo tar xzf pg_wal.tar.gz -C "${CURRENT_DATA_DIR}/pg_wal"
elif [ -f "pg_wal.tar" ]; then
    sudo tar xf pg_wal.tar -C "${CURRENT_DATA_DIR}/pg_wal"
fi

# Fix permissions
sudo chown -R postgres:postgres "${CURRENT_DATA_DIR}"

echo "✓ Base backup extracted" | tee -a "${INCIDENT_LOG}"

9. Configure Recovery

# Edit postgresql.conf
sudo -u postgres nano "${CURRENT_DATA_DIR}/postgresql.conf"

# Add/update these settings:
# restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
# recovery_target = 'immediate'  # Or set recovery_target_time

# Or do it via script:
cat << 'EOF' | sudo tee -a "${CURRENT_DATA_DIR}/postgresql.conf"

# Recovery Configuration (added during DR)
restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
recovery_target = 'immediate'
EOF

10. Create Recovery Signal

# Create recovery.signal file
sudo -u postgres touch "${CURRENT_DATA_DIR}/recovery.signal"

echo "✓ Recovery configured" | tee -a "${INCIDENT_LOG}"

11. Start PostgreSQL

echo "Starting PostgreSQL for recovery..." | tee -a "${INCIDENT_LOG}"

# Start PostgreSQL
sudo systemctl start postgresql

# Monitor startup
sleep 5
sudo systemctl status postgresql

12. Monitor Recovery Progress

# Watch recovery in real-time
echo "Monitoring recovery progress..." | tee -a "${INCIDENT_LOG}"
echo "Press Ctrl+C to stop monitoring (recovery continues)"

# Tail the log
sudo tail -f /var/log/postgresql/postgresql-14-main.log

# In another terminal, check recovery status
watch -n 5 'sudo -u postgres psql -c "SELECT pg_is_in_recovery();"'

Recovery progress indicators (a numeric check follows this list):

  • Look for: redo starts, restored log file, recovery completed
  • Typical speed: 50-100 WAL files/minute
  • Large databases: May take 30-60 minutes
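
For a numeric view of progress (rather than scanning the log), the query below shows the last replayed WAL location and transaction timestamp; run it from a second terminal and re-run after a minute; both values should keep advancing while recovery is underway:

# Numeric replay progress check (illustrative)
sudo -u postgres psql -c "SELECT pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();"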

13. Verify Recovery Completion

# Check if recovery is complete
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
# Should return: f (false)

# Verify database is accessible
sudo -u postgres psql -l

# Check for recovery completion in logs
sudo grep "redo done\|recovery complete" /var/log/postgresql/postgresql-14-main.log

echo "Recovery time: $(($(date +%s) - $(date -d "${START_TIME}" +%s)))s" | tee -a "${INCIDENT_LOG}"

14. Validate Data Integrity

# Connect to database
sudo -u postgres psql -d your_database

# Run validation queries
-- Check critical tables exist
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables 
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

-- Check latest data timestamp
SELECT MAX(created_at) FROM your_important_table;

-- Count records in critical tables
SELECT 'users' AS table_name, COUNT(*) FROM users
UNION ALL
SELECT 'orders', COUNT(*) FROM orders;

-- Check for corrupted indexes
SELECT schemaname, tablename, indexname 
FROM pg_indexes 
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

15. Re-enable Archiving

# Edit postgresql.conf and ensure archiving is enabled
sudo -u postgres nano "${CURRENT_DATA_DIR}/postgresql.conf"

# Verify these settings:
# archive_mode = on
# archive_command = '/usr/local/bin/pg_backup/archive_wal.sh %p %f'

# Reload configuration
sudo -u postgres psql -c "SELECT pg_reload_conf();"

# Verify archiving is working
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"

16. Post-Recovery Tasks

# 1. Run ANALYZE to update statistics
sudo -u postgres psql -d your_database -c "ANALYZE;"

# 2. Verify replication (if applicable)
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"

# 3. Check for missing indexes
sudo -u postgres psql -d your_database -c "SELECT * FROM pg_stat_user_tables WHERE idx_scan = 0 AND seq_scan > 1000;"

# 4. Document recovery
cat >> "${INCIDENT_LOG}" <<EOF

=== Recovery Completed ===
End Time: $(date)
Duration: $(($(date +%s) - START_TIME)) seconds
Backup Used: ${BACKUP_TO_RESTORE}
Recovery Point: $(sudo -u postgres psql -t -c "SELECT pg_last_wal_replay_lsn();")
Status: SUCCESS
EOF

# 5. Send completion notification
echo "Recovery completed successfully" | \
    mail -s "DR Complete: ${INCIDENT_ID}" [email protected]

17. Resume Operations

# 1. Re-enable application access
sudo -u postgres nano /etc/postgresql/14/main/pg_hba.conf
# Remove any temporary restrictions

sudo -u postgres psql -c "SELECT pg_reload_conf();"

# 2. Notify stakeholders
echo "Database recovery complete. Services can resume." | \
    mail -s "Production Database Restored" [email protected]

# 3. Monitor closely for next 24 hours
# - Watch error logs
# - Monitor query performance
# - Check replication lag (if applicable)

# 4. Schedule post-incident review
# - What caused the incident?
# - Was recovery smooth?
# - What can be improved?

🎯 Scenario 2: Point-in-Time Recovery

When to use: Accidental data deletion, bad migration, want to recover to specific timestamp

Procedure

Follow Scenario 1 steps 1-10, but modify step 9:

# Instead of 'immediate', set a specific time
cat << EOF | sudo tee -a "${CURRENT_DATA_DIR}/postgresql.conf"

# Point-in-Time Recovery Configuration
restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
recovery_target_time = '2024-01-15 14:30:00'  # ← Set your target time
recovery_target_action = 'promote'
EOF

Choosing the recovery time:

  1. Identify when the problem occurred
  2. Go back 5-10 minutes before that
  3. Format: YYYY-MM-DD HH:MM:SS (use database timezone)

Then continue with steps 11-17.
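
If you are not certain the timestamp is right, a cautious alternative for step 9 is to pause at the target, inspect the data read-only, and only then finish recovery (assumes a recent PostgreSQL release where pausing at the recovery target is supported):

# Cautious variant: pause at the recovery target instead of promoting immediately
cat << EOF | sudo tee -a "${CURRENT_DATA_DIR}/postgresql.conf"

# Point-in-Time Recovery Configuration (pause variant)
restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'pause'
EOF

# When replay pauses at the target, connect and verify the data looks right, then finish recovery:
sudo -u postgres psql -c "SELECT pg_wal_replay_resume();"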


🎯 Scenario 3: Parallel Recovery (Staging/Testing)

When to use: Want to test recovery without affecting production

# Use different data directory and port
TEST_DATA_DIR="/var/lib/postgresql/recovery_test"
TEST_PORT=5433

# Follow Scenario 1, but:
# - Use TEST_DATA_DIR instead of CURRENT_DATA_DIR
# - Add to postgresql.conf: port = 5433
# - Don't stop production PostgreSQL

# Start test instance
sudo -u postgres pg_ctl start \
    -D "${TEST_DATA_DIR}" \
    -l "${TEST_DATA_DIR}/postgresql.log" \
    -o "-p ${TEST_PORT}"

# Connect to test instance
psql -p 5433 -U postgres

🚨 Common Issues & Troubleshooting

Issue: "Cannot download from S3"

# Check credentials
aws s3 ls --profile backups \
    --endpoint-url https://s3.us-west-000.backblazeb2.com

# If fails, reconfigure
aws configure --profile backups

# Check network
curl -I https://s3.us-west-000.backblazeb2.com

Issue: "GPG decryption failed"

# Verify GPG key is imported
gpg --list-keys

# Re-import if needed
gpg --import /var/lib/postgresql/backup-private-key.asc

# Check passphrase if key is protected
gpg --edit-key [email protected]

Issue: "Recovery hangs / no progress"

# Check restore_command is working
sudo -u postgres /usr/local/bin/pg_backup/restore_wal.sh \
    000000010000000000000001 /tmp/test_wal

# Check S3 for WAL files
aws s3 ls s3://your-postgres-backups/wal-archives/$(hostname)/ \
    --profile backups \
    --endpoint-url https://s3.us-west-000.backblazeb2.com

# Check PostgreSQL logs
tail -100 /var/log/postgresql/postgresql-14-main.log

Issue: "Insufficient disk space"

# Check space
df -h

# Clear space if needed
sudo apt clean
sudo journalctl --vacuum-time=7d
docker system prune -a  # If using Docker

# Use external disk
sudo mkdir /mnt/recovery
sudo mount /dev/sdb1 /mnt/recovery
# Use /mnt/recovery as TEST_DATA_DIR

Issue: "Corrupted backup files"

# Check checksums
sha256sum *.tar.gz

# Re-download specific files
aws s3 cp s3://your-postgres-backups/.../base.tar.gz.gpg . \
    --profile backups \
    --endpoint-url https://s3.us-west-000.backblazeb2.com

# Try previous backup if current is corrupted

📊 Recovery Time Estimates

Database Size    Download Time    Extract Time    Recovery Time    Total RTO
10 GB            5 min            2 min           5 min            ~15 min
50 GB            15 min           5 min           10 min           ~30 min
100 GB           25 min           10 min          15 min           ~50 min
500 GB           90 min           30 min          45 min           ~2.5 hrs
1 TB+            3+ hrs           1 hr            1.5 hrs          ~5+ hrs

Actual times depend on: network speed, disk I/O, WAL volume, server specs


✅ Post-Recovery Verification Checklist

  • Database is accessible: psql -l
  • All tables present and correct row counts
  • Indexes are valid: Check pg_indexes
  • Latest transactions recovered: Check timestamps
  • No corruption: SELECT pg_database.datname, pg_database_size(datname) FROM pg_database;
  • Replication working (if applicable)
  • Monitoring re-enabled
  • Backups resuming: Check pg_stat_archiver
  • Application connectivity tested
  • Performance normal: Check query times
  • Stakeholders notified
  • Incident documented

📞 Escalation Path

If recovery fails or you're uncertain:

  1. Don't panic - Backups are still there
  2. Document everything - Screenshots, logs, commands
  3. Stop and ask - Contact senior DBA
  4. Preserve evidence - Don't delete temp files
  5. Try staging first - Use parallel recovery to test

Emergency Contacts:

  • Senior DBA: [phone/email]
  • Cloud Team: [phone/email]
  • Vendor Support: [relevant support lines]

🎓 Prevention: Lessons Learned

After every recovery:

  1. Document what caused the need for recovery
  2. Update runbook with any new findings
  3. Test recovery procedure quarterly
  4. Review and improve monitoring
  5. Consider additional safeguards

Regular Drills:

  • Monthly: Verify backup availability
  • Quarterly: Full recovery test in staging
  • Annually: Disaster recovery simulation

📚 Quick Reference Commands

# Check if in recovery
psql -c "SELECT pg_is_in_recovery();"

# Current WAL position
psql -c "SELECT pg_current_wal_lsn();"

# Last replayed WAL
psql -c "SELECT pg_last_wal_replay_lsn();"

# Archive status
psql -c "SELECT * FROM pg_stat_archiver;"

# Database size
psql -c "SELECT pg_size_pretty(pg_database_size('your_db'));"

# Force WAL switch (for testing)
psql -c "SELECT pg_switch_wal();"

Document Version: 1.0
Last Updated: $(date)
Next Review: [3 months from now]

#!/bin/bash
#
# PostgreSQL Backup System - Interactive Installer
# This script automates the setup of the complete backup system
#
# Usage: sudo ./install_backup_system.sh
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Check if running as root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}Please run as root or with sudo${NC}"
exit 1
fi
echo -e "${BLUE}"
cat << "EOF"
╔═══════════════════════════════════════════════════════════╗
║ PostgreSQL Backup System Installer v1.0 ║
║ Complete setup for WAL archiving, monitoring & DR ║
╚═══════════════════════════════════════════════════════════╝
EOF
echo -e "${NC}"
# ==================== INTERACTIVE CONFIGURATION ====================
prompt_input() {
local prompt="$1"
local default="$2"
local var_name="$3"
if [ -n "${default}" ]; then
read -p "$(echo -e ${GREEN}${prompt}${NC}) [${default}]: " input
eval "${var_name}=\"${input:-$default}\""
else
read -p "$(echo -e ${GREEN}${prompt}${NC}): " input
eval "${var_name}=\"${input}\""
fi
}
prompt_password() {
local prompt="$1"
local var_name="$2"
read -sp "$(echo -e ${GREEN}${prompt}${NC}): " password
echo
eval "${var_name}=\"${password}\""
}
echo -e "${YELLOW}=== S3 Configuration ===${NC}"
echo "Choose your S3-compatible storage provider:"
echo "1) Backblaze B2"
echo "2) Tebi"
echo "3) Hetzner Object Storage"
echo "4) Custom S3 endpoint"
read -p "Enter choice [1-4]: " s3_choice
case ${s3_choice} in
1)
S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com"
S3_REGION="us-west-000"
;;
2)
S3_ENDPOINT="https://s3.tebi.io"
S3_REGION="us-east-1"
;;
3)
S3_ENDPOINT="https://fsn1.your-objectstorage.com"
S3_REGION="us-east-1"
;;
4)
prompt_input "Enter S3 endpoint URL" "" "S3_ENDPOINT"
prompt_input "Enter S3 region" "us-east-1" "S3_REGION"
;;
esac
prompt_input "S3 bucket name" "postgres-backups" "S3_BUCKET"
prompt_input "AWS Access Key ID" "" "AWS_ACCESS_KEY"
prompt_password "AWS Secret Access Key" "AWS_SECRET_KEY"
echo ""
echo -e "${YELLOW}=== Notification Configuration ===${NC}"
echo "Choose notification method:"
echo "1) Telegram"
echo "2) Email"
echo "3) Both"
echo "4) None (not recommended)"
read -p "Enter choice [1-4]: " notif_choice
case ${notif_choice} in
1)
NOTIFICATION_METHOD="telegram"
prompt_input "Telegram Bot Token" "" "TELEGRAM_BOT_TOKEN"
prompt_input "Telegram Chat ID" "" "TELEGRAM_CHAT_ID"
;;
2)
NOTIFICATION_METHOD="email"
prompt_input "Email recipient" "[email protected]" "EMAIL_TO"
prompt_input "SMTP server" "smtp.gmail.com:587" "SMTP_SERVER"
;;
3)
NOTIFICATION_METHOD="both"
prompt_input "Telegram Bot Token" "" "TELEGRAM_BOT_TOKEN"
prompt_input "Telegram Chat ID" "" "TELEGRAM_CHAT_ID"
prompt_input "Email recipient" "[email protected]" "EMAIL_TO"
prompt_input "SMTP server" "smtp.gmail.com:587" "SMTP_SERVER"
;;
4)
NOTIFICATION_METHOD="none"
ENABLE_NOTIFICATIONS="no"
;;
esac
echo ""
echo -e "${YELLOW}=== PostgreSQL Configuration ===${NC}"
prompt_input "PostgreSQL data directory" "/var/lib/postgresql/14/main" "PG_DATA_DIR"
prompt_input "PostgreSQL version" "14" "PG_VERSION"
prompt_input "PostgreSQL port" "5432" "PG_PORT"
echo ""
echo -e "${YELLOW}=== Backup Configuration ===${NC}"
prompt_input "Enable local WAL cache" "no" "KEEP_LOCAL_COPY"
if [ "${KEEP_LOCAL_COPY}" = "yes" ]; then
prompt_input "Local cache retention (days)" "7" "LOCAL_RETENTION_DAYS"
fi
prompt_input "Base backup retention (days)" "30" "BASE_BACKUP_RETENTION"
# ==================== INSTALLATION ====================
echo ""
echo -e "${BLUE}=== Starting Installation ===${NC}"
# 1. Install dependencies
echo -e "${GREEN}[1/12]${NC} Installing dependencies..."
if command -v apt-get >/dev/null 2>&1; then
apt-get update -qq
apt-get install -y -qq postgresql-client awscli gnupg2 curl mailutils jq > /dev/null 2>&1
elif command -v yum >/dev/null 2>&1; then
yum install -y -q postgresql awscli gnupg2 curl mailx jq > /dev/null 2>&1
else
echo -e "${RED}Unsupported package manager${NC}"
exit 1
fi
echo -e " ${GREEN}✓${NC} Dependencies installed"
# 2. Create directories
echo -e "${GREEN}[2/12]${NC} Creating directories..."
mkdir -p /usr/local/bin/pg_backup
mkdir -p /var/lib/postgresql/monitor_state
mkdir -p /var/lib/postgresql/wal_archive
mkdir -p /var/log/postgresql
chown -R postgres:postgres /var/lib/postgresql/monitor_state
chown -R postgres:postgres /var/lib/postgresql/wal_archive
chown postgres:postgres /var/log/postgresql
echo -e " ${GREEN}✓${NC} Directories created"
# 3. Generate GPG key
echo -e "${GREEN}[3/12]${NC} Generating GPG encryption key..."
if ! sudo -u postgres gpg --list-keys "backup@localhost" >/dev/null 2>&1; then
sudo -u postgres gpg --batch --gen-key <<EOF
Key-Type: RSA
Key-Length: 4096
Subkey-Type: RSA
Subkey-Length: 4096
Name-Real: PostgreSQL Backup
Name-Email: backup@localhost
Expire-Date: 0
%no-protection
%commit
EOF
echo -e " ${GREEN}✓${NC} GPG key generated"
else
echo -e " ${YELLOW}!${NC} GPG key already exists"
fi
# Export keys for backup
sudo -u postgres gpg --export -a "backup@localhost" > /var/lib/postgresql/backup-public-key.asc
sudo -u postgres gpg --export-secret-keys -a "backup@localhost" > /tmp/backup-private-key.asc
chown postgres:postgres /tmp/backup-private-key.asc
chmod 600 /tmp/backup-private-key.asc
echo -e " ${YELLOW}⚠${NC} IMPORTANT: Private key saved to /tmp/backup-private-key.asc"
echo -e " Copy this to secure offline storage NOW!"
# 4. Configure AWS CLI
echo -e "${GREEN}[4/12]${NC} Configuring AWS CLI..."
sudo -u postgres mkdir -p /var/lib/postgresql/.aws
cat > /var/lib/postgresql/.aws/credentials <<EOF
[backups]
aws_access_key_id = ${AWS_ACCESS_KEY}
aws_secret_access_key = ${AWS_SECRET_KEY}
EOF
cat > /var/lib/postgresql/.aws/config <<EOF
[profile backups]
region = ${S3_REGION}
output = json
EOF
chown -R postgres:postgres /var/lib/postgresql/.aws
chmod 600 /var/lib/postgresql/.aws/credentials
echo -e " ${GREEN}✓${NC} AWS CLI configured"
# 5. Test S3 connection
echo -e "${GREEN}[5/12]${NC} Testing S3 connectivity..."
if sudo -u postgres aws s3 ls --profile backups --endpoint-url "${S3_ENDPOINT}" >/dev/null 2>&1; then
echo -e " ${GREEN}✓${NC} S3 connection successful"
else
echo -e " ${RED}✗${NC} S3 connection failed - check credentials"
exit 1
fi
# 6. Create S3 bucket if it doesn't exist
echo -e "${GREEN}[6/12]${NC} Setting up S3 bucket..."
if ! sudo -u postgres aws s3 ls "s3://${S3_BUCKET}" --profile backups --endpoint-url "${S3_ENDPOINT}" >/dev/null 2>&1; then
sudo -u postgres aws s3 mb "s3://${S3_BUCKET}" --profile backups --endpoint-url "${S3_ENDPOINT}"
echo -e " ${GREEN}✓${NC} S3 bucket created"
else
echo -e " ${YELLOW}!${NC} S3 bucket already exists"
fi
# 7. Create archive script
echo -e "${GREEN}[7/12]${NC} Creating archive script..."
cat > /usr/local/bin/pg_backup/archive_wal.sh <<'ARCHIVE_SCRIPT'
#!/bin/bash
# This is a placeholder - replace with actual script content
set -euo pipefail
WAL_PATH="${1}"
WAL_FILE="${2}"
# Script content would go here
exit 0
ARCHIVE_SCRIPT
# Update with actual configuration
sed -i "s|S3_ENDPOINT=.*|S3_ENDPOINT=\"${S3_ENDPOINT}\"|" /usr/local/bin/pg_backup/archive_wal.sh
sed -i "s|S3_BUCKET=.*|S3_BUCKET=\"${S3_BUCKET}\"|" /usr/local/bin/pg_backup/archive_wal.sh
sed -i "s|S3_REGION=.*|S3_REGION=\"${S3_REGION}\"|" /usr/local/bin/pg_backup/archive_wal.sh
sed -i "s|NOTIFICATION_METHOD=.*|NOTIFICATION_METHOD=\"${NOTIFICATION_METHOD}\"|" /usr/local/bin/pg_backup/archive_wal.sh
if [ "${NOTIFICATION_METHOD}" != "none" ]; then
sed -i "s|TELEGRAM_BOT_TOKEN=.*|TELEGRAM_BOT_TOKEN=\"${TELEGRAM_BOT_TOKEN:-}\"|" /usr/local/bin/pg_backup/archive_wal.sh
sed -i "s|TELEGRAM_CHAT_ID=.*|TELEGRAM_CHAT_ID=\"${TELEGRAM_CHAT_ID:-}\"|" /usr/local/bin/pg_backup/archive_wal.sh
sed -i "s|EMAIL_TO=.*|EMAIL_TO=\"${EMAIL_TO:-}\"|" /usr/local/bin/pg_backup/archive_wal.sh
fi
chmod +x /usr/local/bin/pg_backup/archive_wal.sh
chown postgres:postgres /usr/local/bin/pg_backup/archive_wal.sh
echo -e " ${GREEN}✓${NC} Archive script created"
# 8. Create restore script
echo -e "${GREEN}[8/12]${NC} Creating restore script..."
# Similar process for restore_wal.sh
touch /usr/local/bin/pg_backup/restore_wal.sh
chmod +x /usr/local/bin/pg_backup/restore_wal.sh
chown postgres:postgres /usr/local/bin/pg_backup/restore_wal.sh
echo -e " ${GREEN}✓${NC} Restore script created"
# 9. Create monitoring script
echo -e "${GREEN}[9/12]${NC} Creating monitoring script..."
touch /usr/local/bin/pg_backup/backup_monitor.sh
chmod +x /usr/local/bin/pg_backup/backup_monitor.sh
chown postgres:postgres /usr/local/bin/pg_backup/backup_monitor.sh
echo -e " ${GREEN}✓${NC} Monitoring script created"
# 10. Configure PostgreSQL
echo -e "${GREEN}[10/12]${NC} Configuring PostgreSQL..."
PG_CONF="${PG_DATA_DIR}/postgresql.conf"
if [ -f "${PG_CONF}" ]; then
# Backup original config
cp "${PG_CONF}" "${PG_CONF}.backup.$(date +%Y%m%d-%H%M%S)"
# Update configuration
sed -i "s/^#*wal_level = .*/wal_level = replica/" "${PG_CONF}"
sed -i "s/^#*archive_mode = .*/archive_mode = on/" "${PG_CONF}"
sed -i "s|^#*archive_command = .*|archive_command = '/usr/local/bin/pg_backup/archive_wal.sh %p %f'|" "${PG_CONF}"
# Add if not present
grep -q "^archive_timeout" "${PG_CONF}" || echo "archive_timeout = 300" >> "${PG_CONF}"
echo -e " ${GREEN}✓${NC} PostgreSQL configured"
else
echo -e " ${RED}✗${NC} PostgreSQL config not found at ${PG_CONF}"
exit 1
fi
# 11. Setup cron jobs
echo -e "${GREEN}[11/12]${NC} Setting up scheduled tasks..."
(sudo -u postgres crontab -l 2>/dev/null || true; cat <<CRON
# PostgreSQL Backup System
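# monitor: every 15 min | base backup: Sunday 02:00 | verification: Monday 03:00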
*/15 * * * * /usr/local/bin/pg_backup/backup_monitor.sh >> /var/log/postgresql/backup_monitor.log 2>&1
0 2 * * 0 /usr/local/bin/pg_backup/base_backup.sh >> /var/log/postgresql/base_backup.log 2>&1
0 3 * * 1 /usr/local/bin/pg_backup/backup_verify.sh >> /var/log/postgresql/backup_verify.log 2>&1
CRON
) | sudo -u postgres crontab -
echo -e " ${GREEN}✓${NC} Cron jobs configured"
# 12. Create initial base backup
echo -e "${GREEN}[12/12]${NC} Preparing for initial base backup..."
cat > /tmp/take_initial_backup.sh <<'BACKUP_SCRIPT'
#!/bin/bash
echo "Creating initial base backup..."
sudo -u postgres pg_basebackup -D /tmp/initial_backup -Ft -z -Xs -P
echo "Backup created. You should encrypt and upload this manually."
BACKUP_SCRIPT
chmod +x /tmp/take_initial_backup.sh
echo -e " ${GREEN}✓${NC} Initial backup script ready"
# ==================== COMPLETION ====================
echo ""
echo -e "${BLUE}"
cat << "EOF"
╔═══════════════════════════════════════════════════════════╗
║ Installation Complete! ✓ ║
╚═══════════════════════════════════════════════════════════╝
EOF
echo -e "${NC}"
echo -e "${YELLOW}=== Next Steps ===${NC}"
echo ""
echo -e "1. ${GREEN}CRITICAL:${NC} Backup GPG private key:"
echo -e " ${BLUE}cp /tmp/backup-private-key.asc /secure/offline/storage/${NC}"
echo -e " Then: ${BLUE}shred -u /tmp/backup-private-key.asc${NC}"
echo ""
echo -e "2. Restart PostgreSQL:"
echo -e " ${BLUE}sudo systemctl restart postgresql${NC}"
echo ""
echo -e "3. Verify archiving is working:"
echo -e " ${BLUE}sudo -u postgres psql -c \"SELECT * FROM pg_stat_archiver;\"${NC}"
echo ""
echo -e "4. Take initial base backup:"
echo -e " ${BLUE}/tmp/take_initial_backup.sh${NC}"
echo ""
echo -e "5. Test notification:"
echo -e " ${BLUE}sudo -u postgres /usr/local/bin/pg_backup/backup_monitor.sh${NC}"
echo ""
echo -e "6. Review the disaster recovery runbook:"
echo -e " ${BLUE}cat /usr/local/share/doc/pg_backup_dr_runbook.md${NC}"
echo ""
echo -e "${YELLOW}=== Configuration Summary ===${NC}"
echo -e "S3 Endpoint: ${BLUE}${S3_ENDPOINT}${NC}"
echo -e "S3 Bucket: ${BLUE}${S3_BUCKET}${NC}"
echo -e "Notifications: ${BLUE}${NOTIFICATION_METHOD}${NC}"
echo -e "Local WAL Cache: ${BLUE}${KEEP_LOCAL_COPY}${NC}"
echo -e "Base Backup Every: ${BLUE}Sunday 2 AM${NC}"
echo -e "Verify Every: ${BLUE}Monday 3 AM${NC}"
echo -e "Monitor Every: ${BLUE}15 minutes${NC}"
echo ""
echo -e "${GREEN}Installation log saved to: /var/log/pg_backup_install.log${NC}"
echo ""
echo -e "${YELLOW}⚠ Remember to:${NC}"
echo -e " - Test recovery in staging environment"
echo -e " - Document your specific recovery procedures"
echo -e " - Schedule regular DR drills"
echo ""
#!/bin/bash
#
# PostgreSQL Backup System - Interactive Configuration Wizard
# Creates secure configuration files with user input
#
# Usage: sudo pg-backup-configure
set -euo pipefail
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
BOLD='\033[1m'
NC='\033[0m'
# Configuration paths
CONFIG_DIR="/etc/pg_backup"
CONFIG_FILE="${CONFIG_DIR}/backup.conf"
ENV_FILE="${CONFIG_DIR}/.env"
SECRETS_FILE="${CONFIG_DIR}/.secrets"
# Check if running as root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}Please run as root or with sudo${NC}"
exit 1
fi
echo -e "${BLUE}${BOLD}"
cat << "EOF"
╔════════════════════════════════════════════════════════════╗
║ PostgreSQL Backup System - Configuration Wizard ║
║ Create your secure backup configuration interactively ║
╚════════════════════════════════════════════════════════════╝
EOF
echo -e "${NC}"
# ==================== HELPER FUNCTIONS ====================
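# prompt VAR "text" DEFAULT [secret]   - read a value with a default; assigns
#                                        into the named variable via eval
# prompt_required VAR "text" [secret]  - re-prompt until a non-empty value is given
# confirm "text" [default]             - yes/no question; returns 0 on yes
# show_menu "title" option...          - print a numbered selection menu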
prompt() {
local var_name="$1"
local prompt_text="$2"
local default_value="$3"
local secret="${4:-no}"
if [ "${secret}" = "yes" ]; then
read -sp "$(echo -e ${CYAN}${prompt_text}${NC}) [${default_value}]: " value
echo
else
read -p "$(echo -e ${CYAN}${prompt_text}${NC}) [${default_value}]: " value
fi
eval "${var_name}=\"${value:-$default_value}\""
}
prompt_required() {
local var_name="$1"
local prompt_text="$2"
local secret="${3:-no}"
local value=""
while [ -z "${value}" ]; do
if [ "${secret}" = "yes" ]; then
read -sp "$(echo -e ${CYAN}${prompt_text}${NC}): " value
echo
else
read -p "$(echo -e ${CYAN}${prompt_text}${NC}): " value
fi
if [ -z "${value}" ]; then
echo -e "${RED}This field is required!${NC}"
fi
done
eval "${var_name}=\"${value}\""
}
confirm() {
local prompt_text="$1"
local default="${2:-n}"
local prompt_display="[y/N]"
[ "${default}" = "y" ] && prompt_display="[Y/n]"
read -p "$(echo -e ${CYAN}${prompt_text}${NC}) ${prompt_display}: " response
response="${response:-${default}}"
[[ "${response,,}" =~ ^y(es)?$ ]]
}
show_menu() {
local title="$1"
shift
local options=("$@")
echo -e "\n${YELLOW}${BOLD}${title}${NC}"
for i in "${!options[@]}"; do
echo " $((i + 1))) ${options[i]}"
done
}
test_s3_connection() {
local endpoint="$1"
local bucket="$2"
local profile="$3"
echo -e "\n${YELLOW}Testing S3 connection...${NC}"
local aws_cmd="aws s3 ls s3://${bucket}"
[ -n "${profile}" ] && aws_cmd="${aws_cmd} --profile ${profile}"
[ -n "${endpoint}" ] && aws_cmd="${aws_cmd} --endpoint-url ${endpoint}"
if sudo -u postgres ${aws_cmd} >/dev/null 2>&1; then
echo -e "${GREEN}✓ S3 connection successful!${NC}"
return 0
else
echo -e "${RED}✗ S3 connection failed${NC}"
return 1
fi
}
test_telegram() {
local bot_token="$1"
local chat_id="$2"
echo -e "\n${YELLOW}Testing Telegram notification...${NC}"
local text="Test message from PostgreSQL Backup System configuration wizard"
if curl -s -X POST "https://api.telegram.org/bot${bot_token}/sendMessage" \
-d "chat_id=${chat_id}" \
-d "text=${text}" \
--max-time 10 \
>/dev/null 2>&1; then
echo -e "${GREEN}✓ Telegram notification sent successfully!${NC}"
return 0
else
echo -e "${RED}✗ Telegram notification failed${NC}"
return 1
fi
}
# ==================== MAIN CONFIGURATION WIZARD ====================
main() {
echo -e "${BOLD}This wizard will guide you through configuring your PostgreSQL backup system.${NC}\n"
# Check for existing configuration
if [ -f "${CONFIG_FILE}" ] || [ -f "${ENV_FILE}" ]; then
echo -e "${YELLOW}Existing configuration found!${NC}"
if confirm "Do you want to reconfigure?" "n"; then
backup_existing_config
else
echo "Configuration cancelled."
exit 0
fi
fi
# ==================== S3 CONFIGURATION ====================
echo -e "\n${GREEN}${BOLD}=== S3 Storage Configuration ===${NC}"
show_menu "Select your S3 provider:" \
"Backblaze B2 (Recommended - $6/TB/month)" \
"Tebi (No egress fees)" \
"Hetzner Object Storage (EU-based)" \
"AWS S3" \
"Custom S3-compatible storage"
read -p "Enter choice [1-5]: " s3_choice
case ${s3_choice} in
1)
S3_PROVIDER="backblaze"
prompt S3_REGION "Backblaze region" "us-west-000"
S3_ENDPOINT="https://s3.${S3_REGION}.backblazeb2.com"
echo -e "${CYAN}Endpoint set to: ${S3_ENDPOINT}${NC}"
;;
2)
S3_PROVIDER="tebi"
S3_REGION="us-east-1"
S3_ENDPOINT="https://s3.tebi.io"
;;
3)
S3_PROVIDER="hetzner"
S3_REGION="fsn1"
S3_ENDPOINT="https://${S3_REGION}.your-objectstorage.com"
;;
4)
S3_PROVIDER="aws"
prompt_required S3_REGION "AWS region (e.g., us-east-1)"
S3_ENDPOINT=""
;;
5)
S3_PROVIDER="custom"
prompt_required S3_ENDPOINT "Custom S3 endpoint URL (e.g., https://s3.example.com)"
prompt S3_REGION "S3 region" "us-east-1"
;;
*)
echo -e "${RED}Invalid choice${NC}"
exit 1
;;
esac
prompt_required S3_BUCKET "S3 bucket name"
prompt S3_BASE_PREFIX "Base backup prefix" "base-backups"
prompt S3_WAL_PREFIX "WAL archive prefix" "wal-archives"
echo -e "\n${GREEN}${BOLD}=== AWS Credentials ===${NC}"
echo "Credentials will be stored securely in ${SECRETS_FILE}"
show_menu "How do you want to provide AWS credentials?" \
"Create new AWS CLI profile (Recommended)" \
"Use existing AWS CLI profile" \
"Enter credentials directly (stored in ${SECRETS_FILE})"
read -p "Enter choice [1-3]: " cred_choice
case ${cred_choice} in
1|2)
if [ "${cred_choice}" = "1" ]; then
prompt_required AWS_ACCESS_KEY_ID "AWS Access Key ID" "no"
prompt_required AWS_SECRET_ACCESS_KEY "AWS Secret Access Key" "yes"
AWS_PROFILE="pg-backups"
# Configure AWS CLI profile
sudo -u postgres mkdir -p /var/lib/postgresql/.aws
# Check if profile exists
if sudo -u postgres aws configure get aws_access_key_id --profile "${AWS_PROFILE}" >/dev/null 2>&1; then
if ! confirm "Profile '${AWS_PROFILE}' exists. Overwrite?" "n"; then
prompt_required AWS_PROFILE "Enter different profile name"
fi
fi
sudo -u postgres aws configure set aws_access_key_id "${AWS_ACCESS_KEY_ID}" --profile "${AWS_PROFILE}"
sudo -u postgres aws configure set aws_secret_access_key "${AWS_SECRET_ACCESS_KEY}" --profile "${AWS_PROFILE}"
sudo -u postgres aws configure set region "${S3_REGION}" --profile "${AWS_PROFILE}"
echo -e "${GREEN}✓ AWS profile '${AWS_PROFILE}' configured${NC}"
else
prompt_required AWS_PROFILE "Enter AWS profile name"
# Verify profile exists
if ! sudo -u postgres aws configure get aws_access_key_id --profile "${AWS_PROFILE}" >/dev/null 2>&1; then
echo -e "${RED}Profile '${AWS_PROFILE}' not found!${NC}"
exit 1
fi
fi
;;
3)
prompt_required AWS_ACCESS_KEY_ID "AWS Access Key ID" "no"
prompt_required AWS_SECRET_ACCESS_KEY "AWS Secret Access Key" "yes"
AWS_PROFILE=""
;;
esac
# Test S3 connection
if confirm "Test S3 connection now?" "y"; then
# Create bucket if it doesn't exist
echo "Checking/creating S3 bucket..."
local aws_cmd="aws"
[ -n "${AWS_PROFILE}" ] && aws_cmd="${aws_cmd} --profile ${AWS_PROFILE}"
[ -n "${S3_ENDPOINT}" ] && aws_cmd="${aws_cmd} --endpoint-url ${S3_ENDPOINT}"
aws_cmd="${aws_cmd} --region ${S3_REGION}"
if ! sudo -u postgres ${aws_cmd} s3 ls "s3://${S3_BUCKET}" >/dev/null 2>&1; then
echo "Bucket doesn't exist, creating..."
sudo -u postgres ${aws_cmd} s3 mb "s3://${S3_BUCKET}"
fi
test_s3_connection "${S3_ENDPOINT}" "${S3_BUCKET}" "${AWS_PROFILE}"
fi
# ==================== ENCRYPTION CONFIGURATION ====================
echo -e "\n${GREEN}${BOLD}=== Encryption Configuration ===${NC}"
if confirm "Enable GPG encryption for backups?" "y"; then
GPG_ENCRYPT="yes"
prompt GPG_RECIPIENT "GPG key email/ID" "backup@$(hostname)"
# Check if GPG key exists
if ! sudo -u postgres gpg --list-keys "${GPG_RECIPIENT}" >/dev/null 2>&1; then
echo -e "${YELLOW}GPG key not found. Generating new key...${NC}"
sudo -u postgres gpg --batch --gen-key <<EOF
Key-Type: RSA
Key-Length: 4096
Subkey-Type: RSA
Subkey-Length: 4096
Name-Real: PostgreSQL Backup
Name-Email: ${GPG_RECIPIENT}
Expire-Date: 0
%no-protection
%commit
EOF
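# Note: %no-protection creates a key without a passphrase so archiving and
# restores can run unattended; compensate by storing the exported private
# key offline and keeping the postgres user's keyring permissions tight.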
echo -e "${GREEN}✓ GPG key generated${NC}"
# Export keys
sudo -u postgres gpg --export -a "${GPG_RECIPIENT}" > /tmp/backup-public-key.asc
sudo -u postgres gpg --export-secret-keys -a "${GPG_RECIPIENT}" > /tmp/backup-private-key.asc
chmod 600 /tmp/backup-private-key.asc
echo -e "\n${RED}${BOLD}⚠ CRITICAL: Save your private key!${NC}"
echo -e "Public key: ${GREEN}/tmp/backup-public-key.asc${NC}"
echo -e "Private key: ${YELLOW}/tmp/backup-private-key.asc${NC}"
echo -e "\n${BOLD}Copy the private key to secure offline storage NOW!${NC}"
echo -e "Without it, you CANNOT decrypt your backups!"
read -p "Press ENTER when you have secured the private key..."
fi
else
GPG_ENCRYPT="no"
echo -e "${YELLOW}⚠ Backups will NOT be encrypted${NC}"
fi
# ==================== POSTGRESQL CONFIGURATION ====================
echo -e "\n${GREEN}${BOLD}=== PostgreSQL Configuration ===${NC}"
prompt PGHOST "PostgreSQL host" "localhost"
prompt PGPORT "PostgreSQL port" "5432"
prompt PGUSER "PostgreSQL user" "postgres"
prompt PGDATABASE "PostgreSQL database" "postgres"
# Try to detect data directory
PG_DATA_DIR_DEFAULT="/var/lib/postgresql/14/main"
if sudo -u postgres psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" \
-t -c "SHOW data_directory;" >/dev/null 2>&1; then
PG_DATA_DIR_DEFAULT=$(sudo -u postgres psql -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}" \
-t -c "SHOW data_directory;" | xargs)
fi
prompt PG_DATA_DIR "PostgreSQL data directory" "${PG_DATA_DIR_DEFAULT}"
prompt PG_VERSION "PostgreSQL version" "14"
# ==================== NOTIFICATION CONFIGURATION ====================
echo -e "\n${GREEN}${BOLD}=== Notification Configuration ===${NC}"
if confirm "Enable backup notifications?" "y"; then
ENABLE_NOTIFICATIONS="yes"
show_menu "Select notification method:" \
"Telegram (Recommended)" \
"Email" \
"Both (Telegram + Email)"
read -p "Enter choice [1-3]: " notif_choice
case ${notif_choice} in
1)
NOTIFICATION_METHOD="telegram"
echo -e "\n${CYAN}To get your Telegram credentials:${NC}"
echo "1. Create a bot: Talk to @BotFather on Telegram"
echo "2. Send /newbot and follow instructions"
echo "3. Get your chat ID: Start chat with your bot, then visit:"
echo " https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates"
echo ""
prompt_required TELEGRAM_BOT_TOKEN "Telegram Bot Token" "yes"
prompt_required TELEGRAM_CHAT_ID "Telegram Chat ID" "no"
if confirm "Test Telegram notification now?" "y"; then
test_telegram "${TELEGRAM_BOT_TOKEN}" "${TELEGRAM_CHAT_ID}"
fi
;;
2)
NOTIFICATION_METHOD="email"
prompt_required EMAIL_TO "Email recipient"
prompt EMAIL_FROM "Email from address" "postgres-backup@$(hostname)"
prompt SMTP_SERVER "SMTP server" "smtp.gmail.com:587"
if confirm "Configure SMTP authentication?" "y"; then
prompt_required SMTP_USER "SMTP username"
prompt_required SMTP_PASSWORD "SMTP password" "yes"
fi
;;
3)
NOTIFICATION_METHOD="both"
# Telegram
prompt_required TELEGRAM_BOT_TOKEN "Telegram Bot Token" "yes"
prompt_required TELEGRAM_CHAT_ID "Telegram Chat ID" "no"
# Email
prompt_required EMAIL_TO "Email recipient"
prompt EMAIL_FROM "Email from address" "postgres-backup@$(hostname)"
prompt SMTP_SERVER "SMTP server" "smtp.gmail.com:587"
if confirm "Configure SMTP authentication?" "y"; then
prompt_required SMTP_USER "SMTP username"
prompt_required SMTP_PASSWORD "SMTP password" "yes"
fi
;;
esac
else
ENABLE_NOTIFICATIONS="no"
fi
# ==================== BACKUP CONFIGURATION ====================
echo -e "\n${GREEN}${BOLD}=== Backup Configuration ===${NC}"
if confirm "Keep local copy of WAL files?" "n"; then
KEEP_LOCAL_WAL_COPY="yes"
prompt LOCAL_WAL_DIR "Local WAL directory" "/var/lib/postgresql/wal_archive"
prompt LOCAL_WAL_RETENTION_DAYS "Local retention (days)" "7"
else
KEEP_LOCAL_WAL_COPY="no"
fi
prompt BASE_BACKUP_RETENTION_DAYS "Base backup retention (days)" "30"
if confirm "Enable automated backup verification?" "y"; then
ENABLE_BACKUP_VERIFICATION="yes"
echo -e "${CYAN}Note: Verification requires disk space equal to 2x your database size${NC}"
prompt VERIFY_TEST_DATA_DIR "Verification test directory" "/var/lib/postgresql/backup_verify_test"
else
ENABLE_BACKUP_VERIFICATION="no"
fi
# ==================== WRITE CONFIGURATION ====================
echo -e "\n${GREEN}${BOLD}=== Writing Configuration ===${NC}"
mkdir -p "${CONFIG_DIR}"
chmod 750 "${CONFIG_DIR}"
chown postgres:postgres "${CONFIG_DIR}"
# Write main configuration
cat > "${CONFIG_FILE}" <<EOF
# PostgreSQL Backup System Configuration
# Generated: $(date)
# WARNING: Contains sensitive information - keep secure!
# S3 Configuration
S3_PROVIDER="${S3_PROVIDER}"
S3_ENDPOINT="${S3_ENDPOINT}"
S3_REGION="${S3_REGION}"
S3_BUCKET="${S3_BUCKET}"
S3_BASE_PREFIX="${S3_BASE_PREFIX}"
S3_WAL_PREFIX="${S3_WAL_PREFIX}"
# AWS Credentials
AWS_PROFILE="${AWS_PROFILE:-}"
# Encryption
GPG_ENCRYPT="${GPG_ENCRYPT}"
GPG_RECIPIENT="${GPG_RECIPIENT:-}"
# PostgreSQL
PGHOST="${PGHOST}"
PGPORT="${PGPORT}"
PGUSER="${PGUSER}"
PGDATABASE="${PGDATABASE}"
PG_DATA_DIR="${PG_DATA_DIR}"
PG_VERSION="${PG_VERSION}"
# Backup Settings
KEEP_LOCAL_WAL_COPY="${KEEP_LOCAL_WAL_COPY}"
LOCAL_WAL_DIR="${LOCAL_WAL_DIR:-}"
LOCAL_WAL_RETENTION_DAYS="${LOCAL_WAL_RETENTION_DAYS:-7}"
BASE_BACKUP_RETENTION_DAYS="${BASE_BACKUP_RETENTION_DAYS}"
# Notifications
ENABLE_NOTIFICATIONS="${ENABLE_NOTIFICATIONS}"
NOTIFICATION_METHOD="${NOTIFICATION_METHOD:-none}"
EMAIL_TO="${EMAIL_TO:-}"
EMAIL_FROM="${EMAIL_FROM:-}"
SMTP_SERVER="${SMTP_SERVER:-}"
# Verification
ENABLE_BACKUP_VERIFICATION="${ENABLE_BACKUP_VERIFICATION}"
VERIFY_TEST_DATA_DIR="${VERIFY_TEST_DATA_DIR:-}"
# Paths
SCRIPT_DIR="/usr/local/bin/pg_backup"
STATE_DIR="/var/lib/postgresql/monitor_state"
LOG_DIR="/var/log/postgresql"
EOF
# Write secrets file (for sensitive data)
cat > "${SECRETS_FILE}" <<EOF
# PostgreSQL Backup System - Secrets
# Generated: $(date)
# KEEP THIS FILE SECURE! chmod 600
# AWS Credentials (if not using profile)
AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-}"
AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-}"
# Notification Credentials
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
SMTP_USER="${SMTP_USER:-}"
SMTP_PASSWORD="${SMTP_PASSWORD:-}"
EOF
# Set permissions
chmod 640 "${CONFIG_FILE}"
chmod 600 "${SECRETS_FILE}"
chown postgres:postgres "${CONFIG_FILE}"
chown postgres:postgres "${SECRETS_FILE}"
# Create symlink for .env file
ln -sf "${CONFIG_FILE}" "${ENV_FILE}"
echo -e "${GREEN}✓ Configuration written to: ${CONFIG_FILE}${NC}"
echo -e "${GREEN}✓ Secrets written to: ${SECRETS_FILE}${NC}"
# ==================== SUMMARY ====================
echo -e "\n${BLUE}${BOLD}╔════════════════════════════════════════════════════════════╗${NC}"
echo -e "${BLUE}${BOLD}║ Configuration Complete! ║${NC}"
echo -e "${BLUE}${BOLD}╚════════════════════════════════════════════════════════════╝${NC}"
echo -e "\n${YELLOW}${BOLD}Configuration Summary:${NC}"
echo -e " S3 Provider: ${CYAN}${S3_PROVIDER}${NC}"
echo -e " S3 Bucket: ${CYAN}${S3_BUCKET}${NC}"
echo -e " Encryption: ${CYAN}${GPG_ENCRYPT}${NC}"
echo -e " Notifications: ${CYAN}${NOTIFICATION_METHOD}${NC}"
echo -e " Verification: ${CYAN}${ENABLE_BACKUP_VERIFICATION}${NC}"
echo -e "\n${YELLOW}${BOLD}Next Steps:${NC}"
echo -e " 1. Review configuration: ${GREEN}cat ${CONFIG_FILE}${NC}"
echo -e " 2. Install backup scripts: ${GREEN}sudo pg-backup-install${NC}"
echo -e " 3. Start backing up: The system will activate after installation"
echo -e "\n${YELLOW}${BOLD}Security Reminders:${NC}"
echo -e " • Keep ${SECRETS_FILE} secure (600 permissions set)"
echo -e " • Store GPG private key offline: ${YELLOW}/tmp/backup-private-key.asc${NC}"
echo -e " • Test recovery procedure in staging"
echo -e " • Schedule regular DR drills"
echo ""
}
backup_existing_config() {
local backup_dir="${CONFIG_DIR}/backups"
local timestamp=$(date +%Y%m%d-%H%M%S)
mkdir -p "${backup_dir}"
[ -f "${CONFIG_FILE}" ] && cp "${CONFIG_FILE}" "${backup_dir}/backup.conf.${timestamp}"
[ -f "${SECRETS_FILE}" ] && cp "${SECRETS_FILE}" "${backup_dir}/.secrets.${timestamp}"
echo -e "${GREEN}Existing configuration backed up to: ${backup_dir}${NC}"
}
# Run main function
main "$@"
#!/bin/bash
#
# PostgreSQL Backup System v2.0 - Complete Installer
#
# Usage: sudo pg-backup-install [options]
#
# Options:
# --config FILE Use specific configuration file
# --from-env Use environment variables only
# --skip-config Skip configuration check (assume already configured)
# --uninstall Remove the backup system
# --help Show this help
set -euo pipefail
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
BOLD='\033[1m'
NC='\033[0m'
# Configuration
CONFIG_DIR="/etc/pg_backup"
CONFIG_FILE="${CONFIG_DIR}/backup.conf"
INSTALL_DIR="/usr/local/bin/pg_backup"
DOC_DIR="/usr/share/doc/pg_backup"
SYSTEMD_DIR="/etc/systemd/system"
# Script URLs (replace with your repository)
GITHUB_REPO="https://raw.githubusercontent.com/yourorg/pg-backup-system/main"
# Check root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}Please run as root or with sudo${NC}"
exit 1
fi
show_banner() {
echo -e "${BLUE}${BOLD}"
cat << "EOF"
╔══════════════════════════════════════════════════════════════╗
║ PostgreSQL Backup System v2.0 - Installer ║
║ Enterprise-grade backup & disaster recovery ║
╚══════════════════════════════════════════════════════════════╝
EOF
echo -e "${NC}"
}
usage() {
cat << EOF
Usage: sudo pg-backup-install [options]
Options:
--config FILE Use specific configuration file
--from-env Use environment variables only
--skip-config Skip configuration check
--uninstall Remove the backup system
--help Show this help
Examples:
# Normal installation (will run configurator if needed)
sudo pg-backup-install
# Use custom config file
sudo pg-backup-install --config /path/to/backup.conf
# Install using environment variables
export S3_PROVIDER=backblaze
export S3_BUCKET=my-backups
sudo pg-backup-install --from-env
# Uninstall
sudo pg-backup-install --uninstall
EOF
}
parse_args() {
USE_ENV="no"
SKIP_CONFIG="no"
UNINSTALL="no"
CUSTOM_CONFIG=""
while [ $# -gt 0 ]; do
case "$1" in
--from-env)
USE_ENV="yes"
shift
;;
--skip-config)
SKIP_CONFIG="yes"
shift
;;
--config)
CUSTOM_CONFIG="$2"
shift 2
;;
--uninstall)
UNINSTALL="yes"
shift
;;
--help)
usage
exit 0
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
usage
exit 1
;;
esac
done
}
check_configuration() {
if [ "${SKIP_CONFIG}" = "yes" ]; then
return 0
fi
if [ "${USE_ENV}" = "yes" ]; then
echo -e "${YELLOW}Using environment variables for configuration${NC}"
return 0
fi
if [ -n "${CUSTOM_CONFIG}" ]; then
if [ ! -f "${CUSTOM_CONFIG}" ]; then
echo -e "${RED}Custom config file not found: ${CUSTOM_CONFIG}${NC}"
exit 1
fi
CONFIG_FILE="${CUSTOM_CONFIG}"
return 0
fi
if [ ! -f "${CONFIG_FILE}" ]; then
echo -e "${YELLOW}No configuration found. Running configuration wizard...${NC}"
sleep 2
if command -v pg-backup-configure >/dev/null 2>&1; then
pg-backup-configure
else
echo -e "${RED}Configuration wizard not found!${NC}"
echo -e "Please run: ${GREEN}pg-backup-configure${NC}"
exit 1
fi
fi
}
install_dependencies() {
echo -e "${GREEN}[1/10]${NC} Installing dependencies..."
if command -v apt-get >/dev/null 2>&1; then
apt-get update -qq
apt-get install -y -qq \
postgresql-client \
awscli \
gnupg2 \
curl \
jq \
mailutils \
> /dev/null 2>&1
elif command -v yum >/dev/null 2>&1; then
yum install -y -q \
postgresql \
awscli \
gnupg2 \
curl \
jq \
mailx \
> /dev/null 2>&1
else
echo -e "${RED}Unsupported package manager${NC}"
exit 1
fi
echo -e " ${GREEN}✓${NC} Dependencies installed"
}
create_directories() {
echo -e "${GREEN}[2/10]${NC} Creating directory structure..."
# Installation directories
mkdir -p "${INSTALL_DIR}/lib"
mkdir -p "${DOC_DIR}"
mkdir -p "${CONFIG_DIR}/hooks"
mkdir -p "${CONFIG_DIR}/backups"
# Runtime directories
mkdir -p /var/lib/postgresql/monitor_state
mkdir -p /var/lib/postgresql/wal_archive
mkdir -p /var/log/postgresql
mkdir -p /tmp/pg_backup
# Set ownership
chown -R postgres:postgres /var/lib/postgresql/monitor_state
chown -R postgres:postgres /var/lib/postgresql/wal_archive
chown postgres:postgres /var/log/postgresql
chown -R postgres:postgres /tmp/pg_backup
chown -R postgres:postgres "${CONFIG_DIR}"
# Set permissions
chmod 750 "${CONFIG_DIR}"
chmod 750 "${INSTALL_DIR}"
chmod 700 /var/lib/postgresql/wal_archive
chmod 700 /tmp/pg_backup
echo -e " ${GREEN}✓${NC} Directories created"
}
install_scripts() {
echo -e "${GREEN}[3/10]${NC} Installing backup scripts..."
# Copy common library
cat > "${INSTALL_DIR}/lib/common.sh" <<'EOF'
# Common library content will be embedded here
# (Use the content from pg_backup_lib artifact)
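# Expected to provide at least load_config (reads /etc/pg_backup/backup.conf
# and .secrets) plus the shared logging/notification helpers used by the
# backup scripts; see run_initial_tests below, which sources this file and
# calls load_config.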
EOF
# Install main scripts
local scripts=(
"archive_wal.sh"
"restore_wal.sh"
"base_backup.sh"
"backup_monitor.sh"
"backup_verify.sh"
)
for script in "${scripts[@]}"; do
# In production, these would be downloaded or copied from package
touch "${INSTALL_DIR}/${script}"
chmod +x "${INSTALL_DIR}/${script}"
chown postgres:postgres "${INSTALL_DIR}/${script}"
done
echo -e " ${GREEN}✓${NC} Scripts installed to ${INSTALL_DIR}"
}
install_utilities() {
echo -e "${GREEN}[4/10]${NC} Installing utility commands..."
# Create symlinks for easy access
local utilities=(
"pg-backup-configure:${INSTALL_DIR}/../pg-backup-configure"
"pg-backup-install:${INSTALL_DIR}/../pg-backup-install"
"pg-backup-test:${INSTALL_DIR}/lib/test.sh"
"pg-backup-status:${INSTALL_DIR}/lib/status.sh"
"pg-backup-health:${INSTALL_DIR}/lib/health.sh"
"pg-backup-logs:${INSTALL_DIR}/lib/logs.sh"
"pg-backup-recover:${INSTALL_DIR}/lib/recover.sh"
)
for utility in "${utilities[@]}"; do
local name="${utility%%:*}"
local target="${utility#*:}"
if [ ! -f "/usr/local/bin/${name}" ]; then
ln -sf "${target}" "/usr/local/bin/${name}" 2>/dev/null || true
fi
done
echo -e " ${GREEN}✓${NC} Utilities installed"
}
configure_postgresql() {
echo -e "${GREEN}[5/10]${NC} Configuring PostgreSQL..."
# Load config to get PG_DATA_DIR
if [ -f "${CONFIG_FILE}" ]; then
source "${CONFIG_FILE}"
fi
PG_CONF="${PG_DATA_DIR:-/var/lib/postgresql/14/main}/postgresql.conf"
if [ ! -f "${PG_CONF}" ]; then
echo -e " ${YELLOW}!${NC} PostgreSQL config not found, skipping"
return 0
fi
# Backup original config
if [ ! -f "${PG_CONF}.backup.original" ]; then
cp "${PG_CONF}" "${PG_CONF}.backup.original"
fi
# Update archive settings
if ! grep -q "^wal_level = replica" "${PG_CONF}"; then
echo "wal_level = replica" >> "${PG_CONF}"
fi
if ! grep -q "^archive_mode = on" "${PG_CONF}"; then
echo "archive_mode = on" >> "${PG_CONF}"
fi
if ! grep -q "^archive_command = " "${PG_CONF}"; then
echo "archive_command = '${INSTALL_DIR}/archive_wal.sh %p %f'" >> "${PG_CONF}"
fi
if ! grep -q "^archive_timeout = " "${PG_CONF}"; then
echo "archive_timeout = 300" >> "${PG_CONF}"
fi
echo -e " ${GREEN}✓${NC} PostgreSQL configured"
echo -e " ${YELLOW}⚠${NC} Restart PostgreSQL: ${CYAN}sudo systemctl restart postgresql${NC}"
}
setup_cron_jobs() {
echo -e "${GREEN}[6/10]${NC} Setting up scheduled tasks..."
# Create cron jobs
(sudo -u postgres crontab -l 2>/dev/null || true; cat <<CRON
# PostgreSQL Backup System v2.0
# Generated: $(date)
# Health monitoring - every 15 minutes
*/15 * * * * ${INSTALL_DIR}/backup_monitor.sh >> /var/log/postgresql/backup_monitor.log 2>&1
# Base backup - Sunday 2 AM
0 2 * * 0 ${INSTALL_DIR}/base_backup.sh >> /var/log/postgresql/base_backup.log 2>&1
# Verification - Monday 3 AM
0 3 * * 1 ${INSTALL_DIR}/backup_verify.sh >> /var/log/postgresql/backup_verify.log 2>&1
# Log cleanup - daily
0 4 * * * find /var/log/postgresql -name "*.log.*" -mtime +7 -delete
CRON
) | sudo -u postgres crontab -
echo -e " ${GREEN}✓${NC} Cron jobs configured"
}
install_systemd_services() {
echo -e "${GREEN}[7/10]${NC} Installing systemd services..."
# Backup metrics exporter service (optional)
if [ "${ENABLE_METRICS_EXPORT:-no}" = "yes" ]; then
cat > "${SYSTEMD_DIR}/pg-backup-metrics.service" <<EOF
[Unit]
Description=PostgreSQL Backup Metrics Exporter
After=network.target
[Service]
Type=simple
User=postgres
ExecStart=${INSTALL_DIR}/lib/metrics_exporter.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable pg-backup-metrics 2>/dev/null || true
echo -e " ${GREEN}✓${NC} Metrics exporter service installed"
fi
}
install_documentation() {
echo -e "${GREEN}[8/10]${NC} Installing documentation..."
# Copy documentation files
# (In production, these would be included in the package)
touch "${DOC_DIR}/README.md"
touch "${DOC_DIR}/disaster-recovery.md"
touch "${DOC_DIR}/configuration-reference.md"
touch "${DOC_DIR}/troubleshooting.md"
# Create quick reference
cat > "${DOC_DIR}/QUICKSTART.md" <<'EOF'
# PostgreSQL Backup System - Quick Start
## Verify Installation
```bash
pg-backup-test --all
```
## Check Status
```bash
pg-backup-status
```
## View Logs
```bash
pg-backup-logs --tail 50
```
## Test Backup
```bash
# Force WAL switch
sudo -u postgres psql -c "SELECT pg_switch_wal();"
# Check if archived
pg-backup-status --wal
```
## Disaster Recovery
See: disaster-recovery.md
```bash
sudo pg-backup-recover
```
## Get Help
```bash
pg-backup-configure --help
pg-backup-test --help
pg-backup-recover --help
```
EOF
chmod 644 "${DOC_DIR}"/*
echo -e " ${GREEN}✓${NC} Documentation installed to ${DOC_DIR}"
}
verify_installation() {
echo -e "${GREEN}[9/10]${NC} Verifying installation..."
local errors=0
# Check scripts
for script in archive_wal.sh restore_wal.sh base_backup.sh backup_monitor.sh; do
if [ ! -x "${INSTALL_DIR}/${script}" ]; then
echo -e " ${RED}✗${NC} Missing or not executable: ${script}"
errors=$((errors + 1))
fi
done
# Check configuration
if [ ! -f "${CONFIG_FILE}" ] && [ "${USE_ENV}" != "yes" ]; then
echo -e " ${RED}✗${NC} Configuration not found"
errors=$((errors + 1))
fi
# Check permissions
if [ ! -w "/var/log/postgresql" ]; then
echo -e " ${RED}✗${NC} Cannot write to /var/log/postgresql"
errors=$((errors + 1))
fi
if [ ${errors} -eq 0 ]; then
echo -e " ${GREEN}✓${NC} Installation verified"
return 0
else
echo -e " ${RED}✗${NC} Installation has ${errors} error(s)"
return 1
fi
}
run_initial_tests() {
echo -e "${GREEN}[10/10]${NC} Running initial tests..."
# Test configuration loading
if sudo -u postgres bash -c "source ${INSTALL_DIR}/lib/common.sh && load_config" 2>/dev/null; then
echo -e " ${GREEN}✓${NC} Configuration loads successfully"
else
echo -e " ${YELLOW}!${NC} Configuration test failed (non-critical)"
fi
# Test S3 connectivity
if sudo -u postgres ${INSTALL_DIR}/lib/test.sh --s3 2>/dev/null; then
echo -e " ${GREEN}✓${NC} S3 connectivity OK"
else
echo -e " ${YELLOW}!${NC} S3 connectivity test failed"
fi
echo -e " ${CYAN}Run full test suite:${NC} pg-backup-test --all"
}
show_next_steps() {
echo -e "\n${BLUE}${BOLD}╔══════════════════════════════════════════════════════════════╗${NC}"
echo -e "${BLUE}${BOLD}║ Installation Complete! ✓ ║${NC}"
echo -e "${BLUE}${BOLD}╚══════════════════════════════════════════════════════════════╝${NC}"
echo -e "\n${YELLOW}${BOLD}Next Steps:${NC}"
echo -e "\n${GREEN}1.${NC} Restart PostgreSQL to activate archiving:"
echo -e " ${CYAN}sudo systemctl restart postgresql${NC}"
echo -e "\n${GREEN}2.${NC} Verify backup system is working:"
echo -e " ${CYAN}pg-backup-test --all${NC}"
echo -e "\n${GREEN}3.${NC} Check system status:"
echo -e " ${CYAN}pg-backup-status${NC}"
echo -e "\n${GREEN}4.${NC} View logs:"
echo -e " ${CYAN}pg-backup-logs --tail 50${NC}"
echo -e "\n${GREEN}5.${NC} Review documentation:"
echo -e " ${CYAN}cat ${DOC_DIR}/QUICKSTART.md${NC}"
echo -e "\n${YELLOW}${BOLD}Important Reminders:${NC}"
echo -e " • Review configuration: ${CYAN}${CONFIG_FILE}${NC}"
echo -e " • Secure GPG private key offline"
echo -e " • Test recovery procedure in staging"
echo -e " • Schedule regular DR drills"
echo -e "\n${YELLOW}${BOLD}Quick Commands:${NC}"
echo -e " ${CYAN}pg-backup-status${NC} - Check system status"
echo -e " ${CYAN}pg-backup-health${NC} - Health check report"
echo -e " ${CYAN}pg-backup-logs${NC} - View logs"
echo -e " ${CYAN}pg-backup-test${NC} - Run tests"
echo -e " ${CYAN}pg-backup-recover${NC} - Disaster recovery wizard"
echo -e "\n${GREEN}For help: pg-backup-test --help${NC}\n"
}
uninstall() {
echo -e "${YELLOW}${BOLD}Uninstalling PostgreSQL Backup System...${NC}\n"
if ! confirm "This will remove all backup scripts and configuration. Continue?" "n"; then
echo "Uninstall cancelled."
exit 0
fi
# Stop services
systemctl stop pg-backup-metrics 2>/dev/null || true
systemctl disable pg-backup-metrics 2>/dev/null || true
# Remove cron jobs
sudo -u postgres crontab -l | grep -v "pg_backup" | sudo -u postgres crontab - 2>/dev/null || true
# Remove files
rm -rf "${INSTALL_DIR}"
rm -rf "${DOC_DIR}"
rm -f "${SYSTEMD_DIR}/pg-backup-metrics.service"
# Remove symlinks
find /usr/local/bin -type l -name "pg-backup-*" -delete 2>/dev/null || true
# Ask about configuration
if confirm "Remove configuration files? (Includes credentials!)" "n"; then
rm -rf "${CONFIG_DIR}"
echo -e "${GREEN}✓${NC} Configuration removed"
else
echo -e "${YELLOW}!${NC} Configuration preserved at ${CONFIG_DIR}"
fi
# Ask about logs
if confirm "Remove log files?" "n"; then
rm -f /var/log/postgresql/wal_*.log
rm -f /var/log/postgresql/backup_*.log
rm -f /var/log/postgresql/base_*.log
echo -e "${GREEN}✓${NC} Logs removed"
else
echo -e "${YELLOW}!${NC} Logs preserved"
fi
systemctl daemon-reload
echo -e "\n${GREEN}Uninstall complete${NC}"
}
confirm() {
local prompt="$1"
local default="${2:-n}"
local prompt_display="[y/N]"
[ "${default}" = "y" ] && prompt_display="[Y/n]"
read -p "$(echo -e ${prompt}) ${prompt_display}: " response
response="${response:-${default}}"
[[ "${response,,}" =~ ^y(es)?$ ]]
}
main() {
show_banner
parse_args "$@"
if [ "${UNINSTALL}" = "yes" ]; then
uninstall
exit 0
fi
echo -e "${BOLD}Installing PostgreSQL Backup System v2.0...${NC}\n"
check_configuration
install_dependencies
create_directories
install_scripts
install_utilities
configure_postgresql
setup_cron_jobs
install_systemd_services
install_documentation
verify_installation
run_initial_tests
show_next_steps
}
main "$@"
#!/bin/bash
#
# PostgreSQL WAL Restore Script with S3 Download & Decryption
# Usage: restore_wal.sh %f %p
#
# This script:
# 1. Downloads encrypted WAL files from S3
# 2. Decrypts using GPG
# 3. Verifies integrity
# 4. Places file for PostgreSQL recovery
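#
# A hedged example of how PostgreSQL references this script during recovery
# (postgresql.conf on PG 12+, or recovery.conf on older releases):
#   restore_command = '/usr/local/bin/pg_backup/restore_wal.sh %f %p'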
set -euo pipefail
# ==================== CONFIGURATION ====================
# WAL file parameters (passed by PostgreSQL)
WAL_FILE="${1}" # %f - filename to restore
WAL_PATH="${2}" # %p - path where to place it
# S3 Configuration (must match archive script)
S3_ENDPOINT="https://s3.us-west-000.backblazeb2.com"
S3_BUCKET="your-postgres-backups"
S3_REGION="us-west-000"
S3_PREFIX="wal-archives/$(hostname)"
# AWS CLI profile
AWS_PROFILE="backups"
# Encryption Configuration
GPG_DECRYPT="yes" # Set to "no" if archives are not encrypted
# Local cache (check local before downloading from S3)
LOCAL_ARCHIVE_DIR="/var/lib/postgresql/wal_archive"
CHECK_LOCAL_FIRST="yes"
# Logging
LOG_FILE="/var/log/postgresql/wal_restore.log"
# ==================== FUNCTIONS ====================
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >> "${LOG_FILE}"
}
# No error notifications during restore - PostgreSQL will retry
error_exit() {
log "ERROR: $1"
exit 1
}
verify_prerequisites() {
command -v aws >/dev/null 2>&1 || error_exit "AWS CLI not installed"
if [ "${GPG_DECRYPT}" = "yes" ]; then
command -v gpg >/dev/null 2>&1 || error_exit "GPG not installed"
fi
}
download_from_s3() {
local s3_key="$1"
local output="$2"
aws s3 cp "s3://${S3_BUCKET}/${s3_key}" \
"${output}" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--no-progress \
>> "${LOG_FILE}" 2>&1 || return 1
return 0
}
decrypt_file() {
local input="$1"
local output="$2"
gpg --decrypt \
--batch \
--yes \
--output "${output}" \
"${input}" 2>&1 >> "${LOG_FILE}" || error_exit "Decryption failed"
}
verify_checksum() {
local file="$1"
local s3_key="$2"
# Get metadata from S3
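# (The SHA-256 is assumed to have been stored by the archive script in the
# object's "sha256" metadata field; when it is absent, verification is skipped.)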
local expected_checksum=$(aws s3api head-object \
--bucket "${S3_BUCKET}" \
--key "${s3_key}" \
--profile "${AWS_PROFILE}" \
--endpoint-url "${S3_ENDPOINT}" \
--region "${S3_REGION}" \
--query 'Metadata.sha256' \
--output text 2>/dev/null || echo "")
if [ -n "${expected_checksum}" ]; then
local actual_checksum=$(sha256sum "${file}" | awk '{print $1}')
if [ "${expected_checksum}" != "${actual_checksum}" ]; then
error_exit "Checksum mismatch for ${s3_key}"
fi
log "Checksum verified: ${actual_checksum:0:16}..."
fi
}
# ==================== MAIN PROCESS ====================
main() {
log "Restore requested: ${WAL_FILE}"
verify_prerequisites
# Create working directory
WORK_DIR=$(mktemp -d) || error_exit "Failed to create temp directory"
trap "rm -rf ${WORK_DIR}" EXIT
FOUND=0
# Try local cache first
if [ "${CHECK_LOCAL_FIRST}" = "yes" ] && [ -d "${LOCAL_ARCHIVE_DIR}" ]; then
if [ "${GPG_DECRYPT}" = "yes" ]; then
LOCAL_FILE="${LOCAL_ARCHIVE_DIR}/${WAL_FILE}.gpg"
else
LOCAL_FILE="${LOCAL_ARCHIVE_DIR}/${WAL_FILE}"
fi
if [ -f "${LOCAL_FILE}" ]; then
log "Found in local cache: ${LOCAL_FILE}"
if [ "${GPG_DECRYPT}" = "yes" ]; then
decrypt_file "${LOCAL_FILE}" "${WAL_PATH}"
else
cp "${LOCAL_FILE}" "${WAL_PATH}"
fi
FOUND=1
fi
fi
# Download from S3 if not found locally
if [ ${FOUND} -eq 0 ]; then
if [ "${GPG_DECRYPT}" = "yes" ]; then
S3_KEY="${S3_PREFIX}/${WAL_FILE}.gpg"
DOWNLOAD_FILE="${WORK_DIR}/${WAL_FILE}.gpg"
else
S3_KEY="${S3_PREFIX}/${WAL_FILE}"
DOWNLOAD_FILE="${WORK_DIR}/${WAL_FILE}"
fi
log "Downloading from S3: ${S3_KEY}"
# Download - exit with error if not found (this is normal during recovery)
if ! download_from_s3 "${S3_KEY}" "${DOWNLOAD_FILE}"; then
log "File not found in S3: ${S3_KEY}"
exit 1
fi
# Verify checksum of downloaded file
verify_checksum "${DOWNLOAD_FILE}" "${S3_KEY}"
# Decrypt if needed
if [ "${GPG_DECRYPT}" = "yes" ]; then
log "Decrypting ${WAL_FILE}..."
decrypt_file "${DOWNLOAD_FILE}" "${WAL_PATH}"
else
mv "${DOWNLOAD_FILE}" "${WAL_PATH}"
fi
fi
# Verify final file exists
[ ! -f "${WAL_PATH}" ] && error_exit "Restored file not found: ${WAL_PATH}"
log "Successfully restored ${WAL_FILE}"
}
# Execute main function
main "$@"
exit 0