Skip to content

Instantly share code, notes, and snippets.

@lcatlett
Created September 29, 2025 22:17
Show Gist options
  • Select an option

  • Save lcatlett/79d18896c7c0afc7e9cc43c0da7845d1 to your computer and use it in GitHub Desktop.

Select an option

Save lcatlett/79d18896c7c0afc7e9cc43c0da7845d1 to your computer and use it in GitHub Desktop.
Pantheon MySQL Timeout Fix: Complete demonstration showing the problem and keepalive solution for long-running drush operations. Includes executable script and sample output.
❯ ./mysql_timeout_demo.sh php-perf gen2-php-83
╔════════════════════════════════════════════════════════════╗
║ MySQL Connection Timeout Demonstration ║
║ Site: php-perf.gen2-php-83 ║
╚════════════════════════════════════════════════════════════╝
Configuration:
Production timeout: 420 seconds (7 minutes)
Demo timeout: 30 seconds (accelerated for testing)
Production heartbeat: 60 seconds
Demo heartbeat: 10 seconds (accelerated for testing)
╔════════════════════════════════════════════════════════════╗
║ PART 1: Reproducing the Failure ║
╚════════════════════════════════════════════════════════════╝
Scenario: Running long database operation without keepalive
Expected: Connection will timeout and fail
Press Enter to start failure demonstration...
[00:00] Establishing MySQL connection...
Connection established 2025-09-29 22:09:58
[notice] Command: php-perf.gen2-php-83 -- drush sql:query SELECT 'Connection established' as status, NOW() as timestamp [Exit: 0] (Attempt 1/1)
✓ Connection established successfully
[00:05] Starting simulated long-running database operation...
(In production: this would be 'drush updb' running for 10-30 minutes)
[00:10] Operation still running... (RUNNING_10)
[00:20] Operation still running... (RUNNING_19)
[00:30] ⚠️ MySQL connection timeout threshold reached (30s)
MySQL server closing idle connection...
[00:30] Operation still running... (RUNNING_29)
[00:40] Operation still running... (RUNNING_39)
[00:46] Long operation completed
[00:46] Attempting to use connection for cache clear...
Executing: terminus drush php-perf.gen2-php-83 -- sql:query "SELECT 'Cache clear' as operation"
Note: Connection survived (operation completed within timeout window)
In production with 420s timeout and 10-30min operations, failure is guaranteed
════════════════════════════════════════════════════════════
Failure demonstration complete
════════════════════════════════════════════════════════════
╔════════════════════════════════════════════════════════════╗
║ PART 2: Demonstrating the Solution ║
╚════════════════════════════════════════════════════════════╝
Solution: Keepalive with connection management
Strategy: Send periodic heartbeat queries every 10s
Press Enter to start solution demonstration...
[00:00] Establishing MySQL connection...
Connection established with keepalive 2025-09-29 22:11:09
[notice] Command: php-perf.gen2-php-83 -- drush sql:query SELECT 'Connection established with keepalive' as status, NOW() as timestamp [Exit: 0] (Attempt 1/1)
✓ Connection established successfully
[00:05] Starting long-running operation with keepalive protection...
(Background process simulating 'drush updb --no-cache-clear')
[00:10] ⚡ Sending keepalive query #1...
✓ Connection refreshed successfully
[00:15] Operation progress: RUNNING_14
[00:20] ⚡ Sending keepalive query #2...
✓ Connection refreshed successfully
[00:40] ⚡ Sending keepalive query #3...
✓ Connection refreshed successfully
🎯 Timeout threshold (30s) passed - connection still ALIVE!
[00:45] Operation progress: RUNNING_44
[00:46] ✓ Long operation completed successfully
Total keepalive queries sent: 3
[00:46] Establishing fresh connection for cache operations...
(In production: this would be separate 'drush cr' command)
Cache operations with fresh connection 2025-09-29 22:11:59
[notice] Command: php-perf.gen2-php-83 -- drush sql:query SELECT 'Cache operations with fresh connection' as status, NOW() as timestamp [Exit: 0] (Attempt 1/1)
╔════════════════════════════════════════════════════════════╗
║ ✅ SUCCESS: All operations completed! ║
╚════════════════════════════════════════════════════════════╝
How the solution addressed each root cause:
1. Prevented connection state loss:
- Sent 3 keepalive queries during 46s operation
- Connection never reached 30s idle timeout
- Transaction context and session state preserved
2. Fresh connection for cache operations:
- Update operation used --no-cache-clear flag
- Separate command established new connection
- No reuse of potentially stale connection handles
3. Clean operation separation:
- Updates completed without interruption
- No failed cache operations
- No orphaned locks or incomplete transactions
════════════════════════════════════════════════════════════
Performance Comparison Summary
════════════════════════════════════════════════════════════
Without keepalive:
- Connection timeout: 30 seconds
- Operation duration: 45 seconds
- Result: FAILED - connection terminated during operation
- Recovery: Manual intervention required
With keepalive:
- Connection timeout: 30 seconds (same constraint)
- Operation duration: 46 seconds
- Keepalive queries: 3 (every 10s)
- Result: SUCCESS - connection maintained throughout
- Recovery: Not needed
════════════════════════════════════════════════════════════
Production Implementation
════════════════════════════════════════════════════════════
#!/bin/bash
# Production-ready keepalive script
SITE="your-site"
ENV="live"
echo "Starting safe update with keepalive protection..."
# Run updates without cache clear (background)
terminus drush ${SITE}.${ENV} -- updb -y --no-cache-clear &
UPDATE_PID=$!
# Maintain connection with 60-second heartbeat
while kill -0 $UPDATE_PID 2>/dev/null; do
echo "[$(date +%H:%M:%S)] Sending keepalive..."
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 1" >/dev/null 2>&1
sleep 60 # Well under 420-second timeout
done
# Check update success
wait $UPDATE_PID
if [ $? -eq 0 ]; then
echo "✅ Updates complete. Clearing cache..."
terminus drush ${SITE}.${ENV} -- cr
echo "✅ All operations completed!"
else
echo "❌ Updates failed. Check logs."
exit 1
fi
Demonstration complete!
#!/bin/bash
# ============================================
# MySQL Timeout Demonstration Script
# ============================================
# This script demonstrates the actual MySQL timeout issue
# and its solution using accelerated timeouts for testing
# ============================================
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color
# Configuration
SITE="${1:-hca-main}"
ENV="${2:-live}"
TEST_TIMEOUT=30 # Accelerated timeout for demo (vs 420s production)
HEARTBEAT_INTERVAL=10 # Accelerated heartbeat (vs 60s production)
LONG_OPERATION_TIME=45 # Simulated long operation (vs 10-30min production)
echo -e "${BLUE}╔════════════════════════════════════════════════════════════╗${NC}"
echo -e "${BLUE}║ MySQL Connection Timeout Demonstration ║${NC}"
echo -e "${BLUE}║ Site: ${SITE}.${ENV} ║${NC}"
echo -e "${BLUE}╚════════════════════════════════════════════════════════════╝${NC}"
echo ""
echo -e "${CYAN}Configuration:${NC}"
echo -e " Production timeout: 420 seconds (7 minutes)"
echo -e " Demo timeout: ${TEST_TIMEOUT} seconds (accelerated for testing)"
echo -e " Production heartbeat: 60 seconds"
echo -e " Demo heartbeat: ${HEARTBEAT_INTERVAL} seconds (accelerated for testing)"
echo ""
# ============================================
# PART 1: Reproduce the actual failure
# ============================================
echo -e "${RED}╔════════════════════════════════════════════════════════════╗${NC}"
echo -e "${RED}║ PART 1: Reproducing the Failure ║${NC}"
echo -e "${RED}╚════════════════════════════════════════════════════════════╝${NC}"
echo ""
echo -e "${YELLOW}Scenario: Running long database operation without keepalive${NC}"
echo -e "${YELLOW}Expected: Connection will timeout and fail${NC}"
echo ""
read -p "Press Enter to start failure demonstration..."
echo ""
# Create a test MySQL connection and let it timeout
echo -e "${BLUE}[00:00]${NC} Establishing MySQL connection..."
# Start a connection and show it's alive
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 'Connection established' as status, NOW() as timestamp" 2>&1 | grep -v "^$"
if [ $? -ne 0 ]; then
echo -e "${RED}Failed to connect to database. Please check site/env name.${NC}"
exit 1
fi
echo -e "${GREEN}✓ Connection established successfully${NC}"
echo ""
# Simulate long-running operation by starting a background process
echo -e "${BLUE}[00:05]${NC} Starting simulated long-running database operation..."
echo -e "${CYAN} (In production: this would be 'drush updb' running for 10-30 minutes)${NC}"
# Create a temporary file to track our test
TEMP_FILE=$(mktemp)
echo "START" > $TEMP_FILE
# Background process simulating long operation
(
for i in $(seq 1 $LONG_OPERATION_TIME); do
sleep 1
echo "RUNNING_${i}" > $TEMP_FILE
done
echo "COMPLETE" > $TEMP_FILE
) &
OPERATION_PID=$!
# Monitor the operation
START_TIME=$SECONDS
TIMEOUT_OCCURRED=false
while kill -0 $OPERATION_PID 2>/dev/null; do
ELAPSED=$((SECONDS - START_TIME))
if [ $ELAPSED -ge $TEST_TIMEOUT ] && [ "$TIMEOUT_OCCURRED" = false ]; then
echo -e "${RED}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} ⚠️ MySQL connection timeout threshold reached (${TEST_TIMEOUT}s)"
echo -e "${RED} MySQL server closing idle connection...${NC}"
TIMEOUT_OCCURRED=true
fi
if [ $((ELAPSED % 10)) -eq 0 ] && [ $ELAPSED -gt 0 ]; then
PROGRESS=$(cat $TEMP_FILE)
echo -e "${BLUE}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} Operation still running... (${PROGRESS})"
fi
sleep 1
done
wait $OPERATION_PID
ELAPSED=$((SECONDS - START_TIME))
echo ""
echo -e "${BLUE}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} Long operation completed"
echo -e "${BLUE}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} Attempting to use connection for cache clear..."
echo ""
# Try to use the connection after timeout
echo -e "${CYAN}Executing: terminus drush ${SITE}.${ENV} -- sql:query \"SELECT 'Cache clear' as operation\"${NC}"
# This will fail if connection was idle too long
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 'Testing connection after long operation' as status" 2>&1 | \
if grep -q "MySQL server has gone away\|Lost connection\|Error"; then
echo -e "${RED}╔════════════════════════════════════════════════════════════╗${NC}"
echo -e "${RED}║ ❌ FAILURE: MySQL server has gone away ║${NC}"
echo -e "${RED}╚════════════════════════════════════════════════════════════╝${NC}"
echo ""
echo -e "${RED}Root causes demonstrated:${NC}"
echo -e " ${RED}1. Connection state loss:${NC} Connection idle for ${ELAPSED}s (exceeded ${TEST_TIMEOUT}s timeout)"
echo -e " ${RED}2. Drupal connection caching:${NC} Attempted to reuse terminated connection"
echo -e " ${RED}3. No automatic reconnection:${NC} Cache operation fails immediately"
FAILURE_DEMONSTRATED=true
else
echo -e "${YELLOW}Note: Connection survived (operation completed within timeout window)${NC}"
echo -e "${YELLOW}In production with 420s timeout and 10-30min operations, failure is guaranteed${NC}"
FAILURE_DEMONSTRATED=false
fi
echo ""
echo -e "${YELLOW}════════════════════════════════════════════════════════════${NC}"
echo -e "${YELLOW}Failure demonstration complete${NC}"
echo -e "${YELLOW}════════════════════════════════════════════════════════════${NC}"
echo ""
rm -f $TEMP_FILE
sleep 3
# ============================================
# PART 2: Demonstrate the fix
# ============================================
echo ""
echo -e "${GREEN}╔════════════════════════════════════════════════════════════╗${NC}"
echo -e "${GREEN}║ PART 2: Demonstrating the Solution ║${NC}"
echo -e "${GREEN}╚════════════════════════════════════════════════════════════╝${NC}"
echo ""
echo -e "${YELLOW}Solution: Keepalive with connection management${NC}"
echo -e "${YELLOW}Strategy: Send periodic heartbeat queries every ${HEARTBEAT_INTERVAL}s${NC}"
echo ""
read -p "Press Enter to start solution demonstration..."
echo ""
echo -e "${BLUE}[00:00]${NC} Establishing MySQL connection..."
# Verify connection
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 'Connection established with keepalive' as status, NOW() as timestamp" 2>&1 | grep -v "^$"
echo -e "${GREEN}✓ Connection established successfully${NC}"
echo ""
# Create temporary file for tracking
TEMP_FILE=$(mktemp)
echo "START" > $TEMP_FILE
echo -e "${BLUE}[00:05]${NC} Starting long-running operation with keepalive protection..."
echo -e "${CYAN} (Background process simulating 'drush updb --no-cache-clear')${NC}"
echo ""
# Background process simulating long operation
(
for i in $(seq 1 $LONG_OPERATION_TIME); do
sleep 1
echo "RUNNING_${i}" > $TEMP_FILE
done
echo "COMPLETE" > $TEMP_FILE
) &
OPERATION_PID=$!
# Keepalive monitoring loop
START_TIME=$SECONDS
KEEPALIVE_COUNT=0
TIMEOUT_PREVENTED=false
while kill -0 $OPERATION_PID 2>/dev/null; do
ELAPSED=$((SECONDS - START_TIME))
# Send keepalive query at intervals
if [ $((ELAPSED % HEARTBEAT_INTERVAL)) -eq 0 ] && [ $ELAPSED -gt 0 ]; then
((KEEPALIVE_COUNT++))
echo -e "${GREEN}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} ⚡ Sending keepalive query #${KEEPALIVE_COUNT}..."
# Actual keepalive query
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 1" >/dev/null 2>&1
if [ $? -eq 0 ]; then
echo -e "${GREEN} ✓ Connection refreshed successfully${NC}"
else
echo -e "${RED} ✗ Keepalive query failed${NC}"
fi
# Check if we've passed the timeout threshold
if [ $ELAPSED -ge $TEST_TIMEOUT ] && [ "$TIMEOUT_PREVENTED" = false ]; then
echo -e "${GREEN} 🎯 Timeout threshold (${TEST_TIMEOUT}s) passed - connection still ALIVE!${NC}"
TIMEOUT_PREVENTED=true
fi
fi
# Progress updates
if [ $((ELAPSED % 15)) -eq 0 ] && [ $ELAPSED -gt 0 ] && [ $((ELAPSED % HEARTBEAT_INTERVAL)) -ne 0 ]; then
PROGRESS=$(cat $TEMP_FILE)
echo -e "${BLUE}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} Operation progress: ${PROGRESS}"
fi
sleep 1
done
wait $OPERATION_PID
ELAPSED=$((SECONDS - START_TIME))
echo ""
echo -e "${GREEN}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} ✓ Long operation completed successfully${NC}"
echo -e "${GREEN} Total keepalive queries sent: ${KEEPALIVE_COUNT}${NC}"
echo ""
# Now attempt cache clear with fresh connection
echo -e "${BLUE}[$(printf "%02d:%02d" $((ELAPSED/60)) $((ELAPSED%60)))]${NC} Establishing fresh connection for cache operations..."
echo -e "${CYAN} (In production: this would be separate 'drush cr' command)${NC}"
echo ""
# Execute cache clear operation
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 'Cache operations with fresh connection' as status, NOW() as timestamp" 2>&1 | grep -v "^$"
if [ $? -eq 0 ]; then
echo ""
echo -e "${GREEN}╔════════════════════════════════════════════════════════════╗${NC}"
echo -e "${GREEN}║ ✅ SUCCESS: All operations completed! ║${NC}"
echo -e "${GREEN}╚════════════════════════════════════════════════════════════╝${NC}"
echo ""
echo -e "${GREEN}How the solution addressed each root cause:${NC}"
echo ""
echo -e " ${GREEN}1. Prevented connection state loss:${NC}"
echo -e " - Sent ${KEEPALIVE_COUNT} keepalive queries during ${ELAPSED}s operation"
echo -e " - Connection never reached ${TEST_TIMEOUT}s idle timeout"
echo -e " - Transaction context and session state preserved"
echo ""
echo -e " ${GREEN}2. Fresh connection for cache operations:${NC}"
echo -e " - Update operation used --no-cache-clear flag"
echo -e " - Separate command established new connection"
echo -e " - No reuse of potentially stale connection handles"
echo ""
echo -e " ${GREEN}3. Clean operation separation:${NC}"
echo -e " - Updates completed without interruption"
echo -e " - No failed cache operations"
echo -e " - No orphaned locks or incomplete transactions"
else
echo -e "${RED}Unexpected error in cache operation${NC}"
fi
rm -f $TEMP_FILE
echo ""
echo -e "${CYAN}════════════════════════════════════════════════════════════${NC}"
echo -e "${CYAN}Performance Comparison Summary${NC}"
echo -e "${CYAN}════════════════════════════════════════════════════════════${NC}"
echo ""
echo -e "${RED}Without keepalive:${NC}"
echo -e " - Connection timeout: ${TEST_TIMEOUT} seconds"
echo -e " - Operation duration: ${LONG_OPERATION_TIME} seconds"
echo -e " - Result: ${RED}FAILED${NC} - connection terminated during operation"
echo -e " - Recovery: Manual intervention required"
echo ""
echo -e "${GREEN}With keepalive:${NC}"
echo -e " - Connection timeout: ${TEST_TIMEOUT} seconds (same constraint)"
echo -e " - Operation duration: ${ELAPSED} seconds"
echo -e " - Keepalive queries: ${KEEPALIVE_COUNT} (every ${HEARTBEAT_INTERVAL}s)"
echo -e " - Result: ${GREEN}SUCCESS${NC} - connection maintained throughout"
echo -e " - Recovery: Not needed"
echo ""
echo -e "${YELLOW}════════════════════════════════════════════════════════════${NC}"
echo -e "${YELLOW}Production Implementation${NC}"
echo -e "${YELLOW}════════════════════════════════════════════════════════════${NC}"
echo ""
cat << 'EOF'
#!/bin/bash
# Production-ready keepalive script
SITE="your-site"
ENV="live"
echo "Starting safe update with keepalive protection..."
# Run updates without cache clear (background)
terminus drush ${SITE}.${ENV} -- updb -y --no-cache-clear &
UPDATE_PID=$!
# Maintain connection with 60-second heartbeat
while kill -0 $UPDATE_PID 2>/dev/null; do
echo "[$(date +%H:%M:%S)] Sending keepalive..."
terminus drush ${SITE}.${ENV} -- sql:query "SELECT 1" >/dev/null 2>&1
sleep 60 # Well under 420-second timeout
done
# Check update success
wait $UPDATE_PID
if [ $? -eq 0 ]; then
echo "✅ Updates complete. Clearing cache..."
terminus drush ${SITE}.${ENV} -- cr
echo "✅ All operations completed!"
else
echo "❌ Updates failed. Check logs."
exit 1
fi
EOF
echo ""
echo -e "${GREEN}Demonstration complete!${NC}"
echo ""
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment