Introduction to PostgreSQL WAL Archiving
Write-Ahead Logging (WAL) is PostgreSQL's fundamental mechanism for ensuring data durability and providing the foundation for various backup and replication strategies. In production environments, implementing a robust WAL archiving system is critical for:
- Data Durability: Ensuring no committed transaction is lost
- Point-in-Time Recovery (PITR): Restoring the database to any specific moment
- High Availability: Maintaining standby servers for failover
- Zero Data Loss: Even in catastrophic hardware failures
"A properly configured WAL archiving system can mean the difference between a 5-minute recovery and a 5-hour data loss incident."
Backup Architecture Overview
Our backup architecture consists of three main components:
Primary Server
Production database server with WAL archiving enabled
WAL Archive Storage
Secure, redundant storage for transaction logs
Standby Server
Hot standby for immediate failover capability
Primary Server Configuration
Step 1: Create Archive & Backup Directories
# Create archive & backup directories
sudo mkdir -p /mnt/server/archivedir
sudo mkdir -p /mnt/pg_backup
# Set proper permissions
sudo chown -R postgres:postgres /mnt/server/archivedir
sudo chmod -R 700 /mnt/server/archivedir
sudo chown -R postgres:postgres /mnt/pg_backup
sudo chmod -R 700 /mnt/pg_backup
Step 2: Configure postgresql.conf
# Enable WAL archiving
archive_mode = on
archive_command = 'cp %p /mnt/server/archivedir/%f'
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64
max_replication_slots = 3
# Recommended production settings
wal_compression = on
archive_timeout = 300
synchronous_commit = remote_write
Step 3: Configure pg_hba.conf
# Allow replication connections from standby servers
host replication replicator standby_ip/32 scram-sha-256
Step 4: Reload PostgreSQL Configuration
sudo -u postgres pg_ctl reload -D /var/lib/postgresql/14/main
# Or using systemctl
sudo systemctl reload postgresql
Backup Process Implementation
Method 1: Using pg_basebackup (Recommended)
# Take base backup with WAL streaming
sudo -u postgres pg_basebackup -D /mnt/pg_backup/base_backup -Ft -z -P -Xs -c fast
# Options explained:
# -D : Directory for backup
# -Ft : Tar format output
# -z : Compress with gzip
# -P : Show progress
# -Xs : Stream WAL during backup
# -c fast : Start backup immediately
Method 2: Manual Backup with pg_start_backup
# Start the backup
psql -c "SELECT pg_start_backup('backup_label', false, false);"
# Use rsync to copy data directory
sudo rsync -av /var/lib/postgresql/14/main/ /mnt/pg_backup/manual_backup/
# Stop the backup
psql -c "SELECT pg_stop_backup(false, true);"
Automated Backup Script
#!/bin/bash
# backup_postgres.sh
BACKUP_DIR="/mnt/pg_backup"
DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="/var/log/postgresql/backup_${DATE}.log"
echo "Starting PostgreSQL backup at $(date)" >> ${LOG_FILE}
# Create backup directory
mkdir -p ${BACKUP_DIR}/${DATE}
# Perform base backup
sudo -u postgres pg_basebackup \
-D ${BACKUP_DIR}/${DATE} \
-Ft \
-z \
-P \
-Xs \
-c fast 2>> ${LOG_FILE}
# Check if backup was successful
if [ $? -eq 0 ]; then
echo "Backup completed successfully at $(date)" >> ${LOG_FILE}
# Remove old backups (keep last 7 days)
find ${BACKUP_DIR} -type d -mtime +7 -exec rm -rf {} \;
echo "Old backups cleanup completed" >> ${LOG_FILE}
else
echo "Backup failed at $(date)" >> ${LOG_FILE}
exit 1
fi
Standby Server Configuration
Step 1: Prepare Standby Server
# Install PostgreSQL
sudo apt update
sudo apt install postgresql-14 postgresql-client-14
# Stop PostgreSQL service
sudo systemctl stop postgresql
# Remove existing data directory
sudo rm -rf /var/lib/postgresql/14/main/*
Step 2: Clone Primary Database
# From standby server, clone primary database
sudo -u postgres pg_basebackup \
-h primary_server_ip \
-D /var/lib/postgresql/14/main \
-U replicator \
-v \
-P \
-Xs \
-R \
-S standby_slot1
Step 3: Configure Standby Settings
# Edit postgresql.conf on standby
hot_standby = on
max_standby_streaming_delay = 30s
max_standby_archive_delay = 30s
hot_standby_feedback = on
Step 4: Create Recovery Configuration
# Create standby.signal file
sudo -u postgres touch /var/lib/postgresql/14/main/standby.signal
# Configure recovery settings in postgresql.conf
primary_conninfo = 'host=primary_server_ip port=5432 user=replicator password=your_password application_name=standby1'
primary_slot_name = 'standby_slot1'
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_timeline = 'latest'
Step 5: Start Standby Server
# Start PostgreSQL service
sudo systemctl start postgresql
# Check replication status
psql -c "SELECT * FROM pg_stat_replication;"
Point-in-Time Recovery (PITR)
Recovery Process
# Step 1: Restore base backup
tar -xzf /mnt/pg_backup/base_backup.tar.gz -C /var/lib/postgresql/14/main/
# Step 2: Create recovery configuration
cat > /var/lib/postgresql/14/main/recovery.conf << EOF
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'
EOF
# Step 3: Create recovery.signal
sudo -u postgres touch /var/lib/postgresql/14/main/recovery.signal
# Step 4: Start PostgreSQL in recovery mode
sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl \
-D /var/lib/postgresql/14/main \
-l /var/log/postgresql/recovery.log \
start
Monitoring Recovery Progress
# Check recovery status
psql -c "SELECT * FROM pg_stat_recovery_progress;"
# Monitor WAL replay
psql -c "SELECT * FROM pg_stat_wal_receiver;"
Monitoring & Maintenance
Essential Monitoring Commands
# Check WAL archiving status
psql -c "SELECT * FROM pg_stat_archiver;"
# Monitor replication slots
psql -c "SELECT * FROM pg_replication_slots;"
# Check WAL file generation
psql -c "SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());"
# Archive directory usage
du -sh /mnt/server/archivedir/
find /mnt/server/archivedir/ -name "*.backup" | wc -l
Maintenance Script
#!/bin/bash
# wal_maintenance.sh
ARCHIVE_DIR="/mnt/server/archivedir"
LOG_FILE="/var/log/postgresql/wal_maintenance.log"
echo "Starting WAL maintenance at $(date)" >> ${LOG_FILE}
# Remove WAL files older than 30 days
find ${ARCHIVE_DIR} -name "*.backup" -mtime +30 -delete >> ${LOG_FILE} 2>&1
# Check for failed archiving attempts
FAILED_COUNT=$(psql -t -c "SELECT failed_count FROM pg_stat_archiver;" | tr -d ' ')
if [ ${FAILED_COUNT} -gt 0 ]; then
echo "WARNING: ${FAILED_COUNT} failed archiving attempts detected" >> ${LOG_FILE}
fi
echo "WAL maintenance completed at $(date)" >> ${LOG_FILE}
Production Best Practices
Security Considerations
- Use separate replication users with limited privileges
- Implement SSL/TLS for replication connections
- Regularly rotate passwords and encryption keys
- Monitor access logs for unauthorized attempts
Performance Optimization
- Tune
wal_buffersbased on workload - Adjust
checkpoint_timeoutandmax_wal_size - Use separate storage for WAL files
- Implement connection pooling for standby servers
Disaster Recovery Testing
- Test failover procedures monthly
- Validate backup restores quarterly
- Document recovery time objectives (RTO/RPO)
- Conduct disaster recovery drills annually
Troubleshooting Common Issues
Issue: Replication Lag
Check network latency, disk I/O, and max_standby_streaming_delay settings. Consider increasing WAL buffers and optimizing queries.
Issue: Archive Command Failure
Verify permissions, disk space, and command syntax. Check PostgreSQL logs for specific error messages.
Conclusion
Implementing PostgreSQL WAL archiving with Point-in-Time Recovery provides enterprise-grade data protection. The key takeaways are:
- ✅ Regular base backups combined with continuous WAL archiving
- ✅ Properly configured standby servers for high availability
- ✅ Comprehensive monitoring and alerting systems
- ✅ Regular testing of backup and recovery procedures
- ✅ Documentation of recovery processes and timelines
Remember: A backup is only as good as your last successful restore test.