Data loss is a question of "when", not "if". Databases get corrupted. Cloud providers have outages. Engineers accidentally run DROP TABLE on production. The difference between a minor incident and a catastrophe is whether you have reliable, tested backups. This lesson gives you the mental model and practical techniques to build a solid backup strategy.
The 3-2-1 rule
The 3-2-1 rule is the backbone of any sensible backup strategy. It's been around for decades because it works.
- 3 copies of your data (the original + 2 backups)
- 2 different storage media or services
- 1 copy offsite (in a different physical location or cloud region)
Think of it like this: if your database is on a server in AWS us-east-1, a backup in another folder on the same server doesn't help when the whole machine fails. A backup in us-west-2 and another in Backblaze B2 means two separate failures would need to happen simultaneously to lose your data.
| Storage type | Examples | Protects against |
|---|---|---|
| Primary (online) | RDS, D1, Postgres on EC2 | Nothing by itself (serves normal reads/writes) |
| Secondary (cloud) | S3, R2, GCS | Server failure, accidental deletion |
| Offsite | Different cloud provider, cold storage | Region outage, provider failure |
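One way to sanity-check an inventory of copies against the rule is a small shell function. This is a minimal sketch, not a standard tool: the `check_321` name, the `name medium location` line format, and the media labels in the usage example are assumptions made for illustration, and "offsite" is approximated as "any location other than the primary's".

```shell
# check_321: read one copy per line on stdin as "name medium location".
# The first valid line is treated as the primary copy; a copy counts as
# offsite when its location differs from the primary's.
check_321() {
  awk '
    NF < 3 { next }                      # skip blank or malformed lines
    { copies++; media[$2] = 1 }          # count copies and remember each medium
    copies == 1 { primary = $3 }         # first valid line defines the primary location
    $3 != primary { offsite++ }
    END {
      for (m in media) kinds++
      printf "copies=%d media=%d offsite=%d\n", copies, kinds, offsite
      # exit status 0 only when all three parts of the rule hold
      exit !(copies >= 3 && kinds >= 2 && offsite >= 1)
    }'
}
```

Feeding it the example from the text (a primary in us-east-1, an S3 copy in us-west-2, and a Backblaze B2 copy) as three lines prints `copies=3 media=3 offsite=2` and exits with status 0. Two copies on the same disk in the same place exit with status 1.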
Backup types
Not all backups work the same way. Understanding the tradeoffs helps you pick the right strategy for each part of your system.
Full vs incremental vs differential
A full backup copies everything. An incremental backup copies only what changed since the last backup (of any kind). A differential backup copies what changed since the last full backup.
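These definitions decide which files you need at restore time. The sketch below works that out for a chronological list of backups. The `restore_chain` function and the `type:name` notation are invented for this illustration, not part of any backup tool.

```shell
# restore_chain: print, one per line, the backups needed to restore the newest state.
# Arguments are backups in chronological order, written as type:name, where
# type is full, diff, or incr (a notation invented for this sketch).
restore_chain() {
  rc_full="" rc_diff="" rc_incrs=""
  for rc_entry in "$@"; do
    case "${rc_entry%%:*}" in
      full) rc_full=$rc_entry rc_diff="" rc_incrs="" ;;  # a new full backup restarts the chain
      diff) rc_diff=$rc_entry rc_incrs="" ;;             # a differential supersedes everything since the full
      incr) rc_incrs="$rc_incrs $rc_entry" ;;            # an incremental only adds to the chain
      *) echo "unknown backup type in: $rc_entry" >&2; return 1 ;;
    esac
  done
  [ -n "$rc_full" ] || { echo "no full backup in the list" >&2; return 1; }
  # Unquoted on purpose so empty parts vanish; names must not contain spaces or wildcards.
  printf '%s\n' $rc_full $rc_diff $rc_incrs
}
```

For example, `restore_chain full:sun incr:mon incr:tue diff:wed incr:thu` prints `full:sun`, `diff:wed`, and `incr:thu`: the Wednesday differential already contains Monday's and Tuesday's changes, so those two incrementals are not needed. With incrementals only, every one since the last full backup is required, which is why a long chain makes restores slower and more fragile.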
- Full backup: all data (slow, large, self-contained)
- Incremental backup: changes since the last backup (fast, small, requires the full chain to restore)
- Differential backup: changes since the last full backup (medium speed/size, simpler restore)
Database-specific backups
Most databases have purpose-built backup tools that understand transaction consistency, something file-level copies can't guarantee. (A transaction is a group of database operations that either all succeed together or all fail together, preventing partial updates.)
# PostgreSQL: pg_dump creates a consistent snapshot
pg_dump -h localhost -U postgres -d mydb -F c -f backup_$(date +%Y%m%d).dump
# Restore from a pg_dump backup
pg_restore -h localhost -U postgres -d mydb backup_20240101.dump
# SQLite: use the .backup command, which takes a consistent copy even while the database is in use
sqlite3 mydb.db ".backup 'backup_$(date +%Y%m%d).db'"
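# MySQL: mysqldump (--single-transaction gives a consistent snapshot of InnoDB tables without locking them)
mysqldump --single-transaction -u root -p mydb > backup_$(date +%Y%m%d).sql
Automating backups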
A backup strategy that requires a human to remember to run it will fail. Automate everything.
Using cron jobs
# Edit your crontab
crontab -e
# Run a database backup every day at 2am
0 2 * * * /home/deploy/scripts/backup.sh >> /var/log/backup.log 2>&1
# Run a weekly full backup on Sundays at 3am
0 3 * * 0 /home/deploy/scripts/full-backup.sh >> /var/log/backup.log 2>&1
#!/bin/bash
# backup.sh - simple PostgreSQL backup to S3
# Expects DB_HOST and DB_USER in the environment (cron does not load your shell profile)
set -euo pipefail # exit on errors, unset variables, and failed pipeline stages
DATE=$(date +%Y%m%d_%H%M%S)
DB_NAME="myapp_production"
BACKUP_FILE="/tmp/backup_${DATE}.dump"
S3_BUCKET="s3://my-app-backups/db"
# Remove the local temp file on exit, even if a step fails
trap 'rm -f "$BACKUP_FILE"' EXIT
echo "Starting backup at ${DATE}"
# Create the backup
pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -F c -f "$BACKUP_FILE"
# Upload to S3
aws s3 cp "$BACKUP_FILE" "${S3_BUCKET}/backup_${DATE}.dump"
echo "Backup completed successfully"
Retention policies
Don't keep backups forever: storage costs add up, and you rarely need data from 3 years ago. A sensible default:
| Frequency | Keep for |
|---|---|
| Hourly | 24 hours |
| Daily | 30 days |
| Weekly | 3 months |
| Monthly | 1 year |
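The table translates directly into a per-tier expiry check. The following is a sketch under stated assumptions: the function names `max_age_hours` and `is_expired` are hypothetical, "3 months" is approximated as 90 days, and "1 year" as 365 days.

```shell
# max_age_hours TIER: retention window for a tier, in hours, following the table above.
max_age_hours() {
  case "$1" in
    hourly)  echo 24 ;;
    daily)   echo $((30 * 24)) ;;
    weekly)  echo $((90 * 24)) ;;     # "3 months" approximated as 90 days
    monthly) echo $((365 * 24)) ;;    # "1 year" approximated as 365 days
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# is_expired TIER AGE_HOURS: exit status 0 when the backup is past its window.
# An unknown tier returns 2, so a caller using "if is_expired ..." never
# deletes a backup it cannot classify.
is_expired() {
  ie_limit=$(max_age_hours "$1") || return 2
  [ "$2" -gt "$ie_limit" ]
}
```

A cleanup job could call `is_expired daily "$age_hours"` before removing a file. Keeping the windows in one function means the policy lives in a single place if the table changes.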
# Delete S3 backups older than 30 days
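aws s3 ls s3://my-app-backups/db/ \
| awk '{print $4}' \
| while read -r file; do
  # Pull the YYYYMMDD stamp out of the file name; skip entries without one
  stamp=$(echo "$file" | grep -oE '[0-9]{8}' | head -n 1)
  [ -n "$stamp" ] || continue
  # date -d is GNU date syntax
  age=$(( ( $(date +%s) - $(date -d "$stamp" +%s) ) / 86400 ))
  if [ "$age" -gt 30 ]; then
    aws s3 rm "s3://my-app-backups/db/$file"
  fi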
done
RTO and RPO
Before you can evaluate a backup strategy, you need to know what you're optimizing for. Two metrics define your requirements.
RPO (Recovery Point Objective) is the maximum amount of data loss you can accept. If your RPO is 1 hour, you must take backups at least every hour. If you can't lose a single transaction, you need continuous replication.
RTO (Recovery Time Objective) is the maximum time it can take to restore service. If your RTO is 4 hours, you need to be able to restore everything (database, files, configuration) within 4 hours.
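Both targets can be checked with simple arithmetic. The helpers below are a rough sketch: `meets_rpo` and `restore_minutes` are hypothetical names, and the restore estimate counts only data transfer, so treat its result as a lower bound.

```shell
# meets_rpo INTERVAL_MIN RPO_MIN: exit status 0 when the gap between backups
# does not exceed the RPO. In the worst case a failure hits just before the
# next backup, so you lose roughly one full interval of data.
meets_rpo() {
  [ "$1" -le "$2" ]
}

# restore_minutes SIZE_GB THROUGHPUT_MB_PER_S: transfer time in minutes, rounded up.
# Ignores decompression, index rebuilds, and the time it takes a human to respond.
restore_minutes() {
  rm_seconds=$(( $1 * 1024 / $2 ))
  echo $(( (rm_seconds + 59) / 60 ))
}
```

For example, `restore_minutes 500 100` prints `86`: 500 GB is 512,000 MB, which takes 5,120 seconds at 100 MB/s, just over 85 minutes. That fits a 4-hour RTO with room for the remaining restore steps. `meets_rpo 1440 60` fails, because daily backups cannot satisfy a 1-hour RPO.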
Quick reference
| Requirement | Approach |
|---|---|
| Low RPO (minutes) | Continuous replication, read replicas |
| Medium RPO (hours) | Hourly incremental backups |
| High RPO (days) | Daily full backups |
| Low RTO (minutes) | Hot standby, automatic failover |
| Medium RTO (hours) | Pre-configured restore scripts |
| High RTO (days) | Manual restore from cold storage |