CAP Theorem Explained

The CAP theorem is one of those things that gets mentioned in every system design interview, often incorrectly. Let's get it right.

In 2000, Eric Brewer proposed a conjecture (later proven as a theorem by Seth Gilbert and Nancy Lynch) that a distributed data store can only provide two of the following three guarantees simultaneously:

Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
Availability (A): Every request receives a non-error response, without guaranteeing it contains the most recent write.
Partition tolerance (P): The system continues to operate despite network partitions (messages being dropped or delayed between nodes).

Why you always pick P

Here is the thing that confuses people: you don't actually get to choose between all three pairs (CA, CP, AP). In any distributed system, network partitions will happen. Cables get cut. Switches fail. Data centers lose connectivity. If your system runs on more than one machine, partition tolerance is not optional; it is a requirement.

So the real question is: when a network partition occurs, do you sacrifice consistency or availability?

Network partition happens between Node A and Node B:

CP choice:
  Client -> Node A: "Write x = 5"  -> OK
  Client -> Node B: "Read x"       -> ERROR (can't confirm latest value)

AP choice:
  Client -> Node A: "Write x = 5"  -> OK
  Client -> Node B: "Read x"       -> Returns x = 3 (stale but available)

A CP system will refuse to serve a read if it cannot guarantee the data is current. An AP system will always give you an answer, even if it might be outdated.
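
The two outcomes above can be sketched as a toy model. This is a minimal illustration, not how any real database is implemented; the `Replica` class, the `Unavailable` exception, and the `"CP"`/`"AP"` mode labels are all hypothetical names made up for this example.

```python
class Unavailable(Exception):
    """Raised by a CP replica that cannot confirm it holds the latest value."""


class Replica:
    def __init__(self, mode, value):
        self.mode = mode          # "CP" or "AP" (labels used only in this sketch)
        self.value = value        # last value this replica has seen
        self.partitioned = False  # True while cut off from its peer

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than risk returning stale data
            raise Unavailable("cannot confirm latest value")
        # AP (or a healthy network): answer with whatever is stored locally
        return self.value


def read_from_b_during_partition(mode):
    node_b = Replica(mode, value=3)  # x = 3 was replicated before the partition
    node_b.partitioned = True        # the write x = 5 on Node A never arrives
    try:
        return node_b.read()
    except Unavailable as exc:
        return f"ERROR ({exc})"


print(read_from_b_during_partition("CP"))
print(read_from_b_during_partition("AP"))
```

In CP mode the read on Node B comes back as an error string; in AP mode it returns the stale value 3.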

CP vs AP: real systems

Let's look at where real databases and systems fall on this spectrum.

System	Category	Behavior During Partition
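PostgreSQL (single-primary replication)	CP	Writes go only to the primary. With synchronous replication, commits block if the primary cannot reach a synchronous standby. Hot-standby replicas can keep serving reads, but those reads may be stale.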
MongoDB (default config)	CP	Elections choose a new primary; during election, writes are rejected. Reads from secondaries may be stale but default read preference targets primary.
Amazon DynamoDB	AP	Always accepts reads and writes. Uses eventual consistency by default. Offers optional strongly consistent reads at higher latency.
Apache Cassandra	Tunable	You configure consistency level per query. `QUORUM` reads/writes give you CP behavior. `ONE` gives you AP behavior.
Redis Cluster	AP	If a partition isolates a master, the cluster promotes a replica. Writes to the old master during the partition window are lost.
Google Spanner	CP	Uses TrueTime (atomic clocks + GPS) to achieve global strong consistency. Sacrifices some availability during partitions.
CockroachDB	CP	Inspired by Spanner. Ranges become unavailable if a majority of replicas cannot communicate.
Amazon S3	AP (historically)	Strongly consistent for all reads since December 2020. Before that, it provided read-after-write consistency for new objects but only eventual consistency for overwrites and deletes.

The "CA" myth

You sometimes see people say "a single-node PostgreSQL database is CA." That is technically true but useless: a single machine has no network partitions because there is no network between nodes. The moment you add a second node, you are in CAP territory and you must choose.

Some people list traditional relational databases as CA. What they mean is: "when running on a single node, it is both consistent and available." But that is not a distributed system, so CAP does not apply. Don't let this confuse you.

PACELC: the extended model

The CAP theorem only describes what happens during a partition. But most of the time, your system is running fine with no partition. Daniel Abadi proposed the PACELC model to cover both cases:

Partition? Choose Availability or Consistency
Else (no partition)? Choose Latency or Consistency

This is often the more useful mental model: partitions are rare, so the latency-versus-consistency tradeoff in normal operation affects far more of your requests than partition-time behavior does.

System	During Partition (PAC)	Normal Operation (ELC)
PostgreSQL	PC (reject if inconsistent)	EC (consistency over latency)
DynamoDB	PA (stay available)	EL (low latency, eventual consistency by default)
Cassandra	PA/PC (tunable)	EL/EC (tunable per query)
Google Spanner	PC (consistency first)	EC (consistency via TrueTime, higher latency)
MongoDB	PC (elections cause downtime)	EC (reads from primary by default)
CockroachDB	PC (unavailable if no quorum)	EC (serializable by default)
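
One way to keep the two halves of PACELC straight is to encode the table as data. The sketch below is only a lookup built from the rows above; the `Pacelc` class and its `favors` method are hypothetical names for this example, and the classifications are the lesson's, not an official label from any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pacelc:
    during_partition: str  # "A", "C", or "tunable"
    normal_operation: str  # "L", "C", or "tunable"

    def favors(self, partitioned: bool) -> str:
        # PACELC in one line: if Partition then A-or-C, Else L-or-C
        return self.during_partition if partitioned else self.normal_operation


# Classifications copied from the table above
PACELC = {
    "PostgreSQL": Pacelc("C", "C"),
    "DynamoDB": Pacelc("A", "L"),
    "Cassandra": Pacelc("tunable", "tunable"),
    "Google Spanner": Pacelc("C", "C"),
    "MongoDB": Pacelc("C", "C"),
    "CockroachDB": Pacelc("C", "C"),
}

print(PACELC["DynamoDB"].favors(partitioned=True))
print(PACELC["DynamoDB"].favors(partitioned=False))
```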

Cassandra: the tunable example

Cassandra deserves a closer look because it lets you choose your consistency level on every single query. This is incredibly powerful.

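// In cqlsh, the CONSISTENCY command sets the level for the statements that follow.
// (CQL 3 has no per-statement consistency clause; drivers set it on each statement.)

// AP behavior - fast, but may return stale data
CONSISTENCY ONE
SELECT * FROM users WHERE id = 123;

// CP behavior - slower, but fresh as long as writes also use QUORUM
CONSISTENCY QUORUM
SELECT * FROM users WHERE id = 123;

// Strongest guarantee - all replicas must respond
CONSISTENCY ALL
SELECT * FROM users WHERE id = 123;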

With a replication factor of 3, QUORUM means 2 out of 3 replicas must respond. Paired with QUORUM writes, this gives you strong consistency for critical reads (account balance) while allowing eventual consistency (a guarantee that all copies of the data converge to the same value given enough time, rather than being synchronized immediately after every write) for non-critical reads (user profile view count).

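The formula is simple: if R + W > N, you get strong consistency, because every read set overlaps the latest acknowledged write set in at least one replica. Here R is the number of replicas that must respond to a read, W is the number that must acknowledge a write, and N is the replication factor. With N=3, QUORUM for both reads and writes means 2 + 2 > 3, so reads are strongly consistent.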
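
To see why the inequality works, here is a small sketch that checks it by brute force. The function names are made up for this example. `quorums_always_overlap` enumerates every possible set of R replicas and every possible set of W replicas and confirms they share at least one member, which is the property that lets a read see the latest acknowledged write.

```python
from itertools import combinations


def is_strongly_consistent(n, r, w):
    """The R + W > N rule from the text."""
    return r + w > n


def quorums_always_overlap(n, r, w):
    """Brute force: does every read set of size r share a replica
    with every write set of size w, out of n replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# N=3 with QUORUM (2) reads and writes: 2 + 2 > 3
print(is_strongly_consistent(3, 2, 2), quorums_always_overlap(3, 2, 2))
# N=3 with ONE (1) reads and writes: 1 + 1 <= 3
print(is_strongly_consistent(3, 1, 1), quorums_always_overlap(3, 1, 1))
```

The two functions agree for every combination of R and W: the formula is just the pigeonhole principle applied to replica sets.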

Practical takeaway

Don't think of CAP as a rigid classification. Think of it as a spectrum that you navigate per feature:

Your user's account balance? CP. Never show stale data.
Your social feed's like count? AP. Being one second behind is fine.
Your shopping cart? Depends. Some companies go AP (Amazon famously chooses availability for carts) because briefly showing a customer a stale cart is better than showing an error page.

The best systems don't pick one side globally. They make different tradeoffs for different data and different operations. That is the real lesson of CAP.
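
In code, a per-operation tradeoff might look like a small policy table. This is a sketch under assumptions: the operation names are hypothetical, and mapping CP to `QUORUM` and AP to `ONE` assumes a Cassandra-style store with tunable consistency levels.

```python
from enum import Enum


class Tradeoff(Enum):
    CP = "prefer consistency"
    AP = "prefer availability"


# Hypothetical operation names, following the examples in the list above
POLICY = {
    "read_account_balance": Tradeoff.CP,  # never show stale data
    "read_like_count": Tradeoff.AP,       # one second behind is fine
    "add_to_cart": Tradeoff.AP,           # stale cart beats an error page
}


def consistency_level_for(operation):
    # Unknown operations default to CP: failing is safer than silently going stale
    tradeoff = POLICY.get(operation, Tradeoff.CP)
    return "QUORUM" if tradeoff is Tradeoff.CP else "ONE"


print(consistency_level_for("read_account_balance"))
print(consistency_level_for("read_like_count"))
```

Defaulting unknown operations to the consistent side is a design choice for this sketch; a latency-sensitive product might reasonably default the other way.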

AI pitfall

Ask AI "is my system CP or AP?" and it will give you a definitive answer. In reality, most production systems are neither purely CP nor purely AP; they make different tradeoffs for different operations. A system can be CP for payments and AP for user profile reads. AI oversimplifies CAP into a binary classification when it is actually a spectrum.

Good to know

CAP only applies to distributed systems. If your entire application runs on a single database server, CAP is irrelevant: you get both consistency and availability (until that server goes down). Don't over-apply CAP to systems that aren't distributed.

Edge case

Redis Cluster has a subtle failure mode during partitions. If a master gets isolated, the cluster promotes a replica. But writes accepted by the old master during the partition window are silently lost when the partition heals. If you use Redis for anything besides caching, you need to understand this.
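
The failure sequence can be walked through with a toy model. This is not Redis code and uses no Redis client API; the `Node` class and `simulate_failover` function are hypothetical, and the model simply assumes asynchronous replication plus a rejoining master that adopts the new master's data.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


def simulate_failover():
    old_master, replica = Node("old-master"), Node("replica")

    # Before the partition: the write is replicated asynchronously and arrives.
    old_master.data["x"] = 3
    replica.data["x"] = 3

    # Partition: the isolated master still acknowledges a client write,
    # but replication cannot reach the replica.
    old_master.data["x"] = 5
    acknowledged = {"x": 5}

    # The rest of the cluster promotes the replica.
    new_master = replica

    # Partition heals: the old master rejoins as a replica of the new master
    # and replaces its own state with the new master's.
    old_master.data = dict(new_master.data)

    # Writes the client was told succeeded but that no longer exist anywhere.
    lost = {k: v for k, v in acknowledged.items() if new_master.data.get(k) != v}
    return new_master.data, lost


surviving, lost = simulate_failover()
print("surviving:", surviving)
print("lost:", lost)
```

The client received an OK for `x = 5`, yet after the partition heals both nodes hold `x = 3`.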
