
Bad decisions are expensive, but slow decisions are often more expensive. The goal is to make good-enough decisions quickly, document why, and know which ones deserve extra deliberation.

One-way doors vs two-way doors

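Jeff Bezos popularized this distinction at Amazon. It is one of the most useful decision-making heuristics in engineering: sort decisions by how hard they are to undo, then spend deliberation time accordingly.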

One-way doors (irreversible): Deserve careful analysis and broad input. Examples: choosing your primary language for a large codebase, picking a cloud provider with vendor lock-in, defining a database schema for billions of rows, open-sourcing proprietary code.

Two-way doors (reversible): Should be made quickly by individuals or small groups. Examples: choosing an npm library, picking a CSS framework, deciding on an endpoint naming convention, selecting a caching strategy.

| Decision type | Time to decide | Who decides | Process |
| --- | --- | --- | --- |
| One-way door (irreversible) | Days to weeks | Team + leadership | Research, prototype, document with ADR |
| Two-way door (reversible) | Hours to 1 day | Individual or pair | Quick spike, make a call, move on |
| Two-way door disguised as one-way | 1-2 days | Small group | Recognize it is reversible, decide quickly |

The critical insight: most decisions feel like one-way doors but are actually two-way doors. "What if we pick the wrong database?" At 1,000 users, migrating is a weekend project. At 10 million users it is harder, but you will have very different information at that scale.
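As a rough illustration, the table above can be encoded as a small lookup so that the expected process follows from how a decision is classified. This is only a sketch: the `Process` type, the dictionary keys, and the `classify` helper are hypothetical names introduced here, and judging whether a decision is truly reversible still takes human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    time_to_decide: str
    who_decides: str
    steps: str


# Rows copied from the table above; the keys are hypothetical labels.
PROCESS_BY_DOOR = {
    "one-way": Process(
        "days to weeks", "team + leadership",
        "research, prototype, document with ADR",
    ),
    "two-way": Process(
        "hours to 1 day", "individual or pair",
        "quick spike, make a call, move on",
    ),
    "two-way-disguised": Process(
        "1-2 days", "small group",
        "recognize it is reversible, decide quickly",
    ),
}


def classify(reversible: bool, feels_irreversible: bool = False) -> str:
    """Return the key of the table row that applies to a decision."""
    if not reversible:
        return "one-way"
    # Reversible, but the team is treating it as if it were not.
    return "two-way-disguised" if feels_irreversible else "two-way"


if __name__ == "__main__":
    # "What if we pick the wrong database?" at 1,000 users.
    print(PROCESS_BY_DOOR[classify(reversible=True, feels_irreversible=True)])
```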


DACI: who decides what

When a decision stalls, it is usually because nobody knows who makes the final call. DACI assigns four roles upfront:

  • Driver: Owns the process. Gathers information, writes up the decision. Does NOT necessarily make the final call.
  • Approver: Makes the final decision. Usually one person. Has veto power.
  • Contributors: Provide input and expertise. Their opinions matter but they don't have a vote.
  • Informed: Need to know the outcome but don't participate in making the decision.
Example: Choosing a message queue for the new notification system

Driver:      Sarah (backend lead) - gathers options, writes comparison doc
Approver:    Mike (engineering manager) - makes the final call
Contributors: James (DevOps), Lisa (frontend, affected by API changes),
             Tom (has experience with RabbitMQ from previous job)
Informed:    Product team, QA team, other engineering teams

Key rules: every decision needs exactly one Driver and one Approver. Contributors provide input by a deadline; if they miss it, the decision moves forward without them. Once decided, everyone commits, including dissenters.
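The key rules can be sketched as a small record type. This is a minimal illustration, not a standard DACI tool: `DaciDecision` and its methods are hypothetical names, and the dates in the usage example are invented. Modeling the Driver and Approver as single fields (rather than lists) is one way to enforce "exactly one" of each.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DaciDecision:
    title: str
    driver: str                # exactly one Driver
    approver: str              # exactly one Approver
    input_deadline: date       # Contributors must reply by this date
    contributors: list[str] = field(default_factory=list)
    informed: list[str] = field(default_factory=list)
    inputs: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.driver.strip() or not self.approver.strip():
            raise ValueError("a decision needs one Driver and one Approver")

    def add_input(self, contributor: str, opinion: str, today: date) -> bool:
        """Record a Contributor's input; return False if it is not accepted."""
        if contributor not in self.contributors or today > self.input_deadline:
            return False
        self.inputs[contributor] = opinion
        return True

    def ready_for_approver(self, today: date) -> bool:
        """True once all input is in, or the deadline has passed."""
        everyone_replied = set(self.inputs) == set(self.contributors)
        return everyone_replied or today > self.input_deadline


if __name__ == "__main__":
    decision = DaciDecision(
        title="Message queue for the notification system",
        driver="Sarah",
        approver="Mike",
        input_deadline=date(2025, 3, 10),  # hypothetical date
        contributors=["James", "Lisa", "Tom"],
        informed=["Product team", "QA team"],
    )
    decision.add_input("Tom", "RabbitMQ worked well at my previous job",
                       today=date(2025, 3, 8))
    print(decision.ready_for_approver(today=date(2025, 3, 11)))
```

Late input is rejected rather than queued, which mirrors the rule that the decision moves forward once the deadline passes.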


Spike before commit

A spike is a time-boxed investigation (1-3 days) to explore an unknown before committing to a large effort.

A spike has three components:

  1. A question: "Can we use WebSockets through our load balancer?"
  2. A timebox: "2 days max."
  3. An output: A short write-up with evidence (prototype, benchmarks, or a clear "no, here is why").
Spike: Can Elasticsearch handle our autocomplete use case?

Timebox: 2 days
Question: Can we get sub-50ms autocomplete for 10M product names
         with typo tolerance?

Day 1: Set up ES locally, indexed 10M synthetic product names,
        tested completion suggester.
Day 2: Benchmarked. p50 = 12ms, p99 = 38ms with typo tolerance.
        Memory usage: 4GB for the index.

Result: Yes. ES handles our use case well. Recommended configuration:
        - 2 nodes, 1 replica
        - completion suggester (not query-time fuzzy)
        - Estimated cost: ~$200/month on Elastic Cloud

The spike prevents the worst outcome: spending 3 weeks building something only to discover it doesn't work.
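The three components of a spike can be captured in a small record that refuses to run past its timebox. This is a sketch under stated assumptions: the `Spike` class and its field names are hypothetical, and the 1-3 day bound simply mirrors the range given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spike:
    question: str                    # component 1: the question
    timebox_days: int                # component 2: the timebox
    days_spent: int = 0
    write_up: Optional[str] = None   # component 3: the output

    def __post_init__(self) -> None:
        if not self.question.strip().endswith("?"):
            raise ValueError("a spike starts from a concrete question")
        if not 1 <= self.timebox_days <= 3:
            raise ValueError("timebox should be 1-3 days")

    def log_day(self) -> None:
        """Count one day of work; refuse to go past the timebox."""
        if self.days_spent >= self.timebox_days:
            raise RuntimeError("timebox exhausted: write up what you found")
        self.days_spent += 1

    def is_done(self) -> bool:
        """A spike is finished only when it has produced a write-up."""
        return self.write_up is not None


if __name__ == "__main__":
    spike = Spike(
        question="Can Elasticsearch handle our autocomplete use case?",
        timebox_days=2,
    )
    spike.log_day()
    spike.log_day()
    spike.write_up = "Yes. p50 = 12ms, p99 = 38ms with typo tolerance."
    print(spike.is_done())
```

Note that a "no, here is why" write-up also completes the spike; the record only checks that an output exists, not that the answer is yes.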


Architecture decision records (ADRs)

An ADR captures a significant architectural decision: what was decided, why, and what alternatives were considered.

markdown
# ADR-007: Use PostgreSQL for primary data store

## Status
Accepted (2025-03-15)

## Context
We need a primary database for user data, orders, and inventory.
We expect to reach 1M users within 18 months. Our team has experience
with PostgreSQL and MySQL but not with NoSQL databases.

## Decision
We will use PostgreSQL (hosted on AWS RDS).

## Alternatives considered
- MySQL: Similar capabilities, but team has deeper PostgreSQL expertise
- DynamoDB: Better scalability, but our access patterns are relational
  and would require denormalization we're not ready for
- MongoDB: Flexible schema, but we benefit from ACID transactions
  for order processing

## Consequences
- We accept the scaling ceiling of a single-primary RDBMS
- We will need to shard or migrate if we exceed ~10M users
- Team can be productive immediately (existing expertise)
- We get ACID transactions for order processing out of the box

ADRs prevent re-litigation (new team members read instead of reopening debates), capture context that nobody will remember in six months, and explicitly acknowledge tradeoffs. They should take 30 minutes to write. Store them in your repository (docs/decisions/ or adr/). Don't edit old ADRs; write a new one that references the old one.
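To keep the cost of starting an ADR low, a small script can create the next numbered file with the section headings from the example above. This is a minimal sketch: the `NNN-slug.md` file naming, the initial "Proposed" status, and the function names are assumptions made for illustration, not an established convention from this lesson.

```python
import re
from datetime import date
from pathlib import Path

# Section headings follow the ADR example above; "Proposed" is an assumed
# initial status that would be changed to "Accepted" once the Approver decides.
TEMPLATE = """# ADR-{number:03d}: {title}

## Status
Proposed ({today})

## Context

## Decision

## Alternatives considered

## Consequences
"""


def next_adr_number(adr_dir: Path) -> int:
    """Return one more than the highest NNN- prefix found in adr_dir."""
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := re.match(r"(\d+)-", p.name))
    ]
    return max(numbers, default=0) + 1


def new_adr(adr_dir: Path, title: str, today: date) -> Path:
    """Create a new, empty ADR file and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = adr_dir / f"{number:03d}-{slug}.md"
    path.write_text(
        TEMPLATE.format(number=number, title=title, today=today.isoformat())
    )
    return path


if __name__ == "__main__":
    print(new_adr(Path("docs/decisions"), "Use PostgreSQL for primary data store",
                  date.today()))
```

Because numbers only ever increase and existing files are never rewritten, the script fits the rule of superseding old ADRs with new ones instead of editing them.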


Putting it all together

New technical decision needed
        │
        ▼
Is it reversible (two-way door)?
   ├── Yes ──> One person decides. Spend hours, not days.
   │           Document briefly if others are affected.
   │
   └── No ───> Time-box a spike (1-3 days) if the
               domain is unfamiliar.
                    │
                    ▼
               Assign DACI roles.
               Driver gathers options.
               Contributors provide input by deadline.
               Approver makes the call.
                    │
                    ▼
               Write an ADR.
               Communicate to Informed parties.
               Move on and build.

Speed of decision-making compounds over time. A team that decides in days instead of weeks ships months ahead of a team that debates everything.
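The flow above can also be written as a function that returns the steps to follow. This is a sketch only: `decision_steps` and its parameters are hypothetical names, and the step strings paraphrase the flowchart.

```python
def decision_steps(
    reversible: bool,
    domain_unfamiliar: bool = False,
    affects_others: bool = False,
) -> list[str]:
    """Return the ordered steps for a new technical decision."""
    if reversible:
        # Two-way door: decide fast, document only if others are affected.
        steps = ["One person decides within hours"]
        if affects_others:
            steps.append("Document briefly")
        return steps

    # One-way door: optional spike, then DACI, then an ADR.
    steps = []
    if domain_unfamiliar:
        steps.append("Time-box a spike (1-3 days)")
    steps += [
        "Assign DACI roles",
        "Driver gathers options",
        "Contributors provide input by deadline",
        "Approver makes the call",
        "Write an ADR",
        "Communicate to Informed parties",
        "Move on and build",
    ]
    return steps


if __name__ == "__main__":
    for step in decision_steps(reversible=False, domain_unfamiliar=True):
        print("-", step)
```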

AI pitfall
AI is good at generating ADR templates but tends to fill in the "Consequences" section poorly. It will list the benefits of your chosen option but gloss over what you are giving up. Always write the consequences section yourself, and force yourself to name the downsides explicitly.
Good to know
ADRs should be short. If an ADR takes more than 30 minutes to write, you are overcomplicating it. The goal is to capture the decision and the reasoning, not to write a whitepaper. A two-paragraph ADR that captures the "why" is more valuable than a ten-page document nobody reads.
Edge case
"Two-way door" decisions can become one-way doors over time. Choosing a CSS framework is reversible when you have 5 pages. After 500 pages with deeply nested component styles, switching frameworks is a multi-month project. Recognize when a reversible decision has accumulated enough usage to become effectively irreversible.