Every engineering decision is a tradeoff between cost and performance. "Cost" here means total cost, not just the server bill, and that includes the most expensive resource: engineer time.
The diminishing returns curve
Performance improvements are not linear. Going from 1s to 500ms might only require adding a CDN (content delivery network: servers around the world that cache your files and serve them from the location closest to the user), which is hours of work. Going from 500ms to 200ms might mean rewriting database queries (days). Going from 200ms to 50ms can require a complete architecture overhaul (months).
(Figure: response time, from 1000ms down to 50ms, plotted against effort/cost invested. The curve falls steeply at first and then flattens out.)

As a rule of thumb, about 80% of the performance gains come from 20% of the effort (the Pareto principle). For most web applications, a 200ms response time is fine. Spending three months to reach 50ms usually only pays off for latency-critical products such as trading platforms or search engines.
Know where your bottleneck is
Before optimizing anything, measure. The number one mistake is guessing where the problem is instead of profiling.
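Measuring does not have to start with a full APM setup. As a minimal sketch using only the Python standard library, a request handler can time its own phases with `time.perf_counter` and report the slowest one. `PhaseTimer`, `phase`, and `bottleneck` are hypothetical names, and `time.sleep` stands in for real work; a profiler or APM tool gives you the same breakdown without hand-instrumenting the code.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: records wall-clock time (ms) spent in each named phase."""

    def __init__(self) -> None:
        self.timings_ms: dict[str, float] = {}

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a phase entered several times per request is summed.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.timings_ms[name] = self.timings_ms.get(name, 0.0) + elapsed_ms

    def bottleneck(self) -> str:
        """Name of the phase with the largest total time."""
        return max(self.timings_ms, key=self.timings_ms.__getitem__)


if __name__ == "__main__":
    timer = PhaseTimer()
    # time.sleep stands in for real work (a query, business logic, ...).
    with timer.phase("database_query"):
        time.sleep(0.05)
    with timer.phase("app_logic"):
        time.sleep(0.005)
    for name, ms in timer.timings_ms.items():
        print(f"{name}: {ms:.1f}ms")
    print("bottleneck:", timer.bottleneck())
```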
Request lifecycle (typical web app):

```
┌──────────────────────────────────────────┐
│ DNS lookup:              5ms             │
│ TCP + TLS handshake:     30ms            │
│ Server processing:       150ms           │  <-- people optimize this
│   ├── Database query:    120ms           │  <-- this is the real bottleneck
│   └── App logic:         30ms            │
│ Response transfer:       20ms            │
│ Client rendering:        100ms           │
├──────────────────────────────────────────┤
│ Total:                   305ms           │
└──────────────────────────────────────────┘
```

Rewriting your app logic to be twice as fast saves 15ms. Adding a database index (a structure that lets the database find rows by column value without scanning the whole table) to speed up that 120ms query can save around 100ms. Always find the bottleneck first.
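The arithmetic behind that comparison is Amdahl's law applied to latency: speeding up one phase can save at most the time spent in that phase. Here is a small Python sketch using the leaf phases from the request-lifecycle breakdown; the 6x speedup for the indexed query is an assumption chosen so the saving matches the ~100ms figure, and `saved_ms` is a hypothetical helper name.

```python
# Leaf phases from the request-lifecycle breakdown (ms).
# "Server processing" is omitted because it is the sum of its two children.
BREAKDOWN_MS = {
    "dns_lookup": 5,
    "tcp_tls_handshake": 30,
    "database_query": 120,
    "app_logic": 30,
    "response_transfer": 20,
    "client_rendering": 100,
}


def saved_ms(breakdown: dict[str, float], phase: str, speedup: float) -> float:
    """Latency saved (ms) if only `phase` becomes `speedup` times faster.

    Everything else in the request is unchanged, so the saving can never
    exceed the time currently spent in that phase (Amdahl's law).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    before = breakdown[phase]
    return before - before / speedup


print(sum(BREAKDOWN_MS.values()))                             # 305
print(saved_ms(BREAKDOWN_MS, "app_logic", 2))                 # 15.0
print(saved_ms(BREAKDOWN_MS, "database_query", 6))            # 100.0
print(saved_ms(BREAKDOWN_MS, "app_logic", float("inf")))      # 30.0, the ceiling
```

Even an infinitely fast rewrite of the app logic saves only 30ms, less than a third of what the index buys.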
| Layer | Tool | What It Tells You |
|---|---|---|
| Frontend | Chrome DevTools (Performance tab) | Rendering, scripting, and layout time |
| Network | Chrome DevTools (Network tab) | Request waterfalls, slow endpoints |
| Backend | APM tools (Datadog, New Relic) | Endpoint latency breakdown, error rates |
| Database | EXPLAIN ANALYZE (SQL) | Query execution plan, index usage |
| Infrastructure | Cloud monitoring (CloudWatch, etc.) | CPU, memory, I/O utilization |
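EXPLAIN ANALYZE is PostgreSQL (and recent MySQL) syntax and needs a running database server. To keep a runnable sketch self-contained, the example that follows swaps in SQLite through Python's built-in `sqlite3` module and its `EXPLAIN QUERY PLAN`, which shows whether a query scans the whole table or uses an index. Unlike EXPLAIN ANALYZE, it does not execute the query or report timings. The `users` table, its columns, and `idx_users_email` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

LOOKUP = "SELECT * FROM users WHERE email = ?"


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list[str]:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def make_users_table(n_rows: int = 10_000) -> sqlite3.Connection:
    """Build a throwaway in-memory table with no index on `email`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [(f"user{i}@example.com", f"User {i}") for i in range(n_rows)],
    )
    return conn


if __name__ == "__main__":
    conn = make_users_table()
    # Without an index the plan is a full table scan (detail contains "SCAN").
    print(query_plan(conn, LOOKUP, ("user42@example.com",)))
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    # With the index the plan becomes a SEARCH using idx_users_email.
    print(query_plan(conn, LOOKUP, ("user42@example.com",)))
```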
Total cost of ownership (TCO)
When people say "this costs $50/month," they mean the hosting bill. But TCO includes everything:
Direct costs: Hosting, third-party service fees, licensing.
Indirect costs (usually bigger): Developer time to build and maintain, on-call burden, debugging time, opportunity cost, knowledge silo risk.
A "free" self-hosted solution: one engineer-week to set up ($6,000 at $150/hour) plus two hours/month to maintain ($3,600/year) comes to $9,600 in year one, before any hosting costs. A managed service at $200/month ($2,400/year) is cheaper, and the engineer ships features instead.
Build vs buy
Default to buy (use a managed service or library) unless:
- The feature is your core competitive advantage
- No existing solution fits your requirements
- Existing solutions have unacceptable limitations (security, compliance, performance)
- You have the team to build AND maintain it long-term
| Factor | Build | Buy |
|---|---|---|
| Time to market | Weeks to months | Hours to days |
| Upfront cost | High (engineer time) | Low to medium (subscription) |
| Ongoing maintenance | Your responsibility | Their responsibility |
| Customization | Unlimited | Limited to what they offer |
| Risk: vendor lock-in | None | Medium to high |
| Risk: talent dependency | High (bus factor) | Low |
Buy: Authentication (Auth0, Clerk), email (SendGrid, Resend), search (Algolia), payments (Stripe), monitoring (Datadog). These are complex, rarely your competitive advantage, and a full-time job to maintain.
Build: Your core product logic (e.g., Airbnb's pricing algorithm), highly custom workflows no tool fits, regulated data handling requiring specific compliance controls.
The managed services calculation
```
Self-hosted PostgreSQL:
  EC2 instance:            $100/month
  EBS storage:             $30/month
  Backups (S3):            $10/month
  Engineer time (setup):   40 hours x $150 = $6,000 (one-time)
  Engineer time (maint):   4 hours/month x $150 = $600/month
  On-call burden:          Priceless (but not zero)
  ─────────────────────
  Year 1 total:            ~$14,880

AWS RDS PostgreSQL:
  Instance + storage:      $250/month
  Automated backups:       Included
  Maintenance:             1 hour/month x $150 = $150/month
  ─────────────────────
  Year 1 total:            ~$4,800
```

The managed service costs roughly 1.8x more in hosting ($250 vs. $140 per month) but about 3x less in total ($4,800 vs. $14,880 in year one).
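The same calculation as a small Python sketch, so you can plug in your own numbers. The $150/hour rate is inferred from the setup line (40 hours for $6,000); `HostingOption` and its fields are hypothetical, and on-call burden is deliberately left out because it does not reduce to a single number.

```python
from dataclasses import dataclass

# $/engineer-hour implied by the setup line (40 hours -> $6,000).
HOURLY_RATE = 150


@dataclass
class HostingOption:
    """Hypothetical model of one option's year-one total cost of ownership."""

    name: str
    hosting_per_month: float            # bills: servers, storage, backups, fees
    setup_hours: float                  # one-time engineer time
    maintenance_hours_per_month: float  # recurring engineer time

    def hosting_per_year(self) -> float:
        return 12 * self.hosting_per_month

    def year_one_cost(self, hourly_rate: float = HOURLY_RATE) -> float:
        # On-call burden is not modeled: it is real but hard to price.
        engineer_hours = self.setup_hours + 12 * self.maintenance_hours_per_month
        return self.hosting_per_year() + engineer_hours * hourly_rate


self_hosted = HostingOption("Self-hosted PostgreSQL", 100 + 30 + 10, 40, 4)
rds = HostingOption("AWS RDS PostgreSQL", 250, 0, 1)

for option in (self_hosted, rds):
    print(f"{option.name}: ${option.year_one_cost():,.0f}")
# Self-hosted PostgreSQL: $14,880
# AWS RDS PostgreSQL: $4,800

print(round(rds.hosting_per_month / self_hosted.hosting_per_month, 2))  # 1.79
print(round(self_hosted.year_one_cost() / rds.year_one_cost(), 2))      # 3.1
```

The "free" self-hosted example from the TCO section fits the same model: `HostingOption("Self-hosted", 0, 40, 2)` comes to $9,600, against $2,400 for `HostingOption("Managed", 200, 0, 0)`.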
When to optimize (and when not to)
- Is anyone actually complaining? If not, don't optimize. A p99 of 800ms might be fine.
- Is this the bottleneck? Profile before you optimize. Don't speed up the fast part.
- What is the ROI? Spending two weeks to shave 200ms off something used by 500 users: low return. The same effort for 500,000 users: worth it.
- Can you throw money at it? A $50/month server upgrade that buys six months of headroom beats a week of code optimization.
- One-time or recurring? A regression that worsens as traffic grows needs a real fix. A slow page that is only used once a month can wait.
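The ROI and "throw money at it" questions can be checked on the back of an envelope. This Python sketch is one crude way to do it, assuming the $150/hour rate from the TCO figures and a 40-hour engineer-week; spreading a fixed engineering cost over the users who benefit is only a rough proxy for return, and all function names are hypothetical.

```python
HOURLY_RATE = 150    # assumed $/engineer-hour, as in the TCO figures
HOURS_PER_WEEK = 40  # assumed length of an engineer-week


def engineering_cost(weeks: float, hourly_rate: float = HOURLY_RATE) -> float:
    """Dollar cost of `weeks` of one engineer's time."""
    return weeks * HOURS_PER_WEEK * hourly_rate


def cost_per_user(fixed_cost: float, users: int) -> float:
    """Spread a one-off optimization cost across everyone who benefits."""
    return fixed_cost / users


def upgrade_beats_optimizing(monthly_upgrade: float, months_of_headroom: float,
                             optimization_weeks: float) -> bool:
    """True if paying for bigger hardware over the headroom period is cheaper
    than the engineer time the code optimization would take."""
    return monthly_upgrade * months_of_headroom < engineering_cost(optimization_weeks)


two_weeks = engineering_cost(2)              # 12000
print(cost_per_user(two_weeks, 500))         # 24.0
print(cost_per_user(two_weeks, 500_000))     # 0.024
print(upgrade_beats_optimizing(50, 6, 1))    # True: $300 vs. $6,000
```

The same two-week effort costs $24 per user at 500 users but a few cents per user at 500,000, and the $50/month upgrade costs $300 over six months against $6,000 for a week of optimization.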