System Design - Scaling Analysis with AI


Scaling decisions involve a lot of math, comparison, and scenario modeling. AI is genuinely useful here, not as a decision-maker, but as a fast analyst that can crunch numbers and generate options you might not have considered. The key is knowing where it helps and where it leads you astray.

What AI does well vs. poorly

AI pitfall

AI scaling recommendations are based on generic architectures, not your actual system. It might recommend adding read replicas when your bottleneck is CPU-bound computation, or suggest a CDN when your latency comes from a slow third-party API. Always identify your actual bottleneck with monitoring data before asking AI for solutions.

AI strengths	AI weaknesses
Analyzing structured metrics (CPU, memory, latency)	Understanding organizational constraints (team size, budget, expertise)
Generating capacity estimates from traffic projections	Estimating real-world costs accurately (often underestimates operational overhead)
Comparing scaling strategies with pros/cons tables	Knowing when "good enough" is the right answer
Suggesting optimizations from code or query patterns	Accounting for human factors (on-call burden, hiring difficulty)
Creating back-of-envelope calculations quickly	Recommending proportionate solutions (jumps to microservices too fast)
Generating monitoring dashboards and alert thresholds	Understanding your specific business context and risk tolerance

Prompt templates

1. Analyze bottlenecks from metrics

Prompt:

Here are the metrics from my production system over the last 24 hours:

- Web servers: 4x t3.large, CPU avg 78%, memory avg 62%
- Database: 1x r6g.xlarge PostgreSQL, CPU avg 45%, connections avg 180/200
- Redis: 1x cache.t3.medium, memory 89%, hit rate 72%
- RPS: avg 850, peak 2,400 (during business hours)
- p50 latency: 120ms, p99 latency: 1,800ms
- Error rate: 0.3% (mostly 503 during peak)

Identify the top 3 bottlenecks in order of severity.
For each, explain why it is a problem and suggest a fix
with estimated cost and implementation time.

AI will typically produce a solid analysis here because the data is structured and the task is well-defined. It will likely identify the Redis memory pressure and low hit rate, the database connection saturation, and the high web-server CPU, which leaves little headroom at peak. These are reasonable conclusions from the numbers.

What to verify: AI might miss that the high p99 latency could be caused by a single slow endpoint rather than overall capacity. Check your APM data for specific endpoint latencies before scaling everything.
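One way to sanity-check the AI's ranking is to compute the utilization ratios yourself. The sketch below uses the figures from the example prompt. The 80% warning threshold is an assumed rule of thumb, not a standard, and the ranking says nothing about whether the p99 latency comes from a single endpoint.

```python
# Utilization figures taken from the example prompt above.
# WARN_THRESHOLD is an assumed rule of thumb, not an industry standard.
WARN_THRESHOLD = 0.80

metrics = {
    "web CPU": 0.78,
    "database CPU": 0.45,
    "database connections": 180 / 200,
    "Redis memory": 0.89,
}


def rank_by_utilization(utilization):
    """Return (name, utilization) pairs, most saturated first."""
    return sorted(utilization.items(), key=lambda kv: kv[1], reverse=True)


for name, used in rank_by_utilization(metrics):
    flag = "over threshold" if used >= WARN_THRESHOLD else "ok"
    print(f"{name:22s} {used:5.0%}  {flag}")
```

If the AI's top three disagree with a ranking like this, ask it to explain why before accepting its ordering.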

2. Capacity planning for growth

Prompt:

My SaaS application currently handles:
- 10,000 daily active users
- 850 RPS average, 2,400 RPS peak
- 500 GB database, growing 2 GB/day
- Running on AWS (us-east-1)

We expect 5x growth over the next 12 months.

Create a capacity plan with:
1. Month-by-month resource projections
2. When each scaling threshold will be hit
3. Recommended infrastructure changes at each stage
4. Estimated monthly AWS cost at each stage
Include both a conservative and aggressive growth scenario.

This is where AI shines, generating structured projections that would take you hours to build in a spreadsheet. The output will include timelines, cost estimates, and decision points.

What to verify: AI almost always overestimates growth curves (it assumes linear or exponential growth when real growth is lumpy and unpredictable). It also underestimates the engineering time needed to implement changes. Treat the numbers as directional, not precise.
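To see what "directional" means here, it can help to reproduce the simplest version of the projection yourself. The sketch below assumes smooth compound growth that reaches 5x after 12 months, and assumes daily data growth scales with traffic. Both are simplifications (real growth is lumpy), and the 30-day month is an approximation.

```python
# Figures from the example prompt; everything else is a stated assumption.
GROWTH_FACTOR = 5.0        # "5x growth over the next 12 months"
MONTHS = 12
PEAK_RPS_NOW = 2400
DB_GB_NOW = 500
DB_GB_PER_DAY_NOW = 2
DAYS_PER_MONTH = 30        # simplification


def project(months=MONTHS):
    """Month-by-month projection assuming smooth compound growth."""
    monthly_factor = GROWTH_FACTOR ** (1 / MONTHS)
    rows = []
    db_gb = DB_GB_NOW
    for month in range(1, months + 1):
        multiplier = monthly_factor ** month
        # Assumption: daily data growth scales with traffic.
        db_gb += DB_GB_PER_DAY_NOW * multiplier * DAYS_PER_MONTH
        rows.append({
            "month": month,
            "peak_rps": PEAK_RPS_NOW * multiplier,
            "db_gb": db_gb,
        })
    return rows


projection = project()
for row in projection:
    print(f"month {row['month']:2d}: peak {row['peak_rps']:7.0f} RPS, "
          f"database {row['db_gb']:6.0f} GB")
```

Comparing the AI's month-by-month table against a baseline like this makes it easier to spot where its growth assumptions diverge from yours.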

3. Recommend a scaling strategy

Prompt:

I run an e-commerce platform on a monolith (Node.js + PostgreSQL).
Current setup: 2 app servers, 1 database (read replica planned).
Team: 4 backend engineers.
Problem: Black Friday traffic is 20x normal, and last year we went
down for 2 hours.

Suggest a scaling strategy that:
- Handles 20x traffic spikes lasting 8 hours
- Can be implemented by 4 engineers in 3 months
- Minimizes ongoing operational complexity
- Stays within $5,000/month infrastructure budget

Do NOT suggest microservices. We want to keep the monolith.

Notice the explicit constraint: "Do NOT suggest microservices." Without this, AI will almost certainly recommend splitting into services, which is the wrong advice for a 4-person team with a 3-month timeline. Being specific about constraints produces dramatically better recommendations.

What to verify: common AI mistakes

1. AI jumps to microservices too fast

AI has been trained on thousands of articles praising microservices. It defaults to recommending them even when a well-tuned monolith would handle the load. For a team under 10 engineers, microservices almost always add more problems than they solve.

Reality check: Can you solve this with a bigger database instance and read replicas? With better caching? With query optimization? If yes, do that first.

2. AI overestimates traffic

When asked to project growth, AI tends to assume hockey-stick curves. It will project your 1,000 RPS to 50,000 RPS in a year when the realistic number might be 3,000 RPS.

Reality check: Look at your actual growth rate over the last 6 months. Extrapolate conservatively. Build for 2-3x your projection, not 10x.
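A minimal sketch of that extrapolation follows. The six monthly peak-RPS readings are hypothetical, and the headroom factor of 2 is the low end of the 2-3x range above.

```python
# Hypothetical monthly peak-RPS readings for the last 6 months, oldest first.
history = [2000, 2080, 2150, 2260, 2330, 2400]


def monthly_growth_rate(samples):
    """Compound monthly growth rate between the first and last sample."""
    periods = len(samples) - 1
    return (samples[-1] / samples[0]) ** (1 / periods) - 1


def capacity_target(samples, months_ahead=12, headroom=2.0):
    """Project the observed rate forward, then add headroom on top."""
    rate = monthly_growth_rate(samples)
    projected = samples[-1] * (1 + rate) ** months_ahead
    return projected, projected * headroom


projected, target = capacity_target(history)
print(f"observed monthly growth: {monthly_growth_rate(history):.1%}")
print(f"projected peak in 12 months: {projected:.0f} RPS")
print(f"capacity to build for: {target:.0f} RPS")
```

If the AI's projection is several times larger than a number derived this way, ask it which growth assumption it used.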

3. AI ignores cost

AI recommends architectures that are technically elegant but financially absurd. Running a managed Kubernetes cluster with auto-scaling across three availability zones is great engineering, but it costs $2,000/month in base infrastructure before you serve a single request.

Reality check: Always ask AI to include cost estimates. Then double them (AI consistently underestimates operational costs like data transfer, logging, monitoring, and support plans).
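The doubling heuristic is easy to apply mechanically. In the sketch below, the option names and quoted costs are hypothetical, and the $5,000 budget is borrowed from the scaling-strategy prompt earlier in the lesson.

```python
OVERHEAD_MULTIPLIER = 2.0  # the "double them" heuristic from the text
BUDGET_USD = 5000          # monthly budget from the scaling-strategy prompt

# Hypothetical AI-quoted monthly costs for three options.
quoted_costs = {
    "vertical scaling": 1800,
    "read replicas + cache": 2400,
    "managed Kubernetes": 3200,
}


def padded_estimate(quoted_usd, multiplier=OVERHEAD_MULTIPLIER):
    """Pad a quoted cost to allow for data transfer, logging, monitoring, support."""
    return quoted_usd * multiplier


def fits_budget(quoted_usd, budget_usd=BUDGET_USD):
    """True if the padded estimate still fits within the budget."""
    return padded_estimate(quoted_usd) <= budget_usd


for name, quoted in quoted_costs.items():
    verdict = "fits" if fits_budget(quoted) else "over budget"
    print(f"{name:24s} quoted ${quoted:>5,}  padded ${padded_estimate(quoted):>7,.0f}  {verdict}")
```

An option that only fits the budget at its quoted price is worth checking against the provider's pricing calculator before committing.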

4. AI underestimates operational complexity

"Just add a Redis cluster for caching" sounds simple in a recommendation. In practice, it means: choosing a caching strategy, handling cache invalidation, monitoring cache hit rates, managing Redis failover, adding connection pooling, and training your team on a new technology.

Reality check: For each recommendation, ask yourself: who on the team knows how to operate this? What happens at 3 AM when it breaks?
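Even the smallest version of "add a cache" involves several of those responsibilities. The sketch below is a cache-aside wrapper over a plain dict. It is a stand-in for illustration, not a Redis client, and the class name and the `db` dict are hypothetical. It covers only invalidation and hit-rate tracking, leaving failover and connection pooling untouched.

```python
class CacheAside:
    """Cache-aside over an in-memory dict (a stand-in for Redis)."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # called on a cache miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value

    def invalidate(self, key):
        """Must be called on every write path, or readers see stale data."""
        self._store.pop(key, None)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


db = {"user:1": "Ada"}               # hypothetical backing store
cache = CacheAside(db.__getitem__)

cache.get("user:1")                  # miss: loaded from db
cache.get("user:1")                  # hit
db["user:1"] = "Grace"               # write to the source of truth
cache.invalidate("user:1")           # forget this line and reads stay stale
fresh = cache.get("user:1")          # miss again: reloads the new value
print(fresh, f"hit rate {cache.hit_rate:.0%}")
```

Every write path that skips `invalidate` is a stale-data bug, which is the kind of ongoing cost a one-line recommendation hides.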

Hybrid workflow

The most effective approach combines AI speed with human judgment:

Gather real metrics: collect actual numbers from your monitoring tools (not estimates)
Feed AI the numbers: give it structured data and specific constraints (budget, team size, timeline)
Get multiple options: ask for at least 3 approaches ranked by complexity
Add constraints AI misses: team expertise, on-call burden, existing vendor relationships, migration risk
Validate costs independently: check cloud provider pricing calculators against AI estimates
Start with the simplest option: if the simplest recommendation solves 80% of the problem, do that first
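The first three steps of the workflow can be made repeatable with a small helper that assembles the prompt from your metrics and constraints. This is a sketch only: `build_prompt` is a hypothetical helper, and the sample values come from the prompts earlier in the lesson.

```python
def build_prompt(metrics, constraints, options=3):
    """Format measured metrics and explicit constraints into one prompt."""
    lines = ["Here are the metrics from my production system:", ""]
    lines += [f"- {name}: {value}" for name, value in metrics.items()]
    lines += ["", "Constraints:"]
    lines += [f"- {constraint}" for constraint in constraints]
    lines += [
        "",
        f"Suggest at least {options} approaches, ranked from simplest to most complex.",
        "Include an estimated monthly cost for each.",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    metrics={
        "RPS": "avg 850, peak 2,400",
        "p99 latency": "1,800ms",
        "Database connections": "avg 180/200",
    },
    constraints=[
        "Team: 4 backend engineers",
        "Budget: $5,000/month",
        "Do NOT suggest microservices",
    ],
)
print(prompt)
```

Keeping the constraints in a list makes it harder to forget one when you rerun the analysis with fresh metrics.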

AI is your analyst, not your architect. It processes data faster than you can, generates options you might miss, and formats everything into clean comparisons. But the decision, especially when it involves tradeoffs between cost, complexity, and team capacity, is always yours.

Good to know

The most underrated scaling technique is simply writing efficient code. Before adding infrastructure (caches, replicas, CDNs), check if your slow endpoint is doing unnecessary database queries, loading unused data, or computing things that could be precomputed. An N+1 query fix can improve performance by 10x with zero infrastructure changes.
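To make the N+1 pattern concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The schema and data are hypothetical. Both functions return the same rows, but the first issues one query per post on top of the initial query, while the second issues a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")


def titles_n_plus_one(conn):
    """N+1 pattern: 1 query for the posts, then 1 query per post."""
    posts = conn.execute(
        "SELECT title, author_id FROM posts ORDER BY id"
    ).fetchall()
    queries = 1
    rows = []
    for title, author_id in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        rows.append((title, name))
    return rows, queries


def titles_joined(conn):
    """Same result with a single JOIN."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


slow_rows, slow_queries = titles_n_plus_one(conn)
fast_rows, fast_queries = titles_joined(conn)
print(slow_queries, "queries vs", fast_queries)
```

With three posts the difference is four queries versus one. With thousands of rows and a network round trip per query, the gap grows with the row count.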
