Scaling decisions involve a lot of math, comparison, and scenario modeling. AI is genuinely useful here, not as a decision-maker, but as a fast analyst that can crunch numbers and generate options you might not have considered. The key is knowing where it helps and where it leads you astray.
What AI does well vs. poorly
| AI strengths | AI weaknesses |
|---|---|
| Analyzing structured metrics (CPU, memory, latency) | Understanding organizational constraints (team size, budget, expertise) |
| Generating capacity estimates from traffic projections | Estimating real-world costs accurately (often underestimates operational overhead) |
| Comparing scaling strategies with pros/cons tables | Knowing when "good enough" is the right answer |
| Suggesting optimizations from code or query patterns | Accounting for human factors (on-call burden, hiring difficulty) |
| Creating back-of-envelope calculations quickly | Recommending proportionate solutions (jumps to microservices too fast) |
| Generating monitoring dashboards and alert thresholds | Understanding your specific business context and risk tolerance |
Prompt templates
1. Analyze bottlenecks from metrics
Prompt:
Here are the metrics from my production system over the last 24 hours:
- Web servers: 4x t3.large, CPU avg 78%, memory avg 62%
- Database: 1x r6g.xlarge PostgreSQL, CPU avg 45%, connections avg 180/200
- Redis: 1x cache.t3.medium, memory 89%, hit rate 72%
- RPS: avg 850, peak 2,400 (during business hours)
- p50 latency: 120ms, p99 latency: 1,800ms
- Error rate: 0.3% (mostly 503 during peak)
Identify the top 3 bottlenecks in order of severity.
For each, explain why it is a problem and suggest a fix
with estimated cost and implementation time.
AI will typically produce a solid analysis here because the data is structured and the task is well-defined. It will likely identify: the Redis memory pressure and low hit rate, the database connection saturation, and the CPU spike on web servers during peak. These are reasonable conclusions from the numbers.
What to verify: AI might miss that the high p99 latency could be caused by a single slow endpoint rather than overall capacity. Check your APM data for specific endpoint latencies before scaling everything.
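Before handing the numbers to AI, it can help to run the same sanity check yourself. The following is a minimal sketch that ranks the resources from the example prompt by utilization. The 75% warning threshold and the function name `rank_by_utilization` are assumptions for illustration, not a standard, and averages understate what happens at peak.

```python
# Utilization figures copied from the example prompt above.
metrics = {
    "web CPU": 78 / 100,
    "web memory": 62 / 100,
    "database CPU": 45 / 100,
    "database connections": 180 / 200,
    "Redis memory": 89 / 100,
}

WARN_THRESHOLD = 0.75  # assumed cutoff for illustration, not a universal rule


def rank_by_utilization(metrics, threshold=WARN_THRESHOLD):
    """Return (name, utilization) pairs at or above threshold, highest first."""
    hot = [(name, used) for name, used in metrics.items() if used >= threshold]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Peak RPS divided by average RPS: how much worse peak is than the averages.
peak_to_avg = 2400 / 850

for name, used in rank_by_utilization(metrics):
    print(f"{name}: {used:.0%}")
print(f"peak/average RPS: {peak_to_avg:.1f}x")
```

A ranking like this only flags resources that are close to their limits. It says nothing about the cache hit rate or about which endpoint drives the p99 latency, so it complements the APM check rather than replacing it.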
2. Capacity planning for growth
Prompt:
My SaaS application currently handles:
- 10,000 daily active users
- 850 RPS average, 2,400 RPS peak
- 500 GB database, growing 2 GB/day
- Running on AWS (us-east-1)
We expect 5x growth over the next 12 months.
Create a capacity plan with:
1. Month-by-month resource projections
2. When each scaling threshold will be hit
3. Recommended infrastructure changes at each stage
4. Estimated monthly AWS cost at each stage
Include both a conservative and aggressive growth scenario.
This is where AI shines, generating structured projections that would take you hours to build in a spreadsheet. The output will include timelines, cost estimates, and decision points.
What to verify: AI almost always overestimates growth curves (it assumes linear or exponential growth when real growth is lumpy and unpredictable). It also underestimates the engineering time needed to implement changes. Treat the numbers as directional, not precise.
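To check the shape of an AI-generated projection, a rough baseline is easy to compute by hand. This sketch assumes smooth compound growth reaching 5x after 12 months and a fixed 2 GB/day of database growth, using the figures from the prompt. Both assumptions are simplifications (storage growth would likely accelerate with user growth), and the function names are hypothetical.

```python
def project_rps(current_rps, total_growth, months):
    """Compound monthly growth that reaches total_growth after `months`."""
    monthly_factor = total_growth ** (1 / months)
    return [current_rps * monthly_factor ** m for m in range(months + 1)]


def project_db_gb(current_gb, gb_per_day, months, days_per_month=30):
    """Linear storage growth at a fixed daily rate (a simplifying assumption)."""
    return [current_gb + gb_per_day * days_per_month * m for m in range(months + 1)]


peak = project_rps(2400, 5, 12)  # peak RPS from the prompt, 5x over 12 months
db = project_db_gb(500, 2, 12)   # 500 GB today, growing 2 GB/day

for month in (0, 3, 6, 9, 12):
    print(f"month {month:2d}: peak ~{peak[month]:,.0f} RPS, database ~{db[month]:,.0f} GB")
```

If the AI's month-by-month numbers differ wildly from a baseline like this, ask it to state the growth model it used.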
3. Recommend a scaling strategy
Prompt:
I run an e-commerce platform on a monolith (Node.js + PostgreSQL).
Current setup: 2 app servers, 1 database (read replica planned).
Team: 4 backend engineers.
Problem: Black Friday traffic is 20x normal, and last year we went
down for 2 hours.
Suggest a scaling strategy that:
- Handles 20x traffic spikes lasting 8 hours
- Can be implemented by 4 engineers in 3 months
- Minimizes ongoing operational complexity
- Stays within $5,000/month infrastructure budget
Do NOT suggest microservices. We want to keep the monolith.
Notice the explicit constraint: "Do NOT suggest microservices." Without this, AI will almost certainly recommend splitting into services, which is the wrong advice for a 4-person team with a 3-month timeline. Being specific about constraints produces dramatically better recommendations.
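A quick back-of-envelope estimate gives you something to compare the AI's recommendation against. The sketch below assumes load scales linearly with traffic and that the app tier is the only bottleneck, which is rarely true for the database. The 40% normal utilization and the 60% target utilization are made-up inputs, and `servers_for_spike` is a hypothetical helper.

```python
import math


def servers_for_spike(normal_servers, normal_utilization, spike_multiplier,
                      target_utilization=0.6):
    """Estimate app servers needed for a spike, assuming linear scaling."""
    # Total work during the spike, expressed in "fully busy server" units.
    load = normal_servers * normal_utilization * spike_multiplier
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(load / target_utilization, 6))


# 2 app servers (from the prompt) at an assumed 40% normal utilization, 20x spike.
print(servers_for_spike(2, 0.4, 20))
```

The point is not the exact count but the order of magnitude: if a spike needs roughly ten times the normal fleet for 8 hours, temporary scale-out plus caching is a very different project from a permanent re-architecture.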
What to verify: common AI mistakes
1. AI jumps to microservices too fast
AI has been trained on thousands of articles praising microservices. It defaults to recommending them even when a well-tuned monolith would handle the load. For a team under 10 engineers, microservices almost always add more problems than they solve.
Reality check: Can you solve this with a bigger database instance and read replicas? With better caching? With query optimization? If yes, do that first.
2. AI overestimates traffic
When asked to project growth, AI tends to assume hockey-stick curves. It will project your 1,000 RPS to 50,000 RPS in a year when the realistic number might be 3,000 RPS.
Reality check: Look at your actual growth rate over the last 6 months. Extrapolate conservatively. Build for 2-3x your projection, not 10x.
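The reality check above can be sketched in a few lines. The six monthly readings are hypothetical, the 2.5x headroom is one point in the 2-3x range mentioned above, and the function names are illustrative assumptions.

```python
def monthly_growth_rate(history):
    """Average compound month-over-month growth across the history."""
    periods = len(history) - 1
    return (history[-1] / history[0]) ** (1 / periods) - 1


def capacity_target(history, months_ahead, headroom=2.5):
    """Extrapolate the observed rate, then add headroom on top."""
    rate = monthly_growth_rate(history)
    projected = history[-1] * (1 + rate) ** months_ahead
    return projected, projected * headroom


# Hypothetical peak-RPS readings for the last six months.
history = [700, 740, 790, 860, 920, 1000]

rate = monthly_growth_rate(history)
projected, target = capacity_target(history, months_ahead=12)
print(f"observed growth: {rate:.1%} per month")
print(f"projected peak in 12 months: {projected:,.0f} RPS")
print(f"build for: {target:,.0f} RPS")
```

Six data points are a thin basis for a forecast, so treat the result the same way as the AI's numbers: directional, and worth recomputing every few months.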
3. AI ignores cost
AI often recommends architectures that are technically elegant but financially absurd. Running a managed Kubernetes cluster with auto-scaling across three availability zones is great engineering, but it can cost around $2,000/month in base infrastructure before you serve a single request.
Reality check: Always ask AI to include cost estimates. Then double them (AI consistently underestimates operational costs like data transfer, logging, monitoring, and support plans).
4. AI underestimates operational complexity
"Just add a Redis cluster for caching" sounds simple in a recommendation. In practice, it means: choosing a caching strategy, handling cache invalidation, monitoring cache hit rates, managing Redis failover, adding connection pooling, and training your team on a new technology.
Reality check: For each recommendation, ask yourself: who on the team knows how to operate this? What happens at 3 AM when it breaks?
Hybrid workflow
The most effective approach combines AI speed with human judgment:
- Gather real metrics: collect actual numbers from your monitoring tools (not estimates)
- Feed AI the numbers: give it structured data and specific constraints (budget, team size, timeline)
- Get multiple options: ask for at least 3 approaches ranked by complexity
- Add constraints AI misses: team expertise, on-call burden, existing vendor relationships, migration risk
- Validate costs independently: check cloud provider pricing calculators against AI estimates
- Start with the simplest option: if the simplest recommendation solves 80% of the problem, do that first
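The second and third steps of this workflow (structured data, explicit constraints, multiple options) can be made repeatable with a small prompt builder. This is a minimal sketch: `build_scaling_prompt` and its wording are assumptions for illustration, and the sample values are taken from the prompts earlier in this section.

```python
def build_scaling_prompt(metrics, constraints, option_count=3):
    """Format real metrics and explicit constraints into one prompt string."""
    lines = ["Here are the metrics from my production system:"]
    lines += [f"- {name}: {value}" for name, value in metrics.items()]
    lines.append("Constraints:")
    lines += [f"- {name}: {value}" for name, value in constraints.items()]
    lines.append(
        f"Suggest at least {option_count} approaches ranked by complexity, "
        "simplest first."
    )
    lines.append("Include an estimated monthly cost for each approach.")
    return "\n".join(lines)


prompt = build_scaling_prompt(
    metrics={"RPS": "avg 850, peak 2,400", "p99 latency": "1,800ms"},
    constraints={"Team": "4 backend engineers", "Budget": "$5,000/month"},
)
print(prompt)
```

Keeping the constraints in a separate dictionary makes it harder to forget them, which matters because missing constraints are where AI recommendations tend to go wrong.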
AI is your analyst, not your architect. It processes data faster than you can, generates options you might miss, and formats everything into clean comparisons. But the decision, especially when it involves tradeoffs between cost, complexity, and team capacity, is always yours.