Reliability patterns involve a lot of boilerplate: repetitive, standardized code that follows a known pattern and appears in nearly every project, like setting up a server or wiring up database connections. Circuit breakers, retry logic, timeout wrappers, and rate limiters all follow well-known templates, and AI is very good at generating those templates. But the devil is in the details: the specific thresholds, the failure scenarios, and the interactions between patterns are where AI regularly gets things wrong.
## What AI does well vs. poorly
| AI does well | AI does poorly |
|---|---|
| Generating circuit breaker implementations | Choosing threshold values for your specific traffic |
| Writing retry logic with exponential backoff | Timeout propagation across service chains |
| Implementing token bucket rate limiters | Reasoning about cascading failure scenarios |
| Creating timeout wrapper utilities | Fallback strategies for partial failure states |
| Scaffolding bulkhead patterns with semaphores | Interactions between multiple reliability patterns |
| Identifying missing reliability patterns in code reviews | Tuning for production traffic patterns you describe vaguely |
The pattern is consistent: AI handles the structural code well but struggles with the contextual decisions, the things that depend on your specific system, traffic, and failure modes.
## Prompt templates
### 1. Analyze failure modes for a service architecture
Prompt:

```text
Analyze the failure modes of this architecture:
- API Gateway → User Service → PostgreSQL
- API Gateway → Order Service → PostgreSQL + Payment API (external)
- API Gateway → Notification Service → SendGrid API

For each service and connection, list:
1. What can fail
2. Impact on the user
3. Recommended reliability pattern (circuit breaker, timeout, fallback, etc.)
4. Specific configuration recommendations
```

AI will typically produce a thorough analysis covering the obvious failure modes. What to verify: AI often underestimates the impact of slow responses (not just outright failures) and may miss correlated failures. For example, because PostgreSQL is shared between the User Service and the Order Service, a database issue affects both simultaneously.
### 2. Implement a circuit breaker with fallback

A circuit breaker stops sending requests to a failing service after repeated errors, giving it time to recover before trying again.
Prompt:

```text
Implement a circuit breaker for a Node.js service calling an
external payment API with these requirements:
- Open after 5 failures in 30 seconds
- Half-open test after 15 seconds
- Fallback: queue the payment for async processing
- Log state transitions for monitoring
- TypeScript, using the opossum library
```

What to verify in AI output:
- The `volumeThreshold` setting: AI often omits it, causing the circuit to trip on the very first failure
- Fallback error handling: if the queue itself fails, is that handled?
- The `errorFilter`: AI rarely includes this, but you may want to exclude 4xx client errors from tripping the circuit (a 400 Bad Request is not a service failure)
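As a concrete sketch, here is what those options can look like. The option names (`volumeThreshold`, `errorFilter`, `rollingCountTimeout`, and so on) match opossum's documented API, but the numeric values are placeholders to tune against your traffic, and `isClientError` is a hypothetical helper written for this illustration.

```typescript
// An error shape carrying an HTTP status; illustrative, not an opossum type.
interface HttpError extends Error {
  statusCode?: number;
}

// Hypothetical errorFilter: returning true tells the breaker to IGNORE
// this error, so 4xx client errors do not count toward tripping the circuit.
function isClientError(err: HttpError): boolean {
  return err.statusCode !== undefined &&
    err.statusCode >= 400 && err.statusCode < 500;
}

// Opossum-style options matching the prompt's requirements. Placeholder values.
const breakerOptions = {
  timeout: 3000,                 // fail a call that takes longer than 3s
  errorThresholdPercentage: 50,  // open when >=50% of requests in the window fail...
  volumeThreshold: 5,            // ...but only after at least 5 requests were made
  rollingCountTimeout: 30000,    // the "in 30 seconds" rolling window
  resetTimeout: 15000,           // try a half-open probe after 15s
  errorFilter: isClientError,    // 4xx responses do not trip the circuit
};
```

You would pass `breakerOptions` as the second argument to opossum's `CircuitBreaker` constructor; without `volumeThreshold`, a single failure can satisfy a 50% error rate and open the circuit immediately.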
### 3. Design a timeout strategy for a service chain
Prompt:

```text
Design a timeout strategy for this request flow:
- Client → API Gateway (user expects response in 5 seconds)
- API Gateway → Auth Service (fast, should be <200ms)
- API Gateway → Product Service (usually 500ms, sometimes 2s)
- Product Service → Pricing Service (usually 100ms)
- API Gateway → Recommendation Service (optional, can be dropped)

Provide specific timeout values for each hop and explain
the timeout budget distribution.
```

AI typically generates reasonable values, but watch for these mistakes:
- Setting individual timeouts that sum to more than the total budget
- Not accounting for network overhead between services (add 50-100ms per hop)
- Missing a timeout on the recommendation service: marking it "optional" does not mean it needs no timeout
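To make the budget arithmetic concrete, here is a minimal sketch of a timeout wrapper plus one plausible budget split for the flow above. The helper names and the specific millisecond values are illustrative assumptions, not prescriptive answers.

```typescript
// One plausible split of the 5-second client budget (illustrative numbers):
//   gateway total:    4500 ms  (leave 500 ms slack for the client-side hop)
//   auth service:      300 ms  (covers the <200ms target plus network overhead)
//   product service:  2500 ms  (covers its occasional 2s responses and the
//                               nested 100ms pricing call)
//   recommendations:   800 ms  (optional: drop the result on timeout)

// Rejects if `promise` does not settle within `ms`. Note this does not
// cancel the underlying work; pair it with AbortController for real calls.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Optional hops degrade instead of failing the whole request.
async function optional<T>(promise: Promise<T>, ms: number, label: string): Promise<T | null> {
  try {
    return await withTimeout(promise, ms, label);
  } catch {
    return null; // e.g. drop recommendations rather than fail the page
  }
}
```

The key property to verify is that each nested timeout fits inside its caller's remaining budget, with per-hop network overhead included.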
### 4. Review code for missing reliability patterns
Prompt:

```text
Review this service code for missing reliability patterns.
For each issue found, explain the failure scenario and
suggest a specific fix:

[paste your code]

Check for: missing timeouts, missing circuit breakers,
missing fallbacks, shared resource pools, missing rate
limiting, retry storms, and cascading failure risks.
```

This is where AI genuinely shines. It will catch obvious gaps like missing timeouts on fetch calls, shared connection pools with no bulkhead, and retry loops without backoff. It may miss subtler issues like timeout budget propagation, or interactions between your circuit breaker and retry logic (retrying inside a circuit breaker can trip it faster than expected).
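For reference, here is roughly what a healthy retry loop looks like, the kind this review should expect to find in place of a bare retry loop. This is a sketch; the function name, attempt cap, and base delay are illustrative assumptions.

```typescript
// Retry with capped attempts, exponential backoff, and full jitter.
// Jitter spreads retries out so many clients failing at once do not
// hammer the recovering service in synchronized waves (a retry storm).
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,    // total attempts, including the first call
  baseDelayMs = 100,  // starting point for the backoff curve
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // full jitter: random delay in [0, base * 2^attempt)
        const delay = Math.random() * baseDelayMs * 2 ** attempt;
        await new Promise((r) => setTimeout(r, delay));
      }
    }
  }
  throw lastError;
}
```

A review should also flag where this wrapper sits relative to any circuit breaker, since each attempt counts as a separate failure from the breaker's point of view.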
> A common gap: `errorFilter` in circuit breaker configurations. A 400 Bad Request means the client sent invalid data; the server is fine. A 503 means the server is down. Without error filtering, client errors trip the circuit breaker and prevent legitimate requests from reaching a healthy server.

## What to always verify
After AI generates reliability code, check these specific things:
- Circuit breaker thresholds: AI defaults to round numbers (5 failures, 30-second reset). Your service might need 3 failures if it handles payments, or 20 if it is a high-volume, error-tolerant endpoint. Base thresholds on your actual error rate baseline and SLAs.
- Timeout propagation: If AI sets a 10-second timeout on your gateway and 10-second timeouts on each downstream call, the math does not work. Verify the total time budget and distribute it correctly.
- Fallback completeness: AI generates the happy-path fallback but often misses what happens when the fallback itself fails. Cache misses, empty defaults, and stale data all need handling.
- Error classification: Not every error should trip a circuit breaker. A 400 Bad Request is a client error; the server is fine. A 503 Service Unavailable is a server error. AI-generated circuit breakers rarely include error filters.
- Retry and circuit breaker interaction: If each request makes 3 attempts inside a circuit breaker with a failure threshold of 5, two failed user requests (3 attempts each = 6 failures) will trip the circuit. Is that what you want?
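The retry-and-breaker arithmetic in the last point is worth making explicit. A tiny hypothetical helper (the name is mine, not from any library):

```typescript
// Each failed user request contributes `attemptsPerRequest` failures to the
// breaker's rolling count, so the circuit trips after fewer user-visible
// failures than the raw threshold suggests.
function requestsUntilTrip(failureThreshold: number, attemptsPerRequest: number): number {
  return Math.ceil(failureThreshold / attemptsPerRequest);
}

// With a threshold of 5 and 3 attempts per request, just 2 failed user
// requests open the circuit.
```

Run the same arithmetic on your own threshold and retry count before shipping, and decide whether retries belong inside or outside the breaker.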
## Hybrid workflow
The most effective approach combines AI speed with human judgment:
- AI generates the skeleton: circuit breaker, retry, timeout, and fallback boilerplate
- You map your dependencies: classify each as critical, degraded, or optional
- You set the thresholds: based on actual traffic data, not AI guesses
- AI reviews your implementation: catches structural gaps you might have missed
- You verify failure paths: manually trace what happens when each dependency goes down
- AI generates tests: failure scenario tests, timeout tests, circuit breaker state transition tests
This workflow lets AI handle the repetitive scaffolding while you handle the decisions that require understanding your specific system.
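As a sketch of the last step in the workflow, here is a deliberately tiny circuit breaker alongside the closed → open → half-open transitions a state-transition test would exercise. This is illustrative only; a real implementation (such as opossum) adds rolling windows, volume thresholds, and error filtering.

```typescript
type State = "closed" | "open" | "half-open";

// Minimal breaker: opens after `failureThreshold` consecutive failures,
// allows a half-open probe after `resetTimeoutMs`, and closes on a success.
class TinyBreaker {
  state: State = "closed";
  private failures = 0;

  constructor(
    private failureThreshold: number,
    private resetTimeoutMs: number,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") throw new Error("circuit open");
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed"; // a half-open success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) {
        this.state = "open";
        // schedule the half-open probe window
        setTimeout(() => { this.state = "half-open"; }, this.resetTimeoutMs);
      }
      throw err;
    }
  }
}
```

A state-transition test drives this through each edge: fail up to the threshold and assert the state is `open`, assert calls are rejected while open, wait out the reset timeout and assert `half-open`, then succeed and assert `closed`.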