Logs tell you what happened to a single request. Metrics tell you what is happening to your system right now. They answer questions like "is response time getting worse?", "are we handling more traffic than last week?", and "are we running out of database connections?", questions that individual log lines cannot answer because they require aggregation over time.
If logs are the flight recorder, metrics are the cockpit instruments. You do not wait for a crash to read the instruments; you glance at them continuously to detect problems before they become outages.
The four golden signals
Google's Site Reliability Engineering book defines four signals that every service should monitor. If you only measure four things, make it these.
Latency
How long requests take to complete. Track successful and failed requests separately: a fast error (an instant 500) can mask slow successful responses if you average them together.

    # What to measure
    # - p50 (median): typical user experience
    # - p95: 95% of requests are at least this fast; the slowest 5% are slower
    # - p99: tail latency, which often reveals resource contention

A service with a 200 ms median but a 5-second p99 has a problem that the median hides: one request in a hundred takes at least 25x longer than the typical one.
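To make the percentile definitions concrete, here is a minimal sketch that computes them from raw samples using the nearest-rank method. The `percentile` helper and the latency values are made up for illustration; a metrics backend would normally compute these for you.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 typical requests and 2 slow ones.
latencies_ms = [200] * 98 + [5000] * 2

summary = {
    "mean": sum(latencies_ms) / len(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With this data the mean is 296 ms and both p50 and p95 are 200 ms, so only p99 (5000 ms) exposes the slow tail.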
Traffic
How many requests your service handles. This is your baseline for capacity planning. Track it by endpoint: a spike in /api/search means something different from a spike in /api/login.
Errors
The rate of failed requests. Track by type: HTTP 5xx (your fault), HTTP 4xx (client's fault, but high 4xx might mean a broken frontend), and application-level errors (like failed database queries that you handle gracefully).
Saturation
How full your service is: database connection pools, memory usage, CPU utilization, disk space. Saturation metrics predict failures before they happen. When your connection pool is 90% full, you know a traffic spike can push it to 100%, at which point requests start failing.
| Signal | What it tells you | Alarm when |
|---|---|---|
| Latency | User experience quality | p95 > 2x baseline |
| Traffic | Demand on your system | Sudden spike or unexpected drop |
| Errors | Things breaking | Error rate > 1% of requests |
| Saturation | Resource headroom | >80% of any resource |
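The thresholds in the table can be read as simple checks over a snapshot of current values. The sketch below is one way to express them; the `Snapshot` fields and the `evaluate` function are hypothetical names, and the traffic rule is left out because the table gives no fixed number for "sudden".

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    p95_ms: float            # current p95 latency
    baseline_p95_ms: float   # p95 during normal operation
    requests: int            # requests seen in the window
    errors: int              # failed requests in the same window
    saturation: dict         # resource name -> utilization between 0 and 1


def evaluate(snapshot: Snapshot) -> list[str]:
    """Return one message per threshold from the table that is currently exceeded."""
    alerts = []
    if snapshot.p95_ms > 2 * snapshot.baseline_p95_ms:
        alerts.append("latency: p95 above 2x baseline")
    if snapshot.requests and snapshot.errors / snapshot.requests > 0.01:
        alerts.append("errors: error rate above 1%")
    for name, used in sorted(snapshot.saturation.items()):
        if used > 0.80:
            alerts.append(f"saturation: {name} above 80%")
    return alerts


# Made-up values: slow p95, healthy error rate, nearly full connection pool.
current = Snapshot(
    p95_ms=900,
    baseline_p95_ms=400,
    requests=1000,
    errors=5,
    saturation={"db_pool": 0.90, "cpu": 0.50},
)
print(evaluate(current))
```

In practice these rules usually live in the alerting system rather than in application code; the sketch only shows the arithmetic behind each row.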
Prometheus metrics with prometheus-client
Prometheus is the open-source standard for application metrics. Your application exposes metrics on a /metrics endpoint, and a Prometheus server scrapes that endpoint at regular intervals (typically every 15 seconds).

    pip install prometheus-client

Metric types
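Prometheus defines four metric types: counters, gauges, histograms, and summaries. The three covered here are the ones you will use most, each suited to a different kind of measurement.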
Counters
Counters only go up (they return to zero only when the process restarts). Use them for things you count: total requests, total errors, total bytes processed. Prometheus calculates rates (requests per second) from the raw counter values.

    from prometheus_client import Counter

    REQUEST_COUNT = Counter(
        "http_requests_total",
        "Total HTTP requests",
        ["method", "endpoint", "status"]
    )

    # In your middleware or route handler
    REQUEST_COUNT.labels(
        method="POST",
        endpoint="/api/orders",
        status="200"
    ).inc()

The labels let you slice and dice: "show me POST requests to /api/orders that returned 500 in the last hour."
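The Counters paragraph says that rates are calculated from raw counter values. The sketch below shows the idea in simplified form: sum the increases between consecutive scrapes, treat any drop as a restart, and divide by the elapsed time. It is not Prometheus's actual implementation (which also extrapolates to the edges of the query window), and `per_second_rate` and the sample values are made up for illustration.

```python
def per_second_rate(samples):
    """samples: list of (unix_timestamp, counter_value) pairs, oldest first."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")

    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went down, so the process restarted and began
            # again at zero: everything counted since then is new.
            increase += current
    return increase / elapsed


# Hypothetical scrapes every 15 seconds, with a restart before the fourth one.
scrapes = [(0, 100), (15, 160), (30, 220), (45, 30), (60, 90)]
print(per_second_rate(scrapes))
```

This is why a counter that only goes up is still useful after a restart: the reset is detectable, so the rate stays continuous.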
Typical labels on an HTTP request counter are method, endpoint, and status.

Histograms
Histograms track the distribution of values, most commonly request duration. Instead of just the average, you can estimate percentiles (p50, p95, p99) from the bucket counts.

    import time

    from fastapi import FastAPI, Request
    from prometheus_client import Histogram

    app = FastAPI()

    REQUEST_DURATION = Histogram(
        "http_request_duration_seconds",
        "HTTP request duration in seconds",
        ["method", "endpoint"],
        buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
    )

    # Measure how long a request takes
    @app.middleware("http")
    async def track_request_duration(request: Request, call_next):
        # perf_counter() is monotonic, so clock adjustments cannot skew the duration
        start_time = time.perf_counter()
        response = await call_next(request)
        duration = time.perf_counter() - start_time
        # Note: raw paths that contain IDs (/api/orders/123) create one series per ID
        REQUEST_DURATION.labels(
            method=request.method,
            endpoint=request.url.path
        ).observe(duration)
        return response

The buckets define the histogram boundaries. Choose them based on your SLA: if your target is "95% of requests under 500ms," your buckets should have granularity around that threshold.
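To see why bucket placement matters, here is a rough sketch of how a percentile can be estimated from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. It is similar in spirit to what Prometheus does at query time, but simplified; `estimate_quantile` is a hypothetical helper, and the counts are invented to match the bucket boundaries used above.

```python
import math


def estimate_quantile(q, buckets):
    """buckets: (upper_bound, cumulative_count) pairs, sorted, ending with math.inf."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    target = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= target:
            if math.isinf(upper_bound):
                # No finite upper edge to interpolate towards.
                return lower_bound
            # Assume observations are spread evenly inside this bucket.
            fraction = (target - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts for 1000 requests.
bounds = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, math.inf]
cumulative = [50, 300, 600, 850, 940, 980, 995, 999, 1000, 1000]
buckets = list(zip(bounds, cumulative))

p95 = estimate_quantile(0.95, buckets)
share_under_500ms = dict(buckets)[0.5] / buckets[-1][1]
print(p95, share_under_500ms)
```

Here the p95 estimate falls inside the wide 0.5 to 1.0 second bucket (about 0.625 s), so it is only approximate. The share of requests under 500 ms is exact (94%) because 0.5 is itself a bucket boundary, which is the reason to place a boundary on your SLA threshold.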
Gauges
Gauges can go up or down. Use them for current values: active connections, queue depth, memory usage, temperature.

    from prometheus_client import Gauge

    ACTIVE_CONNECTIONS = Gauge(
        "db_active_connections",
        "Number of active database connections"
    )

    IN_PROGRESS_REQUESTS = Gauge(
        "http_in_progress_requests",
        "Number of HTTP requests currently being processed"
    )

    # Track in-flight requests
    @app.middleware("http")
    async def track_in_progress(request, call_next):
        IN_PROGRESS_REQUESTS.inc()
        try:
            response = await call_next(request)
            return response
        finally:
            IN_PROGRESS_REQUESTS.dec()

Exposing the /metrics endpoint
Prometheus scrapes metrics from an HTTP endpoint. Add it to your FastAPI app:

    from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
    from fastapi import Response

    @app.get("/metrics")
    async def metrics():
        return Response(
            content=generate_latest(),
            media_type=CONTENT_TYPE_LATEST
        )

This endpoint returns all registered metrics in Prometheus text format. The Prometheus server requests it at each scrape interval (every 15 seconds in the typical setup mentioned earlier) and stores the time-series data.
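If you have never looked at the text format, the hand-rolled sketch below renders a single counter the way it would roughly appear in a /metrics response. It is an illustration only: `render_counter` and `escape_label_value` are hypothetical helpers, and in a real service `generate_latest()` produces this output for you, including extra series this sketch omits.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """samples: list of (labels_dict, value) pairs for one counter."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
        )
        lines.append(f"{name}{{{pairs}}} {float(value)}")
    return "\n".join(lines) + "\n"


output = render_counter(
    "http_requests_total",
    "Total HTTP requests",
    [({"method": "POST", "endpoint": "/api/orders", "status": "200"}, 3)],
)
print(output)
```

Each distinct combination of label values becomes its own line, which is why high-cardinality labels make the response (and Prometheus's storage) grow quickly.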
Health check endpoints that actually check health
A health check endpoint that returns 200 OK unconditionally is theater: it tells you the process is running, but not whether it can actually serve requests.

    from fastapi.responses import JSONResponse

    # Bad - always returns 200, even when the database is down
    @app.get("/health")
    async def health():
        return {"status": "ok"}

    # Good - actually checks dependencies
    # (use this instead of the handler above; do not register both on /health)
    # `db` and `redis` stand for your application's async database and Redis clients
    @app.get("/health")
    async def health():
        checks = {}

        # Check database connectivity
        try:
            await db.execute("SELECT 1")
            checks["database"] = "ok"
        except Exception as e:
            checks["database"] = f"error: {e}"

        # Check Redis connectivity
        try:
            await redis.ping()
            checks["redis"] = "ok"
        except Exception as e:
            checks["redis"] = f"error: {e}"

        all_healthy = all(v == "ok" for v in checks.values())
        return JSONResponse(
            status_code=200 if all_healthy else 503,
            content={"status": "healthy" if all_healthy else "degraded", "checks": checks}
        )

Return 503 Service Unavailable when a critical dependency is down. Load balancers and Kubernetes use this status code to stop routing traffic to unhealthy instances.
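One thing the dependency-checking handler does not cover is a dependency that hangs instead of failing, which can make the health check itself hang. A possible variation, sketched below with only the standard library, runs every check concurrently and bounds each one with a timeout. The `run_checks` helper and the three stand-in checks are hypothetical; in your app they would wrap calls such as the database query and Redis ping.

```python
import asyncio


async def run_checks(checks, timeout_s=1.0):
    """checks: dict mapping a name to a zero-argument coroutine function."""

    async def run_one(check):
        try:
            await asyncio.wait_for(check(), timeout=timeout_s)
            return "ok"
        except asyncio.TimeoutError:
            return "error: timed out"
        except Exception as exc:
            return f"error: {exc}"

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[name]) for name in names))
    report = dict(zip(names, results))
    status_code = 200 if all(v == "ok" for v in report.values()) else 503
    return status_code, report


# Stand-in checks for illustration only.
async def fast_ok():
    return None


async def hangs():
    await asyncio.sleep(10)


async def broken():
    raise ConnectionError("connection refused")


status_code, report = asyncio.run(
    run_checks({"database": fast_ok, "redis": hangs, "queue": broken}, timeout_s=0.05)
)
print(status_code, report)
```

The returned status code and report map directly onto the `JSONResponse` arguments in the handler above. Keep the timeout shorter than the load balancer's own health check timeout, or the probe will give up first.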
A health check that returns {"status": "ok"} unconditionally looks like monitoring, but it catches nothing. The health check should fail when the service genuinely cannot handle requests.

Grafana dashboards
Prometheus stores raw metrics. Grafana visualizes them. A well-designed dashboard shows the four golden signals at a glance, letting you spot problems in seconds instead of querying metrics manually.
Dashboard layout pattern
A production dashboard should answer "is anything broken?" within 3 seconds of looking at it. Organize panels by the four golden signals:
| Row | Panels | What to show |
|---|---|---|
| Top (traffic) | Request rate by endpoint | Spike or drop in traffic |
| Second (errors) | Error rate, error rate by endpoint | Rising error percentage |
| Third (latency) | p50, p95, p99 response time | Latency degradation |
| Bottom (saturation) | DB connections, memory, CPU | Resource exhaustion |
Use red/yellow/green thresholds on each panel. If the dashboard is all green, the service is healthy. A yellow panel means "watch this." A red panel means "investigate now."
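The red/yellow/green rule is just a pair of thresholds per panel. Grafana lets you configure these in the panel settings; the sketch below only illustrates the logic. `panel_color`, `dashboard_status`, and all threshold values are made up for this example.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def panel_color(value, warn_at, critical_at):
    """Map a metric value to a color, for metrics where higher is worse."""
    if critical_at <= warn_at:
        raise ValueError("critical_at must be above warn_at")
    if value >= critical_at:
        return "red"
    if value >= warn_at:
        return "yellow"
    return "green"


def dashboard_status(colors):
    """The dashboard is only as healthy as its worst panel."""
    return max(colors, key=SEVERITY.__getitem__)


# Hypothetical panels: (current value, warn threshold, critical threshold)
panels = {
    "error_rate_pct": (0.4, 0.5, 1.0),
    "db_pool_utilization": (0.85, 0.80, 0.95),
    "p95_latency_s": (2.3, 1.0, 2.0),
}

colors = {name: panel_color(*values) for name, values in panels.items()}
overall = dashboard_status(colors.values())
print(colors, overall)
```

Choosing the warn threshold well below the critical one is what gives the yellow state its value: it buys time to act before the panel turns red.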
External uptime monitoring
Internal metrics have a blind spot: they cannot tell you about problems that prevent requests from reaching your service at all. During DNS failures, CDN issues, or network partitions between your users and your servers, your Prometheus scraper sees nothing wrong because it sits inside the same network as your service.
External monitoring services (UptimeRobot, Pingdom, Better Uptime) ping your API from locations around the world. If the response is not a 200 within a timeout, they alert you.
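The core of such a probe is small. The sketch below, using only the standard library, reports a URL as up only if it answers with HTTP 200 within the timeout. `is_up` and `check_all` are hypothetical helpers; a hosted service adds what this lacks, namely multiple probe locations, scheduling, and alerting.

```python
import http.client
import urllib.request


def is_up(url, timeout_s=5.0):
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, and 4xx/5xx
        # responses (urllib raises HTTPError, a subclass of OSError, for those).
        return False


def check_all(urls, timeout_s=5.0):
    """Probe each URL once and return a dict of url -> up/down."""
    return {url: is_up(url, timeout_s) for url in urls}
```

For example, `check_all(["https://api.example.com/health"])` (a placeholder URL) returns a dict you could feed into an alert. The probe is only meaningful when it runs outside your own network; run from inside, it shares the blind spot described above.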
| Monitor type | What it checks | Blind spots |
|---|---|---|
| Internal (Prometheus) | Application metrics | Network-level failures |
| External (UptimeRobot) | "Can users reach you?" | Application-level details |
| Synthetic (Playwright) | Full user flows | Expensive, slow to run |
Use all three layers: internal metrics for granular debugging, external pings for availability, and synthetic tests for critical user flows (login, checkout, etc.).
Quick reference
| Concept | Tool | What it measures |
|---|---|---|
| Request count | Prometheus Counter | Traffic signal |
| Request duration | Prometheus Histogram | Latency signal (p50/p95/p99) |
| Active connections | Prometheus Gauge | Saturation signal |
| Error rate | Counter with status label | Error signal |
| Health check | FastAPI endpoint | Dependency liveness |
| Dashboards | Grafana | Visual overview of all signals |
| External uptime | UptimeRobot / Pingdom | Reachability from user perspective |