
Logs tell you what happened to a single request. Metrics tell you what is happening to your system right now. They answer questions like "is response time getting worse?", "are we handling more traffic than last week?", and "are we running out of database connections?", questions that individual log lines cannot answer because they require aggregation over time.

If logs are the flight recorder, metrics are the cockpit instruments. You do not wait for a crash to read the instruments; you glance at them continuously to detect problems before they become outages.

The four golden signals

Google's Site Reliability Engineering book defines four signals that every service should monitor. If you only measure four things, make it these.

Latency

How long requests take to complete. Track successful and failed requests separately: a fast error (an instant 500) can mask slow successful responses if you average them together.

# What to measure
# - p50 (median): the typical user experience
# - p95: 95% of requests are faster than this; the slowest 5% are slower
# - p99: tail latency, which often reveals resource contention

A service with a 200ms median but a 5-second p99 has a problem that an average would hide: at least one request in a hundred takes 25x longer than the typical one.
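To see why percentiles beat averages here, this small sketch computes both over hypothetical latency samples. The `percentile` helper uses the nearest-rank method, which is only one of several percentile definitions; Prometheus itself estimates percentiles differently, from histogram buckets.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in seconds: 98 fast requests and 2 slow ones
latencies = [0.2] * 98 + [5.0] * 2

mean = sum(latencies) / len(latencies)
print(f"mean={mean:.3f}s")
for p in (50, 95, 99):
    print(f"p{p}={percentile(latencies, p)}s")
```

With two slow requests out of a hundred, the mean only rises from 0.2 to about 0.3 seconds, while p99 reports the full 5-second tail.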

Traffic

How many requests your service handles. This is your baseline for capacity planning. Track it by endpoint: a spike in /api/search means something different from a spike in /api/login.

Errors

The rate of failed requests. Track them by type: HTTP 5xx (your fault), HTTP 4xx (the client's fault, though a high 4xx rate can signal a broken frontend), and application-level errors (such as failed database queries that you handle gracefully).

Saturation

How full your service is: database connection pools, memory usage, CPU utilization, disk space. Saturation metrics predict failures before they happen. When your connection pool is 90% full, a traffic spike can push it to 100%, at which point new requests start waiting for a connection or failing.

| Signal | What it tells you | Alarm when |
|---|---|---|
| Latency | User experience quality | p95 > 2x baseline |
| Traffic | Demand on your system | Sudden spike or unexpected drop |
| Errors | Things breaking | Error rate > 1% of requests |
| Saturation | Resource headroom | > 80% of any resource |
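The numeric rows of that table can be written as simple rules. In practice these usually live in your alerting system (for example, Prometheus alerting rules) rather than in application code; this Python sketch only makes the arithmetic concrete. The `Snapshot` fields and the sample readings are hypothetical, and traffic is left out because "sudden spike or unexpected drop" depends on a baseline you have to choose.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings for one service."""
    p95_seconds: float
    baseline_p95_seconds: float
    requests: int
    errors: int
    pool_in_use: int
    pool_size: int


def alarms(s: Snapshot) -> list:
    """Return a message for every threshold from the table that is crossed."""
    fired = []
    # Latency: p95 more than twice its normal value
    if s.p95_seconds > 2 * s.baseline_p95_seconds:
        fired.append("latency: p95 above 2x baseline")
    # Errors: more than 1% of requests failed (skipped when there was no traffic)
    if s.requests and s.errors / s.requests > 0.01:
        fired.append("errors: error rate above 1%")
    # Saturation: more than 80% of the connection pool in use
    if s.pool_in_use / s.pool_size > 0.80:
        fired.append("saturation: connection pool above 80%")
    return fired


snapshot = Snapshot(
    p95_seconds=0.9, baseline_p95_seconds=0.4,
    requests=5000, errors=40,
    pool_in_use=17, pool_size=20,
)
print(alarms(snapshot))
```

With these readings the latency and saturation rules fire, while the 0.8% error rate stays under the 1% line.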

Prometheus metrics with prometheus-client

Prometheus is a widely used open-source monitoring system for application metrics. Your application exposes metrics on a /metrics endpoint, and a Prometheus server scrapes that endpoint at regular intervals (15 seconds is a common setting).

pip install prometheus-client

Metric types

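Prometheus defines four core metric types: counters, gauges, histograms, and summaries. This lesson covers the three you will use most often, each suited to a different kind of measurement.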

Counters

Counters only go up (they reset to zero only when the process restarts). Use them for things you count: total requests, total errors, total bytes processed. Prometheus calculates rates, such as requests per second, from the raw counter values at query time.
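A simplified sketch of what turning raw counter values into a rate involves, using hypothetical scrape samples. It treats a drop in the value as a counter reset, which is what you see when a process restarts and its counters start again from zero. This is not Prometheus's implementation: the real `rate()` also extrapolates to the edges of the query window.

```python
def simple_rate(samples):
    """Average per-second increase of a counter over (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset (for example, a process restart),
    so the value after the drop counts as new increase since zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes 15 s apart; the process restarted between t=30 and t=45
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(scrapes))  # 100 requests over 60 s, about 1.67 per second
```

Without the reset handling, the drop from 160 to 10 would subtract 150 and drag the rate below zero.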

from prometheus_client import Counter

REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)

# In your middleware or route handler
REQUEST_COUNT.labels(
    method="POST",
    endpoint="/api/orders",
    status="200"
).inc()

The labels let you slice and dice: "show me POST requests to /api/orders that returned 500 in the last hour."
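To check what the labels buy you, this sketch registers the same counter on a private `CollectorRegistry` (so it does not collide with the global default registry) and reads individual series back with `get_sample_value`. The label values are illustrative.

```python
from prometheus_client import CollectorRegistry, Counter

# A private registry keeps this demo separate from the global default registry
registry = CollectorRegistry()

requests_total = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
    registry=registry,
)

# Every distinct combination of label values becomes its own time series
requests_total.labels(method="POST", endpoint="/api/orders", status="200").inc()
requests_total.labels(method="POST", endpoint="/api/orders", status="200").inc()
requests_total.labels(method="POST", endpoint="/api/orders", status="500").inc()

ok = registry.get_sample_value(
    "http_requests_total",
    {"method": "POST", "endpoint": "/api/orders", "status": "200"},
)
failed = registry.get_sample_value(
    "http_requests_total",
    {"method": "POST", "endpoint": "/api/orders", "status": "500"},
)
print(ok, failed)  # 2.0 1.0
```

Because each label combination is stored separately, a query can select only the `status="500"` series without touching the successful ones.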

AI pitfall
AI often creates counters without labels, making it impossible to filter by endpoint or status code. Always include at least method, endpoint, and status as labels on HTTP request counters.

Histograms

Histograms track the distribution of values, most commonly request duration. Instead of just an average, you can estimate percentiles (p50, p95, p99) from the bucket counts at query time.

import time

from fastapi import FastAPI, Request
from prometheus_client import Histogram

app = FastAPI()

REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

# Measure how long a request takes
@app.middleware("http")
async def track_request_duration(request: Request, call_next):
    # perf_counter is monotonic, so system clock adjustments cannot skew durations
    start_time = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start_time

    REQUEST_DURATION.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(duration)

    return response

The buckets define the histogram boundaries. Choose them based on your SLA or latency target: if your target is "95% of requests under 500ms," include a boundary at 0.5 seconds and others close to it, because percentiles are estimated by interpolating inside a bucket.
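Prometheus does not keep individual durations, only how many observations fell at or below each bucket boundary, so percentiles are estimated by interpolating inside a bucket (PromQL's `histogram_quantile()` does this). The function below is a simplified sketch of that linear interpolation, not Prometheus's exact algorithm, and the bucket counts are made up using a trimmed set of boundaries.

```python
def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by bound,
    with float("inf") as the last bound, like Prometheus's `le` buckets.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram has no observations")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Nothing to interpolate toward; report the highest finite bound
                return lower_bound
            # Assume observations are spread evenly inside this bucket
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts for 1000 requests
buckets = [(0.1, 500), (0.25, 800), (0.5, 940), (1.0, 990), (2.5, 1000), (float("inf"), 1000)]
print(round(estimate_quantile(0.50, buckets), 3))  # 0.1
print(round(estimate_quantile(0.95, buckets), 3))  # 0.6
```

The p95 estimate of 0.6 seconds comes from assuming the 50 requests between 0.5 and 1.0 seconds are spread evenly; with no boundary inside that range, the real p95 could be anywhere from 0.5 to 1.0 seconds.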

Gauges

Gauges can go up or down. Use them for current values: active connections, queue depth, memory usage, temperature.

from prometheus_client import Gauge

ACTIVE_CONNECTIONS = Gauge(
    "db_active_connections",
    "Number of active database connections"
)

IN_PROGRESS_REQUESTS = Gauge(
    "http_in_progress_requests",
    "Number of HTTP requests currently being processed"
)

# Track in-flight requests
@app.middleware("http")
async def track_in_progress(request, call_next):
    IN_PROGRESS_REQUESTS.inc()
    try:
        response = await call_next(request)
        return response
    finally:
        IN_PROGRESS_REQUESTS.dec()
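The client library also ships helpers that do this bookkeeping for you: `Histogram.time()` observes elapsed seconds, and `Gauge.track_inprogress()` increments on entry and decrements on exit, including when the code raises. Both work as decorators or context managers. This sketch uses a private registry and hypothetical metric names so it runs on its own.

```python
import time

from prometheus_client import CollectorRegistry, Gauge, Histogram

# A private registry so this demo does not touch the global default registry
registry = CollectorRegistry()

IN_PROGRESS = Gauge("jobs_in_progress", "Jobs currently running", registry=registry)
JOB_DURATION = Histogram("job_duration_seconds", "Job duration in seconds", registry=registry)
POOL_IN_USE = Gauge(
    "db_pool_connections_in_use", "Connections checked out of the pool", registry=registry
)


@JOB_DURATION.time()             # observes the elapsed seconds on every call
@IN_PROGRESS.track_inprogress()  # +1 on entry, -1 on exit, even if the function raises
def process_job():
    time.sleep(0.01)  # stand-in for real work


process_job()

# A gauge can also be set directly from a value read elsewhere (hypothetical reading)
POOL_IN_USE.set(7)

print(registry.get_sample_value("job_duration_seconds_count"))  # 1.0
print(registry.get_sample_value("jobs_in_progress"))            # 0.0
print(registry.get_sample_value("db_pool_connections_in_use"))  # 7.0
```

For `async def` handlers, the `with` form (for example `with IN_PROGRESS.track_inprogress():` inside the coroutine) measures the awaited work directly.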

Exposing the /metrics endpoint

Prometheus scrapes metrics from an HTTP endpoint. Add one to your FastAPI app:

from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from fastapi import Response

@app.get("/metrics")
async def metrics():
    return Response(
        content=generate_latest(),
        media_type=CONTENT_TYPE_LATEST
    )

This endpoint returns all registered metrics in the Prometheus text format. On every scrape, the Prometheus server fetches this endpoint and stores the returned samples as time series.
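To see what the endpoint actually returns, this sketch renders a private registry with `generate_latest`, which produces the same text format the `/metrics` route serves from the default registry. The metric and label values are illustrative.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

registry = CollectorRegistry()
requests_total = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
    registry=registry,
)
requests_total.labels(method="GET", endpoint="/api/users", status="200").inc(3)

# generate_latest returns bytes in the Prometheus text exposition format
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

The output contains `# HELP` and `# TYPE` comment lines followed by one sample line per label combination with its current value (3.0 here). Depending on the client version, an extra `_created` sample may also appear.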


Health check endpoints that actually check health

A health check endpoint that returns 200 OK unconditionally is theater: it tells you the process is running, not whether it can actually serve requests.

from fastapi.responses import JSONResponse

# Bad - always returns 200, even when the database is down
@app.get("/health")
async def health():
    return {"status": "ok"}

# Good - actually checks dependencies
# (`db` and `redis` stand for your app's async database and Redis clients)
@app.get("/health")
async def health():
    checks = {}

    # Check database connectivity
    try:
        await db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = f"error: {e}"

    # Check Redis connectivity
    try:
        await redis.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = f"error: {e}"

    all_healthy = all(v == "ok" for v in checks.values())

    return JSONResponse(
        status_code=200 if all_healthy else 503,
        content={"status": "healthy" if all_healthy else "degraded", "checks": checks},
    )

Return 503 Service Unavailable when a critical dependency is down. Load balancers and Kubernetes readiness probes treat a 503 as a failed check and stop routing traffic to that instance until it recovers.
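The "good" handler runs its checks one after another with no time limit, so a dependency that hangs makes `/health` hang too. A minimal sketch of one way around that, assuming each check is an async callable: run the checks concurrently with `asyncio.gather` and bound each one with `asyncio.wait_for`. The check functions and the 2-second default timeout are hypothetical placeholders.

```python
import asyncio


async def run_check(name, check, timeout):
    """Run one dependency check; a timeout or any exception counts as a failure."""
    try:
        await asyncio.wait_for(check(), timeout=timeout)
        return name, "ok"
    except asyncio.TimeoutError:
        return name, f"error: timed out after {timeout}s"
    except Exception as exc:
        return name, f"error: {exc}"


async def check_all(checks, timeout=2.0):
    """Run every check concurrently, so the whole call takes roughly `timeout` at most."""
    results = await asyncio.gather(
        *(run_check(name, check, timeout) for name, check in checks.items())
    )
    return dict(results)


# Hypothetical stand-ins for db.execute("SELECT 1") and redis.ping()
async def fake_database_check():
    await asyncio.sleep(0.01)


async def fake_redis_check_that_hangs():
    await asyncio.sleep(10)  # simulates a dependency that never answers


checks = asyncio.run(
    check_all(
        {"database": fake_database_check, "redis": fake_redis_check_that_hangs},
        timeout=0.1,
    )
)
print(checks)  # {'database': 'ok', 'redis': 'error: timed out after 0.1s'}
```

The resulting dict has the same shape as `checks` in the handler, so it can feed the same 200/503 decision.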

AI pitfall
AI almost always generates a health check that returns {"status": "ok"} unconditionally. It looks like monitoring, but it catches nothing. The health check should fail when the service genuinely cannot handle requests.

Grafana dashboards

Prometheus stores raw metrics. Grafana visualizes them. A well-designed dashboard shows the four golden signals at a glance, letting you spot problems in seconds instead of querying metrics manually.

Dashboard layout pattern

A production dashboard should answer "is anything broken?" within 3 seconds of looking at it. Organize panels by the four golden signals:

| Row | Panels | What to watch for |
|---|---|---|
| Top (traffic) | Request rate by endpoint | Spike or drop in traffic |
| Second (errors) | Error rate, error rate by endpoint | Rising error percentage |
| Third (latency) | p50, p95, p99 response time | Latency degradation |
| Bottom (saturation) | DB connections, memory, CPU | Resource exhaustion |

Use red/yellow/green thresholds on each panel. If the dashboard is all green, the service is healthy. A yellow panel means "watch this." A red panel means "investigate now."


External uptime monitoring

Internal metrics have a blind spot: they cannot tell you about problems that prevent requests from reaching your service at all. DNS failures, CDN issues, and network partitions between your users and your servers are invisible to your Prometheus scraper, because it sits inside the same network as the service.

External monitoring services (UptimeRobot, Pingdom, Better Uptime) send requests to your API from locations around the world. If they do not get a 200 response within a timeout, they alert you.
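A rough idea of what a single external check does, using only the standard library. The `probe` function is a hypothetical illustration, not how any of the named services is implemented, and the 5-second timeout is an assumption. Run from inside your own network it shares Prometheus's blind spot; the value of a hosted service is that it runs checks like this from outside.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple:
    """Make one HTTP GET, the way an uptime monitor would, and report (is_up, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Like the monitors described above, only a 200 counts as "up"
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # DNS failures, refused connections, and timeouts all land here
        return False, f"unreachable: {exc}"
```

For example, `probe("https://api.example.com/health")` (a placeholder URL) would return `(True, "HTTP 200")` while the service is reachable and healthy, and `(False, ...)` with a reason otherwise.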

| Monitor type | What it checks | Blind spots and drawbacks |
|---|---|---|
| Internal (Prometheus) | Application metrics | Network-level failures |
| External (UptimeRobot) | "Can users reach you?" | Application-level details |
| Synthetic (Playwright) | Full user flows | Expensive, slow to run |

Use all three layers: internal metrics for granular debugging, external checks for availability, and synthetic tests for critical user flows such as login and checkout.


Quick reference

| Concept | Tool | What it measures |
|---|---|---|
| Request count | Prometheus Counter | Traffic signal |
| Request duration | Prometheus Histogram | Latency signal (p50/p95/p99) |
| Active connections | Prometheus Gauge | Saturation signal |
| Error rate | Counter with status label | Error signal |
| Health check | FastAPI endpoint | Dependency liveness |
| Dashboards | Grafana | Visual overview of all signals |
| External uptime | UptimeRobot / Pingdom | Reachability from the user's perspective |