
Every developer knows how to log: console.log("here"), console.log("it works"), console.log(data). We have all been there. But production logging is a completely different discipline. When your application handles thousands of requests per minute across multiple servers, your logs are either your most powerful debugging tool or an expensive pile of noise. The difference is structure and discipline.

Structured vs unstructured logging

Unstructured logging means printing human-readable strings:

[2025-03-15 14:30:01] ERROR: Payment failed for user john@example.com, order #789, amount $99.50

This is easy to read with your eyes. It is nearly impossible to query programmatically. Want to find all payment failures above $50? You need regex. Want to count failures per user? More regex. Want to correlate with other services? Good luck.

Structured logging means emitting machine-readable objects, typically JSON:

json
{
  "timestamp": "2025-03-15T14:30:01.234Z",
  "level": "error",
  "service": "payment-service",
  "event": "payment_failed",
  "orderId": "order-789",
  "amount": 99.50,
  "currency": "USD",
  "errorCode": "CARD_DECLINED",
  "traceId": "abc123def456"
}

Now you can query: event = "payment_failed" AND amount > 50. You can aggregate: count by errorCode. You can correlate: find all logs with traceId = "abc123def456" across every service. Structured logging is the foundation of effective observability.
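
As a rough illustration, the three operations above can be sketched in plain JavaScript over newline-delimited JSON. The sample entries (including the second error code and trace ID) and the in-memory array are assumptions made for this example; a real log platform would run equivalent queries against its own index.

```javascript
// Hypothetical sample: three newline-delimited JSON log entries.
const rawLogs = [
  '{"level":"error","service":"payment-service","event":"payment_failed","amount":99.5,"errorCode":"CARD_DECLINED","traceId":"abc123def456"}',
  '{"level":"error","service":"payment-service","event":"payment_failed","amount":20,"errorCode":"INSUFFICIENT_FUNDS","traceId":"zzz999"}',
  '{"level":"info","service":"order-service","event":"order_created","traceId":"abc123def456"}',
].join('\n');

// Each line parses independently, with no regex required.
const entries = rawLogs.split('\n').map((line) => JSON.parse(line));

// Query: event = "payment_failed" AND amount > 50
const largeFailures = entries.filter(
  (e) => e.event === 'payment_failed' && e.amount > 50
);

// Aggregate: count by errorCode
const countByErrorCode = {};
for (const e of entries) {
  if (e.errorCode) {
    countByErrorCode[e.errorCode] = (countByErrorCode[e.errorCode] || 0) + 1;
  }
}

// Correlate: every entry sharing one traceId, across services
const trace = entries.filter((e) => e.traceId === 'abc123def456');

console.log(largeFailures.length, countByErrorCode, trace.map((e) => e.service));
```

The same filters would need fragile pattern matching against the unstructured line shown earlier.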

Setting up structured logging in Node.js

Use a logging library like pino (fast, JSON by default) or winston (more configurable). Never use console.log in production.

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Create child loggers with persistent context
const paymentLogger = logger.child({ service: 'payment-service' });

// Every log from this logger automatically includes { service: "payment-service" }
paymentLogger.info({ orderId: 'order-789', amount: 99.50 }, 'Processing payment');
paymentLogger.error({ orderId: 'order-789', errorCode: 'CARD_DECLINED' }, 'Payment failed');

Child loggers are powerful. Create one per request with the request ID attached, and every subsequent log in that request automatically includes the correlation ID.
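
One way to sketch the per-request pattern is shown below. The function accepts any logger that exposes child(), such as the pino instance above. The x-request-id header name, the createRequestLogger helper, and the stub logger are all assumptions for illustration; the stub exists only so the example runs without dependencies.

```javascript
// Derive a per-request logger from any logger exposing child().
// "x-request-id" is an assumed header convention, not a standard.
function createRequestLogger(baseLogger, headers, generateId) {
  // Reuse an upstream ID when present so the trace stays connected.
  const requestId = headers['x-request-id'] || generateId();
  return baseLogger.child({ requestId });
}

// Tiny stand-in logger (hypothetical). It mimics only the
// child()/info() shape used here and collects entries in an array.
function createStubLogger(bindings = {}, sink = []) {
  return {
    sink,
    child: (extra) => createStubLogger({ ...bindings, ...extra }, sink),
    info: (fields, msg) => sink.push({ level: 'info', ...bindings, ...fields, msg }),
  };
}

const base = createStubLogger({ service: 'payment-service' });
const reqLog = createRequestLogger(
  base,
  { 'x-request-id': 'req-42' },
  () => 'generated-id' // placeholder generator for the sketch
);

reqLog.info({ orderId: 'order-789' }, 'Processing payment');
console.log(base.sink[0]); // entry carries both service and requestId
```

In a real service, generateId could be randomUUID from node:crypto, and the returned logger would typically be attached to the request object so every handler uses it.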

Log levels

Log levels exist so you can control verbosity per environment. In development, you want everything. In production, you want info and above. During an incident, you might temporarily lower the level to debug on a specific service.

| Level | When to use | Example | Production? |
|-------|-------------|---------|-------------|
| debug | Detailed internal state, variable values, flow tracing | "Cache miss for key user:42" | Usually off |
| info | Normal operations, business events | "Order #789 created successfully" | Yes |
| warn | Recoverable issues, degraded behavior | "Redis connection timeout, using fallback" | Yes |
| error | Failures that need investigation | "Payment processing failed: card declined" | Yes |
| fatal | Application cannot continue, about to crash | "Database connection lost, shutting down" | Yes |

Common mistakes with log levels:

  • Logging errors at info level. If it needs investigation, it is an error. Do not downplay it.
  • Using error for expected rejections. A user entering a wrong password is not an error; it is expected behavior. Log it at info or warn.
  • No debug logs. When an incident happens and you need more detail, you will wish you had debug logs. Add them during development so you can turn them on later.
  • console.error for everything. This is the equivalent of crying wolf. When everything is an error, nothing is.
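
The threshold idea behind log levels can be sketched as a numeric comparison. The values below follow pino's default level numbers (trace is omitted because this lesson does not use it); the shouldLog helper is a hypothetical name, not a library function.

```javascript
// Numeric severities, following pino's defaults.
const LEVELS = { debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Returns true when an entry at `level` should be emitted given the
// configured minimum (for example, the LOG_LEVEL environment variable).
function shouldLog(level, minLevel) {
  return LEVELS[level] >= LEVELS[minLevel];
}

console.log(shouldLog('debug', 'info'));  // false: debug is suppressed in production
console.log(shouldLog('error', 'info'));  // true
console.log(shouldLog('debug', 'debug')); // true: level lowered during an incident
```

A logging library performs this check internally, which is why a suppressed debug call costs very little at runtime.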

What to log and what NOT to log

What to log

Log business-relevant events and their context:

// Good: business event with context
logger.info({
  event: 'user_signup',
  userId: user.id,
  plan: 'free',
  referrer: signupSource,
}, 'New user registered');

// Good: operation result with timing
logger.info({
  event: 'db_query',
  table: 'orders',
  operation: 'select',
  duration_ms: 45,
  rowCount: 12,
}, 'Database query completed');

// Good: error with full context
logger.error({
  event: 'payment_failed',
  orderId: order.id,
  errorCode: err.code,
  errorMessage: err.message,
  attempt: retryCount,
}, 'Payment processing failed');

What NOT to log

// NEVER: passwords or secrets
logger.info({ password: req.body.password }); // Exposed in log storage forever
logger.info({ apiKey: process.env.STRIPE_KEY }); // Rotatable, but still terrible

// NEVER: PII without masking
logger.info({ email: user.email, ssn: user.ssn }); // GDPR/privacy violation

// BETTER: mask or omit
logger.info({ email: maskEmail(user.email), userId: user.id });
// maskEmail("john@example.com") => "j***@example.com"

// NEVER: entire request/response bodies
logger.info({ body: req.body }); // Could contain anything - passwords, tokens, PII

// NEVER: health check noise
// Don't log every /health or /readiness probe - it drowns out real signals

| Category | Log it? | Reason |
|----------|---------|--------|
| Passwords / tokens | Never | Security breach if logs are compromised |
| Email addresses | Masked only | Privacy regulations (GDPR, CCPA) |
| Credit card numbers | Never (last 4 digits OK) | PCI compliance violation |
| Request IDs / trace IDs | Always | Essential for correlation |
| User IDs (opaque) | Yes | Needed for debugging, not PII by itself |
| Full request bodies | No | Unknown content, could contain secrets |
| Error stack traces | Yes (at error level) | Needed for debugging |
| Health check hits | No | Noise that drowns out real signals |
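
The maskEmail helper used in the example above is not defined in this lesson. A minimal sketch of one possible implementation follows; the choice to keep only the first character and to fall back to "***" for malformed input is an assumption, not a standard.

```javascript
// Hypothetical implementation of the maskEmail helper referenced above.
// Keeps the first character and the domain, hides the rest of the local part.
function maskEmail(email) {
  const at = email.lastIndexOf('@');
  // Not a recognizable address: hide everything rather than leak it.
  if (at < 1) return '***';
  return `${email[0]}***${email.slice(at)}`;
}

console.log(maskEmail('john@example.com')); // j***@example.com
```

Keeping the domain is often enough to debug delivery problems while hiding who the user is. If even the domain is sensitive in your context, log only the opaque user ID.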

Log aggregation

Logs sitting on individual servers are useless. You need centralized log aggregation to search across all your services in one place.

The classic stack is ELK (Elasticsearch, Logstash, Kibana): Logstash collects and transforms logs, Elasticsearch indexes and stores them, and Kibana provides the search UI. It is powerful but operationally heavy: Elasticsearch clusters need tuning, disk space, and monitoring of their own.

Grafana Loki is a lighter alternative. It indexes only the labels (service name, level, environment), not the log content itself. This makes it much cheaper to run, but full-text search is slower. If your logs are well-structured with good labels, Loki is often the better choice.

Datadog Logs, AWS CloudWatch, and Google Cloud Logging are managed alternatives. You pay per GB ingested, but you do not manage any infrastructure.

Retention and cost

Log storage is one of the biggest hidden costs in production systems. A moderately busy application can generate gigabytes of logs per day. At $0.50-3.00 per GB/month for indexed storage, costs add up fast.
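
To make the arithmetic concrete, here is a back-of-the-envelope sketch. The 5 GB/day volume and 30-day retention are hypothetical inputs, and the model is deliberately simplified: it covers only steady-state indexed storage and ignores ingestion fees, replication, and compression.

```javascript
// Steady-state monthly cost of indexed log storage.
// All inputs are hypothetical; substitute your own volume and pricing.
function monthlyIndexedCost(gbPerDay, retentionDays, pricePerGbMonth) {
  // Volume held once the retention window has filled up.
  const storedGb = gbPerDay * retentionDays;
  return storedGb * pricePerGbMonth;
}

// 5 GB/day kept for 30 days is 150 GB of indexed data.
console.log(monthlyIndexedCost(5, 30, 0.5)); // 75  (low end of the price range)
console.log(monthlyIndexedCost(5, 30, 3));   // 450 (high end of the price range)
```

Halving either the daily volume or the retention window halves the bill, which is why trimming noisy log lines pays off directly.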

The strategy: keep hot logs (fully indexed, fast search) for 7-30 days. Archive older logs to cold storage (S3, GCS) for compliance if needed. Drop debug-level logs entirely in production unless you are actively investigating an issue.

Hot storage (7-30 days):  Fast search, expensive
Warm storage (30-90 days): Slower search, cheaper
Cold archive (90+ days):  No search, cheapest (compliance only)
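
The tiering policy can be expressed as a small classification function. This is a sketch only: the 30- and 90-day boundaries take the upper end of the ranges above, and the function names are hypothetical. In practice the log platform or a storage lifecycle rule would usually enforce this rather than application code.

```javascript
// Boundaries taken from the tiers above; tune them to your own policy.
const HOT_DAYS = 30;
const WARM_DAYS = 90;

// Maps the age of a log entry (in days) to a storage tier.
function storageTier(ageDays) {
  if (ageDays < HOT_DAYS) return 'hot';
  if (ageDays < WARM_DAYS) return 'warm';
  return 'cold';
}

// Debug entries are dropped outright unless an investigation is active.
function shouldRetain(entry, investigating = false) {
  return entry.level !== 'debug' || investigating;
}

console.log(storageTier(3), storageTier(45), storageTier(120)); // hot warm cold
console.log(shouldRetain({ level: 'debug' }));                  // false
```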

Before adding a new log line, ask yourself: "Will anyone ever search for this?" If the answer is no, do not log it. Every log line costs money to transport, index, and store.

AI pitfall
AI-generated logging code tends to log everything at INFO level. A controller that logs the full request body and full response body for every request will generate gigabytes of logs per day. What AI gets wrong: it does not consider volume. Log the request ID, status code, and duration at INFO level. Log full payloads at DEBUG level and only enable it when investigating an issue.
Good to know
Correlation IDs (also called request IDs) are the single most valuable thing you can add to your logs. Generate a unique ID at the entry point of every request, attach it to every log line, and pass it to downstream services. When a user reports a bug, you can trace their entire request through every service and database query with one search.
Edge case
Logging sensitive data (passwords, API keys, credit card numbers, personal health information) is not just bad practice; it can be a compliance violation (GDPR, HIPAA, PCI-DSS). Always redact sensitive fields before logging. AI-generated logging code almost never includes redaction, so you must add it yourself.
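
A redaction step might look like the sketch below. The key list and the "[REDACTED]" placeholder are assumptions for illustration, and the function only handles plain JSON-like data (objects, arrays, primitives). pino also ships a built-in redact option that takes a list of key paths, which is usually preferable to a hand-rolled helper.

```javascript
// Field names treated as sensitive (compared case-insensitively).
// This list is an assumption; extend it to match your own payloads.
const SENSITIVE_KEYS = new Set([
  'password', 'apikey', 'token', 'authorization', 'ssn', 'cardnumber',
]);

// Returns a deep copy with sensitive values replaced.
// The input object is left untouched.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? '[REDACTED]'
        : redact(inner);
    }
    return out;
  }
  return value;
}

const safe = redact({
  userId: 'u-42',
  password: 'hunter2',
  payment: { cardNumber: '4111111111111111', amount: 99.5 },
});
console.log(JSON.stringify(safe));
```

Key-based redaction only catches fields you have named. Secrets embedded in free-text values will pass through, which is one more reason to avoid logging whole request bodies.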