Every developer knows how to log. console.log("here"), console.log("it works"), console.log(data): we have all been there. But production logging is a completely different discipline. When your application handles thousands of requests per minute across multiple servers, your logs are either your most powerful debugging tool or an expensive pile of noise. The difference is structure and discipline.
Structured vs unstructured logging
Unstructured logging means printing human-readable strings:
[2025-03-15 14:30:01] ERROR: Payment failed for user john@example.com, order #789, amount $99.50
This is easy to read with your eyes. It is nearly impossible to query programmatically. Want to find all payment failures above $50? You need regex. Want to count failures per user? More regex. Want to correlate with other services? Good luck.
Structured logging means emitting machine-readable objects, typically JSON:
{
  "timestamp": "2025-03-15T14:30:01.234Z",
  "level": "error",
  "service": "payment-service",
  "event": "payment_failed",
  "orderId": "order-789",
  "amount": 99.50,
  "currency": "USD",
  "errorCode": "CARD_DECLINED",
  "traceId": "abc123def456"
}
Now you can query: event = "payment_failed" AND amount > 50. You can aggregate: count by errorCode. You can correlate: find all logs with traceId = "abc123def456" across every service. Structured logging is the foundation of effective observability.
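To make those three queries concrete, here is a minimal sketch that runs them in plain JavaScript over a handful of parsed log lines. In practice your log platform's query language does this for you; the sample entries beyond the one shown above are made up for illustration.

```javascript
// Sample JSON log lines. Only the first mirrors the entry above;
// the others are invented for this illustration.
const lines = [
  '{"level":"error","service":"payment-service","event":"payment_failed","orderId":"order-789","amount":99.5,"errorCode":"CARD_DECLINED","traceId":"abc123def456"}',
  '{"level":"error","service":"payment-service","event":"payment_failed","orderId":"order-790","amount":20,"errorCode":"INSUFFICIENT_FUNDS","traceId":"trace-2"}',
  '{"level":"info","service":"order-service","event":"order_created","orderId":"order-789","traceId":"abc123def456"}',
  '{"level":"error","service":"payment-service","event":"payment_failed","orderId":"order-791","amount":250,"errorCode":"CARD_DECLINED","traceId":"trace-3"}',
];
const entries = lines.map((line) => JSON.parse(line));

// Query: event = "payment_failed" AND amount > 50
const bigFailures = entries.filter(
  (e) => e.event === 'payment_failed' && e.amount > 50
);

// Aggregate: count payment failures by errorCode
const failuresByCode = {};
for (const e of entries) {
  if (e.event !== 'payment_failed') continue;
  failuresByCode[e.errorCode] = (failuresByCode[e.errorCode] || 0) + 1;
}

// Correlate: every entry sharing one traceId, across services
const trace = entries.filter((e) => e.traceId === 'abc123def456');

console.log(bigFailures.map((e) => e.orderId)); // [ 'order-789', 'order-791' ]
console.log(failuresByCode); // { CARD_DECLINED: 2, INSUFFICIENT_FUNDS: 1 }
console.log(trace.map((e) => e.service)); // [ 'payment-service', 'order-service' ]
```

None of this needs a regular expression, because every field already has a name and a type.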
Setting up structured logging in Node.js
Use a logging library like pino (fast, JSON by default) or winston (more configurable). Never use console.log in production.
import pino from 'pino';
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});
// Create child loggers with persistent context
const paymentLogger = logger.child({ service: 'payment-service' });
// Every log from this logger automatically includes { service: "payment-service" }
paymentLogger.info({ orderId: 'order-789', amount: 99.50 }, 'Processing payment');
paymentLogger.error({ orderId: 'order-789', errorCode: 'CARD_DECLINED' }, 'Payment failed');
Child loggers are powerful. Create one per request with the request ID attached, and every subsequent log in that request automatically includes the correlation ID.
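The per-request pattern is easier to see with the library stripped away. The sketch below is a simplified stand-in, not pino's implementation: createLogger, handleRequest, and the request IDs are hypothetical names, and the only point is to show how a child logger inherits context and adds its own fields.

```javascript
// Simplified stand-in for a JSON logger with child(); NOT how pino is implemented.
// `write` receives each finished entry so the output is easy to inspect.
function createLogger(context = {}, write = (entry) => console.log(JSON.stringify(entry))) {
  const log = (level) => (fields, msg) =>
    write({ level, time: new Date().toISOString(), ...context, ...fields, msg });
  return {
    info: log('info'),
    error: log('error'),
    // A child inherits the parent's context and adds its own fields.
    child: (extra) => createLogger({ ...context, ...extra }, write),
  };
}

const captured = [];
const rootLogger = createLogger({ service: 'payment-service' }, (entry) => captured.push(entry));

// Hypothetical request handler: one child logger per request.
function handleRequest(requestId) {
  const reqLogger = rootLogger.child({ requestId });
  reqLogger.info({ orderId: 'order-789' }, 'Processing payment');
  reqLogger.error({ orderId: 'order-789', errorCode: 'CARD_DECLINED' }, 'Payment failed');
}

handleRequest('req-1');
handleRequest('req-2');

// Every entry carries both the service and the request it belongs to.
console.log(captured.map((e) => `${e.service} ${e.requestId} ${e.level}`));
```

With a real library you would call logger.child({ requestId }) in your request middleware and pass the child down, so no call site has to remember to attach the ID.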
Log levels
Log levels exist so you can control verbosity per environment. In development, you want everything. In production, you want info and above. During an incident, you might temporarily lower the level to debug on a specific service.
| Level | When to use | Example | Production? |
|---|---|---|---|
| debug | Detailed internal state, variable values, flow tracing | "Cache miss for key user:42" | Usually off |
| info | Normal operations, business events | "Order #789 created successfully" | Yes |
| warn | Recoverable issues, degraded behavior | "Redis connection timeout, using fallback" | Yes |
| error | Failures that need investigation | "Payment processing failed: card declined" | Yes |
| fatal | Application cannot continue, about to crash | "Database connection lost, shutting down" | Yes |
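Under the hood, a level threshold is just a numeric comparison. The sketch below assumes the numeric values pino assigns to these five levels by default; isEnabled and emittedLevels are hypothetical helper names used only for illustration.

```javascript
// Numeric severities; these match pino's default values for these levels.
const LEVELS = { debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// An entry is emitted only if its level is at or above the configured threshold.
function isEnabled(level, threshold) {
  return LEVELS[level] >= LEVELS[threshold];
}

// Lists which levels get through for a given threshold.
function emittedLevels(threshold) {
  return Object.keys(LEVELS).filter((level) => isEnabled(level, threshold));
}

console.log(emittedLevels('info'));  // [ 'info', 'warn', 'error', 'fatal' ]
console.log(emittedLevels('debug')); // [ 'debug', 'info', 'warn', 'error', 'fatal' ]
```

Lowering the threshold from info to debug during an incident changes nothing in your code; it only moves the cutoff.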
Common mistakes with log levels:
- Logging errors at info level. If it needs investigation, it is an error. Do not downplay it.
- Using error for expected rejections. A user entering a wrong password is not an error, it is expected behavior. Log it at info or warn.
- No debug logs. When an incident happens and you need more detail, you wish you had debug logs. Add them during development so you can turn them on later.
- console.error for everything. This is the equivalent of crying wolf. When everything is an error, nothing is.
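One way to keep expected rejections out of the error level is to decide the level from the kind of failure rather than at each call site. This is a sketch under assumptions: ExpectedRejection, levelFor, and logFailure are hypothetical names, and warn is chosen here where info would be equally valid.

```javascript
// Hypothetical error type for outcomes the application anticipates,
// such as a wrong password or a failed validation.
class ExpectedRejection extends Error {}

// Expected rejections are routine; anything else needs investigation.
function levelFor(err) {
  return err instanceof ExpectedRejection ? 'warn' : 'error';
}

// Logs the failure at the level its type calls for and returns that level.
function logFailure(log, err, context) {
  const level = levelFor(err);
  log[level]({ ...context, errorMessage: err.message }, 'Operation failed');
  return level;
}

// Minimal stub that records calls, standing in for a real logger.
const calls = [];
const stubLogger = {
  warn: (fields, msg) => calls.push({ level: 'warn', ...fields, msg }),
  error: (fields, msg) => calls.push({ level: 'error', ...fields, msg }),
};

logFailure(stubLogger, new ExpectedRejection('wrong password'), { event: 'login_failed' });
logFailure(stubLogger, new TypeError('cannot read properties of undefined'), { event: 'login_failed' });

console.log(calls.map((c) => c.level)); // [ 'warn', 'error' ]
```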
What to log and what NOT to log
What to log
Log business-relevant events and their context:
// Good: business event with context
logger.info({
  event: 'user_signup',
  userId: user.id,
  plan: 'free',
  referrer: signupSource,
}, 'New user registered');
// Good: operation result with timing
logger.info({
  event: 'db_query',
  table: 'orders',
  operation: 'select',
  duration_ms: 45,
  rowCount: 12,
}, 'Database query completed');
// Good: error with full context
logger.error({
  event: 'payment_failed',
  orderId: order.id,
  errorCode: err.code,
  errorMessage: err.message,
  attempt: retryCount,
}, 'Payment processing failed');
What NOT to log
// NEVER: passwords or secrets
logger.info({ password: req.body.password }); // Exposed in log storage forever
logger.info({ apiKey: process.env.STRIPE_KEY }); // Rotatable, but still terrible
// NEVER: PII without masking
logger.info({ email: user.email, ssn: user.ssn }); // GDPR/privacy violation
// BETTER: mask or omit
logger.info({ email: maskEmail(user.email), userId: user.id });
// maskEmail("john@example.com") => "j***@example.com"
// NEVER: entire request/response bodies
logger.info({ body: req.body }); // Could contain anything - passwords, tokens, PII
// NEVER: health check noise
// Don't log every /health or /readiness probe - it drowns out real signals
| Category | Log it? | Reason |
|---|---|---|
| Passwords / tokens | Never | Security breach if logs are compromised |
| Email addresses | Masked only | Privacy regulations (GDPR, CCPA) |
| Credit card numbers | Never (last 4 digits OK) | PCI compliance violation |
| Request IDs / trace IDs | Always | Essential for correlation |
| User IDs (opaque) | Yes | Needed for debugging, not PII by itself |
| Full request bodies | No | Unknown content, could contain secrets |
| Error stack traces | Yes (at error level) | Needed for debugging |
| Health check hits | No | Noise that drowns out real signals |
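The masking rules in the table can be sketched as small helpers. The text never defines maskEmail, so the version below is one possible implementation, and maskCard, redact, and the SENSITIVE_KEYS list are hypothetical additions you would adapt to your own payloads.

```javascript
// Keeps the first character of the local part and the full domain.
function maskEmail(email) {
  const at = email.lastIndexOf('@');
  if (at < 1) return '***'; // not a recognizable address: reveal nothing
  return `${email[0]}***${email.slice(at)}`;
}

// Keeps only the last four digits of a card number.
function maskCard(cardNumber) {
  const digits = String(cardNumber).replace(/\D/g, '');
  return `****${digits.slice(-4)}`;
}

// Assumed deny-list of key names, compared case-insensitively.
const SENSITIVE_KEYS = new Set(['password', 'token', 'apikey', 'authorization', 'ssn']);

// Returns a copy with sensitive keys replaced, recursing into nested objects.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value === null || typeof value !== 'object') return value;
  const out = {};
  for (const [key, val] of Object.entries(value)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(val);
  }
  return out;
}

console.log(maskEmail('john@example.com'));    // j***@example.com
console.log(maskCard('4242 4242 4242 4242'));  // ****4242
console.log(JSON.stringify(redact({ userId: 'u-42', password: 'hunter2', nested: { apiKey: 'sk_test' } })));
// {"userId":"u-42","password":"[REDACTED]","nested":{"apiKey":"[REDACTED]"}}
```

A deny-list only catches the keys you thought of, which is why omitting whole request bodies remains the safer default. pino also ships a redact option that censors configured key paths, which is worth using as a second line of defense.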
Log aggregation
Logs sitting on individual servers are useless. You need centralized log aggregation to search across all your services in one place.
The classic stack is ELK (Elasticsearch, Logstash, Kibana): Logstash collects and transforms logs, Elasticsearch indexes and stores them, and Kibana provides the search UI. It is powerful but operationally heavy: Elasticsearch clusters need tuning, disk space, and monitoring of their own.
Grafana Loki is a lighter alternative. It indexes only the labels (service name, level, environment), not the log content itself. This makes it much cheaper to run, but full-text search is slower. If your logs are well-structured with good labels, Loki is often the better choice.
Datadog Logs, AWS CloudWatch, and Google Cloud Logging are managed alternatives. You pay per GB ingested, but you do not manage any infrastructure.
Retention and cost
Log storage is one of the biggest hidden costs in production systems. A moderately busy application can generate gigabytes of logs per day. At $0.50-3.00 per GB/month for indexed storage, costs add up fast.
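A quick back-of-the-envelope calculation shows how fast this adds up. The workload below (5 GB per day, kept indexed for 30 days) is an assumption chosen for illustration, and monthlyIndexedCost is a hypothetical helper; only the price range comes from the paragraph above.

```javascript
// Steady-state volume: each day's logs stay in indexed storage for `retentionDays`,
// so the amount stored at any moment is gbPerDay * retentionDays.
function monthlyIndexedCost(gbPerDay, retentionDays, pricePerGbMonth) {
  const storedGb = gbPerDay * retentionDays;
  return storedGb * pricePerGbMonth;
}

// Assumed workload: 5 GB/day kept for 30 days, priced at both ends of the range.
const low = monthlyIndexedCost(5, 30, 0.5);
const high = monthlyIndexedCost(5, 30, 3.0);

console.log(`$${low.toFixed(2)} to $${high.toFixed(2)} per month`);
// $75.00 to $450.00 per month
```

This ignores ingestion and transfer fees, which managed services often bill separately, so treat it as a lower bound.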
The strategy: keep hot logs (fully indexed, fast search) for 7-30 days. Archive older logs to cold storage (S3, GCS) for compliance if needed. Drop debug-level logs entirely in production unless you are actively investigating an issue.
Hot storage (7-30 days): Fast search, expensive
Warm storage (30-90 days): Slower search, cheaper
Cold archive (90+ days): No search, cheapest (compliance only)
Before adding a new log line, ask yourself: "Will anyone ever search for this?" If the answer is no, do not log it. Every log line costs money to transport, index, and store.
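The retention tiers listed above translate into a simple age check. This is a sketch, assuming the upper ends of the ranges (30 and 90 days) as boundaries; storageTier and shouldRetain are hypothetical helpers, and real platforms usually express the same policy as lifecycle configuration rather than code.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Tier boundaries follow the list above, using 30 and 90 days as cutoffs.
function storageTier(entryTime, now = new Date()) {
  const ageDays = (now - new Date(entryTime)) / DAY_MS;
  if (ageDays <= 30) return 'hot';
  if (ageDays <= 90) return 'warm';
  return 'cold';
}

// Policy from the text: debug logs are not kept in production at all.
function shouldRetain(entry, env = 'production') {
  return !(env === 'production' && entry.level === 'debug');
}

const now = new Date('2025-06-30T00:00:00Z');
console.log(storageTier('2025-06-25T00:00:00Z', now));     // hot  (5 days old)
console.log(storageTier('2025-05-01T00:00:00Z', now));     // warm (60 days old)
console.log(storageTier('2025-03-15T14:30:01.234Z', now)); // cold (over 90 days old)
console.log(shouldRetain({ level: 'debug' }));             // false
```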