Logs tell you what happened. Metrics tell you how often. Traces tell you why: specifically, why a request was slow, where it spent its time, and which service in the chain was the bottleneck.
For integrations, tracing is essential because a single user action crosses multiple service boundaries. A payment flow might hit your API server, then Stripe, then a webhook back to you, then a fulfillment service. Without tracing, you have four separate views of the same operation. With tracing, you have one connected timeline.
Core tracing concepts
| Concept | Definition | Integration example |
|---|---|---|
| Trace | The full journey of one operation | "User places order" from API request to fulfillment confirmation |
| Span | A single unit of work within a trace | "Call Stripe payment intent API" is one span |
| Parent span | The span that initiated a child span | Your API handler span is the parent of the Stripe call span |
| Trace ID | Unique identifier for the entire trace | 4bf92f3577b34da6a3ce929d0e0e4736 |
| Span ID | Unique identifier for one span | 00f067aa0ba902b7 |
| Context propagation | Passing trace/span IDs between services | traceparent header on HTTP calls |
| Sampling | Only tracing a fraction of requests | Trace 10% of requests to reduce data volume |
| Baggage | Key-value pairs propagated with the trace | partnerId=stripe carried through all spans |
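To make the relationship between trace, span, and parent span concrete, here is a minimal sketch that groups flat span records into a tree by parent span ID. The SpanRecord shape and the buildSpanTree helper are illustrative assumptions, not part of any OpenTelemetry API, and the second span ID is made up for the example.

```typescript
// Hypothetical, simplified span record; real tracing backends store more fields.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
}

interface SpanNode extends SpanRecord {
  children: SpanNode[];
}

// Build a parent/child tree from the flat list of spans that share one trace ID.
function buildSpanTree(spans: SpanRecord[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) {
    nodes.set(span.spanId, { ...span, children: [] });
  }

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      // No known parent: a root span, or a span whose parent was not recorded.
      roots.push(node);
    }
  });
  return roots;
}

const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
const tree = buildSpanTree([
  { traceId, spanId: '00f067aa0ba902b7', name: 'POST /orders' },
  {
    traceId,
    spanId: 'a1b2c3d4e5f60718', // made-up ID for illustration
    parentSpanId: '00f067aa0ba902b7',
    name: 'stripe.createPaymentIntent',
  },
]);
console.log(JSON.stringify(tree, null, 2));
```

A tracing UI does essentially this grouping before it draws the nested timeline: every span carries the trace ID, and the parent span ID decides where it sits.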
OpenTelemetry setup
OpenTelemetry (OTel) is the industry standard. It provides a single API for instrumentation that works with any tracing backend.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'order-service',
    [ATTR_SERVICE_VERSION]: '1.2.0',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Auto-instrument HTTP, Express, fetch, etc.
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
    }),
  ],
});

sdk.start();
console.log('OpenTelemetry tracing initialized');

// Graceful shutdown
process.on('SIGTERM', () => sdk.shutdown());

Auto-instrumentation handles HTTP calls, Express routes, and database queries automatically. But for integration-specific spans, you will want manual instrumentation too.
Creating spans for integration calls
Auto-instrumentation creates spans for outgoing HTTP calls, but integration calls deserve richer context, such as the partner name, the business operation, and the retry attempt:
import { trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import Stripe from 'stripe';

// Assumes the secret key is supplied through the STRIPE_SECRET_KEY environment variable
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY ?? '');
const tracer = trace.getTracer('integration-service');

async function createPaymentIntent(
  amount: number,
  currency: string,
  customerId: string
) {
  // Create a span specifically for this integration call
  return tracer.startActiveSpan(
    'stripe.createPaymentIntent',
    {
      kind: SpanKind.CLIENT,
      attributes: {
        'integration.partner': 'stripe',
        'integration.operation': 'create_payment_intent',
        'payment.amount': amount,
        'payment.currency': currency,
        // Never put PII in span attributes
      },
    },
    async (span) => {
      try {
        const intent = await stripe.paymentIntents.create({
          amount,
          currency,
          customer: customerId,
        });
        span.setAttributes({
          'payment.intent_id': intent.id,
          'payment.status': intent.status,
          'http.status_code': 200,
        });
        span.setStatus({ code: SpanStatusCode.OK });
        return intent;
      } catch (error) {
        // Mark the span as failed and attach the exception details
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: (error as Error).message,
        });
        span.recordException(error as Error);
        throw error;
      } finally {
        // Always end the span, on success or failure
        span.end();
      }
    }
  );
}

This creates a span named stripe.createPaymentIntent with attributes that tell you exactly what happened. In your tracing UI, you will see this span nested under the parent API handler span, with its duration, status, and all the attributes you set.
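The example above records the partner and the operation but not the retry attempt. One way to capture it is sketched below. The callWithRetry helper, the integration.retry_attempt attribute, and the integration.attempt_failed event name are all hypothetical names chosen for illustration. SpanLike is a small structural subset of a span (setAttribute and addEvent) so the sketch runs on its own; an OpenTelemetry span should be assignable to it.

```typescript
// Minimal structural subset of a span, so this sketch has no dependencies.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  addEvent(name: string, attributes?: Record<string, string | number | boolean>): unknown;
}

// Hypothetical helper: run `call` up to `maxAttempts` times and record
// each attempt on the span. No backoff is applied, to keep the sketch short.
async function callWithRetry<T>(
  span: SpanLike,
  call: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Overwritten on every attempt, so the span ends up with the final attempt number.
    span.setAttribute('integration.retry_attempt', attempt);
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // One event per failed attempt keeps the failure history on the span.
      span.addEvent('integration.attempt_failed', {
        attempt,
        error: error instanceof Error ? error.message : String(error),
      });
    }
  }
  throw lastError;
}
```

Inside the startActiveSpan callback, the partner call could then be wrapped as callWithRetry(span, () => stripe.paymentIntents.create({ ... })). Real retry logic would also need a delay between attempts and, for payment calls, an idempotency key so a retried request cannot charge twice.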
W3C trace context propagation
The traceparent header is the standard way to propagate trace context between services. It looks like this:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                flags (sampled)
             |  |                                spanId (16 hex)
             |  traceId (32 hex)
             version

OpenTelemetry handles this automatically for HTTP calls made with standard libraries. When you call fetch() inside an active span, OTel injects the traceparent header. When an incoming request has a traceparent header, OTel extracts it and links the new spans to the existing trace.
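To make the header layout concrete, here is a minimal sketch of a parser for it. In practice OTel does this for you; parseTraceparent and the TraceParent type are hypothetical names, and the sketch only accepts the four-field layout shown above.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// Four lowercase-hex fields separated by dashes: 2, 32, 16, and 2 characters.
const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;

  const [, version, traceId, spanId, flags] = match;
  // Version ff and all-zero trace or span IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) {
    return null;
  }

  return {
    version,
    traceId,
    spanId,
    // The lowest bit of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

const parsed = parseTraceparent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
console.log(parsed);
```

A helper like this is mostly useful for debugging, for example to pull the trace ID out of a logged header and search for it in your tracing backend.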
// This happens automatically with OTel auto-instrumentation:
// Outgoing request gets traceparent header injected
const response = await fetch('https://partner-api.com/orders', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(orderData),
});
// OTel adds: traceparent: 00-<traceId>-<newSpanId>-01
// If the partner service also runs OTel, it picks up the trace
// and all its spans appear in the same trace view

When partners do not support trace context
Most third-party APIs (Stripe, Twilio, etc.) do not propagate your trace context. The trace breaks at the boundary. To handle this, record the partner's request ID as a span attribute so you can manually correlate later:
// `span` is the span created for this call; `options` holds the method, headers, and body
const response = await fetch('https://api.stripe.com/v1/charges', options);
// Stripe returns its own request ID
const stripeRequestId = response.headers.get('request-id');
span.setAttribute('stripe.request_id', stripeRequestId || 'unknown');
// Now you can cross-reference Stripe's dashboard with your traces

Tracing webhook flows
Webhooks are the hardest thing to trace because the HTTP connection is initiated by the external service, not by you. Stripe and Twilio do not set the traceparent header, because they do not know about your traces.
Strategy: embed trace context in the originating request
When you create a resource that will later trigger a webhook, store the trace context:
import { context, propagation } from '@opentelemetry/api';

// `stripe`, `tracer`, and the Express `app` are assumed to be initialized elsewhere
async function createStripeSubscription(customerId: string, priceId: string) {
  // Capture current trace context
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);

  const subscription = await stripe.subscriptions.create({
    customer: customerId,
    items: [{ price: priceId }],
    metadata: {
      // Store trace context in Stripe metadata
      trace_parent: carrier.traceparent || '',
      trace_state: carrier.tracestate || '',
    },
  });
  return subscription;
}

// Later, when the webhook arrives:
app.post('/webhooks/stripe', async (req, res) => {
  // Assumes the payload has already been parsed and its signature verified
  const event = req.body;
  const metadata = event.data.object.metadata;

  // Extract the original trace context
  const parentContext = propagation.extract(context.active(), {
    traceparent: metadata?.trace_parent,
    tracestate: metadata?.trace_state,
  });

  // Create a span linked to the original trace
  context.with(parentContext, () => {
    tracer.startActiveSpan('webhook.stripe.received', (span) => {
      span.setAttribute('webhook.event_type', event.type);
      span.setAttribute('webhook.event_id', event.id);
      // Process...
      span.end();
    });
  });

  res.status(200).json({ received: true });
});

This is not perfect: Stripe metadata has size limits, and not all webhook events have associated metadata. But for key flows like subscriptions and payment intents, it connects the trace from the original request through to the webhook processing.
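Because of those size limits, it can help to build the metadata through a small guard instead of writing carrier values directly. The sketch below assumes a 500-character limit per metadata value, which is what Stripe's metadata documentation described at the time of writing; verify it before relying on it. The traceMetadata helper is a hypothetical name.

```typescript
// Assumed per-value limit for Stripe metadata; check the current Stripe docs.
const MAX_METADATA_VALUE_LENGTH = 500;

// Hypothetical helper: turn an injected propagation carrier into metadata entries.
function traceMetadata(carrier: Record<string, string>): Record<string, string> {
  const metadata: Record<string, string> = {};

  const traceparent = carrier.traceparent;
  if (!traceparent) {
    // No active trace: store nothing rather than empty strings.
    return metadata;
  }
  // A four-field traceparent like the one shown earlier is 55 characters, well under the limit.
  metadata.trace_parent = traceparent;

  const tracestate = carrier.tracestate;
  // tracestate is optional vendor data. Skip it when it is too long,
  // because a truncated value could no longer be parsed.
  if (tracestate && tracestate.length <= MAX_METADATA_VALUE_LENGTH) {
    metadata.trace_state = tracestate;
  }
  return metadata;
}

const example = traceMetadata({
  traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
});
console.log(example);
```

With this in place, the subscription call could pass metadata: traceMetadata(carrier), and the webhook handler keeps working unchanged because a missing trace_state is simply ignored on extraction.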
Sampling strategies
Tracing every request generates enormous amounts of data. In production, you need a sampling strategy:
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Trace 10% of requests
  sampler: new TraceIdRatioBasedSampler(0.1),
  // ... rest of config
});

| Strategy | How it works | Best for |
|---|---|---|
| Rate-based (10%) | Randomly sample a percentage | General production traffic |
| Always-on for errors | Sample all requests that result in errors | Never miss a failure trace |
| Head-based | Decision made at trace start | Consistent: either all spans or none |
| Tail-based | Decision made at trace end | Capture interesting traces (slow, errored) |
| Per-partner override | Higher sampling for flaky partners | Deep visibility into troubled integrations |
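A practical approach: sample 10% of all traffic, but always sample errors and requests to partners that are currently degraded. Because an error is only known after the request has run, the always-sample-errors rule needs a tail-based decision, typically made in a collector rather than in the SDK's head sampler.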
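The combined policy can be sketched as a single decision function. This is an illustration of the logic only, not how any particular collector or SDK implements it: TraceSummary, partnerRatios, the flaky-partner name, and shouldKeep are hypothetical, and the ratio check is a simplification rather than the exact algorithm inside TraceIdRatioBasedSampler.

```typescript
// Hypothetical summary of a finished trace, as a tail-based sampler would see it.
interface TraceSummary {
  traceId: string; // 32 hex characters
  partner: string;
  hadError: boolean; // only known once the trace has finished
}

const BASE_RATIO = 0.1;

// Hypothetical overrides for partners that are currently degraded.
const partnerRatios = new Map<string, number>([['flaky-partner', 1.0]]);

// Deterministic ratio check: the same trace ID always gets the same answer,
// so every service applying it keeps or drops the same traces.
function inRatio(traceId: string, ratio: number): boolean {
  const bucket = parseInt(traceId.slice(-8), 16); // 0 .. 2^32 - 1
  return bucket < ratio * 0x100000000;
}

function shouldKeep(trace: TraceSummary): boolean {
  if (trace.hadError) return true; // always keep failures
  const ratio = partnerRatios.get(trace.partner) ?? BASE_RATIO;
  return inRatio(trace.traceId, ratio);
}

console.log(
  shouldKeep({
    traceId: 'ffffffffffffffffffffffffffffffff',
    partner: 'stripe',
    hadError: true,
  })
);
```

Deriving the decision from the trace ID rather than from a random number is what keeps a trace complete: every span in the trace shares the ID, so every span gets the same verdict.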
Quick reference
| Tool / Standard | Purpose |
|---|---|
| OpenTelemetry SDK | Vendor-neutral instrumentation |
| traceparent header | W3C standard for context propagation |
| Jaeger | Open-source tracing backend |
| Honeycomb | SaaS tracing with powerful querying |
| SpanKind.CLIENT | Mark spans as outgoing integration calls |
| span.recordException() | Attach error details to a span |