You have probably written code where service A calls service B directly over HTTP. It works fine until service B goes down, takes too long, or gets overwhelmed. A message queue sits between them: A drops a message into the queue, B picks it up when ready. They never talk to each other directly.
What is a message queue?
A message queue accepts messages from producers, stores them temporarily, and delivers them to consumers. The sender and receiver never need to be available at the same time.
// Without a queue: tight coupling
async function processOrder(order) {
  await paymentService.charge(order); // What if payment is slow?
  await inventoryService.reserve(order); // What if inventory is down?
  await emailService.sendConfirmation(order); // Do we really need to wait?
  return { status: 'completed' };
}

// With a queue: decoupled
async function processOrder(order) {
  await paymentService.charge(order); // Only the critical path is synchronous
  await queue.publish('order.completed', order); // Everything else is async
  return { status: 'completed' };
}

// Separate consumers handle the rest independently
queue.subscribe('order.completed', async (order) => {
  await inventoryService.reserve(order);
});
queue.subscribe('order.completed', async (order) => {
  await emailService.sendConfirmation(order);
});

In the first version, the user waits for all three services. In the second, the user waits only for payment. Inventory and email happen in the background, and if the email service is down, the email gets sent later when it recovers.
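To make the "sent later when it recovers" part concrete, here is a minimal sketch of a queue that holds messages while the consumer is offline. `InMemoryQueue`, `demoQueue`, and `sentEmails` are made-up names for illustration, not the API of any real broker, and the sketch assumes a single consumer per topic and no persistence.

```javascript
// Hypothetical in-memory queue for illustration only; not a real broker API.
class InMemoryQueue {
  constructor() {
    this.pending = new Map();  // topic -> messages waiting for a consumer
    this.handlers = new Map(); // topic -> handler, once a consumer is online
  }

  publish(topic, message) {
    const handler = this.handlers.get(topic);
    if (handler) {
      handler(message); // Consumer is online: deliver right away
      return;
    }
    // Consumer is offline: store the message until it comes back
    if (!this.pending.has(topic)) this.pending.set(topic, []);
    this.pending.get(topic).push(message);
  }

  subscribe(topic, handler) {
    this.handlers.set(topic, handler);
    // Drain everything that piled up while the consumer was away, in order
    const backlog = this.pending.get(topic) || [];
    this.pending.delete(topic);
    backlog.forEach((message) => handler(message));
  }
}

const demoQueue = new InMemoryQueue();
const sentEmails = [];

// The email service is "down": nobody is subscribed yet
demoQueue.publish('order.completed', { orderId: 1 });
demoQueue.publish('order.completed', { orderId: 2 });

// The email service recovers and picks up the backlog
demoQueue.subscribe('order.completed', (order) => sentEmails.push(order.orderId));
demoQueue.publish('order.completed', { orderId: 3 });

console.log(sentEmails); // [ 1, 2, 3 ]
```

The producer never checks whether anyone is listening. That is the whole point: availability of the consumer becomes the queue's problem, not the producer's.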
FIFO ordering
FIFO stands for First In, First Out. Messages enter the queue in order and should exit in the same order. But strict FIFO ordering is expensive in distributed systems: the moment you spread the queue across multiple servers (partitions), there is no shared clock or sequence to tell you whether a message on partition A came before or after one on partition B.
// Messages published in order
queue.publish('user.updated', { userId: 1, name: 'Alice' }); // t=0
queue.publish('user.updated', { userId: 1, name: 'Alice B' }); // t=1
// With multiple partitions, consumer might see:
// Partition 1: { name: 'Alice B' } (processed first!)
// Partition 2: { name: 'Alice' } (processed second, overwrites!)
// Result: user name is 'Alice' instead of 'Alice B'

How to preserve ordering when it matters:
- Use a partition key (e.g., userId) so all messages for the same entity go to the same partition
- Ordering is only guaranteed within a partition, not across partitions
- Global ordering requires a single partition, which limits throughput
| Ordering level | How | Throughput | Use case |
|---|---|---|---|
| No ordering guarantee | Multiple partitions, any assignment | Highest | Logging, analytics events |
| Per-key ordering | Partition by entity ID | High | User updates, order state changes |
| Strict global ordering | Single partition | Lowest | Financial ledger, sequential processing |
Most systems only need per-key ordering. Strict global ordering is rare and kills scalability.
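Per-key ordering comes down to one idea: hash the key, and the same key always lands on the same partition. The sketch below assumes a made-up `partitionFor` helper with a simple rolling hash. Real brokers use their own hash functions, so treat this as an illustration of the principle rather than how any specific system does it.

```javascript
// Hypothetical helper: map a partition key to one of N partitions.
// Real brokers use their own hash functions; this one is only for illustration.
function partitionFor(key, numPartitions) {
  const text = String(key);
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    // Simple 32-bit rolling hash; `>>> 0` keeps the value unsigned
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash % numPartitions;
}

const NUM_PARTITIONS = 4;
const partitions = Array.from({ length: NUM_PARTITIONS }, () => []);

// Append the message to whichever partition its key hashes to
function publishKeyed(key, message) {
  partitions[partitionFor(key, NUM_PARTITIONS)].push(message);
}

publishKeyed(1, { userId: 1, name: 'Alice' });
publishKeyed(2, { userId: 2, name: 'Bob' });
publishKeyed(1, { userId: 1, name: 'Alice B' });

// Both updates for user 1 sit in the same partition, in publish order
const user1Partition = partitions[partitionFor(1, NUM_PARTITIONS)];
const user1Names = user1Partition
  .filter((message) => message.userId === 1)
  .map((message) => message.name);

console.log(user1Names); // [ 'Alice', 'Alice B' ]
```

Because a single consumer reads each partition in order, 'Alice B' can no longer be overwritten by the older 'Alice' update.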
Delivery guarantees
There are three delivery guarantees, and only two of them actually exist in practice.
At-most-once
The producer sends the message and does not retry. If the network drops it or the consumer crashes before processing, the message is lost.
// At-most-once: fire and forget
async function logAnalyticsEvent(event) {
  try {
    await queue.publish('analytics', event);
  } catch (err) {
    // We shrug and move on. It's just analytics.
    console.warn('Analytics event lost:', err.message);
  }
}

When to use it: Logging, analytics, metrics. Losing a few data points is acceptable. Speed matters more than completeness.
At-least-once
The producer retries until it gets an acknowledgment. The consumer acknowledges after processing. If the consumer crashes after processing but before acknowledging, the message gets delivered again. You might process it twice.
// At-least-once: consumer must handle duplicates
queue.subscribe('payment.process', async (message) => {
  const { orderId, amount } = message.data;

  // Idempotency check: did we already process this?
  const existing = await db.query(
    'SELECT id FROM payments WHERE order_id = ? AND status = ?',
    [orderId, 'completed']
  );
  if (existing.length > 0) {
    console.log(`Payment for order ${orderId} already processed, skipping`);
    message.ack(); // Acknowledge without reprocessing
    return;
  }

  await paymentGateway.charge(orderId, amount);
  await db.query(
    'INSERT INTO payments (order_id, amount, status) VALUES (?, ?, ?)',
    [orderId, amount, 'completed']
  );
  message.ack();
});

When to use it: Anything where losing a message is unacceptable: payment processing, order fulfillment, user notifications. Your consumer must be idempotent (safe to run twice with the same input).
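You can watch the duplicate happen in a small simulation. The sketch below assumes a made-up `deliverAtLeastOnce` function standing in for the broker and a `Set` standing in for the database. The consumer crashes once after doing the work but before acknowledging, which is exactly the failure described above.

```javascript
// Hypothetical at-least-once broker: it redelivers until the handler acks.
function deliverAtLeastOnce(message, handler, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let acked = false;
    try {
      handler(message, () => { acked = true; });
    } catch (err) {
      // A crashed consumer never acks, so the broker will try again
    }
    if (acked) return attempt;
  }
  throw new Error(`Message ${message.id} not acknowledged after ${maxAttempts} attempts`);
}

const processedIds = new Set(); // Stand-in for a database table with a unique key
const charges = [];
let crashBeforeAck = true;      // Simulate one crash between processing and ack

function handlePayment(message, ack) {
  if (!processedIds.has(message.id)) {
    charges.push(message.data.amount); // The side effect we must not repeat
    processedIds.add(message.id);
  }
  if (crashBeforeAck) {
    crashBeforeAck = false;
    throw new Error('Consumer crashed before ack');
  }
  ack();
}

const attempts = deliverAtLeastOnce(
  { id: 'msg-1', data: { orderId: 42, amount: 100 } },
  handlePayment
);

console.log(attempts);       // 2
console.log(charges.length); // 1
```

The message was delivered twice, but the charge happened once. Remove the `processedIds` check and the customer pays double.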
Exactly-once (the myth)
In a distributed system, true exactly-once delivery is impossible. At every boundary (producer-to-queue, queue-to-consumer), there is a gap where a lost acknowledgment can cause duplicates or losses.
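What "exactly-once" actually means in practice: Systems like Kafka advertise "exactly-once semantics," but what they really provide is at-least-once delivery combined with idempotent producers and transactions, so that retried writes are deduplicated and a consume-process-produce step commits atomically. The infrastructure handles deduplication internally, but only for data that stays inside the broker. Side effects in external systems (charging a card, sending an email) can still run twice, so your application code still needs to be idempotent as a safety net.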
| Guarantee | Message loss? | Duplicates? | Consumer requirement | Best for |
|---|---|---|---|---|
| At-most-once | Yes, possible | No | None | Analytics, logging, metrics |
| At-least-once | No | Yes, possible | Must be idempotent | Payments, orders, notifications |
| "Exactly-once" | No | Handled by infra | Still should be idempotent | When available and you need simplicity |
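The "handled by infra" column usually means something like the following: the broker remembers the last sequence number it stored for each producer and silently drops retries. This is a simplified sketch with a made-up `DedupingLog` class, loosely based on the idea behind idempotent producers. Real implementations track more state than this.

```javascript
// Hypothetical broker-side dedup, loosely modeled on the idea of idempotent producers.
class DedupingLog {
  constructor() {
    this.entries = [];
    this.lastSequence = new Map(); // producerId -> highest sequence number stored
  }

  // Returns true if the message was appended, false if it was a duplicate retry
  append(producerId, sequence, payload) {
    const last = this.lastSequence.has(producerId)
      ? this.lastSequence.get(producerId)
      : -1;
    if (sequence <= last) return false;
    this.entries.push(payload);
    this.lastSequence.set(producerId, sequence);
    return true;
  }
}

const log = new DedupingLog();
log.append('producer-a', 0, 'order created');

// The ack for sequence 1 is "lost", so the producer sends it twice
log.append('producer-a', 1, 'order paid');
const retryStored = log.append('producer-a', 1, 'order paid');

console.log(retryStored); // false
console.log(log.entries); // [ 'order created', 'order paid' ]
```

Notice what this does not cover: anything the consumer does outside the log. That is why the table still says your consumer should be idempotent.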
Backpressure
When producers send messages faster than consumers can process them, the queue grows, memory fills up, and eventually something breaks. Here are the main strategies for handling it:
// Strategy 1: Bounded queue with rejection
const queue = new BoundedQueue({ maxSize: 10000 });
async function publish(message) {
  if (queue.isFull()) {
    // Tell the producer to slow down; it can retry with exponential backoff
    throw new Error('Queue full, try again later');
  }
  await queue.enqueue(message);
}

// Strategy 2: Rate limiting producers
const rateLimiter = new RateLimiter({
  maxPerSecond: 1000,
  strategy: 'sliding-window'
});
async function publish(message) {
  await rateLimiter.acquire(); // Waits until a slot is available
  await queue.enqueue(message);
}

// Strategy 3: Auto-scaling consumers
const MAX_CONSUMERS = 20; // Assumed upper bound so a stalled queue cannot request unlimited workers
async function monitorQueueDepth() {
  const depth = await queue.getDepth();
  if (depth === 0) return; // Nothing to drain
  const processingRate = await queue.getProcessingRate(); // Messages per second
  // A rate of 0 with a non-empty queue means we are infinitely far behind
  const timeToEmpty = processingRate > 0 ? depth / processingRate : Infinity;
  if (timeToEmpty > 300) { // More than 5 minutes behind
    // Heuristic: one consumer per minute of backlog, capped at MAX_CONSUMERS
    await scaleConsumers(Math.min(MAX_CONSUMERS, Math.ceil(timeToEmpty / 60)));
  }
}

| Strategy | Mechanism | Tradeoff |
|---|---|---|
| Bounded queue | Reject messages when full | Producers must handle rejection |
| Rate limiting | Slow producers down | Adds latency to producers |
| Auto-scale consumers | Spin up more workers | Costs more, takes time to scale |
| Overflow to disk | Spill from memory to disk | Slower processing, disk I/O limits |
| Drop oldest messages | Evict old messages to make room | Loses data; acceptable only when stale messages have no value |
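"Producers must handle rejection" in the bounded-queue row usually means retrying with exponential backoff. Here is a minimal sketch, assuming a made-up `TinyBoundedQueue` whose `tryEnqueue` returns `false` when full, and a hypothetical `publishWithBackoff` helper. The delays and attempt counts are arbitrary.

```javascript
// Hypothetical bounded queue; `tryEnqueue` returns false instead of throwing when full.
class TinyBoundedQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
  }
  tryEnqueue(item) {
    if (this.items.length >= this.maxSize) return false;
    this.items.push(item);
    return true;
  }
  dequeue() {
    return this.items.shift();
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff: wait baseDelayMs, then 2x, 4x, ... between attempts
async function publishWithBackoff(queue, item, { maxAttempts = 5, baseDelayMs = 10 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (queue.tryEnqueue(item)) return attempt + 1; // Number of attempts used
    if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw new Error('Queue still full after retries');
}

const boundedQueue = new TinyBoundedQueue(1);
boundedQueue.tryEnqueue('first');             // Queue is now full
setTimeout(() => boundedQueue.dequeue(), 25); // A consumer frees a slot after 25 ms

publishWithBackoff(boundedQueue, 'second').then((attemptsUsed) => {
  console.log(`Published after ${attemptsUsed} attempts`);
});
```

In production you would typically add random jitter to the delay so that many rejected producers do not all retry at the same instant.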
For payment processing, you never drop messages: you scale consumers or rate-limit producers. For real-time sensor data, dropping the oldest reading might be fine because only the latest value matters.
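For the sensor case, "drop oldest" can be as simple as a fixed-size buffer that evicts from the front. The `DropOldestBuffer` class and the temperature values below are made up for illustration.

```javascript
// Hypothetical fixed-capacity buffer that evicts the oldest entry when full.
class DropOldestBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.dropped = 0; // Count evictions so the data loss stays visible
  }

  push(item) {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // Evict the oldest reading to make room
      this.dropped++;
    }
    this.items.push(item);
  }
}

const readings = new DropOldestBuffer(3);
[20.1, 20.4, 20.9, 21.3, 21.8].forEach((celsius) => readings.push(celsius));

console.log(readings.items);   // [ 20.9, 21.3, 21.8 ]
console.log(readings.dropped); // 2
```

Tracking `dropped` is a deliberate choice: if you are going to lose data on purpose, you want a metric that tells you how much.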
Queue technologies at a glance
| Technology | Type | Ordering | Best for |
|---|---|---|---|
| RabbitMQ | Message broker | Per-queue FIFO | Task distribution, RPC patterns |
| Apache Kafka | Distributed log | Per-partition | Event streaming, high throughput |
| AWS SQS | Managed queue | Best-effort (FIFO available) | Simple async tasks, serverless |
| Redis Streams | In-memory log | Per-stream | Low-latency, ephemeral workloads |
| Google Pub/Sub | Managed pub/sub | Per ordering key (opt-in) | Cloud-native, global distribution |
The choice flows from your delivery guarantee requirements, ordering needs, and throughput expectations. Start with the requirements, not the technology.