Message Queue Fundamentals

Create a free account to save your progress

Earn XP, track streaks, and sync your dashboard across devices.

Lesson

You have probably written code where service A calls service B directly over HTTPWhat is http?The protocol browsers and servers use to exchange web pages, API data, and other resources, defining how requests and responses are formatted.. It works fine until service B goes down, takes too long, or gets overwhelmed. A message queue sits between them: A drops a message into the queue, B picks it up when ready. They never talk to each other directly.

What is a message queue?

A message queue accepts messages from producers, stores them temporarily, and delivers them to consumers. The sender and receiver never need to be available at the same time.

// Without a queue: tight coupling
async function processOrder(order) {
  await paymentService.charge(order);    // What if payment is slow?
  await inventoryService.reserve(order);  // What if inventory is down?
  await emailService.sendConfirmation(order); // Do we really need to wait?
  return { status: 'completed' };
}

// With a queue: decoupled
async function processOrder(order) {
  await paymentService.charge(order); // Only the critical path is synchronous
  await queue.publish('order.completed', order); // Everything else is async
  return { status: 'completed' };
}

// Separate consumers handle the rest independently
queue.subscribe('order.completed', async (order) => {
  await inventoryService.reserve(order);
});

queue.subscribe('order.completed', async (order) => {
  await emailService.sendConfirmation(order);
});

In the first version, the user waits for all three services. In the second, the user waits only for payment. Inventory and email happen in the background, and if the email service is down, the email gets sent later when it recovers.

FIFO ordering

FIFO stands for First In, First Out. Messages enter the queue in order and should exit in order. But strict FIFO ordering is expensive in distributed systems because the moment you distribute the queue across multiple servers (partitions), messages on partition A and partition B have no shared clock.

// Messages published in order
queue.publish('user.updated', { userId: 1, name: 'Alice' });   // t=0
queue.publish('user.updated', { userId: 1, name: 'Alice B' }); // t=1

// With multiple partitions, consumer might see:
// Partition 1: { name: 'Alice B' }  (processed first!)
// Partition 2: { name: 'Alice' }     (processed second, overwrites!)
// Result: user name is 'Alice' instead of 'Alice B'

How to preserve ordering when it matters:

Use a partition key (e.g., userId) so all messages for the same entity go to the same partition
Ordering is only guaranteed within a partition, not across partitions
Global ordering requires a single partition, which limits throughputWhat is throughput?The number of requests or operations a system can handle per unit of time, like requests per second.

Ordering level	How	Throughput	Use case
No ordering guarantee	Multiple partitions, any assignment	Highest	Logging, analytics events
Per-key ordering	Partition by entity ID	High	User updates, order state changes
Strict global ordering	Single partition	Lowest	Financial ledger, sequential processing

Most systems only need per-key ordering. Strict global ordering is rare and kills scalability.

Delivery guarantees

There are three delivery guarantees, and only two of them actually exist in practice.

At-most-once

The producer sends the message and does not retry. If the network drops it or the consumer crashes before processing, the message is lost.

// At-most-once: fire and forget
async function logAnalyticsEvent(event) {
  try {
    await queue.publish('analytics', event);
  } catch (err) {
    // We shrug and move on. It's just analytics.
    console.warn('Analytics event lost:', err.message);
  }
}

When to use it: Logging, analytics, metrics. Losing a few data points is acceptable. Speed matters more than completeness.

At-least-once

The producer retries until it gets an acknowledgment. The consumer acknowledges after processing. If the consumer crashes after processing but before acknowledging, the message gets delivered again. You might process it twice.

// At-least-once: consumer must handle duplicates
queue.subscribe('payment.process', async (message) => {
  const { orderId, amount } = message.data;

  // Idempotency check: did we already process this?
  const existing = await db.query(
    'SELECT id FROM payments WHERE order_id = ? AND status = ?',
    [orderId, 'completed']
  );

  if (existing.length > 0) {
    console.log(`Payment for order ${orderId} already processed, skipping`);
    message.ack(); // Acknowledge without reprocessing
    return;
  }

  await paymentGateway.charge(orderId, amount);
  await db.query(
    'INSERT INTO payments (order_id, amount, status) VALUES (?, ?, ?)',
    [orderId, amount, 'completed']
  );

  message.ack();
});

When to use it: Anything where losing a message is unacceptable: payment processing, order fulfillment, user notifications. Your consumer must be idempotentWhat is idempotent?An operation that produces the same result whether you perform it once or multiple times, making retries safe. (safe to run twice with the same input).

Exactly-once (the myth)

In a distributed system, true exactly-once delivery is impossible. At every boundary (producer-to-queue, queue-to-consumer), there is a gap where a lost acknowledgment can cause duplicates or losses.

What "exactly-once" actually means in practice: Systems like Kafka advertise "exactly-once semantics," but they really provide at-least-once delivery combined with idempotent producers and transactional consumers. The infrastructure handles deduplication internally, but your application code still needs to be idempotent as a safety net.

Guarantee	Message loss?	Duplicates?	Consumer requirement	Best for
At-most-once	Yes, possible	No	None	Analytics, logging, metrics
At-least-once	No	Yes, possible	Must be idempotent	Payments, orders, notifications
"Exactly-once"	No	Handled by infra	Still should be idempotent	When available and you need simplicity

BackpressureWhat is backpressure?A mechanism that slows down a fast producer when a slow consumer can't keep up, preventing memory from filling up with unprocessed data.

When producers send messages faster than consumers can process them, the queue grows, memory fills up, and eventually something breaks. Here are the main strategies for handling it:

// Strategy 1: Bounded queue with rejection
const queue = new BoundedQueue({ maxSize: 10000 });

async function publish(message) {
  if (queue.isFull()) {
    // Tell the producer to slow down
    throw new Error('Queue full, try again later');
    // The producer can implement exponential backoff
  }
  await queue.enqueue(message);
}

// Strategy 2: Rate limiting producers
const rateLimiter = new RateLimiter({
  maxPerSecond: 1000,
  strategy: 'sliding-window'
});

async function publish(message) {
  await rateLimiter.acquire(); // Blocks until a slot is available
  await queue.enqueue(message);
}

// Strategy 3: Auto-scaling consumers
async function monitorQueueDepth() {
  const depth = await queue.getDepth();
  const processingRate = await queue.getProcessingRate();
  const timeToEmpty = depth / processingRate;

  if (timeToEmpty > 300) { // More than 5 minutes behind
    await scaleConsumers(Math.ceil(depth / processingRate / 60));
  }
}

Strategy	Mechanism	Tradeoff
Bounded queue	Reject messages when full	Producers must handle rejection
Rate limiting	Slow producers down	Adds latency to producers
Auto-scale consumers	Spin up more workers	Costs more, takes time to scale
Overflow to disk	Spill from memory to disk	Slower processing, disk I/O limits
Drop oldest messages	Evict old messages to make room	Acceptable only for time-sensitive data

For payment processing, you never drop messages: you scale consumers or rate-limit producers. For real-time sensor data, dropping the oldest reading might be fine because only the latest value matters.

Queue technologies at a glance

Technology	Type	Ordering	Best for
RabbitMQ	Message broker	Per-queue FIFO	Task distribution, RPC patterns
Apache Kafka	Distributed log	Per-partition	Event streaming, high throughput
AWS SQS	Managed queue	Best-effort (FIFO available)	Simple async tasks, serverless
Redis Streams	In-memory log	Per-stream	Low-latency, ephemeral workloads
Google Pub/Sub	Managed pub/sub	Per-subscription	Cloud-native, global distribution

The choice flows from your delivery guarantee requirements, ordering needs, and throughputWhat is throughput?The number of requests or operations a system can handle per unit of time, like requests per second. expectations. Start with the requirements, not the technology.

AI pitfall

AI will often recommend Kafka for any queue-related problem because Kafka dominates the training data. But Kafka is overkill for most applications. If you need a simple task queue for sending emails or processing images, SQS or a Redis-based queue like BullMQ is simpler to operate and costs a fraction of a Kafka cluster.

Edge case

AWS SQS standard queues are much cheaper and higher throughput than SQS FIFO queues. If your messages are independent (like sending notification emails), strict ordering is unnecessary and you are paying extra for a guarantee you do not need.

Done

Complete & Next