Imagine you are moving a library of books to a new building. You could load every single book into a truck, drive once, and unload, but what if you have more books than the truck can hold? Streams are the opposite approach: you carry a stack at a time, back and forth, until the job is done. Your truck never overflows, and the new building starts receiving books immediately.
That is exactly what Node.js streams do with data.
Why streams matter
Without streams, processing a large file means loading it entirely into RAM first.
| Approach | 2 GB file | Memory usage |
|---|---|---|
| readFile | Loads everything | ~2 GB RAM |
| createReadStream | Chunk by chunk | ~64 KB RAM |
| Stream + pipe | Chunk by chunk | ~64 KB RAM |
The difference is not academic: a 2 GB readFile on a small server will crash the process. Streams keep memory usage flat regardless of file size.
The four stream types
Readable → data flows out (reading a file, HTTP request body)
Writable → data flows in (writing a file, HTTP response)
Duplex → both directions (TCP socket)
Transform → read + modify + write (gzip, encryption, CSV parsing)
Reading with a Readable stream
import { createReadStream } from 'fs';
const stream = createReadStream('./huge-file.txt', {
encoding: 'utf-8',
highWaterMark: 1024 * 64 // 64 KB chunks
});
stream.on('data', (chunk) => {
console.log('Received chunk:', chunk.length, 'bytes');
// Process chunk here - no need to store it
});
stream.on('end', () => {
console.log('Finished reading');
});
stream.on('error', (err) => {
console.error('Read error:', err.message);
});
Writing with a Writable stream
import { createWriteStream } from 'fs';
const output = createWriteStream('./output.txt');
output.write('First line\n');
output.write('Second line\n');
output.end('Last line\n'); // Signal that writing is done
output.on('finish', () => {
console.log('File written successfully');
});
Call .end() when you are done writing. It flushes the internal buffer and emits finish. If you forget it, the file stays open and finish never fires.
Piping streams together
.pipe() connects a Readable to a Writable and handles the data flow automatically, including pausing the source when the destination is too slow (backpressure).
import { createReadStream, createWriteStream } from 'fs';
import { createGzip } from 'zlib';
// Copy a file
createReadStream('./input.txt')
.pipe(createWriteStream('./output.txt'));
// Compress while copying
createReadStream('./input.txt')
.pipe(createGzip()) // Transform stream
.pipe(createWriteStream('./input.txt.gz'));
Each step in the chain only holds one chunk at a time. The whole pipeline uses about the same memory whether the file is 1 KB or 10 GB.
The modern approach: pipeline()
.pipe() has one annoying problem: errors in the middle of the chain do not automatically propagate. pipeline() from stream/promises fixes that and plays nicely with async/await.
import { pipeline } from 'stream/promises';
import { createReadStream, createWriteStream } from 'fs';
import { createGzip } from 'zlib';
async function compressFile(input, output) {
try {
await pipeline(
createReadStream(input),
createGzip(),
createWriteStream(output)
);
console.log(`Compressed ${input} → ${output}`);
} catch (err) {
console.error('Compression failed:', err.message);
}
}
await compressFile('./video.mp4', './video.mp4.gz');
Building a Transform stream
A Transform stream sits in the middle of a pipeline. It receives chunks, modifies them, and pushes the result downstream. This is how createGzip() works internally.
import { Transform } from 'stream';
const toUpperCase = new Transform({
transform(chunk, encoding, callback) {
// Push the transformed chunk
this.push(chunk.toString().toUpperCase());
callback(); // Signal that this chunk is done
}
});
// Pipe stdin through the transformer to stdout
process.stdin
.pipe(toUpperCase)
.pipe(process.stdout);
// Type "hello" → outputs "HELLO"
Processing a large CSV file
Streams shine when you need to process millions of rows without loading the whole file.
import { createReadStream } from 'fs';
import { createInterface } from 'readline';
async function processCSV(filename) {
const fileStream = createReadStream(filename);
const rl = createInterface({
input: fileStream,
crlfDelay: Infinity
});
let lineCount = 0;
for await (const line of rl) {
const [name, email, age] = line.split(',');
// Process one row at a time - the full file is never in memory
await saveToDatabase({ name, email, age });
lineCount++;
}
console.log(`Processed ${lineCount} rows`);
}
Streaming an HTTP response
HTTP responses and requests are streams too. You can pipe a file directly to the response without ever buffering the full content in your server.
import { createServer } from 'http';
import { createReadStream } from 'fs';
const server = createServer((req, res) => {
res.setHeader('Content-Type', 'application/zip');
res.setHeader('Content-Disposition', 'attachment; filename="file.zip"');
const fileStream = createReadStream('./large-file.zip');
fileStream.on('error', () => {
res.statusCode = 500;
res.end('Error reading file');
});
fileStream.pipe(res); // res is a Writable stream
});
server.listen(3000);