Mock Data Generation with Factories & Faker

Every test needs data. You can't test a user creation endpoint without a user object, a search feature without records to search through, or pagination without enough rows in the database. The question isn't whether you need mock data; it's how you generate it without creating a maintenance nightmare.

If you've ever seen a test file with 200 lines of hand-crafted JSON objects at the top, you already know the problem. Let's see how to do it better.

The problem with hard-coded test data

Here's what test data looks like when you start:

const testUser = {
  name: 'John Doe',
  email: '[email protected]',
  age: 30,
  role: 'admin',
  createdAt: '2025-01-01T00:00:00Z'
};

This works fine for one test. But then you need a second user — with a different email because your database has a unique constraint. Then a third. Then you need one that's a non-admin. Then one with a very long name to test truncation. Before you know it, you're managing 15 nearly-identical objects that differ in one field each.

AI pitfall

When you ask AI to write tests, it often generates a separate hard-coded object for every test case. That works, but it creates fragile, duplicated data that is painful to update when your schema changes. Ask for factories instead.

Factory functions — the right abstraction

A factory is a function that returns a new object every time, with sensible defaults you can override:

function createTestUser(overrides = {}) {
  return {
    id: crypto.randomUUID(),
    name: 'Test User',
    email: `user-${crypto.randomUUID()}@test.com`, // random suffix stays unique even for users created in the same millisecond
    age: 25,
    role: 'user',
    createdAt: new Date().toISOString(),
    ...overrides,
  };
}

Now your tests read like this:

// Default user — all you need for most tests
const user = createTestUser();

// Admin user — override just what matters
const admin = createTestUser({ role: 'admin' });

// User with specific email — for duplicate-check tests
const alice = createTestUser({ email: '[email protected]' });

The ...overrides spread is the key pattern. Because it comes last in the object literal, any field you pass replaces the default, while every other field keeps its freshly generated value. When you add a new field to your user schema, you update one function, not 50 test objects.
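
A minimal sketch of how the spread resolves conflicts. It uses a simplified stand-in factory with a hypothetical counter (`nextUserNumber`) and an invented nested `settings` field, neither of which is part of the lesson's factory, so that the output is predictable. Two details are worth knowing: the spread is shallow, and an override explicitly set to undefined still replaces the default.

```javascript
// Simplified stand-in for createTestUser. The counter and the `settings`
// field are assumptions made for this sketch so the values are deterministic.
let nextUserNumber = 1;

function createSketchUser(overrides = {}) {
  const n = nextUserNumber++;
  return {
    id: n,
    name: 'Test User',
    email: `user-${n}@test.com`,
    role: 'user',
    settings: { theme: 'light', newsletter: true },
    ...overrides, // spread last, so overrides win over the defaults above
  };
}

const first = createSketchUser();
const admin = createSketchUser({ role: 'admin' });

// Shallow spread: passing `settings` replaces the whole nested object.
const darkMode = createSketchUser({ settings: { theme: 'dark' } });

// An explicit undefined is still an override and wipes the default.
const noName = createSketchUser({ name: undefined });

console.log(first.email, admin.email); // user-1@test.com user-2@test.com
console.log(admin.role);               // admin
console.log(darkMode.settings);        // { theme: 'dark' }
console.log(noName.name);              // undefined
```

If a test needs to change one nested field only, it has to pass the full nested object, or the factory has to merge that field explicitly.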

Multiple factories for related data

Real apps have related entities. Build factories that compose:

function createTestPost(overrides = {}) {
  return {
    id: crypto.randomUUID(),
    title: 'Test Post',
    body: 'This is a test post body with enough content to be realistic.',
    authorId: crypto.randomUUID(),
    tags: ['test'],
    published: true,
    createdAt: new Date().toISOString(),
    ...overrides,
  };
}

// Create a user with their posts
const author = createTestUser();
const posts = [
  createTestPost({ authorId: author.id, title: 'First post' }),
  createTestPost({ authorId: author.id, title: 'Second post' }),
];
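
When the same user-plus-posts setup appears in several tests, it can be wrapped in a small helper. The sketch below is one possible shape, not a prescribed pattern: `createUserWithPosts` is a hypothetical name, and the stand-in factories use a counter instead of `crypto.randomUUID()` so the example runs anywhere and stays short.

```javascript
// Counter-based ids are an assumption of this sketch; the factories in the
// lesson use crypto.randomUUID() instead.
let nextId = 1;

function createSketchUser(overrides = {}) {
  return { id: `user-${nextId++}`, name: 'Test User', role: 'user', ...overrides };
}

function createSketchPost(overrides = {}) {
  return {
    id: `post-${nextId++}`,
    title: 'Test Post',
    authorId: `user-${nextId++}`, // placeholder author unless overridden
    published: true,
    ...overrides,
  };
}

// Hypothetical helper: one author plus `postCount` posts that point at them.
function createUserWithPosts(postCount, userOverrides = {}, postOverrides = {}) {
  const author = createSketchUser(userOverrides);
  const posts = Array.from({ length: postCount }, (_, i) =>
    createSketchPost({
      title: `Post ${i + 1}`,
      ...postOverrides,
      authorId: author.id, // set last so the link cannot be overridden by accident
    })
  );
  return { author, posts };
}

const { author, posts } = createUserWithPosts(3, { role: 'admin' }, { published: false });
console.log(posts.every((post) => post.authorId === author.id)); // true
console.log(posts.map((post) => post.title)); // [ 'Post 1', 'Post 2', 'Post 3' ]
```

Placing `authorId` after the caller's overrides is a design choice: it guarantees the relationship holds, at the cost of not letting a test break it on purpose.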

Faker.js — realistic data at scale

Factory functions solve the structure problem. But "Test User" and a templated @test.com address don't look like real data, which means your tests might miss bugs that only surface with realistic inputs (Unicode names, long emails, special characters).

Faker.js generates realistic-looking data:

npm install -D @faker-js/faker

import { faker } from '@faker-js/faker';

function createTestUser(overrides = {}) {
  return {
    id: crypto.randomUUID(),
    name: faker.person.fullName(),
    email: faker.internet.email(),
    age: faker.number.int({ min: 18, max: 80 }),
    role: 'user',
    avatar: faker.image.avatar(),
    bio: faker.lorem.sentence(),
    createdAt: faker.date.past().toISOString(),
    ...overrides,
  };
}

const user = createTestUser();
// { name: 'María García', email: '[email protected]', age: 34, ... }

Every call produces different data. Names are drawn from Faker's built-in locale data, emails follow realistic patterns, and faker.date.past() returns a date within the last year by default.

Useful Faker methods

Method	Generates	Example output
`faker.person.fullName()`	Full name	"Elena Kowalski"
`faker.internet.email()`	Email	"[email protected]"
`faker.internet.url()`	URL	"https://fair-bicycle.info"
`faker.lorem.sentence()`	Sentence	"Voluptas eum deserunt..."
`faker.lorem.paragraphs(2)`	Paragraphs	Two realistic paragraphs
`faker.number.int({ min, max })`	Integer	42
`faker.date.past()`	Past date	2024-08-15T...
`faker.date.future()`	Future date	2026-02-20T...
`faker.image.avatar()`	Avatar URL	"https://avatars..."
`faker.string.uuid()`	UUID	"a1b2c3d4-..."
`faker.helpers.arrayElement([...])`	Random pick	Picks one from array

Good to know

Faker supports locales. Use import { faker } from '@faker-js/faker/locale/fr' to get French names and addresses. This matters when you test locale-specific features like postal code validation.

Seeding for reproducible tests

Random data makes tests flaky if the randomness triggers different code paths. Fix this with a seed:

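import { faker } from '@faker-js/faker';

// Same seed = same sequence of Faker values on every run
faker.seed(12345);
// Relative dates such as faker.date.past() are measured from "now" unless a reference date is pinned
faker.setDefaultRefDate('2025-01-01T00:00:00.000Z');

const user = createTestUser();
// Faker-generated fields (name, email, age, ...) are now identical on every run.
// The id still changes, because crypto.randomUUID() ignores the Faker seed;
// use faker.string.uuid() in the factory if the id must be reproducible too.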

Use seeding when:

Tests depend on specific data values (sorting, filtering)
You need snapshot testing with predictable output
Debugging a flaky test — seed it, reproduce it, fix it

Skip seeding when:

Tests should work with any valid data (most tests)
You want to find edge cases through randomness
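
To see why a seed makes "random" data repeatable, here is a small illustration that does not use Faker at all. It swaps in mulberry32, a tiny pseudo-random generator, as a stand-in; Faker's own generator is different, and `generateNames` and the name list are hypothetical. The only point is that a generator started from the same seed replays the same sequence.

```javascript
// mulberry32: a small seeded pseudo-random generator, used here purely as an
// illustration. This is NOT the generator Faker uses internally.
function mulberry32(seed) {
  let a = seed;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Hypothetical sample data for the sketch.
const NAMES = ['Ada', 'Grace', 'Linus', 'Margaret', 'Dennis', 'Barbara'];

// Builds a fresh generator from the seed, then picks `count` names with it.
function generateNames(seed, count) {
  const random = mulberry32(seed);
  return Array.from({ length: count }, () => NAMES[Math.floor(random() * NAMES.length)]);
}

const runA = generateNames(12345, 5);
const runB = generateNames(12345, 5);

// Same seed, same sequence: the two runs match element for element.
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

The same reasoning explains a common surprise: if one test consumes extra random values, every value generated after it shifts, so reseeding at the start of each test keeps tests independent of their order.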

Generating data in bulk

For testing pagination, search, or performance, you need many records:

function createTestUsers(count, overrides = {}) {
  // The Faker-based createTestUser already generates a fresh name and email
  // on every call, so each element only needs the shared overrides.
  return Array.from({ length: count }, () => createTestUser(overrides));
}

// 100 users for pagination tests
const users = createTestUsers(100);

// 50 admin users
const admins = createTestUsers(50, { role: 'admin' });
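
Pagination and sorting tests often need records that differ in a controlled way, for example sequential names or timestamps. One option, sketched here under assumptions, is a bulk helper whose overrides can be a function of the index. `createMany` and `createSketchUser` are hypothetical names, and the stand-in factory avoids Faker so the example is self-contained.

```javascript
// Generic bulk helper (hypothetical): `overrides` may be a plain object
// applied to every item, or a function that receives the index.
function createMany(factory, count, overrides = {}) {
  return Array.from({ length: count }, (_, index) =>
    factory(typeof overrides === 'function' ? overrides(index) : overrides)
  );
}

// Minimal stand-in factory so the sketch runs without Faker.
function createSketchUser(overrides = {}) {
  return { name: 'Test User', role: 'user', createdAt: '2025-01-01T00:00:00.000Z', ...overrides };
}

// 25 users whose names sort predictably and whose timestamps are one day apart.
const DAY_MS = 24 * 60 * 60 * 1000;
const start = Date.parse('2025-01-01T00:00:00.000Z');
const users = createMany(createSketchUser, 25, (i) => ({
  name: `User ${String(i + 1).padStart(3, '0')}`,
  createdAt: new Date(start + i * DAY_MS).toISOString(),
}));

// Third page of 10: only 5 records are left.
const pageSize = 10;
const page3 = users.slice(2 * pageSize, 3 * pageSize);
console.log(page3.length);        // 5
console.log(page3[0].name);       // User 021
console.log(users[24].createdAt); // 2025-01-25T00:00:00.000Z
```

Choosing a count that is not a multiple of the page size, as here, exercises the partially filled last page.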

Edge case data

This is where most AI-generated test data falls short. Real users submit surprising input, and your mock data should include it too:

const edgeCases = [
  createTestUser({ name: '' }),                          // empty name
  createTestUser({ name: 'A' }),                         // single char
  createTestUser({ name: 'José María García-López' }),   // accented + hyphen
  createTestUser({ name: '日本太郎' }),                   // CJK characters
  createTestUser({ name: 'A'.repeat(500) }),             // very long
  createTestUser({ email: 'user+tag@example.com' }),    // plus addressing
  createTestUser({ age: 0 }),                            // zero
  createTestUser({ age: -1 }),                           // negative
  createTestUser({ age: 999 }),                          // unrealistic but valid int
  createTestUser({ bio: null }),                         // null optional field
];
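
A list like this pays off once each entry is paired with the outcome you expect. The sketch below runs a handful of the cases through a validator in a table-driven loop. The validation rules (`validateUser`, a 100-character name limit, an age range of 0 to 150) are invented for illustration and are not taken from the lesson; substitute the rules of your own application.

```javascript
// Hypothetical validation rules, assumed only for this sketch.
function validateUser(user) {
  const errors = [];
  if (typeof user.name !== 'string' || user.name.trim().length === 0) {
    errors.push('name is required');
  } else if ([...user.name].length > 100) {
    errors.push('name is too long');
  }
  if (!Number.isInteger(user.age) || user.age < 0 || user.age > 150) {
    errors.push('age is out of range');
  }
  return errors;
}

// Minimal stand-in factory so the sketch runs on its own.
function createSketchUser(overrides = {}) {
  return { name: 'Test User', age: 25, bio: 'Hello', ...overrides };
}

// Each case pairs an override with the outcome the test expects.
const cases = [
  { label: 'empty name', overrides: { name: '' }, valid: false },
  { label: 'single char', overrides: { name: 'A' }, valid: true },
  { label: 'CJK name', overrides: { name: '日本太郎' }, valid: true },
  { label: 'very long name', overrides: { name: 'A'.repeat(500) }, valid: false },
  { label: 'age zero', overrides: { age: 0 }, valid: true },
  { label: 'negative age', overrides: { age: -1 }, valid: false },
  { label: 'age 999', overrides: { age: 999 }, valid: false },
  { label: 'null bio', overrides: { bio: null }, valid: true },
];

// Collect every case whose actual result disagrees with the expectation.
const failures = cases.filter(
  ({ overrides, valid }) => (validateUser(createSketchUser(overrides)).length === 0) !== valid
);
console.log(failures.map((c) => c.label)); // []
```

In a real suite the loop body would usually be one test per case (for example via a test runner's parameterized-test feature), so a failure names the exact input that broke.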