Every test needs data. You can't test a user creation endpoint without a user object, can't test a search feature without records to search through, and can't test pagination without enough rows in the database. The question isn't whether you need mock data; it's how you generate it without creating a maintenance nightmare.
If you've ever seen a test file with 200 lines of hand-crafted JSON objects at the top, you already know the problem. Let's see how to do it better.
The problem with hard-coded test data
Here's what test data looks like when you start:
```javascript
const testUser = {
  name: 'John Doe',
  email: 'john@test.com',
  age: 30,
  role: 'admin',
  createdAt: '2025-01-01T00:00:00Z'
};
```

This works fine for one test. But then you need a second user, with a different email because your database has a unique constraint. Then a third. Then you need one that's a non-admin. Then one with a very long name to test truncation. Before you know it, you're managing 15 nearly identical objects that each differ in one field.
Factory functions — the right abstraction
A factory is a function that returns a new object every time, with sensible defaults you can override:
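```javascript
// Module-level counter: two calls in the same millisecond would otherwise
// get the same Date.now() value, and therefore the same email
let userSequence = 0;

function createTestUser(overrides = {}) {
  userSequence += 1;
  return {
    id: crypto.randomUUID(), // Web Crypto global (modern browsers, recent Node.js)
    name: 'Test User',
    email: `user-${Date.now()}-${userSequence}@test.com`,
    age: 25,
    role: 'user',
    createdAt: new Date().toISOString(),
    ...overrides, // spread last, so any field passed in wins over the defaults
  };
}
```

Now your tests read like this: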
```javascript
// Default user: all you need for most tests
const user = createTestUser();

// Admin user: override just what matters
const admin = createTestUser({ role: 'admin' });

// User with a specific email, for duplicate-check tests
const alice = createTestUser({ email: 'alice@company.com' });
```

The `...overrides` spread is the key pattern. Every call gets fresh defaults, and because properties spread later in an object literal overwrite earlier ones, any field you pass replaces its default. When you add a new field to your user schema, you update one function, not 50 test objects.
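One caveat: object spread is shallow. If a factory grows a nested object, an override for that key replaces the whole nested object instead of merging into it, silently dropping the other nested defaults. The sketch below shows the pitfall and one way around it. The `createTestUserWithAddress` factory and its `address` field are hypothetical, not part of the user factory above.

```javascript
// Hypothetical factory with a nested field, to illustrate shallow spreading
function createTestUserWithAddress(overrides = {}) {
  // Pull the nested override out so it can be merged one level deeper
  const { address: addressOverrides = {}, ...rest } = overrides;
  return {
    name: 'Test User',
    role: 'user',
    ...rest,
    address: {
      street: '1 Main St',
      city: 'Springfield',
      postalCode: '12345',
      ...addressOverrides, // partial override keeps the other address keys
    },
  };
}

// Naive top-level spread for comparison: the nested object is replaced wholesale
const defaults = { address: { street: '1 Main St', city: 'Springfield' } };
const naive = { ...defaults, address: { city: 'Paris' } };
console.log(naive.address); // { city: 'Paris' }  (street is gone)

const parisUser = createTestUserWithAddress({ address: { city: 'Paris' } });
console.log(parisUser.address); // { street: '1 Main St', city: 'Paris', postalCode: '12345' }
```

Destructuring the nested override out first keeps the call site unchanged (`{ address: { city: 'Paris' } }`) while the factory decides how deep to merge.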
Multiple factories for related data
Real apps have related entities. Build factories that compose:
```javascript
function createTestPost(overrides = {}) {
  return {
    id: crypto.randomUUID(),
    title: 'Test Post',
    body: 'This is a test post body with enough content to be realistic.',
    authorId: crypto.randomUUID(), // placeholder; pass a real author's id when the relation matters
    tags: ['test'],
    published: true,
    createdAt: new Date().toISOString(),
    ...overrides,
  };
}

// Create a user with their posts
const author = createTestUser();
const posts = [
  createTestPost({ authorId: author.id, title: 'First post' }),
  createTestPost({ authorId: author.id, title: 'Second post' }),
];
```

Faker.js — realistic data at scale
Factory functions solve the structure problem. But "Test User" and "user-1234@test.com" don't look like real data, which means your tests might miss bugs that only surface with realistic inputs (Unicode names, long emails, special characters).
Faker.js generates realistic-looking data:
```shell
npm install -D @faker-js/faker
```

```javascript
import { faker } from '@faker-js/faker';

function createTestUser(overrides = {}) {
  return {
    id: crypto.randomUUID(),
    name: faker.person.fullName(),
    email: faker.internet.email(),
    age: faker.number.int({ min: 18, max: 80 }),
    role: 'user',
    avatar: faker.image.avatar(),
    bio: faker.lorem.sentence(),
    createdAt: faker.date.past().toISOString(),
    ...overrides,
  };
}

const user = createTestUser();
// e.g. { name: 'María García', email: 'maria.garcia42@hotmail.com', age: 34, ... }
```

Every call produces new random values. Names are drawn from locale-specific lists, emails follow realistic patterns, and dates fall in sensible ranges (faker.date.past() defaults to the past year). The method names used here (faker.person, faker.number, faker.string) are the Faker v8+ API; older releases used faker.name and faker.datatype instead.
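Random does not mean unique, though. Two calls can occasionally produce the same email, which will trip the unique constraint mentioned earlier. One lightweight approach is to wrap any generator so it retries until it returns a value it has not handed out before. This is a minimal sketch; `makeUnique` is a hypothetical helper, not part of Faker.

```javascript
// Hypothetical helper: wraps a generator so it never returns the same value twice.
// Gives up after maxAttempts so a too-small value space fails loudly instead of looping.
function makeUnique(generate, maxAttempts = 50) {
  const seen = new Set();
  return (...args) => {
    for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
      const value = generate(...args);
      if (!seen.has(value)) {
        seen.add(value);
        return value;
      }
    }
    throw new Error(`No unique value after ${maxAttempts} attempts`);
  };
}

// Usage with Faker (assumes the faker import from the example above):
//   const uniqueEmail = makeUnique(() => faker.internet.email());
//   const user = createTestUser({ email: uniqueEmail() });

// Stand-alone demo with a deliberately repetitive generator
const sequence = ['a', 'a', 'b', 'b', 'c'];
let position = 0;
const next = makeUnique(() => sequence[position++]);
console.log(next(), next(), next()); // a b c
```

The `seen` set lives as long as the wrapped function, so create one wrapper per test file or per suite if you want uniqueness to reset between them.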
Useful Faker methods
| Method | Generates | Example output |
|---|---|---|
| faker.person.fullName() | Full name | "Elena Kowalski" |
| faker.internet.email() | Email address | "elena.k@gmail.com" |
| faker.internet.url() | URL | "https://fair-bicycle.info" |
| faker.lorem.sentence() | Lorem ipsum sentence | "Voluptas eum deserunt..." |
| faker.lorem.paragraphs(2) | Lorem ipsum paragraphs | Two paragraphs of placeholder text |
| faker.number.int({ min, max }) | Integer in range | 42 |
| faker.date.past() | Past date (Date object) | 2024-08-15T... |
| faker.date.future() | Future date (Date object) | 2026-02-20T... |
| faker.image.avatar() | Avatar URL | "https://avatars..." |
| faker.string.uuid() | UUID | "a1b2c3d4-..." |
| faker.helpers.arrayElement([...]) | Random pick | One element of the given array |
Use `import { faker } from '@faker-js/faker/locale/fr'` to get French names and addresses. This matters when you test locale-specific features like postal code validation.

Seeding for reproducible tests
Random data makes tests flaky if the randomness triggers different code paths. Fix this with a seed:
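```javascript
import { faker } from '@faker-js/faker';
// Same seed + same sequence of faker calls = same values every run
faker.seed(12345);
const user = createTestUser();
// The faker-generated fields now repeat exactly. Not covered: crypto.randomUUID()
// (use faker.string.uuid() instead) and the current time, which
// faker.date.past() counts back from unless you pass { refDate }.
```

Use seeding when: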
- Tests depend on specific data values (sorting, filtering)
- You need snapshot testing with predictable output
- Debugging a flaky test — seed it, reproduce it, fix it
Skip seeding when:
- Tests should work with any valid data (most tests)
- You want to find edge cases through randomness
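Seeded output depends on both the seed and the number and order of calls made since seeding. The sketch below shows that principle with a tiny seeded generator (mulberry32, a well-known small PRNG, standing in for Faker's own generator), plus a common pattern for debugging flaky tests: pick a seed per run, print it, and let an environment variable pin it so a failure can be replayed. `TEST_SEED`, `NAMES`, and `pickName` are hypothetical names for illustration.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
// Stands in here for Faker's internal generator to illustrate the principle.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical data generator that takes its randomness as a parameter
const NAMES = ['Ana', 'Bo', 'Chen', 'Dara', 'Eli'];
function pickName(random) {
  return NAMES[Math.floor(random() * NAMES.length)];
}

// Pick a seed per run, print it, and let TEST_SEED (hypothetical) pin it for replay
const seed = Number(process.env.TEST_SEED ?? Date.now() % 2 ** 32);
console.log(`Test data seed: ${seed}`);

const runA = mulberry32(seed);
const runB = mulberry32(seed);
const namesA = [pickName(runA), pickName(runA), pickName(runA)];
const namesB = [pickName(runB), pickName(runB), pickName(runB)];
console.log(namesA.join() === namesB.join()); // true: same seed, same call order

// One extra draw before the names shifts every later value in the stream
const runC = mulberry32(seed);
runC();
const namesC = [pickName(runC), pickName(runC), pickName(runC)]; // generally differs from namesA
```

With Faker the equivalent is calling faker.seed(seed) once at the start of the suite with the printed value; adding or removing a faker call before the data you care about changes that data, just as the extra draw does for `namesC`.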
Generating data in bulk
For testing pagination, search, or performance, you need many records:
```javascript
function createTestUsers(count, overrides = {}) {
  return Array.from({ length: count }, (_, i) =>
    createTestUser({
      // createTestUser already fills in a Faker name; prefixing the index keeps
      // emails unique within a batch even if Faker repeats an address
      email: `${i}.${faker.internet.email()}`,
      ...overrides,
    })
  );
}

// 100 users for pagination tests
const users = createTestUsers(100);

// 50 admin users
const admins = createTestUsers(50, { role: 'admin' });
```

Edge case data
This is where most AI-generated test data falls short. Real users submit surprising input. Your mock data should too:
```javascript
const edgeCases = [
  createTestUser({ name: '' }),                         // empty name
  createTestUser({ name: 'A' }),                        // single char
  createTestUser({ name: 'José María García-López' }),  // accented + hyphen
  createTestUser({ name: '日本太郎' }),                  // CJK characters
  createTestUser({ name: 'A'.repeat(500) }),            // very long
  createTestUser({ email: 'user+tag@example.com' }),    // plus addressing
  createTestUser({ age: 0 }),                           // zero
  createTestUser({ age: -1 }),                          // negative
  createTestUser({ age: 999 }),                         // unrealistic but valid int
  createTestUser({ bio: null }),                        // null optional field
];
```

Quick reference
| Approach | Best for | Trade-off |
|---|---|---|
| Hard-coded objects | One-off, very specific tests | Doesn't scale, high duplication |
| Factory functions | Most test suites | Requires a small upfront investment |
| Factory + Faker | Realistic data at scale | Extra dependency, random by default |
| Seeded Faker | Reproducible randomized data | Must manage seed values |
| Bulk generators | Pagination, performance, search | Can slow down test setup |
| Factory pattern | Example |
|---|---|
| Default object | createTestUser() |
| Override one field | createTestUser({ role: 'admin' }) |
| Related entities | createTestPost({ authorId: user.id }) |
| Bulk generation | createTestUsers(100) |
| Edge case set | Array of factory calls with extreme values |