You prompt an AI to build a user registration flow. It gives you 200 lines of clean, well-structured code in seconds. The happy path works. But that code has never been tested against malformed input, concurrent requests, or empty databases. AI generates code but rarely generates the tests that would reveal those gaps.
The AI testing gap
AI almost never writes tests that validate its own output unprompted. And when you explicitly ask for tests, it tends to produce tests that mirror the implementation rather than challenge it. The edge cases (empty arrays, null values, Unicode strings, race conditions) remain invisible until a real user hits them.
What testing actually is
Testing is writing code that calls a function with known input and checks that the output matches what you expect.
// Your function
function calculateTotal(items) {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Your test
test('calculateTotal() sums price * quantity for each item', () => {
  const items = [
    { price: 10, quantity: 2 },
    { price: 5, quantity: 1 }
  ];
  expect(calculateTotal(items)).toBe(25);
});

AI will generate a test like this. But it will not generate the test for what happens when items is null, when a price is negative, or when quantity is Infinity.
Why testing matters more now than ever
Before AI, developers mentally traced edge cases as they wrote code. When AI generates a function in 5 seconds, you skip building that mental model entirely. Testing replaces that lost thinking time.
| Without tests | With tests |
|---|---|
| AI generates code; you deploy it hoping it works | AI generates code; tests verify it actually works |
| Bugs surface in production; users report them | Bugs surface in development; tests catch them |
| Refactoring AI code is terrifying: what might break? | Refactoring is safe: tests confirm nothing broke |
| You cannot tell if AI output is correct or merely plausible | You have concrete evidence of correctness |
| Technical debt accumulates silently | Regressions are caught immediately |
Tests as documentation
Tests document behavior in a way that is always up to date: if the behavior changes and the test is not updated, the test fails.
test('user can login with valid credentials', () => {
  const result = login({ email: 'alice@example.com', password: 'correct123' });
  expect(result.success).toBe(true);
  expect(result.token).toBeDefined();
  expect(result.user.email).toBe('alice@example.com');
});

test('user cannot login with wrong password', () => {
  const result = login({ email: 'alice@example.com', password: 'wrong' });
  expect(result.success).toBe(false);
  expect(result.token).toBeUndefined();
});

When AI generates a 50-line authentication function, these tests tell you what it is supposed to do without reading a single line of the implementation.
Confidence to refactor AI code
AI-generated code often needs improvement: inefficient logic, unclear naming, poor architecture. Tests give you a safety net: change the implementation, run the tests, and if they pass, you have strong evidence that you did not break any behavior they cover. Always write tests first, then refactor.
The three types of tests
The testing pyramid gives you a framework for how many of each type to write.
| Type | What it tests | Speed | Quantity | Example |
|---|---|---|---|---|
| Unit | One function or component in isolation | Milliseconds | Many | calculateTax(100, 0.2) returns 20 |
| Integration | Multiple units working together | Seconds | Some | Cart + price formatter display correct total |
| End-to-end | Full application in a real browser | Minutes | Few | User completes checkout flow |
        /\
       /  \        E2E tests (few, critical paths)
      /____\
     /      \      Integration tests (some, workflows)
    /________\     Unit tests (many, business logic)

More unit tests, fewer E2E tests. Unit tests run in milliseconds and pinpoint exactly what failed. E2E tests take minutes and might fail for multiple reasons.
What to test and what to skip
Not everything deserves a test. Focus your energy where bugs cause the most damage.
Test these:
- Business logic (calculations, validations, data transformations)
- Edge cases (empty inputs, null values, extreme numbers, Unicode)
- Critical user flows (authentication, payments, data persistence)
- Bug fixes (a test for every bug prevents regression)
- AI-generated code you do not fully understand
Skip these:
- Simple getters and setters with no logic
- Third-party libraries (they maintain their own tests)
- Throwaway prototypes you will delete next week
- Static content with no conditional logic
Coverage is a guide, not a goal
Code coverage measures what percentage of your code runs during tests. 100% coverage does not mean bug-free code. It means every line was executed, not that every scenario was tested.
Aim for 70-80% coverage on meaningful code. Focus on business logic, complex branches, and historically bug-prone areas.
The workflow for testing AI-generated code
- Prompt AI for the feature code
- Read the code and identify the main behaviors, inputs, and outputs
- Write tests yourself for those behaviors, including edge cases
- Optionally ask AI for additional test ideas, but review every one
- Run the tests: if they all pass on the first run, you probably did not test hard enough
- Fix failures in the code, not in the tests (unless the test expectation was wrong)