System Design/
Lesson

AI tools like ChatGPT, Claude, and Copilot are remarkably good at generating database schemas. Give them a description of your application, and they'll produce CREATE TABLE statements, relationships, indexes, and even sample queries in seconds. But "remarkably good" is not the same as "production-ready." This lesson covers where AI helps, where it fails, and how to build a hybrid workflow that gets you to a solid schemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required. faster.

AI strengths and weaknesses for schemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required. design

AreaAI strengthAI weakness
Initial schema generationGenerates normalized tables from requirements in secondsOver-normalizes, creates lookup tables for things that could be an enum
SQL syntaxCorrect CREATE TABLE, ALTER TABLE, constraintsMay use syntax specific to one database (e.g., MySQL syntax when you want PostgreSQL)
Naming conventionsConsistent snake_case, clear column namesSometimes verbose (user_profile_image_url vs avatar_url)
RelationshipsCorrectly identifies one-to-many, many-to-manyMay miss implicit relationships or create unnecessary join tables
IndexesSuggests indexes for obvious queriesOver-indexes (every column) or misses composite indexes for real queries
NormalizationKnows the rules of 1NF, 2NF, 3NF perfectlyNormalizes when denormalization would be the right call
Access patternsCan reason about stated patternsCannot anticipate patterns you haven't described
Domain knowledgeGeneral patterns for common appsMisses industry-specific requirements (HIPAA, PCI, GDPR)
Migration strategyCan generate migration SQLDoesn't consider live data, downtime, or rollback plans
02

Prompt patterns that work

Generating a schemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required. from requirements

A good prompt is specific about the domain, the expected scale, and the access patterns.

Design a PostgreSQL schema for an online bookstore with these requirements:
- Users can browse books, add them to a cart, and place orders
- Books have authors (a book can have multiple authors)
- Users can leave reviews (1-5 stars + text) on books they've purchased
- We need to show "average rating" on every book listing page (high traffic)
- Orders must track status: pending, paid, shipped, delivered
- Expected scale: 50K users, 100K books, 1M orders/year

Include: table definitions, primary keys, foreign keys, indexes, and any
denormalization you'd recommend for the high-traffic book listing page.

Notice how the prompt includes scale expectations and a specific performance requirement (high-traffic listing page). Without these details, AI will give you a textbook-correct but potentially impractical schema.

Reviewing normalizationWhat is normalization?Organizing database tables to reduce duplicated data by splitting information into related tables connected by foreign keys.

Here's my current schema:

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  user_email TEXT,
  user_name TEXT,
  product_name TEXT,
  product_price NUMERIC,
  quantity INTEGER,
  total NUMERIC
);

Review this for normalization issues. For each issue:
1. Identify the normal form violation
2. Explain the concrete problem it causes (with an example)
3. Show the corrected schema

AI is excellent at spotting normalization violations because the rules are well-defined and mechanical. It will correctly identify that user_email and user_name create update anomalies, and that product_price stored in the order means you can't update prices without affecting history (which might actually be intentional, something AI might miss).

Suggesting indexes

Given this schema and these queries, suggest indexes:

Tables: users, orders, order_items, products

Frequent queries:
1. Get all orders for a user, sorted by date (runs 10K times/day)
2. Get total revenue by product for the last 30 days (runs 5 times/day)
3. Search products by name (runs 50K times/day)
4. Get user by email for login (runs 100K times/day)

For each index, explain why it helps and estimate the impact.

Including query frequency helps AI prioritize which indexes matter most.

03

What to verify: common AI mistakes

Over-normalizationWhat is normalization?Organizing database tables to reduce duplicated data by splitting information into related tables connected by foreign keys.. AI loves creating separate tables. It might suggest a statuses table with four rows ("pending", "paid", "shipped", "delivered") when a simple VARCHAR column or PostgreSQL ENUM would be simpler and faster.

-- AI might suggest this (over-engineered)
CREATE TABLE order_statuses (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) UNIQUE
);
-- Then JOIN to it every time you query orders

-- Better for a small, fixed set of values
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'delivered');
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  status order_status DEFAULT 'pending'
);

Missing access patterns. AI designs based on what you tell it. If you forget to mention "we need to find all orders shipped to a specific ZIP code," AI won't create an indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything. for it. Always review the schemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required. against every screen and APIWhat is api?A set of rules that lets one program talk to another, usually over the internet, by sending requests and getting responses. endpointWhat is endpoint?A specific URL path on a server that handles a particular type of request, like GET /api/users. in your application.

Unnecessary indexes. AI might suggest an index on every foreign keyWhat is foreign key?A column in one database table that references a row in another table, creating a link between the two. and every column mentioned in a WHERE clause. Each index slows down writes and uses storage. Only index columns that appear in frequent, performance-critical queries.

Database-specific syntax. AI sometimes mixes MySQL and PostgreSQL syntax. Always verify that the generated SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. runs on your target database. AUTO_INCREMENT is MySQL; SERIAL or GENERATED ALWAYS AS IDENTITY is PostgreSQL.

04

Hybrid workflow

The most effective approach combines AI speed with human judgment:

  1. Describe your domain to AI: include entities, relationships, scale, and access patterns
  2. Get the initial schemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required.: AI generates CREATE TABLE statements in seconds
  3. Review against your access patterns: for every page/APIWhat is api?A set of rules that lets one program talk to another, usually over the internet, by sending requests and getting responses. endpointWhat is endpoint?A specific URL path on a server that handles a particular type of request, like GET /api/users., check if the schema supports the queries you need efficiently
  4. Challenge the normalizationWhat is normalization?Organizing database tables to reduce duplicated data by splitting information into related tables connected by foreign keys.: ask yourself "do I actually need this as a separate table, or is an enum/JSONBWhat is jsonb?PostgreSQL's binary-encoded JSON column type that is indexable and supports efficient querying, unlike plain JSON. column simpler?"
  5. Validate indexes: remove indexes for infrequent queries, add composite indexes for common multi-column filters
  6. Test with realistic data: load sample data and run EXPLAIN ANALYZEWhat is explain analyze?A PostgreSQL command that shows the actual execution plan and timing for a query, used to diagnose slow queries. on your critical queries
  7. Iterate with AI: paste your EXPLAIN output and ask AI to suggest optimizations
-- Step 6: test with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT p.name, p.price, AVG(r.rating) AS avg_rating
FROM products p
LEFT JOIN reviews r ON r.product_id = p.id
WHERE p.category_id = 5
GROUP BY p.id;

-- Paste the output into your AI tool:
-- "This query takes 450ms. Here's the EXPLAIN ANALYZE output.
--  Suggest index changes to get it under 50ms."

AI is a fantastic first-draft tool for schema design. It eliminates the blank-page problem and handles the mechanical parts (syntax, constraints, naming) flawlessly. But it cannot replace your understanding of how your application actually uses data. The schema that looks perfect on paper might perform terribly in production because AI didn't know about your most common query. Use AI to go fast, then apply your own judgment to go right.

AI pitfall
AI-generated schemas often include created_at and updated_at columns on every table, which is a reasonable default. But AI also tends to add soft deletes (deleted_at column) everywhere, which complicates every query with WHERE deleted_at IS NULL. Only add soft deletes to tables where you actually need to recover deleted data, for most tables, a hard delete is simpler and faster.
Good to know
The EXPLAIN ANALYZE feedback loop with AI is one of the highest-value uses of AI in database work. Paste the query plan output, describe the problem, and AI can suggest index changes, query rewrites, or schema modifications with surprising accuracy, because it's working with concrete data rather than guessing about your access patterns.