Your application might be fast in development with 50 rows, then grind to a halt in production with 5 million. The database is usually where that slowdown lives. Understanding how databases actually find and return data gives you the tools to fix it systematically rather than guessing.
How databases find your data
The full table scanWhat is full table scan?Reading every row in a database table because no suitable index exists for the query - becomes very slow on large tables. problem
When you run a query, the database has two main strategies: scan every single row (a full table scan), or jump directly to the relevant rows using an indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything.. Think of it like finding a word in a book, you can either read every page, or use the index at the back.
A full table scan is fine when your table has 500 rows. It becomes catastrophic when it has 500,000.
-- This forces a full table scan on every execution
SELECT * FROM users WHERE email LIKE '%@gmail.com%';
-- The leading wildcard means the database can't use an index on email.
-- It has to check every single row.
-- Better: if you know the domain, use a suffix index or a different approach
SELECT id, name, email FROM users WHERE email_domain = 'gmail.com';Reading query execution plans
EXPLAIN is your most important debugging tool. It shows you exactly what the database plans to do before it does it. EXPLAIN ANALYZE actually runs the query and shows you what it did, including real timing.
EXPLAIN ANALYZE
SELECT u.id, u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01';The output tells you whether it used an index, how many rows it estimated vs. actually scanned, and where time was spent. The two things you're looking for are Seq Scan (bad on large tables) and high row counts in intermediate steps.
| Plan node | What it means | When it's a problem |
|---|---|---|
Seq Scan | Reads every row in the table | Large tables without a WHERE index |
Index Scan | Jumps to matching rows via index | Usually fast, what you want |
Index Only Scan | Reads only the index, no table rows | Best case: all needed columns are in the index |
Hash Join | Builds a hash table for the join | Memory-intensive on very large datasets |
Nested Loop | For each row in A, scans B | Good with small row counts, bad otherwise |
Indexes: the right tool for the right job
Single-column indexes
An indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything. on a column lets the database find matching rows in O(log n) time instead of O(n). You should index any column you frequently filter on, joinWhat is join?A SQL operation that combines rows from two or more tables based on a shared column, letting you query related data in one request. on, or sort by.
-- Add an index after the fact (PostgreSQL / MySQL syntax)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_users_email ON users(email);
-- Now this join is fast even with millions of orders
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;Composite indexes and column order
A composite indexWhat is composite index?A database index on two or more columns - its column order determines which queries can use it efficiently. on (a, b) can satisfy queries that filter on a alone or on a AND b, but NOT on b alone. The database reads the index left to right, and it stops being useful once it hits a gap.
-- This index helps queries that filter on (user_id), (user_id, status),
-- or (user_id, status, created_at)
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at);
-- Efficient - uses the full composite index
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC;
-- Also efficient - uses the leftmost prefix
SELECT * FROM orders WHERE user_id = 42;
-- NOT efficient - skips the leftmost column
SELECT * FROM orders WHERE status = 'pending';The N+1 queryWhat is n+1 query?A performance bug where fetching a list triggers one extra database query per item instead of loading all related data in a single query. problem
What it is
This is one of the most common performance killers in web apps, and it's often invisible in development. You fetch a list of 100 users, then for each user you run another query to get their orders. That's 101 queries instead of 1.
// Bad: N+1 pattern - 1 query for users + N queries for orders
const users = await db.query('SELECT * FROM users LIMIT 100');
for (const user of users) {
user.orders = await db.query(
'SELECT * FROM orders WHERE user_id = ?', [user.id]
);
}
// Good: fetch everything in two queries, join in application code
const users = await db.query('SELECT * FROM users LIMIT 100');
const userIds = users.map(u => u.id);
const orders = await db.query(
'SELECT * FROM orders WHERE user_id IN (?)', [userIds]
);
// Map orders back to users
const ordersByUser = orders.reduce((acc, order) => {
acc[order.user_id] = acc[order.user_id] || [];
acc[order.user_id].push(order);
return acc;
}, {});
for (const user of users) {
user.orders = ordersByUser[user.id] || [];
}Using JOINs properly
Sometimes it's cleaner to let the database do the joining rather than fetching in two steps.
-- One query, properly joined, with only the columns you need
SELECT
u.id,
u.name,
u.email,
COUNT(o.id) AS order_count,
SUM(o.total) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name, u.email;Selecting only what you need
Avoid SELECT *
SELECT * is convenient but wasteful. Every extra column means more data transferred from the database to your application. On a table with large TEXT or BLOB columns, this can be significant.
-- Bad: fetches all columns including large bio and avatar_url columns
SELECT * FROM users WHERE id = 42;
-- Good: fetch only what the UI actually needs
SELECT id, name, email, created_at FROM users WHERE id = 42;Being explicit about columns also makes your code more resilient, if someone adds a column to the table later, your query won't accidentally start fetching it.
Quick reference
| Problem | Symptom | Fix |
|---|---|---|
| No index on filter column | Seq Scan on large table | CREATE INDEX on that column |
| Leading wildcard in LIKE | Full scan despite index | Restructure data or use full-text search |
| N+1 queries | Many similar queries in logs | Use IN (...) or JOIN |
| SELECT * on wide table | Slow response, high bandwidth | List only needed columns |
| Missing foreign key index | Slow JOINs | Index all foreign key columns |
| Wrong composite index order | Index not used | Put equality columns first, range columns last |