Database Query Optimization

Create a free account to save your progress

Earn XP, track streaks, and sync your dashboard across devices.

Lesson

Your application might be fast in development with 50 rows, then grind to a halt in production with 5 million. The database is usually where that slowdown lives. Understanding how databases actually find and return data gives you the tools to fix it systematically rather than guessing.

How databases find your data

The full table scanWhat is full table scan?Reading every row in a database table because no suitable index exists for the query - becomes very slow on large tables. problem

When you run a query, the database has two main strategies: scan every single row (a full table scan), or jump directly to the relevant rows using an indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything.. Think of it like finding a word in a book, you can either read every page, or use the index at the back.

A full table scan is fine when your table has 500 rows. It becomes catastrophic when it has 500,000.

-- This forces a full table scan on every execution
SELECT * FROM users WHERE email LIKE '%@gmail.com%';

-- The leading wildcard means the database can't use an index on email.
-- It has to check every single row.

-- Better: if you know the domain, use a suffix index or a different approach
SELECT id, name, email FROM users WHERE email_domain = 'gmail.com';

Reading query execution plans

EXPLAIN is your most important debugging tool. It shows you exactly what the database plans to do before it does it. EXPLAIN ANALYZE actually runs the query and shows you what it did, including real timing.

EXPLAIN ANALYZE
SELECT u.id, u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01';

The output tells you whether it used an index, how many rows it estimated vs. actually scanned, and where time was spent. The two things you're looking for are Seq Scan (bad on large tables) and high row counts in intermediate steps.

Plan node	What it means	When it's a problem
`Seq Scan`	Reads every row in the table	Large tables without a WHERE index
`Index Scan`	Jumps to matching rows via index	Usually fast, what you want
`Index Only Scan`	Reads only the index, no table rows	Best case: all needed columns are in the index
`Hash Join`	Builds a hash table for the join	Memory-intensive on very large datasets
`Nested Loop`	For each row in A, scans B	Good with small row counts, bad otherwise

Indexes: the right tool for the right job

Single-column indexes

An indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything. on a column lets the database find matching rows in O(log n) time instead of O(n). You should index any column you frequently filter on, joinWhat is join?A SQL operation that combines rows from two or more tables based on a shared column, letting you query related data in one request. on, or sort by.

-- Add an index after the fact (PostgreSQL / MySQL syntax)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_users_email ON users(email);

-- Now this join is fast even with millions of orders
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

Indexes speed up reads but slow down writes. Every INSERT, UPDATE, and DELETE has to update all relevant indexes. Don't add indexes speculatively, add them when you have evidence that a query is slow.

Composite indexes and column order

A composite indexWhat is composite index?A database index on two or more columns - its column order determines which queries can use it efficiently. on (a, b) can satisfy queries that filter on a alone or on a AND b, but NOT on b alone. The database reads the index left to right, and it stops being useful once it hits a gap.

-- This index helps queries that filter on (user_id), (user_id, status),
-- or (user_id, status, created_at)
CREATE INDEX idx_orders_user_status_date
  ON orders(user_id, status, created_at);

-- Efficient - uses the full composite index
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC;

-- Also efficient - uses the leftmost prefix
SELECT * FROM orders WHERE user_id = 42;

-- NOT efficient - skips the leftmost column
SELECT * FROM orders WHERE status = 'pending';

The N+1 queryWhat is n+1 query?A performance bug where fetching a list triggers one extra database query per item instead of loading all related data in a single query. problem

What it is

This is one of the most common performance killers in web apps, and it's often invisible in development. You fetch a list of 100 users, then for each user you run another query to get their orders. That's 101 queries instead of 1.

// Bad: N+1 pattern - 1 query for users + N queries for orders
const users = await db.query('SELECT * FROM users LIMIT 100');
for (const user of users) {
  user.orders = await db.query(
    'SELECT * FROM orders WHERE user_id = ?', [user.id]
  );
}

// Good: fetch everything in two queries, join in application code
const users = await db.query('SELECT * FROM users LIMIT 100');
const userIds = users.map(u => u.id);
const orders = await db.query(
  'SELECT * FROM orders WHERE user_id IN (?)', [userIds]
);

// Map orders back to users
const ordersByUser = orders.reduce((acc, order) => {
  acc[order.user_id] = acc[order.user_id] || [];
  acc[order.user_id].push(order);
  return acc;
}, {});

for (const user of users) {
  user.orders = ordersByUser[user.id] || [];
}

Using JOINs properly

Sometimes it's cleaner to let the database do the joining rather than fetching in two steps.

-- One query, properly joined, with only the columns you need
SELECT
  u.id,
  u.name,
  u.email,
  COUNT(o.id) AS order_count,
  SUM(o.total) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name, u.email;

Be careful with JOINs on unindexed columns, a join without an index on the foreign key is just a full table scan dressed up. Always index your foreign keys.

Selecting only what you need

Avoid SELECT *

SELECT * is convenient but wasteful. Every extra column means more data transferred from the database to your application. On a table with large TEXT or BLOB columns, this can be significant.

-- Bad: fetches all columns including large bio and avatar_url columns
SELECT * FROM users WHERE id = 42;

-- Good: fetch only what the UI actually needs
SELECT id, name, email, created_at FROM users WHERE id = 42;

Being explicit about columns also makes your code more resilient, if someone adds a column to the table later, your query won't accidentally start fetching it.

Quick reference

Problem	Symptom	Fix
No index on filter column	`Seq Scan` on large table	`CREATE INDEX` on that column
Leading wildcard in LIKE	Full scan despite index	Restructure data or use full-text search
N+1 queries	Many similar queries in logs	Use `IN (...)` or JOIN
SELECT * on wide table	Slow response, high bandwidth	List only needed columns
Missing foreign key index	Slow JOINs	Index all foreign key columns
Wrong composite index order	Index not used	Put equality columns first, range columns last

Done

Complete & Next