Imagine running a business and needing to know: "What's our total revenue?" "How many customers do we have per country?" "What's the average order value?" These questions don't care about individual rows-they need summaries across many rows. That's exactly what SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. aggregations provide.
Aggregate functions take many rows and boil them down to a single value. Combined with GROUP BY, they become incredibly powerful, letting you slice and dice your data any way you need.
The five essential aggregate functions
These five functions handle 95% of your aggregation needs:
-- COUNT: How many rows?
SELECT COUNT(*) FROM users; -- Total users
SELECT COUNT(email) FROM users; -- Non-NULL emails
SELECT COUNT(DISTINCT country) FROM users; -- Unique countries
-- SUM: Add up numeric values
SELECT SUM(total) FROM orders; -- Total sales
SELECT SUM(total) FROM orders WHERE status = 'completed'; -- Completed sales
-- AVG: Calculate the average
SELECT AVG(age) FROM users;
SELECT AVG(price) FROM products WHERE category = 'Electronics';
-- MIN / MAX: Find extremes
SELECT MIN(price), MAX(price) FROM products;
SELECT MIN(created_at) AS first_order, MAX(created_at) AS last_order FROM orders;
-- Combine them for a full report
SELECT
COUNT(*) AS total_orders,
SUM(total) AS revenue,
AVG(total) AS average_order,
MIN(total) AS min_order,
MAX(total) AS max_order
FROM orders;| Function | What it does | Returns NULL when |
|---|---|---|
COUNT(*) | Counts all rows | Never (returns 0) |
COUNT(column) | Counts non-NULL values | All values are NULL |
SUM(column) | Adds all values | All values are NULL |
AVG(column) | Calculates mean | All values are NULL |
MIN(column) | Smallest value | All values are NULL |
MAX(column) | Largest value | All values are NULL |
COUNT(*). COUNT(column) counts only non-NULL values. If all values in a group are NULL, aggregates return NULL (except COUNT which returns 0).GROUP BY: Creating summary reports
Aggregate functions become truly powerful when you group your data. GROUP BY collapses rows with the same values into single summary rows:
-- Count users by country
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
-- Result:
| country | user_count |
|------------|------------|
| France | 150 |
| Belgium | 25 |
| Switzerland| 10 |How GROUP BY works
Think of GROUP BY as a funnel:
- The database sorts all rows by the GROUP BY column(s)
- Rows with identical values get grouped together
- Aggregate functions calculate within each group
- Each group produces one result row
-- Revenue by month
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS orders,
SUM(total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
-- Statistics by category
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price,
MIN(price) AS min_price,
MAX(price) AS max_price
FROM products
GROUP BY category;The GROUP BY rule
Here's the most common error beginners make: every column in your SELECT clause must either be in the GROUP BY or be an aggregate function:
-- ❌ WRONG: name isn't in GROUP BY and isn't aggregated
SELECT name, country, COUNT(*)
FROM users
GROUP BY country;
-- ✅ CORRECT: Either add to GROUP BY or use aggregate
SELECT country, COUNT(*)
FROM users
GROUP BY country;
-- ✅ ALSO CORRECT: Include name in GROUP BY
SELECT name, country, COUNT(*)
FROM users
GROUP BY name, country;The database doesn't know which "name" to show when you've grouped by country-there might be hundreds of different names in France. Either include the column in GROUP BY or use an aggregate function like MAX(name) or STRING_AGG(name, ', ').
HAVING: Filter groups after aggregation
WHERE filters rows before grouping. HAVING filters groups after aggregation. This distinction is crucial:
-- ❌ IMPOSSIBLE: WHERE can't use aggregate functions
SELECT country, COUNT(*) AS cnt
FROM users
WHERE COUNT(*) > 10 -- ERROR! WHERE happens before COUNT
GROUP BY country;
-- ✅ CORRECT: Use HAVING to filter after aggregation
SELECT country, COUNT(*) AS cnt
FROM users
GROUP BY country
HAVING COUNT(*) > 10; -- Only countries with 10+ usersExecution order matters
SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. processes clauses in this order:
- FROM: Get the tables
- WHERE: Filter rows
- GROUP BY: Group the rows
- HAVING: Filter groups
- SELECT: Choose columns
- ORDER BY: Sort results
| Clause | Filters | When it runs |
|---|---|---|
| WHERE | Individual rows | Before grouping |
| HAVING | Groups | After aggregation |
-- Complex example: High-value countries
SELECT
u.country,
COUNT(DISTINCT o.id) AS order_count,
SUM(o.total) AS total_revenue
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2023-01-01' -- Filter users first
GROUP BY u.country
HAVING COUNT(DISTINCT o.id) >= 5 -- At least 5 orders
AND SUM(o.total) > 10000 -- Revenue > 10000
ORDER BY total_revenue DESC;Subqueries: Queries within queries
Sometimes you need to use the result of one query as input for another. That's what subqueries do:
-- Users with more orders than average
SELECT name, order_count
FROM (
SELECT
u.name,
COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
) AS user_stats
WHERE order_count > (
SELECT AVG(order_count)
FROM (
SELECT COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
) AS counts
);This query has three levels:
- Innermost: Count orders per user
- Middle: Calculate average order count
- Outer: Find users above that average
Common Table Expressions (CTEs) with WITH
Deeply nested subqueries get hard to read. CTEs let you define named subqueries up front:
-- Much cleaner with CTEs
WITH user_orders AS (
SELECT
u.id,
u.name,
COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
)
SELECT name, order_count
FROM user_orders
WHERE order_count > (SELECT AVG(order_count) FROM user_orders);CTEs make complex queries readable by letting you build them step by step, naming each intermediate result.
Conditional logic with CASE
CASE lets you add if-then-else logic to your SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables.:
-- Categorize orders by size
SELECT
id,
total,
CASE
WHEN total < 50 THEN 'Small'
WHEN total < 200 THEN 'Medium'
ELSE 'Large'
END AS order_size
FROM orders;
-- Use CASE in aggregation
SELECT
CASE
WHEN age < 18 THEN 'Minor'
WHEN age < 65 THEN 'Adult'
ELSE 'Senior'
END AS age_group,
COUNT(*) AS user_count
FROM users
GROUP BY age_group;Handling NULL values
Real data is messy. COALESCE and NULLIF help you handle missing values gracefully:
-- COALESCE: Return first non-NULL value
SELECT
name,
COALESCE(phone, email, 'No contact info') AS contact_method
FROM users;
-- NULLIF: Return NULL if values are equal
SELECT
name,
NULLIF(discount, 0) AS discount_or_null
FROM products;
-- Avoid division by zero
SELECT
name,
total_sales / NULLIF(total_orders, 0) AS avg_order_value
FROM sales_summary;Aggregations transform raw data into actionable insights. Whether you're generating business reports, analyzing user behavior, or creating dashboards, these techniques let you ask sophisticated questions and get meaningful answers. But sometimes you need even more power-calculating across rows without collapsing them, or performing calculations relative to other rows. That's where window functions come in, which you'll explore in the next lesson.