SQL Aggregations: COUNT, SUM, GROUP BY

Create a free account to save your progress

Earn XP, track streaks, and sync your dashboard across devices.

Lesson

Imagine running a business and needing to know: "What's our total revenue?" "How many customers do we have per country?" "What's the average order value?" These questions don't care about individual rows-they need summaries across many rows. That's exactly what SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. aggregations provide.

Aggregate functions take many rows and boil them down to a single value. Combined with GROUP BY, they become incredibly powerful, letting you slice and dice your data any way you need.

The five essential aggregate functions

These five functions handle 95% of your aggregation needs:

-- COUNT: How many rows?
SELECT COUNT(*) FROM users;                    -- Total users
SELECT COUNT(email) FROM users;                -- Non-NULL emails
SELECT COUNT(DISTINCT country) FROM users;     -- Unique countries

-- SUM: Add up numeric values
SELECT SUM(total) FROM orders;                          -- Total sales
SELECT SUM(total) FROM orders WHERE status = 'completed'; -- Completed sales

-- AVG: Calculate the average
SELECT AVG(age) FROM users;
SELECT AVG(price) FROM products WHERE category = 'Electronics';

-- MIN / MAX: Find extremes
SELECT MIN(price), MAX(price) FROM products;
SELECT MIN(created_at) AS first_order, MAX(created_at) AS last_order FROM orders;

-- Combine them for a full report
SELECT
  COUNT(*) AS total_orders,
  SUM(total) AS revenue,
  AVG(total) AS average_order,
  MIN(total) AS min_order,
  MAX(total) AS max_order
FROM orders;

Function	What it does	Returns NULL when
`COUNT(*)`	Counts all rows	Never (returns 0)
`COUNT(column)`	Counts non-NULL values	All values are NULL
`SUM(column)`	Adds all values	All values are NULL
`AVG(column)`	Calculates mean	All values are NULL
`MIN(column)`	Smallest value	All values are NULL
`MAX(column)`	Largest value	All values are NULL

NULL handling

All aggregate functions ignore NULL values except COUNT(*). COUNT(column) counts only non-NULL values. If all values in a group are NULL, aggregates return NULL (except COUNT which returns 0).

GROUP BY: Creating summary reports

Aggregate functions become truly powerful when you group your data. GROUP BY collapses rows with the same values into single summary rows:

-- Count users by country
SELECT
  country,
  COUNT(*) AS user_count
FROM users
GROUP BY country;

-- Result:
| country    | user_count |
|------------|------------|
| France     | 150        |
| Belgium    | 25         |
| Switzerland| 10         |

How GROUP BY works

Think of GROUP BY as a funnel:

The database sorts all rows by the GROUP BY column(s)
Rows with identical values get grouped together
Aggregate functions calculate within each group
Each group produces one result row

-- Revenue by month
SELECT
  DATE_TRUNC('month', created_at) AS month,
  COUNT(*) AS orders,
  SUM(total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

-- Statistics by category
SELECT
  category,
  COUNT(*) AS product_count,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price
FROM products
GROUP BY category;

The GROUP BY rule

Here's the most common error beginners make: every column in your SELECT clause must either be in the GROUP BY or be an aggregate function:

-- ❌ WRONG: name isn't in GROUP BY and isn't aggregated
SELECT name, country, COUNT(*)
FROM users
GROUP BY country;

-- ✅ CORRECT: Either add to GROUP BY or use aggregate
SELECT country, COUNT(*)
FROM users
GROUP BY country;

-- ✅ ALSO CORRECT: Include name in GROUP BY
SELECT name, country, COUNT(*)
FROM users
GROUP BY name, country;

The database doesn't know which "name" to show when you've grouped by country-there might be hundreds of different names in France. Either include the column in GROUP BY or use an aggregate function like MAX(name) or STRING_AGG(name, ', ').

HAVING: Filter groups after aggregation

WHERE filters rows before grouping. HAVING filters groups after aggregation. This distinction is crucial:

-- ❌ IMPOSSIBLE: WHERE can't use aggregate functions
SELECT country, COUNT(*) AS cnt
FROM users
WHERE COUNT(*) > 10    -- ERROR! WHERE happens before COUNT
GROUP BY country;

-- ✅ CORRECT: Use HAVING to filter after aggregation
SELECT country, COUNT(*) AS cnt
FROM users
GROUP BY country
HAVING COUNT(*) > 10;   -- Only countries with 10+ users

Execution order matters

SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. processes clauses in this order:

FROM: Get the tables
WHERE: Filter rows
GROUP BY: Group the rows
HAVING: Filter groups
SELECT: Choose columns
ORDER BY: Sort results

Clause	Filters	When it runs
WHERE	Individual rows	Before grouping
HAVING	Groups	After aggregation

-- Complex example: High-value countries
SELECT
  u.country,
  COUNT(DISTINCT o.id) AS order_count,
  SUM(o.total) AS total_revenue
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2023-01-01'  -- Filter users first
GROUP BY u.country
HAVING COUNT(DISTINCT o.id) >= 5    -- At least 5 orders
   AND SUM(o.total) > 10000          -- Revenue > 10000
ORDER BY total_revenue DESC;

Subqueries: Queries within queries

Sometimes you need to use the result of one query as input for another. That's what subqueries do:

-- Users with more orders than average
SELECT name, order_count
FROM (
  SELECT
    u.name,
    COUNT(o.id) AS order_count
  FROM users u
  LEFT JOIN orders o ON u.id = o.user_id
  GROUP BY u.id, u.name
) AS user_stats
WHERE order_count > (
  SELECT AVG(order_count)
  FROM (
    SELECT COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id
  ) AS counts
);

This query has three levels:

Innermost: Count orders per user
Middle: Calculate average order count
Outer: Find users above that average

Common Table Expressions (CTEs) with WITH

Deeply nested subqueries get hard to read. CTEs let you define named subqueries up front:

-- Much cleaner with CTEs
WITH user_orders AS (
  SELECT
    u.id,
    u.name,
    COUNT(o.id) AS order_count
  FROM users u
  LEFT JOIN orders o ON u.id = o.user_id
  GROUP BY u.id, u.name
)
SELECT name, order_count
FROM user_orders
WHERE order_count > (SELECT AVG(order_count) FROM user_orders);

CTEs make complex queries readable by letting you build them step by step, naming each intermediate result.

Conditional logic with CASE

CASE lets you add if-then-else logic to your SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables.:

-- Categorize orders by size
SELECT
  id,
  total,
  CASE
    WHEN total < 50 THEN 'Small'
    WHEN total < 200 THEN 'Medium'
    ELSE 'Large'
  END AS order_size
FROM orders;

-- Use CASE in aggregation
SELECT
  CASE
    WHEN age < 18 THEN 'Minor'
    WHEN age < 65 THEN 'Adult'
    ELSE 'Senior'
  END AS age_group,
  COUNT(*) AS user_count
FROM users
GROUP BY age_group;

Handling NULL values

Real data is messy. COALESCE and NULLIF help you handle missing values gracefully:

-- COALESCE: Return first non-NULL value
SELECT
  name,
  COALESCE(phone, email, 'No contact info') AS contact_method
FROM users;

-- NULLIF: Return NULL if values are equal
SELECT
  name,
  NULLIF(discount, 0) AS discount_or_null
FROM products;

-- Avoid division by zero
SELECT
  name,
  total_sales / NULLIF(total_orders, 0) AS avg_order_value
FROM sales_summary;

Aggregations transform raw data into actionable insights. Whether you're generating business reports, analyzing user behavior, or creating dashboards, these techniques let you ask sophisticated questions and get meaningful answers. But sometimes you need even more power-calculating across rows without collapsing them, or performing calculations relative to other rows. That's where window functions come in, which you'll explore in the next lesson.

Done

Complete & Next