Databases & SQL/
Lesson

You've mastered the fundamentals of SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables.-selecting, filtering, joining, and aggregating data. But real-world queries often get complex. You might need to find the top 3 products in each category, calculate running totals, or compare each row to its neighbors. These advanced techniques separate SQL beginners from SQL masters.

This lesson covers two powerful features: Common Table Expressions (CTEs) that help organize complex queries, and Window Functions that enable sophisticated calculations across rows. Together, they unlock analytical capabilities you never thought possible in SQL.

Common Table Expressions (CTEs)

CTEs let you define temporary named result sets that exist only for the duration of your query. Think of them as "query variables" that break complex logic into manageable pieces.

Basic CTEWhat is cte?Common Table Expression - a named temporary result set defined with the WITH clause in SQL, making complex queries more readable. syntax

-- Simple CTE
WITH recent_users AS (
  SELECT id, name, email
  FROM users
  WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT * FROM recent_users
ORDER BY created_at DESC;

The CTE acts like a temporary view. You define it once, give it a name, then use it in your main query. This makes complex queries readable by breaking them into logical steps.

Multiple CTEs

You can define multiple CTEs, and later ones can reference earlier ones:

WITH
  monthly_sales AS (
    SELECT
      DATE_TRUNC('month', created_at) AS month,
      SUM(total) AS revenue
    FROM orders
    GROUP BY 1
  ),
  growth_calc AS (
    SELECT
      month,
      revenue,
      LAG(revenue) OVER (ORDER BY month) AS prev_month
    FROM monthly_sales
  )
SELECT
  month,
  revenue,
  prev_month,
  ROUND((revenue - prev_month) / prev_month * 100, 2) AS growth_pct
FROM growth_calc
WHERE prev_month IS NOT NULL;

Notice how growth_calc references monthly_sales. This step-by-step approach turns a complex query into a readable pipelineWhat is pipeline?A sequence of automated steps (install, lint, test, build, deploy) that code passes through before reaching production..

When to use CTEs

SituationSolution
Query is getting too complexBreak into logical CTEs
Same subquery used multiple timesDefine once in CTE, reference multiple times
Need recursive queries (trees, hierarchies)Use RECURSIVE CTEs
Improving readabilityName intermediate results
Performance note
CTEs are usually optimized by the query planner. In PostgreSQL and modern databases, CTEs are often "inlined"-the database optimizes them as if you'd written one big query. However, in some databases (older PostgreSQL, SQL Server), CTEs can be materialized (executed once and stored), which might help or hurt performance depending on the query.

Recursive CTEs for hierarchical data

Recursive CTEs can traverse tree structures like org charts or category hierarchies:

-- Employee hierarchy (who reports to whom)
WITH RECURSIVE employee_tree AS (
  -- Anchor: Start with the CEO (no manager)
  SELECT id, name, manager_id, 0 AS level
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive: Find employees whose manager is already in our tree
  SELECT e.id, e.name, e.manager_id, et.level + 1
  FROM employees e
  INNER JOIN employee_tree et ON e.manager_id = et.id
)
SELECT
  REPEAT('  ', level) || name AS org_chart,
  level
FROM employee_tree
ORDER BY level, name;

The query starts with the anchor (CEO), then repeatedly finds employees whose manager is already in the result set, building the hierarchy level by level.

02

Window Functions: Analytics in SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables.

Window functions are game-changers. Unlike aggregate functions that collapse rows into summaries, window functions calculate across rows while keeping all rows intact. They're like having a "window" that slides over your data, letting you see context around each row.

The basic syntax is:

function_name(expression) OVER (
  [PARTITION BY partition_column]
  [ORDER BY order_column]
  [frame_clause]
)

Ranking functions: ROW_NUMBER, RANK, DENSE_RANK

These assign rankings to rows based on specified ordering:

-- Different ranking behaviors
SELECT
  name,
  sales,
  ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,    -- 1, 2, 3, 4 (unique)
  RANK() OVER (ORDER BY sales DESC) AS rank,              -- 1, 1, 3, 4 (ties share rank, skips numbers)
  DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rank  -- 1, 1, 2, 3 (ties share rank, no gaps)
FROM products;

If two products tie for first:

  • ROW_NUMBER: 1, 2 (arbitrary tie-break)
  • RANK: 1, 1, 3 (both get 1, next gets 3)
  • DENSE_RANK: 1, 1, 2 (both get 1, next gets 2)

PARTITION BY: Ranking within groups

PARTITION BY divides your data into groups, letting you rank within each group:

-- Top 3 products per category
SELECT
  category,
  name,
  sales,
  ROW_NUMBER() OVER (
    PARTITION BY category
    ORDER BY sales DESC
  ) AS rank_in_category
FROM products
QUALIFY rank_in_category <= 3;  -- Only keep top 3 per category
QUALIFY clause
Some databases (BigQuery, Snowflake) support QUALIFY to filter window function results directly. In PostgreSQL/MySQL, use a subquery or CTE: SELECT * FROM (...) ranked WHERE rank_in_category <= 3.

LAG and LEAD: Access neighboring rows

These let you peek at previous or next rows without self-joins:

-- Daily revenue with day-over-day change
SELECT
  date,
  revenue,
  LAG(revenue) OVER (ORDER BY date) AS prev_day_revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS day_change,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY date)) /
    LAG(revenue) OVER (ORDER BY date) * 100,
    2
  ) AS change_pct
FROM daily_revenue;

LAG defaults to looking 1 row back, but you can specify:

LAG(revenue, 7) OVER (ORDER BY date)  -- Look back 7 rows
LAG(revenue, 1, 0) OVER (ORDER BY date)  -- Default to 0 if no previous row

Running totals and moving averages

Window functions with frame specifications calculate cumulative values:

-- Running total and 7-day moving average
SELECT
  date,
  daily_sales,
  SUM(daily_sales) OVER (
    ORDER BY date
    ROWS UNBOUNDED PRECEDING
  ) AS running_total,
  AVG(daily_sales) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS moving_avg_7d
FROM daily_sales;

Frame clauses control which rows the window includes:

FrameMeaning
ROWS UNBOUNDED PRECEDINGAll rows from start to current
ROWS BETWEEN 2 PRECEDING AND CURRENT ROWCurrent row + 2 before
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWINGCurrent row + 2 after
RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROWValue-based (date ranges)

FIRST_VALUE, LAST_VALUE, NTH_VALUE

Access specific positions within the window:

SELECT
  category,
  name,
  price,
  FIRST_VALUE(name) OVER (
    PARTITION BY category
    ORDER BY price DESC
  ) AS most_expensive_in_category,
  NTH_VALUE(name, 2) OVER (
    PARTITION BY category
    ORDER BY price DESC
  ) AS second_most_expensive
FROM products;
03

Practical examples combining CTEs and window functions

Year-over-year monthly comparison:

WITH monthly_data AS (
  SELECT
    EXTRACT(MONTH FROM created_at) AS month,
    EXTRACT(YEAR FROM created_at) AS year,
    SUM(total) AS revenue
  FROM orders
  GROUP BY 1, 2
)
SELECT
  month,
  MAX(CASE WHEN year = 2023 THEN revenue END) AS revenue_2023,
  MAX(CASE WHEN year = 2022 THEN revenue END) AS revenue_2022,
  MAX(CASE WHEN year = 2023 THEN revenue END) -
    MAX(CASE WHEN year = 2022 THEN revenue END) AS yoy_change
FROM monthly_data
GROUP BY month
ORDER BY month;

Finding consecutive streaks (like login streaks):

WITH login_with_groups AS (
  SELECT
    user_id,
    login_date,
    login_date - ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY login_date
    ) AS grp
  FROM logins
)
SELECT user_id, COUNT(*) AS streak_length
FROM login_with_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 7;  -- Users with 7+ consecutive day streaks

You've now unlocked SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables.'s most powerful analytical capabilities. CTEs help you organize complex logic into readable steps, while window functions let you perform calculations that previously required application code or multiple queries. Together, they enable sophisticated data analysis directly in your database-faster, more efficient, and more maintainable than processing data in application code.

SQL is more than just a way to store and retrieve data. It's a complete analytical toolkit. With the skills you've learned across these four lessons-from basic queries to advanced window functions-you can answer virtually any question your data holds. The only limit is your imagination.