PostgreSQL Advanced: Indexing & Performance

Create a free account to save your progress

Earn XP, track streaks, and sync your dashboard across devices.

Lesson

You've mastered the basics. Now let's explore PostgreSQL's advanced capabilities that set it apart from simpler databases. These features solve real problems you'll encounter in production applications.

Indexing for performance

Imagine trying to find a name in a phone book with 10 million entries. Without an indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything., you flip through every page. With an index, you jump straight to the right section. Database indexes work the same way.

Types of indexes

PostgreSQL offers several index types for different use cases:

-- B-tree (default): Great for equality and range queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);

-- Composite index: Multiple columns together
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Unique index: Enforces uniqueness
CREATE UNIQUE INDEX idx_users_lower_email ON users(LOWER(email));

-- Partial index: Only index specific rows
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- GIN for JSONB and array searches
CREATE INDEX idx_events_data ON events USING GIN(data);

-- GiST for geospatial data (with PostGIS)
CREATE INDEX idx_locations_coords ON locations USING GIST(coords);

Trade-off warning

Indexes speed up reads but slow down writes. Every INSERT, UPDATE, or DELETE must update the indexes too. Don't index everything, focus on columns you query frequently.

When to use which index

Index type	Best for	Example
B-tree	Equality, range queries	`WHERE id = 5`, `WHERE age > 18`
Hash	Exact matches only	Rarely needed (B-tree is usually better)
GIN	JSONB, arrays, full-text	`WHERE data @> '{"status": "active"}'`
GiST	Geospatial data	Location-based queries
BRIN	Very large, naturally ordered tables	Time-series data

Full-text search

Most applications need search functionality. Instead of adding Elasticsearch immediately, try PostgreSQL's built-in full-text search:

-- Enable trigram extension for fuzzy matching
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Add a search vector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Update existing rows
UPDATE articles SET
  search_vector = setweight(to_tsvector('english', title), 'A') ||
                  setweight(to_tsvector('english', content), 'B');

-- Create GIN index for fast searching
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

-- Search with ranking
SELECT
  title,
  ts_rank(search_vector, query) AS rank
FROM articles, 
     to_tsquery('english', 'postgresql & tutorial') query
WHERE search_vector @@ query
ORDER BY rank DESC;

This finds articles containing both "postgresql" and "tutorial", ranked by relevance. The setweight function makes title matches score higher than content matches.

Views and materialized views

Views are saved queries that look like tables. Materialized views cache the results for better performance.

Regular views

-- Create a view for active users
CREATE VIEW active_users AS
SELECT id, name, email, created_at
FROM users
WHERE active = true;

-- Query it like a table
SELECT * FROM active_users WHERE created_at > '2024-01-01';

Views are great for:

Simplifying complex queries
Restricting access to specific columns
Presenting data in different formats

Materialized views

When queries are expensive (aggregations, joins across many tables), materialized views pre-compute and cache results:

-- Create a materialized view for daily sales
CREATE MATERIALIZED VIEW daily_sales AS
SELECT 
  DATE(created_at) as date,
  COUNT(*) as order_count,
  SUM(total_amount) as revenue
FROM orders
GROUP BY DATE(created_at);

-- Create index on materialized view
CREATE INDEX idx_daily_sales_date ON daily_sales(date);

-- Query is now instant (no aggregation needed!)
SELECT * FROM daily_sales ORDER BY date DESC LIMIT 30;

The catch: materialized views don't update automatically. You must refresh them:

-- Refresh the view (blocking)
REFRESH MATERIALIZED VIEW daily_sales;

-- Refresh without blocking reads
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;

Schedule refreshes with a cron job or trigger when underlying data changes.

Table partitioning

When tables grow to hundreds of millions of rows, queries slow down. Partitioning splits large tables into smaller, more manageable pieces:

-- Create partitioned table for time-series data
CREATE TABLE measurements (
  id SERIAL,
  sensor_id INTEGER,
  value NUMERIC,
  recorded_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (recorded_at);

-- Create partitions for each month
CREATE TABLE measurements_2024_01 PARTITION OF measurements
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE measurements_2024_02 PARTITION OF measurements
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE measurements_2024_03 PARTITION OF measurements
  FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

Benefits of partitioning:

Faster queries (only scan relevant partitions)
Easier data archiving (detach old partitions)
Faster maintenance (vacuum, reindex per partition)

Backup and recovery

Your data is valuable. Protect it with regular backups.

pg_dump for logical backups

# Backup entire database
pg_dump -h localhost -U postgres myapp > backup.sql

# Backup specific tables
pg_dump -t users -t orders myapp > tables_backup.sql

# Compressed backup
pg_dump myapp | gzip > backup.sql.gz

# Restore from backup
psql myapp < backup.sql

Automated backups with cron

# Add to crontab (daily at 2 AM)
0 2 * * * pg_dump myapp | gzip > /backups/myapp_$(date +\%Y\%m\%d).sql.gz

Good to know

pg_dump creates logical backups (SQL commands). For very large databases, use pg_basebackup for physical backups, which are faster to restore.

Monitoring query performance

PostgreSQL tracks slow queries automatically. Enable it:

-- Log queries taking more than 1 second
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();

Then use pg_stat_statements to find your slowest queries:

-- Enable extension
CREATE EXTENSION pg_stat_statements;

-- Find slowest queries by total time
SELECT 
  query,
  calls,
  mean_exec_time,
  total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Quick reference: advanced features

Feature	Use case	Key command
B-tree index	Speed up WHERE clauses	`CREATE INDEX idx ON table(column)`
Full-text search	Search text content	`to_tsvector()`, `@@` operator
Materialized views	Cache expensive aggregations	`CREATE MATERIALIZED VIEW`
Partitioning	Handle huge tables	`PARTITION BY RANGE`
pg_dump	Logical backups	`pg_dump db > backup.sql`
pg_stat_statements	Find slow queries	Enable extension, query view

These advanced features transform PostgreSQL from a simple data store into a powerful analytical engine. Master them, and you can handle data challenges that break lesser databases.