SQL Query Optimization: The Techniques That Move the Needle
Practical SQL optimization techniques for PostgreSQL — reading query plans, fixing N+1 joins, CTEs vs subqueries, window functions, and the profiling workflow for production.

James Ross Jr.
Strategic Systems Architect & Enterprise Software Developer
Most SQL performance problems are obvious once you know how to read a query plan. The challenge is that most application developers have never been taught to read them — they just write queries and accept whatever performance they get. This guide changes that.
I am going to walk through the process I use when a query is slow: how to read the plan, what the plan tells you, and which changes actually make a difference.
Reading EXPLAIN ANALYZE
EXPLAIN ANALYZE is the single most important SQL debugging tool. It shows:
- What execution plan the query planner chose
- Estimated vs. actual row counts
- Actual time spent at each node
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
  u.id,
  u.name,
  COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id AND p.published = true
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 20;
Read the plan from the most deeply indented nodes outward: each child node executes first and feeds rows to its parent:
Limit  (cost=1234.56..1234.57 rows=20 width=...)
  ->  Sort  (cost=1234.56..1237.89 rows=... width=...)
        ->  HashAggregate  (cost=... rows=... width=...)
              ->  Hash Left Join  (...)
                    ->  Seq Scan on users  (cost=0.00..245.00 rows=5000) (actual time=0.12..8.34 rows=4892)
                          Filter: (created_at > '2025-01-01')
                    ->  Hash  (...)
                          ->  Index Scan on posts  (cost=0.43..892.33 rows=...) (actual time=... rows=...)
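One caution before profiling your own queries: EXPLAIN ANALYZE actually executes the statement. For anything that writes, wrap the run in a transaction and roll it back so no data changes:

```sql
-- EXPLAIN ANALYZE runs the statement for real.
-- For INSERT/UPDATE/DELETE, roll back afterwards:
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE posts SET published = true WHERE author_id = 42;
ROLLBACK;
```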
Key things to look for:
Seq Scan on large tables: A sequential scan on a table with many rows means no useful index exists, or the planner decided an index would not help (for example, when the filter matches most of the table). Look at the filter condition; that column is the indexing candidate.
Estimated vs actual rows: If the plan estimates 5 rows but scans 5,000, the planner made bad decisions based on stale statistics. Run ANALYZE table_name.
Nested Loop with large row counts: A nested loop join costs roughly outer rows × inner lookups, which blows up when both sides are large. Hash joins and merge joins scale better.
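In the plan above, the sequential scan on users comes from the filter on created_at. A plain B-tree index on that column gives the planner an index path (a sketch; the index name is illustrative):

```sql
-- Hypothetical index for the created_at filter in the example query
CREATE INDEX idx_users_created_at ON users (created_at);

-- Re-run the EXPLAIN to confirm the planner switches to an index scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name FROM users WHERE created_at > '2025-01-01';
```

Whether the planner actually uses it depends on selectivity: an index only wins when the filter excludes most of the table.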
The BUFFERS Option
Add BUFFERS to see how many 8KB disk pages were read:
Seq Scan on posts (actual rows=50000 time=234ms)
  Buffers: shared hit=1234 read=8901
- shared hit: pages found in PostgreSQL's shared buffer cache (fast)
- read: pages fetched from outside shared buffers, i.e. the OS page cache or disk (slow)
High read numbers on repeated queries indicate your shared_buffers setting is too small or the working set does not fit in RAM.
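To see cache behavior per table rather than per query, the standard pg_statio_user_tables view exposes the same hit/read counters cumulatively:

```sql
-- Cache hit ratio per table (shared buffer cache only)
SELECT
  relname,
  heap_blks_hit,
  heap_blks_read,
  round(heap_blks_hit::NUMERIC
        / NULLIF(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;
```

Tables with persistently low ratios are the first candidates for indexing or for a larger shared_buffers.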
N+1 Queries in SQL
ORM-level N+1 queries were covered in the Prisma article. The SQL equivalent is correlated subqueries:
-- BAD: Correlated subquery runs once per user row
SELECT
  u.id,
  u.name,
  (SELECT COUNT(*) FROM posts p
    WHERE p.author_id = u.id AND p.published = true) AS post_count
FROM users u;

-- GOOD: Single join + aggregation
SELECT
  u.id,
  u.name,
  COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id AND p.published = true
GROUP BY u.id, u.name;
Correlated subqueries in the SELECT list execute once per output row. On a 10,000-row result set, that is 10,000 extra subquery executions. Replace them with JOINs and aggregations.
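The rewritten join still needs index support on the posts side. A partial index covering only published posts keeps the index small and matches the join condition exactly (a sketch; the index name is illustrative):

```sql
-- Hypothetical partial index matching the join + published filter
CREATE INDEX idx_posts_author_published
  ON posts (author_id)
  WHERE published = true;
```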
CTEs and Their Performance Implications
Common Table Expressions (CTEs) are useful for readability. Be careful about their optimization behavior:
-- In older PostgreSQL versions, CTEs were "optimization fences"
-- The planner could not push predicates into them
WITH recent_posts AS (
  SELECT * FROM posts
  WHERE created_at > NOW() - INTERVAL '30 days'
)
SELECT * FROM recent_posts WHERE author_id = 42;
-- The old behavior: fetch all recent posts, then filter for author 42
-- This could be very inefficient on a large posts table
PostgreSQL 12 changed non-recursive CTEs that are referenced only once to be inlined by default, allowing the planner to optimize across the CTE boundary. You can still control this behavior explicitly:
-- Materialized (old behavior): CTE result is computed once and stored
WITH recent_posts AS MATERIALIZED (...)

-- Not materialized (inline): planner can optimize through the CTE
WITH recent_posts AS NOT MATERIALIZED (...)
Use materialized CTEs when:
- The CTE is expensive and referenced multiple times
- You want the result to be computed once regardless of how it is used
Use non-materialized (or just avoid CTEs) when:
- The CTE is referenced once and the planner should be able to push predicates through
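As a sketch of the first case, a CTE referenced twice benefits from being computed once. (PostgreSQL 12+ already declines to inline a CTE referenced more than once, but spelling out MATERIALIZED makes the intent explicit.)

```sql
-- The aggregation runs once; both branches reuse the stored result
WITH author_stats AS MATERIALIZED (
  SELECT author_id, COUNT(*) AS post_count
  FROM posts
  GROUP BY author_id
)
SELECT * FROM author_stats WHERE post_count > 100
UNION ALL
SELECT * FROM author_stats WHERE post_count < 5;
```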
Window Functions: The Right Tool for Running Calculations
Window functions compute values across sets of rows related to the current row. They eliminate the need for self-joins in many common queries:
-- Rank posts by views within each category
SELECT
  id,
  title,
  category,
  view_count,
  RANK() OVER (PARTITION BY category ORDER BY view_count DESC) AS rank_in_category,
  SUM(view_count) OVER (PARTITION BY category) AS category_total_views,
  view_count::FLOAT / SUM(view_count) OVER (PARTITION BY category) AS share_of_category
FROM posts
WHERE published = true;
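Window functions cannot appear in WHERE because they are evaluated after filtering. To keep only the top rows per group, wrap the window query in a subquery and filter the outer level:

```sql
-- Top 3 posts per category by views
SELECT id, title, category, view_count
FROM (
  SELECT
    id, title, category, view_count,
    RANK() OVER (PARTITION BY category ORDER BY view_count DESC) AS rnk
  FROM posts
  WHERE published = true
) ranked
WHERE rnk <= 3;
```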
Running totals:
SELECT
  date_trunc('month', created_at) AS month,
  COUNT(*) AS new_users,
  SUM(COUNT(*)) OVER (ORDER BY date_trunc('month', created_at)) AS cumulative_users
FROM users
GROUP BY 1
ORDER BY 1;
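A frame clause bounds the window to a slice of rows instead of everything up to the current row. For example, a trailing three-month moving average of signups, sketched against the same users table:

```sql
-- Moving average over the current month and the two before it
SELECT
  date_trunc('month', created_at) AS month,
  COUNT(*) AS new_users,
  AVG(COUNT(*)) OVER (
    ORDER BY date_trunc('month', created_at)
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3mo
FROM users
GROUP BY 1
ORDER BY 1;
```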
Finding duplicate rows:
-- Find users with duplicate emails
WITH ranked AS (
  SELECT
    id,
    email,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM users
)
SELECT email, COUNT(*)
FROM ranked
WHERE rn > 1
GROUP BY email;
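The same ranked CTE can drive a cleanup that keeps the oldest row per email. A sketch; verify the SELECT version first and run the delete inside a transaction:

```sql
-- Keep the earliest-created row per email, delete the rest
WITH ranked AS (
  SELECT
    id,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);
```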
Window functions often replace complex subqueries and self-joins. When you find yourself joining a table to itself to compute a ranking or running total, a window function is almost certainly cleaner and faster.
Partial Aggregation for Complex Filters
Filtering after aggregation (HAVING) can be expensive. When possible, filter before aggregating:
-- SLOWER: Aggregate every row, then discard groups
SELECT author_id, COUNT(*) AS post_count
FROM posts
GROUP BY author_id
HAVING COUNT(*) > 10;

-- FASTER: Pre-filtering shrinks the input to the aggregate
-- (note: this counts only published posts, which changes the result --
-- usually that is what you actually wanted)
SELECT author_id, COUNT(*) AS post_count
FROM posts
WHERE published = true -- Filter BEFORE aggregation when possible
GROUP BY author_id
HAVING COUNT(*) > 10;
The WHERE clause reduces the working set before the expensive GROUP BY and COUNT operations.
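When different aggregates need different predicates, PostgreSQL's FILTER clause applies a per-aggregate condition in a single pass over the table instead of multiple scans or correlated subqueries:

```sql
-- One scan, several conditional counts
SELECT
  author_id,
  COUNT(*) AS total_posts,
  COUNT(*) FILTER (WHERE published = true) AS published_posts,
  COUNT(*) FILTER (WHERE created_at > NOW() - INTERVAL '30 days') AS recent_posts
FROM posts
GROUP BY author_id;
```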
Optimizing Pagination
Offset pagination degrades with large offsets:
-- SLOW on large tables: must scan and discard 10000 rows
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
Cursor-based pagination avoids this:
-- FAST: Uses the index to jump directly to the cursor position
SELECT * FROM posts
WHERE created_at < '2025-01-15T12:00:00Z' -- cursor value
ORDER BY created_at DESC
LIMIT 20;
For this to work, created_at must be indexed and the cursor value must be the actual value from the previous page's last row (not an offset).
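If created_at is not unique, rows with identical timestamps can be skipped or duplicated across pages. A row-value comparison with a tie-breaking id makes the cursor unambiguous (cursor values below are illustrative):

```sql
-- Tie-break on id; an index on (created_at DESC, id DESC) keeps this fast
SELECT * FROM posts
WHERE (created_at, id) < ('2025-01-15T12:00:00Z', 123456)  -- cursor from last row
ORDER BY created_at DESC, id DESC
LIMIT 20;
```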
Statistics and the Query Planner
PostgreSQL's query planner relies on statistics collected by ANALYZE (or autovacuum). Stale statistics lead to bad plans.
Force statistics collection after bulk operations:
-- After inserting or updating a large number of rows
ANALYZE posts;
-- Full statistics collection on the entire database
ANALYZE;
Increase statistics precision for columns with skewed distributions:
-- Default: 100 most common values tracked
-- Increase for columns with many distinct values
ALTER TABLE posts ALTER COLUMN category SET STATISTICS 500;
ANALYZE posts;
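When the planner mis-estimates because two columns are correlated, per-column statistics are not enough. Extended statistics let it model the dependency (column names reuse this article's schema; the statistics name is illustrative):

```sql
-- Teach the planner that these columns are correlated
CREATE STATISTICS posts_cat_pub_stats (dependencies)
  ON category, published FROM posts;
ANALYZE posts;
```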
The Optimization Workflow
My process when a query is slow:
- Run EXPLAIN (ANALYZE, BUFFERS) and read the plan
- Find the slowest node (highest actual time)
- Identify why it is slow (seq scan, bad estimate, excessive rows)
- Check indexes relevant to the filter/join conditions
- Add a missing index or rewrite the query
- Verify with EXPLAIN ANALYZE again
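For queries that are only slow in production, the contrib module auto_explain logs plans automatically once a statement crosses a duration threshold (the settings below are illustrative, and changing them per session requires superuser privileges):

```sql
-- In postgresql.conf:
--   shared_preload_libraries = 'auto_explain'
--   auto_explain.log_min_duration = '250ms'
--   auto_explain.log_analyze = on
-- Or per session:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';
SET auto_explain.log_analyze = on;
```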
Document slow queries with pg_stat_statements:
-- Enable in postgresql.conf (requires a server restart):
shared_preload_libraries = 'pg_stat_statements'
-- Then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find the slowest queries
SELECT
  query,
  calls,
  mean_exec_time,
  total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
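pg_stat_statements also records buffer counters per statement, so the BUFFERS analysis from earlier can be done fleet-wide to find queries that miss the cache:

```sql
-- Queries doing the most disk (non-shared-buffer) reads
SELECT
  query,
  calls,
  shared_blks_hit,
  shared_blks_read,
  round(shared_blks_hit::NUMERIC
        / NULLIF(shared_blks_hit + shared_blks_read, 0), 3) AS hit_ratio
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;
```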
Query optimization is investigative work. Read the plan, understand what the database is doing, and make targeted changes. Avoid guessing — measure before and after every change.
Working on a performance problem in your PostgreSQL application or want help reading query plans and designing an optimization strategy? Book a call: calendly.com/jamesrossjr.