Database Indexing Strategies That Actually Make Queries Fast

Database indexing is the highest-leverage performance optimization available to most application developers. A missing index on a frequently-queried column can be the difference between a 50ms query and a 5000ms query on a table with 1 million rows. Adding the right index takes minutes. Finding the missing index takes knowing where to look.

This guide is about knowing where to look.

What an Index Actually Is

An index is a separate data structure maintained by the database that allows finding rows matching specific conditions without scanning every row in the table. PostgreSQL's default index type is a B-tree (balanced tree), which keeps keys sorted and supports equality and range lookups efficiently.

When you run SELECT * FROM users WHERE email = 'james@example.com' without an index on email, PostgreSQL scans every row in the users table to find matches. This is called a sequential scan. With an index on email, PostgreSQL traverses the B-tree in O(log n) operations and retrieves matching rows directly. On a million-row table, that is the difference between reading 1,000,000 rows and reading about 20.

Reading Query Plans

Before adding indexes, understand what your queries are actually doing. EXPLAIN ANALYZE shows the query plan with actual execution costs:

EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 20;

Look for:

Seq Scan: The database is reading every row in the table. A red flag on large tables.
Index Scan: Using an index. Good.
Index Only Scan: Reading only the index, not the table. Best — means the index covers all needed columns.
Rows (estimated vs actual): Large discrepancies indicate stale statistics. Run ANALYZE table_name to refresh them.
Actual Time: The actual milliseconds spent on each step.

Single-Column Indexes

Start here. Create an index on any column you filter by frequently:

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_author_id ON posts(author_id);
CREATE INDEX idx_posts_published_at ON posts(published_at DESC);

The DESC on published_at matters when your most common query orders by newest first. An index in the right sort order can avoid a sort operation.

Foreign keys always get indexes. posts.author_id should be indexed from day one — every JOIN users ON users.id = posts.author_id is a performance problem waiting to happen without it.

Composite Indexes: Column Order Matters

A composite index on (a, b) can answer queries for:

WHERE a = ?
WHERE a = ? AND b = ?
WHERE a = ? ORDER BY b

But NOT for:

WHERE b = ? (leading column must be present)

-- This query benefits from a composite index on (user_id, status)
SELECT * FROM orders
WHERE user_id = $1
AND status = 'pending'
ORDER BY created_at DESC;

CREATE INDEX idx_orders_user_status ON orders(user_id, status);

The rule of thumb: put the most selective column first (the one that filters to the fewest rows), followed by columns used in equality conditions, followed by columns used in range conditions or ordering.

For the example above:

If users typically have 100 orders, and only 5% are pending, status in position 2 filters from 100 to 5 rows.
Putting status first would only help if you query WHERE status = 'pending' without a user_id filter.

Partial Indexes

A partial index covers only the rows that match a condition. This is useful when you frequently query a subset of rows that is much smaller than the full table:

-- Index only unread notifications (most notifications get marked read quickly)
CREATE INDEX idx_notifications_unread
ON notifications(user_id, created_at)
WHERE read = false;

-- Index only active users
CREATE INDEX idx_users_active_email
ON users(email)
WHERE deleted_at IS NULL;

A partial index on active records is smaller and faster than a full-table index, and it exactly matches the query pattern.

Covering Indexes

An Index Only Scan is the fastest possible plan — the database reads only the index and never touches the table. This happens when the index contains all the columns the query needs:

-- Query that reads user list page
SELECT id, name, email, created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC;

-- Covering index: includes all columns in the SELECT
CREATE INDEX idx_users_active_covering
ON users(status, created_at DESC)
INCLUDE (id, name, email);

The INCLUDE clause adds non-key columns to the index. They cannot be used for filtering or ordering, but they are available for Index Only Scans. This is powerful for read-heavy queries on frequently accessed rows.

Indexes for Pattern Matching

Standard B-tree indexes do not support prefix-insensitive pattern matching. LIKE '%term%' is always a sequential scan. For text search, you have options:

pg_trgm for fuzzy matching:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm
ON products USING gin(name gin_trgm_ops);

-- Now this query can use the index
SELECT * FROM products WHERE name ILIKE '%widget%';

Full-text search indexes:

CREATE INDEX idx_posts_search
ON posts USING gin(to_tsvector('english', title || ' ' || content));

SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content)
@@ to_tsquery('english', 'postgresql & indexing');

For exact prefix matching (LIKE 'term%'), a standard B-tree index works:

CREATE INDEX idx_users_name ON users(name);
-- LIKE 'James%' uses the index; LIKE '%James%' does not

JSON and JSONB Indexes

For JSONB columns, GIN indexes enable querying nested fields:

-- Index all keys in a JSONB column
CREATE INDEX idx_metadata ON items USING gin(metadata);

-- Or index a specific path for better performance
CREATE INDEX idx_metadata_category
ON items((metadata->>'category'));

-- Query that uses the expression index
SELECT * FROM items WHERE metadata->>'category' = 'electronics';

When NOT to Add an Index

Indexes are not free. Every index:

Takes disk space
Slows down INSERT, UPDATE, and DELETE operations (the index must be updated)
Must be maintained by VACUUM and autovacuum

Do not index:

Columns with very low cardinality (boolean columns, status columns with 2-3 values)
Columns that are never queried in WHERE, JOIN, or ORDER BY
Small tables (under ~1,000 rows) where sequential scans are faster

Detecting Missing Indexes in Production

PostgreSQL tracks sequential scans on each table. Query pg_stat_user_tables to find tables with many sequential scans:

SELECT
 schemaname,
 tablename,
 seq_scan,
 seq_tup_read,
 idx_scan,
 n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND n_live_tup > 10000
ORDER BY seq_scan DESC;

Tables with many sequential scans and many rows are your index candidates. Cross-reference with pg_stat_statements (if enabled) to find the specific queries driving those scans.

Enable slow query logging to catch queries that take over 100ms:

-- In postgresql.conf
log_min_duration_statement = 100 -- log queries over 100ms

Indexing is not a one-time activity. As your application grows and query patterns change, revisit your index strategy. The indexes that served you at 10,000 rows need review at 10,000,000 rows.

Dealing with slow database queries or want help designing an indexing strategy for a growing application? This is exactly the kind of problem I help with. Book a call: calendly.com/jamesrossjr.