What is a Database Index?

Think of a database index like the index at the back of a textbook. Without it, finding every mention of "B-tree" means reading every single page from start to finish. With an index, you jump directly to the right pages. A database index works exactly the same way.

An index is a separate data structure maintained by the database engine. It holds a sorted copy of one or more column values, along with pointers to the actual rows in the table. When you run a query with a WHERE clause, the database can consult the index to find matching rows directly — instead of scanning every row in the table.

Without any index, a query like SELECT * FROM orders WHERE customer_id = 12345 on a table with 50 million rows must examine all 50 million rows. With an index on customer_id, the database performs a logarithmic search (O(log n)) and finds the matching rows in microseconds.

How B-tree Indexes Work

B-tree (Balanced Tree) is the default index type in PostgreSQL, MySQL InnoDB, SQL Server, and Oracle. Understanding its structure explains why indexes are so fast.

A B-tree stores index values in a hierarchically sorted structure. The tree has:

  • Root node: The top of the tree — entry point for every search
  • Internal nodes: Intermediate nodes that guide the search left or right
  • Leaf nodes: The bottom level, containing sorted key values and row pointers
B-tree structure (conceptual) Root: [50 | 100] / \ \ [10|25|40] [60|75|90] [110|150|200] (leaf) (leaf) (leaf) Each leaf node stores: value → row pointer (page + offset)

To find customer_id = 75, the database starts at the root, compares 75 to 50 and 100, determines it falls in the middle branch, and traverses to the correct leaf node. For a table with 1 billion rows, the tree is only about 30 levels deep — meaning at most 30 page reads to find any value.

Index Types

Index TypeSupported QueriesBest For
B-treeEquality, range, ORDER BY, LIKE prefixMost use cases — the default
HashEquality only (=, IN)Exact lookups, NOT range queries
PartialRows matching a conditionIndex only active/non-null rows
Full-textText search (MATCH AGAINST, tsvector)Search across text columns
GINArray, JSONB, full-text (PostgreSQL)JSON documents, arrays
GiSTGeometric/range types (PostgreSQL)Geographic data, IP ranges
CoveringAll query columns in the indexAvoid heap lookups entirely

When to Add an Index

Not every column needs an index. Here are the situations where adding an index delivers the most value:

  • Foreign keys: Always index FK columns — JOIN queries and ON DELETE CASCADE operations depend on them
  • Columns in WHERE clauses: Columns you filter on frequently are prime index candidates
  • ORDER BY and GROUP BY columns: Indexes can satisfy sort operations without a separate sort step
  • High-cardinality columns: An index on a column with 10 million unique values is far more useful than one with 3 possible values (like a boolean "is_active")

Use EXPLAIN to Find Missing Indexes

Before adding any index, run EXPLAIN (PostgreSQL) or EXPLAIN ANALYZE to see how the database plans to execute your query. Look for "Seq Scan" (sequential/full scan) on large tables — that's where an index would help most.

Reading EXPLAIN Output

PostgreSQL EXPLAIN ANALYZE example -- Without index: EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345; -- Output: Seq Scan on orders (cost=0.00..89432.00 rows=12 width=156) -- actual time=45.231..3821.445 rows=12 loops=1 -- After: CREATE INDEX idx_orders_customer ON orders(customer_id); EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345; -- Output: Index Scan using idx_orders_customer on orders -- (cost=0.56..54.78 rows=12 width=156) -- actual time=0.048..0.112 rows=12 loops=1

Composite Indexes — Column Order Matters

A composite index covers multiple columns. The order of columns in the index definition is critical. Consider an index on (last_name, first_name, date_of_birth):

  • Queries filtering on last_name alone: can use the index
  • Queries filtering on last_name AND first_name: can use the index
  • Queries filtering on first_name alone: cannot use the index (violates the left-prefix rule)
  • Queries filtering on date_of_birth alone: cannot use the index
Composite index rules -- Index: (last_name, first_name, city) -- These queries CAN use the index: WHERE last_name = 'Smith' WHERE last_name = 'Smith' AND first_name = 'John' WHERE last_name = 'Smith' AND first_name = 'John' AND city = 'NYC' -- These queries CANNOT use the index: WHERE first_name = 'John' -- skips first column WHERE city = 'NYC' -- skips first two columns WHERE first_name = 'John' AND city = 'NYC' -- skips first column

Design Rule: Equality Before Range

In a composite index, put columns used in equality conditions (=) before columns used in range conditions (>, <, BETWEEN). For example, if you filter on status = 'active' AND created_at > '2026-01-01', the index should be (status, created_at) — not (created_at, status).

Clustered vs Non-Clustered Indexes

These two index types differ in how they relate to the physical storage of table data:

PropertyClustered IndexNon-Clustered Index
Physical data orderTable rows stored in index orderSeparate structure; rows anywhere on disk
Count per tableOne onlyMany (up to hundreds)
Range query speedFastest (rows are adjacent on disk)Slower (random I/O to fetch rows)
MySQL InnoDBAlways the PRIMARY KEYAll other indexes
PostgreSQLNo true clustered; use CLUSTER commandDefault heap table + separate indexes
Row lookupNo extra step — data IS the index leafIndex lookup + row fetch (two steps)

Index Gotchas and Common Mistakes

Functions on indexed columns kill index usage

Index-defeating queries -- BAD: function on indexed column — index not used WHERE YEAR(created_at) = 2026 WHERE LOWER(email) = 'user@example.com' WHERE TRIM(phone) = '555-1234' -- GOOD: rewrite to use the index WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01' -- For LOWER: create a functional index instead CREATE INDEX idx_email_lower ON users(LOWER(email));

NULL values and indexes

In PostgreSQL, NULL values ARE stored in B-tree indexes. In MySQL, NULLs are also indexed. However, queries like WHERE column IS NULL may or may not use the index depending on the database and NULL distribution. A partial index can be useful: CREATE INDEX idx_pending ON orders(id) WHERE status IS NULL.

Too Many Indexes Will Hurt Write Performance

Every index must be kept up to date on INSERT, UPDATE, and DELETE. A heavily indexed table (15+ indexes) on a high-write workload can spend more time maintaining indexes than doing actual data work. Regularly audit unused indexes and drop them. In PostgreSQL, use pg_stat_user_indexes to see which indexes have never been scanned.

Covering Indexes — Avoid Heap Lookups

A covering index includes all columns needed by a query — so the database never needs to fetch the actual row. It gets everything it needs from the index itself. This is the fastest possible read path.

Covering index example -- Query: frequently run report SELECT customer_id, total_amount, status FROM orders WHERE status = 'pending' AND created_at > '2026-01-01'; -- Covering index includes all columns the query touches: CREATE INDEX idx_orders_covering ON orders(status, created_at, customer_id, total_amount); -- Now the database reads ONLY the index — never touches the main table (heap)

Index Maintenance

Over time, B-tree indexes can become fragmented as rows are inserted and deleted. Periodic maintenance improves performance:

  • PostgreSQL: VACUUM ANALYZE reclaims dead tuples and updates statistics. REINDEX rebuilds a fragmented index.
  • MySQL: OPTIMIZE TABLE rebuilds the table and its indexes. ANALYZE TABLE updates index statistics.
  • Auto-vacuum: PostgreSQL runs auto-vacuum by default. Ensure it is not blocked on large tables.

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

  • The workflow or formula is tested directly in the tool and compared against independent reference examples.
  • Examples are kept practical so readers can verify the result without hidden assumptions.
  • Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Database Indexing