What is a Database Index?
Think of a database index like the index at the back of a textbook. Without it, finding every mention of "B-tree" means reading every single page from start to finish. With an index, you jump directly to the right pages. A database index works exactly the same way.
An index is a separate data structure maintained by the database engine. It holds a sorted copy of one or more column values, along with pointers to the actual rows in the table. When you run a query with a WHERE clause, the database can consult the index to find matching rows directly — instead of scanning every row in the table.
Without any index, a query like SELECT * FROM orders WHERE customer_id = 12345 on a table with 50 million rows must examine all 50 million rows. With an index on customer_id, the database performs a logarithmic search (O(log n)) and finds the matching rows in microseconds.
How B-tree Indexes Work
B-tree (Balanced Tree) is the default index type in PostgreSQL, MySQL InnoDB, SQL Server, and Oracle. Understanding its structure explains why indexes are so fast.
A B-tree stores index values in a hierarchically sorted structure. The tree has:
- Root node: The top of the tree — entry point for every search
- Internal nodes: Intermediate nodes that guide the search left or right
- Leaf nodes: The bottom level, containing sorted key values and row pointers
To find customer_id = 75, the database starts at the root, compares 75 to 50 and 100, determines it falls in the middle branch, and traverses to the correct leaf node. For a table with 1 billion rows, the tree is only about 30 levels deep — meaning at most 30 page reads to find any value.
Index Types
| Index Type | Supported Queries | Best For |
|---|---|---|
| B-tree | Equality, range, ORDER BY, LIKE prefix | Most use cases — the default |
| Hash | Equality only (=, IN) | Exact lookups, NOT range queries |
| Partial | Rows matching a condition | Index only active/non-null rows |
| Full-text | Text search (MATCH AGAINST, tsvector) | Search across text columns |
| GIN | Array, JSONB, full-text (PostgreSQL) | JSON documents, arrays |
| GiST | Geometric/range types (PostgreSQL) | Geographic data, IP ranges |
| Covering | All query columns in the index | Avoid heap lookups entirely |
When to Add an Index
Not every column needs an index. Here are the situations where adding an index delivers the most value:
- Foreign keys: Always index FK columns — JOIN queries and ON DELETE CASCADE operations depend on them
- Columns in WHERE clauses: Columns you filter on frequently are prime index candidates
- ORDER BY and GROUP BY columns: Indexes can satisfy sort operations without a separate sort step
- High-cardinality columns: An index on a column with 10 million unique values is far more useful than one with 3 possible values (like a boolean "is_active")
Use EXPLAIN to Find Missing Indexes
Before adding any index, run EXPLAIN (PostgreSQL) or EXPLAIN ANALYZE to see how the database plans to execute your query. Look for "Seq Scan" (sequential/full scan) on large tables — that's where an index would help most.
Reading EXPLAIN Output
Composite Indexes — Column Order Matters
A composite index covers multiple columns. The order of columns in the index definition is critical. Consider an index on (last_name, first_name, date_of_birth):
- Queries filtering on
last_namealone: can use the index - Queries filtering on
last_nameANDfirst_name: can use the index - Queries filtering on
first_namealone: cannot use the index (violates the left-prefix rule) - Queries filtering on
date_of_birthalone: cannot use the index
Design Rule: Equality Before Range
In a composite index, put columns used in equality conditions (=) before columns used in range conditions (>, <, BETWEEN). For example, if you filter on status = 'active' AND created_at > '2026-01-01', the index should be (status, created_at) — not (created_at, status).
Clustered vs Non-Clustered Indexes
These two index types differ in how they relate to the physical storage of table data:
| Property | Clustered Index | Non-Clustered Index |
|---|---|---|
| Physical data order | Table rows stored in index order | Separate structure; rows anywhere on disk |
| Count per table | One only | Many (up to hundreds) |
| Range query speed | Fastest (rows are adjacent on disk) | Slower (random I/O to fetch rows) |
| MySQL InnoDB | Always the PRIMARY KEY | All other indexes |
| PostgreSQL | No true clustered; use CLUSTER command | Default heap table + separate indexes |
| Row lookup | No extra step — data IS the index leaf | Index lookup + row fetch (two steps) |
Index Gotchas and Common Mistakes
Functions on indexed columns kill index usage
NULL values and indexes
In PostgreSQL, NULL values ARE stored in B-tree indexes. In MySQL, NULLs are also indexed. However, queries like WHERE column IS NULL may or may not use the index depending on the database and NULL distribution. A partial index can be useful: CREATE INDEX idx_pending ON orders(id) WHERE status IS NULL.
Too Many Indexes Will Hurt Write Performance
Every index must be kept up to date on INSERT, UPDATE, and DELETE. A heavily indexed table (15+ indexes) on a high-write workload can spend more time maintaining indexes than doing actual data work. Regularly audit unused indexes and drop them. In PostgreSQL, use pg_stat_user_indexes to see which indexes have never been scanned.
Covering Indexes — Avoid Heap Lookups
A covering index includes all columns needed by a query — so the database never needs to fetch the actual row. It gets everything it needs from the index itself. This is the fastest possible read path.
Index Maintenance
Over time, B-tree indexes can become fragmented as rows are inserted and deleted. Periodic maintenance improves performance:
- PostgreSQL:
VACUUM ANALYZEreclaims dead tuples and updates statistics.REINDEXrebuilds a fragmented index. - MySQL:
OPTIMIZE TABLErebuilds the table and its indexes.ANALYZE TABLEupdates index statistics. - Auto-vacuum: PostgreSQL runs auto-vacuum by default. Ensure it is not blocked on large tables.
How We Research and Update This Guide
We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.
- The workflow or formula is tested directly in the tool and compared against independent reference examples.
- Examples are kept practical so readers can verify the result without hidden assumptions.
- Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.
Frequently Asked Questions — Database Indexing
An index is a separate data structure (usually a B-tree) that holds sorted copies of one or more columns alongside pointers to the actual rows. Without an index, the database scans every row (full table scan). With an index, it traverses the B-tree in O(log n) steps to find matching rows. The tradeoff: indexes speed up reads but slow down writes (every INSERT/UPDATE/DELETE must also update all relevant indexes).
B-tree (Balanced Tree) is the default index type in PostgreSQL, MySQL, and most relational databases. It stores data in sorted order in a balanced tree structure. B-tree indexes support equality queries (WHERE id = 5), range queries (WHERE date > '2026-01-01'), ORDER BY, and prefix LIKE queries (WHERE name LIKE 'John%'). B-tree operations are O(log n) — very efficient even on tables with billions of rows.
Add an index on columns used in WHERE clauses frequently, JOIN conditions, ORDER BY/GROUP BY, and foreign keys. Do NOT index every column — each index adds overhead to writes and consumes disk space. A good rule: if a query takes more than 100ms and runs frequently, check if an index on the filtered/sorted columns would help. Use EXPLAIN to see if queries are doing full table scans.
A composite (multi-column) index covers multiple columns in a defined order. An index on (last_name, first_name) can serve queries on last_name alone, or last_name + first_name — but NOT first_name alone (left prefix rule). The order matters. Put the most selective column first, and put the column used in equality conditions before range conditions.
A clustered index determines the physical order of data on disk — the table rows are stored in index order. Each table can have only one clustered index (usually the primary key in MySQL InnoDB). A non-clustered index is a separate structure with pointers back to the actual rows — you can have many per table. PostgreSQL calls them "heap" tables with separate index structures; MySQL InnoDB always uses a clustered primary key.
Yes. Every index must be updated on every INSERT, UPDATE, or DELETE. A table with 15 indexes on a high-write table can be significantly slower on writes than the same table with 3 carefully chosen indexes. Indexes also consume disk space and memory (buffer pool). Audit your indexes periodically — remove unused ones (most databases track index usage statistics).