What is Database Indexing and Why It Matters [2026]

Q: What is a database index?

An index is a separate data structure (usually a B-tree) that holds sorted copies of one or more columns alongside pointers to the actual rows. Without an index, the database scans every row (full table scan). With an index, it traverses the B-tree in O(log n) steps to find matching rows. The tradeoff: indexes speed up reads but slow down writes (every INSERT/UPDATE/DELETE must also update all relevant indexes).

Q: When should I add an index?

Add an index on columns used in WHERE clauses frequently, JOIN conditions, ORDER BY/GROUP BY, and foreign keys. Do NOT index every column — each index adds overhead to writes and consumes disk space. A good rule: if a query takes more than 100ms and runs frequently, check if an index on the filtered/sorted columns would help. Use EXPLAIN to see if queries are doing full table scans.

Q: What is a composite index?

A composite (multi-column) index covers multiple columns in a defined order. An index on (last_name, first_name) can serve queries on last_name alone, or last_name + first_name — but NOT first_name alone (left prefix rule). The order matters. Put the most selective column first, and put the column used in equality conditions before range conditions.

Q: What is the difference between a clustered and non-clustered index?

A clustered index determines the physical order of data on disk — the table rows are stored in index order. Each table can have only one clustered index (usually the primary key in MySQL InnoDB). A non-clustered index is a separate structure with pointers back to the actual rows — you can have many per table. PostgreSQL calls them "heap" tables with separate index structures; MySQL InnoDB always uses a clustered primary key.

Q: Can too many indexes hurt performance?

Yes. Every index must be updated on every INSERT, UPDATE, or DELETE. A table with 15 indexes on a high-write table can be significantly slower on writes than the same table with 3 carefully chosen indexes. Indexes also consume disk space and memory (buffer pool). Audit your indexes periodically — remove unused ones (most databases track index usage statistics).

What is a Database Index?

Think of a database index like the index at the back of a textbook. Without it, finding every mention of "B-tree" means reading every single page from start to finish. With an index, you jump directly to the right pages. A database index works exactly the same way.

An index is a separate data structure maintained by the database engine. It holds a sorted copy of one or more column values, along with pointers to the actual rows in the table. When you run a query with a WHERE clause, the database can consult the index to find matching rows directly — instead of scanning every row in the table.

Without any index, a query like SELECT * FROM orders WHERE customer_id = 12345 on a table with 50 million rows must examine all 50 million rows. With an index on customer_id, the database performs a logarithmic search (O(log n)) and finds the matching rows in microseconds.

How B-tree Indexes Work

B-tree (Balanced Tree) is the default index type in PostgreSQL, MySQL InnoDB, SQL Server, and Oracle. Understanding its structure explains why indexes are so fast.

A B-tree stores index values in a hierarchically sorted structure. The tree has:

Root node: The top of the tree — entry point for every search
Internal nodes: Intermediate nodes that guide the search left or right
Leaf nodes: The bottom level, containing sorted key values and row pointers

B-tree structure (conceptual) Root: [50 | 100] / \ \ [10|25|40] [60|75|90] [110|150|200] (leaf) (leaf) (leaf) Each leaf node stores: value → row pointer (page + offset)

To find customer_id = 75, the database starts at the root, compares 75 to 50 and 100, determines it falls in the middle branch, and traverses to the correct leaf node. For a table with 1 billion rows, the tree is only about 30 levels deep — meaning at most 30 page reads to find any value.

Index Types

Index Type	Supported Queries	Best For
B-tree	Equality, range, ORDER BY, LIKE prefix	Most use cases — the default
Hash	Equality only (=, IN)	Exact lookups, NOT range queries
Partial	Rows matching a condition	Index only active/non-null rows
Full-text	Text search (MATCH AGAINST, tsvector)	Search across text columns
GIN	Array, JSONB, full-text (PostgreSQL)	JSON documents, arrays
GiST	Geometric/range types (PostgreSQL)	Geographic data, IP ranges
Covering	All query columns in the index	Avoid heap lookups entirely

When to Add an Index

Not every column needs an index. Here are the situations where adding an index delivers the most value:

Foreign keys: Always index FK columns — JOIN queries and ON DELETE CASCADE operations depend on them
Columns in WHERE clauses: Columns you filter on frequently are prime index candidates
ORDER BY and GROUP BY columns: Indexes can satisfy sort operations without a separate sort step
High-cardinality columns: An index on a column with 10 million unique values is far more useful than one with 3 possible values (like a boolean "is_active")

Use EXPLAIN to Find Missing Indexes

Before adding any index, run EXPLAIN (PostgreSQL) or EXPLAIN ANALYZE to see how the database plans to execute your query. Look for "Seq Scan" (sequential/full scan) on large tables — that's where an index would help most.

Reading EXPLAIN Output

PostgreSQL EXPLAIN ANALYZE example -- Without index: EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345; -- Output: Seq Scan on orders (cost=0.00..89432.00 rows=12 width=156) -- actual time=45.231..3821.445 rows=12 loops=1 -- After: CREATE INDEX idx_orders_customer ON orders(customer_id); EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345; -- Output: Index Scan using idx_orders_customer on orders -- (cost=0.56..54.78 rows=12 width=156) -- actual time=0.048..0.112 rows=12 loops=1

Composite Indexes — Column Order Matters

A composite index covers multiple columns. The order of columns in the index definition is critical. Consider an index on (last_name, first_name, date_of_birth):

Queries filtering on last_name alone: can use the index
Queries filtering on last_name AND first_name: can use the index
Queries filtering on first_name alone: cannot use the index (violates the left-prefix rule)
Queries filtering on date_of_birth alone: cannot use the index

Composite index rules -- Index: (last_name, first_name, city) -- These queries CAN use the index: WHERE last_name = 'Smith' WHERE last_name = 'Smith' AND first_name = 'John' WHERE last_name = 'Smith' AND first_name = 'John' AND city = 'NYC' -- These queries CANNOT use the index: WHERE first_name = 'John' -- skips first column WHERE city = 'NYC' -- skips first two columns WHERE first_name = 'John' AND city = 'NYC' -- skips first column

Design Rule: Equality Before Range

In a composite index, put columns used in equality conditions (=) before columns used in range conditions (>, <, BETWEEN). For example, if you filter on status = 'active' AND created_at > '2026-01-01', the index should be (status, created_at) — not (created_at, status).

Clustered vs Non-Clustered Indexes

These two index types differ in how they relate to the physical storage of table data:

Property	Clustered Index	Non-Clustered Index
Physical data order	Table rows stored in index order	Separate structure; rows anywhere on disk
Count per table	One only	Many (up to hundreds)
Range query speed	Fastest (rows are adjacent on disk)	Slower (random I/O to fetch rows)
MySQL InnoDB	Always the PRIMARY KEY	All other indexes
PostgreSQL	No true clustered; use CLUSTER command	Default heap table + separate indexes
Row lookup	No extra step — data IS the index leaf	Index lookup + row fetch (two steps)

Index Gotchas and Common Mistakes

Functions on indexed columns kill index usage

Index-defeating queries -- BAD: function on indexed column — index not used WHERE YEAR(created_at) = 2026 WHERE LOWER(email) = 'user@example.com' WHERE TRIM(phone) = '555-1234' -- GOOD: rewrite to use the index WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01' -- For LOWER: create a functional index instead CREATE INDEX idx_email_lower ON users(LOWER(email));

NULL values and indexes

In PostgreSQL, NULL values ARE stored in B-tree indexes. In MySQL, NULLs are also indexed. However, queries like WHERE column IS NULL may or may not use the index depending on the database and NULL distribution. A partial index can be useful: CREATE INDEX idx_pending ON orders(id) WHERE status IS NULL.

Too Many Indexes Will Hurt Write Performance

Every index must be kept up to date on INSERT, UPDATE, and DELETE. A heavily indexed table (15+ indexes) on a high-write workload can spend more time maintaining indexes than doing actual data work. Regularly audit unused indexes and drop them. In PostgreSQL, use pg_stat_user_indexes to see which indexes have never been scanned.

Covering Indexes — Avoid Heap Lookups

A covering index includes all columns needed by a query — so the database never needs to fetch the actual row. It gets everything it needs from the index itself. This is the fastest possible read path.

Covering index example -- Query: frequently run report SELECT customer_id, total_amount, status FROM orders WHERE status = 'pending' AND created_at > '2026-01-01'; -- Covering index includes all columns the query touches: CREATE INDEX idx_orders_covering ON orders(status, created_at, customer_id, total_amount); -- Now the database reads ONLY the index — never touches the main table (heap)

Index Maintenance

Over time, B-tree indexes can become fragmented as rows are inserted and deleted. Periodic maintenance improves performance:

PostgreSQL: VACUUM ANALYZE reclaims dead tuples and updates statistics. REINDEX rebuilds a fragmented index.
MySQL: OPTIMIZE TABLE rebuilds the table and its indexes. ANALYZE TABLE updates index statistics.
Auto-vacuum: PostgreSQL runs auto-vacuum by default. Ensure it is not blocked on large tables.

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

The workflow or formula is tested directly in the tool and compared against independent reference examples.
Examples are kept practical so readers can verify the result without hidden assumptions.
Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Database Indexing

What is a database index?

What is a B-tree index?

When should I add an index?

What is a composite index?

What is the difference between a clustered and non-clustered index?

Can too many indexes hurt performance?