Database Indexing Do’s and Don’ts
A new database with few records won’t suffer from poor indexing, but if you give it time, and you will soon see sluggishness in your application. This sluggishness only gets worse as you continue to add records to a poorly designed and poorly indexed database. The right indexes can make a huge difference. Add an index to the right column, and your queries go from running in a few seconds (terrible performance) to running in less than a millisecond (excellent performance).
You can over-index a database, however, so you shouldn’t just place an index on every column. Indexes sort records, making search queries much faster since it takes less time for the database engine to find the data it’s seeking. The trick is to set indexes on the right columns and place keys on the right columns.
Here are the do’s and don’ts for indexing.
Do Set Indexes on Linked Columns
“Linked columns” is the layman’s term for foreign keys. When you design your database, you connect data sets by setting primary and foreign key columns on each table. The primary key is always indexed and must be unique, or the database engine will give you an error. Tables linked to any table with a primary key contains a column with the primary key value, and this column is known as the foreign key.
Because these columns are not indexed by default like the primary key column, you must manually set indexes on them. If you don’t, queries on the main table might be fast, but once you query on a joined table with no index, you will see severe performance issues. You can fix the problem by adding an index to the foreign key column.
Do Set Indexes on Columns Commonly Used in a WHERE Clause
The WHERE clause is a statement in SQL that filters out results based on criteria you use in your queries. The filters may be set with input from your users, or with static variables in your procedures.
Database developers are urged to query on columns that have indexes, but what happens when you have no indexes on a table other than the primary key column? The developer works with your DBA to create an index on the table. Most administrators ask for testing first, but a DBA can determine if an index on a commonly used column is necessary.
For instance, you may have an order table, and most often use the price column for your reports. This column wouldn’t typically be indexed, but if you have several queries using it, then you could improve performance by adding an index to it.
Do Set Naming Standards for Indexes
Index names must be unique within the database, but you don’t want developers or DBAs setting names to anything that comes to mind. Instead, set naming standards. For instance, in a Microsoft SQL Server database, it’s common to start clustered primary key index names with a prefix of “PK_” and then the table name. This name indicates to anyone reviewing indexes that it’s a primary key-clustered index.
Secondary, non-dustered indexes set on other columns have a prefix of “IX_” with the primary key table and the foreign key table name. This lets the reviewer know which two tables are linked and that the two columns have indexes.
You can make up your naming standards, but make sure each DBA follows the standards. It gets confusing when you name indexes to anything that comes to mind because DBAs can lose track of which ones are critical to performance and which ones can be safely discarded.
Don’t Add an Index Just to Add an Index
Some people think that performance issues immediately mean an index is needed. While it’s common for performance issues to be solved using indexes, it’s not always the case. Some performance issues are hardware-based while others are due to poor programming or even just bandwidth. Always test indexes first and understand what is happening before you add an index to a table. Remember that too many indexes can lead to even worse performance. You can do more harm than good with an over-indexed table.
Do Add an Index to Sorting and Grouping Columns
It’s common for developers to sort records based on a particular column. The developer can do this on the web server, but most prefer to let the database handle any sorting and grouping. The SORT and GROUP BY statements are used to sort and group in SQL. Check SQL procedures for these statements and check that they have an index on the appropriate columns.
Do Use Multiple Columns for Primary Keys If Necessary
Primary keys are set automatically as clustered indexes, but the data in the column must be unique. For instance, you don’t want more than one customer to have the same ID, so you must make sure this column data is always unique. Some tables won’t have columns that offer unique data, but you can use what is called a composite key. A composite key is one that uses dual columns as the primary key. It could be any two columns that you know will always be unique. Combine a time and date column with another, and you are sure to have a unique value. Just make sure to use the time down to milliseconds.
Don’t Add an Index to a Table That is About to Receive a Huge Data Dump
An index can slow the import of large amounts of data. Indexes work well for SELECT queries, but not so much for INSERT statements. Because you do imports on large reporting tables, you should have the opportunity to remove the index, add the data, and then recreate the index after the import. Adding an index before the import is a big mistake made by new DBAs and developers.
What’s Next After Indexing?
Before you deploy any indexes to production, you should always test them. You can test them on a server that emulates the production environment. Run a large query on the tables and use a data analyzer to identify any performance issues. Some DBAs create indexes in production because it’s clear that one is needed. It’s risky but sometimes necessary. You should always see a performance boost after indexing even if it’s on a small table