Introduction
There are many ways to measure a database’s
performance. We can look at its overall availability; that is: are our
users able to get the information they need when they need it? We can
consider its concurrency; how many users can query the database at one
time? We can even watch the number of transactions that occur per second
to gauge its level of activity.
Often for our users however, it comes down to one very simple,
but somewhat nebulous, measurement – perceived responsiveness. Or put
another way, when a user issues a query does he receive his results
without having to wait too long? Of course, there are many factors that influence this. There is the hardware itself and how its configured. There is the physical implementation of the database; the logical design of the database; and even how the queries themselves are written.
In this article however, I will focus on one specific technique that can be used to improve performance – creating covering indexes.
An indexing refresher
Before taking on the challenge of covering indexes, let’s take a moment for a brief but very relevant review of index types and terminology.Indexes allow SQL Server to quickly find rows in a table based on key values, much like the index of a book allows us to easily find the pages that contain the information we want. There are two types of indexes in SQL Server, clustered and nonclustered indexes.
Clustered Indexes
A clustered index is an index whose leaf nodes, that is the lowest level of the index, contain the actual data pages of the underlying table. Hence the index and the table itself are, for all practical purposes, one and the same. Each table can have only one clustered index. For more information on clustered indexes, see the Books Online topic "Clustered Index Structures" (http://msdn.microsoft.com/en-us/library/ms177443.aspx).When a clustered index is used to resolve a query, SQL Server begins at the root node for the index and traverses the intermediate nodes until it locates the data page that contains the row it's seeking.
Many database designs make prolific use of clustered indexes. In fact, it is generally considered a best practice to include a clustered index on each table; of course that's painting with a very broad brush and there will most assuredly be exceptions. For more information about the benefits of clustered indexes, see the SQL Server Best Practices Article entitled "Comparing Tables Organized with Clustered Indexes versus Heaps" on TechNet .
Lets’ consider an example. In the Figure 1, the Customers table has a clustered index defined on the Customer_ID column. When a query is executed that searches by the Customer_ID column, SQL Server navigates through the clustered index to locate the row in question and returns the data. This can be seen in the Clustered Index Seek operation in the query’s Execution Plan.
Figure 1. Clustered index execution plan
CREATE CLUSTERED INDEX [ix_Customer_ID] ON [dbo].[Customers] (
[Customer_ID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Nonclustered indexes
Nonclustered indexes use a similar methodology to store indexed data for tables within SQL Server. However in a nonclustered index, the lowest level of the index does not contain the data page of the table. Instead, it contains the information that allows SQL Server to navigate to the data pages it needs. For tables that have a clustered index, the leaf node of the nonclustered index contains the clustered index keys. In the previous example, the leaf node of a nonclustered index on the Customers table would contain the Customer_ID key.If the underlying table does not have a clustered index (this data structure is known as a heap), the leaf node of the nonclustered index contains a row locator to the heap data pages.
In the following example, a nonclustered composite index has been created on the Customers table as described in the following Transact-SQL code.
CREATE NONCLUSTERED INDEX [ix_Customer_Name] ON [dbo].[Customers] (
[Last_Name] ASC,
[First_Name] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Figure 2. Nonclustered index execution plan
Using Nonclustered indexes
As illustrated in the preceding example, nonclustered indexes may be employed to provide SQL Server with an efficient way to retrieve data rows. However, under some circumstances, the overhead associated with nonclustered indexes may be deemed too great by the query optimizer and SQL Server will resort to a table scan to resolve the query. To understand how this may happen, let's examine the preceding example in more detail.Key Lookups
Looking again at the graphical query execution plan depicted in Figure 2, notice that the plan not only includes an Index Seek operation that uses the ix_Customer_Name nonclustered index, it also includes a Key Lookup operation.SQL Server uses a Key Lookup to retrieve non-key data from the data page when a nonclustered index is used to resolve the query. That is, once SQL Server has used the nonclustered index to identify each row that matches the query criteria, it must then retrieve the column information for those rows from the data pages of the table.
Since the leaf node of the nonclustered index contains the key value information for the row, SQL Server must navigate through the clustered index to retrieve the columnar information for each row of the result set. In this example, SQL Server choose to do this using a nested loop join type.
This query produced a result of 1,000 rows and nearly 100% of the expense of the query was directly attributed to the Key Lookup operation. Digging a little deeper into the Key Lookup operation, we can see why.
Figure 3. Key lookup operation properties
Resorting to Table Scans
As the number of rows in the result set increases, so does the number of Key lookups. At some point, the cost associated with the Key Lookup will outweigh any benefit provided by the nonclustered index.To illustrate this point, let's modify the query so that it retrieves more rows. Figure 4. depicts the new query along with the actual execution plan used to resolve the query.
Figure 4. Resorting to a table scan
Figure 5 shows some additional properties of the table scan.
Figure 5. Properties for the table scan operation
One may be tempted to force SQL Server to resolve the query using the nonclustered index by supplying a table hint as shown in the following illustration.
Figure 6. Using a table hint to resolve the query
Figure 7 shows the properties for the Key Lookup when we forced SQL Server to use the nonclustered ix_Customer_Name index. The Estimated Operator Cost for the Key Lookup is 57.02 compared to 12.17 for the Clustered Index Scan shown in Figure 5. Forcing SQL Server to use the index significantly affected performance, and not for the better!
Figure 7. Properties of the Key Lookup for the table hint
Covering Indexes
So, if Key Lookups can be detrimental to performance during query resolution for large result sets, the natural question is: how can we avoid them? To answer that question, let's consider a query that does not require a Key Lookup.Let's begin by modifying our query so that it no longer selects the Email_Address column. Figure 8 illustrates this updated query along with its actual execution plan.
Figure 8. Reduced query to eliminate the Key Lookup
Figure 9. Reduced query to eliminate the Key Lookup properties
The Estimated Operator Cost went down dramatically, from 12.17 in Figure 5 to 0.22 in Figure 9. We could also look at the Logical and Physical Read characteristics by setting STATISTICS IO on, however for this demonstration it’s sufficient to view the Operator Costs for each operation.
The observed improvement is due to the fact that the nonclustered index contained all of the required information to resolve the query. No Key Lookups were required. An index that contains all information required to resolve the query is known as a "Covering Index"; it completely covers the query.
Using the Clustered Key Column
Recall that if a table has a clustered index, those key columns are automatically part of the nonclustered index. So, the following query is a covering query by default.Figure 10. Covering index using the clustered index keys
Adding Key Columns to the Index
To increase the likelihood that a nonclustered index is a covering index, it is tempting to begin adding additional columns to the index key. For example, if we regularly query the customer's middle name and telephone number, we could add those columns to the ix_Customer_Name index. Or, to continue with our previous example, we could the Email_Address column to the index as shown in the following Transact-SQL code.CREATE NONCLUSTERED INDEX [ix_Customer_Email] ON [dbo].[Customers] (
[Last_Name] ASC,
[First_Name] ASC,
[Email_Address] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Furthermore, there are some built in limitations for indexes. Specifically, indexes are limited to 16 key columns or 900 bytes, whichever comes first, in both SQL Server 2005 and SQL Server 2008. And some datatypes cannot be used as index keys, varchar(max) for instance.
Including Non-Key columns
SQL Server 2005 provided a new feature for nonclustered indexes, the ability to include additional, non-key columns in the leaf level of the nonclustered indexes. These columns are technically not part of the index, however they are included in the leaf node of the index. SQL Server 2005 and SQL Server 2008 allow up to 1023 columns to be included in the leaf node.To create a nonclustered index with included columns, use the following Transact-SQL syntax.
CREATE NONCLUSTERED INDEX [ix_Customer_Email] ON [dbo].[Customers] (
[Last_Name] ASC,
[First_Name] ASC)INCLUDE ( [Email_Address]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Figure 11. Covering query with included columns
Figure 12. Execution properties for the covering query with included columns
From this execution plan, we can see that the Estimate Operator Cost decreased from 12.17 in Figure 5 to 0.41. Including an additional non-key column has dramatically improved the performance of the query.
For more information about including non-key columns in a nonclustered index, See "Index with Included Columns" in Books Online .
No comments:
Post a Comment