In previous posts I showed you a basic indexing strategy and the huge benefits of indexed views. Now, I would like to draw your attention to another useful optimization method: creating indices on computed columns.
Indexing for denormalization
As you have probably noticed, the indices that are most useful for speeding up queries contain duplicate data; that is the case with non-clustered indices as well as with indexed views. In my opinion, indices are the first and quite frequently the best option for denormalizing a database, for a couple of reasons:
1. They are easy to create and maintain.
2. SQL Server will take care of data synchronization for you and will do this as fast as possible.
3. An administrator can add those indices without worrying about breaking any database or client-side code functionality.
However, first you have to somehow add this redundant data to your tables. To do so you have two basic choices: you can add a regular column, populate it with a trigger, and put an index on it, or you can add a computed column (just an expression) and index that. The catch is that only the second option lets you take full advantage of indices, as you are going to see shortly.
To show you the power of indices on computed columns, let me create a sample database and switch it to the SIMPLE recovery model:
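The original snippet was not preserved; a minimal sketch, assuming a database named IndexingDemo:

```sql
-- Create a scratch database for the demo (the name is an assumption).
CREATE DATABASE IndexingDemo;
GO
-- SIMPLE recovery keeps the transaction log from growing during bulk loads.
ALTER DATABASE IndexingDemo SET RECOVERY SIMPLE;
GO
USE IndexingDemo;
GO
```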
In this database I'm going to create the simplest table possible: it will only have four columns, a primary key and three columns needed to compute a line total:
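A sketch of such a table; the table and column names (other than LineTotal, which appears later in the post) are assumptions:

```sql
CREATE TABLE dbo.Orders (
    OrderId   int IDENTITY PRIMARY KEY,  -- clustered primary key
    UnitPrice money NOT NULL,            -- the three columns needed
    Quantity  smallint NOT NULL,         -- to compute a line total
    Discount  decimal(4,2) NOT NULL
);
```

Precise types (money, smallint, decimal) are used deliberately, so that an expression over these columns can later be indexed without being marked PERSISTED.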
Now let me insert 120 thousand or so rows and measure the time it takes; this will be the first part of our baseline:
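The load script was not preserved; one way to generate roughly 120 thousand rows and time the insert (object names as assumed above):

```sql
SET STATISTICS TIME ON;

-- A cross join of a system catalog view easily yields 120 000+ rows.
INSERT INTO dbo.Orders (UnitPrice, Quantity, Discount)
SELECT TOP (120000)
       ABS(CHECKSUM(NEWID())) % 1000 + 1,  -- pseudo-random unit price
       ABS(CHECKSUM(NEWID())) % 50 + 1,    -- pseudo-random quantity
       0.10                                -- flat 10% discount
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

SET STATISTICS TIME OFF;
```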
This load took about 0.2 seconds.
To create the second part of our baseline we have to measure query time; for example, how long does it take to get orders over 20 000?
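Such a query computes the line total on the fly, for example (names as assumed above):

```sql
SET STATISTICS TIME ON;

-- Without any supporting index this requires a full clustered index scan.
SELECT OrderId, UnitPrice * Quantity * (1 - Discount) AS LineTotal
FROM dbo.Orders
WHERE UnitPrice * Quantity * (1 - Discount) > 20000;

SET STATISTICS TIME OFF;
```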
About 0.21 seconds. So right now the load is blazing fast (0.2 seconds), but query performance suffers: selecting just 90 rows took about 0.2 seconds.
Why you should avoid triggers at all costs
The first attempt to denormalize our data will use a regular column and a trigger. First, we need to add a LineTotal column to our table:
Then, we need a trigger that will compute line totals automatically:
Finally, a nonclustered index has to be created on this LineTotal column:
Time to load the same data again:
Note that this time inserting 120 thousand rows took more than 3 seconds, almost 15 times longer.
So, what did we gain for this price?
Surprisingly, nothing. The query execution plan is exactly the same as before, and as a result the query cost and execution time also didn't change.
Only if we changed our queries would we benefit from this index on LineTotal; namely, we would have to reference the new column explicitly:
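For example, rewritten against the trigger-maintained column (names as assumed above):

```sql
-- Filtering on the stored column lets the optimizer seek IX_Orders_LineTotal.
SELECT OrderId, LineTotal
FROM dbo.Orders
WHERE LineTotal > 20000;
```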
Indices on computed columns come to the rescue
The previous attempt was a disaster: we managed to tremendously slow down inserts and updates and got no improvement in query performance, at least not without changing the query.
Now, let me create the third version of our table, this time with LineTotal as a computed column:
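A sketch of that version; the table name is an assumption, and the computed column is just the expression from our query:

```sql
CREATE TABLE dbo.OrdersComputed (
    OrderId   int IDENTITY PRIMARY KEY,
    UnitPrice money NOT NULL,
    Quantity  smallint NOT NULL,
    Discount  decimal(4,2) NOT NULL,
    -- A deterministic, precise expression, so it can be indexed
    -- without being marked PERSISTED.
    LineTotal AS (UnitPrice * Quantity * (1 - Discount))
);
```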
All we need to do now is index this column:
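The index looks exactly like one on a regular column (the index name is an assumption):

```sql
CREATE NONCLUSTERED INDEX IX_OrdersComputed_LineTotal
ON dbo.OrdersComputed (LineTotal);
```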
Time to put our solution to the test. First, we are going to insert the same data into this table again:
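Assuming the same generator as before, pointed at the new table:

```sql
SET STATISTICS TIME ON;

-- Same 120 000-row load; the index on the computed column is
-- maintained by the engine itself, with no trigger involved.
INSERT INTO dbo.OrdersComputed (UnitPrice, Quantity, Discount)
SELECT TOP (120000)
       ABS(CHECKSUM(NEWID())) % 1000 + 1,
       ABS(CHECKSUM(NEWID())) % 50 + 1,
       0.10
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

SET STATISTICS TIME OFF;
```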
As you can see, it took about 0.7 seconds, so it is 3 times slower than our baseline but 4 times faster than it was for the table with a trigger.
Lesson learned: triggers are really, really slow, and if you can, you should replace them with computed columns. Besides the performance gain, computed columns are a much cleaner and more error-proof solution than triggers.
The last task is to test our query performance:
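Crucially, this is the original query, still written against the raw expression rather than the computed column (names as assumed above):

```sql
SET STATISTICS TIME ON;

-- The optimizer matches this expression to the indexed computed column
-- and seeks the nonclustered index instead of scanning the table.
SELECT OrderId, UnitPrice * Quantity * (1 - Discount) AS LineTotal
FROM dbo.OrdersComputed
WHERE UnitPrice * Quantity * (1 - Discount) > 20000;

SET STATISTICS TIME OFF;
```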
Now it takes 0 milliseconds, the only acceptable query time for an OLTP database. Note that the query doesn't have to be changed in any way; SQL Server is smart enough to figure out that exactly the same expression was used in the query and in the computed column definition.
See you next time