Indices on computed columns

In a previous post I showed you a basic indexing strategy and the huge benefits of indexed views. Now I would like to draw your attention to another useful optimization method: creating indices on computed columns.

Indexing for denormalization

As you have probably noticed, the indices that are most useful for speeding up queries contain duplicate data; that is the case with non-clustered indices and with indexed views alike. In my opinion, indices are the first and quite frequently the best option for denormalizing a database, for a couple of reasons:

1. They are easy to create and maintain.

2. SQL Server will take care of data synchronization for you and will do this as fast as possible.

3. An administrator can add those indices without worrying about breaking any database or client-side functionality.

However, first you have to somehow add this redundant data to your tables, and to do so you have two basic choices: you can add a regular column, populate it with a trigger and put an index on it, or you can add a computed column (just an expression) and index it. The thing is that only the second option allows you to take full advantage of indices, as you are going to see shortly.
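In a nutshell, the two options look like this (a preview sketch of mine; the hypothetical dbo.SomeOrders table stands in for the real tables built below, and you would pick one variant or the other, not both):

-- Option 1: a regular column; it can be indexed, but a trigger
-- (not shown here) must keep it in sync with the other columns
ALTER TABLE dbo.SomeOrders ADD LineTotal decimal(18,2) NULL;

-- Option 2: a computed column, just an expression; no trigger needed
ALTER TABLE dbo.SomeOrders ADD LineTotal AS ((SubTotal + TaxAmount) * OrderQty);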

A baseline

To show you the power of indices on computed columns, let me create a sample database and switch it to the SIMPLE recovery model:

CREATE DATABASE [test]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'test', 
FILENAME = N'C:\SQL\test.mdf' , 
SIZE = 102400KB , FILEGROWTH = 10240KB )
 LOG ON 
( NAME = N'test_log', FILENAME = N'C:\SQL\test_log.ldf' , 
SIZE = 102400KB , FILEGROWTH = 10240KB )
GO

ALTER DATABASE [test] SET RECOVERY SIMPLE
GO
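To double-check the switch, you can query sys.databases (a quick sanity check of my own, not part of the original script):

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'test';
GO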


In this database I’m going to create the simplest table possible — it will only have four columns: a primary key and three columns needed to compute a line total:

USE test
GO

CREATE TABLE dbo.SellingPrice
(
    SellingPriceID int IDENTITY(1,1) 
      CONSTRAINT PK_SellingPrice PRIMARY KEY,
    SubTotal decimal(18,2) NOT NULL,
    TaxAmount decimal(18,2) NOT NULL,
    OrderQty int NOT NULL
);
GO


Now let me insert 120 thousand or so rows and measure the time it takes — this will be the first part of our baseline:

SET STATISTICS TIME ON
GO

INSERT INTO dbo.SellingPrice (SubTotal,TaxAmount,OrderQty)
SELECT [UnitPrice], [UnitPrice]*.23, [OrderQty]
FROM [AdventureWorks2008R2].Sales.SalesOrderDetail
GO
-----
SQL Server Execution Times:
   CPU time = 234 ms,  elapsed time = 229 ms.

(121317 row(s) affected)


This load took about 0.2 seconds.

To create the second part of our baseline we have to measure query time — for example, how long does it take to get orders over 20 000?

SELECT *
FROM dbo.SellingPrice
WHERE (SubTotal + TaxAmount ) * OrderQty >20000
GO
----
(90 row(s) affected)

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 20 ms.


About 0.02 seconds. So right now the load is blazing fast (0.2 seconds), but the query is far from it: to return just 90 rows, SQL Server had to scan the entire table, burning 16 milliseconds of CPU on a table that is still tiny. On a bigger table this would only get worse.
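If you want to see why the query is so expensive (again, a check of my own, not from the original post), turn on STATISTICS IO and rerun it. The plan is a clustered index scan, so SQL Server has to read every data page of the table just to find those 90 rows:

SET STATISTICS IO ON
GO

SELECT *
FROM dbo.SellingPrice
WHERE (SubTotal + TaxAmount) * OrderQty > 20000
GO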

Why you should avoid triggers at all costs

The first attempt at denormalizing our data will use a regular column and a trigger. First, we need a second version of our table, this time with an extra LineTotal column:

CREATE TABLE dbo.SellingPriceV2
(
    SellingPriceID int IDENTITY(1,1) 
      CONSTRAINT PK_SellingPriceV2 PRIMARY KEY,
    SubTotal decimal(18,2) NOT NULL,
    TaxAmount decimal(18,2) NOT NULL,
    OrderQty int NOT NULL,
    LineTotal decimal(18,2) NULL
);
GO


Then, we need a trigger that will compute line totals automatically:

CREATE TRIGGER TR_SellingPrice_InsertUpdate
ON dbo.SellingPriceV2
AFTER INSERT, UPDATE AS BEGIN
  SET NOCOUNT ON;
  -- Recompute LineTotal for every affected row; joining to the
  -- "inserted" pseudo-table keeps the trigger multi-row safe.
  UPDATE sp
  SET sp.LineTotal = (sp.SubTotal + sp.TaxAmount) * sp.OrderQty
  FROM dbo.SellingPriceV2 AS sp
  INNER JOIN inserted AS i
    ON sp.SellingPriceID = i.SellingPriceID;
END;
GO


Finally, a nonclustered index has to be created on this LineTotal column:

CREATE INDEX SellingPriceV2LineTotal
ON dbo.SellingPriceV2(LineTotal)
GO


Time to load the same data again:

INSERT INTO dbo.SellingPriceV2 (SubTotal,TaxAmount,OrderQty)
SELECT [UnitPrice], [UnitPrice]*.23, [OrderQty]
FROM [AdventureWorks2008R2].Sales.SalesOrderDetail
GO
----
SQL Server Execution Times:
   CPU time = 1544 ms,  elapsed time = 2547 ms.

 SQL Server Execution Times:
   CPU time = 2137 ms,  elapsed time = 3233 ms.

(121317 row(s) affected)


Note that this time inserting the same 120 thousand rows took more than 3 seconds, almost 15 times longer than the baseline (STATISTICS TIME reports two sets of execution times here: one for the trigger's UPDATE and one for the INSERT statement as a whole).

So, what did we gain for this price? Let's run the same query again:

SELECT *
FROM dbo.SellingPriceV2
WHERE (SubTotal + TaxAmount ) * OrderQty >20000;
GO
----
(90 row(s) affected)

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 20 ms.


Surprisingly, nothing. The query execution plan is exactly the same as before, and as a result the query cost and execution time didn't change either.
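You can verify this yourself by displaying the textual plan (my addition, not shown in the original post); it is still a clustered index scan, and the SellingPriceV2LineTotal index is nowhere to be seen:

SET SHOWPLAN_TEXT ON
GO

SELECT *
FROM dbo.SellingPriceV2
WHERE (SubTotal + TaxAmount) * OrderQty > 20000
GO

SET SHOWPLAN_TEXT OFF
GO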

Only if we changed our queries would we benefit from this index on LineTotal; namely, we would have to reference the new column explicitly.
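Such a query would look more or less like this (my reconstruction; the original post showed it as a screenshot together with its execution plan):

SELECT *
FROM dbo.SellingPriceV2
WHERE LineTotal > 20000
GO

With this predicate the optimizer can finally seek on the SellingPriceV2LineTotal index instead of scanning the whole table, but every existing query would have to be rewritten to get that benefit.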


Indices on computed columns come to the rescue

The previous attempt was a disaster: we managed to tremendously slow down inserts and updates and got no improvement in query performance, at least not without rewriting our queries.

Now, let me create the third version of our table, this time with LineTotal as a computed column:

CREATE TABLE dbo.SellingPriceV3
(
    SellingPriceID int IDENTITY(1,1) 
      CONSTRAINT PK_SellingPriceV3 PRIMARY KEY,
    SubTotal decimal(18,2) NOT NULL,
    TaxAmount decimal(18,2) NOT NULL,
    OrderQty int NOT NULL,
    LineTotal AS ((SubTotal + TaxAmount) * OrderQty) 
);
GO
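A side note of my own (not from the original post): because the expression uses only precise, deterministic decimal and int arithmetic, this column is indexable exactly as defined; it does not have to be PERSISTED. If you also wanted the value physically stored in every row, you could add a persisted variant (LineTotalPersisted is a hypothetical name):

-- Hypothetical variant: the value is materialized in the row
-- and kept up to date by SQL Server itself
ALTER TABLE dbo.SellingPriceV3
ADD LineTotalPersisted AS ((SubTotal + TaxAmount) * OrderQty) PERSISTED;
GO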


All that is left to do is index this column:

CREATE INDEX SellingPriceV3ExtendedAmount
ON dbo.SellingPriceV3(LineTotal)
GO


Time to put our solution to the test. First, we are going to insert the same data into this table again:

INSERT INTO dbo.SellingPriceV3 (SubTotal,TaxAmount,OrderQty)
SELECT [UnitPrice], [UnitPrice]*.23, [OrderQty]
FROM [AdventureWorks2008R2].Sales.SalesOrderDetail
GO
-----
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.

 SQL Server Execution Times:
   CPU time = 639 ms,  elapsed time = 775 ms.


As you can see, it took about 0.7 seconds, so it is roughly 3 times slower than our baseline, but still about 4 times faster than the version with a trigger.

Lesson learned: triggers are really, really slow, and if you can, you should replace them with computed columns. Besides performance, computed columns are also a much cleaner and more error-proof solution than triggers.
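To illustrate the "error-proof" part, here is a sketch of my own (this anti-pattern is not from the original post): a very common trigger bug is to assume that the inserted pseudo-table holds exactly one row. The trigger below compiles, works for single-row inserts, and then silently writes wrong values as soon as somebody inserts a batch; a computed column simply cannot get out of sync this way:

-- Buggy pattern, for illustration only; do NOT use
CREATE TRIGGER TR_SellingPrice_Buggy
ON dbo.SellingPriceV2
AFTER INSERT AS BEGIN
  DECLARE @LineTotal decimal(18,2);
  -- With a multi-row INSERT this assignment picks an arbitrary row...
  SELECT @LineTotal = (SubTotal + TaxAmount) * OrderQty FROM inserted;
  -- ...and every inserted row then gets that single value
  UPDATE sp
  SET sp.LineTotal = @LineTotal
  FROM dbo.SellingPriceV2 AS sp
  INNER JOIN inserted AS i ON sp.SellingPriceID = i.SellingPriceID;
END;
GO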

The last task is to test our query performance:

SELECT *
FROM dbo.SellingPriceV3
WHERE (SubTotal + TaxAmount ) * OrderQty >20000
GO
-----
(90 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.


Now it takes less than a millisecond, which is exactly the kind of response time an OLTP database should deliver. Note that the query didn't have to be changed in any way: SQL Server is smart enough to recognize that the exact same expression is used in the query and in the computed column definition, so it uses the index on LineTotal automatically.
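And since LineTotal is now a real column of the table, you can also reference it by name; the same index should serve this form of the query just as well (a variation of mine, not from the original post):

SELECT *
FROM dbo.SellingPriceV3
WHERE LineTotal > 20000
GO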

See you next time



