Indexed Views. Part 1. The Benefits

Indices serve three main purposes:

1. They are used to limit (or minimize) the amount of data being read to execute queries.

2. They can eliminate costly operators (like sort or aggregate) from query execution plans.

3. Finally, they can tremendously improve concurrency.

On the other hand, indices aren’t free — not only they take additional space (on disk, in memory and in backups), but also they have to be automatically synchronized with tables. This means additional overhead for each data modification, in terms of time and transaction logging.

In this blog series we are going to investigate two first benefits and the drawbacks of indices, including indexed views — the indices created on views, not base tables.

Basic indexing strategy

Our goal is to improve as much as possible performance of a simple grouping query. The base table (as usually I use AdventureWorks2008R2 sample database) already has three not so useful for this query indices:

USE [AdventureWorks2008R2]
GO

EXEC sp_helpindex '[Sales].[SalesOrderDetail]';
GO
-----
index_name                    index_description                                index_keys
AK_SalesOrderDetail_rowguid    nonclustered, unique located on PRIMARY        rowguid
IX_SalesOrderDetail_ProductID    nonclustered located on PRIMARY                ProductID
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID    clustered, unique, primary key located on PRIMARY    SalesOrderID, SalesOrderDetailID

 

To do the comparison, we will need not only the statistics IO, but also the query cost and memory grant values — both available in the actual execution plan. Let me execute our test query:

SET STATISTICS IO ON
--Turn on actual execution plan (Ctrl+M)
GO
SELECT [ProductID], SUM([UnitPrice]) AS TotalPrice
FROM [Sales].[SalesOrderDetail]    
GROUP BY [ProductID]
ORDER BY [ProductID]
OPTION (MAXDOP 1);
GO
--Cost: 1.6395
--Memory grant: 1632
--Reads: 1241

 

image

As you can see, right now SQL Server has to:

1. Scan the whole table — generally a bad idea.

2. Compute a sum using a hash aggregate — bad in OLTP environments for a couple of reasons:

a. This operator is a blocking one, which means that the first row will be returned to the next (parent) operator no sooner than the whole input set was processed.

b. A hash match requires a memory grant to store intermediate results (buckets) in it.

c. This operator is computation heavy.

3. Sort the results. Note that sorting is done after grouping, so the cost of doing this is minimal. Still, sort is an operator to be avoided.

The first attempt to improve a query performance through nonclustered indices

The first nonclustered index will be created just to cover our query. To do this we have to store in it all necessary columns, in this case UnitPrice and ProductID. Note the index keys order:

CREATE INDEX Idx1
ON [Sales].[SalesOrderDetail] ([UnitPrice],[ProductID]);
GO

 

When we execute the query again, we will notice that the execution plan has not changed, however the overall cost dropped from 1.6 to 1 — all due to smaller number of reads:

SELECT [ProductID], SUM([UnitPrice]) AS TotalPrice
FROM [Sales].[SalesOrderDetail]    
GROUP BY [ProductID]
ORDER BY [ProductID]
OPTION (MAXDOP 1);
GO
--Cost: 1.01652
--Memory grant: 1632
--Reads: 400

 

image

Lesson learned — SQL Server always uses nonclustered, covering indices, even just for limiting the number of pages being read. Nevertheless, this kind of indices are in general not recommended — we can gain a lot more for the same price.

POC indices

The index key columns obviously matters. The best kind of indices for grouping queries are called POC indices, where POC stands for Partitioning, Ordering, and Covering. This means that the first key column(s) should be the one(s) used for partitioning (ProductID in our case), the second one(s) — for ordering (again ProductID in this very simple example) and the last one(s) — for covering (UnitPrice):

CREATE INDEX Idx2
ON [Sales].[SalesOrderDetail] ([ProductID],[UnitPrice]);
GO

 

If we now execute the sample query, we should notice two things:

1. The query cost has further dropped to 0.5

The query execution plan has significantly changed — because the rows are read in order defined by ProductIDs, hash aggregate could be replace with stream one, and the sort can be omitted altogether:

SELECT [ProductID], SUM([UnitPrice]) AS TotalPrice
FROM [Sales].[SalesOrderDetail]    
GROUP BY [ProductID]
ORDER BY [ProductID]
OPTION (MAXDOP 1);
GO
--Cost: 0.500
--Memory grant: NONE
--Reads: 400

 

image

Generally, stream aggregate is a way better (in OLTP systems) then a hash one — it is a non-blocking operator, and it does not require a memory grant. But still the aggregation has to be done during queries execution. To do better we will need a feature that is available in all SQL Server Edition — indexed views.

Lesson learned — almost all nonclustered indices should be composite ones. In this kind of indices key order matters. And the very efficient indexing strategy is to create POC indices.

Indexed views

Before we will create necessary views (in all edition except Enterprise two views will be needed), we really should check cardinality of the GROUP BY clause. Indexed views performs very poorly (in terms of data modifications) when the number of groups is very high and when there is too much rows per groups. For example, for the clients table the overhead probably will be too big to be acceptable for:

1. A query that does the grouping on countries (too many rows in each group).

2. And for a query that does the grouping on last names (too many groups).

In this case, the query seems to be suitable for this kind of tuning: there are less than 300 groups with less than 500 rows in each, on average:

SELECT COUNT (DISTINCT [ProductID]) AS ProductIDs, COUNT(*) AS Rows
FROM  [Sales].[SalesOrderDetail];
GO
-----
ProductIDs    Rows
266            121317

 

First, let me create a view that can be indexed. This kind of view has to:

1. Be created WITH SCHEMABINDING option (one of the consequences is that the base table or tables cannot be modified in a way that would affect the view definition).

2. Have some SET option correctly set.

3. Be deterministic.

4. Have COUNT_BIG(*) included.

For the complete list of requirement, see “Create Indexed Views” topic in BOL.

CREATE VIEW [Sales].[_SumUnitPriceByProducts]
WITH SCHEMABINDING
AS
SELECT [ProductID], SUM([UnitPrice]) AS TotalPrice, COUNT_BIG(*) AS NumberOfProducts
FROM [Sales].[SalesOrderDetail]    
GROUP BY [ProductID];
GO

 

Now we can materialize this view by creating a unique, clustered index on it:

CREATE UNIQUE CLUSTERED INDEX Idx3
ON [Sales].[_SumUnitPriceByProducts] ([ProductID]);
GO

 

Note, that a ProductID is the perfect candidate for the index key — it is used for grouping (so it will be unique) as well as for ordering.

If you are using Enterprise edition, the work is done: the query execution plan contains only one operator (clustered index scan), so we successfully eliminated all costly operators. In addition, the number of reads dropped to 2. In result, the query cost in now 0.003, so it’s almost 550 times cheaper then the original one:

SELECT *
FROM [Sales].[_SumUnitPriceByProducts]
ORDER BY [ProductID];
GO
--Cost: 0.003
--Memory grant: NONE
--Reads: 2

 

image

But all other editions will happily ignore our index, unless the NOEXPAND will be used in a query. Because changing a client code is tedious and sometimes even unachievable, we have to put this index on a server side. And the best place to use it is inside another (wrapper) view, like this:

CREATE VIEW [Sales].[SumUnitPriceByProducts]
AS 
SELECT *
FROM [Sales].[_SumUnitPriceByProducts] WITH (NOEXPAND)
ORDER BY [ProductID];
GO

 

Lesson learned — you can easily overcome one of the non-enterprise edition limitation (at least, in very limited scope) and use indexed views in your application. Still, before you get overexcited, please read upcoming part of this series and find out the overhead of indexed views.

Share this article ...

Google Plus
Ihren LinkedIn Kontakten zeigen



This entry was posted in Administration, Performance and tagged by Marcin Szeliga. Bookmark the permalink.
Marcin Szeliga

About Marcin Szeliga

Since 2006 invariably awarded Microsoft Most Valuable Professional title in the SQL category. A consultant, lecturer, authorized Microsoft trainer with 15 years’ experience, and a database systems architect. He prepared Microsoft partners for the upgrade to SQL Server 2008 and 2012 versions within the Train to Trainers program. A speaker at numerous conferences, including Microsoft Technology Summit, SQL Saturday, SQL Day, Microsoft Security Summit, Heroes Happen {Here}, as well as at user groups meetings. The author of many books and articles devoted to SQL Server.

Leave a Reply

Connect with:
  • This field its required.
  • This field its required.
    • Message is required