The Meaning of Stored Procedures. Part 4 — When plan reuse is a bad thing, or how to deal with the “parameter sniffing problem”

Finally, after three articles in which I did my best to convince you that plan caching and reuse have a massive positive impact on performance (that was the topic of the first article, The Meaning of Stored Procedures. Part 1 — Plan Caching and Reuse is a Good Thing) and showed what can be done to avoid unnecessary recompilations (if you are interested, have a look at these parts: The Meaning of Stored Procedures. Part 2 — avoiding recompilations due to plan stability-related reasons and The Meaning of Stored Procedures. Part 3 — avoiding recompilations due to plan optimality-related reasons), it is time to admit that sometimes plan reuse can lead to severe performance problems.

In this post we are going to investigate the first and probably best-known issue related to reusing execution plans: the “parameter sniffing problem”.

Preface to the problem

Before we get down to the practical ramifications, let me illustrate the root of this problem with a simple query. First of all, let’s create an index on the SubTotal column of the Sales.SalesOrderHeader table (as usual, I am using the AdventureWorks2008R2 database):

USE [AdventureWorks2008R2];
GO

CREATE INDEX idxSalesOrderHeaderSubTotal
ON Sales.SalesOrderHeader ([SubTotal]);
GO

Then, turn on the actual execution plan and run this query:

SELECT * 
FROM Sales.SalesOrderHeader h
JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = 3578.27;
GO

The execution plan contains two clustered index scans and a single merge join, which means that the newly created non-clustered index was not used. Since we are selecting rows based on SubTotal, the analysis should start from the top right operator, the scan of the SalesOrderHeader table:

[Image: properties of the Clustered Index Scan on Sales.SalesOrderHeader, showing an estimated number of rows of 1,551]

As you can see, SQL Server correctly estimated 1,551 rows with a SubTotal equal to 3578.27 and decided to go for a table scan, not for a nonclustered index seek. This assumption led to another one: that there would be more than 120,000 corresponding rows in the SalesOrderDetail table. That estimate is way off track due to an implicit conversion, but this is a topic for a completely different article:

[Image: properties of the Merge Join operator, showing an estimated number of rows of over 120,000]

Since SQL Server assumed that there would be a substantial number of rows to join and saw an opportunity to use a merge join without additional sorting (both tables are clustered on SalesOrderID, so the join inputs arrive already sorted), the chosen plan looks quite good.

Let’s see what execution plan will be chosen for a much more selective SubTotal value, e.g. 1.374:

SELECT * 
FROM Sales.SalesOrderHeader h
JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = 1.374;
GO

As expected, this time SQL Server decided to use our index to seek the rows (as the number of qualifying rows was low, the Key Lookup was cheap enough to be used):

[Image: execution plan with an Index Seek on idxSalesOrderHeaderSubTotal followed by a Key Lookup]

As previously, the number of matching rows from one table was used to estimate the join cardinality. This time the estimated number of rows was less than 4, so a clustered index seek and then a nested loops join operator were picked:

[Image: properties of the Nested Loops join, showing an estimated number of rows below 4]

The point is that SQL Server was able to find the “best” execution plan for each of the different SubTotal values used in the WHERE clause.

But if we modify our query slightly and use a variable instead of a literal, the situation changes:

DECLARE @SubTotal money;
SET @SubTotal = 1.374;
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal;
GO

[Image: the Index Seek operator for the variable-based query, showing a different estimated number of rows]

Please note that we are still searching for a SubTotal equal to 1.374, but this time the estimated number of rows is different. The reason is simple: the variable had no value at compilation and optimization time. As we already know, SQL Server has to assume some number of rows, otherwise it would not be able to compute and compare execution costs.

If we check the statistics created for our index, we will find the number of rows and the density of the SubTotal column:

DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader','idxSalesOrderHeaderSubTotal');
GO

Multiplying the number of rows by the SubTotal density reveals how SQL Server arrived at the estimate of 6.6 rows:

SELECT 31465 * 0.0002106594;
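
To see how far this average-based guess can be from reality, we can count the actual rows for both values (a quick sanity check; the exact counts depend on your copy of AdventureWorks2008R2):

-- Compare the density-based estimate of 6.6 rows with the actual counts
SELECT [SubTotal], COUNT(*) AS ActualRows
FROM Sales.SalesOrderHeader
WHERE [SubTotal] IN (1.374, 3578.27)
GROUP BY [SubTotal];
GO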

This time the error was not big enough to influence the execution plan, but what happens if we change the SubTotal to 3578.27?

DECLARE @SubTotal money;
SET @SubTotal = 3578.27;
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal;
GO

The execution plan stayed exactly the same. And if we look at the Index Seek operator, we will find that the estimated number of rows sticks at 6.6:

[Image: properties of the Index Seek operator, still showing an estimated number of rows of 6.6]

Lesson learned

  • Optimization and execution are two completely separate processes, and during optimization SQL Server sometimes does not know the actual values used in queries (see the sketch below).
  • An incorrect estimate in one part of the execution plan can (and probably will) propagate to other parts of the plan.
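
One way to see the first lesson in action is the statement-level OPTION (RECOMPILE) hint: the statement is then compiled at execution time, when the variable already has its value, so the estimate should match the literal case. A minimal sketch of the idea:

-- With OPTION (RECOMPILE) the optimizer can see the variable's current value,
-- so the estimated number of rows should match the plan for the literal 1.374
DECLARE @SubTotal money;
SET @SubTotal = 1.374;
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal
OPTION (RECOMPILE);
GO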

Is Parameter Sniffing a real problem?

First of all, it is expected SQL Server behavior. The problem is that many database programmers and DBAs do not expect it. The following stored procedure illustrates this:

CREATE PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) AS
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal;
GO

When we execute this procedure for the first time, SQL Server will compile, optimize and store the execution plan prepared for the sniffed parameter value:

EXECUTE uspGetOrdersBySubTotal 3578.27;
GO

To see this behavior, select the leftmost operator (the SELECT operator), press F4, and then expand the Parameter List section of the Properties window:

[Image: the Parameter List section of the Properties window, showing the compiled parameter value 3578.27]
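
If you prefer a query to the Properties window, the compiled values are also stored in the cached plan XML, in the ParameterCompiledValue attributes of the ParameterList element. A minimal sketch, assuming the procedure was created in the dbo schema:

-- Retrieve the cached plan XML for the procedure and inspect its ParameterList
SELECT qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.objectid = OBJECT_ID('dbo.uspGetOrdersBySubTotal');
GO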

Subsequent executions will reuse the same plan, even if the stored procedure is executed with different parameter values:

EXECUTE uspGetOrdersBySubTotal 1.374;
GO

[Image: the reused execution plan for the second execution, still scanning the clustered indexes]

Unfortunately, the plan used for the second execution was suboptimal: our index on the SubTotal column was not used, and SQL Server read far more pages than necessary:

SET STATISTICS IO ON;
GO
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = 1.374;

EXECUTE uspGetOrdersBySubTotal 1.374;
GO
SET STATISTICS IO OFF;
GO

Lesson learned: the parameter values used for the first execution of a stored procedure are used to compile the execution plan, and that plan will then be used over and over again.

So, what’s the problem with parameter sniffing? The problem arises when a stored procedure is executed for the first time with unusual parameter values. In our example, the first execution returns about 5 percent of the rows from the table, so a clustered index scan is chosen. Even though the majority of other values occur in less than 1% of the rows, and a useful index exists, the following executions of this stored procedure will involve full table scans.
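
The reverse scenario is just as harmful. If the plan is compiled for the selective value first, the seek-plus-lookup plan gets reused for the frequent one, producing a key lookup for each of the 1,551 rows (a quick demonstration, reusing the procedure and values from above):

-- Compile for the selective value first, then reuse that plan for the frequent one
DBCC FREEPROCCACHE;
GO
EXECUTE uspGetOrdersBySubTotal 1.374;
GO
EXECUTE uspGetOrdersBySubTotal 3578.27;
GO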

Note. Stored procedures that use LIKE, >=, or < predicates are especially prone to the “parameter sniffing problem”, because the variability in the number of returned rows is much higher.
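
For instance, a range search like the following hypothetical procedure (uspGetOrdersAboveSubTotal is not part of the sample database, it is only an illustration) may return anything from zero rows to almost the whole table, depending on the value sniffed first:

-- Hypothetical range-search procedure: the row count varies wildly with @SubTotal
CREATE PROCEDURE uspGetOrdersAboveSubTotal (@SubTotal money) AS
SELECT * 
FROM Sales.SalesOrderHeader h
WHERE h.[SubTotal] >= @SubTotal;
GO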

What can we do about this?

The first, but definitely not the best, option is to disable plan caching for the problematic stored procedure completely. We can do this by adding WITH RECOMPILE to the procedure header:

ALTER PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) 
WITH RECOMPILE AS
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal;
GO

From now on, the execution plan for this procedure will not be cached, so the parameter sniffing problem will vanish. Before adopting this solution, remember that if the stored procedure contains more than one query (as most do), every execution will recompile all of the queries in the stored procedure, even those that are not affected by atypical parameters. To check this, execute the altered stored procedure with different parameters and observe the execution plans:

EXECUTE uspGetOrdersBySubTotal 3578.27;
GO
EXECUTE uspGetOrdersBySubTotal 1.374;
GO

[Image: two different execution plans, one compiled for each parameter value]
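Since SQL Server 2005 the recompilation can also be scoped to a single statement instead of the whole procedure: adding OPTION (RECOMPILE) to just the problematic query leaves the plans of the remaining statements cached. A sketch of this variant:

-- Only this statement is recompiled on every execution;
-- other statements in the procedure would keep their cached plans
ALTER PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) AS
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal
OPTION (RECOMPILE);
GO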

There are better alternatives than giving up the benefits of query plan reuse entirely. To see them, first revert the stored procedure to its original state:

ALTER PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) AS
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal;
GO

The second option is to force a new compilation for a specific execution of the stored procedure only. We can do this by adding WITH RECOMPILE to the EXECUTE statement. As a result, this execution will not use a cached execution plan, and the plan compiled for this call will not be cached either. You can observe this behavior by executing the following queries and comparing the execution plans:

DBCC FREEPROCCACHE;
GO
EXECUTE uspGetOrdersBySubTotal 3578.27;
GO 
EXECUTE uspGetOrdersBySubTotal 1.374 WITH RECOMPILE;
GO
EXECUTE uspGetOrdersBySubTotal 3578.27;
GO

[Image: identical plans for the first and last executions, a different plan for the WITH RECOMPILE execution]

The first and last executions use the same plan, but the second one was compiled on demand due to the WITH RECOMPILE option.

If we check how many execution plans are in the cache and how many times they were used, we will find only one execution plan, used twice:

SELECT cp.usecounts, qt.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS qt
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
WHERE qt.text LIKE '%Sales.SalesOrderHeader%'
AND qt.text NOT LIKE '%dm_exec_sql_text%';
GO

Lesson learned: if a stored procedure is being executed with unusual parameter values, WITH RECOMPILE should be used for those executions.

But there are still other options (two more, in fact). Please note that for everything to work as expected, the parameter values used for compilation must be typical. “Typical” means that the value’s density should be as close as possible to the average column density. And we already know how to achieve this result: by using local variables, whose values are unknown at optimization time:

ALTER PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) AS
DECLARE @_SubTotal money;
SET @_SubTotal = @SubTotal;
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @_SubTotal; -- the local variable, not the parameter, so its value is unknown during optimization
GO

If we execute the altered stored procedure, the execution plan will always be the same, as the actual value used for the first execution does not matter anymore. Please note that even though the first execution uses the atypical value, the nonclustered index was used:

EXECUTE uspGetOrdersBySubTotal 3578.27;
GO 
EXECUTE uspGetOrdersBySubTotal 1.374;
GO

[Image: the same Index Seek plan used for both executions of the altered procedure]

From SQL Server 2008 onwards we can achieve exactly the same result by using the OPTIMIZE FOR UNKNOWN query hint (with SQL Server 2005, only the OPTIMIZE FOR <specific value> form of this hint can be used):

ALTER PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) AS
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal
OPTION (OPTIMIZE FOR UNKNOWN);
GO

From now on, the actual parameter values used for the first execution don’t matter (as far as the query plan is concerned), as SQL Server will always optimize this stored procedure for a typical value based on the statistics:

EXECUTE uspGetOrdersBySubTotal 3578.27;
GO
EXECUTE uspGetOrdersBySubTotal 1.374; 
GO

[Image: the same execution plan for both parameter values, optimized for a typical value]
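
On SQL Server 2005, where OPTIMIZE FOR UNKNOWN is not available, a similar effect can be achieved by optimizing for a hand-picked representative value. A sketch, assuming 1.374 is typical for this workload (choose a value that matches your own data distribution):

-- Optimize for a fixed representative value instead of the sniffed one
ALTER PROCEDURE uspGetOrdersBySubTotal (@SubTotal money) AS
SELECT * 
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
WHERE h.[SubTotal] = @SubTotal
OPTION (OPTIMIZE FOR (@SubTotal = 1.374));
GO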

Conclusion

Parameter sniffing is a feature rather than a bug: it allows execution plans to be reused, which improves overall performance. But sometimes, especially in data warehouses, compiling a new plan for a given parameter value may be far cheaper than reusing a plan optimized for different values. So, if the execution of a stored procedure takes much longer in the application than in SSMS (the most obvious symptom of the parameter sniffing problem), you should take the following actions:

1. The simplest way to check whether you are actually suffering from the parameter sniffing problem is to drop the existing plan from the procedure cache, which causes the stored procedure (or trigger) to be recompiled the next time it is run. To do this, execute the sp_recompile system stored procedure, passing your procedure name as a parameter, as shown in the example after this list.

2. You can also check (using the scripts from this and the first part of the series) the cached execution plans, how many times they were reused, and for which values they were compiled.
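
A minimal example of the sp_recompile call from step 1 (the cached plan is dropped immediately, but nothing is recompiled until the next execution):

-- Mark the procedure for recompilation on its next execution
EXECUTE sp_recompile N'uspGetOrdersBySubTotal';
GO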

If you confirm that parameter sniffing is the root of the problem, the next step depends on your data distribution and the stored procedure execution patterns:

1. If the stored procedure is sometimes called with unusual parameters, add WITH RECOMPILE to those executions.

2. If the majority of values have a similar distribution, add the OPTIMIZE FOR UNKNOWN query hint to the problematic query inside the stored procedure. Or, if you are still using SQL Server 2005, use local variables inside the stored procedure to achieve the same result.

Cheers
