It has been more than a month since the first article on this series was published, so let me revise what we have already discussed:
1. Part 1 â€” â€śPlan caching and reuse is a good thingâ€ť was all about execution plans caching and reusing. We proved, that recompilation not only takes a lot of resources (both CPU cycles and memory), but also can considerably slow down your queries. We also saw, how to use two other mechanisms, beside stored procedures (preparation and auto-parameterization queries) to avoid unnecessary recompilations.
2. In part 2 â€” â€śAvoiding recompilations due to plan stability-related reasonsâ€ť we discussed different recompilation reasons and agreed that temporary tables, not the permanent ones, should be used inside stored procedures. We also saw why setting plan reuse affecting SET option inside a stored procedure has a much less impact on SQL Server 2005 onwards than it had in SQL 2000.
3. Part 3 â€” â€śAvoiding recompilations due to plan optimality-related reasonsâ€ť was the last part of the series in which we tried to achieve our primary goal â€” minimize unnecessary recompilations. To do this, we had to discuss statistics creation, maintenance and uses. Based on this knowledge we could make an informed choice when to disable recompilations and how to do this without disabling AUTO_CREATE_STATISTICS and/or AUTO_UPDATE_STATISTICS database options (and without using table variables as well).
4. In the most anticipated part 4 â€” â€śWhen plan reusing is a bad thing or how to deal with Parameter Sniffing problemâ€ť we grasped the reasons behind infamous â€śParameter Sniffingâ€ť thing, and agreed, that was rather a feature than a bug. We also learned how to deal with this SQL Server behavior.
Now, when the concept of separation between query optimization and execution is clear, and the consequences of passing parameters to stored procedures are well-understood, we can go a little further. In this article we are going to analyze another common problem, the one which is caused by using a conditional logic (namely an IF statement) inside stored procedures.
How to follow multiple execution paths?
Using a conditional logic inside stored procedures is very common as it allows us creating more flexible and universal procedures to implement our business logic. And it seems to be a great way to minimize the number of necessary code modules, hence reduce the code duplications. But it breaches the very important rule that said a stored procedure should perform a single task only.
To see why this can lead to serious performance problems, letâ€™s create a quite simple stored procedure that uses conditional logic:
Then execute it without a parameter and have a look at the execution plan:
Everything seems to be OK â€” SQL Server scanned both tables, joined them using Hash Match operator and compute the SubTotal sum. But to see what is really going on, we have to check the execution plan for the whole procedure. To do this, please execute the following query and click the XML returned in the query_plan column:
Now we can see three interesting things:
1. The first (leftmost) operator is a conditional one and it is very cheap.
2. The first branch is used when the stored procedure is executed without a parameter. Please, check the estimated row numbers for both tables and compare them with the result of the following query â€” they will be exactly the same:
3. Then have a look at the second branch, the one that will be used when the stored procedure is executed with a parameter. The most important things are again the estimate row numbers for both tables. In this branch the estimated numbers of rows will be one. And this actually makes sense â€” during the first execution of the stored procedure this branch was not used, so there were no rows flowing this path.
Note. SQL Server assumes zero rows being read only if there is a contradiction in the query, otherwise the smallest estimated row number will be one.
Lesson learned â€” during the first execution SQL Server compiles and stores the execution plan for the whole stored procedure. If there are multiple execution paths, all of them will be included, though only one of them will be actually used.
Can you see where the problem is? Sure, you can, but before we will jump to a conclusion, let me go through this step by step.
Firstly, clear the procedure cache, turn on IO statistics and execute our stored procedure twice. Note, that the first execution (and as we already know the most important one) will be parameterless:
During the first execution nonclustered indices were scanned, but the second execution reads far more pages then the whole SalesOrderHeader table has:
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(1 row(s) affected)
Table ”Worktable”. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ”SalesOrderDetail”. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ”SalesOrderHeader”. Scan count 1, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(1 row(s) affected)
Table ”SalesOrderHeader”. Scan count 0, logical reads 347652, physical reads 89, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ”SalesOrderDetail”. Scan count 1, logical reads 1241, physical reads 3, read-ahead reads 1251, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
And if we check the execution plans, we will see the reason for this â€” the actual number of rows (almost 116 thousand) was far bigger than the estimated one (estimated number of rows was 1):
Just to be sure, check how many times the execution plan was used:
It was used twice, as expected.
Secondly, clear the procedure cache and execute our stored procedure again, this time in a reverse order:
The execution plan for the first (now the second one) branch stays the same (SQL Server always can check a table row count), but the second (now, the first) branch was executed differently, by using Merge, not a Nested Loop, join:
Please note, that this is not the same situation that we discussed in the previous article. SQL Server of course sniffed the parameter values used for the first execution, but now it has a real problem. Now the optimizer has to find good enough execution plans for a query that will never be executed with the given parameter values. Whatâ€™s more, the risk of using a bad plan grows with the number of execution paths (the ELSE IF statements). And OPTIMIZE FOR UNKNOWN hint will not help us, as we will see later.
Before we go on, let me establish a baseline in which we compare the execution time of an ad-hoc query with our stored procedure. To do this, we will use a free utility written by Adam Mechanic â€”SQL Query Stress:
Simultaneous execution of a hundred queries in ten sessions took about 14 seconds. Now let me execute our stored procedure:
It took more than 66 seconds, because the suboptimal plan, which was prepared for a single row, was used.
The only solution
To solve this problem we have to eliminate multiple execution paths from query plans. And the only robust solution is to move the conditional logic to the wrapper stored procedure, from which we will call the inner ones. The new set of stored procedures might look like this:
From now on, each stored procedure that performs the data access will have its own execution plan and the problem will be gone:
Lesson learned â€” do not use conditional logic inside stored procedures that do data access because more often than not the execution will use suboptimal, based on wrong estimations, query plan.
To be fair, first the stored procedure will be executed without a parameter, and then a thousand times with it:
The execution time dropped back to about 14 seconds. It is up to you, but in my opinion rewriting stored procedures in the way which was shown will pay off.
What about OPTIMIZE FOR UNKNOWN?
To see why this hint will not help, please alter our original stored procedure by adding OPTIMIZE FOR UNKNOWN to both queries:
Execute it once without a parameter (which really doesnâ€™t matter) and then execute it a hundred times in ten parallel sessions:
As you can see, it took almost a minute on the same machine. This is a hint after all, and as Query Optimizer takes them very seriously, you should not put it blindly in every query. Hence, if you care about your stored procedures performance, just move IF statements to the wrappers.