Lost and found — calling a system function on per row basis

Those of you who have tried using user-define scalar functions probably found out that they are painfully slow. On the other hand, system scalar functions are freakishly fast. Have you ever wondered why? The very reason is because system functions are called once per query, not per row — yes, this is a huge oversimplification, but a really useful one.

Calling a typical system function

Let’s assume that your task is to generate some random data. Or to be more precise — to add this data to an existing table.

So, let me first create this table:

USE tempdb
GO
CREATE TABLE tab 
    (col1 int IDENTITY PRIMARY KEY,
    col2 int NULL,
    col3 int NULL);
GO

INSERT INTO tab
DEFAULT VALUES
GO 100

 

As you can see, col2 and col3 are empty:

SELECT TOP 5 * 
FROM tab;
GO
-----
col1    col2    col3
1    NULL    NULL
2    NULL    NULL
3    NULL    NULL
4    NULL    NULL
5    NULL    NULL

 

One could think that the following UPDATE statement will populate col2 with random values:

UPDATE tab
SET col2 = RAND()*100;
GO

 

But unfortunately that is not the case — the RAND function was called only once:

SELECT TOP 5 * 
FROM tab;
GO
-----
col1    col2    col3
1    32    NULL
2    32    NULL
3    32    NULL
4    32    NULL
5    32    NULL

 

Lesson learned — if called without a parameter, system functions would execute only once.

The special case of NEWID

There is one exception to this rule — NEWID system function is always executed per row, not per query. As a result, if we use this function as a parameter, the outer one will have to be called separately for each row.

The small problem with data types (NEWID returns uniqueidentifiers) is easy to solve with a function that takes any data types and returns ints, for example by using CHECKSUM:

UPDATE tab
SET col3 = RAND(CHECKSUM(NEWID()))*100;
GO

 

The col3 column was correctly populated with random values:

SELECT TOP 5 * 
FROM tab;
GO
----
col1    col2    col3
1    32    56
2    32    64
3    32    98
4    32    88
5    32    89

 

Interestingly, SSMS shows costs of those two UPDATE statements as exactly the same — maybe next time we will discuss this:

image

See you then

Share this article ...

Google Plus
Ihren LinkedIn Kontakten zeigen



This entry was posted in Lost and Found and tagged by Marcin Szeliga. Bookmark the permalink.
Marcin Szeliga

About Marcin Szeliga

Since 2006 invariably awarded Microsoft Most Valuable Professional title in the SQL category. A consultant, lecturer, authorized Microsoft trainer with 15 years’ experience, and a database systems architect. He prepared Microsoft partners for the upgrade to SQL Server 2008 and 2012 versions within the Train to Trainers program. A speaker at numerous conferences, including Microsoft Technology Summit, SQL Saturday, SQL Day, Microsoft Security Summit, Heroes Happen {Here}, as well as at user groups meetings. The author of many books and articles devoted to SQL Server.

Leave a Reply

Connect with:
  • This field its required.
  • This field its required.
    • Message is required