How to load data effectively?

With this post I would like to share some results of my own, limited data-load performance tests. Even though the tests are narrow and platform-specific, I think some general conclusions can be drawn from them.

Configuration

The test server was an HP ProLiant DL580 G8 equipped with 2 Intel E7 processors, 256 GB RAM and an EMC VNX 5600 storage array. The destination database data files were located on drive G:, and tempdb was located on a local SSD drive (J:). The performance of these drives was measured with CrystalDiskMark:

[CrystalDiskMark results]

A separate drive (E:) was used for the test data, which was generated with the DBGen tool from http://www.tpc.org//tpch/. After a while I had 8 files with data, the largest one being lineitem (226 GB).

Note: the DBGen tool was used just to simplify database and sample data generation. I didn't measure TPC-H results for this server; I measured only data load time and, in addition, query execution time.

The data was loaded several times with the very same BULK INSERT statements, but with different destination table structures:

BULK INSERT [dbo].[REGION] FROM 'E:\tpch\region.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[CUSTOMER] FROM 'E:\tpch\customer.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[PARTSUPP] FROM 'E:\tpch\partsupp.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[NATION] FROM 'E:\tpch\nation.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[SUPPLIER] FROM 'E:\tpch\supplier.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[PART] FROM 'E:\tpch\part.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[ORDERS] FROM 'E:\tpch\orders.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');
BULK INSERT [dbo].[LINEITEM] FROM 'E:\tpch\lineitem.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|');

Results

During the first load the destination tables were defined as b-trees (with a clustered index on the key column), except for LINEITEM, which was defined as a clustered columnstore index. No partitioning and no PAGE/ROW compression were used. The destination database contained 10 equally sized data files in one filegroup. With this setup I loaded 300 GB of data in 6 hours and 47 minutes, while my goal was to load this amount of data in less than 5 hours:

table    | load time | rows          | rows/sec (thousands) | processing time
customer | 00:03:59  | 45 000 000    | 188.28               | 0
partsupp | 00:22:26  | 240 000 000   | 178.31               | 0
supplier | 00:00:14  | 3 000 000     | 214.29               | 0
part     | 00:05:12  | 60 000 000    | 192.31               | 0
orders   | 00:41:28  | 450 000 000   | 180.87               | 0
lineitem | 05:33:54  | 1 799 989 091 | 89.85                | 0
Total time: 06:47:13
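
For reference, the destination database layout described above could be created along these lines (a minimal sketch; the file names, paths, sizes and the index name are assumptions, only the ten-files-in-one-filegroup layout and the clustered columnstore index on LINEITEM come from the description):

-- Destination database with 10 equally sized data files in the PRIMARY filegroup on drive G:
-- (all names, paths and sizes below are illustrative, not the values used in the test).
CREATE DATABASE tpch
ON PRIMARY
    (NAME = tpch01, FILENAME = 'G:\data\tpch01.mdf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch02, FILENAME = 'G:\data\tpch02.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch03, FILENAME = 'G:\data\tpch03.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch04, FILENAME = 'G:\data\tpch04.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch05, FILENAME = 'G:\data\tpch05.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch06, FILENAME = 'G:\data\tpch06.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch07, FILENAME = 'G:\data\tpch07.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch08, FILENAME = 'G:\data\tpch08.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch09, FILENAME = 'G:\data\tpch09.ndf', SIZE = 40GB, FILEGROWTH = 4GB),
    (NAME = tpch10, FILENAME = 'G:\data\tpch10.ndf', SIZE = 40GB, FILEGROWTH = 4GB)
LOG ON
    (NAME = tpch_log, FILENAME = 'G:\data\tpch_log.ldf', SIZE = 50GB);
GO
-- LINEITEM gets a clustered columnstore index before the load (assuming the table itself
-- has already been created); the remaining tables get a clustered index on their key column.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_LINEITEM ON dbo.LINEITEM;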

I ended up with the following tables:

table    | used pages (MB) | row count
REGION   | 0.02            | 5
NATION   | 0.02            | 25
PART     | 8658.23         | 60 000 000
SUPPLIER | 470.19          | 3 000 000
PARTSUPP | 37173.91        | 240 000 000
CUSTOMER | 7718.12         | 45 000 000
ORDERS   | 51925.63        | 450 000 000
LINEITEM | 80862.91        | 1 799 989 091
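
The used-pages and row-count figures above can be gathered with a query along these lines (a sketch based on sys.dm_db_partition_stats; the exact query used for the post is not shown):

-- Space used and row count per table (row_count is taken only from the heap or
-- clustered index, so nonclustered indexes do not inflate it).
SELECT
    OBJECT_NAME(ps.object_id) AS TableName,
    CAST(SUM(ps.used_page_count) * 8 / 1024.0 AS DECIMAL(18, 2)) AS UsedPages_MB,
    SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS [RowCount]
FROM sys.dm_db_partition_stats AS ps
WHERE OBJECT_SCHEMA_NAME(ps.object_id) = 'dbo'
GROUP BY ps.object_id
ORDER BY TableName;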

Then the following query was executed (it is one of the TPC-H benchmark queries, but again it was used just to assess query execution time):

select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        datepart(year, l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and l_shipdate between '1995-01-01' and '1996-12-31'
    ) as shipping
group by supp_nation, cust_nation, l_year
order by supp_nation, cust_nation, l_year

In both cases (cold cache, with a massive amount of read-ahead reads, and warm cache, with logical reads only) the query executed in less than 2 minutes, which met my requirements:

Cache | CPU time (s) | Elapsed time (s)
Cold  | 738.71       | 101
Warm  | 571.44       | 81
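
For completeness, the cold versus warm measurements can be reproduced along these lines (a sketch; DBCC DROPCLEANBUFFERS should only ever be run on a test server):

-- Report CPU time and elapsed time for every statement that follows.
SET STATISTICS TIME ON;
-- Cold cache: write dirty pages and empty the buffer pool first,
-- so the query has to read everything from disk (read-ahead reads).
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
-- ... run the query above ...
-- Warm cache: run the very same query again; this time the pages are
-- already in memory, so only logical reads are reported.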

Let's try a completely different configuration: a single data file, partitioning, and no indexes during the load. In this case, after the data was loaded, clustered columnstore indexes were created on the two biggest tables (both partitioned by date), and clustered indexes (primary keys) were created on the remaining tables. With this I got the following results:

table    | load time | rows          | rows/sec (thousands) | processing time
customer | 00:05:14  | 45 000 000    | 143.31               | 00:00:32
partsupp | 00:26:46  | 240 000 000   | 149.44               | 00:03:00
supplier | 00:00:18  | 3 000 000     | 166.67               | 00:00:02
part     | 00:06:47  | 60 000 000    | 147.42               | 00:00:33
orders   | 00:39:09  | 450 000 000   | 191.57               | 00:12:00
lineitem | 03:44:05  | 1 799 989 091 | 140.13               | 00:26:56
Total time: 05:45:22

The tables looked like this:

Table    | Index          | Partition | Lower boundary | Upper boundary | Used pages (MB) | Row count
REGION   | PK__REGION__   | 1 | NULL       | NULL       | 0.02     | 5
NATION   | PK__NATION__   | 1 | NULL       | NULL       | 0.02     | 25
PART     | PK__PART__     | 1 | NULL       | NULL       | 8658.20  | 60 000 000
SUPPLIER | PK__SUPPLIER__ | 1 | NULL       | NULL       | 470.13   | 3 000 000
PARTSUPP | PK__PARTSUPP__ | 1 | NULL       | NULL       | 37173.84 | 240 000 000
CUSTOMER | PK__CUSTOMER__ | 1 | NULL       | NULL       | 7718.09  | 45 000 000
ORDERS   | CSOrders       | 1 | NULL       | 1993-01-01 | 3679.66  | 68 452 530
ORDERS   | CSOrders       | 2 | 1993-01-01 | 1994-01-01 | 3668.88  | 68 267 051
ORDERS   | CSOrders       | 3 | 1994-01-01 | 1995-01-01 | 3668.88  | 68 267 414
ORDERS   | CSOrders       | 4 | 1995-01-01 | 1996-01-01 | 3674.43  | 68 268 870
ORDERS   | CSOrders       | 5 | 1996-01-01 | 1997-01-01 | 3678.72  | 68 448 295
ORDERS   | CSOrders       | 6 | 1997-01-01 | 1998-01-01 | 3670.31  | 68 270 249
ORDERS   | CSOrders       | 7 | 1998-01-01 | 1999-01-01 | 2150.66  | 40 025 591
ORDERS   | CSOrders       | 8 | 1999-01-01 | NULL       | 0.02     | 0
LINEITEM | CSlineitem     | 1 | NULL       | 1993-01-01 | 10069.45 | 228 169 657
LINEITEM | CSlineitem     | 2 | 1993-01-01 | 1994-01-01 | 12063.61 | 273 079 700
LINEITEM | CSlineitem     | 3 | 1994-01-01 | 1995-01-01 | 12097.91 | 273 058 377
LINEITEM | CSlineitem     | 4 | 1995-01-01 | 1996-01-01 | 12075.75 | 273 042 202
LINEITEM | CSlineitem     | 5 | 1996-01-01 | 1997-01-01 | 12097.15 | 273 818 467
LINEITEM | CSlineitem     | 6 | 1997-01-01 | 1998-01-01 | 12067.52 | 273 100 092
LINEITEM | CSlineitem     | 7 | 1998-01-01 | 1999-01-01 | 9055.74  | 205 720 596
LINEITEM | CSlineitem     | 8 | 1999-01-01 | NULL       | 0.02     | 0
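
For reference, a partitioning setup matching the boundaries above could be defined along these lines (a sketch; the partition function and scheme names are assumptions, while the RANGE RIGHT boundaries and the CSlineitem index name come from the table above):

-- Yearly partitions; with RANGE RIGHT each boundary date starts a new partition,
-- which gives the 8 partitions (NULL..1993 up to 1999..NULL) listed above.
CREATE PARTITION FUNCTION pfShipDate (date)
AS RANGE RIGHT FOR VALUES
    ('1993-01-01', '1994-01-01', '1995-01-01', '1996-01-01',
     '1997-01-01', '1998-01-01', '1999-01-01');
CREATE PARTITION SCHEME psShipDate
AS PARTITION pfShipDate ALL TO ([PRIMARY]);
-- LINEITEM is first created and loaded as a partitioned heap
-- (CREATE TABLE dbo.LINEITEM (...) ON psShipDate (L_SHIPDATE))
-- and only after the load compressed into a clustered columnstore index:
CREATE CLUSTERED COLUMNSTORE INDEX CSlineitem
ON dbo.LINEITEM
ON psShipDate (L_SHIPDATE);

ORDERS would get an analogous function and scheme on its order date column.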

This time the query was CPU-heavy but blazing fast:

Cache | CPU time (s) | Elapsed time (s)
Cold  | 852.50       | 81
Warm  | 679.47       | 40

Observations

1. Loading data into a heap and then compressing it into a columnstore index is very efficient.

2. On the other hand, a single-threaded data load into a heap was even slower than a load into a b-tree.

Armed with this knowledge I went back to 10 data files, added the -E SQL Server startup parameter, and switched on delayed durability for the destination database. Naturally, the two biggest tables were defined as partitioned heaps and compressed into clustered columnstore indexes afterwards, while the smaller tables were defined as b-trees. With these tweaks I managed to meet my initial requirements:

table    | load time | rows          | rows/sec (thousands) | processing time
customer | 00:02:09  | 45 000 000    | 348.84               | 0
partsupp | 00:09:30  | 240 000 000   | 421.05               | 0
supplier | 00:00:08  | 3 000 000     | 375.00               | 0
part     | 00:02:54  | 60 000 000    | 344.83               | 0
orders   | 00:37:51  | 450 000 000   | 198.15               | 00:06:32
lineitem | 03:26:28  | 1 799 989 091 | 145.30               | 00:24:23
Total time: 04:49:55
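
Delayed durability is a database-level setting and can be switched on with a single statement (shown below; the database name is an assumption), while the -E startup parameter is added to the SQL Server service start-up options and only takes effect after a service restart:

-- Let transactions commit without waiting for the log flush; this trades a small
-- window of potential data loss for much less log-write waiting, which is usually
-- acceptable for a staging database that can simply be reloaded.
ALTER DATABASE tpch SET DELAYED_DURABILITY = FORCED;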

Of course, query times did not change significantly.

Observations

1. The storage, configured this way, was able to handle 10 data files per LUN.

Wrap-up

Even if the presented results are not new or surprising, we saw a couple of interesting things:

1. First, how to use publicly available tools (like DBGen) for our own purposes.

2. Second, that following best practices usually pays off.

3. And last but not least, that it's really worth your time to do some tests in your specific production environment.
