Marcin Szeliga

About Marcin Szeliga

Since 2006 he has been continuously awarded the Microsoft Most Valuable Professional title in the SQL category. A consultant, lecturer, authorized Microsoft trainer with 15 years of experience, and a database systems architect. He prepared Microsoft partners for the upgrade to SQL Server 2008 and 2012 within the Train to Trainers program. A speaker at numerous conferences, including Microsoft Technology Summit, SQL Saturday, SQL Day, Microsoft Security Summit, and Heroes Happen {Here}, as well as at user group meetings. The author of many books and articles devoted to SQL Server.

Year's end with SQLExpert.pl

The last months of the year have a way of slipping by: we barely look around and the holidays are here. Before that happens, though, we want to invite you to meet our Experts. In September (the 19th) you will be able to meet us in Bucharest, where at the SQLSaturday conference we will talk about machine learning with Azure ML and about how to quickly build a system [...]

And a week from now

No, I do not mean the presidential election, but the eScience and Machine Learning workshops taking place on the first day of the SQL Day conference. It took a while, but the next two experiments (on recommender systems and text analysis) are ready. Now all that is left is to describe them on slides… Which, as shown by the experiment in which I present the eScience methodology, the R language, models [...]

The second experiment is ready

The first of the experiments planned for the “Mistrz danych (Data Scientist) w Chmurze” workshop was meant to introduce the Azure ML platform and classifiers (ML models that classify cases, such as a decision forest). The experiment itself is not particularly elaborate, but preparing the theory behind it took me two days and 50 slides. Following your suggestions, the scenario [...]

Titanic – Machine Learning From Disaster

As you probably know, there are only a round 33 days left until the SQLDAY conference. I figured it was high time to start preparing the workshop. First I began writing down the topics I would like to talk about. When I reached the third page, I decided the workshop would have turned into a long and boring lecture. So I decided to transform a lecture in which the topics are arranged less [...]

Data masters

They are the elite. The most important decisions depend on them, both those concerning a company's long-term plans and those concerning its day-to-day operations. There are few of them; only select companies can boast of having them. They are the future of IT, and demand for them will grow rapidly. No wonder, then, that already today their average salary significantly exceeds the pay of other IT professionals. Who [...]

A different-than-usual Szelor contest

In just under two months, on 11.05 to be exact, I will have the pleasure of showing some of you my new passion: I am talking about the “Mistrz danych (Data Scientist) w Chmurze” workshop held as part of the largest SQL conference in Poland, SQL Day. Since I am certain that the future belongs to data masters, I would like to invite you to this workshop. And [...]

Using R to get Data into Azure ML Studio — file.choose gotcha

This is the first post related to the upcoming SQL Day conference, at which I will have a chance to deliver the “Data Scientist in the Cloud” pre-conference workshop (http://sqlday.pl/marcin-szeligamistrz-danych-data-scientist-w-chmurze/). Microsoft Azure Machine Learning is a beautiful combination of two different approaches: 1. With a growing library of predefined tasks you can easily perform complex actions (like building [...]

How to load data effectively?

With this post I would like to share with you some results of my own, limited data load performance tests. Even from this narrow, platform-specific test, I think some general conclusions can be drawn. Configuration: The test server was an HP ProLiant DL580 G8 equipped with 2 Intel E7 processors, 256 GB of RAM, and an EMC VNX 5600 [...]

Indexed Computed Columns and DBCC CHECKDB

Recently I spoke a couple of times about using indices to solve some otherwise unsolvable performance problems, like turning non-searchable arguments into SARGable ones. Now it is time to point out a hidden cost of this solution: an incredibly slow database consistency check. Problem: Let me illustrate this problem with a sample database I used, among others, at [...]

Lost and found — calling a system function on per row basis

Those of you who have tried using user-defined scalar functions have probably found out that they are painfully slow. On the other hand, system scalar functions are freakishly fast. Have you ever wondered why? The reason is that system functions are called once per query, not once per row; yes, this is a huge oversimplification, [...]
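A minimal sketch of the per-query behavior (assuming any table with a few rows, sys.objects here): RAND() without a seed is treated as a runtime constant and evaluated once for the whole query, while NEWID() is evaluated for every row.

```sql
-- Runtime constant: RAND() without a seed is evaluated once per query,
-- so every row returns the same value.
SELECT TOP (5) RAND() AS same_for_all_rows
FROM sys.objects;

-- NEWID() is not a runtime constant; it is evaluated once per row,
-- so every row returns a different value.
SELECT TOP (5) NEWID() AS different_for_each_row
FROM sys.objects;
```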

Indexed Views. Part 1. The Benefits

Indices serve three main purposes: 1. They are used to limit (or minimize) the amount of data that has to be read to execute queries. 2. They can eliminate costly operators (like sorts or aggregations) from query execution plans. 3. Finally, they can tremendously improve concurrency. On the other hand, indices are not free: not only do they take [...]
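The second purpose can be sketched in a few lines (table and index names are hypothetical): with a nonclustered index ordered on the sort column and covering the query, the optimizer can satisfy the ORDER BY with an ordered index scan instead of a Sort operator.

```sql
-- Hypothetical table and covering index.
CREATE TABLE dbo.Customers (
    CustomerID int IDENTITY PRIMARY KEY,
    LastName   nvarchar(50) NOT NULL,
    City       nvarchar(50) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_Customers_LastName
    ON dbo.Customers (LastName)
    INCLUDE (City);

-- Can be served by an ordered scan of IX_Customers_LastName;
-- no Sort operator appears in the execution plan.
SELECT LastName, City
FROM dbo.Customers
ORDER BY LastName;
```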

Cross-database access. Part 3 – the first attempt to use code signing

In the previous parts of the series we examined two insecure “solutions”. Even though both gave us cross-database access, the first one (you can read about it here: Cross-database access. Part 1 – the worst nightmare or why applications should not use sa login) requires sysadmin privileges, and the second one (described here: Cross-database access. [...]

Cross-database access. Part 2 – trustworthy databases and dbo as an authenticator

In the previous part “Cross-database access. Part 1 – the worst nightmare or why applications should not use sa login” we saw what happens if an application uses sa login. Now, we are going to discuss a better and more secure, but still not the best, solution. Can a dbo access resources from another database? [...]
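The mechanism discussed here can be sketched as follows (database, procedure, and table names are hypothetical): a procedure running as the database owner can reach into another database, but only when the calling database is marked TRUSTWORTHY, which is precisely what makes this solution risky.

```sql
-- Hypothetical databases AppDB and OtherDB, owned by the same login.
USE AppDB;
GO
-- The procedure runs as dbo, the database owner.
CREATE PROCEDURE dbo.GetRemoteData
WITH EXECUTE AS OWNER
AS
    SELECT * FROM OtherDB.dbo.SomeTable;
GO
-- The impersonated dbo is trusted outside AppDB only when the
-- database is marked as trustworthy:
ALTER DATABASE AppDB SET TRUSTWORTHY ON;
```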

Cross-database access. Part 1 – the worst nightmare or why applications should not use sa login

In this series I am going to show you three different ways of enabling cross-database access, starting with the worst, but for whatever reason still common, “solution”. The second one will be better, but still not very secure. Finally, we will see how cryptography can ultimately solve this problem. Remember, login sa is mapped [...]

Lost and found — sp_monitor

For those of you who do not spend evenings reading Books Online, there is a little-known but useful stored procedure called sp_monitor. If you ever wanted to get a quick impression of how SQL Server is doing in terms of CPU, I/O, or network activity, just execute it: EXEC sp_monitor; —– last_run current_run [...]

Useful things about data recovery that every DBA should know. Part 4 — Restore planning

In the last part of this series we are going to answer the ultimate question: which backup files should be used, and in what order, to restore a database to its most current state? We will examine two cases: 1. When a database is being restored on the original SQL Server [...]
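A typical restore sequence looks like this (database and file names are hypothetical): the last full backup, then the most recent differential, then every log backup taken after that differential, with only the final step recovering the database.

```sql
-- Every intermediate step uses NORECOVERY so further backups can be applied.
RESTORE DATABASE Sales FROM DISK = N'C:\Backup\Sales_full.bak'
    WITH NORECOVERY;
RESTORE DATABASE Sales FROM DISK = N'C:\Backup\Sales_diff.bak'
    WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'C:\Backup\Sales_log1.trn'
    WITH NORECOVERY;
-- The last restore recovers the database and brings it online.
RESTORE LOG Sales FROM DISK = N'C:\Backup\Sales_log2.trn'
    WITH RECOVERY;
```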

Useful things about data recovery that every DBA should know. Part 3 — Three differential backups tricks

In this article we are going to answer these questions: 1. Is a differential backup cumulative or incremental? 2. Why should you take a full backup just before switching any filegroup to read-only? 3. Do you really need a full backup to restart a broken log backup chain? If you missed [...]
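For the first question, the short answer can be sketched (database and file names are hypothetical): differential backups are cumulative, meaning each one contains all extents changed since the last full backup, so only the most recent differential is needed for a restore.

```sql
BACKUP DATABASE Sales TO DISK = N'C:\Backup\Sales_full.bak';
-- ...changes happen...
BACKUP DATABASE Sales TO DISK = N'C:\Backup\Sales_diff1.bak'
    WITH DIFFERENTIAL;
-- ...more changes...
BACKUP DATABASE Sales TO DISK = N'C:\Backup\Sales_diff2.bak'
    WITH DIFFERENTIAL;
-- Sales_diff2.bak contains everything changed since the full backup,
-- so Sales_diff1.bak is not needed during a restore.
```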

Useful things about data recovery that every DBA should know. Part 2 — What does a full backup contain?

In the first part of the series, “Useful things about data recovery that every DBA should know. Part 1 — To what point in time full backup is being restored?”, we saw that a full backup can only be restored to the point in time when its data copy phase finished. Now it is time [...]

Useful things about data recovery that every DBA should know. Part 1 — To what point in time full backup is being restored?

Every DBA has taken a full database backup at least once. But do you happen to know to what point in time this backup would be restored? To the moment when the backup started? To the time when it finished? Or maybe there is a third option? If you chose one of the first two answers, you [...]

The Case of VARCHAR(4), or when you should not use a variable length column?

Have you ever considered using VARCHAR(4) or a similar data type? If so, you really should read this post. If not, please continue reading and find out why you were right. Don't be cheap on data types. SQL Server, at least up to SQL Server 2014, reads and writes whole pages (8-KB blocks of data), neither rows [...]
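The row-size cost can be measured directly (table names are hypothetical): every variable-length column adds a 2-byte entry to the row's variable-length offset array, so for the same 4-character data a varchar(4) row comes out larger than a char(4) row.

```sql
-- Same data, fixed-width vs variable-width storage.
CREATE TABLE dbo.FixedWidth (Code char(4)    NOT NULL);
CREATE TABLE dbo.VarWidth   (Code varchar(4) NOT NULL);

INSERT INTO dbo.FixedWidth VALUES ('ABCD');
INSERT INTO dbo.VarWidth   VALUES ('ABCD');

-- Compare average record sizes; the VarWidth rows are larger because
-- of the variable-length column overhead.
SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE object_id IN (OBJECT_ID('dbo.FixedWidth'), OBJECT_ID('dbo.VarWidth'));
```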

Quo Vadis, TechEd?

Last week I had the great pleasure of being at TechEd Europe, standing at the Data Warehousing booth. Although it was my seventh or eighth TechEd, I had missed the last two and was not sure what to expect this time. Honestly, my primary concern was about the organizational stuff. But the venue and the event itself [...]

The Hidden Effect of Rolling Back Transaction from Triggers

Yes, you can roll back a transaction from a trigger. But just because you can doesn't mean you should. On the contrary: if you do not catch exceptions on the SQL Server side, using a ROLLBACK statement inside triggers puts you at serious risk. Understanding T-SQL Error Handling: If you asked me what part of [...]
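A minimal sketch of the behavior in question (table and trigger names are hypothetical): a ROLLBACK inside a trigger undoes the whole outer transaction, not just the offending statement, and aborts the rest of the batch.

```sql
CREATE TABLE dbo.Orders (OrderID int PRIMARY KEY, Amount money NOT NULL);
GO
CREATE TRIGGER trg_Orders_NoNegative ON dbo.Orders
AFTER INSERT
AS
IF EXISTS (SELECT 1 FROM inserted WHERE Amount < 0)
BEGIN
    ROLLBACK TRANSACTION;  -- rolls back EVERYTHING done so far
    RAISERROR ('Negative amounts are not allowed.', 16, 1);
END;
GO
BEGIN TRANSACTION;
INSERT INTO dbo.Orders VALUES (1, 100);   -- this row is lost too
INSERT INTO dbo.Orders VALUES (2, -50);   -- fires the rollback
-- The batch is aborted here; the COMMIT below never executes.
COMMIT;
```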

Nested Transactions in SQL Server

If you think there is such a thing as a nested transaction in SQL Server, this post is for you. Nesting Transactions vs. Nested Transactions: Technically, you can begin a new transaction inside another one. What's more, there is the @@TRANCOUNT function that returns the current transaction nesting level. Note: This function is quite handy, as there is [...]
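The illusion is easy to demonstrate in a few statements: an inner COMMIT merely decrements the counter, while a single ROLLBACK undoes all of the "nested" work at once.

```sql
BEGIN TRANSACTION;       -- @@TRANCOUNT: 0 -> 1
SELECT @@TRANCOUNT;      -- 1
BEGIN TRANSACTION;       -- "nested" transaction: 1 -> 2
SELECT @@TRANCOUNT;      -- 2
COMMIT;                  -- only decrements the counter: 2 -> 1
SELECT @@TRANCOUNT;      -- 1; nothing has actually been committed yet
ROLLBACK;                -- rolls back EVERYTHING; counter drops to 0
SELECT @@TRANCOUNT;      -- 0
```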

SQLDay Rules

As a person who has had the great pleasure of speaking at all six SQLDays, I can only say WOW! And I am not talking about how much this conference has grown over the years. This is nice, but in my opinion there are far more impressive things. First of all, I spoke with a whole [...]

How Much Data do You Need? Part 3 — Data Samples and Convergence

This is the last part of the series in which we are going to talk about converging on a representative sample. If you missed the previous parts, before going further take your time and read them: 1. How much data do you need? Part 1 — The Amount of Data is Rather Irrelevant for Data [...]

How much data do you need? Part 1 — the amount of data is rather irrelevant for data mining algorithms

Do data mining algorithms need a ton of data (like millions of cases) to find hidden patterns? Absolutely not. Does this mean that you can use only a handful of data to mine? Probably not, either. In this article I am going to explain the first statement. In the upcoming one we will see why [...]

A handful of resources for SQL Server 2008 Microsoft Certified Master Knowledge Exam

I have just found out that I passed exam 088-970 (the knowledge one), so I am allowed to put myself to the ultimate test and take the famous lab exam. But beforehand, I would like to share with you some useful resources and tips (at least, they turned out to be useful for me). The range of [...]

Max degree of parallelism and maintenance tasks

In this post I would like to show you a huge drawback of another commonly seen “best practice”: setting the ‘max degree of parallelism’ server option to 1. Where does this recommendation come from? As our machines grow bigger and bigger, many database administrators and programmers find out that databases which used [...]
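For reference, the setting in question looks like this (the table name in the override example is hypothetical); note that individual statements, including index maintenance, can still override the server-wide cap.

```sql
-- The commonly repeated advice: cap parallelism server-wide at 1.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;

-- A statement-level hint can still override the server setting, e.g.:
-- ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (MAXDOP = 8);
```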

I don’t have a SAN, so how many tempdb files do I need? Part 2 — PFS, GAM, and SGAM contention

In the previous article, I don’t have a SAN, so how many tempdb files do I need? Part 1 — When splitting tempdb into too many files can be counterproductive to performance, we saw that splitting tempdb into multiple files can be problematic. Now it is time to have a closer look at the most [...]

I don’t have a SAN, so how many tempdb files do I need? Part 1 — When splitting tempdb into too many files can be counterproductive to performance

You can find some great information, as well as an overwhelming load of not-so-good information, about optimizing tempdb performance all over the Internet. This post has one purpose: to warn you of the consequences you will face when blindly following one of those “best practices”. Namely, if somebody tells you, without even checking what your storage [...]

The Meaning of Stored Procedures. Part 8 — The case of “sp_” prefix and a geek riddle

Let me start the last part of the series with a simple question: what does the “sp_” prefix stand for? If your immediate answer is “system stored procedures, of course”, you are only partially right. The better answer would be “sp stands for special prefix”. It might sound funny, but there are good reasons [...]
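What makes the prefix "special" can be sketched with a hypothetical procedure: for sp_-prefixed names, SQL Server looks in the master database (and among system procedures) before the current database, which is why such a procedure created in master is callable from anywhere.

```sql
USE master;
GO
-- Hypothetical procedure created in master with the special prefix.
CREATE PROCEDURE dbo.sp_whoami
AS
    SELECT DB_NAME() AS resolved_in;
GO
USE tempdb;
GO
-- Resolved in master first, even though we are in a different database.
EXEC sp_whoami;
```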

The Meaning of Stored Procedures Part 7 — No direct owners, name resolutions and the difference between EXECUTE AS SELF and EXECUTE AS OWNER

There is a surprising amount of misunderstanding about schema-user separation, and about how name resolution and impersonation work in SQL Server. The ability of code modules to impersonate users was added in SQL Server 2005, along with database schemas (SQL Server 2000 only allowed context switching by executing the SETUSER statement), but those features are still [...]

The Meaning of Stored Procedures Part 6 — NOCOUNT, ROWCOUNT and @@ROWCOUNT

If you remember, in the second part of this series (“Avoiding recompilations due to plan stability-related reasons”) I talked about a drawback of setting plan-reuse-affecting SET options inside stored procedures. Nevertheless, this time I will show you two SET options that really should be set to ON. We will start easily: [...]
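A minimal sketch of both options in one hypothetical procedure: SET NOCOUNT ON suppresses the "(n rows affected)" messages, and @@ROWCOUNT must be captured immediately after the statement of interest, because almost any subsequent statement resets it.

```sql
-- dbo.Orders is a hypothetical table.
CREATE PROCEDURE dbo.TouchOrder @OrderID int
AS
BEGIN
    SET NOCOUNT ON;  -- no "(n rows affected)" chatter sent to the client
    UPDATE dbo.Orders
    SET Amount = Amount
    WHERE OrderID = @OrderID;
    -- Capture @@ROWCOUNT right away; the next statement would reset it.
    DECLARE @rows int = @@ROWCOUNT;
    SELECT @rows AS rows_updated;
END;
```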

The Meaning of Stored Procedures. Part 5 — When plan reusing is a bad thing or how to implement conditional logic inside stored procedures.

It has been more than a month since the first article in this series was published, so let me review what we have already discussed: 1. Part 1 — “Plan caching and reuse is a good thing” was all about caching and reusing execution plans. We proved that recompilation not only takes a lot of [...]

The Meaning of Stored Procedures. Part 4 — When plan reusing is a bad thing or how to deal with “Parameter Sniffing problem”

Finally, after three articles in which I did my best to convince you that plan caching and reuse has a massive positive impact on performance (that was the topic of the first article, The Meaning of Stored Procedures. Part 1 — Plan Caching and Reuse is a Good Thing) and showed what can be done to avoid [...]
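Two common mitigations (of several) can be sketched on hypothetical procedures: forcing a fresh plan on every execution, or optimizing for average statistics instead of the sniffed parameter value.

```sql
-- dbo.Orders is a hypothetical table.
-- Option 1: compile a fresh plan for every execution.
CREATE PROCEDURE dbo.GetOrdersByCustomer @CustomerID int
AS
    SELECT OrderID, Amount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (RECOMPILE);
GO
-- Option 2: build one plan for the "average" value rather than the
-- value sniffed on the first call.
CREATE PROCEDURE dbo.GetOrdersByCustomer2 @CustomerID int
AS
    SELECT OrderID, Amount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR (@CustomerID UNKNOWN));
```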

The Meaning of Stored Procedures. Part 3 – Avoiding recompilations due to plan optimality-related reasons.

As you recall, in the previous parts of this series we agreed that reusing execution plans can have a tremendous impact on SQL Server performance, and saw how to avoid unnecessary recompilations by the proper use of temporary tables and by not changing cache-key SET options. If you missed those articles, they can be found here: The Meaning of [...]

The Meaning of Stored Procedures. Part 2 — Avoiding recompilations due to plan stability-related reasons.

In the previous article I tried to convince you that stored procedures are a proper way to avoid unnecessary compilations and recompilations. They may not be the fastest (in most cases prepared queries will be faster), but they are widely used, mainly because they are great from security and manageability perspectives. Unfortunately, stored procedures are [...]

The Meaning of Stored Procedures. Part 1 — Plan Caching and Reuse is a Good Thing.

Everybody knows that use of stored procedures offers a number of benefits over issuing T-SQL code directly from an application. One of the most common reasons for using stored procedures is as a security boundary — a user can be given access to execute a stored procedure without having permissions directly on the underlying objects. [...]
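The security boundary can be sketched in a few lines (table, procedure, and user names are hypothetical): the user can execute the procedure, while a direct query against the underlying table would fail with a permission error; ownership chaining makes the SELECT inside the procedure succeed anyway.

```sql
-- dbo.Orders is a hypothetical table; AppUser is a hypothetical user.
CREATE PROCEDURE dbo.GetOrderCount
AS
    SELECT COUNT(*) AS order_count FROM dbo.Orders;
GO
-- AppUser gets EXECUTE only; no SELECT on dbo.Orders is ever granted.
GRANT EXECUTE ON dbo.GetOrderCount TO AppUser;

-- As AppUser:
--   EXEC dbo.GetOrderCount;          -- succeeds (ownership chaining)
--   SELECT * FROM dbo.Orders;        -- fails: permission denied
```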