Useful things about data recovery that every DBA should know. Part 2 — What does a full backup contain?

In the first part of the series, “Useful things about data recovery that every DBA should know. Part 1 — To what point in time full backup is being restored?”, we saw, that a full backup can be only restored to the point in time when a data copy phase finished. Now it is time to check, what does a full backup really contain, and by the way introduce the most fundamental thing about restore planning, the concept of backup chains.

Does a full backup contain every data page?

Let’s start simple. I am going to create a sample database with a 50 MB data file and 20 MB log file:

CREATE DATABASE BackupTest
ON PRIMARY
    (NAME = 'datafile',
      FILENAME = 'C:Program FilesMicrosoft SQL ServerMSSQL11.MSSQLSERVERMSSQLDataBackupTest.mdf',
      SIZE = 50MB,
      MAXSIZE = 500MB,
      FILEGROWTH = 10)
LOG ON
    ( NAME = 'logfile',
      FILENAME = 'C:Program FilesMicrosoft SQL ServerMSSQL11.MSSQLSERVERMSSQLDataBackupTest.ldf',
      SIZE = 20MB,
      MAXSIZE = 500MB,
      FILEGROWTH = 10);
GO

In this database a small table with 10 rows was created:

USE BackupTest;
GO
CREATE TABLE Tab
    (col1 int IDENTITY PRIMARY KEy,
    col2 time DEFAULT SYSDATETIME(),
    col3 char(4100) DEFAULT 'a');
GO

INSERT INTO Tab DEFAULT VALUES;
GO 10

Finally, a full database backup was taken:

BACKUP DATABASE BackupTest
TO DISK = 'InitailFullBackup.bak'
WITH INIT;
GO

As you can see, the backup contains 336 data pages (336*8KB = 2 688 KB), thus only about 2 MB from a 50 MB data file. Consequently, a full database backup does not contain every data page. In fact, only allocated extents (extents that have at least one allocated page on them) are being stored in a backup file.

Does a full backup contain whole transaction log?

Now things are getting interesting. There are only two options:

1. If a full backup held the whole transaction log, it would be extremely inefficient and unnecessary large. Remember, that only in SIMPLE recovery model the transaction log is automatically cleared during checkpoints, in two others recovery models the only way to clear the log is by taking a log backup.

2. Otherwise, the first log sequence number (the LSN of the earliest log record stored in this backup) has to have been stored with a backup. It would also be convenient if the last sequence number (for example, the LSN pointing at the next log record beyond the end of the backup) would have as well been stored. This way an administrator will be able to find out what transactions are stored in which backup and what the order of those backup files is.

The easiest way to check those LSNs is by querying msdb.dbo.backupmediafamily system table. For straightforwardness, I will clear the whole backup history — the operation you probably won’t do on your production server:

EXEC msdb..sp_delete_backuphistory '20130722';

Now I’m ready to take a full database backup with a random name:

DECLARE @name char(36)
SET @name=NEWID()
BACKUP DATABASE BackupTest
TO DISK = @name
WITH INIT;
GO

The name was random to clearly show you that the most important part of metadata about a backup file is not its name, but those two LSNs: first and last. As I said, the easiest way to find this information is to query msdb.dbo.backupmediafamily system table. To see you the whole picture, three other pieces of data will also be returned:

1. checkpoint_lsn — the name suggests that this is the LSN of the last checkpoint, but to be precise, it’s the LSN from which the REDO must start (if there were active transactions during a checkpoint, this number would be the LSN of the oldest active transaction).

2. database_backup_lsn — Log sequence number of the most recent full database backup.

3. physical_device_name — a piece of information taken from a related system table sdb.dbo.backupmediafamily. This is just a backup file name:

SELECT physical_device_name,first_lsn, last_lsn, checkpoint_lsn, database_backup_lsn
FROM   msdb.dbo.backupset bs 
INNER JOIN msdb.dbo.backupmediafamily bmf 
    ON bs.media_set_id = bmf.media_set_id
WHERE database_name = 'BackupTest'
ORDER  BY bs.backup_start_date DESC;
GO 
-----
physical_device_name    first_lsn    last_lsn         checkpoint_lsn     database_backup_lsn 
C:Program FilesMicrosoft SQL ServerMSSQL11.MSSQLSERVERMSSQLBackup89DBD257-BAA5-413C-AEAD-89A5753252C0    35000000031600037    35000000033300001    35000000031600037    35000000022000207

Take a closer look at those LSNs:

· FirstLSN is 35000000031600037 which means that the oldest transaction stored in this backup is 35000000031600037.

· LastLSN is 35000000033300001 which means that the next backup in the chain has to have transaction 35000000033300001.

· CheckpointLSN is the same as the FirstLSN — you should remember that the very first thing that happened during a database backup is a checkpoint. And this database was idle when this backup was taken.

· DatabaseBackupLSN is the oldest one: 35000000022000207. To see what this number means we will have to check the same metadata about the initial backup. Hence the backup history was cleared, the only way to do this is by executing RESTORE HEADERONLY:

RESTORE HEADERONLY 
FROM DISK = ''C:Program FilesMicrosoft SQL ServerMSSQL11.MSSQLSERVERMSSQLBackupInitailFullBackup.bak'';

Along with the awful lot of other information, the FirstLSN, LastLSN and CheckpointLSN will be returned. And they look like this:

FirstLSN             LastLSN              CheckpointLSN
35000000022000207    35000000030600001    35000000022000207

As you can see, the DatabaseBackupLSN match the FirstLSN of the previous backup.

Just to be sure that this is clear, let me insert another 10 rows (thus fill a transaction log), take another full backup and execute our diagnostic query one more time:

INSERT INTO Tab DEFAULT VALUES;
GO 10

DECLARE @name char(36)
SET @name=NEWID()
BACKUP DATABASE BackupTest
TO DISK = @name
WITH INIT;
GO

SELECT first_lsn, last_lsn, checkpoint_lsn, database_backup_lsn
FROM   msdb.dbo.backupset
WHERE database_name = ''BackupTest''
ORDER  BY backup_start_date;
GO 
-----
first_lsn             last_lsn            checkpoint_lsn      database_backup_lsn
35000000031600037    35000000033300001    35000000031600037    35000000022000207
35000000046400043    35000000048300001    35000000046400043    35000000031600037

Lessons learned:

1. A full backup contains only allocated extents. So there is no drawback to have larger than necessary data file. Consequently, by shrinking a database one will not improve a backup time.

2. A full database backup contains only a part of transaction log that is necessary to recover a database from this backup.

3. This first transaction number is stored within a backup and can be seen by executing RESTORE HEADERONLY. The same data is also stored on a SQL Server where the backup was taken, in a msdb.dbo.backupset system table.

4. For your convenience the last transaction number is also stored in those two locations.

In the upcoming parts we will dig deeper into introduced topic of backup chain. First, we will check some little known facts about a differential backup. Then we will see how a log backup works, ad finally, how to use this knowledge in practice, to restore a database to chosen point in time (i.e. the most recent or the time when somebody deletes data by mistake) from a set of backup files that have meaningless names.

See you

Share this article ...

Google Plus
Ihren LinkedIn Kontakten zeigen



This entry was posted in Administration, Data Recovery, SQL Server and tagged , by Marcin Szeliga. Bookmark the permalink.
Marcin Szeliga

About Marcin Szeliga

Since 2006 invariably awarded Microsoft Most Valuable Professional title in the SQL category. A consultant, lecturer, authorized Microsoft trainer with 15 years’ experience, and a database systems architect. He prepared Microsoft partners for the upgrade to SQL Server 2008 and 2012 versions within the Train to Trainers program. A speaker at numerous conferences, including Microsoft Technology Summit, SQL Saturday, SQL Day, Microsoft Security Summit, Heroes Happen {Here}, as well as at user groups meetings. The author of many books and articles devoted to SQL Server.

Leave a Reply

Connect with:
  • This field its required.
  • This field its required.
    • Message is required