For all of you how had the challenge to get backups of different databases in sync there is a nice solution by marking transactions. Let’s have a look at the initial situation.
We have three databases used for the Data Warehouse: Staging, Enterprise Store and one for the Data Mart. Typically there are also 3 kind of ETL processes, each of those scheduled independently.
- 1.) Source System to Staging
- 2.) Staging to Enterprise Store
- 3.) Enterprise Store to Data Mart
- In case we want to backup all databases we need to make sure they are in sync. This can be achieved if no ETL processes are running during a backup, but in some customer scenarios there is no certain timeslot for backup.
In this case it can happen that ETL processes are running during the backups and keeps them out of sync. Example: Every ETL load from Staging to the Enterprise Store marks the dataset in the Staging database as loaded, after the ETL process finishes. In case the backup of the Staging database has been created before an additional ETL process to the Enterprise Store was started the two backups are out of sync.
One solutions is to work with marked transactions: http://msdn.microsoft.com/en-us/library/ms188623.aspx
How does it fits to our problem? Before the backup time slot a distributed transaction is started over all databases that has a special name. When the backups are recovered on another system the recovery can be executed to the point of the named transaction.
The execution of the names transaction can be easily done within a SSIS package that starts a named distributed transaction over the 3 databases and inserts a small record into a helper table that resides in each of the databases.
Comments or other solutions? Feel free to leave a comment.