Yesterday someone asked me how do I typically build a Data Warehouse? I think a very good question and I have a good answer: “I need 4 steps”. Only 4 steps? Hmm…so let’s have a look.
Step1: Dimensional Modeling
First of all I start with a process called Dimensional Modeling. What is it? I typically arrange workshops together with the business department. Within these workshops we define the business data model. We use an abstract modeling layer based on mind maps for this and try to define the measures, dimensions, attributes, hierarchies, calculations, … all the stuff you need for a Data Warehouse. Also the business users understand that technique after a while and feel quite comfortable defining the data model in the right way.
The important thing is that you don’t look at the source systems. Everything is allowed during the modeling phase, we don’t care about restrictions, which maybe exist in current systems. So the focus is “how do I want to analyze my data”. The following picture show an example of a dimensional model within a mind map.
Step 2: Star Schema Generation
After the definition of the business data model we need to generate the Star Schema out of it. Why a Star Schema? Because it’s a perfect data model for analytics. If you are more interested in good data models have a look at Thomas Kejser’s post about “Defining the Good Data Model”.
Ok so let’s get back to our Star Schema. As I said the Star Schema gets generated out of our mind map by a tool or you have to create it manually.
I normally recommend a prototype at the end of the second step. This gives you the ability to proof the data model and business users can play around with their data model by creating their own reports. That means that the business gets a very early chance to have a look at their data warehouse and can use the prototype for further testing or discussions.
Step 3: Data Mapping
Now the hardest part begins: Data Mapping. On the one side the star schema defines the destination model of the Data Warehouse. On the other side we have different source systems providing the data for the Data Warehouse. Now both sides need to be mapped together, which is difficult because…
- …there are different names meaning the same.
- …there are different data types.
- …data is on a different level of aggregation.
- …different source systems deliver the same data.
- …some data need to be calculated, because it’s not provided by the source systems.
- …some data is maybe even not available within the source systems.
Within this worksheet every destination table is described on an own tab. Every destination column is listed with its data type, description and additional information and for every destination column the source is mapped to it(source system, source schema, source table, source column, source data type and ETL rules).
So when all mappings have been defined within the logical data mapping you can give it to your ETL developers to implement the ETL processes.
Step 4: Build the Cube and Reports
The last step of how to build a Data Warehouse is creating the cube and some reports. Typically the cube implementation can start already when the star schema is defined. Step 3 and Step 4 can be done in parallel.
And that’s it. Isn’t it easy…no magic around…a straightforward approach.