When I think of a data warehouse architecture I have this typical picture in mind:
Data in or Staging Area
On the lefthand side there are a bunch of source files or source systems which are staged in the “Data in Layer” (or called Staging Area). On the staging layer typically no transformations happen so far. The data is imported by a 1:1 copy, also the metadata like source columns or data types are the same as on the source system.
There are a bunch of names for this layer if have seen so far, like ODS or Persistent Staging or DWH or whatever. I don’t want to start a religous discussion right here, so let’s call it Production so far.
The key is that there is sometimes a layer in between the staging are and the star schema. I will get into more details in how to define the star schema and the right data model, but I can highly recommend Thomas Kejsers blog about data modeling so far. The need for the Production layer is very often ETL driven. There a some technical requirments for smart data storage or updating data that needs to be modeled differently. I will give some best practices in later posts.
The consumption layer is very often, and that is also something i recommend, modeled as a star schema as introduced by Ralph Kimball. This layer is the starting point for business users to analyze their data. Wheter directly with relational reports or via a multidimensional database like Analysis Services.
Within my next posts I will get in more detail on every layer, providing samples and best practices and arguments for this particular design of a data warehouse.