Today my two penny thoughts about Data Warehousing. It’s my profession however sometimes I feel that I am the jerk that copies data from one source to another. Mostly to obtain a so called ‘a single version of the truth’. What truth, every manager presents his own Excel sheet and argues that his figures are better, more accurate than the other ones including my version of the truth.
Sometimes I feel I have to build a DWH just because that is what everybody else does. Its common practice. We need a DWH, to do what???. If you don’t have one what do you miss? My colleagues and I often argue, you need a DWH to structure your organization data into a form it is easy to access, report, and analyze, and in the end make decisions. This sounds to me as a concept, or objective.
To achieve this objective, the question arises should one use the ‘generic’ Way Of Working? Use the paradigms as postulated by ‘Kimball’, ‘Inmonn’, and ‘Linstedt’? What if, if your Enterpise data is already consistent in the ‘various’ source systems.? What if, you are able to obtain your required enterprise data directly from the source system easily with minimal performance impact? What If you are able to combine/relate that data with data from other systems, just by using views, or other means? Modern in memory software like QlikView,Tableau, Microsoft’s PowerPivot are able to do so. What remains is the set up of one or more data models. A data model defines the relationships between the various data for a given Enterprise Information need. It also translates database names into meaningful business names. For various needs one is free to create different ‘relationships’ (consolidate).
In modern times Information needs vary more rapidly than a traditional DWH implementation is able to keep up with. To me a Data Warehouse is a mean, no more no less. And if there are other means that do the job cheaper, more effective, with less effort why not choose them.
The need for a ‘traditional’ DWH is there when (non exhaustive):
- Data of the various source systems is not aligned to each other or are difficult to align
- History is not well kept in the source systems
- Its difficult to extract data from the source systems
- When Querying the source system directly you go up for a cup of coffee or have a chat with on of your colleagues.
- The data from the sources is highly distorted, you hire a third party and let them cleanse and clean it. Making some sense out of it.
- You think you have to apply a bunch of business rules on the source data. You where not able to do so when the data was entered in the first place.