Dealing with Missing Information in a Data Warehouse
Today, businesses are investing heavily in data warehouses and data marts to obtain timely, actionable information that gives them better business insight. That insight can help them achieve, among other things, sustainable competitive advantage, increased revenues and a better bottom line.
In the early '90s, data warehousing applications were either strategic or tactical in nature, and many solutions focused on trending and pattern detection. Now, companies are implementing data warehouses or operational data stores that meet both strategic and operational needs. The business need for these solutions usually comes from the desire to act in near real time in a constantly changing environment while receiving information from both internal and external source systems.
Dealing with missing or unknown data is critical in these types of environments. Unknowns skew metrics and results, which can lead to incorrect decisions. Knowing what is unknown at least allows any conclusions drawn from incomplete data to be examined further. Furthermore, in a well-designed business intelligence environment, these unknowns are often resolved later as more complete data is entered into the operational systems. Irrespective of the nature of the applications, missing information has always been a problem for data warehouses. As business intelligence environments become more mature, real-time and mission-critical, the increased number of operational applications accentuates this problem.
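As a rough illustration of how unknowns can skew a metric, the following minimal Python sketch compares an average computed with unknown values silently treated as zero against one computed over known values only, and reports how many values are unknown. The order amounts, the use of None to mark an unknown, and the summarize function are assumptions made for this example, not part of any particular warehouse product.

```python
from statistics import mean

# Hypothetical order amounts; None marks a value that is not yet known.
order_amounts = [120.0, None, 95.0, None, 110.0]


def summarize(values):
    """Compare two treatments of unknown values and report how many there are."""
    if not values:
        raise ValueError("values must not be empty")
    known = [v for v in values if v is not None]
    unknown_count = len(values) - len(known)
    return {
        # Unknowns silently counted as zero drag the average down.
        "mean_unknown_as_zero": mean([0.0 if v is None else v for v in values]),
        # Average over the values that are actually known.
        "mean_known_only": mean(known) if known else None,
        # Making the unknowns visible lets a reader question the result.
        "unknown_count": unknown_count,
        "unknown_share": unknown_count / len(values),
    }


summary = summarize(order_amounts)
for name, value in summary.items():
    print(name, value)
```

In this made-up data set the two averages differ widely, and the unknown count shows that two of the five values are missing, which is the kind of signal that prompts a second look at any conclusion drawn from the figure.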
It is important to keep in mind the time-specific value of information. As General George S. Patton once said, "A good plan, violently executed now, is better than a perfect plan executed next week." To form a good plan, one needs adequate information now, rather than perfect and correct information