Are you a sufferer of Data Hoarding? This is an affliction that affects thousands of businesses every day. Common symptoms include terabytes of data kept around just in case you ever need it. The data ends up stored in a multitude of disconnected tables, piled on top of one another with no organization. When a new piece of data is thought to be needed, another table is created and thrown on the pile.
The data is kept around in case it is ever needed, but good luck finding it in the future when it is buried under piles of other data. Now where did I put that snapshot of the ledger balances from March 2011?
Even worse, most new applications create an isolated database to hold the data that the application requires. It doesn’t matter that at least 30-40% of the data already exists somewhere; it is too hard to integrate and create one common data source. It is much easier to copy the data once and then modify it as the application desires.
It doesn’t have to be this way.
Data Hoarders Anonymous
At Data Hoarders Anonymous we work with you and take you through a seven-step program.
Phase I – Repent
1) Admit you have a problem
The first step to any recovery is admitting you have a problem. Look around at your databases. Do you have a data dictionary? Do you have 20+ tables with the same name? Have you copied tables directly into a database without doing any analysis or integration?
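One way to start looking around is to ask the databases themselves which table names they share. The sketch below is only an illustration: it assumes SQLite databases, and the `billing` and `crm` databases, their tables, and the `duplicated_table_names` helper are all hypothetical stand-ins for your own silos.

```python
import sqlite3
from collections import defaultdict

def duplicated_table_names(databases):
    """Return table names that appear in more than one database.

    `databases` maps a label (hypothetical, e.g. "billing") to an open
    sqlite3 connection. Names are compared case-insensitively.
    """
    seen = defaultdict(list)
    for label, conn in databases.items():
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        for (name,) in rows:
            seen[name.lower()].append(label)
    return {name: sorted(labels) for name, labels in seen.items() if len(labels) > 1}

# Two in-memory databases standing in for two application silos (made up).
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
billing.execute("CREATE TABLE invoice (id INTEGER, customer_id INTEGER)")

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE Customer (id INTEGER, full_name TEXT)")
crm.execute("CREATE TABLE contact_log (id INTEGER, customer_id INTEGER)")

duplicates = duplicated_table_names({"billing": billing, "crm": crm})
print(duplicates)  # {'customer': ['billing', 'crm']}
```

A shared name does not prove the tables hold the same data, but each hit is a good candidate for the analysis and integration that was skipped.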
2) Understand the Enterprise’s perspective
Data is an asset for the enterprise, but only if it is easy to understand. Easy is the key word: the enterprise must be able to get and understand the data easily. Typically, when data has been replicated across God’s green earth, there are multiple copies of the truth and they are anything but easy to find. The data is worthless if it isn’t easily understood.
3) Apologize to Developers you have made extract data from multiple complex disconnected schemas
Take donuts with you, and apologize to the developers you have made write programs that extract data from multiple sources and somehow make sense of it all. SQL should not be a language you need a PhD to understand. If your SQL is longer than a screen, you probably have a Data Hoarding problem.
4) Apologize to Business users you have made explain inconsistent data to Vice-presidents
Take alcohol with you and apologize to the business users who don’t know what is included in the allocated income calculation or which table actually has the true, up-to-date value. Apologize for not having a Data Dictionary available that would tell them where the data exists and what it actually means.
Phase II – Recovery
Now that you have repented, you can move on to the recovery portion…
5) Organize your data with Data Normalization
Data Normalization is your new friend. Learn the rules and let them assist you in removing data redundancy and enforcing consistency and integrity. But be very careful. Many on the recovery path of Data Hoarding end up becoming ‘Norm-aholics’. They normalize to at least third normal form all the time, even though it creates excessively complex schemas for the developers and data that can’t be accessed by business users. Even worse, ‘Norm-aholics’ have no self-awareness that they have an addiction. Never let the rules of Data Normalization affect the progress you have made. Sadly, many DBAs fall into this trap and let the theory of Data Normalization affect the understanding of the data.
Data Normalization must increase the understanding of the data; if it does not, Data Normalization has been corrupted.
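As a small illustration of normalizing without making the data harder to understand, the sketch below splits a hoarded table into two and then puts a plain view on top for readers. Everything in it is an assumption made for the example: the `order_dump`, `customer`, `customer_order` and `order_report` names, the sample rows, and the rule of keeping the name from the earliest order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# order_dump is the hoarded table: the customer name is repeated on every
# order and has already drifted out of sync (hypothetical sample data).
conn.executescript("""
CREATE TABLE order_dump (order_id INTEGER, customer_id INTEGER,
                         customer_name TEXT, amount REAL);
INSERT INTO order_dump VALUES (1, 10, 'Acme Ltd',  100.0);
INSERT INTO order_dump VALUES (2, 10, 'ACME Ltd.', 250.0);
INSERT INTO order_dump VALUES (3, 20, 'Globex',     75.0);

-- Normalized: each customer name is stored exactly once.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);

-- Keep one name per customer (assumption: the one on the earliest order).
INSERT INTO customer
SELECT customer_id, customer_name FROM order_dump
WHERE order_id IN (SELECT MIN(order_id) FROM order_dump GROUP BY customer_id);

INSERT INTO customer_order
SELECT order_id, customer_id, amount FROM order_dump;

-- A view keeps the normalized design readable for business users.
CREATE VIEW order_report AS
SELECT o.order_id, c.customer_name, o.amount
FROM customer_order o JOIN customer c USING (customer_id);
""")

report = conn.execute("SELECT * FROM order_report ORDER BY order_id").fetchall()
print(report)
# [(1, 'Acme Ltd', 100.0), (2, 'Acme Ltd', 250.0), (3, 'Globex', 75.0)]
```

The redundancy is gone, yet anyone querying `order_report` still sees one simple row per order, which is the test a recovering Norm-aholic should apply.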
6) Simplify your data with Dimensional Analysis
Now that you have used Data Normalization to remove data redundancy and maximize consistency and integrity, the time has come to use the tools of Dimensional Analysis to ensure the data design is simple. Use Dimensional Analysis to ensure that you haven’t traded one type of complexity for another. Evaluate your data across time and Master Data Dimensions to ensure that the transaction concepts are consistent and simple.
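For a rough picture of what evaluating data across time and Master Data Dimensions can look like, here is a sketch that reads Dimensional Analysis as a simple star-style layout, which is an interpretation rather than something spelled out above. The `dim_date`, `dim_account` and `fact_ledger` tables and all of their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One time dimension and one master data dimension (both hypothetical).
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account_name TEXT);

-- One fact table: a ledger balance per account per snapshot date.
CREATE TABLE fact_ledger (
    date_key    INTEGER REFERENCES dim_date(date_key),
    account_key INTEGER REFERENCES dim_account(account_key),
    balance     REAL
);

INSERT INTO dim_date VALUES (20110228, 2011, 2), (20110331, 2011, 3);
INSERT INTO dim_account VALUES (1, 'Cash'), (2, 'Receivables');
INSERT INTO fact_ledger VALUES
    (20110228, 1, 900.0), (20110331, 1, 1200.0), (20110331, 2, 300.0);
""")

# The snapshot of ledger balances from March 2011 is now one short query.
march_2011 = conn.execute("""
    SELECT a.account_name, f.balance
    FROM fact_ledger f
    JOIN dim_date d USING (date_key)
    JOIN dim_account a USING (account_key)
    WHERE d.year = 2011 AND d.month = 3
    ORDER BY a.account_name
""").fetchall()
print(march_2011)  # [('Cash', 1200.0), ('Receivables', 300.0)]
```

If answering a question like this takes more than a screen of SQL, the design has probably traded one type of complexity for another.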
7) There can be only one
At the last step, you will need to embrace the principle that there can be only one…
- One owner and source of the data – The data must be owned by one business area and system
- One place to update the data – the data must be updated in only one system that the owner controls
- One way to get the data – there cannot be multiple query paths to retrieve the required data
- One truth – the data must have only one meaning and this meaning must be documented in a Data Dictionary
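The "one truth" rule is the easiest of the four to check mechanically. The sketch below assumes the Data Dictionary is itself a table, and reports columns that exist in the schema but have no documented owner or meaning. The `data_dictionary` layout, the `ledger` table and the `undocumented_columns` helper are hypothetical, not a prescribed format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ledger (account_id INTEGER, allocated_income REAL, balance REAL);

-- The primary key allows only one definition per column: one truth.
CREATE TABLE data_dictionary (
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    owner       TEXT NOT NULL,
    meaning     TEXT NOT NULL,
    PRIMARY KEY (table_name, column_name)
);
INSERT INTO data_dictionary VALUES
    ('ledger', 'account_id', 'Finance', 'Identifier of the ledger account'),
    ('ledger', 'balance',    'Finance', 'Closing balance for the period');
""")

def undocumented_columns(conn, dictionary_table="data_dictionary"):
    """List (table, column) pairs that have no entry in the dictionary table."""
    documented = set(conn.execute(
        f"SELECT table_name, column_name FROM {dictionary_table}"
    ))
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name != ?",
        (dictionary_table,),
    )]
    missing = []
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            if (table, row[1]) not in documented:
                missing.append((table, row[1]))
    return sorted(missing)

missing = undocumented_columns(conn)
print(missing)  # [('ledger', 'allocated_income')]
```

Here the check flags `allocated_income`, the very figure the business users could not explain to their Vice-presidents.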
Coming to you soon on A&E – Data Hoarders Anonymous