On my latest project I have been doing a lot of research and learning about how to conduct a Data Warehouse project in an Agile manner. I'm really honoured to be able to combine the two passions of my career: Data Modeling and Agile.
This investigation has led me to what I feel is a unique approach to an Agile Data Warehouse project. Many of the Agile Data Warehouse projects I hear of and read about seem to follow a somewhat standard process: building a Data Warehouse in increments by working with departments within the corporation to understand their data requirements (in many situations, a very detailed analysis of those requirements). In this way, the Data Warehouse is built up over time until an Enterprise Data Warehouse is created. While this is a far cry better than trying to analyze the data requirements for the entire corporation up front, I believe there are some limitations with this approach.
Limitations
1) If the Data Modelers are not experts in the existing data domain, this process may involve significant rework, as data modeled for one department may need to be modified when new information is learned and existing data structures need to be integrated with structures designed in subsequent releases. This risk is significantly reduced if the data modelers are experts in the data domain, but it is a real risk for consultants who are not.
2) We are still not implementing a Data Warehouse in an iterative fashion, only incrementally. While this is a step in the right direction, it falls short of implementing a Data Warehouse iteratively and getting even quicker feedback. Since we are essentially time-boxing the implementation of data requirements on a department-by-department basis, we are only working on the most important data requirements within each department, not the prioritized data requirements for the entire corporation. If every department receives the same time allocation, we may work on some data requirements while other, more important ones are left in the backlog. And finally, the data requirements for each department can get quite detailed. In many ways, we are still doing Big Design Up Front, just in chunks.
Our Approach
My background in Data Modeling has always been aligned with Bill Inmon's thoughts rather than Ralph Kimball's. So it is no surprise that I felt I needed a good understanding of the Enterprise's data domain before I could start to model the data. To address this I proposed we create an Agile Enterprise Data Model: an Enterprise Data Model that takes 6 weeks to create instead of 6 months (or 6 years). Its purpose is to validate the entities, relationships, and primary attributes. It is not intended to drive down into every single attribute and ensure alignment. In my experience, that excessive detail has derailed other Enterprise Data Model projects, as consensus cannot be reached or the data is always evolving. By creating an Agile Enterprise Data Model, we now understand enough of the enterprise's data that even though we may only be modeling one department's data, we know what integrations and relationships will be required in the future.
I felt this addressed the first limitation.
The second limitation was much harder. I felt it was difficult to engage clients in small slices or iterations of their data requirements in a lightweight fashion that wouldn't require reviewing their current data requirements in detail. How could we get a high-level picture of the data requirements for the corporation? Many of the clients were concerned that if the project operated in iterations they would not get all of their requirements satisfied. In addition, our current project had a list of existing reports that the clients would like migrated to the new environment. How could we engage our clients and work with them to determine the corporation's data requirements in an iterative, lightweight manner? How could we provide visibility into what the entire corporation's data requirements were?
I turned to Innovation Games.
I am a fan of Innovation Games and of any lightweight process that can help visualize data and better engage the clients. I was searching for a way to use a variant of the 'Prune the Product Tree' game. But I didn't feel that just placing data requirements on the Product Tree would truly help client engagement or provide the visibility required. What we really needed was a variant of the Product Tree that helped the clients 'see' their data requirements, to confirm that we got them right, and that showed the requirements for the entire corporation on one wall.
We ended up merging a ‘Prune the Product Tree’ game with a ‘Spider Web’ game with some unique additions.
Here is what we are doing:
1) We are seeding the visualizations with 30-40 enterprise-level reports that are used internally and externally.
2) We have asked for the top 20 reports from each department to further seed the visualizations. This will help to populate the visualization and I believe help the clients to generate other data requirements that we are missing.
3) We are capturing data requirements or reports in the simplest form possible. We decided to leverage the Goal, Question, Metric method proposed by Victor Basili. Our data requirement stories will specify the following (a small code sketch follows below):
- Noun : The object of primary interest
- Question : The main inquiry about the noun
- Reason : What is the business reason for the question?
- Frequency : How often do I need the answer?
An example is provided below:
Noun | Question | Reason | Frequency
Claims | Amount > $1,000 | To generate audits | Monthly
We will capture these additional data requirement stories in Silent Brainstorming sessions with users. This Silent Brainstorming will occur after we present the visualizations and review the data requirements that have seeded them.
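To make the story format concrete, here is a minimal sketch in Python of how one of these data requirement stories could be captured. The class and field names are my own illustration (not part of our actual tooling), using the Claims example from the table above:

    from dataclasses import dataclass

    @dataclass
    class DataRequirementStory:
        """One data requirement story in Noun/Question/Reason/Frequency form."""
        noun: str        # the object of primary interest
        question: str    # the main inquiry about the noun
        reason: str      # the business reason for the question
        frequency: str   # how often the answer is needed

    # The example story from the table above:
    story = DataRequirementStory(
        noun="Claims",
        question="Amount > $1,000",
        reason="To generate audits",
        frequency="Monthly",
    )
    print(story)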
The final piece of the puzzle
Even with those first three pieces, it was still unclear how we could visualize the data and reporting requirements and gain engagement with the clients. Then, during an Innovation Games course that was teaching us how to customize and create our own games, it struck me: the Product Tree is the wrong metaphor for our data requirements. We need something that visualizes the data requirements themselves. We iterated from a spider web metaphor, to a three-dimensional axis metaphor, until we settled on a hexagon. I'd recommend you find the shape that best fits your data domain; the hex and its six dimensions fit ours nicely.
The hex diagram below shows the customized Innovation Game we will use to seed data requirements and generate new ones.

The Concept
For nearly all data requirements or reports there is a person object at the centre, and our domain is no exception. We will have a data hex for each type of person that the corporation needs to know about. The six axes were chosen by understanding the data domain and by reviewing existing reports used by the client. The existing data requirements and reports will be translated into data requirement stories and placed on the hex. Sessions will be facilitated to add additional stories onto the data hexes.
Once all of the stories have been placed on the data hexes, we will prioritize them by moving each story based on its priority. High-priority stories will be moved near the centre; lower-priority stories will be moved to the outer rings.
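As an illustration of how this placement could be recorded after a facilitation session, here is a hypothetical sketch in Python. The axis names, priority ranks, and ring thresholds are invented for the example, not taken from our actual hex:

    from collections import defaultdict

    def ring_for(priority_rank: int) -> int:
        """Map a story's priority rank to a hex ring (ring 1 = nearest the centre)."""
        if priority_rank <= 10:
            return 1   # highest-priority stories sit near the centre
        if priority_rank <= 30:
            return 2
        return 3       # lower-priority stories move to the outer ring

    def place_stories(stories):
        """Group (axis, priority_rank, title) tuples into rings per axis."""
        hex_board = defaultdict(lambda: defaultdict(list))
        for axis, rank, title in stories:
            hex_board[axis][ring_for(rank)].append(title)
        return hex_board

    # Axis names here ("Claims", "Providers") are hypothetical examples; a real
    # hex would use the six dimensions chosen from the client's data domain.
    board = place_stories([
        ("Claims", 3, "Claims over $1,000 for monthly audits"),
        ("Claims", 45, "Claims by region"),
        ("Providers", 12, "Provider activity by month"),
    ])
    for axis, rings in board.items():
        for ring in sorted(rings):
            print(axis, "ring", ring, rings[ring])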
The Rationale
Using this metaphor we are able to do the following:
1) Visualize the hundreds of data requirements and reports of the corporation at once and prioritize them.
2) Ensure we are working on the true data requirements across the entire corporation.
3) Work in a true iterative fashion in how we deliver data requirements and ultimately build the Data Warehouse iteratively.
4) Use a lightweight method that limits detailed design up front.
Summary
If you want to hear how this process worked, the results will be presented at Agile 2012 in Dallas and at SDEC2012 in Winnipeg! I'll also publish the results in a subsequent blog post. So far the feedback has been encouraging, and we will be expanding the use of the process over the next few weeks.