#Agile Data Modeling – still a ways to go

I have wanted to write a Blog entry on Agile Data Modelling for a while now. It combines the two prime areas of interest for myself as I really started as a DBA/Data Architect and then moved on towards Project Management and Agile Project Management. But truly, Data Modelling has been and will always be my first true love. (Very appropriate that I am writing this article the day before Valentine’s day)

I am on a project currently where I am struggling to not fall back into the traditional ways I have done Data Modelling in the past. Since almost all of my Data Modelling experience has been on more traditional projects, it is easy to fall back into that pattern. Thanks to Scott Ambler and Steve Rogalsky for reminding me of how we can continue make Data Modelling more Agile.

More often than not, the areas of Database Design and Data Modelling has been one of the most resistant to Agile Methods. Recently I came across this Blog post by Tom Haughey on the Erwin site:

Agile Development and Data Modeling

In some ways, I thought Tom was quite Agile in his preference to segment Data Modelling projects into 3-6 month phases or increments  to help increase the chances for success. But other statements reminded me that we as Data Modellers still have ways to go before we have joined the rest of the Agile team.

Some of the concerning statements were:

“Data modeling has always been performed in an iterative and incremental manner. The data model has always been expanded and enriched in a collaborative manner. In my 28 years of involvement in data management, no qualities of data modeling have been more consistently reiterated, not even non-redundancy. It is absurd to imply that traditional data modeling is done in one continuous act or that it is done all upfront by an isolated team without involving Subject Matter Experts and without sensible examination of requirements.”

By this same definition one could also say all analysis has been iterative and incremental which we know is incorrect. I believe the misunderstanding may lie in what people define as an iteration. Of course Data Models are iterated as analysis and data design is done as more requirements are gathered. But is the data design part of an end-t0-end iteration where a segment of the data model is promoted to production and used? Or is there a horizontal iteration of creating a high level Enterprise Data Model before detailed data modeling is done? On almost all the projects I have been on the answer is a resounding no. There usually is a big bang implementation of the data model to the developers after months of analysis. If anything, Data Modelling tends to be more incremental than iterative.

“In summary, traditional data modeling is incremental, evolutionary and collaborative (and thereby agile) in its own right.”

Being incremental, evolutionary, and collaborative doesn’t necessary make you Agile. I also don’t know if you can ever achieve Agile as an end state as well. We are striving to be more Agile and I don’t believe the striving should ever end and we can rest because we are Agile.

“The implications of Agile proponents like Scott Ambler is that “the traditional approach of creating a (nearly) complete set of logical and physical data models up front or ‘early’ isn’t going to work.” One issue with a statement like this is what does “up front” or “early” mean. He says that the main advantage of the traditional approach is “that it makes the job of the database administrator (DBA) much easier – the data schema is put into place early and that’s what people use.” Actually, the main advantages are that it is a clear expression of business information requirements plus developers have a stable base from which to work.

This the one statement that perhaps is troubling. The desire to get a stable base for developers is very similar to trying to get a stable analysis base for developers. In my experience, Data Modelers can be perfectionists (like all great analysts) and they struggle with releasing something that is not fully done. But this goes against Agile. Just like other functionality, we should try data models early and often and get feedback on they are used and how they perform in production. We can then use that feedback to make the Data Models better as the project progresses. This example best highlights the difference between incremental and iterative Data Modelling:

Incremental – releasing stable sections of the Data Model for development and use. Limited changes to the Data Model are expected.

Iterative – releasing initial version of the Data Model for development and feedback to make the Data Model and future Data Models better. Moderate to significant changes to the Data Model are expected and embraced.

They say that it requires the designers “to get it right early, forcing you to identify most requirements even earlier in the project, and therefore forcing your project team into taking a serial approach to development.” On the contrary, data and process modeling, and thereby data design and program design, should be done in a flip-flop manner. You collaborate on the requirements, model some data, model some processes, and iterate this process till the modeling is done – using a white-board and Post”

Hopefully Tom Haughey will read this Blog post and clarify this statement with me. It does sound that there may be aspects of Iterative Data Modeling being proposed, but this conflicts with the earlier statement so I am unsure. The iterations still seem to be focused on the modeling and not explicitly incorporating development and having the functionality promoted and used by the clients in production. (The only true measure of value)

“But remember this. The traditional SDLC (System Development Life Cycle), whatever its faults, has successfully delivered the core systems that run business across the world. Imagine delivering a new large brokerage trading system in 2-week intervals, or going live with a space shuttle project 2-weeks at a time, or delivering a robotic systems for heart surgery in 2-week intervals. Much, but not all, of Agile development has focused on apps like web-based systems and smaller, non-strategic systems.”

I’m not sure if I agree with these comments. I guess it depends on how you define success. With the statistics of the Standish Chaos reports, I’m not sure how anyone can say the Traditional SDLC has successfully delivered core systems. It is true that is have delivered core systems, but many of those projects may not be defined as a success by the clients. The statement that Agile development has been focused on smaller, non-strategic systems is also concerning. I’ve personally used Agile on large, strategic systems. I’m sure many other people would agree.

“Database refactoring represents a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. Database refactoring is more difficult than code refactoring. Code refactorings only need to maintain behavior. Database refactorings also must maintain existing integrity and other business rules. The term “database” includes structural objects, such as tables and columns, and logic objects such as stored procedures and triggers.”

While I agree that refactoring a database can be complicated, the risk of extreme changes to the Data Model can be mitigated by creating a High Level Enterprise Data Model in Iteration 0. (and potentially other methods) Frequently people against Agile state that iterations start without some initial work and as such, changes can be drastic and complex. This is incorrect. Agile is a continuum and if small phases of foundation work have value, Agile encourages their use. I have found this method very valuable.

Having experience in both creating Enterprise Data Models and software code. I would say refactoring significant portions of either hurt. So I would recommend trying to minimize drastic changes by doing some upfront high level modeling. I would not say refactoring Data Designs are easier though. A major framework change would be much more intensive and invasive.


So where does Agile Data Modeling go from here? Given that this was a pretty recent article, I’d say that there still is quite a way to go to incorporate Agile Methods in Data Modeling Methods. The good news is that Agile Data Modeling has much to offer Agile Projects. We just need to help to promote the use of Iterative Data Modeling in addition to Incremental Data Modeling. (Incremental Data Modeling is still better than the alternative)