SQL Server 2012 – Cheaper than Open Source Database options

On a recent project we were tasked with reviewing and recommending a database technology platform for a new Data Warehouse environment. This quickly turned into a database evaluation not just for the new Data Warehouse, but for all future applications. The client was currently on Sybase but open to the most cost-effective option, and had prior experience with Open Source technologies for both server operating systems (Linux) and reporting solutions (JasperSoft).

The criteria we specified had cost and performance as two important factors. Since we had limited experience with any Open Source databases in a Data Warehouse environment, we felt it would be useful to evaluate the options from a cost, functionality, and performance point of view.

This post is a recap of that process and of the somewhat surprising recommendation.

The Candidates

Since the client was already using Sybase, it was a no-brainer to include it. In addition to Sybase, we included SQL Server and Oracle from the commercial DBMSs due to their market share and existing use at the client. DB2 was excluded because it was not currently in use at the client and because of the perceived (rightly or wrongly) higher cost of implementing it. It was felt that Oracle and SQL Server could adequately represent the commercial DBMSs.

On the Open Source side, MySQL and PostgreSQL were selected: MySQL for its broad adoption in the Open Source community and its reputation for performance, and PostgreSQL for its reputation for robustness and performance. They were the only two Open Source DBMSs considered, as we felt they were the clear leaders in that space.

After performing detailed performance tests between MySQL and PostgreSQL, PostgreSQL was chosen as the Open Source DBMS option to compare against the commercial DBMSs. This was because MySQL performed significantly worse when using the InnoDB engine, the engine that guarantees ACID compliance and provides referential integrity. MySQL has an interesting architecture that lets you plug in different storage engines providing different functionality and types of processing. Unfortunately, the engine used in many of the published benchmarks does not provide the functionality required by standard enterprise DBMS solutions.
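To make the engine difference concrete, here is a minimal sketch (with hypothetical table names) of how the storage engine choice changes behaviour: InnoDB enforces the foreign key and supports transactions, while MyISAM, the engine behind many favourable benchmarks, parses the same constraint but silently ignores it.

```sql
-- Hypothetical tables illustrating MySQL's pluggable storage engines.
-- With ENGINE = InnoDB the foreign key is enforced and ACID
-- transactions are available.
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100)
) ENGINE = InnoDB;

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
) ENGINE = InnoDB;

-- With ENGINE = MyISAM the same FOREIGN KEY clause is parsed but not
-- enforced, and there is no transaction support.

-- Verify which engine each table actually uses:
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE();
```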

The Requirements

The current Data Warehouse requirements were reviewed, and it was determined that the vast majority of the reporting requirements were operational reports delivered in text format. This is probably not unlike many companies that are early in their use of Data Warehouses. This is not to preclude more analytical functions in the future, but that was not how the Data Warehouse was currently being used.

The current Data Warehouse also stored data in the tens of gigabytes. Although this is a considerable amount of data, it did not require advanced Data Warehouse appliances or Big Data solutions; traditional Data Warehouse architecture and design would be sufficient for now and well into the future.

Another requirement was that any Open Source solution must be commercially supported. For example, this meant we would require support from EnterpriseDB if we chose PostgreSQL. I suspect this is pretty typical for larger companies evaluating Open Source.

The Architecture

Due to the current requirements and expected future requirements, the recommended Data Warehouse architecture incorporated both an Operational Data Store and a Star Schema Data Warehouse. The Operational Data Store would be used to generate purely operational reports and would be loaded at least daily. The Star Schema Data Warehouse would be used to generate more analytical reports and would be loaded weekly.
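For readers unfamiliar with the pattern, here is a minimal star-schema sketch (the table and column names are hypothetical, not our actual model): a central fact table referencing denormalized dimension tables.

```sql
-- Hypothetical star schema: a central fact table surrounded by
-- dimension tables, loaded weekly from the Operational Data Store.
CREATE TABLE dim_date (
    date_key      INT PRIMARY KEY,   -- surrogate key, e.g. 20120315
    calendar_date DATE NOT NULL,
    fiscal_period VARCHAR(10)
);

CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50)
);

CREATE TABLE fact_sales (
    date_key    INT NOT NULL REFERENCES dim_date (date_key),
    product_key INT NOT NULL REFERENCES dim_product (product_key),
    quantity    INT,
    sale_amount DECIMAL(12, 2)
);
```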

The Evaluation

The recommended configuration for the evaluation was two servers, each with two quad-core CPUs. This reflected the current workload and the expected increase in workload in the future. We also evaluated higher configurations to ensure costs would scale linearly with future growth.

This was not a small Data Warehouse configuration, and it is probably typical of most enterprises' entry into Data Warehouse technology.

We separated our cost evaluation into two sections:

1) Initial Cost

2) Annual (Ongoing) Cost

Initial Cost

Initial cost incorporated the server and DBMS license fees for both the development and production environments, plus two human factors:

Training – the cost of the initial training required for the administrators.

Efficiency – the expected efficiency of the administrators in the first three months.

The results showed that the supported PostgreSQL option was significantly more affordable in terms of initial cost.

Annual Cost

The annual cost incorporated the following two factors:

License support costs – the support and maintenance costs for both the production and development environments.

Estimated annual maintenance costs – the additional ongoing effort estimated to maintain each DBMS solution as compared to the current Sybase solution. This reflects the complexity of each DBMS, its available tools, and the support available in its user community.

The estimated annual maintenance ratings, and the rationale for each, were:

  • Sybase ASE – 0%: No additional maintenance effort is required, as Sybase is the DBMS currently in use.
  • Microsoft SQL Server 2012 Enterprise – 5%: No additional maintenance effort is expected, as Microsoft has an excellent set of tools and resources and the environment is similar to the Sybase environment currently in use. A baseline 5% has been added to account for the fact that a second DBMS must now be learned and supported.
  • Oracle 11 – 20%: An additional 15% maintenance effort is estimated due to the complexity of Oracle and its toolset, plus the baseline 5% for learning and supporting a second DBMS.
  • PostgreSQL (supported) – 10%: An additional 5% maintenance effort is estimated: the PostgreSQL solution is simple, but its toolset and resources are relatively limited. The baseline 5% for learning and supporting a second DBMS is added on top.

The results showed that the annual ongoing costs ended up favouring SQL Server.

The 10-year Total Cost of Ownership (TCO) came out as follows:

  DBMS                                    10-Year TCO
  Sybase ASE                              $1,173,057.20
  Microsoft SQL Server 2012 Enterprise    $1,019,909.00
  Oracle 11                               $4,040,910.00
  PostgreSQL (supported)                  $1,074,025.00
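Our full cost model is not reproduced here, but as a rough sketch of how such a figure rolls up (assuming a simple one-time-plus-ten-annual model; the table and column names are illustrative only):

```sql
-- Hypothetical roll-up of a 10-year TCO, assuming a simple model of
-- one-time initial cost plus ten years of annual cost per DBMS.
CREATE TABLE dbms_cost (
    dbms         VARCHAR(50),
    initial_cost DECIMAL(12, 2),  -- licenses, servers, training, ramp-up
    annual_cost  DECIMAL(12, 2)   -- support fees plus estimated extra maintenance
);

SELECT dbms,
       initial_cost + 10 * annual_cost AS tco_10_year
FROM dbms_cost
ORDER BY tco_10_year;
```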

Although SQL Server has an initial license cost, its ongoing support costs are lower than those of supported PostgreSQL in almost all categories.

Conclusion

Over a ten-year period, the TCO for SQL Server is less than that of a supported version of PostgreSQL, even including the server licensing costs. As an added benefit, our solution also required a Reporting solution and an Extract, Transform, and Load (ETL) solution; SQL Server provides both of these, bundled with the DBMS at no additional cost.

From a cost perspective, SQL Server is the clear winner.

For our situation, although cost was important, it was not the overriding factor. In fact, cost accounted for only 10% of the entire weight. The factors and their weights are listed below:

(Sub-criteria are indented beneath their category; each category's sub-weights sum to the category weight.)

Evaluation Criteria                                 Weight
MBC Required Functionality                             350
    Cost                                               100
    Future Commercial Viability                        100
    High Availability                                   50
    Scalability                                         50
    Product Track Record/Future Stability               50
Technical Functionality                                350
    Architectural Standards and Connectivity            50
    DBMS Standard Functionality                        100
    Data Warehouse Standard Functionality              150
    Security/Encryption                                 50
DBMS Management Functionality                           50
    Ease of Maintenance/Management                      30
    Database Configuration                              20
People and Future Flexibility                           50
    Future Migration/Flexibility                        20
    Standard Technology/People Availability             30
Performance Functionality                              200
    Bulk Load Performance                               20
    Bulk Update Performance                             20
    Bulk Delete Performance                             10
    Basic Query Performance                             20
    Cross-Product Query Performance                     20
    Sub-Query Performance                               20
    Parallel Query Performance                          20
    Analytical Query Performance                        20
    Indexed Update Performance                          20
    Indexed Delete Performance                          20
    Non-Indexed Update Performance                      10
Total                                                 1000
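For illustration, this is roughly how a weighted scorecard like the one above computes a total (a hypothetical sketch, not our actual scoring workbook; the 0–10 per-criterion scale is an assumption):

```sql
-- Hypothetical weighted scorecard: each DBMS gets a 0-10 score per
-- criterion, scaled by the weights above (which sum to 1000).
CREATE TABLE criterion_score (
    dbms      VARCHAR(50),
    criterion VARCHAR(60),
    weight    INT,            -- from the table above
    score     DECIMAL(4, 1)   -- assumed 0-10 rating per criterion
);

SELECT dbms,
       SUM(weight * score / 10) AS total_points  -- out of a possible 1000
FROM criterion_score
GROUP BY dbms
ORDER BY total_points DESC;
```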

After all the factors were evaluated, SQL Server finished 196 points ahead of PostgreSQL.

SQL Server was selected, and I was enlightened. Although people assume Open Source DBMSs are the least expensive options, SQL Server came out ahead once other factors were included. In fact, SQL Server was the leader in both cost and functionality.

SQL Server Integration Services (SQL Server’s ETL solution) and SQL Server Reporting Services (SQL Server’s Reporting solution) were NOT included in this comparison. If they had been, the results would have been even more pronounced.

I encourage you to do your own evaluations. You may be surprised at the results. I know I was.

Please let me know if anyone is interested in specific evaluation criteria. I’d be happy to share our rationale and SQL performance scripts.


How #Oracle helped to make my Data Model #Agile

Many people who know me know that I have a certain fondness for tools that can help make our jobs easier. I agree that a fool with a tool is still a fool, but I would argue that a craftsman with the right tool can be an even better craftsman. In essence, the problem isn’t the tool, it is the person wielding it. So let’s not be too quick to paint the use of all tools in a negative light.

Agile Data Modeling

It has always been a struggle to determine how to manage our Data Models and databases in an Agile way. In the past, Data Modeling tools were limited in the collaboration and versioning functionality they provided. Recently the tools have become much better at providing that functionality, but most of the Data Modeling tools I have seen still provide collaboration and versioning in a proprietary way created by the vendor.

This has made the Data Model and database operate as second-class citizens on Agile projects. Usually separate processes need to be created to integrate the Data Model and database into Continuous Integration environments. I believe some of the resistance of Data Modelers and Database Administrators to adopting Agile for data design activities is due to the fact that the tools at their disposal have placed them at an Agile disadvantage. Because of this, Data Modelers prefer the illusion of a more stable Data Model, since it limits the number of SQL change scripts they need to generate (and thereby limits their technical debt).

SQL change scripts still seem to be the primary way people roll out changes to their databases. It is not uncommon for projects to have a huge number of these scripts that then themselves need to be maintained. I have seen a wide variety of ways to generate them, from a Data Modeling tool producing them to a DBA writing them from scratch.

Having a large backlog of these scripts, generated in such ad hoc ways, is a huge hindrance to being Agile and accepting change. Having to then keep these scripts synchronized with check-in versions can be beyond frustrating. And I don’t even want to talk about maintaining all of these scripts and manually checking them in.
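For readers who have not lived this, a typical hand-maintained change script looks something like the following (a hypothetical example using PostgreSQL syntax); multiply it by dozens of model changes and the maintenance burden becomes obvious.

```sql
-- Hypothetical hand-written change script: incremental DDL/DML that
-- must be versioned, sequenced, and applied in order on every environment.
ALTER TABLE customer ADD COLUMN email VARCHAR(255);

-- Backfill existing rows with an assumed placeholder value.
UPDATE customer
SET email = 'unknown@example.com'
WHERE email IS NULL;

-- Only after the backfill can the constraint be tightened.
ALTER TABLE customer ALTER COLUMN email SET NOT NULL;
```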

Agile Data Warehouse

For a recent project, we were tasked with creating a Data Warehouse, and we needed to build it in an Agile way. We needed to embrace agility for two reasons:

  1. To deliver the maximum value to the client and deliver the best design that meets actual business needs
  2. To address the reality that we would be required to accept changes throughout the duration of the project as sub-projects were completed

After reviewing the Data Modeling tools I had used in the past, I was still struggling with how to develop a process that would let us easily adopt Agile practices for our Data Models and databases, so that the Data Warehouse could be run in as Agile a way as other projects.

Oracle SQL Developer Data Modeler

I did some initial investigation of the Data Modeling tools now available. My objectives were to find an affordable tool that could be used for the Data Modeling and that all developers could use to view the Data Models and collaborate on them. Most of the tools seemed to fall into two camps:

  1. Affordable Data Modeling tools that provided good collaboration but limited Data Modeling and integrated versioning functionality
  2. Expensive Data Modeling tools that provided good Data Modeling functionality but limited collaboration and integrated versioning functionality

Then, on a list reviewing the best Data Modeling tools, there was a mention of Oracle SQL Developer Data Modeler. It was described as a robust Data Modeling tool that was free. This certainly warranted further investigation.

Rarely does a new tool offer so much functionality that you need time just to determine how best to use it. 🙂

The functionality provided by Oracle SQL Developer Data Modeler absolutely changed our project. We now had the ability to be first-class Agile citizens alongside the rest of the Agile projects and to adapt to change easily. We could roll out changes easily and efficiently as new requirements became apparent.

Oracle SQL Developer Data Modeler provided the following functionality:

  • Full, integrated functionality with Subversion. The Data Models can be checked in and versioned along with the application code.
  • Ability to define a domain library, naming standards, and a glossary to ensure a consistent model.
  • Ability to apply design rules to a model to ensure Data Modeling standards are being followed.
  • Ability to model Logical, Relational, Dimensional, and Physical models.
  • Ability to store DML to populate data along with each table definition, and to control when it is executed (at the start, before object creation, after object creation, or at the end) – see the sketch after this list.
  • Use of Barker notation. (A pet peeve of mine!)
  • Extensive functionality to generate and compare different types of models.
  • Full versioning functionality to report on the differences between versions and what changes will be committed on check-in.
  • Robust Data Modeling functionality to create Data Models and generate physical databases.
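As a hypothetical illustration of the seed-DML feature referenced above (the table and values are mine, not output from the tool), reference data can live beside the table definition and be replayed whenever the object is created:

```sql
-- Hypothetical reference table with its seed DML stored alongside the
-- table definition, to be executed after object creation.
CREATE TABLE order_status (
    status_code VARCHAR(10) PRIMARY KEY,
    description VARCHAR(100)
);

INSERT INTO order_status VALUES ('NEW',  'Order received');
INSERT INTO order_status VALUES ('SHIP', 'Order shipped');
INSERT INTO order_status VALUES ('CLSD', 'Order closed');
```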

Did I mention it is free, so all the developers could use it to review the models and make change requests?

Summary

Although we still need to use database refactoring practices to adapt to changes, the versioning and collaboration features of Oracle SQL Developer Data Modeler did cause us to pause and determine how we could best use its functionality. In fact, it led us to define a process for rolling out refactoring changes in an automated way, as Oracle SQL Developer Data Modeler now made the refactoring manageable.

If you are looking for an Agile Data Modeling tool, I would highly recommend it!

Did I mention it is free? 🙂

And the support for the product from Oracle is top-notch. Questions I have had have been answered same-day.