Contrary to popular belief, relational databases are not going anywhere. The relational database is not doomed, but I still recommend reading the linked article from ReadWriteWeb. I admit that its headline, like the one on this post, is quite an exaggeration. However, the article presents a very good analysis of the new cloud data storage services and how they compare to relational databases. As usual, I do not agree with all of it, so I want to present a counter-argument to some of the points it raises.

The first point I have a problem with is the initial premise:

Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, “if you want vast, on-demand scalability, you need a non-relational database”.

At first, I thought this was just a way to hook the reader, but the sentiment is reiterated later:

As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware?

I do not know the author's background, but his profile states that his career has centered on data management, so I assume he is very familiar with typical relational database offerings. If he is, then I am disappointed that he believes relational databases somehow cannot scale. Very few services will ever need to scale overnight. Even if your traffic does triple, a relational database would only fail in that scenario if it were already heavily trafficked, and in that case I would argue you should have been planning to scale the database anyway. I know that many of the largest e-commerce sites are happily running Oracle with replication and clustering. Those sites do not have it any easier than a large web application; if anything, they see more traffic during peak periods like Black Friday and Cyber Monday.

I do not want to make this a treatise on data management, so I will cut to the point. Relational databases and these newer key/value data stores each have a purpose. For a small startup that does not want to pay an experienced DBA, a cloud-based key/value store can be a very attractive and useful option. For control freaks and data wonks like me, a relational database is likely the only place you will feel comfortable. I know from experience that relational databases perform very well under heavy load and are capable of storing petabytes of information. Yes, I said petabytes: not gigabytes, not terabytes, petabytes. I am sure that Google stores that much in its data stores, but I am not sure how many other applications do.

So why is there concern about the scalability of relational databases? It is a matter of experience. I work as a software engineer and have also spent some time as a DBA. I have seen all sorts of terrible queries written to retrieve data from a database. Many software developers no longer really understand the fundamentals of databases. For several years there has been a trend of people specializing more narrowly in their technology of choice, whether Java, PHP, or something else. This troubles me because almost every application has some data storage requirements. Even with a key/value store, you can still write a "bad query" or a poorly designed application. What I am trying to say is that there is no replacement for knowledge and experience.
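To make the idea of a "terrible query" a little more concrete, here is a minimal sketch of one common example, often called the N+1 query pattern, using Python's built-in sqlite3 module. This is my own illustration, not something from the ReadWriteWeb article; the customers/orders schema, the sample rows, and the function names are all hypothetical. The point is that both functions return the same answer, but the first issues one query per customer while the second lets the database do the work in a single set-based query.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical customers/orders tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob"), (3, "Carol")],
    )
    # Carol deliberately has no orders, so both approaches must handle that case.
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(1, 10.0), (1, 15.5), (2, 7.25)],
    )
    conn.commit()
    return conn


def totals_n_plus_one(conn: sqlite3.Connection):
    """The 'bad query' pattern: one query for the list, then one more per row."""
    queries = 1
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    totals = {}
    for cust_id, name in customers:
        queries += 1  # one extra round trip for every customer
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(total), 0.0) FROM orders WHERE customer_id = ?",
            (cust_id,),
        ).fetchone()
        totals[name] = total
    return totals, queries


def totals_single_join(conn: sqlite3.Connection):
    """The same result from one query using a LEFT JOIN and GROUP BY."""
    rows = conn.execute(
        """
        SELECT c.name, COALESCE(SUM(o.total), 0.0)
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
        """
    ).fetchall()
    return dict(rows), 1


if __name__ == "__main__":
    conn = build_sample_db()
    print(totals_n_plus_one(conn))
    print(totals_single_join(conn))
```

With three customers the difference is four queries versus one, which hardly matters. With a hundred thousand customers, the first version makes a hundred thousand and one round trips to the database, and that kind of code is often what gets blamed on the database "not scaling".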

Are relational databases going away? No, and not any time soon. Look at almost any enterprise application and it is likely storing its data in a relational database. The key/value data stores are still very immature and have a lot of work to do before they can overtake relational databases outside of the web. So go pick up a book on SQL, and read up on the key/value stores as well. Make your own choice for your application, but make sure you know how to use it.
