Relational Databases Run The World

Contrary to popular belief, relational databases are not going anywhere. The relational database is not doomed, but I still recommend reading the ReadWriteWeb article on the subject. I admit that the headline of that article, like the headline of this one, is quite an exaggeration. However, the article does present a very good analysis of the new cloud data storage services and how they compare to relational databases. As usual, I do not agree with all of the post, so I wanted to present a counter-argument to some of the points it raises.

The first point I have a problem with is the initial premise:

Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, “if you want vast, on-demand scalability, you need a non-relational database”.

At first, I thought this was just a way to hook the reader, but the sentiment is reiterated later:

As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware?

I do not know the author's background, but his profile states that his career has centered on data management, so I assume he is very familiar with typical relational database offerings. If he is, then I am disappointed that he believes relational databases cannot scale to meet this kind of demand. Very few services will ever have to scale overnight. Even if your traffic does triple, a relational database would only fail in that scenario if it were already heavily trafficked, and in that case I would argue that you should have been planning to scale the database anyway. I know that many of the largest e-commerce sites are happily running Oracle with replication and clustering services. Those sites face the same problems as a large web application, and arguably see more traffic during peaks like Black Friday and Cyber Monday.
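Replication setups like the ones mentioned above commonly scale reads by sending writes to a primary server and spreading reads across replicas. Here is a minimal sketch of just that routing decision. `ReplicaRouter` and the node names are hypothetical, invented for this example; it does not model Oracle's own clustering or any real database driver.

```python
from itertools import cycle


class ReplicaRouter:
    """Hypothetical read/write splitter: writes go to the primary,
    reads rotate round-robin across the replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Classify the statement by its first keyword (a simplification).
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 7"))     # db-replica-1
print(router.route("UPDATE orders SET status = 'shipped'"))  # db-primary
print(router.route("SELECT count(*) FROM orders"))           # db-replica-2
```

Treat this as a starting point rather than a production router: replicas can lag behind the primary, so a read issued right after a write may not see it, and reads inside a transaction (or a `SELECT ... FOR UPDATE`) generally need to go to the primary as well.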

I do not want to make this a treatise on data management, so I will cut to the point. Relational databases and the newer key/value data stores each have a purpose. For a small startup that does not want to pay an experienced DBA, a cloud-based key/value store can be a very attractive and useful option. For control freaks and data wonks like me, a relational database is likely the only place you will feel comfortable. I know from experience that relational databases perform very well under heavy load and are capable of storing petabytes of information. Yes, I said petabytes: not gigabytes or terabytes, petabytes. I am sure that Google is storing that much in its data stores, but I am not sure how many other applications are.
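To make that trade-off concrete, here is a small sketch, assuming a plain Python dict stands in for a key/value store (it is not the API of any cloud service) and an in-memory SQLite table stands in for the relational side. The helpers `kv_get` and `users_per_country` and the sample records are hypothetical. A lookup by key is trivial in either model, but an ad-hoc question such as "how many users per country?" is a single `GROUP BY` in SQL, while a pure key/value store would have to scan every value in application code.

```python
import sqlite3
from typing import Optional

# Hypothetical sample data: user id -> profile.
users = {
    "u1": {"name": "Ada", "country": "UK"},
    "u2": {"name": "Linus", "country": "FI"},
    "u3": {"name": "Grace", "country": "US"},
    "u4": {"name": "Alan", "country": "UK"},
}


def kv_get(store: dict, key: str) -> Optional[dict]:
    """Key/value style: a point lookup by key, nothing more."""
    return store.get(key)


def users_per_country(records: dict) -> list[tuple[str, int]]:
    """Relational style: load the same records into a table so ad-hoc
    questions (filters, grouping, joins) can be answered in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(uid, r["name"], r["country"]) for uid, r in records.items()],
    )
    rows = conn.execute(
        "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return rows


print(kv_get(users, "u2"))       # {'name': 'Linus', 'country': 'FI'}
print(users_per_country(users))  # [('FI', 1), ('UK', 2), ('US', 1)]
```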

So why is there concern about the scalability of relational databases? It is a matter of experience. I work as a software engineer and have also spent some time as a DBA. I have seen all sorts of terrible queries written to retrieve data from a database. Many software developers no longer really understand database fundamentals. For several years there has been a trend of people specializing in their technology of choice, whether Java, PHP, or something else. This troubles me because almost every application has some data storage requirements. Even with a key/value store, you can still write a “bad query” or a poorly designed application. What I am trying to say is that there is no replacement for knowledge and experience.
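One of the most common “bad queries” is the N+1 pattern: fetch a list of rows, then issue one more query per row. The sketch below uses a hypothetical `customers`/`orders` schema in an in-memory SQLite database to contrast that pattern with a single `LEFT JOIN` plus `GROUP BY` that returns the same totals. The query counter exists only to make the difference visible.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Build a tiny hypothetical schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
    """)
    return conn


def totals_n_plus_one(conn: sqlite3.Connection) -> tuple[dict[str, float], int]:
    """Bad pattern: one query for the customers, then one more per customer."""
    queries = 1
    result = {}
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for cid, name in customers:
        queries += 1
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?", (cid,)
        ).fetchone()
        result[name] = total
    return result, queries


def totals_single_join(conn: sqlite3.Connection) -> tuple[dict[str, float], int]:
    """Better: let the database join and aggregate in a single query."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """).fetchall()
    return dict(rows), 1


conn = setup()
print(totals_n_plus_one(conn))   # ({'Ada': 30.0, 'Grace': 40.0}, 3)
print(totals_single_join(conn))  # ({'Ada': 30.0, 'Grace': 40.0}, 1)
```

With two customers the difference is three round trips versus one; with a hundred thousand customers it is a hundred thousand and one versus one, which is exactly the kind of problem that gets blamed on the database rather than on the query.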

Are relational databases going away? No, and not any time soon. Look at almost any enterprise application and you will likely find it storing its data in a relational database. The key/value data stores are still very immature and have a lot of work to do before they can overtake relational databases outside of the web. So go pick up a book on SQL, and read up on the key/value stores as well. Make your own choice for your application, but make sure you know how to use it.


6 thoughts on “Relational Databases Run The World”

  1. Yes, the relational database is one of the best databases for storing data. But we get more error messages compared to other DBs. How can we overcome this problem?


  2. Where does a post-relational database such as InterSystems Caché fit into this equation? I am working on a project that will have immense database needs, and I heard that the WHO and the US Veterans Admin use Caché because it is faster and scales better. Does anyone know if that is true?


  3. victorseo,

    It really depends on the needs of your application. For web applications that have light querying, key/value stores are very useful. For enterprise databases where reporting is typically very heavy, relational databases fit better. I cannot really comment on systems like Caché, as I have not used them.


  4. For very, very large-scale consumer applications, relational databases (in my experience) end up being just part of the equation. When you are dealing with databases of millions or billions of records, where most of what you do is look up a single value for a user key, key/value stores are way, way faster. When you need joins, full-text search, and other relational tasks, it ends up being easier to dump the key/value data into a relational database just for searching the store, especially if you only need to work with a subset of the records.

    Anyway, the point is that a relational DB is the right (or good enough) answer for about 99.9% of all applications. I’m guessing most of your readers would be better off with a relational DB. But the cloud store options are great for a lot of apps where scale, speed, and low cost are all important.


  5. State-of-the-art databases can be cheap to develop for, or cheap to deploy, or neither.

    Scaling, in the Google sense, means that an application runs on small commodity PC hardware but supports essentially unbounded load as more PCs are added. This makes an application with demanding hardware requirements cheap to deploy. SQL databases cannot do this; I believe that BigTable can.

    Cheap to develop means defining a domain model once and traversing it without concern for the number of queries. SQL databases cannot do this (they require the model to be defined twice, a mapping layer to be created, and access to be adjusted to avoid N+1 selects); in-process object database kernels can.

    SQL databases (they are not relational databases) do run the world, and good software developers should have some understanding of them.

    The reason SQL databases run the world is inertia. Good software developers do understand them. They are tried so we know how to cope with the issues they create.

    Is the SQL language an optimal choice of data interface for any application?

    Most applications are so undemanding in both complexity and compute requirements that SQL databases can work.

    If the limitations imposed by SQL databases were lifted the boundaries of our imagination would expand, and we would dream new dreams.

