It seems like Twitter is now at a crossroads. They recently announced a new $15 million funding round and have started communicating more with their users. These are very good things, and a new, open Twitter is a wonderful thing. People have been very hard on them because they were saying nothing about the outages. They have also acknowledged on their blog that scalability and reliability are issues. The bad side is that they had several outages this past week, and TechCrunch openly wondered whether they can scale the service:

Twitter is unique in that it needs to parse a large number of messages and deliver them to multiple recipients, with each user having unique connections to other users… Every new Twitter user and every new connection results in an exponentially greater computational requirement.
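
To make the delivery problem concrete, here is a minimal sketch of one common way to model it, usually called fan-out on write. It is an assumption for illustration only and not a description of how Twitter actually works; the names `follow`, `post`, `followers`, and `timelines` are hypothetical. In this simplified model, a single message costs one delivery per follower of its author.

```python
from collections import defaultdict

# Hypothetical in-memory model, not Twitter's real data structures.
followers = defaultdict(set)    # author -> users who follow that author
timelines = defaultdict(list)   # user -> messages delivered to that user


def follow(follower, author):
    """Record that `follower` wants to receive `author`'s messages."""
    followers[author].add(follower)


def post(author, text):
    """Copy one message into every follower's timeline.

    Returns the number of deliveries this single message caused.
    """
    for user in followers[author]:
        timelines[user].append((author, text))
    return len(followers[author])


follow("bob", "alice")
follow("carol", "alice")
deliveries = post("alice", "hello")
print(deliveries)  # 2
```

Every new connection adds one more delivery to each future message from that author, which is why the delivery work keeps growing as users connect to each other.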

I have always had faith that, given enough time and energy, they would solve the scale problem. That is, I had faith until I read their recent blog entry. The first paragraph explains a lot:

We found an errant API project eating way too much of our Jabber resources. This activity had an affect of overloading our main database, resulting in the error pages and slowness most people are now encountering.

I can totally understand that a third-party project is pounding on the Jabber API. However, there is an underlying problem that concerns me. They have documented problems with scaling the service in general. They now admit that the Jabber API, their realtime API, has performance issues that affect the main site. This concerns me because it seems they built a service that became very popular, yet six months ago they could not foresee that there might be issues. Granted, their real traffic growth started in February of this year, but it has been steady growth rather than a sudden jump in traffic. Given how popular the service was in October, scale should have been the focus at that point.

It is now late May and they are only now admitting that they have infrastructure problems. Obviously, there are redundancy and replication issues that can be solved with known techniques. The realtime API should be reading from a replicated database so that it does not affect the main database. This is the part that concerns me the most, as it should have been obvious some time ago. Also, disabling the realtime API while you recover is bad public relations, because the third parties that use the API have helped the service grow. If you look at where all of the updates come from, most are not coming from the Twitter web client.
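
Here is a minimal sketch of the replication idea, assuming a simple primary/replica setup. Everything in it is hypothetical (`InMemoryDatabase`, `ReplicatedStore`, and the method names are made up for illustration), and it says nothing about how Twitter's databases are actually arranged. The point is only that API reads land on the replica, so a misbehaving API client cannot pile load onto the main database.

```python
class InMemoryDatabase:
    """Stand-in for a real database server; counts the queries it serves."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.queries_served = 0

    def write(self, key, value):
        self.queries_served += 1
        self.rows[key] = value

    def read(self, key):
        self.queries_served += 1
        return self.rows.get(key)


class ReplicatedStore:
    """Routes writes to the primary and API reads to the replica."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def post_update(self, update_id, text):
        self.primary.write(update_id, text)
        # Assumption: replication is simulated by copying the row at once.
        # Real replication is usually asynchronous and can lag behind.
        self.replica.rows[update_id] = text

    def read_for_website(self, update_id):
        return self.primary.read(update_id)

    def read_for_api(self, update_id):
        return self.replica.read(update_id)


primary = InMemoryDatabase("primary")
replica = InMemoryDatabase("replica")
store = ReplicatedStore(primary, replica)

store.post_update(1, "hello")
for _ in range(1000):          # an errant API client hammering the service
    store.read_for_api(1)

print(primary.queries_served, replica.queries_served)  # 1 1000
```

In this toy run the main database serves only the single write, while the thousand API reads are absorbed by the replica.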

So, now we wait. We wait to see how quickly they can solve each of the scalability issues. They now have the money, so let us see what they can do. If they really want to become a utility service, they have to be able to scale reliably. If they really want to become a utility service, the API services need to become more important than the web client. If they really want to become a utility service, they have a lot of work to do.

Good luck, Twitter team. I hope you succeed.