Yesterday, Alexander van Elsas wrote a post about our pointless need for real-time information. As usual, it is a very interesting read, and he gets you thinking. I commented on his post, but I felt like there was more to say. To start, Alexander raises an interesting question about whether information really has any value if anyone can access it:

If anyone can have access to any information at any time, what is then the value of that information? As transaction costs to produce, distribute and consume information drop to zero the question arises if the information value itself drops to zero too? My guess is that in many cases the data itself will have less value. That same data all platforms are now fighting a war over, the data that makes web 2.0 more important than the destinations of web 1.0.

Because I have small children, I tend to relate things to simple quotes from cartoons or kids’ movies. Alexander’s quote reminds me of something said by Dash in The Incredibles. He states that if everyone is special, that is the same as saying nobody is special. Both this quote and the quote from Alexander seem to generalize a little too broadly. If everyone has access to information, that does not mean that everyone gets the same things from it. In the case of news, some people just skim headlines to stay on top of the general happenings in the world. Other people will read several articles in order to gain knowledge about a topic. They all have access to the same information, but the end result is very different. In the end, knowledge is the differentiator. Alexander does talk about this in relation to Stephen Hawking and information about black holes. If you have all published information about black holes readily available, that does not mean that you understand them as well as Stephen Hawking does. I totally agree with this idea.

The main problem I have is that information is not knowledge. If I understand Alexander’s post, he feels the same way. However, he does not take this a step further to determine what this real-time access may become. Real-time data is just data. The real-time part of it just means that we have access to it more quickly. However, quicker access to information is not the problem. Without knowing what to do with the data, the data is useless. I am going to make the same comparison I did in my comment on Alexander’s post.

There is knowledge to be mined

We are currently aggregating all of this real-time information into sites like FriendFeed. This is very similar to a trend we saw in the 90s with databases. Many major corporations had various departmental databases. So, marketing may have had some interesting information on the various advertising campaigns, but they did not have any sales information. In order to get the sales information, they had to make a special request to the sales department for a report on the sales during specific periods of time. The sales department had all of this data in their own database, and just needed to write a custom report for the marketing team. People found that the delay in getting these reports was fairly lengthy, and they wondered why the data could not be brought together.

So, data warehousing arose as a way to centralize the data being generated by various loosely related departmental databases. The simple benefits were immediately obvious. That same report for the marketing group now took a few hours to generate instead of the team waiting for a week to get the report, convert it into a readable format and load it into their own database. On top of this, data mining started to become a more formalised discipline. Once the data from the various departments was aggregated, people noticed that their reports only contained a small subset of the data available. There was a large amount of information that they had never seen before. What did this other information tell them?

For example, large pharmaceutical companies run advertisements all the time. How do they know if they are effective? In the data warehouse model, they can review the sales information for the timeframe of the advertising campaign and the few months after the campaign. If there is a non-seasonal increase in sales of the drug, then the campaign was probably effective. The golden nugget in the data warehouse, though, is the other information that became available: the actual prescription data. These companies can receive daily or monthly feeds of anonymized prescription data from various pharmacy chains. This data tells them which areas of the country purchase a specific drug more often. In addition, it can be correlated with the advertising campaign to see whether the advertising helped in those areas, or even whether it helped in areas where a competitor’s drug is selling better.
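
To make that kind of comparison a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the region names, the monthly unit counts, the campaign start month and the function name lift_by_region are all made up, and a real analysis would also have to correct for seasonality, which this does not do. It only shows the basic idea of comparing average sales before and after a campaign, region by region.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up feed of anonymized records: (region, month, units_sold).
# Assume the ads ran only in "north", starting in month 3.
PRESCRIPTIONS = [
    ("north", 1, 100), ("north", 2, 104), ("north", 3, 130), ("north", 4, 134),
    ("south", 1, 80), ("south", 2, 82), ("south", 3, 81), ("south", 4, 83),
]


def lift_by_region(records, start_month):
    """Relative change in mean monthly units once the campaign has started.

    Months before start_month count as "before"; start_month and later
    count as "after". Regions missing either period are left out.
    """
    before = defaultdict(list)
    after = defaultdict(list)
    for region, month, units in records:
        bucket = after if month >= start_month else before
        bucket[region].append(units)
    return {
        region: mean(after[region]) / mean(before[region]) - 1
        for region in before
        if region in after
    }


lifts = lift_by_region(PRESCRIPTIONS, start_month=3)
for region, lift in sorted(lifts.items()):
    print(f"{region}: {lift:+.1%}")
```

With this invented data, the region that saw the ads shows a much larger jump than the one that did not. That difference between regions is the sort of signal the prescription data makes visible and a single company-wide sales total would hide.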

What does this have to do with real-time information access? First, we are still in the aggregation stage with tools like FriendFeed. Once the aggregation problem is fundamentally solved, people will start clamoring for better tools to help them understand and filter this data. We are currently building our data warehouses of real-time information. We are still waiting for the effective reporting and data mining. Some of this could come from the semantic web technologies and other pieces we probably have not seen yet. However, the mining of the real-time data is the reason we need to collect it. The problem is that you have to collect the information before you can understand what is in it.
