Data, context and analysis have come up a lot lately. I talked about data and context about a month ago when people were arguing about JSON vs. XML. The problem at that time was that people were comparing data formats instead of their potential usage:
As with any programming problem, different requirements and different contexts may call for different technologies. If you get stuck on saying that JSON is better than XML (or the other way around), you lose another tool in your toolbox.
Another interesting context problem is one regarding location, though not your current location: the geography of thought. In that post, the author looks at the different context biases western and eastern cultures have:
Unlike the westerners who generally believe objects (in its sense of abstraction) being the fundamental building blocks of the world, for long time the easterners perceive relations being the true construction blocks of the world. The westerners think a noun holding its uniqueness regardless of the application, while the easterners think of a noun indeed with few uniqueness until we recognize its application environment. The easterners think verbs being unique since they describe the ways of tangling while nouns are the things happen to be tangled.
Obviously, this is a different type of context than the one discussed when arguing about the better data format. In all applications, context is of the utmost importance. In applications like Facebook or Twitter, you are providing context in the form of status updates. In applications like FourSquare, the context is provided by the GPS software on your phone. In all of these applications, there is a huge amount of data being created, all of it transported with the needed contextual information.
However, all of this information currently has no meaning because we are not putting the pieces together. Some of the applications that work with Facebook, Twitter and FourSquare are trying to collect the information, but they are not processing it. Louis Gray talks about this problem in his We Need Tech Intelligence post:
What is needed now is a doubling down on intelligence, or at least a way to improve our abilities to be better people, better employees, or simply better educated. The promise of something like Quora is not that it’s another social network to chat, but that it has the potential to be a valuable resource for discovery…
The problem with all of this data is that even with context it does not always carry much meaning, or at least actionable information. Or the data comes from only one service, like FourSquare, while there are other players in the location application space. By taking data from only one application, you potentially limit your information to one subset of the population. Does this mean that you need data from every location application? Absolutely not, but you should get good coverage. Continuing with the location data example, you want several different checkin services, like GoWalla, Google Latitude and Facebook Places. In addition to basic location data, ensure that you are capturing additional metadata like reviews or ratings from services like Yelp. This allows you to gather more data over a larger population, so the trends that become apparent in the data are more generalized.

Taking a trend from one service could mean that a specific subset of the population fits the trend, but it does not apply to the mainstream. Think back to a year ago and what FourSquare was then: a haven only for early adopters. Trends from that time were not entirely useful for the general population. If you look at the location services now, more people are using them, and Facebook has thrust location into the mainstream with its Places feature. That is a much different population than the early FourSquare users.
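To make the aggregation idea concrete, here is a minimal sketch of normalizing checkins from two services into one schema. The field names and raw records are invented for illustration; real service APIs differ and you would be parsing their actual response formats.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Checkin:
    """A service-neutral checkin record."""
    service: str
    venue: str
    user_id: str
    lat: float
    lon: float
    rating: Optional[float] = None  # extra metadata, e.g. from a review service

# Hypothetical raw records; real API responses will look different.
foursquare_raw = {"venue": {"name": "Joe's Coffee", "lat": 40.74, "lng": -73.99},
                  "user": "fsq-123"}
places_raw = {"place_name": "Joe's Coffee", "location": [40.74, -73.99],
              "uid": "fb-456"}

def from_foursquare(record):
    v = record["venue"]
    return Checkin("foursquare", v["name"], record["user"], v["lat"], v["lng"])

def from_places(record):
    lat, lon = record["location"]
    return Checkin("facebook-places", record["place_name"], record["uid"], lat, lon)

# One combined stream, regardless of which service produced each checkin.
checkins = [from_foursquare(foursquare_raw), from_places(places_raw)]
```

Once everything is in one schema, trend analysis can run over the combined population instead of one service's user base.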
Putting It All Together
So, if you have all of this data from all of these different types of services, what do you do with it? First, you may not be in the best position to collect the data. Thankfully, there are services that will do the data collection for you, for a price. I mentioned some in my “A Look Ahead” post, but there is something missing from that portion of the post:
The one prediction I will make is that some data startup will become huge. We have some players already in Gnip and DataSift, but 2011 is really the year of data and one company will have massive growth.
Yes, these types of data collection services will have a big year, but in order for a data startup to become huge, it needs to add value to the data. Thinking in terms of the examples above, the data collection services provide the data and the context; there is minimal analysis and meaning in the data. Some of this can be done by adding semantic tagging or implementing concepts from Linked Data. By providing this relationship data, some meaning can be inferred, or at a minimum you have provided more contextual information. The analysis and meaning can be extracted using various analysis and graphing tools, but in many cases tools and algorithms from machine learning and artificial intelligence will be incorporated.
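The core idea behind Linked Data can be sketched without any RDF tooling: relationships are stored as subject-predicate-object triples, and new context is inferred by following them. The identifiers and predicates below are made up for illustration, not a real vocabulary.

```python
# Relationship data as (subject, predicate, object) triples,
# the basic shape used by Linked Data / RDF.
triples = [
    ("checkin:42", "at_venue", "venue:joes-coffee"),
    ("venue:joes-coffee", "near", "venue:corner-cafe"),
    ("venue:joes-coffee", "has_rating", 4.5),
]

def objects(subject, predicate):
    """Everything related to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Inferring extra context: which venues are near the checked-in venue?
venue = objects("checkin:42", "at_venue")[0]
nearby = objects(venue, "near")
```

Even this toy graph shows the point: the checkin alone says where one user was, but the relationship data lets you infer something the checkin never stated, such as which competitor sits nearby.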
Analyzing data and extracting meaning from it is not an easy process. Data analysis, statistics, machine learning and artificial intelligence have been lurking next to the web for years. Web analytics solutions use many concepts from these areas, but now we are generating useful data that sits outside of the traditional web analytics world. Imagine if one of these data startups provided analysis of location data across various services. This may not mean much to the mass consumer, but what about the small business owner? They would love to know that people are often visiting a competitor near them, but that those same people have been unhappy with the service. The small business owner cannot do this type of data collection and analysis themselves, and they are not the only people who would benefit from these types of services.
The real question is: are you ready? Do you have the right tools and services that you need?