By now, you have read dozens of posts regarding Google Public DNS. The basic idea is that Google has decided to release a faster DNS service. This is not a new protocol, just a fast implementation of the Domain Name System (DNS). If you decide not to read the Wikipedia entry, let me summarize DNS in a very basic manner. When you type an address (URL) into your browser, that address (www.google.com) goes to a DNS server to be translated into a numeric IP address. With this information, your request can be sent to the appropriate server somewhere in the world. Nearly every click on a web page goes through the same process, except when the lookup result is already cached or the action is client-only and no request leaves the browser.
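To make that translation step concrete, here is a minimal sketch of the packet a resolver actually sends over UDP port 53. It builds a standard DNS "A record" query for www.google.com in the wire format defined by RFC 1035; the function name and the query ID are my own illustrative choices, not anything from Google's service.

```python
import struct

def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet (RFC 1035 wire format) for an A record."""
    # Header: ID, flags (0x0100 = standard query with recursion desired),
    # QDCOUNT=1 question, and zero answer/authority/additional records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each dot-separated label is length-prefixed, then a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE=1 (A record, i.e. an IPv4 address), QCLASS=1 (IN, the internet class).
    question = qname + struct.pack(">HH", 1, 1)
    return header + question

query = build_dns_query("www.google.com")
print(len(query))  # 32 bytes: 12-byte header + 16-byte name + 4-byte type/class
```

A real resolver would send these bytes to a DNS server (Google Public DNS answers at 8.8.8.8) and parse the reply; retries, caching, and response parsing are omitted here.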

So, why would a Google Public DNS be a step too far? Well, if all of your browsing data were captured, you would be concerned, right? If Google Public DNS (which answers at the addresses 8.8.8.8 and 8.8.4.4) is adopted by enough ISPs, then this could happen. Granted, Google has started with a solid privacy policy that states:

We delete these temporary logs within 24 to 48 hours. In the permanent logs, we don’t keep personally identifiable information or IP information. We do keep some location information (at the city/metro level) so that we can conduct debugging, analyze abuse phenomena and improve the Google Public DNS prefetching feature. We don’t correlate or combine your information from these logs with any other log data that Google might have about your use of other services, such as data from Web Search and data from advertising on the Google content network. After keeping this data for two weeks, we randomly sample a small subset for permanent storage.

I know that many of you who read this blog regularly are probably thinking that I am starting into the "Google is taking over the internet" rant again. The release of a DNS service further supports that rant and gives conspiracy theorists plenty to chew on. However, my focus today is on the data. There is an FAQ that gives a lot of information in short bites, along with a nice overview of the service. Part of the FAQ and the privacy policy is the data retention policy quoted above. The temporary logs contain all of the IP address information, and that is why those logs are kept for only 24 to 48 hours.

The permanent logs are obviously the bigger question. What data is kept, and what is it used for? The privacy policy states that after two weeks, the data is randomly sampled for permanent storage. So even the "permanent" logs hold the full data for only two weeks; after that, only the randomly sampled subset is retained indefinitely. That sampled data is interesting because it is not really discussed beyond this one sentence. It may be a small subset, but if every single web request goes through Google's resolvers, a small subset is still a massive amount of data. Even aggregated at the city/metro level, it contains a wealth of demographic information. This type of random sampling could give Google a huge data warehouse of web usage data.

Even if you do not know a user's age or their political and religious leanings, think about how much information is available when you know what time of day a request came in, what city it came from, and what the target address was. This is a marketer's dream. Google could likely figure out what types of products people shop for on Friday afternoons in the Philadelphia area just from random samplings of the DNS data. Note that Google never promises not to use the randomly sampled data at all; it only promises not to correlate or combine information from these logs with data it may have from its other services. That does not preclude making product decisions based on the information.

Outside of the obvious privacy concerns, this could be an amazing data collection idea. The fact that Google itself provides the service may make it even more agreeable to ISPs and other major companies. It will be interesting to see how quickly it gains adoption.
