Is Google Public DNS A Marketing Data Warehouse?

By now, you have read dozens of posts regarding Google Public DNS. The basic idea is that Google has decided to release a faster DNS service. This is not a new protocol, just a fast implementation of a resolver for the Domain Name System (DNS). If you decide not to read the Wikipedia entry, let me summarize DNS in a very basic manner. When you type an address (URL) into your browser, the host name in that address (www.google.com) goes to a DNS server to be translated into an IP address (for example, 203.0.113.10). With this information, your request can be sent to the appropriate server somewhere in the world. Any click that leads to a host name your computer has not looked up recently goes through the same process. Repeat visits are usually answered from a local cache, and client-only actions send nothing to a server at all.
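To make the lookup step concrete, here is a minimal Python sketch. It uses the standard library's socket.getaddrinfo, which asks whatever resolver your operating system is configured to use. The helper name resolve_host is my own invention for illustration and has nothing to do with Google's tooling.

```python
import socket


def resolve_host(hostname):
    """Return the sorted, de-duplicated IP addresses reported for hostname."""
    # getaddrinfo consults the operating system's configured resolver,
    # which in turn asks the DNS server from your network settings.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling resolve_host("www.google.com") needs network access, and the addresses that come back depend on where you are and which DNS server answers, so I am not showing any output. To send those lookups to Google Public DNS, you would point your system's DNS settings at 8.8.8.8 and 8.8.4.4.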

So, why would Google Public DNS be a step too far? Well, if all of your browsing data were captured, you would be concerned, right? If Google Public DNS is adopted by enough ISPs, then this could happen. Granted, Google has started with a solid privacy policy that states:

We delete these temporary logs within 24 to 48 hours. In the permanent logs, we don’t keep personally identifiable information or IP information. We do keep some location information (at the city/metro level) so that we can conduct debugging, analyze abuse phenomena and improve the Google Public DNS prefetching feature. We don’t correlate or combine your information from these logs with any other log data that Google might have about your use of other services, such as data from Web Search and data from advertising on the Google content network. After keeping this data for two weeks, we randomly sample a small subset for permanent storage.

I know that many of you who read this blog regularly are probably thinking that I am starting into the “Google is taking over the internet” rant again. The release of a DNS service further supports that rant, and makes things really uncomfortable for conspiracy theorists. However, my focus today is on the data. There is an FAQ that gives a lot of information in short bites, and it gives a nice overview of the service. Part of the FAQ and the privacy policy is the data retention policy that is quoted above. The temporary logs have all of the IP address information, and that is the reason those logs are only kept for 24 to 48 hours.

The permanent logs are obviously a bigger question. What data is kept and what is it used for? The privacy policy does state that after two weeks, the data is randomly sampled for permanent storage. So, even the permanent logs are only kept in full for two weeks, and after that only the random sample survives. The randomly sampled data is interesting because it is not really talked about except for this one point. The sample may be a small subset, but if the lookups behind most web requests go through Google, a small subset is still a massive amount of data. Even if the data is aggregated at the city/metro level, there is a wealth of demographic information available. This type of random sampling could give Google a huge data warehouse of web usage data.

Even if you did not know the age of a user or their political and religious leanings, think about how much information is available when you know what time of day the request came in, what city the request came from, and what the target address was. This is a marketer’s dream. Google could likely figure out the type of products people shop for on Friday afternoons in the Philadelphia area, just by using random samplings of the DNS data. Google never says that it will leave the randomly sampled data unused. It only states that it will not correlate or combine information from these logs with data that it may have about its other services. This does not mean that it will not try to make product decisions based on the information.
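Here is a rough Python sketch of the kind of question such data could answer. Everything in it is an assumption on my part: the record layout (timestamp, city, domain), the made-up .example domains, and the function names sample_records and top_domains. Google has not published how its sampled logs are structured or queried, so treat this as an illustration of the idea and nothing more.

```python
import random
from collections import Counter
from datetime import datetime

# Hypothetical sampled log records: (timestamp, city, domain).
# Note that no IP address or user identity is needed.
SAMPLE_LOG = [
    (datetime(2009, 12, 4, 15, 5), "Philadelphia", "shoes.example"),
    (datetime(2009, 12, 4, 15, 40), "Philadelphia", "shoes.example"),
    (datetime(2009, 12, 4, 16, 10), "Philadelphia", "tickets.example"),
    (datetime(2009, 12, 4, 9, 0), "Philadelphia", "news.example"),
    (datetime(2009, 12, 4, 15, 20), "Boston", "shoes.example"),
    (datetime(2009, 12, 5, 15, 0), "Philadelphia", "tickets.example"),
]


def sample_records(records, fraction, seed=0):
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)
    return [record for record in records if rng.random() < fraction]


def top_domains(records, city, weekday, start_hour, end_hour, n=3):
    """Most requested domains from `city` on `weekday` (0 = Monday)
    with start_hour <= hour < end_hour."""
    counts = Counter(
        domain
        for timestamp, record_city, domain in records
        if record_city == city
        and timestamp.weekday() == weekday
        and start_hour <= timestamp.hour < end_hour
    )
    return counts.most_common(n)
```

With the toy data above, top_domains(SAMPLE_LOG, "Philadelphia", 4, 12, 18) returns [("shoes.example", 2), ("tickets.example", 1)], since December 4, 2009 was a Friday. Running sample_records first would shrink the input, but with real traffic volumes the same query would still have plenty of rows to count.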

Outside of the obvious privacy concerns, this could be an amazing data collection idea. Given that Google is the provider of the service, this may make it even more agreeable to ISPs and other major companies. It will be interesting to see how quickly it gains adoption.

8 thoughts on “Is Google Public DNS A Marketing Data Warehouse?”

  1. I actually think this is really a step too far. Not only does Google get more data about you, but they’re essentially deciding whether or not people can get to your site.

    *That* is far too much control for a search engine company.

    –Kyle

  2. From a usability standpoint, if they can find ways to optimize the system and increase speed & efficiency, then more power to them. If they’re going to then share this information and help other DNS services improve, it could really help the internet as a whole. They’ve also said that they’d follow the protocol to the letter and won’t interfere, censor, etc.

    As for the concentration of data, I see it as more good than bad. Of course, we have privacy concerns; however, given the short storage of individually identifiable logs, it doesn’t seem to be the end of the world. If they do end up using the long-term generalized data for marketing, perhaps it will actually be a good thing – advertisements would be better targeted to appropriate, interested audiences, which improves the experience for those interested and reduces annoyance for those who aren’t. The technical use of the data to improve the technology is clearly good.

    And in response to Kyle:
    I think we’re well beyond the point of considering Google as a “search engine company,” and instead as more of an online experience company (not the best description, but first one that came to mind) given the extent and age of their products beyond search.

  3. Kyle,

    I was curious to see what you were going to say about this one. Given that they will not correlate the data with any other service, I am more OK with it than I thought I would be.

    As you say though, is it a step too far?

  4. Tau

    Given the wording in their privacy policy, privacy concerns are really a non-issue. I am sure people will complain for days about it anyway.

    Given the data and the possible targeted advertising, you have to wonder if such a competitive advantage is a good thing. They could be raising the barrier to entry fairly high if this is the case.

    Regarding the type of company Google is, they are more of an advertising and data mining company. Almost everything they do seems to gather data for the purpose of marketing or advertising.

  5. It sure seems that Google will make use of the data that it collects to further some of its advertising purposes. But information about the domain names in the DNS server is hardly of concern to most people, who use it without even knowing that such a thing happens when they type that www dot com.

  6. Kevin

    It is not the information in the DNS that will be a concern, it is the information about every request. Each request carries originator information, such as the IP address, and more besides. So, they can probably link geographical demographics with DNS requests, without affecting your data privacy.

  7. Privacy is dead. Get over it. Marketing will get increasingly precise and accurate. The moment you rip a hole in your underwear, you will have an ad asking you which brand to choose to replace it. That is fairly benign. But if you go to the same mosque and happen to buy the same brand of aftershave as the next Mohammad Atta, you will have much more serious issues to contend with.
