As online privacy becomes an ever-more-pressing concern in modern life, Internet titan Google made an announcement yesterday that takes a few steps towards protecting its users’ privacy in the long term: after about 18 to 24 months, Google will anonymize its server logs so individually-identifying information about sessions, locations, search terms, and other details can’t be reconstructed. The exception? Google will hang on to complete logs in any instances where it is required to do so by law. Once the policy is in effect, Google will apply it retroactively to existing logs.
The move comes some months after AOL, apparently inadvertently, published a sizable chunk of its search query logs for academic and private research use. Of course, the logs quickly leaked, and folks combing through them soon overcame AOL’s feeble anonymization and specifically identified several individuals from the log data. AOL pulled the information from its servers, but there’s no way to “unpublish” anything on the Internet, and the search logs are still floating around. Last year, Google was the only major search engine to fight Department of Justice requests for substantial amounts of log information and other user data the government wanted in order to make its case for upholding the Child Online Protection Act. AOL, MSN, and Yahoo all blithely complied with the DOJ data requests; Google fought the DOJ in court, with the result that substantially less information about its users was turned over to the government than was the case for other search engines.
However, privacy advocates might not want to celebrate Google’s announcement just yet: at the earliest, the new policy won’t go into effect until the end of 2007, and Google is quick to cite the difficulty of implementing its new initiative as a possible cause for delay. And, even if it gets underway, Google will still hold on to personally identifiable information for 18 to 24 months, which is the sort of timeframe governments are considering mandating as a legal minimum for Internet services to retain user data.
What sorts of data does Google collect? At a minimum, the data include search terms, the IP address you’re using when you connect to Google services, information stored in cookies (such as your preferences and data associated with any logins you may have to Google services like Gmail), the pages and URLs you access within the Google network of sites, and the date and time of your access.
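To make the idea concrete, here is a minimal Python sketch of what anonymizing one such log record after the retention window could look like. It is an illustration only: Google has not described its method in this announcement, and the field names, the `LogEntry` type, the 18-month cutoff approximated as 540 days, and the choice to zero the last octet of an IPv4 address and drop the cookie identifier are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional
import ipaddress

# Assumed retention window: roughly 18 months (hypothetical value).
RETENTION = timedelta(days=18 * 30)


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log record holding the fields listed in the article."""
    search_terms: str
    ip_address: str           # IPv4 only in this sketch
    cookie_id: Optional[str]  # identifier tying the request to a browser or login
    url: str
    accessed_at: datetime


def mask_ip(ip: str) -> str:
    """Zero the last octet of an IPv4 address (one possible approach)."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def anonymize(entry: LogEntry, now: datetime, legal_hold: bool = False) -> LogEntry:
    """Return a copy with identifying fields removed once the entry is old enough.

    Entries under a legal hold, or still inside the retention window,
    are returned unchanged.
    """
    if legal_hold or now - entry.accessed_at < RETENTION:
        return entry
    return replace(entry, ip_address=mask_ip(entry.ip_address), cookie_id=None)


if __name__ == "__main__":
    old = LogEntry("cheap flights", "203.0.113.57", "abc123",
                   "/search", datetime(2005, 1, 1))
    print(anonymize(old, now=datetime(2007, 3, 15)))
```

Note that this sketch leaves the search terms themselves untouched, and as the AOL episode showed, the queries alone can sometimes be enough to identify a person.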