It’s become a given that when you’re online, regardless of device, you’re being tracked or watched in some way. Typically, this tracking software gathers data on your browsing habits for targeted ads, or drops cookies in your browser to keep tabs on you. But how far does this all go?
The Web Transparency and Accountability Project (WebTAP) at Princeton University actively searches for the latest in web tracking. Led by computer scientists Arvind Narayanan and Steven Englehardt, the project built a tool called OpenWPM, a piece of software that scours and analyzes websites for features that track the user.
During a two-week testing period, OpenWPM scanned the top one million websites as ranked by Alexa. It’s the largest research project of its kind to date, and it shines an unflattering light on the sheer amount of tracking that every user is subject to.
Beyond cookies, it highlights the abundant use of techniques like “canvas fingerprinting,” a method of identifying a user by their device. The method came to the fore two years ago when Princeton researchers, and counterparts at KU Leuven, published a paper on the tactic, which can be used to create a unique image of a user without a single cookie involved.
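To make the idea concrete, here is a minimal sketch of the last step of such a scheme: turning device-dependent rendering output into an identifier. Real canvas fingerprinting runs as JavaScript in the browser, which draws to a hidden canvas and reads the pixels back. The Python below only illustrates the hashing step, and the pixel buffers and the `fingerprint` function are hypothetical stand-ins, not anything taken from the Princeton or KU Leuven work.

```python
import hashlib


def fingerprint(rendered_pixels: bytes) -> str:
    """Hash rendered pixel data into a short, stable identifier."""
    return hashlib.sha256(rendered_pixels).hexdigest()[:16]


# Hypothetical pixel buffers: the same drawing instructions rendered on two
# devices whose graphics stacks differ in a single anti-aliased pixel value.
device_a = bytes([255, 255, 255, 12, 34, 56, 200, 201])
device_b = bytes([255, 255, 255, 12, 34, 57, 200, 201])

# The same device produces the same ID on every visit, with no cookie stored.
print(fingerprint(device_a) == fingerprint(device_a))
# A tiny rendering difference produces a different ID.
print(fingerprint(device_a) == fingerprint(device_b))
```

Because the identifier is recomputed from the device on each visit, clearing cookies does nothing to reset it.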
The researchers say they hope to build a “robust and modular platform” that facilitates further study in this field and guides people on how to respond to tracking.
Raising awareness, with some success
Web tracking is a lifeline for online advertising. One look at the targeted ads on Google and Facebook is all the evidence you need that you’re being followed online. But the depth of this tracking remains unknown for many. Some companies and services are trying to rectify that.
“I would say that people in general aren’t familiar with tracking on websites,” asserted Larry Furr, VP of product at Ghostery, a browser extension that detects and blocks tracking tags on webpages. Meanwhile Mozilla co-founder and JavaScript creator Brendan Eich, who is now co-founder of ad- and tracker-free browser Brave, calls the whole state of affairs a “parasitic ecosystem.”
Until a few years ago, this architecture of user tracking went largely unchallenged, save for the more privacy-conscious out there. The tables have turned and awareness is beginning to bloom.
Progress! @FTC recommends tracker blockers as a way to ‘block the flow of info’ / block ads https://t.co/8jdo7ZXaYe pic.twitter.com/mj6GQqts7Y
— ashkan soltani (@ashk4n) July 5, 2016
The Federal Trade Commission (FTC) now warns consumers about online tracking. Its guidance was updated in June to provide more details on newer tracking practices like fingerprinting and unique device identifiers. It goes so far as to recommend that people actively use browser extensions that reduce the amount of data collected about them.
The FTC has also taken action by fining companies for infractions. Most recently, it fined Indian adtech firm InMobi for tracking children.
The EU has the “Cookie Law,” which requires site owners to tell first-time visitors that the site uses cookies. Similarly, there is AdChoices, which aims to ensure responsible use of tracking, but it is a self-regulatory system for internet advertisers.
So is any of this working?
“In spite of the fact that the disclosures are there, not everyone takes the time to read them or understand what they do,” counters Furr.
Tracking the trackers
People have a “certain threshold” for tracking and what they’re comfortable with, according to Furr, but there’s still a lack of awareness over how deep tracking goes. Visualizing this activity can make a huge difference.
“The technology behind a tracker is a JavaScript tag that fires on a page. When a page starts to load in your browser, there are a series of these JavaScript tags which fire and in doing so they connect with third party servers,” explained Furr.
Ghostery’s extension uses a “listener” to detect these trackers based on the company’s script library, a record of known trackers.
When users land on a site, they’re shown a list of the trackers detected. Users can choose to block certain tags on a site, or whitelist sites. When a tag is detected, users can also access a profile of the company behind it if they want to investigate further.
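As a rough illustration of that workflow, the sketch below collects the hosts of external script tags in a page, looks them up in a small tracker library, and applies the user’s block and whitelist choices. It is not Ghostery’s code: a browser extension observes live network requests rather than parsing static HTML, and `TRACKER_LIBRARY`, `detect_trackers`, and every host name here are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical tracker library: script host -> (company, category).
TRACKER_LIBRARY = {
    "ads.tracker.example": ("Example Ads", "advertising"),
    "stats.metrics.example": ("Example Metrics", "analytics"),
}


class ScriptHostCollector(HTMLParser):
    """Collects the host of every external <script src=...> tag."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            host = urlparse(src).hostname if src else None
            if host:  # relative URLs have no host and are first-party
                self.hosts.append(host)


def detect_trackers(html, page_host, blocked=frozenset(), whitelisted_sites=frozenset()):
    """Report known third-party trackers and whether each would be blocked."""
    collector = ScriptHostCollector()
    collector.feed(html)
    report = []
    for host in collector.hosts:
        if host == page_host or host not in TRACKER_LIBRARY:
            continue
        company, category = TRACKER_LIBRARY[host]
        # A whitelisted site overrides the user's per-tracker block choices.
        is_blocked = host in blocked and page_host not in whitelisted_sites
        report.append({"host": host, "company": company,
                       "category": category, "blocked": is_blocked})
    return report


page = """
<html><head>
<script src="https://ads.tracker.example/tag.js"></script>
<script src="https://stats.metrics.example/collect.js"></script>
<script src="/static/app.js"></script>
</head></html>
"""

for entry in detect_trackers(page, "news.example", blocked={"ads.tracker.example"}):
    print(entry)
```

The report lists both known trackers, marks only the advertising tag as blocked, and ignores the site’s own script.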
The privacy-focused browser Brave, founded by Mozilla’s Eich, is instead trying to consolidate these efforts into one browser rather than relying on several extensions. Like Ghostery, Brave finds and blocks ads and tracking software on webpages. Unlike Ghostery, it does all this by default as part of its mission to make the web better.
“We limit the privileges of any third party content on a webpage. For instance when you go to example.com, they might be loading resources from Google, from some ad networks,” said Yan Zhu, an engineer at Brave. “Brave will block those third parties from sending cookies and doing some sorts of fingerprinting and storing local data on your machine that they can use to track you later.”
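The cookie part of that description can be sketched in a few lines: decide whether a request is going to a third party, and if so, drop the cookie header before it is sent. This is an illustration under simplifying assumptions, not Brave’s implementation. In particular, browsers typically compare registrable domains using the Public Suffix List, while this sketch compares exact host names, and `is_third_party` and `filter_headers` are hypothetical names.

```python
from urllib.parse import urlparse


def is_third_party(page_url: str, request_url: str) -> bool:
    """Treat a request as third-party when its host differs from the page's host."""
    return urlparse(page_url).hostname != urlparse(request_url).hostname


def filter_headers(page_url: str, request_url: str, headers: dict) -> dict:
    """Return a copy of the headers, without Cookie for third-party requests."""
    if not is_third_party(page_url, request_url):
        return dict(headers)
    return {name: value for name, value in headers.items()
            if name.lower() != "cookie"}


outgoing = {"Accept": "*/*", "Cookie": "id=abc123"}

# First-party request: the cookie is kept.
print(filter_headers("https://example.com/", "https://example.com/app.js", outgoing))
# Third-party request: the cookie is stripped.
print(filter_headers("https://example.com/", "https://ads.example.net/pixel", outgoing))
```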
Ghostery, by contrast, starts from a blank slate with blocking off by default and hands the options over to the user. It breaks trackers up into several categories, such as advertising, analytics, comment thread trackers and social media tags, to give a view of what’s out there.
But there are many, many trackers. They can become overwhelming for both users and developers. That presents the next challenge for companies like Ghostery, Brave and researchers alike.
The arms race between ad firms and privacy advocates
“The techniques used for tracking are continually evolving, so staying current is definitely an ongoing process,” explained Steven Englehardt, one of the Princeton researchers. “New techniques are often discovered by researchers, both by analyzing new browser features or by analyzing tracking scripts online. I’m currently working to see if we can discover new techniques in use by trackers online in an automated way.”
Research remains a huge area of investment for Ghostery, said Furr. Its process involves a full-time staff and crowdsourcing from enthusiastic users to find and catalog trackers.
“It’s kind of an arms race,” stated Eich. Brave keeps pace with trackers through lists shared with other companies and researchers, but right now there’s no firm count of how many trackers are out there.
“I’m pretty sure the total number is in the thousands,” added Zhu. Start-ups and computer scientists in this field are in a constant race to put their best effort forward.
“Right now we’re blocking canvas and WebGL fingerprinting, WebRTC and IP fingerprinting,” she said, and recently Brave added features to block a type of audio fingerprinting discovered by the WebTAP researchers at Princeton that collects data from your machine’s audio signature.
“Fingerprinting is definitely dirty,” said Eich, adding that Brave is willing to continue fighting fingerprinting as it gets better and more nuanced. At the same time, there is a lot of hype around the different techniques, he believes, and just how much they can track.
He recalls some adtech and media companies claiming in the past to have developed “unblockable” tactics, but they fell short. “It was trivial. There was nothing they could do that we couldn’t block,” he said. And so this “arms race or cat and mouse game” continues.
Staying proactive
As anti-tracking tools get better, there’s still an onus on the user to be aware of the level of tracking at play.
Email tracking is common, for example. Marketing professionals often use it to see whether people have opened their emails, clicked links, or just sent them straight to the trash folder. It may even gather location and device data. One way of doing this is to embed a tracking pixel, a tiny image that is fetched from the sender’s server when the email is opened.
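One simple way to spot such images is to look for `<img>` tags that declare a size of one pixel or less. The sketch below applies that heuristic to a made-up email body. The `PixelFinder` class, the sample HTML, and the size rule are assumptions for illustration, and a heuristic like this will miss trackers that don’t declare their dimensions.

```python
from html.parser import HTMLParser


class PixelFinder(HTMLParser):
    """Flags <img> tags whose declared width and height are both 1 or less."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if self._tiny(attributes.get("width")) and self._tiny(attributes.get("height")):
            self.suspects.append(attributes.get("src"))

    @staticmethod
    def _tiny(value):
        # Missing or non-numeric sizes are not treated as tracking pixels.
        try:
            return int(value) <= 1
        except (TypeError, ValueError):
            return False


# Hypothetical email body with one ordinary banner and one 1x1 image.
email_html = """
<p>Our spring sale starts today.</p>
<img src="https://mail.example/banner.png" width="600" height="200">
<img src="https://mail.example/open?id=abc123" width="1" height="1">
"""

finder = PixelFinder()
finder.feed(email_html)
print(finder.suspects)
```

Only the 1x1 image is flagged; the unique `id` in its URL is what would let a sender tie the request back to a specific recipient.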
Once again, there’s an extension for that. Ugly Email is a Chrome extension that detects whether or not an email in your Gmail inbox includes tracking code. The developers currently have a Firefox version in the works.
Tools like ad blockers, tracking blockers, and privacy-first web browsers go a long way. “It gives users a voice to signal to sites that they are dissatisfied with the current amount of tracking and are willing to take steps to reduce it,” said Englehardt.
Security isn’t easy, and it’s difficult to stay on top of the advice for best practice. It’s one of the many reasons we see so many hacks and data breaches. People reuse passwords or click dubious links all the time. It’s hard to stay vigilant 24/7, but projects like WebTAP hope to make it a little easier.