Etag Tracking 101
An introduction to Entity Tags and how they are used to track users as they browse the web.
How the web works is not a new concept. At its core, a client requests a resource from a server, and the server returns the requested resource. Over time, new mechanisms were introduced to make this process easier, faster and cheaper: first caching, and later Entity Tags.
This article is meant to introduce the concept of Etag tracking at a beginner level.
An ETag, or Entity Tag, is an HTTP response header used for web cache validation. Its value is an opaque identifier that the server assigns to a specific version of a resource, not to a user. The browser caches the Etag together with the resource and sends it back to the web server (in the If-None-Match request header) when the same resource is requested again.
Because the browser echoes the value back, a server can hand each visitor a unique Etag and treat it as an identifier. This enables websites to track users across sessions even if they change their IP address or disable JavaScript, cookies and/or local storage, because the Etag data lives in the browser cache and travels in the HTTP headers.
A live working example of this can be seen at the cookieless-cookies website. It uses an "eye" tracker image to recognize returning users, and it displays the number of visits and the date and time of the last visit.
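To make the idea concrete, here is a minimal sketch of how a tracker of this kind could work on the server side. It is not the code behind the site mentioned above; the class name EtagTracker, its handle method and the in-memory dictionary are all hypothetical and chosen only for illustration. The sketch assumes the browser sends back the Etag it cached in the If-None-Match header.

```python
import secrets
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


class EtagTracker:
    """Hypothetical tracker: one random Etag per visitor, used as an ID."""

    def __init__(self) -> None:
        # Maps an issued Etag value to that visitor's record.
        self.visits: Dict[str, dict] = {}

    def handle(self, if_none_match: Optional[str] = None) -> Tuple[str, int]:
        """Process one request for the tracking image.

        Returns the Etag to send back and the visit count for that Etag.
        """
        if if_none_match in self.visits:
            # Returning visitor: the browser echoed an Etag we issued earlier.
            etag = if_none_match
        else:
            # New (or cache-cleared) visitor: mint a fresh random identifier.
            etag = '"' + secrets.token_hex(8) + '"'
            self.visits[etag] = {"count": 0, "last_seen": None}

        record = self.visits[etag]
        record["count"] += 1
        record["last_seen"] = datetime.now(timezone.utc)
        return etag, record["count"]


tracker = EtagTracker()
etag, count = tracker.handle()         # first visit, nothing cached yet
etag, count = tracker.handle(etag)     # second visit, browser echoes the Etag
print(count)
```

Note that no cookie, script or IP address is involved: the only thing linking the two requests is the value the browser stored in its cache.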
The basic purpose of an Etag is to optimize performance by avoiding redundant downloads. Let us take the example of an image on a website. The web server generates a validation token, often a hash of the content, and returns it in the HTTP header. The next time the user requests the image, the browser sends the cached Etag value to the server. If it matches the current version, the server replies with "304 Not Modified" and no body, so the image is not downloaded again and the cached copy is used. If the values differ, the server sends the new version together with a new Etag.
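The exchange described above can be sketched in a few lines. This is a simplified illustration rather than a real web server: the function names make_etag and respond are made up for this example, and hashing the content with SHA-256 is just one possible way to generate the token.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validation token from the content (one possible scheme)."""
    # Etag values are quoted strings on the wire.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request for `body`."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The cached copy is still current: send no body at all.
        return 304, headers, b""
    # First request, or the resource changed: send the full content.
    return 200, headers, body


image = b"pretend these are image bytes"
status, headers, payload = respond(image)                   # first request
print(status)                                               # 200
status, headers, payload = respond(image, headers["ETag"])  # repeat request
print(status, len(payload))                                 # 304 0
```

If the image changes on the server, its hash changes too, so the old Etag no longer matches and the browser receives the new version with a 200 response.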
Not all websites use Etags to track users; many simply use them to save bandwidth. Still, there is no harm in being a little cautious. Problems arise when a tracking server sets Etags to persist indefinitely (no max-age constraint) for tracking purposes. Third-party websites and advertisement services, in particular, are known to track users.
Below is an example of how Etags appear within headers. The browser used in this illustration is Mozilla Firefox with the HTTP Header Live extension, and the websites using Etags are the popular domains wikipedia and meme.
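If you would rather look for the header in text form, the short sketch below pulls the Etag out of a block of raw response headers. The sample headers are invented for illustration and were not captured from any of the sites mentioned here; extract_etag is a hypothetical helper name.

```python
from email.parser import Parser
from typing import Optional

# Invented sample response headers, for illustration only.
SAMPLE_RESPONSE_HEADERS = (
    "Content-Type: image/png\n"
    'ETag: "5f2a9c1e"\n'
    "Cache-Control: max-age=3600\n"
)


def extract_etag(raw_headers: str) -> Optional[str]:
    """Return the ETag value from raw header lines, or None if absent."""
    # HTTP headers use the same "Name: value" layout as email headers,
    # so the standard library parser can read them. Lookup is case-insensitive.
    message = Parser().parsestr(raw_headers, headersonly=True)
    return message.get("ETag")


print(extract_etag(SAMPLE_RESPONSE_HEADERS))
```

On a later request for the same resource, the same quoted value would show up in the request headers as If-None-Match.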
So How Can You Avoid Etag Tracking?
- A well-known remedy for Etag tracking is clearing the browser cache regularly. Note that when you revisit the website, a new set of Etags will be generated and stored in the browser cache, but these cannot be associated with earlier sessions, which defeats multi-session tracking.
- Privacy Badger: Privacy Badger is a browser extension developed by the EFF to block invisible trackers. Read more about Privacy Badger here.
- Etag usage monitoring: This is a website which notifies its registered users about Etags being used by top web pages. Learn more.
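The effect of clearing the cache, the first remedy in the list above, can be shown with a toy simulation. Everything in it is hypothetical: BrowserCache is a stand-in for a real browser cache, tracker_response stands in for a tracking server, and the URL is a placeholder.

```python
import secrets
from typing import Dict, Optional, Set

# Hypothetical tracker state: every Etag it has handed out so far.
known_ids: Set[str] = set()


def tracker_response(if_none_match: Optional[str]) -> str:
    """Reuse a recognized Etag; otherwise mint a new random one."""
    if if_none_match in known_ids:
        return if_none_match
    new_id = '"' + secrets.token_hex(8) + '"'
    known_ids.add(new_id)
    return new_id


class BrowserCache:
    """Toy browser cache that remembers one Etag per URL."""

    def __init__(self) -> None:
        self.etags: Dict[str, str] = {}

    def fetch(self, url: str) -> str:
        # Send the cached Etag (if any) and store whatever comes back.
        etag = tracker_response(self.etags.get(url))
        self.etags[url] = etag
        return etag

    def clear(self) -> None:
        self.etags.clear()


cache = BrowserCache()
url = "https://tracker.example/pixel.png"  # placeholder URL
before = cache.fetch(url)
same = cache.fetch(url)       # recognized: same identifier as before
cache.clear()                 # the user clears the browser cache
after = cache.fetch(url)      # treated as a brand-new visitor
print(before == same, before == after)
```

After the cache is cleared, the tracker still holds the old identifier, but nothing the browser sends connects it to the new one.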
Closing Thoughts: As I mentioned earlier, not all websites use Etags in HTTP headers for tracking. The fundamental use of an Etag is to make client-server communication fast and efficient. But since some websites have used Etags to track their users (Hulu and KISSmetrics), it is wise to stay informed about what a website you visit does behind the scenes. Disabling Flash and Java in the browser can be an additional wise decision. Last but not least, users should be able to decide whether they want their online activity to be tracked!
If you have any suggestions or thoughts about this article, feel free to connect.