Page X-Ray Data Privacy Analysis Featured in Episode 43

Dr. Augustine Fou joins us for Episode 43 of Reality 2.0 to walk us through his Page X-Ray app, a data visualization tool that maps web site trackers.

This is Dr. Fou’s second time visiting with us on the podcast, and I encourage you to listen to both episodes:

Episode 13: Surveillance Marketing

Episode 43: Ad Tracking Runs Deep

Page X-Ray differs from typical consumer-oriented privacy apps, like The Markup’s recently published Blacklight, in that it detects not only the trackers loaded by the page but also the trackers called by other trackers, giving a more extensive view of tracking activity and data collection.

This depth of analysis provides the striking visual below, taken from a report gathered via smithsonianmag.com.

Tree graph of trackers found on smithsonianmag.com
Figure 1. A complete tree graph of smithsonianmag.com

Instead of showing only the first level of trackers, Page X-Ray goes deeper: a crawler follows each tracker, executes all of its JavaScript, and uncovers everything else that’s being loaded. The graphs begin with the first layer of trackers called by the site and then show what each of those scripts loads in turn. The app records every network call and translates the result into a tree graph showing the relationship between what is loading and what is being loaded.
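To make the idea of turning recorded network calls into a tree more concrete, here is a minimal Python sketch. It is not Page X-Ray’s actual code. It assumes each recorded call can be reduced to a pair of (what made the request, what was requested), and the function names and the `.example` domains are hypothetical stand-ins for illustration.

```python
from collections import defaultdict


def build_call_tree(calls, root):
    """Turn (initiator, requested) pairs into a nested dict rooted at `root`.

    Assumption: every network call has been reduced to a pair naming
    who made the request and what was requested.
    """
    children = defaultdict(list)
    for initiator, requested in calls:
        if requested not in children[initiator]:
            children[initiator].append(requested)

    def expand(node, seen):
        # `seen` holds the nodes on the current path, so a tracker that
        # ends up calling one of its own ancestors cannot cause endless
        # recursion.
        return {
            child: expand(child, seen | {child})
            for child in children.get(node, [])
            if child not in seen
        }

    return expand(root, {root})


def max_depth(tree):
    """Number of layers below the root (0 when nothing is loaded)."""
    if not tree:
        return 0
    return 1 + max(max_depth(subtree) for subtree in tree.values())


# Hypothetical recorded calls; all domains are made up for illustration.
calls = [
    ("site.example", "tracker-a.example"),
    ("site.example", "tracker-b.example"),
    ("tracker-a.example", "tracker-c.example"),
    ("tracker-c.example", "tracker-d.example"),
]
tree = build_call_tree(calls, "site.example")
depth = max_depth(tree)
```

In this made-up sample, `tracker-a.example` and `tracker-b.example` form the first layer, while `tracker-c.example` and `tracker-d.example` only appear because another tracker called them, which is the kind of relationship a first-level scan would miss.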

Some reports, like the Smithsonian one pictured above, go many layers deep.

Detail of tree graph of trackers found on smithsonianmag.com
Figure 2. The connecting lines, URLs, and circled numbers provide additional useful information.

You’ll note that the lines may be gray, orange, or red, and each URL may be gray, blue, orange, or red.

A gray line indicates that no cookie is set, which is the condition we prefer to see.

An orange line indicates that a third-party cookie was set, meaning the cookie was set by a domain other than the site you are currently visiting.
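As a rough illustration of what “third-party” means here, the Python sketch below compares the site being visited with the domain a cookie is set for. It is a simplification and not how Page X-Ray itself works: it assumes the last two labels of a hostname identify the site, which is wrong for suffixes such as `.co.uk` (a real tool would consult the Public Suffix List). The function names and the `tracker.example` domain are hypothetical.

```python
from urllib.parse import urlsplit


def naive_site(host):
    """Approximate a site by the last two labels of its hostname.

    Assumption: this shortcut is good enough for illustration. It breaks
    on multi-part suffixes like .co.uk.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_url, cookie_domain):
    """True when the cookie's domain belongs to a different site than the page."""
    page_host = urlsplit(page_url).hostname or ""
    return naive_site(page_host) != naive_site(cookie_domain)


# A cookie scoped to the visited site itself is first-party.
first_party = is_third_party(
    "https://www.smithsonianmag.com/history/", ".smithsonianmag.com"
)

# A cookie scoped to an unrelated (made-up) domain is third-party.
third_party = is_third_party(
    "https://www.smithsonianmag.com/history/", "ads.tracker.example"
)
```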

The white circles indicate the number of times a tag was loaded. If the circle is highlighted yellow, it means this tracker was loaded ten or more times.

A blue URL indicates an ad server request, while an orange URL indicates an analytics tracker. If the URL is gray, the nature of the server is unknown.

The flag icons show the country where a specific tracker is called from or where the data is sent, which is especially interesting when information is sent across borders, as privacy regulations differ.

Finally, a fingerprint icon indicates that a script is exfiltrating user data and logging user behavior, thus creating a digital “fingerprint.” Sometimes this type of tracking is used with good intentions to improve UX, but the clear downside is that the data being sent potentially includes logins and passwords. These scripts are indicated by corresponding red lines.

While Page X-Ray is geared toward the needs of privacy and ad fraud researchers, it’s worth a look for anyone curious about the data a site is potentially collecting and sharing. We explore its potential in depth, along with other related topics, in Episode 43, and I hope you’ll join us!
