Skip to content

Commit

Permalink
TrackHAR now also supports web trackers
Browse files Browse the repository at this point in the history
  • Loading branch information
baltpeter committed Nov 29, 2024
1 parent 1b480d1 commit dde2172
Show file tree
Hide file tree
Showing 2 changed files with 6 additions and 6 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,12 @@

> Library for detecting tracking data transmissions from traffic in HAR format.
For research into mobile privacy and complaints against tracking, it is important to know what data is being transmitted in a request to a tracking server. But these requests are in a huge variety of different formats and often heavily nested and/or obfuscated, which hinders straightforward automatic analysis. TrackHAR aims to address this problem. It takes recorded traffic in a [HAR files](http://www.softwareishard.com/blog/har-12-spec/) as the input and returns a parsed list of the transmitted data (and, optionally, additional metadata like the tracking company and location in the data) for each request it can handle.
For research into privacy in websites and apps, and complaints against tracking, it is important to know what data is being transmitted in a request to a tracking server. But these requests are in a huge variety of different formats and often heavily nested and/or obfuscated, which hinders straightforward automatic analysis. TrackHAR aims to address this problem. It takes recorded traffic in a [HAR files](http://www.softwareishard.com/blog/har-12-spec/) as the input and returns a parsed list of the transmitted data (and, optionally, additional metadata like the tracking company and location in the data) for each request it can handle.

To achieve this, TrackHAR uses two complementary approaches: adapter-based parsing and indicator matching.

* **Adapter-based parsing**: Our main approach is to use adapters written for specific tracking endpoints. In our [research](https://benjamin-altpeter.de/doc/thesis-consent-dialogs.pdf), we have found that generic approaches (like indicator matching in the raw transmitted plain text or [base64-encoded](https://github.com/baltpeter/base64-search) request content) are not sufficient due to the frankly ridiculous nesting and obfuscation we observed. In addition, approaches that search for static honey data values can never capture dynamic data types such as free disk space and current RAM usage, or low-entropy values like the operating system version (e.g. `11`).
However, we have also noticed that there is a comparatively small number of tracking endpoints which make up a large portion of all app traffic. This makes our adapter-based approach feasible to detect most of the transmitted tracking data.
However, we have also noticed that there is a comparatively small number of tracking endpoints which make up a large portion of all web or app traffic. This makes our adapter-based approach feasible to detect most of the transmitted tracking data.

* **Indicator matching**: But it will never be possible to write an adapter for every request. Thus, we use indicator matching as a fallback for requests not covered by any adapter. Indicator matching relies on the user providing known honey data values (such as the advertising ID or geolocation) that are then searched for in the requests. TrackHAR supports indicator matching for plain text, base64-encoded and URL-encoded values in the request headers, path, or body. It also tries to match case-insensitively where possible.

Expand Down
Loading

0 comments on commit dde2172

Please sign in to comment.