Expand on known hosts related to a domain by searching for instances of repeated code/HTML and tracking IDs across publicly available data.
- The initial idea for this came from reading this Bellingcat article
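The core technique above can be sketched in a few lines: pull tracking identifiers out of a page's HTML so they can be searched for across other sites. The patterns below are a minimal illustration covering Google Analytics (UA-XXXXXX-X) and Google Tag Manager (GTM-XXXXXX) IDs; they are not Traverse's actual extraction code.

```python
import re

# Illustrative patterns only -- real-world coverage would need more trackers
# (Facebook Pixel, Quantcast, Yandex Metrika, etc., per the TODO list).
TRACKER_PATTERNS = {
    "google_analytics": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "google_tag_manager": re.compile(r"\bGTM-[A-Z0-9]{4,9}\b"),
}

def extract_tracking_ids(html: str) -> dict:
    """Return a mapping of tracker type -> sorted unique IDs found in html."""
    return {
        name: sorted(set(pattern.findall(html)))
        for name, pattern in TRACKER_PATTERNS.items()
    }
```

Once IDs are extracted from one domain, each ID becomes a pivot: any other domain embedding the same ID is a candidate for shared ownership.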
Traverse currently uses:
- host.io [free, requires API key]
- spyonweb [free, requires API key]
- publicwww [free, requires API key]
- shodan [free, requires API key (premium API keys are regularly available for free or at a low price)]
- WebArchive scraping
  - This may take some time depending on how many snapshots of the page there are.
  - Disabled by default; to enable it, open traverse.py and add "webarchive" to the 'services' list.
- Live page scraping
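As a hedged sketch of the WebArchive step: the Wayback Machine exposes a CDX API that lists snapshots of a page, which tooling like this can then fetch and scan for tracking IDs. The snippet below only builds the query URL (the network fetch is left out); field choices are an assumption, not Traverse's actual implementation.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain: str, limit: int = 50) -> str:
    """Build a CDX API URL listing up to `limit` snapshots of `domain`."""
    params = {
        "url": domain,
        "output": "json",            # rows of [timestamp, original, ...]
        "fl": "timestamp,original",  # only the fields needed to fetch snapshots
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

Each returned snapshot URL can then be downloaded and run through the same tracking-ID extraction as a live page, which is why this mode gets slower as the snapshot count grows.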
I wrote a blog post going into detail on this topic; some ideas referenced in the post have not yet been implemented.
There are two output formats: -oS (output simple), a plain-text list of discovered domains, and -oJ (output JSON), a more detailed JSON output.
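The two modes can be sketched as follows. The field layout here is hypothetical (a mapping of discovered domain to the IDs that linked it), not Traverse's actual -oJ schema.

```python
import json

def output_simple(results: dict) -> str:
    """-oS style: one discovered domain per line, plain text."""
    return "\n".join(sorted(results))

def output_json(results: dict) -> str:
    """-oJ style: richer JSON, here keying each domain to its linking IDs."""
    return json.dumps(results, indent=2, sort_keys=True)
```

The simple form is convenient for piping into other tools (dig, httpx, etc.), while the JSON form preserves *why* each domain was linked.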
- https://httparchive.org/ / BigQuery is amazing for things like this, but also very expensive :( If you have the resources, definitely look into using this great dataset.
- https://xaviesteve.com/domeye/ is also great, but I'll avoid scripting it as it seems to be run and paid for by an individual.
- https://dnslytics.com/reverse-analytics is good, but most of the features that set it apart require payment (no free API).
TODO:
- Facebook Pixel
- Google Tag Manager
- Quantcast
- Yandex Metrika
- Recursive Search
- CommonCrawl (?)