Node TypeScript Multi-Purpose Headless-Chrome Scraper

It's a web scraper/crawler written in TypeScript. It uses the Chrome DevTools API to drive parallel headless Chrome instances, so websites are scraped with a real browser rather than an approximation of one, which is what many tools rely on.

This means it sees the fully rendered interface of pure-JavaScript SPAs, just as a regular user would.

Currently, the main report this tool provides is a list of internal URLs that respond with error status codes. It's not a lot, but it's already quite useful for the websites I work on.
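As a rough illustration of what such a report boils down to, here is a minimal sketch, not the tool's actual code. The `CrawlRecord` shape, the `errorReport` name, and the choice to treat any status of 400 or above as an error are all assumptions made for the example.

```typescript
// Hypothetical record shape; the field names are assumptions, not the tool's real types.
interface CrawlRecord {
  url: string;       // absolute URL that was visited
  status: number;    // HTTP status code of the response
  referrer?: string; // page on which the link was found, if known
}

// Keep only internal URLs (same origin as the site) whose status is 4xx or 5xx,
// sorted by status code and then by URL so the report is stable.
function errorReport(records: CrawlRecord[], siteOrigin: string): CrawlRecord[] {
  return records
    .filter((r) => new URL(r.url).origin === siteOrigin && r.status >= 400)
    .sort((a, b) => a.status - b.status || a.url.localeCompare(b.url));
}

// Example input: one healthy page, two broken internal pages, one broken external page.
const sampleRecords: CrawlRecord[] = [
  { url: "https://example.com/", status: 200 },
  { url: "https://example.com/old", status: 500, referrer: "https://example.com/" },
  { url: "https://example.com/gone", status: 404, referrer: "https://example.com/" },
  { url: "https://other.org/broken", status: 404, referrer: "https://example.com/" },
];

const sampleReport = errorReport(sampleRecords, "https://example.com");
for (const r of sampleReport) {
  console.log(`${r.status} ${r.url} (linked from ${r.referrer ?? "unknown"})`);
}
```

The external URL is dropped because the report only covers internal links; keeping the referrer makes it easier to find where a broken link lives.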

Now that the base mechanism for discovering and visiting every URL of a website, starting from a given domain, is in place, I plan to add more features as I need them for my work.
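The discovery mechanism can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the tool's implementation: `PageResult`, `Visit`, `internalLinks`, and `crawl` are hypothetical names, the `visit` function stands in for whatever drives a headless Chrome tab, and pages are loaded in fixed-size batches, which is simpler than a real pool of parallel browser instances.

```typescript
// Hypothetical shapes; the tool's real types are not shown in this README.
interface PageResult {
  url: string;
  status: number;
  hrefs: string[]; // raw href values found on the page, possibly relative
}

// Stand-in for the code that loads a page in headless Chrome and extracts its links.
type Visit = (url: string) => Promise<PageResult>;

// Resolve hrefs against the page URL, drop fragments, keep same-origin links only.
function internalLinks(pageUrl: string, hrefs: string[], origin: string): string[] {
  const out = new Set<string>();
  for (const href of hrefs) {
    let resolved: URL;
    try {
      resolved = new URL(href, pageUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    resolved.hash = "";
    if (resolved.origin === origin) out.add(resolved.href);
  }
  return Array.from(out);
}

// Breadth-first crawl from `start`, visiting up to `concurrency` pages per batch.
async function crawl(start: string, visit: Visit, concurrency = 4): Promise<PageResult[]> {
  const startUrl = new URL(start);
  const seen = new Set<string>([startUrl.href]);
  const queue: string[] = [startUrl.href];
  const results: PageResult[] = [];

  while (queue.length > 0) {
    const batch = queue.splice(0, concurrency);
    const pages = await Promise.all(batch.map((url) => visit(url)));
    for (const page of pages) {
      results.push(page);
      for (const link of internalLinks(page.url, page.hrefs, startUrl.origin)) {
        if (!seen.has(link)) {
          seen.add(link);
          queue.push(link);
        }
      }
    }
  }
  return results;
}
```

The `seen` set is what keeps the crawl from looping on sites whose pages link back to each other, and stripping the fragment avoids visiting `/page#a` and `/page#b` as two separate URLs.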

Oh, and the UI is web-based, written in React.