WebCrawler to get URLs from a given site
Below are the tools used to develop this project.
- JavaScript
- Express.js
- Node.js
- Node.js for testing - to provide a quick solution, the tests run on plain Node.js instead of a dedicated testing library
To use this project, you'll need:
- Node.js, a Long-Term Support (LTS) release, version 16 or later
- Google Chrome, latest version
Once you have the code on your machine, open a terminal in the directory where you cloned the project and run:
npm install
To run the web crawler, you can do it in the following ways.
Currently it only works for URLs that end in .com (if any other characters are added after .com, the CLI will not work properly).
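As a rough illustration, a check like the sketch below could be what enforces the .com restriction; this is an assumption about the behavior, not the project's actual validation code.

```js
// Sketch: accept only URLs whose hostname ends in ".com" with no path after it.
// This is an assumed check, not the project's actual validation logic.
function isSupportedUrl(input) {
  try {
    const { hostname, pathname } = new URL(
      input.startsWith('http') ? input : `https://${input}`
    );
    // Reject anything after ".com", e.g. "example.com/about".
    return hostname.endsWith('.com') && (pathname === '/' || pathname === '');
  } catch {
    return false; // Not a parseable URL at all.
  }
}

console.log(isSupportedUrl('google.com'));        // true
console.log(isSupportedUrl('google.com/search')); // false
```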
Get URLs from a given website.
Usage
$ chmod +x run-crawler.sh
$ ./run-crawler.sh
After those commands have been run, the script will guide you through the available options for running the crawler.
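The guided flow could look roughly like the sketch below, built on Node's built-in readline module; the actual prompts and options live in run-crawler.sh and may differ.

```js
// Sketch of a guided prompt flow; the real prompts in run-crawler.sh may differ.
const readline = require('node:readline');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

rl.question('Crawl a (s)ingle URL or (m)ultiple URLs? ', (mode) => {
  rl.question('Enter the URL(s), separated by spaces: ', (answer) => {
    const urls = answer.trim().split(/\s+/);
    console.log(`Mode: ${mode}, crawling: ${urls.join(', ')}`);
    rl.close();
  });
});
```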
Usage
$ npx get-links-urls <url> # Get all URLs from a given single website
$ npx get-links-urls <url1> <url2> <url3> <url4> <url5> # Get all URLs from multiple websites
$ npx get-links-urls --help # Help of the usage of the tool
* Running for a single URL
npx get-links-urls google.com
* Running for multiple URLs
npx get-links-urls google.com polaris.shopify.com
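Internally, handling one or many URLs can be as simple as iterating over the positional arguments. A minimal sketch, where crawl() is a hypothetical stand-in for the project's crawl routine:

```js
// Sketch: treat every non-flag argument as a URL to crawl.
// crawl() is a hypothetical placeholder for the real crawl routine.
async function crawl(url) {
  console.log(`Crawling ${url}...`);
}

async function run() {
  const urls = process.argv.slice(2).filter((arg) => !arg.startsWith('-'));
  if (urls.length === 0) {
    console.error('Usage: get-links-urls <url> [<url> ...]');
    process.exit(1);
  }
  for (const url of urls) {
    await crawl(url); // one crawl per URL, in order
  }
}

run();
```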
Options
--output=<string>, -o Path of the file the results are saved to
--max-depth=<number> Maximum depth of routes to search (see the sketch below)
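--max-depth presumably bounds a breadth-first traversal of the site. A hedged sketch of how such a bound can be applied, where getLinks is a hypothetical helper returning the URLs found on a page:

```js
// Sketch of a breadth-first crawl bounded by maxDepth.
// getLinks(url) is a hypothetical helper; fetching/parsing are not shown.
async function crawlSite(startUrl, maxDepth, getLinks) {
  const seen = new Set([startUrl]);
  let frontier = [startUrl];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of await getLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);  // record each URL once
          next.push(link); // visit it at the next depth level
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}
```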
Examples
$ npx get-links-urls your.given-url.com --output=sitemap.xml
✅ Generated sitemap.xml, 150 URLs found from the given website
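Writing the collected URLs out as a sitemap is mostly string templating; a minimal sketch (a bare urlset, omitting optional fields such as lastmod) might be:

```js
// Sketch: serialize a list of URLs into a minimal sitemap.xml.
const fs = require('node:fs');

function writeSitemap(urls, outputPath) {
  const entries = urls
    .map((url) => `  <url><loc>${url}</loc></url>`)
    .join('\n');
  const xml =
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${entries}\n` +
    '</urlset>\n';
  fs.writeFileSync(outputPath, xml);
  console.log(`✅ Generated ${outputPath}, ${urls.length} URLs found`);
}

writeSitemap(['https://example.com/'], 'sitemap.xml');
```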
No dedicated test/assertion library was used; to provide a quick solution, the tests run on plain Node.js.
Install all the dependencies:
npm install
Run the tests with a single command:
npm run tests
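Because the tests rely only on what ships with Node.js, a test file can be a plain script that throws on failure via the built-in node:assert module. A hedged sketch, where isSupportedUrl is a hypothetical function standing in for real project code:

```js
// Sketch of a dependency-free test using Node's built-in assert module.
const assert = require('node:assert');

// Hypothetical function under test; a toy implementation for the sketch.
function isSupportedUrl(input) {
  return /\.com\/?$/.test(input);
}

assert.strictEqual(isSupportedUrl('google.com'), true);
assert.strictEqual(isSupportedUrl('google.com/search'), false);

console.log('All tests passed ✅');
```

The tests script in package.json would then just point node at files like this one (an assumption about how npm run tests is wired up).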
This is my profile on GitHub: ralves20