WebCrawler to get URLs from a given site
Below are the tools used to develop this project.
- JavaScript
- Express.js
- Node.js
- Node.js for testing - to provide a quick solution, the tests run on plain Node.js instead of a dedicated testing library
To use this project, you'll need:
- Node.js, a Long-Term Support (LTS) release, version 16 or later
- Google Chrome, latest version
Once you have the code on your machine, open a terminal in the directory where you cloned the project and run:
npm install
To run the web crawler, you can do it in the following ways.
Currently it only works for URLs that end in .com (if any other characters are added after .com, the CLI will not work properly).
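As a rough illustration, a check like the sketch below could be what enforces the .com restriction; this is an assumption about the behavior, not the project's actual validation code.

```js
// Sketch: accept only URLs whose hostname ends in ".com" with no path after it.
// This is an assumed check, not the project's actual validation logic.
function isSupportedUrl(input) {
  try {
    const { hostname, pathname } = new URL(
      input.startsWith('http') ? input : `https://${input}`
    );
    // Reject anything after ".com", e.g. "example.com/about".
    return hostname.endsWith('.com') && (pathname === '/' || pathname === '');
  } catch {
    return false; // Not a parseable URL at all.
  }
}

console.log(isSupportedUrl('google.com'));        // true
console.log(isSupportedUrl('google.com/search')); // false
```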
Get URLs from a given website.
Usage
$ chmod +x run-crawler.sh
$ ./run-crawler.sh
After those commands have been run, the script will guide you through the available options for running the crawler.
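The guided flow could look roughly like the sketch below, built on Node's built-in readline module; the actual prompts and options live in run-crawler.sh and may differ.

```js
// Sketch of a guided prompt flow; the real prompts in run-crawler.sh may differ.
const readline = require('node:readline');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

rl.question('Crawl a (s)ingle URL or (m)ultiple URLs? ', (mode) => {
  rl.question('Enter the URL(s), separated by spaces: ', (answer) => {
    const urls = answer.trim().split(/\s+/);
    console.log(`Mode: ${mode}, crawling: ${urls.join(', ')}`);
    rl.close();
  });
});
```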
Usage
$ npx get-links-urls <url> # Get all URLs from a given single website
$ npx get-links-urls <url1> <url2> <url3> <url4> <url5> # Get all URLs from multiple websites
$ npx get-links-urls --help # Help of the usage of the tool
* Running for a single URL
npx get-links-urls google.com
* Running for multiple URLs
npx get-links-urls google.com polaris.shopify.com
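Internally, handling one or many URLs can be as simple as iterating over the positional arguments. A minimal sketch, where crawl() is a hypothetical stand-in for the project's crawl routine:

```js
// Sketch: treat every non-flag argument as a URL to crawl.
// crawl() is a hypothetical placeholder for the real crawl routine.
async function crawl(url) {
  console.log(`Crawling ${url}...`);
}

async function run() {
  const urls = process.argv.slice(2).filter((arg) => !arg.startsWith('-'));
  if (urls.length === 0) {
    console.error('Usage: get-links-urls <url> [<url> ...]');
    process.exit(1);
  }
  for (const url of urls) {
    await crawl(url); // one crawl per URL, in order
  }
}

run();
```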
Options
--output=<string>, -o Path of the file the results are saved to
--max-depth=<number> Maximum depth of routes to search (see the sketch below)
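--max-depth presumably bounds a breadth-first traversal of the site. A hedged sketch of how such a bound can be applied, where getLinks is a hypothetical helper returning the URLs found on a page:

```js
// Sketch of a breadth-first crawl bounded by maxDepth.
// getLinks(url) is a hypothetical helper; fetching/parsing are not shown.
async function crawlSite(startUrl, maxDepth, getLinks) {
  const seen = new Set([startUrl]);
  let frontier = [startUrl];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of await getLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);  // record each URL once
          next.push(link); // visit it at the next depth level
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}
```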
Examples
$ npx get-links-urls your.given-url.com --output=sitemap.xml
✅ Generated sitemap.xml, 150 URLs found from the given website
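Writing the collected URLs out as a sitemap is mostly string templating; a minimal sketch (a bare urlset, omitting optional fields such as lastmod) might be:

```js
// Sketch: serialize a list of URLs into a minimal sitemap.xml.
const fs = require('node:fs');

function writeSitemap(urls, outputPath) {
  const entries = urls
    .map((url) => `  <url><loc>${url}</loc></url>`)
    .join('\n');
  const xml =
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${entries}\n` +
    '</urlset>\n';
  fs.writeFileSync(outputPath, xml);
  console.log(`✅ Generated ${outputPath}, ${urls.length} URLs found`);
}

writeSitemap(['https://example.com/'], 'sitemap.xml');
```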
No dedicated test/assertion library was used; to provide a quick solution, the tests run on plain Node.js.
Install all the dependencies:
npm install
Run the tests with a single command:
npm run tests
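Because the tests rely only on what ships with Node.js, a test file can be a plain script that throws on failure via the built-in node:assert module. A hedged sketch, where isSupportedUrl is a hypothetical function standing in for real project code:

```js
// Sketch of a dependency-free test using Node's built-in assert module.
const assert = require('node:assert');

// Hypothetical function under test; a toy implementation for the sketch.
function isSupportedUrl(input) {
  return /\.com\/?$/.test(input);
}

assert.strictEqual(isSupportedUrl('google.com'), true);
assert.strictEqual(isSupportedUrl('google.com/search'), false);

console.log('All tests passed ✅');
```

The tests script in package.json would then just point node at files like this one (an assumption about how npm run tests is wired up).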
This is my profile on GitHub: ralves20