# Python URL Logger

A simple Python script to extract URL endpoints from a website. It is one of a number of tools in my arsenal for helping migrate websites between platforms with little to no SEO penalty.

The script extracts all URLs from a website and logs them to a CSV file, making it easier to plan the URL structure for your new website.

## Setup

  1. Clone the repo to your local machine: `git clone git@github.com:danmenzies/url-logger.git`
  2. Create a virtual environment: `python3 -m venv venv`
  3. Activate the virtual environment: `source venv/bin/activate`
  4. Install the requirements: `pip install -r requirements.txt`

## Usage

Once either script has finished, see the `./data` directory for the CSV file containing the URLs.

### Crawler

  1. Open the `./scripts/` directory
  2. Run the script: `python manual-crawl.py` (a rough sketch of the approach is shown after this list)
  3. Enter the domain (or subdomain) of the website you want to extract URLs from
  4. Grab a coffee, this may take a while...
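
For reference, here is a minimal sketch of the kind of crawl the script performs: fetch a page, collect same-domain links, repeat until nothing new turns up, then write the results to CSV. This is an illustration only; the function names, the output path `../data/urls.csv`, and the use of `requests` and `BeautifulSoup` are assumptions, not the actual implementation in `manual-crawl.py`.

```python
# Illustrative sketch only; not the actual manual-crawl.py implementation.
import csv
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(domain: str, max_pages: int = 500) -> set[str]:
    """Breadth-first crawl of a single domain, returning the internal URLs found."""
    start = f"https://{domain}/"
    seen, queue = set(), [start]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip pages that fail to load
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            # Resolve relative links and drop fragments, keeping same-domain URLs only
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                queue.append(link)
    return seen


if __name__ == "__main__":
    domain = input("Domain to crawl (e.g. example.com): ").strip()
    urls = crawl(domain)
    with open("../data/urls.csv", "w", newline="") as fh:  # hypothetical output path
        writer = csv.writer(fh)
        writer.writerow(["url"])
        for url in sorted(urls):
            writer.writerow([url])
```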

### Sitemap Grabber

  1. Open the `./scripts/` directory
  2. Run the script: `python convert-sitemap.py` (a rough sketch of the approach is shown after this list)
  3. Enter the domain (or subdomain) of the website you want to extract URLs from
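
Again for illustration only, here is a minimal sketch of a sitemap-based approach, assuming the site publishes a standard `/sitemap.xml`. The real script may handle sitemap index files and other details differently; the function names and output path below are hypothetical.

```python
# Illustrative sketch only; not the actual convert-sitemap.py implementation.
import csv
import xml.etree.ElementTree as ET

import requests

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_urls(domain: str) -> list[str]:
    """Fetch /sitemap.xml and return every <loc> entry it lists."""
    resp = requests.get(f"https://{domain}/sitemap.xml", timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    return [loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc") if loc.text]


if __name__ == "__main__":
    domain = input("Domain (e.g. example.com): ").strip()
    with open("../data/sitemap-urls.csv", "w", newline="") as fh:  # hypothetical output path
        writer = csv.writer(fh)
        writer.writerow(["url"])
        for url in sitemap_urls(domain):
            writer.writerow([url])
```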

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. You are also welcome to fork the repo and make your own changes if you prefer, and I welcome any requests to merge back into the main branch.

Feedback is also welcome; if you have any suggestions for improvements, please open an issue.

## Disclaimer

Please only scrape sites you are authorised to scrape! I take no responsibility for any misuse of this script.