BestRouteTo

Finds the best (shortest) route to the internal links of a website, detects dead links and creates a sitemap.xml file.
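
A sitemap.xml normally follows the sitemaps.org 0.9 protocol: a `<urlset>` root holding one `<url><loc>` entry per page. The following minimal sketch shows one way to write such a file from a list of crawled URLs using only the standard library. It illustrates the format and is not the project's code. The helper names `build_sitemap` and `write_sitemap` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an ElementTree whose <urlset> root has one <url><loc> per URL.

    Hypothetical helper: duplicate URLs are dropped and entries are sorted
    so the output is deterministic.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.ElementTree(urlset)


def write_sitemap(urls, path="sitemap.xml"):
    """Write the sitemap to `path` as UTF-8 with an XML declaration."""
    build_sitemap(urls).write(path, encoding="utf-8", xml_declaration=True)
```

For example, `write_sitemap(["http://www.example.com/", "http://www.example.com/about.html"])` would write both URLs to sitemap.xml in the current directory. Optional tags such as `<lastmod>` are left out of this sketch.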

Implements Dijkstra's algorithm to find the shortest paths. It is not a professional tool, but it provides three utilities that can easily be extended. Each result is saved to a separate XML file.
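
The shortest-path step can be pictured as Dijkstra's algorithm run over the site's link graph. The sketch below assumes a data layout that may differ from the project's. It takes a hypothetical dictionary mapping each page to the pages it links to, treats every link as costing one click, and returns, for every page reachable from the first page, the shortest chain of pages leading to it. The function name `shortest_routes` is hypothetical.

```python
import heapq


def shortest_routes(links, start):
    """Dijkstra's algorithm over a link graph.

    links: dict mapping a page to an iterable of pages it links to
           (hypothetical layout; every link is assumed to cost 1).
    Returns a dict mapping each reachable page to its shortest route,
    given as a list of pages that begins with `start`.
    """
    dist = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        d, page = heapq.heappop(queue)
        if d > dist[page]:
            continue  # stale queue entry: a shorter route was already found
        for target in links.get(page, ()):
            candidate = d + 1  # uniform weight: one click per link
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                previous[target] = page
                heapq.heappush(queue, (candidate, target))

    # Walk the predecessor links back to `start` to rebuild each route.
    routes = {}
    for page in dist:
        route = [page]
        while route[-1] != start:
            route.append(previous[route[-1]])
        routes[page] = route[::-1]
    return routes
```

With equal link costs, Dijkstra visits pages in the same order as a breadth-first search. The priority queue only pays off if links are given different weights, which is one direction in which the utility could be extended.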

An HTTP server for testing purposes is also provided, together with a script that creates randomly interlinked HTML pages.
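
For quick local experiments, a comparable set of test pages can also be produced with the standard library alone. The sketch below is not the project's generator script. The function name `make_random_site`, the `pageN.html` naming scheme and the `dead_ratio` parameter are all hypothetical. It writes `count` pages, each linking to a few random other pages. With probability `dead_ratio`, a link points instead to a page that does not exist, so dead-link detection has something to find.

```python
import os
import random


def make_random_site(directory, count=10, links_per_page=3, dead_ratio=0.1, seed=None):
    """Write `count` randomly interlinked HTML pages into `directory`.

    Pages are named page0.html ... page{count-1}.html. Each page links to up
    to `links_per_page` other pages chosen at random; with probability
    `dead_ratio` a link points to a missing page instead (a dead link).
    Returns a dict mapping each file name to the hrefs written into it.
    """
    rng = random.Random(seed)
    os.makedirs(directory, exist_ok=True)
    names = [f"page{i}.html" for i in range(count)]
    site = {}
    for name in names:
        others = [n for n in names if n != name]
        targets = rng.sample(others, min(links_per_page, len(others)))
        hrefs = [
            f"missing{rng.randrange(1000)}.html" if rng.random() < dead_ratio else t
            for t in targets
        ]
        body = "\n".join(f'<p><a href="{h}">{h}</a></p>' for h in hrefs)
        html = (
            "<!DOCTYPE html>\n"
            f"<html><head><title>{name}</title></head>\n"
            f"<body>\n{body}\n</body></html>\n"
        )
        with open(os.path.join(directory, name), "w", encoding="utf-8") as fh:
            fh.write(html)
        site[name] = hrefs
    return site
```

For example, `make_random_site("site", count=20, seed=42)` followed by `python3 -m http.server 8000 --directory site` (the standard library's built-in server, not the one bundled with this project) could serve as a local target such as `--domain http://localhost:8000 --firstpage page0.html`.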

Dependencies

It requires the following libraries:

  • bs4
  • dominate

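Running

Runs on Python 3.x.

From the terminal:

    python3 web_crawler.py --domain http://www.example.com --firstpage thefirstpage.html

Options:

  • -d, --domain: the website to crawl (required)
  • -f, --firstpage: the page to start from (defaults to /)
  • -so, --sitemapout: output file for the sitemap (defaults to sitemap.xml)
  • -po, --pathsout: output file for the shortest paths (defaults to paths.xml)
  • -do, --deadout: output file for the dead links (defaults to dead.xml)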
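
These options map naturally onto Python's `argparse`. The following sketch shows how such a command-line interface could be declared. It mirrors only the flags and defaults listed above and is not taken from web_crawler.py. The function name `parse_options` is hypothetical.

```python
import argparse


def parse_options(argv=None):
    """Parse the command-line options documented for web_crawler.py.

    Hypothetical reconstruction: only the flags and defaults from the README
    are declared; argv=None makes argparse read sys.argv[1:].
    """
    parser = argparse.ArgumentParser(
        description="Find routes to internal links and dead links, and build a sitemap."
    )
    parser.add_argument("-d", "--domain", required=True,
                        help="website to crawl, e.g. http://www.example.com")
    parser.add_argument("-f", "--firstpage", default="/",
                        help="page the crawl starts from (default: /)")
    parser.add_argument("-so", "--sitemapout", default="sitemap.xml",
                        help="sitemap output file (default: sitemap.xml)")
    parser.add_argument("-po", "--pathsout", default="paths.xml",
                        help="shortest-paths output file (default: paths.xml)")
    parser.add_argument("-do", "--deadout", default="dead.xml",
                        help="dead-links output file (default: dead.xml)")
    return parser.parse_args(argv)
```

`argparse` accepts single-dash options longer than one letter, such as `-so`, and matches `-do` exactly before considering `-d`, so the two do not clash. Omitting `--domain` makes `argparse` print a usage message and exit.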