webcrawler

A very basic web crawler implemented with Beautiful Soup 4 and plain Python.

It extracts the URLs from a starting web page and then visits each of them in breadth-first order, extracting the URLs from every page it reaches, and so on. With a slight modification it can be used to download an entire website.
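The breadth-first traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code in crawler.py: the names `LinkParser`, `extract_links`, `fetch`, `crawl`, and the `max_pages` and `fetch_page` parameters are all assumptions made for the example. To keep the sketch free of dependencies it uses the standard library's `html.parser` in place of Beautiful Soup 4 for link extraction.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (stand-in for Beautiful Soup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def fetch(url):
    """Download a page and decode it as text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch_page=fetch):
    """Visit pages breadth-first from start_url; return URLs in visit order."""
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    seen = {start_url}           # URLs already queued, to avoid revisiting
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue             # skip pages that cannot be fetched
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://stanford.edu/", max_pages=20)` would return up to 20 URLs in the order they were visited. The `seen` set is what keeps the crawl from looping when pages link back to each other, and `max_pages` is an assumed safety limit so the sketch terminates on large sites.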

usage:

pip install -r requirements.txt

python crawler.py http://stanford.edu/

Replace the URL above with any URL that you'd like to crawl.
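As an illustration of how a script invoked this way might read its starting URL, here is a small sketch. It is an assumption about the command-line handling, not a copy of crawler.py; the function name `parse_start_url` is hypothetical.

```python
import argparse
from urllib.parse import urlparse


def parse_start_url(argv):
    """Parse the command-line arguments and return the starting URL."""
    parser = argparse.ArgumentParser(
        prog="crawler.py",
        description="Crawl a website breadth-first.",
    )
    parser.add_argument("url", help="URL to start crawling from")
    args = parser.parse_args(argv)

    # Reject anything that is not an absolute http(s) URL.
    parts = urlparse(args.url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        parser.error("expected an absolute http(s) URL, e.g. http://stanford.edu/")
    return args.url
```

A script could call `parse_start_url(sys.argv[1:])` at startup; with an invalid or missing URL, argparse prints a usage message and exits.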

