bas-scraping

A web scraper for bastrucks.com which extracts all the relevant information.

Implementation Details

The YouTube channel https://www.youtube.com/c/BASWorld/videos has more than 30K videos. The task was to extract the links from the description of each video, collect the corresponding data from the original website, www.bastrucks.com, and deliver it in Excel format.
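The link-extraction step described above could look roughly like the sketch below. It is not taken from yt_links_extractor.py: the function name, the regular expression, and the sample URLs are assumptions made for illustration, and the pattern only catches links that have a path after the domain.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not the repo's own): http(s) links to
# bastrucks.com, with or without "www.", followed by a path.
BASTRUCKS_LINK = re.compile(
    r"https?://(?:www\.)?bastrucks\.com/[^\s<>\"')]*", re.IGNORECASE
)


def extract_bastrucks_links(description: str) -> List[str]:
    """Return the unique bastrucks.com links in a description, in order of appearance."""
    seen = set()
    links = []
    for match in BASTRUCKS_LINK.finditer(description):
        # Drop sentence punctuation that may trail the URL in free text.
        url = match.group(0).rstrip(".,;")
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Made-up description text, used only to exercise the function.
sample = (
    "See https://www.bastrucks.com/vehicles/used/truck-123. "
    "More: https://www.bastrucks.com/vehicles/used/truck-123 "
    "and http://bastrucks.com/stock"
)
print(extract_bastrucks_links(sample))
```

Deduplicating while preserving order keeps one row per vehicle link even when a description repeats the same URL.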

How to run the scraper

Clone the repo.
Run yt_links_extractor.py. It generates an Excel file named YouTube.xlsx containing all the relevant columns scraped from YouTube. Before running it, add your Google api_key on line 68 of the script.
Run the command scrapy crawl javascript_rendered to generate an Excel file named BasTrucksFinalDocs.xlsx, which contains the document links.
Run the command python make_client_relevant_file.py to generate an Excel file in the same format as the required file.
Run scrapy crawl bastrucks, which scrapes the whole website and generates an Excel file populated with all the relevant data.
Run scrapy crawl docdownloader to download all the documents.
Run scrapy crawl imagedownloader to download all the images.
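
The steps above can also be chained from a single script. The following is a minimal sketch, not part of the repository: run_pipeline and PIPELINE are hypothetical names, it assumes yt_links_extractor.py is run with the Python interpreter, and it assumes the script is started from the directory where the scrapy commands and both Python files are available.

```python
import subprocess
import sys

# Commands from the steps above, in the order they are listed.
# Assumption: yt_links_extractor.py is invoked with the current interpreter.
PIPELINE = [
    [sys.executable, "yt_links_extractor.py"],
    ["scrapy", "crawl", "javascript_rendered"],
    [sys.executable, "make_client_relevant_file.py"],
    ["scrapy", "crawl", "bastrucks"],
    ["scrapy", "crawl", "docdownloader"],
    ["scrapy", "crawl", "imagedownloader"],
]


def run_pipeline(commands=PIPELINE, dry_run=False):
    """Run each command in order and return the list of commands handled.

    With dry_run=True the commands are only printed. Otherwise
    subprocess.run(check=True) raises CalledProcessError on the first
    failing step, so later steps never run on incomplete output.
    """
    handled = []
    for cmd in commands:
        print("Running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
        handled.append(cmd)
    return handled
```

Calling run_pipeline(dry_run=True) prints the six commands without executing anything, which is a cheap way to check the order before starting a long crawl.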

Demo:

demo_video.mp4

CognitiaAI/bas-scraping
