Bookstore Web Scraper

This project is a web scraper for Books to Scrape, an online bookstore built for testing web scraping scripts. The scraper downloads HTML pages from the site, extracts product information such as book titles, prices, and URLs, and saves the data to a CSV file.

Features

- Download HTML Pages: Retrieve pages from the bookstore site based on category and pagination.
- Extract Product Information: Parse downloaded pages to extract book details.
- Save to CSV: Export the extracted data into a CSV file for further analysis.
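The extraction step could be sketched with BeautifulSoup as shown here. It assumes the listing-page markup Books to Scrape uses: each book sits in an `article.product_pod` card, the full title is in the link's `title` attribute, and the price is in `p.price_color`. `parse_products` and `PAGE_URL` are hypothetical names, not taken from main.py.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical URL of the saved category page; relative links are resolved against it.
PAGE_URL = "https://books.toscrape.com/catalogue/category/books/nonfiction_13/index.html"


def parse_products(html: str, page_url: str = PAGE_URL) -> list[dict]:
    """Extract title, price, and absolute URL from each product card on a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Each book on a listing page is wrapped in <article class="product_pod">.
    for card in soup.select("article.product_pod"):
        link = card.select_one("h3 > a")
        price = card.select_one("p.price_color")
        products.append(
            {
                # The link text is often truncated; the title attribute holds the full title.
                "title": link["title"],
                "price": price.get_text(strip=True),
                # hrefs are relative (../../../...), so resolve them against the page URL.
                "url": urljoin(page_url, link["href"]),
            }
        )
    return products
```

With the default `PAGE_URL`, a relative link such as `../../../some-book_1/index.html` resolves to `https://books.toscrape.com/catalogue/some-book_1/index.html`.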

Requirements

Python 3.x
Dependencies:
- playwright
- beautifulsoup4
- pandas

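Install the dependencies with:

```
pip install playwright beautifulsoup4 pandas
playwright install
```

The second command downloads the browser binaries that Playwright needs in order to load pages.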

Scripts

- **bookstore_page_downloader.py**: downloads the HTML pages for a given book category and saves them locally.
- **main.py**: processes the downloaded HTML files, extracts product details, and saves them to a CSV file.

Usage

Download Pages: Modify the query, page_from, and page_to parameters in bookstore_page_downloader.py to specify the book category and page range to download.
```
python bookstore_page_downloader.py
```
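A rough sketch of how the download step could work with Playwright's synchronous API. The URL pattern (`index.html` for page 1, `page-N.html` for later pages) reflects how Books to Scrape paginates categories; the function names, the `pages` output folder, and the `<query>_page_<n>.html` file names are hypothetical and not taken from bookstore_page_downloader.py.

```python
from pathlib import Path

BASE_URL = "https://books.toscrape.com/catalogue/category/books"


def page_url(query: str, page: int) -> str:
    """Build the listing URL for one page of a category, e.g. query="nonfiction_13"."""
    # Page 1 is index.html; later pages are page-2.html, page-3.html, ...
    name = "index.html" if page == 1 else f"page-{page}.html"
    return f"{BASE_URL}/{query}/{name}"


def download_pages(query: str, page_from: int, page_to: int, out_dir: str = "pages") -> list[Path]:
    """Save pages page_from..page_to (inclusive) as HTML files in out_dir."""
    # Imported here so page_url() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for n in range(page_from, page_to + 1):
            response = page.goto(page_url(query, n))
            # goto() does not raise on HTTP errors such as 404, so stop at the first missing page.
            if response is None or not response.ok:
                break
            # Hypothetical file naming: <query>_page_<n>.html
            path = target / f"{query}_page_{n}.html"
            path.write_text(page.content(), encoding="utf-8")
            saved.append(path)
        browser.close()
    return saved
```

For example, `download_pages("nonfiction_13", 1, 3)` would save up to three pages. Because the loop stops at the first unsuccessful response, a `page_to` larger than the category's page count is harmless.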
Extract Data: Modify the query and source_dir parameters in main.py to specify the category and location of the downloaded HTML files.
```
python main.py
```
The resulting CSV file will be saved in the project directory.
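When the files in `source_dir` are processed, reading them in page order keeps the CSV rows in the same order as the site. A minimal sketch, assuming the downloaded files are named `<query>_page_<n>.html` (a hypothetical convention; the actual file names may differ). `saved_pages` is a hypothetical helper.

```python
import re
from pathlib import Path


def saved_pages(source_dir: str, query: str) -> list[Path]:
    """Return the saved HTML files for one category, ordered by page number."""
    # Matches the hypothetical naming <query>_page_<n>.html and captures <n>.
    pattern = re.compile(rf"{re.escape(query)}_page_(\d+)\.html")
    numbered = []
    for path in Path(source_dir).glob("*.html"):
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so page_10 comes after page_9, not after page_1.
    return [path for _, path in sorted(numbered)]
```

Sorting on the parsed integer rather than the file name avoids the usual lexicographic trap where `page_10` sorts before `page_2`.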

Notes

Make sure the export folder exists in the project root, or specify another valid path.
The scripts use the nonfiction_13 category by default. Change the query parameter to scrape other categories.
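Instead of creating the export folder by hand, the output code can create it before writing. A minimal pandas sketch, assuming each extracted book is a dict with `title`, `price`, and `url` keys; `save_csv` and the `<query>.csv` file name are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_csv(rows: list[dict], query: str = "nonfiction_13", export_dir: str = "export") -> Path:
    """Write extracted rows to <export_dir>/<query>.csv, creating the folder if needed."""
    out_dir = Path(export_dir)
    # Create the folder (and any parents) up front; exist_ok avoids an error if it exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{query}.csv"
    # Fixed column order; index=False keeps pandas' row index out of the file.
    pd.DataFrame(rows, columns=["title", "price", "url"]).to_csv(out_path, index=False, encoding="utf-8")
    return out_path
```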

Repository: reipared/Bookstore_Web_Scraping_with_Python
