soka-news-scraper

This project scrapes news articles from the Soka University website using Scrapy, a Python web scraping framework. For each article it extracts the title, URL, text content, categories, publication date, and the date of scraping.

The spider extends Scrapy's CrawlSpider class to define the crawling behavior. It is named "soka-news" and starts crawling from https://www.soka.edu/news-events/news.

Usage

Navigate to the project directory: cd scraper

Start the scraping process by running the following command: scrapy crawl soka-news -o articles.json

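The scraper fetches news articles from the Soka University website and writes the extracted information to articles.json. Note that -o appends to an existing file, which leaves a JSON file invalid after a second run; delete articles.json before re-running, or use -O to overwrite it if your Scrapy version supports that option.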
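Once the output file exists, it can be inspected with the Python standard library. The sketch below assumes that articles.json is a JSON array of objects and that each object may carry a "categories" list; both the field name and the summarize helper are hypothetical and should be adapted to the fields the spider actually emits.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(path):
    """Return (article_count, category_counts) for a Scrapy JSON export.

    Assumption: the file holds a JSON array of objects, each with an
    optional "categories" list of strings.
    """
    articles = json.loads(Path(path).read_text(encoding="utf-8"))
    counts = Counter(
        category
        for article in articles
        for category in (article.get("categories") or [])
    )
    return len(articles), counts
```

Calling summarize("articles.json") after a crawl returns the number of scraped articles and a Counter of how often each category appears.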
