
A web scraper built with Scrapy to extract all news articles from the Soka University website


yourvivian/soka-news-scraper


soka-news-scraper

This project scrapes news articles from the Soka University website using Scrapy, a Python web scraping framework. For each article it extracts the title, URL, text content, category, publication date, and the date the article was scraped.
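The record produced for each article can be pictured with the following minimal sketch. The README does not list the spider's actual field names, so the names below (and the example URL) are hypothetical. A Scrapy project would typically declare these fields on a scrapy.Item; a stdlib dataclass is used here only so the sketch runs without Scrapy installed.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional


@dataclass
class Article:
    """One scraped news article (hypothetical field names)."""
    title: str
    url: str
    text: str
    category: Optional[str] = None
    # Publication date kept as the raw string shown on the page;
    # no particular date format is assumed for the site.
    published: Optional[str] = None
    # Date of scraping, filled in automatically as YYYY-MM-DD.
    scraped_date: str = field(default_factory=lambda: date.today().isoformat())


# Hypothetical example record; the URL is illustrative only.
article = Article(
    title="Example headline",
    url="https://www.soka.edu/news-events/news/example-headline",
    text="Body text...",
    category="Campus News",
)
print(asdict(article))
```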

The spider subclasses Scrapy's CrawlSpider class, which follows links according to a set of rules. It is named "soka-news" and starts crawling from https://www.soka.edu/news-events/news.
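CrawlSpider decides which links to follow through its rules attribute, a list of Rule objects that each wrap a LinkExtractor (for example Rule(LinkExtractor(allow=...), callback=..., follow=...)). The stdlib sketch below imitates that decision for links found on the start page, so it runs without Scrapy. The two URL patterns and the ?page= pagination scheme are assumptions about the site, not the project's actual rules, and the sketch simplifies Scrapy's behavior (for example, it ignores subdomain matching in allowed_domains).

```python
import re
from urllib.parse import urljoin, urlparse

START_URL = "https://www.soka.edu/news-events/news"
ALLOWED_DOMAIN = "www.soka.edu"

# Hypothetical URL patterns; the real spider's rules may differ.
ARTICLE_PATTERN = re.compile(r"^/news-events/news/[^/?#]+/?$")
LISTING_PATTERN = re.compile(r"^/news-events/news/?$")


def classify_links(hrefs, base_url=START_URL):
    """Resolve hrefs and decide how a CrawlSpider-style rule set would treat them.

    Returns (absolute_url, kind) pairs, where kind is "article" (parse it with
    a callback) or "listing" (follow it only to discover more links).
    Off-domain links, unmatched links, and duplicates are dropped.
    """
    seen = set()
    results = []
    for href in hrefs:
        url = urljoin(base_url, href)
        parts = urlparse(url)
        if parts.netloc != ALLOWED_DOMAIN or url in seen:
            continue
        if ARTICLE_PATTERN.match(parts.path):
            kind = "article"
        elif LISTING_PATTERN.match(parts.path) and "page=" in parts.query:
            kind = "listing"
        else:
            continue
        seen.add(url)
        results.append((url, kind))
    return results
```

In Scrapy itself the same split is usually expressed as two rules: one with a callback for article pages and one with follow=True and no callback for pagination pages.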

Usage

Navigate to the project directory: cd scraper

Start the scraping process by running the following command: scrapy crawl soka-news -o articles.json

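The scraper fetches news articles from the Soka University website and writes the extracted fields to articles.json. Note that -o appends to an existing file, which leaves invalid JSON after repeated runs; delete the file first, or use -O articles.json (Scrapy 2.4 and later) to overwrite it.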
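Because -o articles.json writes a single JSON array of items, the output can be inspected with the standard library alone. This minimal sketch counts articles per category; the "category" key is an assumed field name, since the README does not spell out the exported keys.

```python
import json
from collections import Counter


def count_by_category(path="articles.json"):
    """Load the JSON array written by `-o articles.json` and count articles per category."""
    with open(path, encoding="utf-8") as f:
        articles = json.load(f)
    # "category" is an assumed key; items without one are grouped as "(none)".
    return Counter(a.get("category") or "(none)" for a in articles)
```

For example, `for category, n in count_by_category().most_common(): print(n, category)` lists categories from most to least frequent.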
