Web Scraping

⚠️ WEB SCRAPING ETIQUETTE ⚠️

Always play nice with the websites you're scraping: read their terms of use and get permission if needed. Steer clear of collecting personal data and be copyright-conscious. Oh, and stay in the know about the legal side of scraping. We don't want any surprise legal drama, right?

ALWAYS CHECK THE robots.txt file of the website you are scraping. It tells you which pages you can and cannot crawl.
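A robots.txt check can be automated with Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: the rules in `ROBOTS_TXT`, the `example.org` URLs, and the `my-scraper` user agent are made up, and the rules are inlined so the example runs without touching any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /news/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "my-scraper" is a placeholder user agent; example.org is a placeholder site.
news_allowed = parser.can_fetch("my-scraper", "https://example.org/news/")
admin_allowed = parser.can_fetch("my-scraper", "https://example.org/admin/")

print("news page allowed:", news_allowed)
print("admin page allowed:", admin_allowed)
```

Against a live site, you would instead call `set_url()` with the address of the site's robots.txt and then `read()`, which downloads and parses the file before you ask `can_fetch()`.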

This project includes:

Simple web scraping of a page from the Vegan Society News. I first retrieved the news cards on the page and saved them to a CSV file. Next, I scraped the images from the news items and saved them to my machine.
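The card-to-CSV step described above might look roughly like the sketch below. It is not the project's actual code: the markup in `SAMPLE_HTML`, the `news-card` class name, the `BASE_URL` placeholder, and the helper names `extract_cards` and `write_csv` are all assumptions made for illustration, and the real page's structure will differ.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the real page's tags and class names will differ.
SAMPLE_HTML = """
<div class="news-card">
  <h3><a href="/news/first-story">First story</a></h3>
  <img src="/images/first.jpg" alt="">
</div>
<div class="news-card">
  <h3><a href="/news/second-story">Second story</a></h3>
  <img src="/images/second.jpg" alt="">
</div>
"""
BASE_URL = "https://example.org"  # placeholder, not the real site


def extract_cards(html, base_url):
    """Return one dict per news card: title, link, and image URL."""
    soup = BeautifulSoup(html, "html.parser")
    cards = []
    for card in soup.find_all("div", class_="news-card"):
        link = card.find("a")
        image = card.find("img")
        cards.append({
            "title": link.get_text(strip=True),
            # Turn relative paths into absolute URLs.
            "url": urljoin(base_url, link["href"]),
            "image_url": urljoin(base_url, image["src"]),
        })
    return cards


def write_csv(cards, handle):
    """Write the cards to any file-like object as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=["title", "url", "image_url"])
    writer.writeheader()
    writer.writerows(cards)


cards = extract_cards(SAMPLE_HTML, BASE_URL)
buffer = io.StringIO()  # stands in for a real file
write_csv(cards, buffer)
print(buffer.getvalue())
```

To save to disk, pass a file opened with `open("news_cards.csv", "w", newline="", encoding="utf-8")` instead of the `StringIO` buffer. Each `image_url` can then be downloaded and written to a file opened in binary mode (`"wb"`).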

Popular libraries for web scraping

  1. Scrapy
  2. BeautifulSoup - used in this small project
  3. Selenium

Web scraping steps

  1. Crawl
  2. Parse and transform
  3. Store
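The three steps above can be sketched as three small functions chained together. To stay dependency-free, this sketch swaps in the standard library's `html.parser` instead of BeautifulSoup and stores records as JSON lines rather than CSV. The function names, the `fetch` parameter, and the stubbed page are all hypothetical; the stub means nothing is actually downloaded.

```python
import io
import json
from html.parser import HTMLParser
from urllib.request import urlopen


def crawl(url, fetch=None):
    """Step 1: get raw HTML. Pass `fetch` to stub out the network call."""
    if fetch is not None:
        return fetch(url)
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collects the text and href of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "href": self._href})
            self._href = None


def parse_and_transform(html):
    """Step 2: turn raw HTML into a list of clean records."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def store(records, handle):
    """Step 3: write one JSON object per line; return how many were written."""
    for record in records:
        handle.write(json.dumps(record) + "\n")
    return len(records)


def fake_fetch(url):
    # Stubbed page so the example runs offline.
    return '<p><a href="/a">Page A</a> <a href="/b"> Page B </a></p>'


html = crawl("https://example.org/", fetch=fake_fetch)
records = parse_and_transform(html)
out = io.StringIO()  # stands in for a real file
count = store(records, out)
print(count, "records stored")
```

Keeping the steps separate makes each one easy to replace, for example swapping the parser for BeautifulSoup or the storage for a CSV writer, without touching the others.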