diff --git a/README.md b/README.md
index 61044c6..24bc3a5 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,11 @@
 # immoweb-scraper
 
-immoweb-scraper is a Python-based tool designed to scrape property listings from the Immoweb website. The idea is to use the scraped data as a way to experiment with modeling exercises and as a way to experiment with different scraping, data engineering workflows. This codebase uses Prefect to schedule regular scraping scraping tasks. The results of the scraping are added to an sqlite database. Eventually, we will add tasks to make modelling views of the collected data and make backups of the database.
+immoweb-scraper is a Python-based tool designed to scrape property listings from the Immoweb website. The idea is to use the scraped data as a way to experiment with modeling exercises and as a way to experiment with different scraping, data engineering workflows. The results of the scraping are added to an sqlite database. The dependencies for this project are managed via `poetry`.
 
-## Architecture Overview
+## Future plans
 
-We use a python package to separate components for separating concerns into database connections, scraping logic, URL generation, and browser setup. The dependencies for this project are managed via `poetry`.
+* Scheduler to run the scraping tasks and accumulate data slowly over time.
+* Dockerize the analysis
+* Investigate decoupled architecture of microservices
 
 ## Usage