Used Car Data Scrapper
Your operating system must be one of:
- Ubuntu 16.04 LTS (“Xenial”)
- Ubuntu 18.04 LTS (“Bionic”)
Other operating systems may also work, but those have not been tested yet.
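To see quickly whether a machine matches one of the tested releases, the check can be scripted. The following is a minimal sketch that parses /etc/os-release. The helper names (`parse_os_release`, `is_tested_release`, `check_current_system`) are illustrative and not part of this project.

```python
from pathlib import Path

# Releases listed as tested in this guide: Xenial and Bionic.
SUPPORTED_VERSIONS = {"16.04", "18.04"}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may or may not be wrapped in double quotes.
        info[key] = value.strip().strip('"')
    return info


def is_tested_release(info):
    """Return True when the parsed data describes a tested Ubuntu release."""
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") in SUPPORTED_VERSIONS


def check_current_system(path="/etc/os-release"):
    """Read the os-release file and report whether this system was tested."""
    return is_tested_release(parse_os_release(Path(path).read_text()))
```

Calling `check_current_system()` on the target machine returns False on an untested system. As noted above, that does not necessarily mean the scraper will fail there.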
The following tutorial is a condensed version of the official Scrapy documentation. Install Scrapy with either conda or pip:
conda install -c conda-forge scrapy
or
pip install Scrapy
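After installing, it can be useful to confirm that the package is importable from the Python environment you intend to run the scraper in. This is a small sketch using only the standard library. The helper name `is_installed` is illustrative. The package is installed as "Scrapy" but imported as `scrapy`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("scrapy"):
    print("Scrapy is not importable from this environment; re-run the install step.")
```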
The following tutorial is a condensed version of the official MongoDB documentation.
From a terminal, issue the following command to import the MongoDB public GPG Key from https://www.mongodb.org/static/pgp/server-4.4.asc:
wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
Create the list file /etc/apt/sources.list.d/mongodb-org-4.4.list for your version of Ubuntu. Run only the command that matches your release.
Ubuntu 16.04 (Xenial):
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
Ubuntu 18.04 (Bionic):
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
Ubuntu 20.04 (Focal):
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
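The list-file entries above differ only in the release codename. As an illustration, the following sketch builds the matching line from an Ubuntu version number. The `CODENAMES` mapping and the function name `mongodb_repo_line` are assumptions made for this example and do not exist in the project.

```python
# Template copied from the echo commands above; only the codename varies.
REPO_TEMPLATE = (
    "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu "
    "{codename}/mongodb-org/4.4 multiverse"
)

# Ubuntu version number -> release codename used in the repository path.
CODENAMES = {"16.04": "xenial", "18.04": "bionic", "20.04": "focal"}


def mongodb_repo_line(version_id):
    """Return the apt source line for the given Ubuntu version."""
    try:
        codename = CODENAMES[version_id]
    except KeyError:
        raise ValueError(f"no MongoDB 4.4 entry known for Ubuntu {version_id}") from None
    return REPO_TEMPLATE.format(codename=codename)


print(mongodb_repo_line("18.04"))
```

The printed line is what belongs in /etc/apt/sources.list.d/mongodb-org-4.4.list. Writing to that file still requires root privileges, as in the `sudo tee` commands.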
sudo apt-get update
sudo apt-get install -y mongodb-org
sudo systemctl start mongod
or
sudo service mongod start
You can optionally ensure that MongoDB will start following a system reboot by issuing the following command:
sudo systemctl enable mongod
Start the MongoDB shell:
mongo
We want to create a database called "usedcar". Execute the following command in the MongoDB shell.
>use usedcar
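Use the following command to confirm that usedcar is now the current database. MongoDB creates the database itself only when you first store data or create a collection in it.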
>db
usedcar
Create a Collection
>db.createCollection("cargurus")
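The same database and collection setup can be done from Python instead of the shell. The sketch below assumes the pymongo driver is installed (`pip install pymongo`), which this guide does not otherwise mention. The function names `ensure_collection` and `setup_usedcar` are illustrative.

```python
def ensure_collection(db, name):
    """Create collection `name` in `db` unless it exists; return True if created."""
    if name in db.list_collection_names():
        return False
    db.create_collection(name)
    return True


def setup_usedcar(uri="mongodb://localhost:27017"):
    """Connect to MongoDB and make sure the usedcar.cargurus collection exists."""
    # Imported here so the helper above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return ensure_collection(client["usedcar"], "cargurus")
    finally:
        client.close()
```

Calling `setup_usedcar()` with mongod running locally returns True the first time and False on later runs, since the collection already exists by then.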
Open settings.py in the usedcar directory and find the MONGO_URI variable. It should look something like
MONGO_URI = atlas
Change atlas to your MongoDB connection URI, such as
'mongodb://localhost:27017'
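Rather than hard-coding the URI, one option is to read it from an environment variable and fall back to the local default. This is only a sketch of that idea. The helper `resolve_mongo_uri` and the use of an environment variable named MONGO_URI are assumptions, not something the project is known to do.

```python
import os

# Local default taken from the example URI in this guide.
DEFAULT_MONGO_URI = "mongodb://localhost:27017"


def resolve_mongo_uri(environ=os.environ):
    """Prefer a MONGO_URI environment variable, else use the local default."""
    uri = environ.get("MONGO_URI", DEFAULT_MONGO_URI)
    # Both standard and DNS seed list (Atlas-style) schemes are accepted.
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError(f"not a MongoDB connection URI: {uri!r}")
    return uri
```

In the settings file, this would be used as `MONGO_URI = resolve_mongo_uri()`, so a hosted cluster can be selected without editing the file.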
You can run the scraper now!
Type the following command:
scrapy crawl usedcar
The scraped data will be fed into your MongoDB database.
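For readers wondering how the MONGO_URI setting ends up driving writes to the database, the sketch below shows the general shape of a Scrapy item pipeline that stores items in MongoDB. It is not this repository's actual pipeline, which is not shown here. The class name `MongoPipeline` and its defaults are hypothetical, chosen to match the usedcar database and cargurus collection created earlier. It assumes the pymongo driver is installed.

```python
class MongoPipeline:
    """Hypothetical item pipeline that writes each scraped item to MongoDB."""

    def __init__(self, mongo_uri, mongo_db="usedcar", collection_name="cargurus"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook; the settings file's values are in crawler.settings.
        return cls(mongo_uri=crawler.settings.get("MONGO_URI"))

    def open_spider(self, spider):
        # Imported lazily so the class can be defined without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Convert the item to a plain dict and store it as one document.
        self.collection.insert_one(dict(item))
        return item
```

A pipeline like this only runs if it is registered under ITEM_PIPELINES in the Scrapy settings. After a crawl, running `db.cargurus.count()` in the MongoDB shell gives a quick indication of whether documents arrived.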