The global used car market was estimated at $828.24 billion in 2019 and is projected to reach $1,355.15 billion by 2027. In Vietnam, used car sales volume grew at a double-digit CAGR over the 2013-2018 review period. The market was observed to be at an early growth stage, driven by a faster vehicle replacement rate, shorter new car launch cycles, a growing middle class, a rising average ticket size, and lower import duty on new cars. New cars are expensive in Vietnam, so used cars have become the more popular choice for middle- and lower-income buyers. The Vietnamese government is also expected to introduce policies banning motorbikes in urban areas, which is expected to push demand for used cars up sharply.
Given this growing demand for used cars in Vietnam, we built a model that predicts used car prices, to make buying a car easier for Vietnamese buyers.
- Nguyen Thanh Tuan - Director of DSLab
- To Duc Anh - DSLab member
- Tran Minh Khoa - DSEB member
- Duong Thu Phuong - DSEB member
- Nguyen Anh Tu - DSLab member
- Kieu Son Tung - DSEB member
- Nguyen Son Tung - DSLab member
To make the model fit the Vietnamese market as closely as possible, we searched for the top e-commerce websites that sell used cars and crawled the sales posts on those sites.
Because each e-commerce site exposes different data fields, the crawled data has many missing fields, so we also scraped autodata.net to fill them in. The crawled data and the crawler are open source; anyone interested can use them free of charge, with no permission required.
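The gap-filling step described above can be sketched as a lookup against a reference table. This is only a minimal illustration, not the code in this repo: the `SPECS` table, its `(brand, model, year)` key, the field names, and the `fill_missing` helper are all hypothetical.

```python
# Hypothetical reference table keyed by (brand, model, year). In practice
# it would be built from scraped specification pages.
SPECS = {
    ("toyota", "vios", 2018): {"engine": "1.5L", "fuel": "petrol", "seats": 5},
    ("kia", "morning", 2017): {"engine": "1.25L", "fuel": "petrol", "seats": 5},
}


def fill_missing(listing: dict, specs: dict = SPECS) -> dict:
    """Return a copy of `listing` with absent or empty fields filled from `specs`."""
    key = (
        str(listing.get("brand", "")).strip().lower(),
        str(listing.get("model", "")).strip().lower(),
        listing.get("year"),
    )
    reference = specs.get(key, {})  # no match -> nothing to fill
    filled = dict(listing)
    for field, value in reference.items():
        # Only fill fields that are absent or empty; never overwrite
        # what the seller actually posted.
        if filled.get(field) in (None, ""):
            filled[field] = value
    return filled


post = {"brand": "Toyota", "model": "Vios", "year": 2018, "fuel": "", "price": 450_000_000}
print(fill_missing(post))
```

Listings with no match in the reference table come back unchanged, which keeps them available for a later model-based fill.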
Since finding and matching cars takes too much time, we decided to build a machine learning model that predicts and fills in the missing fields with a precision of 99.2%.
We mainly use the requests and beautifulsoup packages in Python to send requests and extract information. On some websites, DoS protection blocks our scraper, so in those cases we simulate user activity with Selenium.
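The fetch-then-extract pattern can be sketched as follows. To keep the example dependency-free it uses the standard library (`urllib.request` and `html.parser`) as a stand-in for requests and beautifulsoup, so it is not the crawler in this repo. The `listing-title` class name and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ListingParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class.

    Capture stops at the first closing tag, so this only suits
    elements without nested markup.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.titles = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._capture = True
            self.titles.append("")

    def handle_data(self, data):
        if self._capture:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        self._capture = False


def fetch(url, timeout=10):
    """Download a page as text, sending a browser-like User-Agent header."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_titles(html, css_class="listing-title"):
    """Return the stripped text of each element with class `css_class`."""
    parser = ListingParser(css_class)
    parser.feed(html)
    return [title.strip() for title in parser.titles]


# Offline sample so the sketch runs without network access.
sample = (
    '<ul><li class="listing-title"> Toyota Vios 2018 </li>'
    '<li class="listing-title">Kia Morning 2017</li></ul>'
)
print(extract_titles(sample))
```

With a real page, `extract_titles(fetch(url))` would chain the two steps; the class name has to match the markup of the site being crawled.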
Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which must be installed before Selenium can drive the browser.
Skipping this step raises the error selenium.common.exceptions.WebDriverException: Message: ‘geckodriver’ executable needs to be in PATH.
The crawler depends on Selenium, which means you need a specific browser version to work with it. We use Chrome; the supported version is listed below:
- Chrome 98
- Other browser versions will be supported later.
We include a webdriver in the repo itself, but you can switch to a webdriver of your choice:
To change the webdriver path, go to common/check_os and find the function get_selenium_chrome_webdriver_path. It has a variable called defined_path; set it to the path of the Chrome webdriver on your machine.
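For readers who have not opened that file yet, a path resolver of this kind might look roughly like the sketch below. It is not the actual body of get_selenium_chrome_webdriver_path: the `resolve_chromedriver_path` name, the `BUNDLED_DRIVERS` table, and the file paths in it are hypothetical.

```python
import platform
from pathlib import Path
from typing import Optional

# Hypothetical layout: one bundled driver per operating system.
BUNDLED_DRIVERS = {
    "Windows": "webdriver/chromedriver.exe",
    "Darwin": "webdriver/chromedriver_mac",
    "Linux": "webdriver/chromedriver_linux",
}


def resolve_chromedriver_path(
    defined_path: Optional[str] = None,
    system: Optional[str] = None,
) -> Path:
    """Prefer a user-defined path; otherwise pick the bundled driver for the OS."""
    if defined_path:
        return Path(defined_path)
    system = system or platform.system()
    try:
        return Path(BUNDLED_DRIVERS[system])
    except KeyError:
        raise RuntimeError(f"No bundled chromedriver for {system!r}") from None


# An explicit path wins over the bundled default.
print(resolve_chromedriver_path("/opt/drivers/chromedriver"))
print(resolve_chromedriver_path(system="Linux"))
```

Raising on an unknown operating system, rather than guessing a default, makes a missing driver visible before Selenium starts.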
Open Terminal / cmd and do the following:
python -m venv <envname>
- On Mac / Linux:
source <envname>/bin/activate
- On Windows:
<envname>\Scripts\activate
pip install -r requirement.txt
We include our crawled data, but if you want to crawl the newest data, do the following:
Open crawl.py and run it.
Disclaimer: Some crawlers might stop working when a website is updated or its structure changes.
If you want to retrain the model, open the notebook and choose Run All to get the model results.
Distributed under the GNU General Public License v3.0. See LICENSE.txt for more information.
Project Link: Used cars prediction