Update: This model is somewhat out of date, but the exploration is still valuable. Going forward, I hope to develop a tool that forecasts pollution across Ulaanbaatar (UB). If you are interested, please let me know.
This project aims to predict PM2.5 levels in Ulaanbaatar, the capital of Mongolia. Ulaanbaatar is the coldest capital city on the planet and also has some of the worst air pollution. Its location in a valley and its lack of infrastructure mean that the majority of the population burns raw coal for heating and cooking during the long, severe winter.
Several machine learning models were tested, and a random forest regression model was selected. A full write-up of the project is available on Medium in four parts:
- Part 1, Introduction to the problem and some solutions
- Part 2, Exploring the data
- Part 3, The machine learning model
- Part 4, Deployment
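For readers new to the technique, the random forest regression setup can be sketched with scikit-learn as below. This is a minimal illustration, not the notebook's actual code: the features (`temperature`, `wind_speed`, `hour`), the synthetic target, and the hyperparameters are all hypothetical stand-ins for the real columns in the project's CSV.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features come from the project's CSV.
rng = np.random.default_rng(0)
n = 500
temperature = rng.uniform(-35, 10, n)  # hypothetical feature: air temperature (°C)
wind_speed = rng.uniform(0, 10, n)     # hypothetical feature: wind speed (m/s)
hour = rng.integers(0, 24, n)          # hypothetical feature: hour of day
X = np.column_stack([temperature, wind_speed, hour])

# Made-up target: colder, calmer conditions give higher values, clipped to 0-500.
y = np.clip(200 - 5 * temperature - 15 * wind_speed + rng.normal(0, 20, n), 0, 500)

# Hold out 20% of the rows to measure error on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# An ensemble of decision trees whose predictions are averaged.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Root mean squared error on the held-out set.
pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"test RMSE: {rmse:.1f}")
```

Random forests handle non-linear interactions (for example, cold and windless nights) without feature scaling, which is one plausible reason such a model performs well on weather-driven pollution data.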
The goal of this project is to give the citizens of Ulaanbaatar a tool to protect themselves and their families from air pollution. In testing, the model's RMSE was 28 (on a 0-500 scale), which is accurate enough to predict the AQI category.
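To make the link between RMSE and category prediction concrete, the sketch below maps a 0-500 value to its category, assuming the US EPA AQI bands (the scale reported by US Embassy monitors). The bands are 50 to 200 points wide, which is the reasoning behind treating an RMSE of 28 as sufficient; predictions close to a boundary can still land in a neighbouring band. The helper names `aqi_category` and `category_agreement` are hypothetical, not part of the project.

```python
# US EPA AQI bands: (inclusive upper bound, category name).
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi: float) -> str:
    """Map an AQI value to its US EPA category name."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, name in AQI_CATEGORIES:
        if aqi <= upper:
            return name
    # Values above 500 are off the scale; treat them as Hazardous here.
    return "Hazardous"


def category_agreement(actual, predicted) -> float:
    """Fraction of samples whose predicted AQI falls in the same category as the actual AQI."""
    matches = sum(aqi_category(a) == aqi_category(p) for a, p in zip(actual, predicted))
    return matches / len(actual)


print(aqi_category(175))                                     # Unhealthy
print(category_agreement([42, 120, 250], [60, 130, 240]))   # 2 of 3 match
```

In the usage lines, 42 (Good) versus 60 (Moderate) is the only mismatch: an error of just 18 points crosses the 50 boundary, while the two larger-valued pairs stay in their bands.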
The Jupyter notebook contains the exploratory data analysis, data cleaning, and algorithm testing. It is written in Python 3.6, and Requirements.txt lists all the dependencies needed to run the code. The original data is included as weather-and-aqi-v5.csv, which combines weather data and pollution data covering October 1, 2015 to January 31, 2018.
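One way to combine the two sources is to join them on an hourly timestamp and keep only the project's date range. The pandas sketch below is an illustration under assumptions: the column names (`timestamp`, `temp_c`, `pm25`) and the tiny in-memory tables are hypothetical, and it assumes both sources already use the same time zone, which real NOAA and embassy exports may not.

```python
import pandas as pd

# Hypothetical hourly weather readings (stand-in for the NOAA export).
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-10-01 00:00", "2015-10-01 01:00", "2015-10-01 02:00"]),
    "temp_c": [2.0, 1.5, 0.8],
})

# Hypothetical PM2.5 readings (stand-in for the embassy data); the 01:00 reading is missing.
pm25 = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-10-01 00:00", "2015-10-01 02:00"]),
    "pm25": [35.0, 58.0],
})

# Inner join keeps only hours present in both sources.
# Assumes both timestamp columns are in the same time zone.
combined = weather.merge(pm25, on="timestamp", how="inner")

# Keep October 1, 2015 through January 31, 2018 (end is exclusive at Feb 1).
start, end = pd.Timestamp("2015-10-01"), pd.Timestamp("2018-02-01")
in_range = (combined["timestamp"] >= start) & (combined["timestamp"] < end)
combined = combined[in_range]
print(combined)
```

An inner join silently drops hours where either source has a gap; the notebook's cleaning step may handle missing readings differently (for example, by interpolation).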
Weather data was obtained from NOAA's global hourly data access tool. Pollution data is from the PM2.5 monitoring station at the US Embassy in Ulaanbaatar.
The production machine learning model has been deployed on Microsoft's Azure ML platform and has also been published to the Studio Gallery.
Big thanks to Amarbayan for writing the backend scripts and the database management architecture, and for helping build the front-end site.