The goal of this project is to forecast the incidence of COVID-19 cases using data curated by JHU CSSE. This project is an extension of the UMich Master of Applied Data Science: Health Analytics coursework.
- Inspect correlations between demographic factors and COVID-19 prevalence
- Implement an LSTM model for COVID-19 forecasting and evaluate it on state- and county-level data
- Visualize the performance of the model trained on county-level data
Smoking is the factor most positively correlated with COVID-19 cases, followed by poor health and obesity. Conversely, median household income is negatively correlated with COVID-19 cases.
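A correlation ranking like the one above can be sketched with plain Pearson coefficients. This is a minimal illustration under stated assumptions, not the project's code: the names `pearson` and `rank_factors`, the column names, and the toy numbers are all hypothetical (the numbers are made up and are not project data). With pandas, `DataFrame.corr()` computes Pearson correlations by default and would serve the same purpose.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rank_factors(factors, cases):
    """Return (factor_name, r) pairs sorted from most positive to most negative r."""
    scores = {name: pearson(values, cases) for name, values in factors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, made-up county values purely to show the call; not project data.
cases = [120.0, 95.0, 150.0, 80.0]
factors = {
    "pct_smokers": [22.0, 18.0, 25.0, 15.0],
    "median_income": [48000.0, 55000.0, 42000.0, 61000.0],
}
for name, r in rank_factors(factors, cases):
    print(f"{name}: {r:+.2f}")
```

Sorting by the signed coefficient puts positively correlated factors first and negatively correlated ones last, which matches the way the findings above are reported.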
We can visualize the COVID-19 case rate against the proportion of smokers:
We implement elements of the model described in the paper "A spatiotemporal machine learning approach to forecasting COVID-19 incidence at the county level in the United States."
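The model code itself is not shown here. As a rough sketch of what an LSTM forecaster computes, the block below turns a daily series into fixed-length lookback windows and runs one untrained LSTM layer over a window in NumPy. This is not the project's implementation or the paper's architecture. The names `make_windows` and `lstm_forward`, the lookback of 7, the hidden size of 8, the i, f, g, o gate ordering, and the placeholder series are all hypothetical choices.

```python
import numpy as np


def make_windows(series, lookback, horizon=1):
    """Hypothetical helper: pair each `lookback`-long slice with the value `horizon` steps later."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        X.append(series[start:end])
        y.append(series[end + horizon - 1])
    return np.array(X), np.array(y)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x_seq, W, U, b, hidden_size):
    """Run one LSTM layer over x_seq (shape [T, input_size]) and return the final hidden state.

    Assumed shapes: W [4H, input_size], U [4H, H], b [4H], gates stacked as i, f, g, o.
    """
    H = hidden_size
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell update
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c = f * c + i * g            # new cell state
        h = o * np.tanh(c)           # new hidden state
    return h


# Placeholder series (not real case counts) and random, untrained weights.
rng = np.random.default_rng(0)
series = np.arange(20, dtype=float)
X, y = make_windows(series, lookback=7)
H = 8
W = rng.normal(scale=0.1, size=(4 * H, 1))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h = lstm_forward(X[0].reshape(-1, 1), W, U, b, H)

# A linear head maps the final hidden state to a one-step-ahead forecast for y[0].
w_out = rng.normal(scale=0.1, size=H)
y_hat = float(w_out @ h)
```

In practice the weights would be learned by a framework's LSTM layer and optimizer; the loop only makes the gate arithmetic explicit.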
The prediction interval is widest around November 2020.
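The method used to produce the interval is not described here, so the sketch below only assumes that per-date lower and upper bounds are available. It averages the interval width (upper minus lower) per calendar month and picks the widest month, which is one way to back a statement like the one above. The function name `mean_width_by_month` and the bounds are hypothetical; the numbers are made up and are not the project's results.

```python
from collections import defaultdict
from datetime import date


def mean_width_by_month(dates, lower, upper):
    """Average prediction-interval width (upper - lower) per (year, month)."""
    if not (len(dates) == len(lower) == len(upper)):
        raise ValueError("dates, lower and upper must have equal length")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for d, lo, hi in zip(dates, lower, upper):
        if hi < lo:
            raise ValueError(f"upper bound below lower bound on {d}")
        key = (d.year, d.month)
        totals[key] += hi - lo
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical, made-up bounds purely to show the call.
dates = [date(2020, 10, 1), date(2020, 10, 15), date(2020, 11, 1),
         date(2020, 11, 15), date(2020, 12, 1)]
lower = [100.0, 110.0, 150.0, 170.0, 180.0]
upper = [140.0, 160.0, 260.0, 260.0, 240.0]

widths = mean_width_by_month(dates, lower, upper)
widest_month = max(widths, key=widths.get)
print(widest_month, widths[widest_month])
```

Averaging per month smooths out single-day spikes, so "widest around November" reflects a sustained widening rather than one outlier date.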
Unlike the paper, we did not implement an ensemble, only a single model. Building an ensemble is left for future work, along with a comparison to more traditional time series models such as ARIMA.