Each session in this course uses R code for demonstration. All the content is self-contained within a R software package designed for the course and uses notebooks for each session.
+
If you would like to you can follow the course material without installing any software or running any code. However, if you would like to run the code yourself, you will need to follow the instructions below.
+
+
+
Getting R
+
+
R is used as the main programming language. Please install at least version: R-4.2.0.
+
RStudio is a popular graphic user interface (GUI). Its Visual Editor provides the best experience of going through this course. Please make sure you update RStudio to the latest version.
+
+
+
+
Installing additional requirements
+
Before you get started with the course, you will first need to install the following software.
+
+
Installation of the nfidd package
+
To install the packages needed in the course, including the nfidd package that contains data files and helper functions used throughout, you can use the pak package:
Alternatively, if you are familiar with git you can clone the repo.
+
If you prefer, you can also view each session on the website, and copy-paste the code into your own R script. In that case you don’t need to download the material.
+
+
Tip: if you hover over each code chunk on the website you can use a “Copy” button at the top right corner.
+
+
+
+
Interacting with a local copy of the course material
+
A benefit of downloading or cloning all the material is that you can interact with the session files directly.
+
In this course, all content is written using R Notebooks. This means that we can combine text with code and see the output directly. The notebooks are then directly reproduced on the course website (for example, this page).
+
To interact with each session in the course, we recommend opening the RStudio Project file (ueifid.RProj) in the ueifid folder you have just downloaded. Then you can choose to:
+
+
View each session on the website, and copy-paste the code into your own R script.
+
+
Tip: if you hover over each code chunk on the website you can use a “Copy” button at the top right corner.
+
+
Open the R Notebook for each session.
+
+
Each notebook is saved in ueifid/sessions/ as a .qmd file.
+
Execute code with the single green “play” button at the top-right corner of each code chunk (“Run current chunk”). You can also execute code line-by-line using Ctrl/Cmd + Enter.
+
We suggest using “Visual” view for a better experience (top-left of the RStudio pane).
understanding of forecasting as an epidemiological problem
+
understanding of ways to visualise forecasts
+
+
+
+
Evaluating forecasts
+
+
familiarity with metrics for evaluating probabilistic forecasts and their properties
+
ability to score probabilistic forecasts in R
+
+
+
+
Ensemble models
+
+
understanding of predictive ensembles and their properties
+
ability to create a predictive ensemble of forecasts in R
+
understanding the concept of weighted forecast ensembles
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..4ed394c
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1 @@
+Sitemap: https://nfidd.github.io/ueifid/sitemap.xml
diff --git a/search.json b/search.json
new file mode 100644
index 0000000..863cd22
--- /dev/null
+++ b/search.json
@@ -0,0 +1,656 @@
+[
+ {
+ "objectID": "sessions/forecast-ensembles.html",
+ "href": "sessions/forecast-ensembles.html",
+ "title": "Forecast ensembles",
+ "section": "",
+ "text": "As we saw in the forecast evaluation session, different modelling approaches have different strength and weaknesses, and it is not clear a priori which one produces the best forecast in any given situation. One way to attempt to draw strength from a diversity of approaches is the creation of so-called forecast ensembles from the forecasts produced by different models.\nIn this session, we’ll build ensembles using forecasts from models of different levels of mechanism vs. statistical complexity. We will then compare the performance of these ensembles to the individual models and to each other. Rather than using the forecast samples we have been using we will instead now use quantile-based forecasts.\n\n\n\n\n\n\nRepresentations of probabilistic forecasts\n\n\n\n\n\nProbabilistic predictions can be described as coming from a probabilistic probability distributions. In general and when using complex models such as the one we discuss in this course, these distributions can not be expressed in a simple analytical formal as we can do if, e.g. talking about common probability distributions such as the normal or gamma distributions. Instead, we typically use a limited number of samples generated from Monte-Carlo methods to represent the predictive distribution. However, this is not the only way to characterise distributions.\nA quantile is the value that corresponds to a given quantile level of a distribution. For example, the median is the 50th quantile of a distribution, meaning that 50% of the values in the distribution are less than the median and 50% are greater. Similarly, the 90th quantile is the value that corresponds to 90% of the distribution being less than this value. If we characterise a predictive distribution by its quantiles, we specify these values at a range of specific quantile levels, e.g. from 5% to 95% in 5% steps.\nDeciding how to represent forecasts depends on many things, for example the method used (and whether it produces samples by default) but also logistic considerations. Many collaborative forecasting projects and so-called forecasting hubs use quantile-based representations of forecasts in the hope to be able to characterise both the centre and tails of the distributions more reliably and with less demand on storage space than a sample-based representation.\n\n\n\n\n\n\nIntroduction to ensembles\n\n\n\n\nThe aim of this session is to introduce the concept of ensembles of forecasts and to evaluate the performance of ensembles of the multiple models.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecast-ensembles.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qrensemble for quantile regression averaging in the weighted ensemble section.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"tidyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\nlibrary(\"qrensemble\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#slides",
+ "href": "sessions/forecast-ensembles.html#slides",
+ "title": "Forecast ensembles",
+ "section": "",
+ "text": "Introduction to ensembles",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#objectives",
+ "href": "sessions/forecast-ensembles.html#objectives",
+ "title": "Forecast ensembles",
+ "section": "",
+ "text": "The aim of this session is to introduce the concept of ensembles of forecasts and to evaluate the performance of ensembles of the multiple models.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecast-ensembles.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qrensemble for quantile regression averaging in the weighted ensemble section.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"tidyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\nlibrary(\"qrensemble\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#source-file",
+ "href": "sessions/forecast-ensembles.html#source-file",
+ "title": "Forecast ensembles",
+ "section": "",
+ "text": "The source file of this session is located at sessions/forecast-ensembles.qmd.",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#libraries-used",
+ "href": "sessions/forecast-ensembles.html#libraries-used",
+ "title": "Forecast ensembles",
+ "section": "",
+ "text": "In this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qrensemble for quantile regression averaging in the weighted ensemble section.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"tidyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\nlibrary(\"qrensemble\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#initialisation",
+ "href": "sessions/forecast-ensembles.html#initialisation",
+ "title": "Forecast ensembles",
+ "section": "",
+ "text": "We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#construction",
+ "href": "sessions/forecast-ensembles.html#construction",
+ "title": "Forecast ensembles",
+ "section": "Construction",
+ "text": "Construction\nWe can calculate the mean ensemble by taking the mean of the forecasts at each quantile level.\n\nmean_ensemble <- quantile_forecasts |>\n as_tibble() |>\n summarise(\n predicted = mean(predicted),\n observed = unique(observed),\n model = \"Mean ensemble\",\n .by = c(target_day, horizon, quantile_level, day)\n )\n\nSimilarly, we can calculate the median ensemble by taking the median of the forecasts at each quantile level.\n\nmedian_ensemble <- quantile_forecasts |>\n as_tibble() |>\n summarise(\n predicted = median(predicted),\n observed = unique(observed),\n model = \"Median ensemble\",\n .by = c(target_day, horizon, quantile_level, day)\n )\n\nWe combine the ensembles into a single data frame along with the individual forecasts in order to make visualisation easier.\n\nsimple_ensembles <- bind_rows(\n mean_ensemble,\n median_ensemble,\n quantile_forecasts\n)",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#visualisation",
+ "href": "sessions/forecast-ensembles.html#visualisation",
+ "title": "Forecast ensembles",
+ "section": "Visualisation",
+ "text": "Visualisation\nHow do these ensembles visually differ from the individual models? Lets start by plotting a single forecast from each model and comparing them.\n\nplot_ensembles <- function(data, obs_data) {\n data |>\n pivot_wider(names_from = quantile_level, values_from = predicted) |>\n ggplot(aes(x = day)) +\n geom_ribbon(\n aes(\n ymin = .data[[\"0.05\"]], ymax = .data[[\"0.95\"]], fill = model,\n group = target_day\n ),\n alpha = 0.2\n ) +\n geom_ribbon(\n aes(\n ymin = .data[[\"0.25\"]], ymax = .data[[\"0.75\"]], fill = model,\n group = target_day\n ),\n alpha = 0.5\n ) +\n geom_point(\n data = obs_data,\n aes(x = day, y = onsets), color = \"black\"\n ) +\n scale_color_binned(type = \"viridis\") +\n facet_wrap(~model) +\n theme(legend.position = \"none\")\n}\n\nplot_single_forecasts <- simple_ensembles |>\n filter(target_day == 57) |>\n plot_ensembles(onset_df |> filter(day >= 57, day <= 57 + 14))\n\nplot_single_forecasts\n\n\n\n\n\n\n\n\nAgain we can get a different perspective by plotting the forecasts on the log scale.\n\nplot_single_forecasts +\n scale_y_log10()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do these ensembles compare to the individual models? How do they differ from each other?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do these ensembles compare to the individual models?\n\nBoth of the simple ensembles appear to be less variable than the statistical models but have more variable than the mechanistic model.\nBoth ensembles are more like the statistical models than the mechanistic model.\n\nHow do they differ from each other?\n\nThe mean ensemble has slightly tighter uncertainty bounds than the median ensemble.\n\n\n\n\nNow lets plot a range of forecasts from each model and ensemble.\n\nplot_multiple_forecasts <- simple_ensembles |>\n plot_ensembles(onset_df |> filter(day >= 21)) +\n lims(y = c(0, 400))\n\nplot_multiple_forecasts\n\n\n\n\n\n\n\n\nAgain we can get a different perspective by plotting the forecasts on the log scale.\n\nplot_multiple_forecasts +\n scale_y_log10()\n\nScale for y is already present.\nAdding another scale for y, which will replace the existing scale.\n\n\nWarning in scale_y_log10(): log-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do these ensembles compare to the individual models?\nHow do they differ from each other?\nAre there any differences across forecast dates?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do these ensembles compare to the individual models?\n\nAs before, the ensembles appear to be less variable than the statistical models but more variable than the mechanistic model.\n\nHow do they differ from each other?\n\nThe mean ensemble has marginally tighter uncertainty bounds than the median ensemble as for the single forecast.\n\nAre there any differences across forecast dates?\n\nThe mean ensemble appears to be more variable across forecast dates than the median ensemble with this being more pronounced after the peak of the outbreak.",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#evaluation",
+ "href": "sessions/forecast-ensembles.html#evaluation",
+ "title": "Forecast ensembles",
+ "section": "Evaluation",
+ "text": "Evaluation\nAs in the forecast evaluation session, we can evaluate the accuracy of the ensembles using the {scoringutils} package and in particular the score() function.\n\nensemble_scores <- simple_ensembles |>\n as_forecast_quantile(forecast_unit = c(\"target_day\", \"horizon\", \"model\")) |>\n score()\n\nAgain we start with a high level overview of the scores by model.\n\nensemble_scores |>\n summarise_scores(by = c(\"model\"))\n\n model wis overprediction underprediction dispersion\n <char> <num> <num> <num> <num>\n1: Mean ensemble 8.250097 4.271101 0.9264881 3.052507\n2: Median ensemble 10.197812 5.954107 0.9897321 3.253973\n3: Random walk 12.086955 7.568929 0.7236607 3.794366\n4: More statistical 10.717272 5.710446 1.3504464 3.656379\n5: More mechanistic 5.453563 1.614821 2.1319643 1.706777\n bias interval_coverage_50 interval_coverage_90 ae_median\n <num> <num> <num> <num>\n1: 0.200000000 0.5133929 0.8750000 12.498512\n2: 0.158928571 0.5267857 0.8794643 15.484375\n3: 0.227678571 0.4955357 0.8616071 18.209821\n4: -0.004464286 0.5357143 0.8616071 15.959821\n5: 0.214285714 0.4732143 0.7857143 8.473214\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think the scores are telling you? Which model do you think is best? What other scoring breakdowns might you want to look at?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nWhat do you think the scores are telling you? Which model do you think is best?\n\nThe mean ensemble appears to be the best performing ensemble model overall.\nHowever, the more mechanistic model appears to be the best performing model overall.\n\nWhat other scoring breakdowns might you want to look at?\n\nThere might be variation over forecast dates or horizons between the different ensemble methods",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#construction-1",
+ "href": "sessions/forecast-ensembles.html#construction-1",
+ "title": "Forecast ensembles",
+ "section": "Construction",
+ "text": "Construction\nAs we just saw, the random walk model (our original baseline model) is performing poorly in comparison to the other models. We can remove this model from the ensemble and see if this improves the performance of the ensemble.\n\n\n\n\n\n\nWarning\n\n\n\nHere we are technically cheating a little as we are using the test data to help select the models to include in the ensemble. In the real world you would not do this as you would not have access to the test data and so this is an idealised scenario.\n\n\n\nfiltered_models <- quantile_forecasts |>\n filter(model != \"Random walk\")\n\nWe then need to recalculate the ensembles. First the mean ensemble,\n\nfiltered_mean_ensembles <- filtered_models |>\n as_tibble() |>\n summarise(\n predicted = mean(predicted),\n observed = unique(observed),\n model = \"Mean filtered ensemble\",\n .by = c(target_day, horizon, quantile_level, day)\n )\n\nand then the median ensemble.\n\nfiltered_median_ensembles <- filtered_models |>\n as_tibble() |>\n summarise(\n predicted = median(predicted),\n observed = unique(observed),\n model = \"Median filtered ensemble\",\n .by = c(target_day, horizon, quantile_level, day)\n )\n\nWe combine these new ensembles with our previous ensembles in order to make visualisation easier.\n\nfiltered_ensembles <- bind_rows(\n filtered_mean_ensembles,\n filtered_median_ensembles,\n simple_ensembles\n)",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#visualisation-1",
+ "href": "sessions/forecast-ensembles.html#visualisation-1",
+ "title": "Forecast ensembles",
+ "section": "Visualisation",
+ "text": "Visualisation\nAs for the simple ensembles, we can plot a single forecast from each model and ensemble.\n\nfiltered_ensembles |>\n filter(target_day == 57) |>\n plot_ensembles(onset_df |> filter(day >= 57, day <= 57 + 14))\n\n\n\n\n\n\n\n\nand on the log scale.\n\nfiltered_ensembles |>\n filter(target_day == 57) |>\n plot_ensembles(onset_df |> filter(day >= 57, day <= 57 + 14)) +\n scale_y_log10()\n\n\n\n\n\n\n\n\nTo get an overview we also plot a range of forecasts from each model and ensemble.\n\nplot_multiple_filtered_forecasts <- filtered_ensembles |>\n plot_ensembles(onset_df |> filter(day >= 21)) +\n lims(y = c(0, 400))\nplot_multiple_filtered_forecasts\n\n\n\n\n\n\n\n\nand on the log scale.\n\nplot_multiple_filtered_forecasts +\n scale_y_log10()\n\nScale for y is already present.\nAdding another scale for y, which will replace the existing scale.\n\n\nWarning in scale_y_log10(): log-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do the filtered ensembles compare to the simple ensembles? Which do you think is best?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do the filtered ensembles compare to the simple ensembles?\n\nThe filtered ensembles appear to be less variable than the simple ensembles.\nThe filtered ensembles appear to be more like the mechanistic model than the simple ensembles.\n\nWhich do you think is best?\n\nVisually, the filtered ensembles appear very similar. This makes sense given we know there are only two models left in the ensemble.",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#evaluation-1",
+ "href": "sessions/forecast-ensembles.html#evaluation-1",
+ "title": "Forecast ensembles",
+ "section": "Evaluation",
+ "text": "Evaluation\nLet us score the filtered ensembles.\n\nfiltered_ensemble_scores <- filtered_ensembles |>\n as_forecast_quantile(\n forecast_unit = c(\n \"target_day\", \"horizon\", \"model\"\n )\n ) |>\n score()\n\nAgain we can get a high level overview of the scores by model.\n\nfiltered_ensemble_scores |>\n summarise_scores(by = c(\"model\"))\n\n model wis overprediction underprediction dispersion\n <char> <num> <num> <num> <num>\n1: Mean filtered ensemble 6.754926 2.891027 1.1823214 2.681578\n2: Median filtered ensemble 6.754926 2.891027 1.1823214 2.681578\n3: Mean ensemble 8.250097 4.271101 0.9264881 3.052507\n4: Median ensemble 10.197812 5.954107 0.9897321 3.253973\n5: Random walk 12.086955 7.568929 0.7236607 3.794366\n6: More statistical 10.717272 5.710446 1.3504464 3.656379\n7: More mechanistic 5.453563 1.614821 2.1319643 1.706777\n bias interval_coverage_50 interval_coverage_90 ae_median\n <num> <num> <num> <num>\n1: 0.152232143 0.5312500 0.8794643 10.337054\n2: 0.152232143 0.5312500 0.8794643 10.337054\n3: 0.200000000 0.5133929 0.8750000 12.498512\n4: 0.158928571 0.5267857 0.8794643 15.484375\n5: 0.227678571 0.4955357 0.8616071 18.209821\n6: -0.004464286 0.5357143 0.8616071 15.959821\n7: 0.214285714 0.4732143 0.7857143 8.473214\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do the filtered ensembles compare to the simple ensembles?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do the filtered ensembles compare to the simple ensembles?\n\nThe filtered ensembles appear to be more accurate than the simple ensembles.\nAs you would expect they are an average of the more mechanistic model and the more statistical model.\nAs there are only two models in the ensemble, the median and mean ensembles are identical.\nFor the first time there are features of the ensemble that outperform the more mechanistic model though it remains the best performing model overall.",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#construction-2",
+ "href": "sessions/forecast-ensembles.html#construction-2",
+ "title": "Forecast ensembles",
+ "section": "Construction",
+ "text": "Construction\n\nInverse WIS weighting\nIn order to perform inverse WIS weighting we first need to calculate the WIS for each model. We already have this from the previous evaluation so we can reuse this.\n\nweights_per_model <- ensemble_scores |>\n dplyr::filter(\n model %in% c(\"More mechanistic\", \"More statistical\", \"Random walk\")\n ) |>\n summarise_scores(by = c(\"model\", \"target_day\")) |>\n select(model, target_day, wis) |>\n mutate(inv_wis = 1 / wis) |>\n mutate(\n inv_wis_total_by_date = sum(inv_wis), .by = target_day\n ) |>\n mutate(weights = inv_wis / inv_wis_total_by_date)\n\nweights_per_model |>\n select(model, target_day, weights) |>\n pivot_wider(names_from = model, values_from = weights)\n\n# A tibble: 16 × 4\n target_day `Random walk` `More statistical` `More mechanistic`\n <dbl> <dbl> <dbl> <dbl>\n 1 22 0.323 0.379 0.298\n 2 29 0.408 0.346 0.247\n 3 36 0.378 0.287 0.335\n 4 43 0.324 0.267 0.409\n 5 50 0.297 0.226 0.477\n 6 57 0.290 0.248 0.462\n 7 64 0.243 0.264 0.494\n 8 71 0.418 0.326 0.256\n 9 78 0.233 0.334 0.432\n10 85 0.0652 0.0748 0.860\n11 92 0.215 0.300 0.484\n12 99 0.205 0.304 0.490\n13 106 0.151 0.177 0.672\n14 113 0.324 0.470 0.206\n15 120 0.440 0.387 0.173\n16 127 0.319 0.260 0.421\n\n\nNow lets apply the weights to the forecast models. As we can only use information that was available at the time of the forecast to perform the weighting, we use weights from two weeks prior to the forecast date to inform each ensemble.\n\ninverse_wis_ensemble <- quantile_forecasts |>\n as_tibble() |>\n left_join(\n weights_per_model |>\n mutate(target_day = target_day + 14),\n by = c(\"model\", \"target_day\")\n ) |>\n # assign equal weights if no weights are available\n mutate(weights = ifelse(is.na(weights), 1/3, weights)) |>\n summarise(\n predicted = sum(predicted * weights),\n observed = unique(observed),\n model = \"Inverse WIS ensemble\",\n .by = c(target_day, horizon, quantile_level, day)\n )\n\n\n\nQuantile regression averaging\nWe futher to perform quantile regression averaging (QRA) for each forecast date. Again we need to consider how many previous forecasts we wish to use to inform each ensemble forecast. Here we decide to use up to 3 weeks of previous forecasts to inform each QRA ensemble.\n\nforecast_dates <- quantile_forecasts |>\n as_tibble() |>\n pull(target_day) |>\n unique()\n\nqra_by_forecast <- function(\n quantile_forecasts,\n forecast_dates,\n group = c(\"target_end_date\"), \n ...\n) {\n lapply(forecast_dates, \\(x) {\n quantile_forecasts |>\n mutate(target_end_date = x) |>\n dplyr::filter(target_day <= x) |>\n dplyr::filter(target_day >= x - (3 * 7 + 1)) |>\n dplyr::filter(target_day == x | day <= x) |>\n qra(\n group = group,\n target = c(target_day = x),\n ...\n )\n })\n}\n\nqra_ensembles_obj <- qra_by_forecast(\n quantile_forecasts,\n forecast_dates[-1],\n group = c(\"target_end_date\")\n)\n\nqra_weights <- seq_along(qra_ensembles_obj) |>\n lapply(\\(i) attr(qra_ensembles_obj[[i]], \"weights\") |>\n mutate(target_day = forecast_dates[i + 1])\n ) |>\n bind_rows() |>\n dplyr::filter(quantile == 0.5)\n\nqra_ensembles <- qra_ensembles_obj |>\n bind_rows()\n\nInstead of creating a single optimised ensemble and using this for all forecast horizons we might also want to consider a separate optimised QRA ensemble for each forecast horizon, reflecting that models might perform differently depending on how far ahead a forecast is produced. We can do this using qra() with the group argument.\n\nqra_ensembles_by_horizon <- qra_by_forecast(\n quantile_forecasts,\n forecast_dates[-c(1:2)],\n group = c(\"horizon\", \"target_end_date\"),\n model = \"QRA by horizon\"\n)\n\nqra_weights_by_horizon <- seq_along(qra_ensembles_by_horizon) |>\n lapply(\\(i) attr(qra_ensembles_by_horizon[[i]], \"weights\") |>\n mutate(target_day = forecast_dates[i + 2])\n ) |>\n bind_rows()\n\n\n\nCombine ensembles\n\nweighted_ensembles <- bind_rows(\n inverse_wis_ensemble,\n qra_ensembles,\n qra_ensembles_by_horizon,\n filtered_ensembles\n) |>\n # remove the repeated filtered ensemble\n filter(model != \"Mean filtered ensemble\")",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#visualisation-2",
+ "href": "sessions/forecast-ensembles.html#visualisation-2",
+ "title": "Forecast ensembles",
+ "section": "Visualisation",
+ "text": "Visualisation\n\nSingle forecasts\nAgain we start by plotting a single forecast from each model and ensemble.\n\nweighted_ensembles |>\n filter(target_day == 57) |>\n plot_ensembles(onset_df |> filter(day >= 57, day <= 57 + 14))\n\n\n\n\n\n\n\n\nand on the log scale.\n\nweighted_ensembles |>\n filter(target_day == 57) |>\n plot_ensembles(onset_df |> filter(day >= 57, day <= 57 + 14)) +\n scale_y_log10()\n\n\n\n\n\n\n\n\n\n\nMultiple forecasts\nAs before we can plot a range of forecasts from each model and ensemble.\n\nplot_multiple_weighted_forecasts <- weighted_ensembles |>\n plot_ensembles(onset_df |> filter(day >= 21)) +\n lims(y = c(0, 400))\nplot_multiple_weighted_forecasts\n\n\n\n\n\n\n\n\nand on the log scale.\n\nplot_multiple_weighted_forecasts +\n scale_y_log10()\n\nScale for y is already present.\nAdding another scale for y, which will replace the existing scale.\n\n\nWarning in scale_y_log10(): log-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do the weighted ensembles compare to the simple ensembles? Which do you think is best? Are you surprised by the results? Can you think of any reasons that would explain them?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do the weighted ensembles compare to the simple ensembles?\n\n\n\n\n\nModels weights\nNow that we have a weighted ensemble, we can also look at the weights of the individual models. Here we do this for the quantile regression averaging ensemble but we could also do this for the inverse WIS ensemble or any other weighted ensemble (for an unweighted ensemble the weights are all equal).\n\nqra_weights |>\n ggplot(aes(x = target_day, y = weight, fill = model)) +\n geom_col(position = \"stack\") +\n theme(legend.position = \"bottom\")\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do the weights change over time? Are you surprised by the results given what you know about the models performance?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do the weights change over time?\n\nEarly on the more statistical models have higher weights\nGradually the random walk model gains weight and by the end of the forecast horizon it represents the entire ensemble.\nNear the peak the mechanistic model also gains weight.\n\nAre you surprised by the results given what you know about the models performance?\n\nAs the random walk model is performing poorly, you would expect it to have low weights but actually it often doesn’t. This implies that its poor performance is restricted to certain parts of the outbreak.\nEven though the mechanistic model performs really well overall it is only included in the ensemble around the peak. This could be because the training data doesn’t include changes in the epidemic dynamics and so the mechanistic model is not given sufficient weight.",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-ensembles.html#evaluation-2",
+ "href": "sessions/forecast-ensembles.html#evaluation-2",
+ "title": "Forecast ensembles",
+ "section": "Evaluation",
+ "text": "Evaluation\nFor a final evaluation we can look at the scores for each model and ensemble again. We remove the two weeks of forecasts as these do not have a quantile regression average forecasts as these require training data to estimate.\n\nweighted_ensemble_scores <- weighted_ensembles |>\n filter(target_day >= 29) |>\n as_forecast_quantile(forecast_unit = c(\"target_day\", \"horizon\", \"model\")) |>\n score()\n\nAgain we can get a high level overview of the scores by model.\n\nweighted_ensemble_scores |>\n summarise_scores(by = c(\"model\"))\n\n model wis overprediction underprediction\n <char> <num> <num> <num>\n1: Inverse WIS ensemble 8.473266 4.422306 1.1072491\n2: Quantile Regression Average 7.362562 3.040759 1.9483875\n3: QRA by horizon 7.963665 3.409738 2.0659589\n4: Median filtered ensemble 7.022126 2.985429 1.2601905\n5: Mean ensemble 8.612259 4.452825 0.9873016\n6: Median ensemble 10.681557 6.238667 1.0547619\n7: Random walk 12.695476 7.961143 0.7709524\n8: More statistical 11.263910 6.030190 1.4395238\n9: More mechanistic 5.603486 1.571524 2.2731429\n dispersion bias interval_coverage_50 interval_coverage_90 ae_median\n <num> <num> <num> <num> <num>\n1: 2.943712 0.16619048 0.5095238 0.8190476 12.681390\n2: 2.373415 0.07190476 0.5047619 0.7952381 11.193606\n3: 2.487968 0.13367347 0.4897959 0.7755102 12.005322\n4: 2.776507 0.11476190 0.5523810 0.8761905 10.728571\n5: 3.172132 0.16571429 0.5333333 0.8714286 13.020635\n6: 3.388129 0.12238095 0.5476190 0.8714286 16.178571\n7: 3.963381 0.19571429 0.5142857 0.8523810 19.085714\n8: 3.794195 -0.04238095 0.5333333 0.8523810 16.780952\n9: 1.758819 0.17809524 0.4952381 0.7857143 8.685714\n\n\nRemembering the session on forecast evaluation, we should also check performance on the log scale.\n\nlog_ensemble_scores <- weighted_ensembles |>\n filter(target_day >= 29) |>\n as_forecast_quantile(forecast_unit = c(\"target_day\", \"horizon\", \"model\")) |>\n transform_forecasts(\n fun = log_shift,\n offset = 1,\n append = FALSE\n ) |>\n score()\n\nlog_ensemble_scores |>\n summarise_scores(by = c(\"model\"))\n\n model wis overprediction underprediction\n <char> <num> <num> <num>\n1: Inverse WIS ensemble 0.1738246 0.08278544 0.03090237\n2: Quantile Regression Average 0.1770130 0.08663305 0.03016151\n3: QRA by horizon 0.1657445 0.09253210 0.02377453\n4: Median filtered ensemble 0.1549876 0.07138677 0.02566016\n5: Mean ensemble 0.1689778 0.07928707 0.02702292\n6: Median ensemble 0.1889663 0.08160408 0.03772979\n7: Random walk 0.2046647 0.09644129 0.03538436\n8: More statistical 0.2160592 0.06803280 0.06550430\n9: More mechanistic 0.1598201 0.09859516 0.02048076\n dispersion bias interval_coverage_50 interval_coverage_90 ae_median\n <num> <num> <num> <num> <num>\n1: 0.06013681 0.16619048 0.5095238 0.8190476 0.2634627\n2: 0.06021847 0.07190476 0.5047619 0.7952381 0.2597085\n3: 0.04943783 0.13826531 0.4948980 0.7755102 0.2405630\n4: 0.05794062 0.11476190 0.5523810 0.8761905 0.2364991\n5: 0.06266781 0.16571429 0.5333333 0.8714286 0.2565306\n6: 0.06963244 0.12238095 0.5476190 0.8714286 0.2876486\n7: 0.07283902 0.19571429 0.5142857 0.8523810 0.3108524\n8: 0.08252213 -0.04238095 0.5333333 0.8523810 0.3193459\n9: 0.04074423 0.17809524 0.4952381 0.7857143 0.2396753\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nHow do the weighted ensembles compare to the simple ensembles on the natural and log scale?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nThe best ensembles slightly outperform some of the simple ensembles but there is no obvious benefit from using weighted ensembles. Why might this be the case?",
+ "crumbs": [
+ "Forecast ensembles"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html",
+ "href": "sessions/forecast-evaluation.html",
+ "title": "Forecast evaluation",
+ "section": "",
+ "text": "So far we have focused on visualising forecasts, including confronting them with events that were observed after the forecast was made. Besides visualising the forecasts, we can also summarise performance quantitatively. In this session you will get to know several ways of assessing different aspects of forecast performance.\n\n\n\nForecast evaluation\n\n\n\n\nThe aim of this session is to introduce the concept of forecast evaluation using scoring rules.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecast-evaluation.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and package for data wrangling, ggplot2 library for plotting and the scoringutils package for evaluating forecasts.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#slides",
+ "href": "sessions/forecast-evaluation.html#slides",
+ "title": "Forecast evaluation",
+ "section": "",
+ "text": "Forecast evaluation",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#objectives",
+ "href": "sessions/forecast-evaluation.html#objectives",
+ "title": "Forecast evaluation",
+ "section": "",
+ "text": "The aim of this session is to introduce the concept of forecast evaluation using scoring rules.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecast-evaluation.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and package for data wrangling, ggplot2 library for plotting and the scoringutils package for evaluating forecasts.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#source-file",
+ "href": "sessions/forecast-evaluation.html#source-file",
+ "title": "Forecast evaluation",
+ "section": "",
+ "text": "The source file of this session is located at sessions/forecast-evaluation.qmd.",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#libraries-used",
+ "href": "sessions/forecast-evaluation.html#libraries-used",
+ "title": "Forecast evaluation",
+ "section": "",
+ "text": "In this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and package for data wrangling, ggplot2 library for plotting and the scoringutils package for evaluating forecasts.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#initialisation",
+ "href": "sessions/forecast-evaluation.html#initialisation",
+ "title": "Forecast evaluation",
+ "section": "",
+ "text": "We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#scoring-the-forecasts",
+ "href": "sessions/forecast-evaluation.html#scoring-the-forecasts",
+ "title": "Forecast evaluation",
+ "section": "Scoring the forecasts",
+ "text": "Scoring the forecasts\nWe now summarise performance quantitatively by using scoring metrics. Whilst some of these metrics are more useful for comparing models, many can be also be useful for understanding the performance of a single model.\n\n\n\n\n\n\nTip\n\n\n\nIn this session, we’ll use “proper” scoring rules: these are scoring rules that make sure no model can get better scores than the true model, i.e. the model used to generate the data. Of course we usually don’t know this (as we don’t know the “true model” for real-world data) but proper scoring rules incentivise forecasters to make their best attempt at reproducing its behaviour. For a comprehensive text on proper scoring rules and their mathematical properties, we recommend the classic paper by Gneiting and Raftery (2007).\n\n\nWe will use the {scoringutils} package to calculate these metrics. Our first step is to convert our forecasts into a format that the {scoringutils} package can use. We will use as_forecast_sample() to do this:\n\nsc_forecasts <- rw_forecasts |>\n left_join(onset_df, by = \"day\") |>\n filter(!is.na(.value)) |>\n as_forecast_sample(\n forecast_unit = c(\n \"target_day\", \"horizon\", \"model\"\n ),\n observed = \"onsets\",\n predicted = \".value\",\n sample_id = \".draw\"\n )\nsc_forecasts\n\nForecast type: sample\n\n\nForecast unit:\n\n\ntarget_day, horizon, and model\n\n\n\n sample_id predicted observed target_day horizon model\n <int> <num> <int> <num> <int> <char>\n 1: 1 4 4 22 1 Random walk\n 2: 2 2 4 22 1 Random walk\n 3: 3 2 4 22 1 Random walk\n 4: 4 6 4 22 1 Random walk\n 5: 5 2 4 22 1 Random walk\n --- \n223996: 996 1 2 127 14 Random walk\n223997: 997 0 2 127 14 Random walk\n223998: 998 0 2 127 14 Random walk\n223999: 999 1 2 127 14 Random walk\n224000: 1000 0 2 127 14 Random walk\n\n\nAs you can see this has created a forecast object which has a print method that summarises the forecasts.\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nWhat important information is in the forecast object?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nThe forecast unit which is the target day, horizon, and model\nThe type of forecast which is a sample forecast\n\n\n\n\nEverything seems to be in order. We can now use the scoringutils package to calculate some metrics. We will use the default sample metrics (as our forecasts are in sample format) and score our forecasts.\n\nsc_scores <- sc_forecasts |>\n score()\n\nsc_scores\n\n target_day horizon model bias dss crps overprediction\n <num> <int> <char> <num> <num> <num> <num>\n 1: 22 1 Random walk -0.304 1.587880 0.623424 0.000\n 2: 22 2 Random walk 0.828 3.215491 1.664522 1.118\n 3: 22 3 Random walk 0.002 1.936078 0.574624 0.000\n 4: 22 4 Random walk 0.687 3.231981 1.570469 0.862\n 5: 22 5 Random walk 0.767 3.750147 2.172579 1.386\n --- \n220: 127 10 Random walk -0.821 3.120249 1.745842 0.000\n221: 127 11 Random walk -0.867 3.853982 1.972866 0.000\n222: 127 12 Random walk -0.949 7.803330 3.092738 0.000\n223: 127 13 Random walk -0.914 5.003912 2.318689 0.000\n224: 127 14 Random walk -0.637 1.140452 0.774974 0.000\n underprediction dispersion log_score mad ae_median se_mean\n <num> <num> <num> <num> <num> <num>\n 1: 0.154 0.469424 1.887790 1.4826 1 0.367236\n 2: 0.000 0.546522 2.231108 2.9652 3 8.479744\n 3: 0.000 0.574624 1.862234 2.9652 0 0.142884\n 4: 0.000 0.708469 2.175798 2.9652 3 9.284209\n 5: 0.000 0.786579 2.489797 2.9652 3 15.327225\n --- \n220: 1.400 0.345842 2.694333 1.4826 3 5.597956\n221: 1.684 0.288866 2.871442 1.4826 3 6.687396\n222: 2.832 0.260738 4.339301 1.4826 4 13.883076\n223: 2.062 0.256689 3.239324 1.4826 3 8.265625\n224: 0.496 0.278974 1.814469 1.4826 1 0.948676\n\n\n\n\n\n\n\n\nLearning more about the output of score()\n\n\n\n\n\nSee the documentation for get_metrics.forecast_sample()[https://epiforecasts.io/scoringutils/reference/get_metrics.forecast_sample.html] for information on the default metrics for forecasts that are represented as samples (in our case the samples generated by the stan model).\n\n\n\n\nAt a glance\nBefore we look in detail at the scores, we can use summarise_scores to get a quick overview of the scores. Don’t worry if you don’t understand all the scores yet, we will go some of them in more detail in the next section and you can find more information in the {scoringutils} documentation.\n\nsc_scores |>\n summarise_scores(by = \"model\")\n\n model bias dss crps overprediction underprediction\n <char> <num> <num> <num> <num> <num>\n1: Random walk 0.2147143 6.366294 13.50026 8.38367 0.7640446\n dispersion log_score mad ae_median se_mean\n <num> <num> <num> <num> <num>\n1: 4.352542 3.973451 18.2512 18.20982 1567.78\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nBefore we look in detail at the scores, what do you think the scores are telling you?\n\n\n\n\nContinuous ranked probability score\n\nWhat is the Continuous Ranked Probability Score (CRPS)?\nThe Continuous Ranked Probability Score (CRPS) is a proper scoring rule used to evaluate the accuracy of probabilistic forecasts. It is a generalization of the Mean Absolute Error (MAE) to probabilistic forecasts, where the forecast is a distribution rather than a single point estimate (i.e. like ours).\nThe CRPS can be thought about as the combination of two key aspects of forecasting: 1. The accuracy of the forecast in terms of how close the predicted values are to the observed value. 2. The confidence of the forecast in terms of the spread of the predicted values.\nBy balancing these two aspects, the CRPS provides a comprehensive measure of the quality of probabilistic forecasts.\n\n\n\n\n\n\nKey things to note about the CRPS\n\n\n\n\nSmall values are better\nAs it is an absolute scoring rule it can be difficult to use to compare forecasts across scales.\n\n\n\n\n\n\n\n\n\nMathematical Definition (optional)\n\n\n\n\n\nFor distributions with a finite first moment (a mean exists and it is finite), the CRPS can be expressed as:\n\\[\nCRPS(D, y) = \\mathbb{E}_{X \\sim D}[|X - y|] - \\frac{1}{2} \\mathbb{E}_{X, X' \\sim D}[|X - X'|]\n\\]\nwhere \\(X\\) and \\(X'\\) are independent random variables sampled from the distribution \\(D\\). To calculate this we simply replace \\(X\\) and \\(X'\\) by samples from our posterior distribution and sum over all possible combinations.\nThis equation can be broke down into the two components:\n\nBreakdown of the Components\n\nExpected Absolute Error Between Forecast and Observation: \\(\\mathbb{E}_{X \\sim D}[|X - y|]\\) This term represents the average absolute difference between the values predicted by the forecasted distribution \\(D\\) and the actual observed value \\(y\\). It measures how far, on average, the forecasted values are from the observed value. A smaller value indicates that the forecasted distribution is closer to the observed value.\nExpected Absolute Error Between Two Forecasted Values: \\(\\frac{1}{2} \\mathbb{E}_{X, X' \\sim D}[|X - X'|]\\) This term represents the average absolute difference between two independent samples from the forecasted distribution \\(D\\). It measures the internal variability or spread of the forecasted distribution. A larger value indicates a wider spread of the forecasted values.\n\n\n\nInterpretation\n\nFirst Term (\\(\\mathbb{E}_{X \\sim D}[|X - y|]\\)): This term penalizes the forecast based on how far the predicted values are from the observed value. It ensures that the forecast is accurate in terms of proximity to the actual observation.\nSecond Term (\\(\\frac{1}{2} \\mathbb{E}_{X, X' \\sim D}[|X - X'|]\\)): This term accounts for the spread of the forecasted distribution. It penalizes forecasts that are too uncertain or have a wide spread. By subtracting this term, the CRPS rewards forecasts that are not only accurate but also confident (i.e., have a narrow spread).\n\n\n\n\n\nWhilst the CRPS is a very useful metric it can be difficult to interpret in isolation. It is often useful to compare the CRPS of different models or to compare the CRPS of the same model under different conditions. For example, lets compare the CRPS across different forecast horizons.\n\nsc_scores |>\n summarise_scores(by = \"horizon\") |>\n ggplot(aes(x = horizon, y = crps)) +\n geom_point() +\n labs(title = \"CRPS by daily forecast horizon\", \n subtitle = \"Summarised across all forecasts\")\n\n\n\n\n\n\n\n\nand at different time points.\n\nsc_scores |>\n summarise_scores(by = \"target_day\") |>\n ggplot(aes(x = target_day, y = crps)) +\n geom_point() +\n labs(title = \"CRPS by forecast start date\", \n subtitle = \"Summarised across all forecasts\", x = \"forecast date\")\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nHow do the CRPS scores change based on forecast date? How do the CRPS scores change with forecast horizon? What does this tell you about the model?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nThe CRPS scores increase for forecast dates where incidence is higher.\nThe CRPS scores increase with forecast horizon.\nAs the CRPS is an absolute measure it is hard to immediately know if the CRPS increasing with forecast date indicates that the model is performing worse.\nHowever, the CRPS increasing with forecast horizon is a sign that the model is struggling to capture the longer term dynamics of the epidemic.\n\n\n\n\n\n\n\nPIT histograms\nAs well as the CRPS we can also look at the calibration and bias of the model. Calibration is the agreement between the forecast probabilities and the observed frequencies. Bias is a measure of how likely the model is to over or under predict the observed values.\nThere are many ways to assess calibration and bias but one common way is to use a probability integral transform (PIT) histogram. This is a histogram of the cumulative distribution of function of a forecast evaluated at the observed value.\n\n\n\n\n\n\nInterpreting the PIT histogram\n\n\n\n\nIdeally PIT histograms should be uniform.\nIf is a U shape then the model is overconfident and if it is an inverted U shape then the model is underconfident.\nIf it is skewed then the model is biased towards the direction of the skew.\n\n\n\n\n\n\n\n\n\nMathematical Definition (optional)\n\n\n\n\n\nContinuous Case\nFor a continuous random variable \\(X\\) with cumulative distribution function (CDF) \\(F_X\\), the PIT is defined as:\n\\[\nY = F_X(X)\n\\]\nwhere \\(Y\\) is uniformly distributed on \\([0, 1]\\).\n\nInteger Case\nWhen dealing with integer forecasts, the standard PIT does not yield a uniform distribution even if the forecasts are perfectly calibrated. To address this, a randomized version of the PIT can be used (Czado, Gneiting, and Held 2009). For an integer-valued random variable \\(X\\) with CDF \\(F_X\\), the randomized PIT is defined as:\n\\[\nU = F_X(k) + v \\cdot (F_X(k) - F_X(k-1))\n\\]\nwhere:\n\n\\(k\\) is the observed integer value.\n\\(F_X(k)\\) is the CDF evaluated at \\(k\\).\n\\(v\\) is a random variable uniformly distributed on \\([0, 1]\\).\n\nThis transformation ensures that \\(U\\) is uniformly distributed on \\([0, 1]\\) if the forecasted distribution \\(F_X\\) is correctly specified.\n\n\n\n\nLet’s first look at the overall PIT histogram.\n\nsc_forecasts |>\n get_pit_histogram() |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n labs(title = \"PIT histogram\", x = \"Quantile\", y = \"Density\")\n\n\n\n\n\n\n\n\nAs before lets look at the PIT histogram by forecast horizon. To save space we will group horizons into a few days each:\n\nsc_forecasts |>\n mutate(group_horizon = case_when(\n horizon <= 3 ~ \"1-3\",\n horizon <= 7 ~ \"4-7\",\n horizon <= 14 ~ \"8-14\"\n )) |>\n get_pit_histogram(by = \"group_horizon\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() + \n facet_wrap(~group_horizon) +\n labs(title = \"PIT by forecast horizon (days)\")\n\n\n\n\n\n\n\n\nand then for different forecast dates.\n\nsc_forecasts |>\n get_pit_histogram(by = \"target_day\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() + \n facet_wrap(~target_day) +\n labs(title = \"PIT by forecast date\")\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think of the PIT histograms? Do they look well calibrated? Do they look biased?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nIt looks like the model is biased towards overpredicting and that this bias gets worse at longer forecast horizons.\nLooking over forecast dates it looks like much of this bias is coming from near the outbreak peak where the model is consistently overpredicting but the model is also over predicting at other times.",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation.html#scoring-on-the-log-scale",
+ "href": "sessions/forecast-evaluation.html#scoring-on-the-log-scale",
+ "title": "Forecast evaluation",
+ "section": "Scoring on the log scale",
+ "text": "Scoring on the log scale\nWe can also score on the logarithmic scale. This can be useful if we are interested in the relative performance of the model at different scales of the data, for example if we are interested in the model’s performance at capturing the exponential growth phase of the epidemic. In some sense scoring in this way can be an approximation of scoring the effective reproduction number estimates. Doing this directly can be difficult as the effective reproduction number is a latent variable and so we cannot directly score it.\nWe again use scoringutils but first transform both the forecasts and observations to the log scale.\n\nlog_sc_forecasts <- sc_forecasts |>\n transform_forecasts(\n fun = log_shift,\n offset = 1,\n append = FALSE\n )\n\nlog_scores <- log_sc_forecasts |>\n score()\n\nFor more on scoring on the log scale see the paper by Bosse et al. (2023).\n\nAt a glance\n\nlog_scores |>\n summarise_scores(by = \"model\")\n\n model bias dss crps overprediction underprediction\n <char> <num> <num> <num> <num> <num>\n1: Random walk 0.1774554 -0.6084371 0.2451112 0.121214 0.03678654\n dispersion log_score mad ae_median se_mean\n <num> <num> <num> <num> <num>\n1: 0.08711065 0.5440927 0.3761217 0.3379461 0.2082002\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nBefore we look in detail at the scores, what do you think the scores are telling you? How do you think they will differ from the scores on the natural scale?\n\n\n\n\nCRPS\n\nlog_scores |>\n summarise_scores(by = \"horizon\") |>\n ggplot(aes(x = horizon, y = crps)) +\n geom_point() +\n labs(title = \"CRPS by daily forecast horizon, scored on the log scale\")\n\n\n\n\n\n\n\n\nand across different forecast dates\n\nlog_scores |>\n summarise_scores(by = \"target_day\") |>\n ggplot(aes(x = target_day, y = crps)) +\n geom_point() +\n labs(title = \"CRPS by forecast date, scored on the log scale\")\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nHow do the CRPS scores change based on forecast date? How do the CRPS scores change with forecast horizon? What does this tell you about the model?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nAs for the natural scale CRPS scores increase with forecast horizon but now the increase appears to be linear vs exponential.\nThere has been a reduction in the CRPS scores for forecast dates near the outbreak peak compared to other forecast dates but this is still the period where the model is performing worst.\n\n\n\n\n\n\nPIT histograms\nLet’s first look at the overall PIT histogram.\n\nlog_sc_forecasts |>\n get_pit_histogram(by = \"model\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n labs(title = \"PIT histogram, scored on the log scale\")\n\n\n\n\n\n\n\n\nAs before lets look at the PIT histogram by forecast horizon\n\nlog_sc_forecasts |>\n mutate(group_horizon = case_when(\n horizon <= 3 ~ \"1-3\",\n horizon <= 7 ~ \"4-7\",\n horizon <= 14 ~ \"8-14\"\n )) |>\n get_pit_histogram(by = \"group_horizon\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_wrap(~group_horizon) +\n labs(title = \"PIT by forecast horizon, scored on the log scale\")\n\n\n\n\n\n\n\n\nand then for different forecast dates.\n\nlog_sc_forecasts |>\n get_pit_histogram(by = \"target_day\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_wrap(~target_day) +\n labs(title = \"PIT by forecast date, scored on the log scale\")\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think of the PIT histograms? Do they look well calibrated? Do they look biased?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nThe overall PIT histograms suggest that the model is less biased to over predict when scored on the log scale than the natural scale, but it is still biased. This makes sense when we think back to the comparison of reproduction number estimates and forecasts we made earlier where the model was consistently over predicting on the reproduction number.\nBy forecast horizon the model is still biased towards over predicting but this bias is less pronounced than on the natural scale.\nTowards the end and beginning of the forecast period the model appears to be well calibrated on the log scale but is biased towards over predicting in the middle of the forecast period.\nThis matches with our knowledge of the underlying reproduction number which were initially constant and then began to decrease only to stabilise towards the end of the outbreak.",
+ "crumbs": [
+ "Forecast evaluation"
+ ]
+ },
+ {
+ "objectID": "index.html",
+ "href": "index.html",
+ "title": "Understanding, evaluating, and improving forecasts of infectious disease burden",
+ "section": "",
+ "text": "This course is a shorter version of a course on Nowcasting and forecasting infectious disease dynamics.\nAll materials here are provided under the permissive MIT License."
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#ensembles",
+ "href": "sessions/slides/introduction-to-ensembles.html#ensembles",
+ "title": "Multi-model ensembles",
+ "section": "Ensembles",
+ "text": "Ensembles\n\nCombine many different models’ forecasts into a single prediction"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#why-ensemble",
+ "href": "sessions/slides/introduction-to-ensembles.html#why-ensemble",
+ "title": "Multi-model ensembles",
+ "section": "Why ensemble?",
+ "text": "Why ensemble?\n\nMany uncertainties, many approaches: many models\n\nLayers of uncertainty\n\nModel parameters\n\ne.g. parameterising delay distributions\n\nModel structure\n\ne.g. more mechanistic or more statistical approaches\n\n\nWant to use all available information"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#why-ensemble-1",
+ "href": "sessions/slides/introduction-to-ensembles.html#why-ensemble-1",
+ "title": "Multi-model ensembles",
+ "section": "Why ensemble?",
+ "text": "Why ensemble?\n“Whole is greater than sum of parts”\n\nAverage of multiple predictions is often more performant than any individual model\n\nHistory in weather & economic forecasting\nSeen this in infectious disease forecasting\n\n“Forecast challenges”: Ebola, dengue, flu, COVID-19…"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#ensemble-methods",
+ "href": "sessions/slides/introduction-to-ensembles.html#ensemble-methods",
+ "title": "Multi-model ensembles",
+ "section": "Ensemble methods",
+ "text": "Ensemble methods\n\n\nSummarising across models to create single (probabilistic) prediction\n\ne.g. average at each models’ probabilistic quantiles\n\nMean\nMedian - trims the outliers, so narrows the uncertainty"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#ensemble-methods-1",
+ "href": "sessions/slides/introduction-to-ensembles.html#ensemble-methods-1",
+ "title": "Multi-model ensembles",
+ "section": "Ensemble methods",
+ "text": "Ensemble methods\n\nEqual or weighted combination\n\nWeight models by past forecast performance\n\ne.g. using forecast scores\n\nRarely better than equal average"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#collaborative-modelling",
+ "href": "sessions/slides/introduction-to-ensembles.html#collaborative-modelling",
+ "title": "Multi-model ensembles",
+ "section": "Collaborative modelling",
+ "text": "Collaborative modelling"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#section",
+ "href": "sessions/slides/introduction-to-ensembles.html#section",
+ "title": "Multi-model ensembles",
+ "section": "",
+ "text": "… European Respicast"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#single-model",
+ "href": "sessions/slides/introduction-to-ensembles.html#single-model",
+ "title": "Multi-model ensembles",
+ "section": "Single model",
+ "text": "Single model"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#multiple-models",
+ "href": "sessions/slides/introduction-to-ensembles.html#multiple-models",
+ "title": "Multi-model ensembles",
+ "section": "… Multiple models",
+ "text": "… Multiple models"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#multi-model-ensemble",
+ "href": "sessions/slides/introduction-to-ensembles.html#multi-model-ensemble",
+ "title": "Multi-model ensembles",
+ "section": "… … Multi-model ensemble",
+ "text": "… … Multi-model ensemble"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-ensembles.html#r-fontawesomefalaptop-code-white-your-turn",
+ "href": "sessions/slides/introduction-to-ensembles.html#r-fontawesomefalaptop-code-white-your-turn",
+ "title": "Multi-model ensembles",
+ "section": "r fontawesome::fa(\"laptop-code\", \"white\") Your Turn",
+ "text": "r fontawesome::fa(\"laptop-code\", \"white\") Your Turn\n\nCreate unweighted and weighted ensembles using forecasts from multiple models.\nEvaluate the forecasts from ensembles compared to their constituent models."
+ },
+ {
+ "objectID": "sessions/slides/forecast-evaluation.html#your-turn",
+ "href": "sessions/slides/forecast-evaluation.html#your-turn",
+ "title": "Forecast evaluation",
+ "section": " Your Turn",
+ "text": "Your Turn\n\nLoad forecasts from the model we have visualised previously.\nEvaluate the forecasts using proper scoring rules"
+ },
+ {
+ "objectID": "sessions/introduction-and-course-overview.html",
+ "href": "sessions/introduction-and-course-overview.html",
+ "title": "Introduction and course overview",
+ "section": "",
+ "text": "Introduction to the course and the instructors\n\nAim of the course\nIn this course we will explore how to:\n\nVisualize and interpret infectious disease forecasts\nEvaluate forecast performance and limitations\nCombine multiple forecasts into ensembles\n\nPredictive modeling has become crucial for public health decision-making and pandemic preparedness. Through this workshop, we aim to make forecast interpretation and evaluation more accessible to academics and public health institutions.\n\n\nApproach\nEach session in the course:\n\nbuilds on the previous one so that participants will have an overview of forecast evaluation by the end of the course;\nstarts with a short introductory talk;\nmainly consists of interactive content that participants will work through;\nhas optional/additional material that can be skipped or completed after the course ends;\n\nFor those attending the in-person version the course also:\n\nhas multiple instructors ready to answer questions about this content; if several people have a similar question we may pause the session and discuss it with the group;\nends with a wrap-up and discussion where we review the sessions material.\n\n\n\nTimeline for the course\nThe course consists of four main sessions spread across an afternoon. If you are attending the in-person version of the course, the schedule is available here. If you are studying this material on your own using the website, you can go through it at your own pace.\nLet’s get started!\n\nHave a more detailed look at the learning objectives\nIf you haven’t already, start with getting set up for the course\nStart the course by learning about Visualising infectious disease forecasts",
+ "crumbs": [
+ "Introduction and course overview"
+ ]
+ },
+ {
+ "objectID": "sessions.html",
+ "href": "sessions.html",
+ "title": "Sessions",
+ "section": "",
+ "text": "13:30-13:35\n\nIntroduction to the course and the instructors (5 mins)"
+ },
+ {
+ "objectID": "sessions.html#session-0-introduction-and-course-overview",
+ "href": "sessions.html#session-0-introduction-and-course-overview",
+ "title": "Sessions",
+ "section": "",
+ "text": "13:30-13:35\n\nIntroduction to the course and the instructors (5 mins)"
+ },
+ {
+ "objectID": "sessions.html#session-1-forecasting",
+ "href": "sessions.html#session-1-forecasting",
+ "title": "Sessions",
+ "section": "Session 1: Forecasting",
+ "text": "Session 1: Forecasting\n13:35-14:10\n\nIntroduction to forecasting as an epidemiological problem (10 mins)\nPractice session: Visualising infectious disease forecasts (25 minutes)"
+ },
+ {
+ "objectID": "sessions.html#session-2-forecast-evaluation",
+ "href": "sessions.html#session-2-forecast-evaluation",
+ "title": "Sessions",
+ "section": "Session 2: Forecast evaluation",
+ "text": "Session 2: Forecast evaluation\n14:10-15:00\n\nAn introduction to forecast evaluation (5 mins)\nPractice session: Evaluating forecasts from a range of models (40 mins)\nWrap up (5 mins)"
+ },
+ {
+ "objectID": "sessions.html#session-3-evaluating-forecasts-from-multiple-models",
+ "href": "sessions.html#session-3-evaluating-forecasts-from-multiple-models",
+ "title": "Sessions",
+ "section": "Session 3: Evaluating forecasts from multiple models",
+ "text": "Session 3: Evaluating forecasts from multiple models\n15:10-15:50\n\nWhy might we want to evaluate forecasts from multiple models? (5 mins)\nPractice session: Evaluating forecasts from multiple models (40 mins)\nWrap up (5 mins)\n\nThere is a coffee break at 15:30-15:40 you are welcome to attend this or keep working through the course material."
+ },
+ {
+ "objectID": "sessions.html#session-4-forecast-ensembles",
+ "href": "sessions.html#session-4-forecast-ensembles",
+ "title": "Sessions",
+ "section": "Session 4: Forecast ensembles",
+ "text": "Session 4: Forecast ensembles\n15:50-17:10\n\nIntroduction to forecast ensembles (10 mins)\nPractice session: Creating forecast ensembles (60 mins)\nWrap up (10 mins)"
+ },
+ {
+ "objectID": "sessions.html#session-5-end-of-course-summary",
+ "href": "sessions.html#session-5-end-of-course-summary",
+ "title": "Sessions",
+ "section": "Session 5: End of course summary",
+ "text": "Session 5: End of course summary\n17:10-17:30\n\nSummary of the course (10 mins)\nFinal discussion and closing (10 mins)"
+ },
+ {
+ "objectID": "help.html",
+ "href": "help.html",
+ "title": "Getting help",
+ "section": "",
+ "text": "For any questions about the course or its content, feel free to use the Discussion board"
+ },
+ {
+ "objectID": "learning_objectives.html",
+ "href": "learning_objectives.html",
+ "title": "Independent learning outcomes",
+ "section": "",
+ "text": "understanding of forecasting as an epidemiological problem\nunderstanding of ways to visualise forecasts"
+ },
+ {
+ "objectID": "learning_objectives.html#forecasting",
+ "href": "learning_objectives.html#forecasting",
+ "title": "Independent learning outcomes",
+ "section": "",
+ "text": "understanding of forecasting as an epidemiological problem\nunderstanding of ways to visualise forecasts"
+ },
+ {
+ "objectID": "learning_objectives.html#evaluating-forecasts",
+ "href": "learning_objectives.html#evaluating-forecasts",
+ "title": "Independent learning outcomes",
+ "section": "Evaluating forecasts",
+ "text": "Evaluating forecasts\n\nfamiliarity with metrics for evaluating probabilistic forecasts and their properties\nability to score probabilistic forecasts in R"
+ },
+ {
+ "objectID": "learning_objectives.html#ensemble-models",
+ "href": "learning_objectives.html#ensemble-models",
+ "title": "Independent learning outcomes",
+ "section": "Ensemble models",
+ "text": "Ensemble models\n\nunderstanding of predictive ensembles and their properties\nability to create a predictive ensemble of forecasts in R\nunderstanding the concept of weighted forecast ensembles"
+ },
+ {
+ "objectID": "sessions/end-of-course-summary-and-discussion.html",
+ "href": "sessions/end-of-course-summary-and-discussion.html",
+ "title": "End of course summary and discussion",
+ "section": "",
+ "text": "We hope that you’ve enjoyed taking this course on understanding and evaluating forecasts of infectious disease burden. We will be very happy to have your feedback, comments or any questions on the course discussion page.\n\n\nEnd of course summary",
+ "crumbs": [
+ "End of course summary and discussion"
+ ]
+ },
+ {
+ "objectID": "sessions/end-of-course-summary-and-discussion.html#slides",
+ "href": "sessions/end-of-course-summary-and-discussion.html#slides",
+ "title": "End of course summary and discussion",
+ "section": "",
+ "text": "End of course summary",
+ "crumbs": [
+ "End of course summary and discussion"
+ ]
+ },
+ {
+ "objectID": "sessions/slides/closing.html#session-1-forecasting",
+ "href": "sessions/slides/closing.html#session-1-forecasting",
+ "title": "End of course summary",
+ "section": "Session 1: Forecasting",
+ "text": "Session 1: Forecasting\n\nforecasting is the task of making unconditional statements about the future\nmeaningful forecasts are probabilistic\nwe can use visualisation understand the predictive performance of a model"
+ },
+ {
+ "objectID": "sessions/slides/closing.html#session-2-forecast-evaluation",
+ "href": "sessions/slides/closing.html#session-2-forecast-evaluation",
+ "title": "End of course summary",
+ "section": "Session 2: Forecast evaluation",
+ "text": "Session 2: Forecast evaluation\n\nwe can assess forecasts using proper scoring rules\nwe can use scoring to quantify the predictive performance of different models"
+ },
+ {
+ "objectID": "sessions/slides/forecasting-as-an-epidemiological-problem.html#your-turn",
+ "href": "sessions/slides/forecasting-as-an-epidemiological-problem.html#your-turn",
+ "title": "Forecasting as an epidemiological problem",
+ "section": " Your Turn",
+ "text": "Your Turn\n\nLoad some forecasts we have generated and visualise them alongside the data.\nExplore different ways of visualising forecasts to assess how good the model performs at forecasting."
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-the-spectrum-of-forecasting-models.html#different-types-of-models",
+ "href": "sessions/slides/introduction-to-the-spectrum-of-forecasting-models.html#different-types-of-models",
+ "title": "Introduction to the spectrum of forecasting models",
+ "section": "Different types of models",
+ "text": "Different types of models\n\n\n\n\nWe can classify models by the level of mechanism they include\nAll of the model types we will introduce in the next few slides have been used for COVID-19 forecasting (the US and/or European COVID-19 forecast hub)\n\nNOTE: level of mechanism \\(\\neq\\) model complexity\n\n\nCramer et al., Scientific Data, 2022"
+ },
+ {
+ "objectID": "sessions/slides/introduction-to-the-spectrum-of-forecasting-models.html#your-turn",
+ "href": "sessions/slides/introduction-to-the-spectrum-of-forecasting-models.html#your-turn",
+ "title": "Introduction to the spectrum of forecasting models",
+ "section": " Your Turn",
+ "text": "Your Turn\n\nLoad forecasts from the model we have visualised previously.\nEvaluate the forecasts using proper scoring rules"
+ },
+ {
+ "objectID": "getting-set-up.html",
+ "href": "getting-set-up.html",
+ "title": "Overview",
+ "section": "",
+ "text": "Each session in this course uses R code for demonstration. All the content is self-contained within a R software package designed for the course and uses notebooks for each session.\nIf you would like to you can follow the course material without installing any software or running any code. However, if you would like to run the code yourself, you will need to follow the instructions below."
+ },
+ {
+ "objectID": "getting-set-up.html#installation-of-the-nfidd-package",
+ "href": "getting-set-up.html#installation-of-the-nfidd-package",
+ "title": "Overview",
+ "section": "Installation of the nfidd package",
+ "text": "Installation of the nfidd package\nTo install the packages needed in the course, including the nfidd package that contains data files and helper functions used throughout, you can use the pak package:\n\ninstall.packages(\"pak\")\noptions(repos = c(\n options(\"repos\"),\n \"stan-dev\" = \"https://stan-dev.r-universe.dev\",\n \"epiforecasts\" = \"https://epiforecasts.r-universe.dev\"\n))\npak::pak(\"nfidd/nfidd\", dependencies = \"all\", upgrade = TRUE)\n\nThen you can check that the installation completed successfully by loading the package into your R session:\n\nlibrary(\"nfidd\")"
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html",
+ "href": "sessions/forecast-visualisation.html",
+ "title": "Visualising infectious disease forecasts",
+ "section": "",
+ "text": "Epidemiological forecasts are probabilistic statements about what might happen to population disease burden in the future. In this session we will make some simple forecasts using a commonly used infectious disease model, based on the renewal equation. We will see how we can visualise such forecasts, and visually compare them to what really happened.\n\n\n\nForecasting as an epidemiological problem\n\n\n\n\nThe aim of this session is to introduce the concept of forecasting and forecast visualisation using a simple model.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecasting-visualisation.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and corresponding forecasts, the dplyr package for data wrangling, and the ggplot2 library for plotting.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"ggplot2\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)\nWe will now look at some data from an infectious disease outbreak and forecasts from a renewal equation model. First, we simulate some data using functions from the nfidd package.\ngen_time_pmf <- make_gen_time_pmf()\nip_pmf <- make_ip_pmf()\nonset_df <- simulate_onsets(\n make_daily_infections(infection_times), gen_time_pmf, ip_pmf\n)\ntail(onset_df)\n\n# A tibble: 6 × 3\n day onsets infections\n <dbl> <int> <int>\n1 137 1 5\n2 138 3 1\n3 139 5 4\n4 140 3 1\n5 141 2 1\n6 142 2 0\nThis uses a data set of infection times that is included in the R package, and a function to turn these into daily number of observed symptom onsets. Do not worry too much about the details of this. The important thing is that we now have a data set onset_df of symptom onset.",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#slides",
+ "href": "sessions/forecast-visualisation.html#slides",
+ "title": "Visualising infectious disease forecasts",
+ "section": "",
+ "text": "Forecasting as an epidemiological problem",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#objectives",
+ "href": "sessions/forecast-visualisation.html#objectives",
+ "title": "Visualising infectious disease forecasts",
+ "section": "",
+ "text": "The aim of this session is to introduce the concept of forecasting and forecast visualisation using a simple model.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecasting-visualisation.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and corresponding forecasts, the dplyr package for data wrangling, and the ggplot2 library for plotting.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"ggplot2\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#source-file",
+ "href": "sessions/forecast-visualisation.html#source-file",
+ "title": "Visualising infectious disease forecasts",
+ "section": "",
+ "text": "The source file of this session is located at sessions/forecasting-visualisation.qmd.",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#libraries-used",
+ "href": "sessions/forecast-visualisation.html#libraries-used",
+ "title": "Visualising infectious disease forecasts",
+ "section": "",
+ "text": "In this session we will use the nfidd package to load a data set of infection times and corresponding forecasts, the dplyr package for data wrangling, and the ggplot2 library for plotting.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"ggplot2\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#initialisation",
+ "href": "sessions/forecast-visualisation.html#initialisation",
+ "title": "Visualising infectious disease forecasts",
+ "section": "",
+ "text": "We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#simulating-a-geometric-random-walk",
+ "href": "sessions/forecast-visualisation.html#simulating-a-geometric-random-walk",
+ "title": "Visualising infectious disease forecasts",
+ "section": "Simulating a geometric random walk",
+ "text": "Simulating a geometric random walk\nYou can have a look at an R function for performing the geometric random walk:\n\ngeometric_random_walk\n\nfunction (init, noise, std) \n{\n n <- length(noise) + 1\n x <- numeric(n)\n x[1] <- init\n for (i in 2:n) {\n x[i] <- x[i - 1] + noise[i - 1] * std\n }\n return(exp(x))\n}\n<bytecode: 0x556748ce16a8>\n<environment: namespace:nfidd>\n\n\nWe can use this function to simulate a random walk:\n\nR <- geometric_random_walk(init = 1, noise = rnorm(100), std = 0.1)\ndata <- tibble(t = seq_along(R), R = exp(R))\n\nggplot(data, aes(x = t, y = R)) +\n geom_line() +\n labs(title = \"Simulated data from a random walk model\",\n x = \"Time\",\n y = \"R\")\n\n\n\n\n\n\n\n\nYou can repeat this multiple times either with the same parameters or changing some to get a feeling for what this does.",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#visualising-the-forecast",
+ "href": "sessions/forecast-visualisation.html#visualising-the-forecast",
+ "title": "Visualising infectious disease forecasts",
+ "section": "Visualising the forecast",
+ "text": "Visualising the forecast\nWe can now visualise a forecast made from the renewal equation model we used for forecasting. Once again, this forecast is provided in the nfidd package which we loaded earlier. We will first extract the forecast and then plot the forecasted number of symptom onsets alongside the observed number of symptom onsets before the forecast was made.\n\ndata(rw_forecasts)\nhorizon <- 14\nforecast_day <- 71\nforecast <- rw_forecasts |>\n ungroup() |>\n filter(target_day == forecast_day)\n\nprevious_onsets <- onset_df |>\n filter(day <= forecast_day)\n\n\n\n\n\n\n\nHow did we generate these forecasts?\n\n\n\n\n\nSome important things to note about these forecasts:\n\nWe used a 14 day forecast horizon.\nEach forecast used all the data up to the forecast date.\nWe generated 1000 predictive posterior samples for each forecast.\nWe started forecasting 3 weeks into the outbreak and then forecast every 7 days until the end of the data (excluding the last 14 days to allow a full forecast).\nWe use the same simulated outbreak data as before:\n\n\ngen_time_pmf <- make_gen_time_pmf()\nip_pmf <- make_ip_pmf()\nonset_df <- simulate_onsets(\n make_daily_infections(infection_times), gen_time_pmf, ip_pmf\n)\nhead(onset_df)\n\n# A tibble: 6 × 3\n day onsets infections\n <dbl> <int> <int>\n1 1 0 0\n2 2 0 1\n3 3 0 0\n4 4 0 2\n5 5 0 1\n6 6 0 1\n\n\n\n\n\n\nforecast |>\n filter(.draw %in% sample(.draw, 100)) |>\n ggplot(aes(x = day)) +\n geom_line(alpha = 0.1, aes(y = .value, group = .draw)) +\n geom_point(data = previous_onsets, aes(x = day, y = onsets), color = \"black\") +\n labs(\n title = \"Symptom onsets\",\n subtitle = \"Forecast (trajectories) and observed (points)\"\n )\n\n\n\n\n\n\n\n\nIn this plot, the dots show the data and the lines are forecast trajectories that the model deems plausible and consistent with the data so far.\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think of this forecast? Does it match your intuition of how this outbreak will continue? Is there another way you could visualise the forecast that might be more informative?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nThe forecast mainly predicts things to continue growing as they have been. However, some of the trajectories are going down, indicating that with some probabilities the outbreak might end.\nBased purely on the data and without knowing anything else about the disease and setting, it is hard to come up with an alternative. The model seems sensible given the available data. In particular, uncertainty increases with increasing distance to the data, which is a sign of a good forecasting model.\nInstead of visualising plausible trajectories we could visualise a cone with a central forecast and uncertainty around it. We will look at this in the next session as an alternative.\n\n\n\n\nIf we want to know how well a model is doing at forecasting, we can later compare it do the data of what really happened. If we do this, we get the following plot.\n\nfuture_onsets <- onset_df |>\n filter(day <= forecast_day + horizon)\n\nforecast |>\n filter(.draw %in% sample(.draw, 100)) |>\n ggplot(aes(x = day)) +\n geom_line(alpha = 0.1, aes(y = .value, group = .draw)) +\n geom_point(data = future_onsets, aes(x = day, y = onsets), color = \"black\") +\n labs(\n title = \"Symptom onsets\",\n subtitle = \"Forecast (trajectories) and observed (points)\"\n )\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think now of this forecast? Did the model do a good job? Again, is there another way you could visualise the forecast that might be more informative?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nOn the face of it the forecast looks very poor with some very high predictions compared to the data.\nBased on this visualisation it is hard to tell if the model is doing a good job but it seems like it is not.\nAs outbreaks are generally considered to be exponential processes it might be more informative to plot the forecast on the log scale.\n\n\nforecast |>\n filter(.draw %in% sample(.draw, 100)) |>\n ggplot(aes(x = day)) +\n geom_line(alpha = 0.1, aes(y = .value, group = .draw)) +\n geom_point(data = future_onsets, aes(x = day, y = onsets), color = \"black\") +\n scale_y_log10() +\n labs(title = \"Symptom onsets, log scale\", subtitle = \"Forecast and observed\")\n\nWarning in scale_y_log10(): log-10 transformation introduced infinite values.\n\n\n\n\n\n\n\n\n\nThis should be a lot more informative. We see that for longer forecast horizons the model is not doing a great job of capturing the reduction in symptom onsets. However, we can now see that the model seems to be producing very reasonable forecasts for the first week or so of the forecast. This is a common pattern in forecasting where a model is good at capturing the short term dynamics but struggles with the longer term dynamics.\n\n\n\nWe managed to learn quite a lot about our model’s forecasting limitations just looking at a single forecast using visualisations. However, what if we wanted to quantify how well the model is doing? In order to do that we need to look at multiple forecasts from the model.",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-visualisation.html#visualising-multiple-forecasts-from-a-single-model",
+ "href": "sessions/forecast-visualisation.html#visualising-multiple-forecasts-from-a-single-model",
+ "title": "Visualising infectious disease forecasts",
+ "section": "Visualising multiple forecasts from a single model",
+ "text": "Visualising multiple forecasts from a single model\nAs for a single forecast, our first step is to visualise the forecasts as this can give us a good idea of how well the model is doing without having to calculate any metrics.\n\nrw_forecasts |>\n filter(.draw %in% sample(.draw, 100)) |>\n ggplot(aes(x = day)) +\n geom_line(\n aes(y = .value, group = interaction(.draw, target_day), col = target_day),\n alpha = 0.1\n ) +\n geom_point(\n data = onset_df |>\n filter(day >= 21),\n aes(x = day, y = onsets), color = \"black\") +\n scale_color_binned(type = \"viridis\") +\n labs(title = \"Weekly forecasts of symptom onsets over an outbreak\",\n col = \"Forecast start day\")\n\n\n\n\n\n\n\n\nAs for the single forecast it may be helpful to also plot the forecast on the log scale.\n\nrw_forecasts |>\n filter(.draw %in% sample(.draw, 100)) |> \n ggplot(aes(x = day)) +\n geom_line(\n aes(y = .value, group = interaction(.draw, target_day), col = target_day),\n alpha = 0.1\n ) +\n geom_point(data = onset_df, aes(x = day, y = onsets), color = \"black\") +\n scale_y_log10() +\n scale_color_binned(type = \"viridis\") +\n labs(title = \"Weekly symptom onset forecasts: log scale\",\n col = \"Forecast start day\")\n\nWarning in scale_y_log10(): log-10 transformation introduced infinite values.\nlog-10 transformation introduced infinite values.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think of these forecasts? Are they any good? How well do they capture changes in trend? Does the uncertainty seem reasonable? Do they seem to under or over predict consistently? Would you visualise the forecast in a different way?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nWhat do you think of these forecasts?\n\nWe think these forecasts are a reasonable place to start but there is definitely room for improvement.\n\nAre they any good?\n\nThey seem to do a reasonable job of capturing the short term dynamics but struggle with the longer term dynamics.\n\nHow well do they capture changes in trend?\n\nThere is little evidence of the model capturing the reduction in onsets before it begins to show in the data.\n\nDoes the uncertainty seem reasonable?\n\nOn the natural scale it looks like the model often over predicts. Things seem more balanced on the log scale but the model still seems to be overly uncertain.\n\nDo they seem to under or over predict consistently?\n\nIt looks like the model is consistently over predicting on the natural scale but this is less clear on the log scale.",
+ "crumbs": [
+ "Visualising infectious disease forecasts"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "",
+ "text": "We can classify models along a spectrum by how much they include an understanding of underlying processes, or mechanisms; or whether they emphasise drawing from the data using a statistical approach. These different approaches all have different strength and weaknesses, and it is not clear a prior which one produces the best forecast in any given situation.\nIn this session, we’ll start with forecasts from models of different levels of mechanism vs. statistical complexity and evaluate them using visualisation and proper scoring rules as we did in the last session for the random walk model\n\n\n\nIntroduction to the spectrum of forecasting models\n\n\n\n\nThe aim of this session is to introduce the concept of a spectrum of forecasting models and to demonstrate how to evaluate a range of different models from across this spectrum.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecast-ensembles.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qra for quantile regression averaging in the weighted ensemble section.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"tidyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\nlibrary(\"qra\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#slides",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#slides",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "",
+ "text": "Introduction to the spectrum of forecasting models",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#objectives",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#objectives",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "",
+ "text": "The aim of this session is to introduce the concept of a spectrum of forecasting models and to demonstrate how to evaluate a range of different models from across this spectrum.\n\n\n\n\n\n\nSetup\n\n\n\n\n\n\n\nThe source file of this session is located at sessions/forecast-ensembles.qmd.\n\n\n\nIn this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qra for quantile regression averaging in the weighted ensemble section.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"tidyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\nlibrary(\"qra\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.\n\n\n\n\n\nWe set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#source-file",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#source-file",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "",
+ "text": "The source file of this session is located at sessions/forecast-ensembles.qmd.",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#libraries-used",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#libraries-used",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "",
+ "text": "In this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qra for quantile regression averaging in the weighted ensemble section.\n\nlibrary(\"nfidd\")\nlibrary(\"dplyr\")\nlibrary(\"tidyr\")\nlibrary(\"ggplot2\")\nlibrary(\"scoringutils\")\nlibrary(\"qra\")\n\n\n\n\n\n\n\nTip\n\n\n\nThe best way to interact with the material is via the Visual Editor of RStudio.",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#initialisation",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#initialisation",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "",
+ "text": "We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.\n\nset.seed(123)",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#scoring-your-forecast",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#scoring-your-forecast",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "Scoring your forecast",
+ "text": "Scoring your forecast\nAgain as in the forecasting evaluation sessions, we will score the forecasts using the scoringutils package by first converting the forecasts to the scoringutils format.\n\nsc_forecasts <- forecasts |>\n left_join(onset_df, by = \"day\") |>\n filter(!is.na(.value)) |>\n as_forecast_sample(\n forecast_unit = c(\n \"target_day\", \"horizon\", \"model\"\n ),\n observed = \"onsets\",\n predicted = \".value\",\n sample_id = \".draw\"\n )\nsc_forecasts\n\nForecast type: sample\n\n\nForecast unit:\n\n\ntarget_day, horizon, and model\n\n\n\n sample_id predicted observed target_day horizon model\n <int> <num> <int> <num> <int> <char>\n 1: 1 4 2 22 1 Random walk\n 2: 2 2 2 22 1 Random walk\n 3: 3 2 2 22 1 Random walk\n 4: 4 6 2 22 1 Random walk\n 5: 5 2 2 22 1 Random walk\n --- \n671996: 996 1 1 127 14 More mechanistic\n671997: 997 7 1 127 14 More mechanistic\n671998: 998 2 1 127 14 More mechanistic\n671999: 999 1 1 127 14 More mechanistic\n672000: 1000 5 1 127 14 More mechanistic\n\n\nEverything seems to be in order. We can now calculate some metrics as we did in the forecasting concepts session.\n\nsc_scores <- sc_forecasts |>\n score()\n\nsc_scores\n\n target_day horizon model bias dss crps\n <num> <int> <char> <num> <num> <num>\n 1: 22 1 Random walk 0.440 1.937266 0.709424\n 2: 22 2 Random walk 0.828 3.215491 1.664522\n 3: 22 3 Random walk 0.636 2.748205 1.230624\n 4: 22 4 Random walk 0.885 3.932499 2.376469\n 5: 22 5 Random walk -0.331 2.627111 1.024579\n --- \n668: 127 10 More mechanistic 0.133 1.562747 0.467048\n669: 127 11 More mechanistic 0.680 2.464343 1.158455\n670: 127 12 More mechanistic 0.843 3.233157 1.656446\n671: 127 13 More mechanistic 0.802 2.848762 1.404800\n672: 127 14 More mechanistic 0.751 2.446626 1.153386\n overprediction underprediction dispersion log_score mad ae_median\n <num> <num> <num> <num> <num> <num>\n 1: 0.240 0.000 0.469424 1.671874 1.4826 1\n 2: 1.118 0.000 0.546522 2.231108 2.9652 3\n 3: 0.656 0.000 0.574624 2.011411 2.9652 2\n 4: 1.668 0.000 0.708469 2.605082 2.9652 4\n 5: 0.000 0.238 0.786579 2.277217 2.9652 2\n --- \n668: 0.000 0.000 0.467048 1.615033 1.4826 0\n669: 0.680 0.000 0.478455 1.845716 1.4826 2\n670: 1.170 0.000 0.486446 2.282375 1.4826 2\n671: 0.996 0.000 0.408800 2.092336 1.4826 2\n672: 0.782 0.000 0.371386 1.817025 1.4826 2\n se_mean\n <num>\n 1: 1.943236\n 2: 8.479744\n 3: 5.654884\n 4: 16.378209\n 5: 1.177225\n --- \n668: 0.264196\n669: 4.313929\n670: 7.300804\n671: 5.503716\n672: 4.048144\n\n\n\nAt a glance\nLet’s summarise the scores by model first.\n\nsc_scores |>\n summarise_scores(by = \"model\")\n\n model bias dss crps overprediction\n <char> <num> <num> <num> <num>\n1: Random walk 0.21751786 6.380853 13.820203 8.750536\n2: More statistical 0.02879464 6.488302 11.974884 6.488286\n3: More mechanistic 0.23843304 5.548895 6.334783 2.061241\n underprediction dispersion log_score mad ae_median se_mean\n <num> <num> <num> <num> <num> <num>\n1: 0.717125 4.352542 3.987173 18.251203 18.607143 1595.3668\n2: 1.314455 4.172143 4.015102 17.149181 15.910714 1104.6347\n3: 2.284071 1.989470 3.704554 8.511712 8.928571 176.2964\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nBefore we look in detail at the scores, what do you think the scores are telling you? Which model do you think is best?\n\n\n\n\nContinuous ranked probability score\nAs in the forecasting evaluation session, we will start by looking at the CRPS by horizon and forecast date.\n\n\n\n\n\n\nReminder: Key things to note about the CRPS\n\n\n\n\nSmall values are better\nWhen scoring absolute values (e.g. number of cases) it can be difficult to use to compare forecasts across scales (i.e., when case numbers are different, for example between locations or at different times).\n\n\n\nFirst by forecast horizon.\n\nsc_scores |>\n summarise_scores(by = c(\"model\", \"horizon\")) |>\n ggplot(aes(x = horizon, y = crps, col = model)) +\n geom_point()\n\n\n\n\n\n\n\n\nand across different forecast dates\n\nsc_scores |>\n summarise_scores(by = c(\"target_day\", \"model\")) |>\n ggplot(aes(x = target_day, y = crps, col = model)) +\n geom_point()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nHow do the CRPS values change based on forecast date? How do the CRPS values change with forecast horizon?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nHow do the CRPS values change based on forecast horizon?\n\nAll models have increasing CRPS with forecast horizon.\nThe more mechanistic model has the lowest CRPS at all forecast horizon.\nThe more stastical model starts to outperform the random walk model at longer time horizons.\n\nHow do the CRPS values change with forecast date?\n\nThe more statistical model does particularly poorly around the peak of the outbreak but outperforms the random walk model.\nThe more mechanistic model does particularly well around the peak of the outbreak versus all other models\nThe more statistical model starts to outperform the other models towards the end of the outbreak.\n\n\n\n\n\n\nPIT histograms\n\n\n\n\n\n\nReminder: Interpreting the PIT histogram\n\n\n\n\nIdeally PIT histograms should be uniform.\nIf is a U shape then the model is overconfident and if it is an inverted U shape then the model is underconfident.\nIf it is skewed then the model is biased towards the direction of the skew.\n\n\n\nLet’s first look at the overall PIT histogram.\n\nsc_forecasts |>\n get_pit_histogram(by = \"model\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_wrap(~model)\n\n\n\n\n\n\n\n\nAs before let’s look at the PIT histogram by forecast horizon (to save space we will group horizons)\n\nsc_forecasts |>\n mutate(group_horizon = case_when(\n horizon <= 3 ~ \"1-3\",\n horizon <= 7 ~ \"4-7\",\n horizon <= 14 ~ \"8-14\"\n )) |>\n get_pit_histogram(by = c(\"model\", \"group_horizon\")) |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_grid(vars(model), vars(group_horizon))\n\n\n\n\n\n\n\n\nand then for different forecast dates.\n\nsc_forecasts |>\n get_pit_histogram(by = c(\"model\", \"target_day\")) |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_grid(vars(model), vars(target_day))\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think of the PIT histograms?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nWhat do you think of the PIT histograms?\n\nThe more mechanistic model is reasonably well calibrated but has a slight tendency to be overconfident.\nThe random walk is biased towards overpredicting.\nThe more statistical model is underconfident.\nAcross horizons the more mechanistic model is only liable to underpredict at the longest horizons.\nThe random walk model is initially relatively unbiased and well calibrated but becomes increasingly likely to overpredict as the horizon increases.\nThe forecast date stratified PIT histograms are hard to interpret. We may need to find other ways to visualise bias and calibration at this level of stratification (see the {scoringutils} documentation for some ideas).",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ },
+ {
+ "objectID": "sessions/forecast-evaluation-of-multiple-models.html#scoring-on-the-log-scale",
+ "href": "sessions/forecast-evaluation-of-multiple-models.html#scoring-on-the-log-scale",
+ "title": "Evaluating forecasts from multiple models",
+ "section": "Scoring on the log scale",
+ "text": "Scoring on the log scale\nAgain as in the forecast evaluation session, we will also score the forecasts on the log scale.\n\nlog_sc_forecasts <- sc_forecasts |>\n transform_forecasts(\n fun = log_shift,\n offset = 1,\n append = FALSE\n )\n\nlog_sc_scores <- log_sc_forecasts |>\n score()\n\n\n\n\n\n\n\nTip\n\n\n\nReminder: For more on scoring on the log scale see the paper by @bosse2023scorin.\n\n\n\nAt a glance\n\nlog_sc_scores |>\n summarise_scores(by = \"model\")\n\n model bias dss crps overprediction\n <char> <num> <num> <num> <num>\n1: Random walk 0.175232143 -0.4690178 0.2299255 0.11532350\n2: More statistical -0.009196429 -0.4280965 0.2405370 0.08154039\n3: More mechanistic 0.195008929 -1.0961907 0.1861159 0.11374787\n underprediction dispersion log_score mad ae_median se_mean\n <num> <num> <num> <num> <num> <num>\n1: 0.02749138 0.08711065 0.5889563 0.3761217 0.3078707 0.1778818\n2: 0.05800097 0.10099568 0.6801200 0.4197095 0.3242281 0.1972211\n3: 0.02078127 0.05158679 0.6008573 0.2182572 0.2619873 0.1229412\n\n\n\n\n\n\n\n\nTake 2 minutes\n\n\n\nBefore we look in detail at the scores, what do you think the scores are telling you? Which model do you think is best?\n\n\n\n\nCRPS\n\nlog_sc_scores |>\n summarise_scores(by = c(\"model\", \"horizon\")) |>\n ggplot(aes(x = horizon, y = crps, col = model)) +\n geom_point()\n\n\n\n\n\n\n\n\n\nlog_sc_scores |>\n summarise_scores(by = c(\"target_day\", \"model\")) |>\n ggplot(aes(x = target_day, y = crps, col = model)) +\n geom_point()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nHow do the CRPS scores on the log scale compare to the scores on the original scale?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\n\nThe performance of the mechanistic model is more variable across forecast horizon than on the natural scale.\nOn the log scale the by horizon performance of the random walk and more statistical mdoel is more comparable than on the log scale.\nThe period of high incidence dominates the target day stratified scores less on the log scale. We see that all models performed less well early and late on.\n\n\n\n\n\n\nPIT histograms\n\nlog_sc_forecasts |>\n get_pit_histogram(by = \"model\") |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_wrap(~model)\n\n\n\n\n\n\n\n\n\nlog_sc_forecasts |>\n mutate(group_horizon = case_when(\n horizon <= 3 ~ \"1-3\",\n horizon <= 7 ~ \"4-7\",\n horizon <= 14 ~ \"8-14\"\n )) |>\n get_pit_histogram(by = c(\"model\", \"group_horizon\")) |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_grid(vars(model), vars(group_horizon))\n\n\n\n\n\n\n\n\n\nlog_sc_forecasts |>\n get_pit_histogram(by = c(\"model\", \"target_day\")) |>\n ggplot(aes(x = mid, y = density)) +\n geom_col() +\n facet_grid(vars(model), vars(target_day))\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTake 5 minutes\n\n\n\nWhat do you think of the PIT histograms?\n\n\n\n\n\n\n\n\nSolution\n\n\n\n\n\nThe PIT histograms are similar to the original scale PIT histograms but the mechanistic model appears better calibrated.",
+ "crumbs": [
+ "Evaluating forecasts from multiple models"
+ ]
+ }
+]
\ No newline at end of file
diff --git a/sessions.html b/sessions.html
new file mode 100644
index 0000000..acd992f
--- /dev/null
+++ b/sessions.html
@@ -0,0 +1,829 @@
+
+
+
+
+
+
+
+
+
+Sessions – UEIFID
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Sessions
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Session 0: Introduction and course overview
+
13:30-13:35
+
+
Introduction to the course and the instructors (5 mins)
+
+
+
+
Session 1: Forecasting
+
13:35-14:10
+
+
Introduction to forecasting as an epidemiological problem (10 mins)
+
Practice session: Visualising infectious disease forecasts (25 minutes)
+
+
+
+
Session 2: Forecast evaluation
+
14:10-15:00
+
+
An introduction to forecast evaluation (5 mins)
+
Practice session: Evaluating forecasts from a range of models (40 mins)
+
Wrap up (5 mins)
+
+
+
+
Session 3: Evaluating forecasts from multiple models
+
15:10-15:50
+
+
Why might we want to evaluate forecasts from multiple models? (5 mins)
+
Practice session: Evaluating forecasts from multiple models (40 mins)
+
Wrap up (5 mins)
+
+
There is a coffee break at 15:30-15:40 you are welcome to attend this or keep working through the course material.
+
+
+
Session 4: Forecast ensembles
+
15:50-17:10
+
+
Introduction to forecast ensembles (10 mins)
+
Practice session: Creating forecast ensembles (60 mins)
We hope that you’ve enjoyed taking this course on understanding and evaluating forecasts of infectious disease burden. We will be very happy to have your feedback, comments or any questions on the course discussion page.
The following is a highly subjective list of papers we would recommend to read for those interested in engaging further with the topics discussed here.
As we saw in the forecast evaluation session, different modelling approaches have different strength and weaknesses, and it is not clear a priori which one produces the best forecast in any given situation. One way to attempt to draw strength from a diversity of approaches is the creation of so-called forecast ensembles from the forecasts produced by different models.
+
In this session, we’ll build ensembles using forecasts from models of different levels of mechanism vs. statistical complexity. We will then compare the performance of these ensembles to the individual models and to each other. Rather than using the forecast samples we have been using we will instead now use quantile-based forecasts.
+
+
+
+
+
+
+Representations of probabilistic forecasts
+
+
+
+
+
+
Probabilistic predictions can be described as coming from a probabilistic probability distributions. In general and when using complex models such as the one we discuss in this course, these distributions can not be expressed in a simple analytical formal as we can do if, e.g. talking about common probability distributions such as the normal or gamma distributions. Instead, we typically use a limited number of samples generated from Monte-Carlo methods to represent the predictive distribution. However, this is not the only way to characterise distributions.
+
A quantile is the value that corresponds to a given quantile level of a distribution. For example, the median is the 50th quantile of a distribution, meaning that 50% of the values in the distribution are less than the median and 50% are greater. Similarly, the 90th quantile is the value that corresponds to 90% of the distribution being less than this value. If we characterise a predictive distribution by its quantiles, we specify these values at a range of specific quantile levels, e.g. from 5% to 95% in 5% steps.
+
Deciding how to represent forecasts depends on many things, for example the method used (and whether it produces samples by default) but also logistic considerations. Many collaborative forecasting projects and so-called forecasting hubs use quantile-based representations of forecasts in the hope to be able to characterise both the centre and tails of the distributions more reliably and with less demand on storage space than a sample-based representation.
The aim of this session is to introduce the concept of ensembles of forecasts and to evaluate the performance of ensembles of the multiple models.
+
+
+
+
+
+
+Setup
+
+
+
+
+
+
+
Source file
+
The source file of this session is located at sessions/forecast-ensembles.qmd.
+
+
+
Libraries used
+
In this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qrensemble for quantile regression averaging in the weighted ensemble section.
The best way to interact with the material is via the Visual Editor of RStudio.
+
+
+
+
+
Initialisation
+
We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.
+
+
set.seed(123)
+
+
+
+
+
+
+
+
+
Individual forecast models
+
In this session we will use the forecasts from different models we introduced in the forecasting evaluation of multiple models session. There all shared the same basic renewal with delays structure but used different models for the evolution of the effective reproduction number over time. These were:
+
+
A random walk model (what we have looked at so far)
+
A simple model of susceptible depletion referred to as “More mechanistic”
+
A differenced autoregressive model referred to as “More statistical”
+
+
As previously, we have fitted these models to a range of forecast dates so you don’t have to wait for the models to fit. We will now evaluate the forecasts from the mechanistic and statistical models.
+
+
data(rw_forecasts, stat_forecasts, mech_forecasts)
+forecasts <-bind_rows(
+ rw_forecasts,
+mutate(stat_forecasts, model ="More statistical"),
+mutate(mech_forecasts, model ="More mechanistic")
+) |>
+ungroup()
+
+forecasts
+
+
# A tibble: 672,000 × 7
+ day .draw .variable .value horizon target_day model
+ <dbl> <int> <chr> <dbl> <int> <dbl> <chr>
+ 1 23 1 forecast 4 1 22 Random walk
+ 2 23 2 forecast 2 1 22 Random walk
+ 3 23 3 forecast 2 1 22 Random walk
+ 4 23 4 forecast 6 1 22 Random walk
+ 5 23 5 forecast 2 1 22 Random walk
+ 6 23 6 forecast 3 1 22 Random walk
+ 7 23 7 forecast 5 1 22 Random walk
+ 8 23 8 forecast 2 1 22 Random walk
+ 9 23 9 forecast 3 1 22 Random walk
+10 23 10 forecast 5 1 22 Random walk
+# ℹ 671,990 more rows
+
+
+
+
+
+
+
+
+How did we generate these forecasts?
+
+
+
+
+
+
Some important things to note about these forecasts:
+
+
We used a 14 day forecast horizon.
+
Each forecast used all the data up to the forecast date.
+
We generated 1000 predictive posterior samples for each forecast.
+
We started forecasting 3 weeks into the outbreak and then forecast every 7 days until the end of the data (excluding the last 14 days to allow a full forecast).
+
We use the same simulated outbreak data as before:
Converting sample-based forecasts to quantile-based forecasts
+
As in this session we will be thinking about forecasts in terms quantiles of the predictive distributions, we will need to convert our sample based forecasts to quantile-based forecasts. We will do this by focusing at the marginal distribution at each predicted time point, that is we treat each time point as independent of all others and calculate quantiles based on the sample predictive trajectories at that time point. An easy way to do this is to use the {scoringutils} package. The steps to do this are to first declare the forecasts as sample forecasts.
+ observed target_day horizon model day quantile_level
+ <int> <num> <int> <char> <num> <num>
+ 1: 4 22 1 Random walk 23 0.05
+ 2: 4 22 1 Random walk 23 0.25
+ 3: 4 22 1 Random walk 23 0.50
+ 4: 4 22 1 Random walk 23 0.75
+ 5: 4 22 1 Random walk 23 0.95
+ ---
+3356: 2 127 14 More mechanistic 141 0.05
+3357: 2 127 14 More mechanistic 141 0.25
+3358: 2 127 14 More mechanistic 141 0.50
+3359: 2 127 14 More mechanistic 141 0.75
+3360: 2 127 14 More mechanistic 141 0.95
+ predicted
+ <num>
+ 1: 0
+ 2: 2
+ 3: 3
+ 4: 5
+ 5: 7
+ ---
+3356: 1
+3357: 2
+3358: 3
+3359: 4
+3360: 6
+
+
+
+
+
+
+
+
+What is happening here?
+
+
+
+
+
+
+
Internally scoringutils is calculating the quantiles of the sample-based forecasts.
+
It does this by using a set of default quantiles but different ones can be specified by the user to override the default.
+
It then calls the quantile() function from base R to calculate the quantiles.
+
This is estimating the value that corresponds to each given quantile level by ordering the samples and then taking the value at the appropriate position.
+
+
+
+
+
+
+
Simple unweighted ensembles
+
A good place to start when building ensembles is to take the mean or median of the unweighted forecast at each quantile level, and treat these as quantiles of the ensemble predictive distribution. Typically, the median is preferred when outlier forecasts are likely to be present as it is less sensitive to these. However, the mean is preferred when forecasters have more faith in models that diverge from the median performance and want to represent this in the ensemble.
+
+
+
+
+
+
+Vincent average
+
+
+
+
+
+
The procedure of calculating quantiles of a new distribution as a weighted average of quantiles of constituent distributions (e.g., different measurements) is called a Vincent average, after the biologist Stella Vincent who described this as early as 1912 when studying the function of whiskers in the behaviour of white rats.
+
+
+
+
+
Construction
+
We can calculate the mean ensemble by taking the mean of the forecasts at each quantile level.
How do these ensembles compare to the individual models?
+
How do they differ from each other?
+
Are there any differences across forecast dates?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
How do these ensembles compare to the individual models?
+
+
As before, the ensembles appear to be less variable than the statistical models but more variable than the mechanistic model.
+
+
How do they differ from each other?
+
+
The mean ensemble has marginally tighter uncertainty bounds than the median ensemble as for the single forecast.
+
+
Are there any differences across forecast dates?
+
+
The mean ensemble appears to be more variable across forecast dates than the median ensemble with this being more pronounced after the peak of the outbreak.
+
+
+
+
+
+
+
Evaluation
+
As in the forecast evaluation session, we can evaluate the accuracy of the ensembles using the {scoringutils} package and in particular the score() function.
What do you think the scores are telling you? Which model do you think is best? What other scoring breakdowns might you want to look at?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
What do you think the scores are telling you? Which model do you think is best?
+
+
The mean ensemble appears to be the best performing ensemble model overall.
+
However, the more mechanistic model appears to be the best performing model overall.
+
+
What other scoring breakdowns might you want to look at?
+
+
There might be variation over forecast dates or horizons between the different ensemble methods
+
+
+
+
+
+
+
+
Unweighted ensembles of filtered models
+
A simple method that is often used to improve ensemble performance is to prune out models that perform very poorly. Balancing this can be tricky however as it can be hard to know how much to prune. The key tradeoff to consider is how much to optimise for which models have performed well in the past (and what your definition of the past is, for example all time or only the last few weeks) versus how much you want to allow for the possibility that these models may not perform well in the future.
+
+
Construction
+
As we just saw, the random walk model (our original baseline model) is performing poorly in comparison to the other models. We can remove this model from the ensemble and see if this improves the performance of the ensemble.
+
+
+
+
+
+
+Warning
+
+
+
+
Here we are technically cheating a little as we are using the test data to help select the models to include in the ensemble. In the real world you would not do this as you would not have access to the test data and so this is an idealised scenario.
How do the filtered ensembles compare to the simple ensembles?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
How do the filtered ensembles compare to the simple ensembles?
+
+
The filtered ensembles appear to be more accurate than the simple ensembles.
+
As you would expect they are an average of the more mechanistic model and the more statistical model.
+
As there are only two models in the ensemble, the median and mean ensembles are identical.
+
For the first time there are features of the ensemble that outperform the more mechanistic model though it remains the best performing model overall.
+
+
+
+
+
+
+
+
Weighted quantile ensembles
+
The simple mean and median we used to average quantiles earlier treats every model as the same. We could try to improve performance by replacing this with a weighted mean (or weighted median), for example given greater weight to models that have proven to make better forecasts. Here we will explore two common weighting methods:
+
+
Inverse WIS weighting
+
Quantile regression averaging
+
+
Inverse WIS weighting is a simple method that weights the forecasts by the inverse of their WIS over some period (note that identifying what this period should be in order to produce the best forecasts is not straightforward as predictive performance may vary over time if models are good at different things). The main benefit of WIS weighting over other methods is that it is simple to understand and implement. However, it does not optimise the weights directly to produce the best forecasts. It relies on the hope that giving more weight to better performing models yields a better ensemble
+
Quantile regression averaging on the other hand does optimise the weights directly in order to yield the best scores on past data.
+
+
Construction
+
+
Inverse WIS weighting
+
In order to perform inverse WIS weighting we first need to calculate the WIS for each model. We already have this from the previous evaluation so we can reuse this.
Now lets apply the weights to the forecast models. As we can only use information that was available at the time of the forecast to perform the weighting, we use weights from two weeks prior to the forecast date to inform each ensemble.
We futher to perform quantile regression averaging (QRA) for each forecast date. Again we need to consider how many previous forecasts we wish to use to inform each ensemble forecast. Here we decide to use up to 3 weeks of previous forecasts to inform each QRA ensemble.
Instead of creating a single optimised ensemble and using this for all forecast horizons we might also want to consider a separate optimised QRA ensemble for each forecast horizon, reflecting that models might perform differently depending on how far ahead a forecast is produced. We can do this using qra() with the group argument.
How do the weighted ensembles compare to the simple ensembles? Which do you think is best? Are you surprised by the results? Can you think of any reasons that would explain them?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
How do the weighted ensembles compare to the simple ensembles?
+
+
+
+
+
+
Models weights
+
Now that we have a weighted ensemble, we can also look at the weights of the individual models. Here we do this for the quantile regression averaging ensemble but we could also do this for the inverse WIS ensemble or any other weighted ensemble (for an unweighted ensemble the weights are all equal).
+
+
qra_weights |>
+ggplot(aes(x = target_day, y = weight, fill = model)) +
+geom_col(position ="stack") +
+theme(legend.position ="bottom")
+
+
+
+
+
+
+
+
+
+
+
+
+Take 2 minutes
+
+
+
+
How do the weights change over time? Are you surprised by the results given what you know about the models performance?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
How do the weights change over time?
+
+
Early on the more statistical models have higher weights
+
Gradually the random walk model gains weight and by the end of the forecast horizon it represents the entire ensemble.
+
Near the peak the mechanistic model also gains weight.
+
+
Are you surprised by the results given what you know about the models performance?
+
+
As the random walk model is performing poorly, you would expect it to have low weights but actually it often doesn’t. This implies that its poor performance is restricted to certain parts of the outbreak.
+
Even though the mechanistic model performs really well overall it is only included in the ensemble around the peak. This could be because the training data doesn’t include changes in the epidemic dynamics and so the mechanistic model is not given sufficient weight.
+
+
+
+
+
+
+
+
Evaluation
+
For a final evaluation we can look at the scores for each model and ensemble again. We remove the two weeks of forecasts as these do not have a quantile regression average forecasts as these require training data to estimate.
How do the weighted ensembles compare to the simple ensembles on the natural and log scale?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
The best ensembles slightly outperform some of the simple ensembles but there is no obvious benefit from using weighted ensembles. Why might this be the case?
+
+
+
+
+
+
+
Going further
+
+
+
Wrap up
+
+
+
References
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-model-weights-1.png b/sessions/forecast-ensembles_files/figure-html/plot-model-weights-1.png
new file mode 100644
index 0000000..e0570c4
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-model-weights-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-multiple-filtered-forecasts-1.png b/sessions/forecast-ensembles_files/figure-html/plot-multiple-filtered-forecasts-1.png
new file mode 100644
index 0000000..d52934e
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-multiple-filtered-forecasts-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-multiple-filtered-forecasts-log-1.png b/sessions/forecast-ensembles_files/figure-html/plot-multiple-filtered-forecasts-log-1.png
new file mode 100644
index 0000000..abe85e5
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-multiple-filtered-forecasts-log-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-multiple-forecasts-1.png b/sessions/forecast-ensembles_files/figure-html/plot-multiple-forecasts-1.png
new file mode 100644
index 0000000..49a7e62
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-multiple-forecasts-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-multiple-forecasts-log-1.png b/sessions/forecast-ensembles_files/figure-html/plot-multiple-forecasts-log-1.png
new file mode 100644
index 0000000..384af11
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-multiple-forecasts-log-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-multiple-weighted-forecasts-1.png b/sessions/forecast-ensembles_files/figure-html/plot-multiple-weighted-forecasts-1.png
new file mode 100644
index 0000000..aa82290
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-multiple-weighted-forecasts-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-multiple-weighted-forecasts-log-1.png b/sessions/forecast-ensembles_files/figure-html/plot-multiple-weighted-forecasts-log-1.png
new file mode 100644
index 0000000..53968d2
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-multiple-weighted-forecasts-log-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-single-filtered-forecasts-1.png b/sessions/forecast-ensembles_files/figure-html/plot-single-filtered-forecasts-1.png
new file mode 100644
index 0000000..81cf1db
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-single-filtered-forecasts-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-single-filtered-forecasts-log-1.png b/sessions/forecast-ensembles_files/figure-html/plot-single-filtered-forecasts-log-1.png
new file mode 100644
index 0000000..250320d
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-single-filtered-forecasts-log-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-single-forecasts-1.png b/sessions/forecast-ensembles_files/figure-html/plot-single-forecasts-1.png
new file mode 100644
index 0000000..a019378
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-single-forecasts-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-single-forecasts-log-1.png b/sessions/forecast-ensembles_files/figure-html/plot-single-forecasts-log-1.png
new file mode 100644
index 0000000..7a7f64f
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-single-forecasts-log-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-single-weighted-forecasts-1.png b/sessions/forecast-ensembles_files/figure-html/plot-single-weighted-forecasts-1.png
new file mode 100644
index 0000000..468709d
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-single-weighted-forecasts-1.png differ
diff --git a/sessions/forecast-ensembles_files/figure-html/plot-single-weighted-forecasts-log-1.png b/sessions/forecast-ensembles_files/figure-html/plot-single-weighted-forecasts-log-1.png
new file mode 100644
index 0000000..94d626b
Binary files /dev/null and b/sessions/forecast-ensembles_files/figure-html/plot-single-weighted-forecasts-log-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models.html b/sessions/forecast-evaluation-of-multiple-models.html
new file mode 100644
index 0000000..03b113b
--- /dev/null
+++ b/sessions/forecast-evaluation-of-multiple-models.html
@@ -0,0 +1,1921 @@
+
+
+
+
+
+
+
+
+
+Evaluating forecasts from multiple models – UEIFID
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
We can classify models along a spectrum by how much they include an understanding of underlying processes, or mechanisms; or whether they emphasise drawing from the data using a statistical approach. These different approaches all have different strength and weaknesses, and it is not clear a prior which one produces the best forecast in any given situation.
+
In this session, we’ll start with forecasts from models of different levels of mechanism vs. statistical complexity and evaluate them using visualisation and proper scoring rules as we did in the last session for the random walk model
The aim of this session is to introduce the concept of a spectrum of forecasting models and to demonstrate how to evaluate a range of different models from across this spectrum.
+
+
+
+
+
+
+Setup
+
+
+
+
+
+
+
Source file
+
The source file of this session is located at sessions/forecast-ensembles.qmd.
+
+
+
Libraries used
+
In this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and tidyr packages for data wrangling, ggplot2 library for plotting, the tidybayes package for extracting results of the inference and the scoringutils package for evaluating forecasts. We will also use qra for quantile regression averaging in the weighted ensemble section.
The best way to interact with the material is via the Visual Editor of RStudio.
+
+
+
+
+
Initialisation
+
We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.
+
+
set.seed(123)
+
+
+
+
+
+
+
+
+
Individual forecast models
+
In this session we will use the forecasts from different models. There all shared the same basic renewal with delays structure but used different models for the evolution of the effective reproduction number over time. These were:
+
+
A random walk model (what we have looked at so far)
+
A simple model of susceptible depletion referred to as “More mechanistic”
+
A differenced autoregressive model referred to as “More statistical”
+
+
For the purposes of this session the precise details of the models are not critical to the concepts we are exploring.
+
+
+
+
+
+
+More information on the mechanistic model (optional)
+
+
+
+
+
+
One way to potentially improve the renewal model is to add more mechanistic structure. In the forecasting visualisation session, we saw that the renewal model was making unbiased forecasts when the reproduction number was constant but that it overestimated the number of cases when the reproduction number was reducing due to susceptible depletion.
+
+
+
+
+
+
+Warning
+
+
+
+
This is slightly cheating as we know the future of this outbreak and so can make a model to match. This is easy to do and important to watch for if wanting to make generalisable methods.
+
+
+
This suggests that we should add a term to the renewal model which captures the depletion of susceptibles. One way to do this is to add a term which is proportional to the number of susceptibles in the population. This is the idea behind the SIR model which is a simple compartmental model of infectious disease transmission. If we assume that susceptible depletion is the only mechanism which is causing the reproduction number to change, we can write the reproduction model as:
+
\[
+R_t = \frac{S_{t-1}}{N} R_0
+\]
+
+
+
+
+
+
+Note
+
+
+
+
This approximates susceptible depletion as a linear function of the number of susceptibles in the population. This is a simplification but it is a good starting point.
+
+
+
+
+
+
+
+
+What behaviour would we expect from this model?
+
+
+
+
+
+
+
n <-100
+N <-1000
+R0 <-1.5
+S <-rep(NA, n)
+S[1] <- N
+Rt <-rep(NA, n) ## reproduction number
+Rt[1] <- R0
+I <-rep(NA, n)
+I[1] <-1
+for (i in2:n) {
+ Rt[i] <- (S[i-1]) / N * R0
+ I[i] <- I[i-1] * Rt[i]
+ S[i] <- S[i-1] - I[i]
+}
+
+data <-tibble(t =1:n, Rt = Rt)
+
+ggplot(data, aes(x = t, y = Rt)) +
+geom_line() +
+labs(title ="Simulated data from an SIR model",
+x ="Time",
+y ="Rt")
+
+
+
+
+
+
+
+
+
+
The key assumptions we are making here are:
+
+
The population is constant and we roughly know the size of the population.
+
The reproduction number only changes due to susceptible depletion
+
The number of new cases at each time is proportional to the number of susceptibles in the population.
+
+
+
+
+
+
+
+
+
+
+More information on the statistical model (optional)
+
+
+
+
+
+
Adding more mechanistic structure is not always possible and, if we don’t specify mechanisms correctly, might make forecasts worse. Rather than adding more mechanistic structure to the renewal model, we could add more statistical structure with the aim of improving performance. Before we do this, we need to think about what we want from a forecasting model. As we identified above, we want a model which is unbiased and which has good short-term forecasting properties. We know that we want it to be able to adapt to trends in the reproduction number and that we want it to be able to capture the noise in the data. A statistical term that can be used to describe a time series with a trend is saying that the time series is non-stationary. More specifically, a stationary time series is defined as one whose statistical properties, such as mean and variance, do not change over time. In infectious disease epidemiology, this would only be expected for endemic diseases without external seasonal influence.
+
The random walk model we used in the forecasting visualisation session is a special case of a more general class of models called autoregressive (AR) models. AR models are a class of models which predict the next value in a time series as a linear combination of the previous values in the time series. The random walk model is specifically a special case of an AR(1) model where the next value in the time series is predicted as the previous value, multiplied by a value between 1 and -1 , plus some noise. This becomes a random walk when the multipled value is 0.
+
For the log-transformed reproduction number (\(log(R_t)\)), the model is:
+
\[
+log(R_t) = \phi log(R_{t-1}) + \epsilon_t
+\]
+
where \(\epsilon_t\) is a normally distributed error term with mean 0 and standard deviation \(\sigma\) and \(\phi\) is a parameter between -1 and 1. If we restrict \(\phi\) to be between 0 and 1, we get a model which is biased towards a reproduction number of 1 but which can still capture trends in the data that decay over time.
+
+
+
+
+
+
+What behaviour would we expect from this model?
+
+
+
+
+
+
+
n <-100
+phi <-0.4
+sigma <-0.1
+log_R <-rep(NA, n)
+log_R[1] <-rnorm(1, 0, sigma)
+for (i in2:n) {
+ log_R[i] <- phi * log_R[i-1] +rnorm(1, 0, sigma)
+}
+data <-tibble(t =1:n, R =exp(log_R))
+
+ggplot(data, aes(x = t, y = R)) +
+geom_line() +
+labs(title ="Simulated data from an exponentiated AR(1) model",
+x ="Time",
+y ="R")
+
+
+
+
+
+
+
+
+
+
However, we probably don’t want a model which is biased towards a reproduction number of 1 (unless we have good reason to believe that is the expected behaviour). So what should we do?
+
Returning to the idea that the reproduction number is a non-stationary time series, as we expect to have a trend in the reproduction numbers we want to capture, we can use a method from the field of time series analysis called differencing to make the time series stationary. This involves taking the difference between consecutive values in the time series. For the log-transformed reproduction number, this would be:
We can use this function to simulate a differenced AR process:
+
+
log_R <-geometric_diff_ar(init =1, noise =rnorm(100), std =0.1, damp =0.1)
+
+data <-tibble(t =seq_along(log_R), R =exp(log_R))
+
+ggplot(data, aes(x = t, y = R)) +
+geom_line() +
+labs(title ="Simulated data from an exponentiated AR(1) model with differencing",
+x ="Time",
+y ="R")
+
+
+
+
+
+
+
+
+
+
+
+
+
As previously, we have fitted these models to a range of forecast dates so you don’t have to wait for the models to fit. We will now evaluate the forecasts from the mechanistic and statistical models.
+
+
data(rw_forecasts, stat_forecasts, mech_forecasts)
+forecasts <-bind_rows(
+ rw_forecasts,
+mutate(stat_forecasts, model ="More statistical"),
+mutate(mech_forecasts, model ="More mechanistic")
+) |>
+ungroup()
+
+forecasts
+
+
# A tibble: 672,000 × 7
+ day .draw .variable .value horizon target_day model
+ <dbl> <int> <chr> <dbl> <int> <dbl> <chr>
+ 1 23 1 forecast 4 1 22 Random walk
+ 2 23 2 forecast 2 1 22 Random walk
+ 3 23 3 forecast 2 1 22 Random walk
+ 4 23 4 forecast 6 1 22 Random walk
+ 5 23 5 forecast 2 1 22 Random walk
+ 6 23 6 forecast 3 1 22 Random walk
+ 7 23 7 forecast 5 1 22 Random walk
+ 8 23 8 forecast 2 1 22 Random walk
+ 9 23 9 forecast 3 1 22 Random walk
+10 23 10 forecast 5 1 22 Random walk
+# ℹ 671,990 more rows
+
+
+
+
+
+
+
+
+How did we generate these forecasts?
+
+
+
+
+
+
Some important things to note about these forecasts:
+
+
We used a 14 day forecast horizon.
+
Each forecast used all the data up to the forecast date.
+
We generated 1000 predictive posterior samples for each forecast.
+
We started forecasting 3 weeks into the outbreak and then forecast every 7 days until the end of the data (excluding the last 14 days to allow a full forecast).
+
We use the same simulated outbreak data as before:
How do these forecasts compare? Which do you prefer?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
How do these forecasts compare?
+
+
The more mechanistic model captures the downturn in the data very well.
+
Past the peak all models were comparable.
+
The more statistical model captures the downturn faster than the random walk but less fast than the more mechanistic mode.
+
The more statistical model sporadically predicts a more rapidly growing outbreak than occurred early on.
+
The more statistical model is more uncertain than the mechanistic model but less uncertain than the random walk.
+
+
Which do you prefer?
+
+
The more mechanistic model seems to be the best at capturing the downturn in the data and the uncertainty in the forecasts seems reasonable.
+
If we weren’t confident in the effective susceptible population the AR model might be preferable.
+
+
+
+
+
+
Scoring your forecast
+
Again as in the forecasting evaluation sessions, we will score the forecasts using the scoringutils package by first converting the forecasts to the scoringutils format.
Before we look in detail at the scores, what do you think the scores are telling you? Which model do you think is best?
+
+
+
+
+
Continuous ranked probability score
+
As in the forecasting evaluation session, we will start by looking at the CRPS by horizon and forecast date.
+
+
+
+
+
+
+Reminder: Key things to note about the CRPS
+
+
+
+
+
Small values are better
+
When scoring absolute values (e.g. number of cases) it can be difficult to use to compare forecasts across scales (i.e., when case numbers are different, for example between locations or at different times).
+
+
+
+
First by forecast horizon.
+
+
sc_scores |>
+summarise_scores(by =c("model", "horizon")) |>
+ggplot(aes(x = horizon, y = crps, col = model)) +
+geom_point()
+
+
+
+
+
+
+
and across different forecast dates
+
+
sc_scores |>
+summarise_scores(by =c("target_day", "model")) |>
+ggplot(aes(x = target_day, y = crps, col = model)) +
+geom_point()
+
+
+
+
+
+
+
+
+
+
+
+
+Take 5 minutes
+
+
+
+
How do the CRPS values change based on forecast date? How do the CRPS values change with forecast horizon?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
How do the CRPS values change based on forecast horizon?
+
+
All models have increasing CRPS with forecast horizon.
+
The more mechanistic model has the lowest CRPS at all forecast horizon.
+
The more stastical model starts to outperform the random walk model at longer time horizons.
+
+
How do the CRPS values change with forecast date?
+
+
The more statistical model does particularly poorly around the peak of the outbreak but outperforms the random walk model.
+
The more mechanistic model does particularly well around the peak of the outbreak versus all other models
+
The more statistical model starts to outperform the other models towards the end of the outbreak.
+
+
+
+
+
+
+
PIT histograms
+
+
+
+
+
+
+Reminder: Interpreting the PIT histogram
+
+
+
+
+
Ideally PIT histograms should be uniform.
+
If is a U shape then the model is overconfident and if it is an inverted U shape then the model is underconfident.
+
If it is skewed then the model is biased towards the direction of the skew.
The more mechanistic model is reasonably well calibrated but has a slight tendency to be overconfident.
+
The random walk is biased towards overpredicting.
+
The more statistical model is underconfident.
+
Across horizons the more mechanistic model is only liable to underpredict at the longest horizons.
+
The random walk model is initially relatively unbiased and well calibrated but becomes increasingly likely to overpredict as the horizon increases.
+
The forecast date stratified PIT histograms are hard to interpret. We may need to find other ways to visualise bias and calibration at this level of stratification (see the {scoringutils} documentation for some ideas).
Before we look in detail at the scores, what do you think the scores are telling you? Which model do you think is best?
+
+
+
+
+
CRPS
+
+
log_sc_scores |>
+summarise_scores(by =c("model", "horizon")) |>
+ggplot(aes(x = horizon, y = crps, col = model)) +
+geom_point()
+
+
+
+
+
+
+
+
log_sc_scores |>
+summarise_scores(by =c("target_day", "model")) |>
+ggplot(aes(x = target_day, y = crps, col = model)) +
+geom_point()
+
+
+
+
+
+
+
+
+
+
+
+
+Take 5 minutes
+
+
+
+
How do the CRPS scores on the log scale compare to the scores on the original scale?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
The performance of the mechanistic model is more variable across forecast horizon than on the natural scale.
+
On the log scale the by horizon performance of the random walk and more statistical mdoel is more comparable than on the log scale.
+
The period of high incidence dominates the target day stratified scores less on the log scale. We see that all models performed less well early and late on.
The PIT histograms are similar to the original scale PIT histograms but the mechanistic model appears better calibrated.
+
+
+
+
+
+
+
+
Going further
+
+
We have only looked at three forecasting models here. There are many more models that could be used. For example, we could use a more complex mechanistic model which captures more of the underlying dynamics of the data generating process. We could also use a more complex statistical model which captures more of the underlying structure of the data.
+
We could also combine the more mechanistic and more statistical models to create a hybrid model which captures the best of both worlds (maybe).
+
We could also use a more complex scoring rule to evaluate the forecasts. For example, we could use a multivariate scoring rule which captures more of the structure of the data.
+
+
+
+
Wrap up
+
+
+
References
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-1.png
new file mode 100644
index 0000000..90d5852
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-date-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-date-1.png
new file mode 100644
index 0000000..a93f784
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-date-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-date-log-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-date-log-1.png
new file mode 100644
index 0000000..aa9628a
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-date-log-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-horizon-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-horizon-1.png
new file mode 100644
index 0000000..f799387
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-horizon-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-horizon-log-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-horizon-log-1.png
new file mode 100644
index 0000000..f8492eb
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-horizon-log-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-log-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-log-1.png
new file mode 100644
index 0000000..bfc0959
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/pit-histogram-log-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/plot-all-forecasts-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/plot-all-forecasts-1.png
new file mode 100644
index 0000000..e74175f
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/plot-all-forecasts-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/plot-all-forecasts-log-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/plot-all-forecasts-log-1.png
new file mode 100644
index 0000000..cb692eb
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/plot-all-forecasts-log-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-10-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-10-1.png
new file mode 100644
index 0000000..1c7ee52
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-10-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-11-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-11-1.png
new file mode 100644
index 0000000..bf94c7f
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-11-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-2-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-2-1.png
new file mode 100644
index 0000000..ccb8e18
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-2-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-3-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-3-1.png
new file mode 100644
index 0000000..9c1fbfc
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-3-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-4-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-4-1.png
new file mode 100644
index 0000000..3f60712
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-4-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-7-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-7-1.png
new file mode 100644
index 0000000..32d062a
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-8-1.png b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-8-1.png
new file mode 100644
index 0000000..8b86252
Binary files /dev/null and b/sessions/forecast-evaluation-of-multiple-models_files/figure-html/unnamed-chunk-8-1.png differ
diff --git a/sessions/forecast-evaluation.html b/sessions/forecast-evaluation.html
new file mode 100644
index 0000000..c4b799e
--- /dev/null
+++ b/sessions/forecast-evaluation.html
@@ -0,0 +1,1746 @@
+
+
+
+
+
+
+
+
+
+Forecast evaluation – UEIFID
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
So far we have focused on visualising forecasts, including confronting them with events that were observed after the forecast was made. Besides visualising the forecasts, we can also summarise performance quantitatively. In this session you will get to know several ways of assessing different aspects of forecast performance.
The aim of this session is to introduce the concept of forecast evaluation using scoring rules.
+
+
+
+
+
+
+Setup
+
+
+
+
+
+
+
Source file
+
The source file of this session is located at sessions/forecast-evaluation.qmd.
+
+
+
Libraries used
+
In this session we will use the nfidd package to load a data set of infection times and access stan models and helper functions, the dplyr and package for data wrangling, ggplot2 library for plotting and the scoringutils package for evaluating forecasts.
The best way to interact with the material is via the Visual Editor of RStudio.
+
+
+
+
+
Initialisation
+
We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.
+
+
set.seed(123)
+
+
+
+
+
+
+
+
+
Introduction to forecast evaluation
+
An important aspect of making forecasts is that we can later confront the forecasts with what really happened and use this to assess whether our forecast model makes good predictions, or which of multiple models work best in which situation.
+
+
+
+
+
+
+What do we look for in a good forecast? Some food for thought:
+
+
+
+
+
Calibration: The forecast should be well calibrated. This means that the forecasted probabilities should match the observed frequencies. For example, if the model predicts a 50% probability of an event occurring, then the event should occur approximately 50% of the time.
+
Unbiasedness: The forecast should be unbiased. This means that the average forecasted value should be equal to the average observed value. It shouldn’t consistently over- or underpredict.
+
Accuracy: The forecast should be accurate. This means that the forecasted values should be close to the observed values.
+
Sharpness: As long as the other conditions are fulfilled we want prediction intervals to be as narrow as possible. Predicting that “anything can happen” might be correct but not very useful.
+
+
+
+
+
+
Evaluate a forecasting model
+
We consider forecasts from the model we visualised previously, and explore different metrics for evaluation.
+
+
data(rw_forecasts)
+rw_forecasts |>
+ungroup()
+
+
# A tibble: 224,000 × 7
+ day .draw .variable .value horizon target_day model
+ <dbl> <int> <chr> <dbl> <int> <dbl> <chr>
+ 1 23 1 forecast 4 1 22 Random walk
+ 2 23 2 forecast 2 1 22 Random walk
+ 3 23 3 forecast 2 1 22 Random walk
+ 4 23 4 forecast 6 1 22 Random walk
+ 5 23 5 forecast 2 1 22 Random walk
+ 6 23 6 forecast 3 1 22 Random walk
+ 7 23 7 forecast 5 1 22 Random walk
+ 8 23 8 forecast 2 1 22 Random walk
+ 9 23 9 forecast 3 1 22 Random walk
+10 23 10 forecast 5 1 22 Random walk
+# ℹ 223,990 more rows
+
+
+
+
+
+
+
+
+How did we generate these forecasts?
+
+
+
+
+
+
Some important things to note about these forecasts:
+
+
We used a 14 day forecast horizon.
+
Each forecast used all the data up to the forecast date.
+
We generated 1000 predictive posterior samples for each forecast.
+
We started forecasting 3 weeks into the outbreak and then forecast every 7 days until the end of the data (excluding the last 14 days to allow a full forecast).
+
We use the same simulated outbreak data as before:
We now summarise performance quantitatively by using scoring metrics. Whilst some of these metrics are more useful for comparing models, many can be also be useful for understanding the performance of a single model.
+
+
+
+
+
+
+Tip
+
+
+
+
In this session, we’ll use “proper” scoring rules: these are scoring rules that make sure no model can get better scores than the true model, i.e. the model used to generate the data. Of course we usually don’t know this (as we don’t know the “true model” for real-world data) but proper scoring rules incentivise forecasters to make their best attempt at reproducing its behaviour. For a comprehensive text on proper scoring rules and their mathematical properties, we recommend the classic paper by Gneiting and Raftery (2007).
+
+
+
We will use the {scoringutils} package to calculate these metrics. Our first step is to convert our forecasts into a format that the {scoringutils} package can use. We will use as_forecast_sample() to do this:
+ sample_id predicted observed target_day horizon model
+ <int> <num> <int> <num> <int> <char>
+ 1: 1 4 4 22 1 Random walk
+ 2: 2 2 4 22 1 Random walk
+ 3: 3 2 4 22 1 Random walk
+ 4: 4 6 4 22 1 Random walk
+ 5: 5 2 4 22 1 Random walk
+ ---
+223996: 996 1 2 127 14 Random walk
+223997: 997 0 2 127 14 Random walk
+223998: 998 0 2 127 14 Random walk
+223999: 999 1 2 127 14 Random walk
+224000: 1000 0 2 127 14 Random walk
+
+
+
As you can see this has created a forecast object which has a print method that summarises the forecasts.
+
+
+
+
+
+
+Take 2 minutes
+
+
+
+
What important information is in the forecast object?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
The forecast unit which is the target day, horizon, and model
+
The type of forecast which is a sample forecast
+
+
+
+
+
Everything seems to be in order. We can now use the scoringutils package to calculate some metrics. We will use the default sample metrics (as our forecasts are in sample format) and score our forecasts.
target_day horizon model bias dss crps overprediction
+ <num> <int> <char> <num> <num> <num> <num>
+ 1: 22 1 Random walk -0.304 1.587880 0.623424 0.000
+ 2: 22 2 Random walk 0.828 3.215491 1.664522 1.118
+ 3: 22 3 Random walk 0.002 1.936078 0.574624 0.000
+ 4: 22 4 Random walk 0.687 3.231981 1.570469 0.862
+ 5: 22 5 Random walk 0.767 3.750147 2.172579 1.386
+ ---
+220: 127 10 Random walk -0.821 3.120249 1.745842 0.000
+221: 127 11 Random walk -0.867 3.853982 1.972866 0.000
+222: 127 12 Random walk -0.949 7.803330 3.092738 0.000
+223: 127 13 Random walk -0.914 5.003912 2.318689 0.000
+224: 127 14 Random walk -0.637 1.140452 0.774974 0.000
+ underprediction dispersion log_score mad ae_median se_mean
+ <num> <num> <num> <num> <num> <num>
+ 1: 0.154 0.469424 1.887790 1.4826 1 0.367236
+ 2: 0.000 0.546522 2.231108 2.9652 3 8.479744
+ 3: 0.000 0.574624 1.862234 2.9652 0 0.142884
+ 4: 0.000 0.708469 2.175798 2.9652 3 9.284209
+ 5: 0.000 0.786579 2.489797 2.9652 3 15.327225
+ ---
+220: 1.400 0.345842 2.694333 1.4826 3 5.597956
+221: 1.684 0.288866 2.871442 1.4826 3 6.687396
+222: 2.832 0.260738 4.339301 1.4826 4 13.883076
+223: 2.062 0.256689 3.239324 1.4826 3 8.265625
+224: 0.496 0.278974 1.814469 1.4826 1 0.948676
+
+
+
+
+
+
+
+
+Learning more about the output of score()
+
+
+
+
+
+
See the documentation for get_metrics.forecast_sample()[https://epiforecasts.io/scoringutils/reference/get_metrics.forecast_sample.html] for information on the default metrics for forecasts that are represented as samples (in our case the samples generated by the stan model).
+
+
+
+
+
At a glance
+
Before we look in detail at the scores, we can use summarise_scores to get a quick overview of the scores. Don’t worry if you don’t understand all the scores yet, we will go some of them in more detail in the next section and you can find more information in the {scoringutils} documentation.
Before we look in detail at the scores, what do you think the scores are telling you?
+
+
+
+
+
Continuous ranked probability score
+
+
What is the Continuous Ranked Probability Score (CRPS)?
+
The Continuous Ranked Probability Score (CRPS) is a proper scoring rule used to evaluate the accuracy of probabilistic forecasts. It is a generalization of the Mean Absolute Error (MAE) to probabilistic forecasts, where the forecast is a distribution rather than a single point estimate (i.e. like ours).
+
The CRPS can be thought about as the combination of two key aspects of forecasting: 1. The accuracy of the forecast in terms of how close the predicted values are to the observed value. 2. The confidence of the forecast in terms of the spread of the predicted values.
+
By balancing these two aspects, the CRPS provides a comprehensive measure of the quality of probabilistic forecasts.
+
+
+
+
+
+
+Key things to note about the CRPS
+
+
+
+
+
Small values are better
+
As it is an absolute scoring rule it can be difficult to use to compare forecasts across scales.
+
+
+
+
+
+
+
+
+
+Mathematical Definition (optional)
+
+
+
+
+
+
For distributions with a finite first moment (a mean exists and it is finite), the CRPS can be expressed as:
where \(X\) and \(X'\) are independent random variables sampled from the distribution \(D\). To calculate this we simply replace \(X\) and \(X'\) by samples from our posterior distribution and sum over all possible combinations.
+
This equation can be broke down into the two components:
+
+
Breakdown of the Components
+
+
Expected Absolute Error Between Forecast and Observation: \(\mathbb{E}_{X \sim D}[|X - y|]\) This term represents the average absolute difference between the values predicted by the forecasted distribution \(D\) and the actual observed value \(y\). It measures how far, on average, the forecasted values are from the observed value. A smaller value indicates that the forecasted distribution is closer to the observed value.
+
Expected Absolute Error Between Two Forecasted Values: \(\frac{1}{2} \mathbb{E}_{X, X' \sim D}[|X - X'|]\) This term represents the average absolute difference between two independent samples from the forecasted distribution \(D\). It measures the internal variability or spread of the forecasted distribution. A larger value indicates a wider spread of the forecasted values.
+
+
+
+
Interpretation
+
+
First Term (\(\mathbb{E}_{X \sim D}[|X - y|]\)): This term penalizes the forecast based on how far the predicted values are from the observed value. It ensures that the forecast is accurate in terms of proximity to the actual observation.
+
Second Term (\(\frac{1}{2} \mathbb{E}_{X, X' \sim D}[|X - X'|]\)): This term accounts for the spread of the forecasted distribution. It penalizes forecasts that are too uncertain or have a wide spread. By subtracting this term, the CRPS rewards forecasts that are not only accurate but also confident (i.e., have a narrow spread).
+
+
+
+
+
+
Whilst the CRPS is a very useful metric it can be difficult to interpret in isolation. It is often useful to compare the CRPS of different models or to compare the CRPS of the same model under different conditions. For example, lets compare the CRPS across different forecast horizons.
+
+
sc_scores |>
+summarise_scores(by ="horizon") |>
+ggplot(aes(x = horizon, y = crps)) +
+geom_point() +
+labs(title ="CRPS by daily forecast horizon",
+subtitle ="Summarised across all forecasts")
+
+
+
+
+
+
+
and at different time points.
+
+
sc_scores |>
+summarise_scores(by ="target_day") |>
+ggplot(aes(x = target_day, y = crps)) +
+geom_point() +
+labs(title ="CRPS by forecast start date",
+subtitle ="Summarised across all forecasts", x ="forecast date")
+
+
+
+
+
+
+
+
+
+
+
+
+Take 5 minutes
+
+
+
+
How do the CRPS scores change based on forecast date? How do the CRPS scores change with forecast horizon? What does this tell you about the model?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
The CRPS scores increase for forecast dates where incidence is higher.
+
The CRPS scores increase with forecast horizon.
+
As the CRPS is an absolute measure it is hard to immediately know if the CRPS increasing with forecast date indicates that the model is performing worse.
+
However, the CRPS increasing with forecast horizon is a sign that the model is struggling to capture the longer term dynamics of the epidemic.
+
+
+
+
+
+
+
+
PIT histograms
+
As well as the CRPS we can also look at the calibration and bias of the model. Calibration is the agreement between the forecast probabilities and the observed frequencies. Bias is a measure of how likely the model is to over or under predict the observed values.
+
There are many ways to assess calibration and bias but one common way is to use a probability integral transform (PIT) histogram. This is a histogram of the cumulative distribution of function of a forecast evaluated at the observed value.
+
+
+
+
+
+
+Interpreting the PIT histogram
+
+
+
+
+
Ideally PIT histograms should be uniform.
+
If is a U shape then the model is overconfident and if it is an inverted U shape then the model is underconfident.
+
If it is skewed then the model is biased towards the direction of the skew.
+
+
+
+
+
+
+
+
+
+Mathematical Definition (optional)
+
+
+
+
+
+
Continuous Case
+
For a continuous random variable \(X\) with cumulative distribution function (CDF) \(F_X\), the PIT is defined as:
+
\[
+Y = F_X(X)
+\]
+
where \(Y\) is uniformly distributed on \([0, 1]\).
+
+
Integer Case
+
When dealing with integer forecasts, the standard PIT does not yield a uniform distribution even if the forecasts are perfectly calibrated. To address this, a randomized version of the PIT can be used (Czado, Gneiting, and Held 2009). For an integer-valued random variable \(X\) with CDF \(F_X\), the randomized PIT is defined as:
+
\[
+U = F_X(k) + v \cdot (F_X(k) - F_X(k-1))
+\]
+
where:
+
+
\(k\) is the observed integer value.
+
\(F_X(k)\) is the CDF evaluated at \(k\).
+
\(v\) is a random variable uniformly distributed on \([0, 1]\).
+
+
This transformation ensures that \(U\) is uniformly distributed on \([0, 1]\) if the forecasted distribution \(F_X\) is correctly specified.
+
+
+
+
+
Let’s first look at the overall PIT histogram.
+
+
sc_forecasts |>
+get_pit_histogram() |>
+ggplot(aes(x = mid, y = density)) +
+geom_col() +
+labs(title ="PIT histogram", x ="Quantile", y ="Density")
+
+
+
+
+
+
+
As before lets look at the PIT histogram by forecast horizon. To save space we will group horizons into a few days each:
What do you think of the PIT histograms? Do they look well calibrated? Do they look biased?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
It looks like the model is biased towards overpredicting and that this bias gets worse at longer forecast horizons.
+
Looking over forecast dates it looks like much of this bias is coming from near the outbreak peak where the model is consistently overpredicting but the model is also over predicting at other times.
+
+
+
+
+
+
+
+
Scoring on the log scale
+
We can also score on the logarithmic scale. This can be useful if we are interested in the relative performance of the model at different scales of the data, for example if we are interested in the model’s performance at capturing the exponential growth phase of the epidemic. In some sense scoring in this way can be an approximation of scoring the effective reproduction number estimates. Doing this directly can be difficult as the effective reproduction number is a latent variable and so we cannot directly score it.
+
We again use scoringutils but first transform both the forecasts and observations to the log scale.
Before we look in detail at the scores, what do you think the scores are telling you? How do you think they will differ from the scores on the natural scale?
+
+
+
+
+
CRPS
+
+
log_scores |>
+summarise_scores(by ="horizon") |>
+ggplot(aes(x = horizon, y = crps)) +
+geom_point() +
+labs(title ="CRPS by daily forecast horizon, scored on the log scale")
+
+
+
+
+
+
+
and across different forecast dates
+
+
log_scores |>
+summarise_scores(by ="target_day") |>
+ggplot(aes(x = target_day, y = crps)) +
+geom_point() +
+labs(title ="CRPS by forecast date, scored on the log scale")
+
+
+
+
+
+
+
+
+
+
+
+
+Take 5 minutes
+
+
+
+
How do the CRPS scores change based on forecast date? How do the CRPS scores change with forecast horizon? What does this tell you about the model?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
As for the natural scale CRPS scores increase with forecast horizon but now the increase appears to be linear vs exponential.
+
There has been a reduction in the CRPS scores for forecast dates near the outbreak peak compared to other forecast dates but this is still the period where the model is performing worst.
+
+
+
+
+
+
+
PIT histograms
+
Let’s first look at the overall PIT histogram.
+
+
log_sc_forecasts |>
+get_pit_histogram(by ="model") |>
+ggplot(aes(x = mid, y = density)) +
+geom_col() +
+labs(title ="PIT histogram, scored on the log scale")
+
+
+
+
+
+
+
As before lets look at the PIT histogram by forecast horizon
+
+
log_sc_forecasts |>
+mutate(group_horizon =case_when(
+ horizon <=3~"1-3",
+ horizon <=7~"4-7",
+ horizon <=14~"8-14"
+ )) |>
+get_pit_histogram(by ="group_horizon") |>
+ggplot(aes(x = mid, y = density)) +
+geom_col() +
+facet_wrap(~group_horizon) +
+labs(title ="PIT by forecast horizon, scored on the log scale")
+
+
+
+
+
+
+
and then for different forecast dates.
+
+
log_sc_forecasts |>
+get_pit_histogram(by ="target_day") |>
+ggplot(aes(x = mid, y = density)) +
+geom_col() +
+facet_wrap(~target_day) +
+labs(title ="PIT by forecast date, scored on the log scale")
+
+
+
+
+
+
+
+
+
+
+
+
+Take 5 minutes
+
+
+
+
What do you think of the PIT histograms? Do they look well calibrated? Do they look biased?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
The overall PIT histograms suggest that the model is less biased to over predict when scored on the log scale than the natural scale, but it is still biased. This makes sense when we think back to the comparison of reproduction number estimates and forecasts we made earlier where the model was consistently over predicting on the reproduction number.
+
By forecast horizon the model is still biased towards over predicting but this bias is less pronounced than on the natural scale.
+
Towards the end and beginning of the forecast period the model appears to be well calibrated on the log scale but is biased towards over predicting in the middle of the forecast period.
+
This matches with our knowledge of the underlying reproduction number which were initially constant and then began to decrease only to stabilise towards the end of the outbreak.
+
+
+
+
+
+
+
+
+
Going further
+
+
In which other ways could we summarise the performance of the forecasts?
+
What other metrics could we use?
+
There is no one-size-fits-all approach to forecast evaluation, often you will need to use a combination of metrics to understand the performance of your model and typically the metrics you use will depend on the context of the forecast. What attributes of the forecast are most important to you?
+
There are many other metrics that can be used to evaluate forecasts. The documentation for the {scoringutils} package has a good overview of these metrics and how to use them.
+
One useful way to think about evaluating forecasts is to consider exploring the scores as a data analysis in its own right. For example, you could look at how the scores change over time, how they change with different forecast horizons, or how they change with different models. This can be a useful way to understand the strengths and weaknesses of your model. Explore some of these aspects using the scores from this session.
+
+
+
+
Wrap up
+
+
+
References
+
+
+Bosse, Nikos I., Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, and Sebastian Funk. 2023. “Scoring Epidemiological Forecasts on Transformed Scales.” Edited by James M McCaw. PLOS Computational Biology 19 (8): e1011393. https://doi.org/10.1371/journal.pcbi.1011393.
+
+Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.”Journal of the American Statistical Association 102 (477): 359–78. https://doi.org/10.1198/016214506000001437.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-1.png b/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-1.png
new file mode 100644
index 0000000..4f74765
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-date-1.png b/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-date-1.png
new file mode 100644
index 0000000..6776348
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-date-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-horizon-1.png b/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-horizon-1.png
new file mode 100644
index 0000000..b6da822
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/log-pit-histogram-horizon-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/pit-histogram-1.png b/sessions/forecast-evaluation_files/figure-html/pit-histogram-1.png
new file mode 100644
index 0000000..a0ef623
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/pit-histogram-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/pit-histogram-date-1.png b/sessions/forecast-evaluation_files/figure-html/pit-histogram-date-1.png
new file mode 100644
index 0000000..55001e1
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/pit-histogram-date-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/pit-histogram-horizon-1.png b/sessions/forecast-evaluation_files/figure-html/pit-histogram-horizon-1.png
new file mode 100644
index 0000000..050b5c5
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/pit-histogram-horizon-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-4-1.png b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-4-1.png
new file mode 100644
index 0000000..2e8df5a
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-4-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-5-1.png b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-5-1.png
new file mode 100644
index 0000000..8c7a26c
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-5-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-7-1.png b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-7-1.png
new file mode 100644
index 0000000..9da8ff4
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-8-1.png b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-8-1.png
new file mode 100644
index 0000000..40054af
Binary files /dev/null and b/sessions/forecast-evaluation_files/figure-html/unnamed-chunk-8-1.png differ
diff --git a/sessions/forecast-visualisation.html b/sessions/forecast-visualisation.html
new file mode 100644
index 0000000..cd623b6
--- /dev/null
+++ b/sessions/forecast-visualisation.html
@@ -0,0 +1,1421 @@
+
+
+
+
+
+
+
+
+
+Visualising infectious disease forecasts – UEIFID
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Epidemiological forecasts are probabilistic statements about what might happen to population disease burden in the future. In this session we will make some simple forecasts using a commonly used infectious disease model, based on the renewal equation. We will see how we can visualise such forecasts, and visually compare them to what really happened.
The aim of this session is to introduce the concept of forecasting and forecast visualisation using a simple model.
+
+
+
+
+
+
+Setup
+
+
+
+
+
+
+
Source file
+
The source file of this session is located at sessions/forecasting-visualisation.qmd.
+
+
+
Libraries used
+
In this session we will use the nfidd package to load a data set of infection times and corresponding forecasts, the dplyr package for data wrangling, and the ggplot2 library for plotting.
The best way to interact with the material is via the Visual Editor of RStudio.
+
+
+
+
+
Initialisation
+
We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.
+
+
set.seed(123)
+
+
+
+
+
+
+
+
+
What is forecasting?
+
Forecasting is the process of making predictions about the future based on past and present data. In the context of infectious disease epidemiology, forecasting is usually the process of predicting the future course of some metric of infectious disease incidence or prevalence based on past and present data. Here we focus on forecasting observed data (the number of individuals with new symptom onset) but forecasts can also be made for other quantities of interest such as the number of infections, the reproduction number, or the number of deaths. Epidemiological forecasting is closely related to nowcasting and, when using mechanistic approaches, estimation of the reproduction number. In fact, the model we will use for forecasting could also be used for real-time estimation of the reproduction number and nowcasting, two tasks we won’t have time to go into here but which are discussed in the longer version of this course. The only difference to these tasks is that here we extend the model into the future.
+
+
+
Extending a model into the future
+
We will use a data set of infections we generate using functions from the nfidd R package, and look at forecasts made using a renewal equation model. This model assumes that number infectious people determines the number of future infectious people via the reproduction number \(R_t\).
+
+
+
The reproduction number \(R_t\)
+
The reproduction number \(R_t\) (sometimes called the effective reproduction number) more generally describes the average number of secondary infections caused by a single infectious individual and can vary in time and space as a function of differences in population level susceptibility, changes in behaviour, policy, seasonality etc. The reproduction number \(R_t\) is therefore a more general concept than the basic reproduction number \(R_0\) which represents the average number of secondary infections caused by a single infectious individual in a completely susceptible population.
+
+
+
+
+
+
+
+The discrete renewal equation (optional)
+
+
+
+
+
+
The renewal equation represents a general model of infectious disease transmission which includes the SIR model as a special case. In its basic form it makes no assumption about the specific processes that cause \(R_t\) to have a certain value and/or change over time, but instead it only relates the number of infected people in the population, the current value of the reproduction number and a delay distribution that represents the timings of when individuals infect others relative to when they themselves became infected, the so-called generation time. Mathematically, it can be written as
Here, \(I_t\) is the number of infected individuals on day \(t\), \(R_t\) is the current value of the reproduction number and \(g_i\) is the probability of a secondary infection occurring \(i\) days after the infector became infected themselves, with a maximum \(g_\mathrm{max}\).
+
The forecasting model we visualise in this session is based on \(R_t\) doing a geometric random walk in time. The step size of this geometric random walk is estimated from data, and this is then used to project into the future. For more information on this approach to forecasting see, for example, Funk et al. (2018).
+
+
+
+
+
+
+The geometric random walk model (optional)
+
+
+
+
+
+
We might expect the reproduction number to change smoothly over time (except in situations of drastic change such as a very effective intervention) and to be similar at adjacent time points. We can model this by assuming that the reproduction number at time \(t\) is a random draw from a normal distribution with mean equal to the reproduction number at time \(t-1\) and some standard deviation \(\sigma\). This can be described as a random walk model for the reproduction number. In fact, rather than using this model directly, a better choice might be to use a model where the logarithm of the reproduction number does a random walk, as this will ensure that the reproduction number is always positive and that changes are multiplicative rather than additive (i.e as otherwise the same absolute change in the reproduction number would have a larger effect when the reproduction number is small which likely doesn’t match your intuition for how outbreaks evolve over time). We can write this model as
Here we have placed a prior on the standard deviation of the random walk, which we have assumed to be half-normal (i.e., normal but restricted to being non-negative) with a mean of 0 and a standard deviation of 0.05. This is a so-called weakly informative prior that allows for some variation in the reproduction number over time but not an unrealistic amount. We have also placed a prior on the initial reproduction number, which we have assumed to be log-normally distributed with a mean of -0.1 and a standard deviation of 0.5. This is a weakly informative prior that allows for a wide range of initial reproduction numbers but has a mean of approximately 1. The last line is the geometric random walk.
+
+
Simulating a geometric random walk
+
You can have a look at an R function for performing the geometric random walk:
+
+
geometric_random_walk
+
+
function (init, noise, std)
+{
+ n <- length(noise) + 1
+ x <- numeric(n)
+ x[1] <- init
+ for (i in 2:n) {
+ x[i] <- x[i - 1] + noise[i - 1] * std
+ }
+ return(exp(x))
+}
+<bytecode: 0x556748ce16a8>
+<environment: namespace:nfidd>
+
+
+
We can use this function to simulate a random walk:
+
+
R <-geometric_random_walk(init =1, noise =rnorm(100), std =0.1)
+data <-tibble(t =seq_along(R), R =exp(R))
+
+ggplot(data, aes(x = t, y = R)) +
+geom_line() +
+labs(title ="Simulated data from a random walk model",
+x ="Time",
+y ="R")
+
+
+
+
+
+
+
You can repeat this multiple times either with the same parameters or changing some to get a feeling for what this does.
+
+
+
+
+
+
+
+
We will now look at some data from an infectious disease outbreak and forecasts from a renewal equation model. First, we simulate some data using functions from the nfidd package.
This uses a data set of infection times that is included in the R package, and a function to turn these into daily number of observed symptom onsets. Do not worry too much about the details of this. The important thing is that we now have a data set onset_df of symptom onset.
+
+
+
+
+
+
+How does this work?
+
+
+
+
+
+
If you want to see how this works look at the function code by typing simulate_onsets in the console.
+
+
+
+
+
Visualising the forecast
+
We can now visualise a forecast made from the renewal equation model we used for forecasting. Once again, this forecast is provided in the nfidd package which we loaded earlier. We will first extract the forecast and then plot the forecasted number of symptom onsets alongside the observed number of symptom onsets before the forecast was made.
Some important things to note about these forecasts:
+
+
We used a 14 day forecast horizon.
+
Each forecast used all the data up to the forecast date.
+
We generated 1000 predictive posterior samples for each forecast.
+
We started forecasting 3 weeks into the outbreak and then forecast every 7 days until the end of the data (excluding the last 14 days to allow a full forecast).
+
We use the same simulated outbreak data as before:
In this plot, the dots show the data and the lines are forecast trajectories that the model deems plausible and consistent with the data so far.
+
+
+
+
+
+
+Take 5 minutes
+
+
+
+
What do you think of this forecast? Does it match your intuition of how this outbreak will continue? Is there another way you could visualise the forecast that might be more informative?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
The forecast mainly predicts things to continue growing as they have been. However, some of the trajectories are going down, indicating that with some probabilities the outbreak might end.
+
Based purely on the data and without knowing anything else about the disease and setting, it is hard to come up with an alternative. The model seems sensible given the available data. In particular, uncertainty increases with increasing distance to the data, which is a sign of a good forecasting model.
+
Instead of visualising plausible trajectories we could visualise a cone with a central forecast and uncertainty around it. We will look at this in the next session as an alternative.
+
+
+
+
+
If we want to know how well a model is doing at forecasting, we can later compare it do the data of what really happened. If we do this, we get the following plot.
What do you think now of this forecast? Did the model do a good job? Again, is there another way you could visualise the forecast that might be more informative?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
+
On the face of it the forecast looks very poor with some very high predictions compared to the data.
+
Based on this visualisation it is hard to tell if the model is doing a good job but it seems like it is not.
+
As outbreaks are generally considered to be exponential processes it might be more informative to plot the forecast on the log scale.
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
+
+
+
+
+
+
+
+
This should be a lot more informative. We see that for longer forecast horizons the model is not doing a great job of capturing the reduction in symptom onsets. However, we can now see that the model seems to be producing very reasonable forecasts for the first week or so of the forecast. This is a common pattern in forecasting where a model is good at capturing the short term dynamics but struggles with the longer term dynamics.
+
+
+
+
We managed to learn quite a lot about our model’s forecasting limitations just looking at a single forecast using visualisations. However, what if we wanted to quantify how well the model is doing? In order to do that we need to look at multiple forecasts from the model.
+
+
+
Visualising multiple forecasts from a single model
+
As for a single forecast, our first step is to visualise the forecasts as this can give us a good idea of how well the model is doing without having to calculate any metrics.
+
+
rw_forecasts |>
+filter(.draw %in%sample(.draw, 100)) |>
+ggplot(aes(x = day)) +
+geom_line(
+aes(y = .value, group =interaction(.draw, target_day), col = target_day),
+alpha =0.1
+ ) +
+geom_point(
+data = onset_df |>
+filter(day >=21),
+aes(x = day, y = onsets), color ="black") +
+scale_color_binned(type ="viridis") +
+labs(title ="Weekly forecasts of symptom onsets over an outbreak",
+col ="Forecast start day")
+
+
+
+
+
+
+
As for the single forecast it may be helpful to also plot the forecast on the log scale.
What do you think of these forecasts? Are they any good? How well do they capture changes in trend? Does the uncertainty seem reasonable? Do they seem to under or over predict consistently? Would you visualise the forecast in a different way?
+
+
+
+
+
+
+
+
+Solution
+
+
+
+
+
+
What do you think of these forecasts?
+
+
We think these forecasts are a reasonable place to start but there is definitely room for improvement.
+
+
Are they any good?
+
+
They seem to do a reasonable job of capturing the short term dynamics but struggle with the longer term dynamics.
+
+
How well do they capture changes in trend?
+
+
There is little evidence of the model capturing the reduction in onsets before it begins to show in the data.
+
+
Does the uncertainty seem reasonable?
+
+
On the natural scale it looks like the model often over predicts. Things seem more balanced on the log scale but the model still seems to be overly uncertain.
+
+
Do they seem to under or over predict consistently?
+
+
It looks like the model is consistently over predicting on the natural scale but this is less clear on the log scale.
+
+
+
+
+
+
+
Going further
+
+
What other ways could we visualise the forecasts especially if we had forecasts from multiple models and locations or both?
+
Based on these visualisations how could we improve the model?
+
+
+
+
Wrap up
+
+
+
References
+
+
+Funk, Sebastian, Anton Camacho, Adam J. Kucharski, Rosalind M. Eggo, and W. John Edmunds. 2018. “Real-Time Forecasting of Infectious Disease Dynamics with a Stochastic Semi-Mechanistic Model.”Epidemics 22 (March): 56–61. https://doi.org/10.1016/j.epidem.2016.11.003.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/sessions/forecast-visualisation_files/figure-html/plot-all-forecasts-1.png b/sessions/forecast-visualisation_files/figure-html/plot-all-forecasts-1.png
new file mode 100644
index 0000000..57d482d
Binary files /dev/null and b/sessions/forecast-visualisation_files/figure-html/plot-all-forecasts-1.png differ
diff --git a/sessions/forecast-visualisation_files/figure-html/plot-all-forecasts-log-1.png b/sessions/forecast-visualisation_files/figure-html/plot-all-forecasts-log-1.png
new file mode 100644
index 0000000..e7b6777
Binary files /dev/null and b/sessions/forecast-visualisation_files/figure-html/plot-all-forecasts-log-1.png differ
diff --git a/sessions/forecast-visualisation_files/figure-html/plot_forecast-1.png b/sessions/forecast-visualisation_files/figure-html/plot_forecast-1.png
new file mode 100644
index 0000000..cfe1a4d
Binary files /dev/null and b/sessions/forecast-visualisation_files/figure-html/plot_forecast-1.png differ
diff --git a/sessions/forecast-visualisation_files/figure-html/plot_forecast_log-1.png b/sessions/forecast-visualisation_files/figure-html/plot_forecast_log-1.png
new file mode 100644
index 0000000..2406419
Binary files /dev/null and b/sessions/forecast-visualisation_files/figure-html/plot_forecast_log-1.png differ
diff --git a/sessions/forecast-visualisation_files/figure-html/plot_future_forecast-1.png b/sessions/forecast-visualisation_files/figure-html/plot_future_forecast-1.png
new file mode 100644
index 0000000..2914fed
Binary files /dev/null and b/sessions/forecast-visualisation_files/figure-html/plot_future_forecast-1.png differ
diff --git a/sessions/forecast-visualisation_files/figure-html/simulate-geometric-walk-1.png b/sessions/forecast-visualisation_files/figure-html/simulate-geometric-walk-1.png
new file mode 100644
index 0000000..613a44c
Binary files /dev/null and b/sessions/forecast-visualisation_files/figure-html/simulate-geometric-walk-1.png differ
diff --git a/sessions/introduction-and-course-overview.html b/sessions/introduction-and-course-overview.html
new file mode 100644
index 0000000..4984b81
--- /dev/null
+++ b/sessions/introduction-and-course-overview.html
@@ -0,0 +1,864 @@
+
+
+
+
+
+
+
+
+
+Introduction and course overview – UEIFID
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Visualize and interpret infectious disease forecasts
+
Evaluate forecast performance and limitations
+
Combine multiple forecasts into ensembles
+
+
Predictive modeling has become crucial for public health decision-making and pandemic preparedness. Through this workshop, we aim to make forecast interpretation and evaluation more accessible to academics and public health institutions.
+
+
+
Approach
+
Each session in the course:
+
+
builds on the previous one so that participants will have an overview of forecast evaluation by the end of the course;
+
starts with a short introductory talk;
+
mainly consists of interactive content that participants will work through;
+
has optional/additional material that can be skipped or completed after the course ends;
+
+
For those attending the in-person version the course also:
+
+
has multiple instructors ready to answer questions about this content; if several people have a similar question we may pause the session and discuss it with the group;
+
ends with a wrap-up and discussion where we review the sessions material.
+
+
+
+
Timeline for the course
+
The course consists of four main sessions spread across an afternoon. If you are attending the in-person version of the course, the schedule is available here. If you are studying this material on your own using the website, you can go through it at your own pace.