Commit 0b01113: Update vignette
nikosbosse authored and seabbs committed Dec 19, 2023
Showing 1 changed file with 47 additions and 73 deletions: vignettes/scoringutils.Rmd

## Constructing a forecast object

The function `as_forecast()` is used to construct a forecast object and validate the input data. It determines the forecast type and creates an object of the appropriate class (`forecast_binary`, `forecast_quantile` or `forecast_sample`). Objects of class `forecast_*` have a print method that provides additional information.

```{r}
head(example_quantile)
```

```{r}
forecast_quantile <- example_quantile %>%
  set_forecast_unit(
    c("model", "location", "target_end_date", "forecast_date",
      "target_type", "horizon")
  ) %>%
  as_forecast()
forecast_quantile
```

If you are unsure what your input data should look like, have a look at the `example_quantile`, `example_integer`, `example_continuous` and `example_binary` data sets provided in the package.
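
The same constructor works for all of them; for instance, a minimal sketch using the bundled `example_binary` data set:

```{r}
# as_forecast() infers the forecast type from the data,
# here creating a forecast_binary object
as_forecast(example_binary)
```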

The output of `as_forecast()` can later be used directly as input to `score()`. This is the recommended workflow (however, `score()` will run `as_forecast()` internally if this hasn't happened before).
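
A minimal sketch of the two options, assuming the objects created above:

```{r, eval = FALSE}
# recommended: validate explicitly, then score
score(forecast_quantile)

# also works: score() runs as_forecast() on the raw data internally
score(example_quantile)
```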

Note that in the above example some columns contain duplicated information with regard to the forecast unit, e.g. "location" and "location_name", and can be dropped. If we drop essential information, for example the "target_type" column, we'll get an error informing us that the forecasts are no longer uniquely identified.

```{r, error=TRUE}
example_quantile %>%
  set_forecast_unit(
    c("location", "target_end_date",
      "forecast_date", "model", "horizon")
  ) %>%
  as_forecast()
```

The function `get_duplicate_forecasts()` may help to investigate the issue. When filtering for only a single quantile of the EuroCOVIDhub-ensemble, we can see that there are indeed two forecasts for every date, location and horizon.

```{r}
duplicates <- example_quantile %>%
  set_forecast_unit(
    c("location", "target_end_date",
      "forecast_date", "model", "horizon")
  ) %>%
  get_duplicate_forecasts()
duplicates[quantile == 0.5 & model == "EuroCOVIDhub-ensemble", ] %>%
  head()
```
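
Adding the "target_type" column back to the forecast unit resolves the issue; a quick sketch to confirm that no duplicates remain:

```{r}
# with "target_type" restored, each forecast is unique again
example_quantile %>%
  set_forecast_unit(
    c("location", "target_end_date", "forecast_date",
      "model", "horizon", "target_type")
  ) %>%
  get_duplicate_forecasts() %>%
  nrow()
```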


## Showing available forecasts

The function `get_forecast_counts()` may also be helpful to determine where forecasts are available. Using the `by` argument you can specify the level of summary. For example, to see how many forecasts there are per model and `target_type`, we can run:

```{r}
get_forecast_counts(forecast_quantile, by = c("model", "target_type"))
```

We see that 'epiforecasts-EpiNow2' has some missing forecasts for the deaths forecast target and that UMass-MechBayes has no case forecasts.
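
Coarser summaries work the same way; for example, a sketch counting all forecasts per model:

```{r}
# one row per model, aggregating over targets, dates and locations
get_forecast_counts(forecast_quantile, by = "model")
```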

This information can also be visualised using `plot_forecast_counts()`:

```{r, fig.width=11, fig.height=6}
forecast_quantile %>%
  get_forecast_counts(by = c("model", "forecast_date", "target_type")) %>%
  plot_forecast_counts(x = "forecast_date") +
  facet_wrap(~ target_type)
```
You can also visualise forecasts directly using the `plot_predictions()` function:

```{r, fig.width = 9, fig.height = 6}
forecast_quantile %>%
  make_NA(
    what = "truth",
    target_end_date >= "2021-07-15"
  ) %>%
  # assumed minimal completion of the truncated chunk:
  # mask recent truth data, then plot forecasts against observations
  plot_predictions(x = "target_end_date") +
  facet_wrap(~ target_type)
```

Forecasts can easily be scored using the `score()` function. For clarity, we suggest setting the forecast unit and calling `as_forecast()` explicitly.

```{r}
scores <- forecast_quantile %>%
  score()
print(scores, 2)
```

The function `score()` returns unsummarised scores, which in most cases is not what the user wants. It returns a single score per forecast (as determined by the forecast unit).

A second function, `summarise_scores()`, takes care of summarising these scores to the level specified by the user. The `by` argument can be used to define the level of summary. By default, `by = NULL` and the summary unit is assumed to be the same as the unit of a single forecast (meaning that nothing happens if `by` isn't specified).

```{r}
scores <- score(example_continuous)
all(scores == summarise_scores(scores), na.rm = TRUE)
```
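
To actually aggregate scores, pass the relevant columns to `by`; a sketch summarising to one value per model:

```{r}
# average scores per model (mean is the default summary function)
summarise_scores(scores, by = "model")
```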


Through the `by` argument we can specify what unit of summary we want. We can also call `summarise_scores()` multiple times, e.g. to round the outputs by specifying `signif()` as a summary function.

```{r}
# an assumed sketch: summarise by model and target type, then round the
# scores by passing signif() as the summary function in a second call
scores %>%
  summarise_scores(by = c("model", "target_type")) %>%
  summarise_scores(fun = signif, digits = 2)
```

Relative scores for different models can be computed using pairwise comparisons.
In `scoringutils`, pairwise comparisons can be made in two ways: through the standalone function `pairwise_comparison()`, or by calling `add_pairwise_comparison()` on an existing set of summarised scores.

```{r}
forecast_quantile %>%
  score() %>%
  pairwise_comparison(by = "model", baseline = "EuroCOVIDhub-baseline")
```

```{r}
forecast_quantile %>%
  score() %>%
  summarise_scores(by = "model") %>%
  add_pairwise_comparison(baseline = "EuroCOVIDhub-baseline")
```
If using the `pairwise_comparison()` function, we can also visualise pairwise comparisons by showing the mean score ratios between models. By default, smaller values are better; the model we care about is shown on the y-axis on the left, while the model it is compared against is shown on the x-axis at the bottom. In the example above, the EuroCOVIDhub-ensemble performs best (it only has values smaller than 1), while the EuroCOVIDhub-baseline performs worst (it only has values larger than 1). For cases, the UMass-MechBayes model is of course excluded, as there are no case forecasts available and the set of overlapping forecasts is therefore empty.

```{r, fig.width=9, fig.height=7}
forecast_quantile %>%
  score() %>%
  pairwise_comparison(by = c("model", "target_type")) %>%
  plot_pairwise_comparison() +
  # assumed completion of the truncated chunk: facet by target type
  facet_wrap(~ target_type)
```
It may sometimes be interesting to see how different scores correlate with each other. We can examine this using the function `correlation()`. When dealing with quantile-based forecasts, it is important to call `summarise_scores()` before `correlation()` to summarise over quantiles before computing correlations.

```{r}
forecast_quantile %>%
  score() %>%
  summarise_scores() %>%
  correlation()
```
Visualising correlations:

```{r}
forecast_quantile %>%
  score() %>%
  summarise_scores() %>%
  correlation(digits = 2) %>%
  # assumed completion of the truncated chunk
  plot_correlation()
```
<!-- If you would like to see how different forecast interval ranges contribute to average scores, you can visualise scores by interval range: -->

```{r, eval = FALSE, include = FALSE}
forecast_quantile %>%
  score() %>%
  summarise_scores(by = c("model", "range", "target_type")) %>%
  plot_ranges() +
  # assumed completion of the truncated chunk
  facet_wrap(~ target_type, scales = "free")
```
