
Rework add coverage to work with raw forecasts #426

Merged (8 commits, Nov 15, 2023)
1 change: 1 addition & 0 deletions NAMESPACE
@@ -108,6 +108,7 @@ importFrom(data.table,nafill)
importFrom(data.table,rbindlist)
importFrom(data.table,setDT)
importFrom(data.table,setattr)
importFrom(data.table,setcolorder)
importFrom(data.table,setnames)
importFrom(ggdist,geom_lineribbon)
importFrom(ggplot2,.data)
1 change: 1 addition & 0 deletions NEWS.md
@@ -22,6 +22,7 @@ The update introduces a lot of breaking changes. If you want to keep using the o
- `quantile`: numeric, a vector with quantile-levels. Can alternatively be a matrix of the same shape as `predicted`.
- `check_forecasts()` was replaced by a new function `validate()`. `validate()` validates the input and in that sense fulfills the purpose of `check_forecasts()`. It has different methods: `validate.default()` assigns the input a class based on its forecast type. Other methods validate the input specifically for the various forecast types.
- The functionality for computing pairwise comparisons has been split out of `summarise_scores()`. Instead of doing pairwise comparisons as part of summarising scores, a new function, `add_pairwise_comparison()`, takes summarised scores as input and adds pairwise comparisons to them.
- `add_coverage()` was reworked completely. Its new purpose is to add coverage information to the raw forecast data (essentially fulfilling some of the functionality that was previously covered by `score_quantile()`).
- The function `find_duplicates()` was renamed to `get_duplicate_forecasts()`.
- Changes to `avail_forecasts()` and `plot_avail_forecasts()`:
- The function `avail_forecasts()` was renamed to `available_forecasts()` for consistency with `available_metrics()`. The old function, `avail_forecasts()` is still available as an alias, but will be removed in the future.
83 changes: 83 additions & 0 deletions R/add_coverage.R
@@ -0,0 +1,83 @@
#' @title Add Coverage Values to Quantile-Based Forecasts
#'
#' @description Adds interval coverage of central prediction intervals,
#' quantile coverage for predictive quantiles, as well as the deviation between
#' desired and actual coverage to a data.table. Forecasts should be in a
#' quantile format (following the input requirements of `score()`).
#'
#' **Interval coverage**
#'
#' Coverage for a given interval range is defined as the proportion of
#' observations that fall within the corresponding central prediction intervals.
#' Central prediction intervals are symmetric around the median and are formed
#' by two quantiles that denote the lower and upper bound. For example, the 50%
#' central prediction interval is the interval between the 0.25 and 0.75
#' quantiles of the predictive distribution.
#'
#' The function `add_coverage()` computes the coverage per central prediction
#' interval, so the coverage will always be either `TRUE` (observed value falls
#' within the interval) or `FALSE` (observed value falls outside the interval).
#' You can summarise the coverage values to get the proportion of observations
#' that fall within the central prediction intervals.
#'
#' **Quantile coverage**
#'
#' Quantile coverage for a given quantile is defined as the proportion of
#' observed values that are smaller than the corresponding predictive quantile.
#' For example, the 0.5 quantile coverage is the proportion of observed values
#' that are smaller than the 0.5 quantile of the predictive distribution.
#'
#' **Coverage deviation**
#'
#' The coverage deviation is the difference between the desired coverage and the
#' actual coverage. For example, if the desired coverage is 90% and the actual
#' coverage is 80%, the coverage deviation is -0.1.
#'
#' @inheritParams score
#' @return a data.table with the input and columns "interval_coverage",
#' "interval_coverage_deviation", "quantile_coverage",
#' "quantile_coverage_deviation" added.
#' @importFrom data.table setcolorder
#' @examples
#' library(magrittr) # pipe operator
#' example_quantile %>%
#' add_coverage()
#' @export
#' @keywords scoring
add_coverage <- function(data) {
stored_attributes <- get_scoringutils_attributes(data)
data <- validate(data)
forecast_unit <- get_forecast_unit(data)
data_cols <- colnames(data) # store so we can reset column order later

# what happens if quantiles are not symmetric around the median?
# should things error? Also write tests for that.
interval_data <- quantile_to_interval(data, format = "wide")
  interval_data[, interval_coverage := (observed <= upper & observed >= lower)
  ][, c("lower", "upper", "observed") := NULL]

data[, range := get_range_from_quantile(quantile)]

data <- merge(interval_data, data, by = unique(c(forecast_unit, "range")))
data[, interval_coverage_deviation := interval_coverage - range / 100]
data[, quantile_coverage := observed <= predicted]
data[, quantile_coverage_deviation := quantile_coverage - quantile]

# reset column order
  new_metrics <- c("interval_coverage", "interval_coverage_deviation",
                   "quantile_coverage", "quantile_coverage_deviation")

Review comment (Contributor): can you use the metrics functions vs duplicating here to make the code cleaner?

Reply (Contributor Author): At the moment not really - the metrics functions do a many-to-one mapping (i.e. you get one coverage value per forecast, which comprises several quantiles), whereas this is one-to-one, i.e. one value per quantile / interval. We need to think again about how to handle one-to-one functions that exist currently and what to do with them: #451
setcolorder(data, unique(c(data_cols, "range", new_metrics)))

# add coverage "metrics" to list of stored metrics
# this makes it possible to use `summarise_scores()` later on
stored_attributes[["metric_names"]] <- c(
stored_attributes[["metric_names"]],
new_metrics
)
data <- assign_attributes(data, stored_attributes)
return(data[])
}
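The coverage definitions documented above can be checked with a small numeric sketch. Python is used here purely for illustration; the data and values are made up and this is not scoringutils code:

```python
# One toy quantile forecast: predictive quantiles at levels 0.25, 0.5, 0.75.
quantile_levels = [0.25, 0.5, 0.75]
predicted = [10.0, 15.0, 20.0]
observed = 12.0

# Quantile coverage: is the observed value <= the predictive quantile?
quantile_coverage = [observed <= q for q in predicted]

# Quantile coverage deviation: actual coverage (0/1) minus the nominal level.
quantile_coverage_deviation = [
    int(cov) - level for cov, level in zip(quantile_coverage, quantile_levels)
]

# Interval coverage for the 50% central interval (0.25 to 0.75 quantiles).
lower, upper = predicted[0], predicted[2]
interval_coverage = lower <= observed <= upper

# Interval coverage deviation: actual coverage (0/1) minus 50 / 100.
interval_coverage_deviation = int(interval_coverage) - 50 / 100
```

Summarising the 0/1 coverage values over many forecasts then gives the empirical coverage proportions that `summarise_scores()` reports.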
2 changes: 2 additions & 0 deletions R/get_-functions.R
@@ -193,6 +193,8 @@ get_protected_columns <- function(data = NULL) {
protected_columns <- c(
"predicted", "observed", "sample_id", "quantile", "upper", "lower",
"pit_value", "range", "boundary", "relative_skill", "scaled_rel_skill",
"interval_coverage", "interval_coverage_deviation",
"quantile_coverage", "quantile_coverage_deviation",
available_metrics(),
grep("coverage_", names(data), fixed = TRUE, value = TRUE)
)
2 changes: 1 addition & 1 deletion R/plot.R
@@ -470,7 +470,7 @@
# it separately here to deal with the case when only the median is provided
# (in which case ggdist::geom_lineribbon() will fail)
if (0 %in% range) {
    select_median <- (forecasts$range == 0 & forecasts$boundary == "lower")
median <- forecasts[select_median]

if (nrow(median) > 0) {
@@ -613,7 +613,7 @@
colour = "grey",
linetype = "dashed"
) +
geom_line(aes(y = coverage * 100)) +
geom_line(aes(y = interval_coverage * 100)) +
theme_scoringutils() +
ylab("% Obs inside interval") +
xlab("Nominal interval coverage") +
@@ -968,7 +968,7 @@
#' ) +
#' facet_wrap("target_type")

plot.scoringutils_available_forecasts <- function(x,

yvar = "model",
xvar = "forecast_date",
make_xvar_factor = TRUE,
75 changes: 0 additions & 75 deletions R/summarise_scores.R
@@ -111,8 +111,8 @@
stored_attributes <- c(
get_scoringutils_attributes(scores),
list(
      scoringutils_by = by,
      unsummarised_scores = scores
)
)

@@ -306,78 +306,3 @@
}
return(relative_skill)
}



#' @title Add coverage of central prediction intervals
#'
#' @description Adds a column with the coverage of central prediction intervals
#' to unsummarised scores as produced by [score()]
#'
#' The coverage values that are added are computed according to the values
#' specified in `by`. If, for example, `by = "model"`, then there will be one
#' coverage value for every model and [add_coverage()] will compute the coverage
#' for every model across the values present in all other columns which define
#' the unit of a single forecast.
#'
#' @inheritParams summarise_scores
#' @param by character vector with column names to add the coverage for.
#' @param ranges numeric vector of the ranges of the central prediction intervals
#' for which coverage values shall be added.
#' @return a data.table with unsummarised scores with columns added for the
#' coverage of the central prediction intervals. While the overall data.table
#' is still unsummarised, note that for the coverage columns some level of
#' summary is present according to the value specified in `by`.
#' @examples
#' library(magrittr) # pipe operator
#' score(example_quantile) %>%
#' # add_coverage(by = c("model", "target_type")) %>%
#' summarise_scores(by = c("model", "target_type")) %>%
#' summarise_scores(fun = signif, digits = 2)
#' @export
#' @keywords scoring

add_coverage <- function(scores,
by = NULL,
ranges = c(50, 90)) {

stored_attributes <- get_scoringutils_attributes(scores)
if (!is.null(attr(scores, "unsummarised_scores"))) {
scores <- attr(scores, "unsummarised_scores")
}

if (is.null(by) && !is.null(stored_attributes[["scoringutils_by"]])) {
by <- stored_attributes[["scoringutils_by"]]
} else if (is.null(by)) {
# Need to check this again.
# (mentioned in https://github.com/epiforecasts/scoringutils/issues/346)
by <- get_forecast_unit(scores)
}

summarised_scores <- summarise_scores(
scores,
by = c(by, "range")
)[range %in% ranges]


# create cast formula
cast_formula <-
paste(
paste(by, collapse = "+"),
"~",
"paste0('coverage_', range)"
)

coverages <- dcast(
summarised_scores,
value.var = "coverage",
formula = cast_formula
)

scores_with_coverage <- merge(scores, coverages, by = by)
scores_with_coverage <- assign_attributes(
scores_with_coverage, stored_attributes
)

return(scores_with_coverage[])
}
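For contrast with the reworked one-to-one version, the removed function's many-to-one behaviour - one summarised coverage value per group defined by `by` - can be sketched as a plain group-by mean. This is a hypothetical Python illustration with toy values, not the package implementation:

```python
from collections import defaultdict

# Toy per-forecast coverage indicators for the 50% interval (made-up values).
scores = [
    {"model": "A", "coverage": 1},
    {"model": "A", "coverage": 0},
    {"model": "B", "coverage": 1},
    {"model": "B", "coverage": 1},
]

# Old add_coverage(by = "model"): average the 0/1 indicators per model, then
# merge the result back onto the unsummarised scores as a coverage_50 column.
grouped = defaultdict(list)
for row in scores:
    grouped[row["model"]].append(row["coverage"])
coverage_50 = {model: sum(v) / len(v) for model, v in grouped.items()}
```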
4 changes: 4 additions & 0 deletions R/utils_data_handling.R
Original file line number Diff line number Diff line change
@@ -70,7 +70,7 @@
basenames_overlap <- sub(".x$", "", overlap_names)

# delete overlapping columns
  if (length(basenames_overlap) > 0) {
combined[, paste0(basenames_overlap, ".x") := NULL]
combined[, paste0(basenames_overlap, ".y") := NULL]
}
@@ -101,7 +101,7 @@
sample_to_quantile <- function(data,
quantiles = c(0.05, 0.25, 0.5, 0.75, 0.95),
type = 7) {
  if (is.data.table(data)) {
    data <- copy(data)
  } else {
    data <- data.table::as.data.table(data)
@@ -208,7 +208,7 @@
format = "long",
keep_quantile_col = FALSE,
...) {
  if (!is.data.table(dt)) {
    dt <- data.table::as.data.table(dt)
} else {
# use copy to avoid
@@ -230,6 +230,10 @@
if (format == "wide") {
delete_columns(dt, "quantile")
dt <- dcast(dt, ... ~ boundary, value.var = "predicted")
# if there are NA values in `predicted`, this introduces a column "NA"
if ("NA" %in% colnames(dt) && all(is.na(dt[["NA"]]))) {
dt[, "NA" := NULL]
}
}
return(dt[])
}
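The wide interval format produced here pairs each quantile level with a central-interval range and a boundary. That pairing can be sketched as follows; this is an assumed reconstruction for illustration (in Python), not the package source, and treating the median as a "lower" boundary of the degenerate 0% interval is an assumption:

```python
def quantile_to_range(q):
    # A quantile at level q bounds the central interval with nominal
    # coverage |1 - 2q|, e.g. q = 0.25 and q = 0.75 both bound the
    # 50% central prediction interval.
    return round(abs(1 - 2 * q) * 100)

def quantile_to_boundary(q):
    # Assumed convention: the median (q = 0.5) counts as a "lower" boundary.
    return "lower" if q <= 0.5 else "upper"

mapping = {
    q: (quantile_to_range(q), quantile_to_boundary(q))
    for q in [0.05, 0.25, 0.5, 0.75, 0.95]
}
```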
3 changes: 3 additions & 0 deletions R/z_globalVariables.R
@@ -30,9 +30,12 @@ globalVariables(c(
"identifCol",
"Interval_Score",
"interval_range",
"interval_coverage",
"interval_coverage_deviation",
"overprediction",
"underprediction",
"quantile_coverage",
"quantile_coverage_deviation",
"LogS",
"log_score",
"lower",
2 changes: 1 addition & 1 deletion README.Rmd
@@ -91,8 +91,8 @@ Forecasts can be easily and quickly scored using the `score()` function. `score(
example_quantile %>%
set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
validate() %>%
  add_coverage() %>%

Review comment (Contributor): if it's in the default metrics list what does this actually do? Is it going to be clear to users why it isn't part of score?

Reply (Contributor Author): Updating the readme is on the list: #439
score() %>%
add_coverage(ranges = c(50, 90), by = c("model", "target_type")) %>%
summarise_scores(
by = c("model", "target_type")
) %>%
55 changes: 31 additions & 24 deletions README.md
@@ -129,8 +129,8 @@ details. Finally we summarise these scores by model and target type.
example_quantile %>%
set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
validate() %>%
add_coverage() %>%
score() %>%
add_coverage(ranges = c(50, 90), by = c("model", "target_type")) %>%
summarise_scores(
by = c("model", "target_type")
) %>%
@@ -144,15 +144,15 @@ example_quantile %>%
kable()
```

| model | target_type | interval_score | dispersion | underprediction | overprediction | coverage_deviation | bias | ae_median | coverage_50 | coverage_90 | relative_skill | scaled_rel_skill |
|:----------------------|:------------|---------------:|-----------:|----------------:|---------------:|-------------------:|--------:|----------:|------------:|------------:|---------------:|-----------------:|
| EuroCOVIDhub-baseline | Cases | 28000 | 4100 | 10000.0 | 14000.0 | -0.110 | 0.0980 | 38000 | 0.33 | 0.82 | 1.30 | 1.6 |
| EuroCOVIDhub-baseline | Deaths | 160 | 91 | 2.1 | 66.0 | 0.120 | 0.3400 | 230 | 0.66 | 1.00 | 2.30 | 3.8 |
| EuroCOVIDhub-ensemble | Cases | 18000 | 3700 | 4200.0 | 10000.0 | -0.098 | -0.0560 | 24000 | 0.39 | 0.80 | 0.82 | 1.0 |
| EuroCOVIDhub-ensemble | Deaths | 41 | 30 | 4.1 | 7.1 | 0.200 | 0.0730 | 53 | 0.88 | 1.00 | 0.60 | 1.0 |
| UMass-MechBayes | Deaths | 53 | 27 | 17.0 | 9.0 | -0.023 | -0.0220 | 78 | 0.46 | 0.88 | 0.75 | 1.3 |
| epiforecasts-EpiNow2 | Cases | 21000 | 5700 | 3300.0 | 12000.0 | -0.067 | -0.0790 | 28000 | 0.47 | 0.79 | 0.95 | 1.2 |
| epiforecasts-EpiNow2 | Deaths | 67 | 32 | 16.0 | 19.0 | -0.043 | -0.0051 | 100 | 0.42 | 0.91 | 0.98 | 1.6 |
| model | target_type | wis | overprediction | underprediction | dispersion | bias | coverage_50 | coverage_90 | coverage_deviation | ae_median | relative_skill | scaled_rel_skill |
|:----------------------|:------------|------:|---------------:|----------------:|-----------:|--------:|------------:|------------:|-------------------:|----------:|---------------:|-----------------:|
| EuroCOVIDhub-baseline | Cases | 28000 | 14000.0 | 10000.0 | 4100 | 0.0980 | 0.33 | 0.82 | -0.120 | 38000 | 1.30 | 1.6 |
| EuroCOVIDhub-baseline | Deaths | 160 | 66.0 | 2.1 | 91 | 0.3400 | 0.66 | 1.00 | 0.120 | 230 | 2.30 | 3.8 |
| EuroCOVIDhub-ensemble | Cases | 18000 | 10000.0 | 4200.0 | 3700 | -0.0560 | 0.39 | 0.80 | -0.100 | 24000 | 0.82 | 1.0 |
| EuroCOVIDhub-ensemble | Deaths | 41 | 7.1 | 4.1 | 30 | 0.0730 | 0.88 | 1.00 | 0.200 | 53 | 0.60 | 1.0 |
| UMass-MechBayes | Deaths | 53 | 9.0 | 17.0 | 27 | -0.0220 | 0.46 | 0.88 | -0.025 | 78 | 0.75 | 1.3 |
| epiforecasts-EpiNow2 | Cases | 21000 | 12000.0 | 3300.0 | 5700 | -0.0790 | 0.47 | 0.79 | -0.070 | 28000 | 0.95 | 1.2 |
| epiforecasts-EpiNow2 | Deaths | 67 | 19.0 | 16.0 | 32 | -0.0051 | 0.42 | 0.91 | -0.045 | 100 | 0.98 | 1.6 |

`scoringutils` contains additional functionality to transform forecasts,
to summarise scores at different levels, to visualise them, and to
@@ -174,20 +174,27 @@ example_quantile %>%
score %>%
summarise_scores(by = c("model", "target_type", "scale")) %>%
head()
#> model target_type scale interval_score dispersion
#> 1: EuroCOVIDhub-baseline Cases log 1.169972e+00 0.4373146
#> 2: EuroCOVIDhub-baseline Cases natural 2.209046e+04 4102.5009443
#> 3: EuroCOVIDhub-ensemble Cases log 5.500974e-01 0.1011850
#> 4: EuroCOVIDhub-ensemble Cases natural 1.155071e+04 3663.5245788
#> 5: epiforecasts-EpiNow2 Cases log 6.005778e-01 0.1066329
#> 6: epiforecasts-EpiNow2 Cases natural 1.443844e+04 5664.3779484
#> underprediction overprediction coverage_deviation bias ae_median
#> 1: 3.521964e-01 0.3804607 -0.10940217 0.09726562 1.185905e+00
#> 2: 1.028497e+04 7702.9836957 -0.10940217 0.09726562 3.208048e+04
#> 3: 1.356563e-01 0.3132561 -0.09785326 -0.05640625 7.410484e-01
#> 4: 4.237177e+03 3650.0047554 -0.09785326 -0.05640625 1.770795e+04
#> 5: 1.858699e-01 0.3080750 -0.06660326 -0.07890625 7.656591e-01
#> 6: 3.260356e+03 5513.7058424 -0.06660326 -0.07890625 2.153070e+04
#> model target_type scale wis overprediction
#> 1: EuroCOVIDhub-ensemble Cases natural 11550.70664 3650.004755
#> 2: EuroCOVIDhub-baseline Cases natural 22090.45747 7702.983696
#> 3: epiforecasts-EpiNow2 Cases natural 14438.43943 5513.705842
#> 4: EuroCOVIDhub-ensemble Deaths natural 41.42249 7.138247
#> 5: EuroCOVIDhub-baseline Deaths natural 159.40387 65.899117
#> 6: UMass-MechBayes Deaths natural 52.65195 8.978601
#> underprediction dispersion bias coverage_50 coverage_90
#> 1: 4237.177310 3663.52458 -0.05640625 0.3906250 0.8046875
#> 2: 10284.972826 4102.50094 0.09726562 0.3281250 0.8203125
#> 3: 3260.355639 5664.37795 -0.07890625 0.4687500 0.7890625
#> 4: 4.103261 30.18099 0.07265625 0.8750000 1.0000000
#> 5: 2.098505 91.40625 0.33906250 0.6640625 1.0000000
#> 6: 16.800951 26.87239 -0.02234375 0.4609375 0.8750000
#> coverage_deviation ae_median
#> 1: -0.10230114 17707.95312
#> 2: -0.11437500 32080.48438
#> 3: -0.06963068 21530.69531
#> 4: 0.20380682 53.13281
#> 5: 0.12142045 233.25781
#> 6: -0.02488636 78.47656
```

## Citation