Merge pull request #641 from epiforecasts/update-as_forecast()
Issue #585: allow users to specify columns and forecast unit in `as_forecast()`
nikosbosse authored Feb 26, 2024
2 parents 4d3f003 + 23b708e commit 66f139f
Showing 6 changed files with 220 additions and 29 deletions.
2 changes: 1 addition & 1 deletion NEWS.md
@@ -6,7 +6,7 @@ The update introduces breaking changes. If you want to keep using the older vers

## Package updates
- In `score()`, required columns "true_value" and "prediction" were renamed and replaced by required columns "observed" and "predicted". Scoring functions now also use the function arguments "observed" and "predicted" everywhere consistently.
- The overall scoring workflow was updated. `score()` is now a generic function that dispatches the correct method based on the forecast type. Forecast types currently supported are "binary", "point", "sample" and "quantile" with corresponding classes "forecast_binary", "forecast_point", "forecast_sample" and "forecast_quantile". An object of class `forecast_*` can be created using the function `as_forecast()`, which also replaces the previous function `check_forecasts()` (see more information below).
- The overall scoring workflow was updated. `score()` is now a generic function that dispatches the correct method based on the forecast type. Forecast types currently supported are "binary", "point", "sample" and "quantile" with corresponding classes "forecast_binary", "forecast_point", "forecast_sample" and "forecast_quantile". An object of class `forecast_*` can be created using the function `as_forecast()`, which also replaces the previous function `check_forecasts()` (see more information below). The function also allows users to rename required columns and specify the forecast unit in a single step, taking over the functionality of `set_forecast_unit()` in most cases.
- Scoring rules (functions used for scoring) received a consistent interface and input checks:
- Scoring rules for binary forecasts:
- `observed`: factor with exactly 2 levels
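
As a rough illustration of the renaming step described in the `as_forecast()` entry above, the sketch below mimics it on a plain data.frame in base R. `rename_required()` is a hypothetical helper written for this example and is not part of `scoringutils`; the real function works on a data.table and also handles `quantile_level` and `sample_id`.

```r
# Hypothetical stand-in for the renaming done by as_forecast();
# not a scoringutils function.
rename_required <- function(data, observed = NULL, predicted = NULL,
                            model = NULL) {
  mapping <- list(observed = observed, predicted = predicted, model = model)
  for (new_name in names(mapping)) {
    old_name <- mapping[[new_name]]
    if (is.null(old_name)) next  # argument not supplied: nothing to rename
    # same spirit as the input checks: one existing column name
    stopifnot(
      is.character(old_name), length(old_name) == 1,
      old_name %in% names(data)
    )
    names(data)[names(data) == old_name] <- new_name
  }
  data
}

df <- data.frame(obs = c(1, 2), pred = c(1.5, 1.8), mod = c("a", "a"))
out <- rename_required(df, observed = "obs", predicted = "pred", model = "mod")
names(out)
```

Columns whose argument is left as `NULL` keep their current names, which matches the optional nature of the renaming arguments.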
111 changes: 103 additions & 8 deletions R/validate.R
@@ -2,13 +2,23 @@
#' @description Convert a data.frame or similar of forecasts into an object of
#' class `forecast_*` and validate it.
#'
#' `as_forecast()` determines the forecast type (binary, point, sample-based or
#' `as_forecast()`
#' - allows users to specify the current names of the columns that correspond
#' to the columns required by `scoringutils` (`observed`, `predicted`,
#' `model`, as well as `quantile_level` for quantile-based forecasts and
#' `sample_id` for sample-based forecasts). `as_forecast()` renames the
#' existing columns.
#' - allows users to specify the unit of a single forecast. It removes all
#' columns that are neither part of the forecast unit nor a required column
#' (see [set_forecast_unit()] for details)
#' - Determines the forecast type (binary, point, sample-based or
#' quantile-based) from the input data (using the function
#' [get_forecast_type()]. It then constructs an object of the
#' appropriate class (`forecast_binary`, `forecast_point`, `forecast_sample`, or
#' [get_forecast_type()]).
#' - Constructs a forecast object of the appropriate class
#' (`forecast_binary`, `forecast_point`, `forecast_sample`, or
#' `forecast_quantile`, using the function [new_forecast()]).
#' Lastly, it calls [as_forecast()] on the object to make sure it conforms with
#' the required input formats.
#' - Calls [validate_forecast()] on the newly created forecast object to
#' validate it
#' @inheritParams score
#' @inheritSection forecast_types Forecast types and input format
#' @return Depending on the forecast type, an object of class
@@ -18,19 +28,104 @@
#' @keywords check-forecasts
#' @examples
#' as_forecast(example_binary)
#' as_forecast(example_quantile)
as_forecast <- function(data, ...) {
#' as_forecast(
#'   example_quantile,
#'   forecast_unit = c("model", "target_type", "target_end_date",
#'                     "horizon", "location")
#' )
as_forecast <- function(data,
                        ...) {
  UseMethod("as_forecast")
}

#' @rdname as_forecast
#' @param forecast_unit (optional) Name of the columns in `data` (after
#' any renaming of columns done by `as_forecast()`) that denote the unit of a
#' single forecast. See [get_forecast_unit()] for details.
#' If `NULL` (the default), all columns that are not required columns are
#' assumed to form the unit of a single forecast. If specified, all columns
#' that are not part of the forecast unit (or required columns) will be removed.
#' @param forecast_type (optional) The forecast type you expect the forecasts
#' to have. If the forecast type as determined by `scoringutils` based on the
#' input does not match this, an error will be thrown. If `NULL` (the default),
#' the forecast type will be inferred from the data.
#' @param observed (optional) Name of the column in `data` that contains the
#' observed values. This column will be renamed to "observed".
#' @param predicted (optional) Name of the column in `data` that contains the
#' predicted values. This column will be renamed to "predicted".
#' @param model (optional) Name of the column in `data` that contains the names
#' of the models/forecasters that generated the predicted values.
#' This column will be renamed to "model".
#' @param quantile_level (optional) Name of the column in `data` that contains
#' the quantile level of the predicted values. This column will be renamed to
#' "quantile_level". Only applicable to quantile-based forecasts.
#' @param sample_id (optional) Name of the column in `data` that contains the
#' sample id. This column will be renamed to "sample_id". Only applicable to
#' sample-based forecasts.
#' @export
as_forecast.default <- function(data, ...) {
as_forecast.default <- function(data,
                                forecast_unit = NULL,
                                forecast_type = NULL,
                                observed = NULL,
                                predicted = NULL,
                                model = NULL,
                                quantile_level = NULL,
                                sample_id = NULL,
                                ...) {
  # check inputs
  data <- ensure_data.table(data)
  assert_character(observed, len = 1, null.ok = TRUE)
  assert_subset(observed, names(data), empty.ok = TRUE)

  assert_character(predicted, len = 1, null.ok = TRUE)
  assert_subset(predicted, names(data), empty.ok = TRUE)

  assert_character(model, len = 1, null.ok = TRUE)
  assert_subset(model, names(data), empty.ok = TRUE)

  assert_character(quantile_level, len = 1, null.ok = TRUE)
  assert_subset(quantile_level, names(data), empty.ok = TRUE)

  assert_character(sample_id, len = 1, null.ok = TRUE)
  assert_subset(sample_id, names(data), empty.ok = TRUE)

  # rename columns
  if (!is.null(observed)) {
    setnames(data, old = observed, new = "observed")
  }
  if (!is.null(predicted)) {
    setnames(data, old = predicted, new = "predicted")
  }
  if (!is.null(model)) {
    setnames(data, old = model, new = "model")
  }
  if (!is.null(quantile_level)) {
    setnames(data, old = quantile_level, new = "quantile_level")
  }
  if (!is.null(sample_id)) {
    setnames(data, old = sample_id, new = "sample_id")
  }

  # assert that the correct column names are present after renaming
  assert(check_data_columns(data))

  # set forecast unit (error handling is done in `set_forecast_unit()`)
  if (!is.null(forecast_unit)) {
    data <- set_forecast_unit(data, forecast_unit)
  }

  # find forecast type
  desired <- forecast_type
  forecast_type <- get_forecast_type(data)

  if (!is.null(desired) && desired != forecast_type) {
    stop(
      "Forecast type determined by scoringutils based on input: `",
      forecast_type,
      "`. Desired forecast type: `", desired, "`."
    )
  }

  # construct class
  data <- new_forecast(data, paste0("forecast_", forecast_type))
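
The forecast-unit and forecast-type steps in the function above can be sketched in isolation. This is a minimal base R illustration under stated assumptions, not the package's implementation: `keep_forecast_unit()` and `check_forecast_type()` are hypothetical helpers, and the `required` column list is an assumption chosen for the example (the package maintains its own list of protected columns).

```r
# Hypothetical helpers for illustration only; not scoringutils functions.

# Keep the forecast-unit columns plus the columns assumed to be required;
# every other column is dropped.
keep_forecast_unit <- function(data, forecast_unit,
                               required = c("observed", "predicted",
                                            "quantile_level", "sample_id")) {
  stopifnot(all(forecast_unit %in% names(data)))
  keep <- names(data) %in% c(forecast_unit, required)
  data[, keep, drop = FALSE]
}

# Compare the inferred forecast type with the one the caller expects.
check_forecast_type <- function(inferred, desired = NULL) {
  if (!is.null(desired) && desired != inferred) {
    stop(
      "Forecast type determined by scoringutils based on input: `",
      inferred, "`. Desired forecast type: `", desired, "`."
    )
  }
  invisible(inferred)
}

df <- data.frame(
  model = "m1", location = "DE", note = "free text",
  quantile_level = c(0.25, 0.75), predicted = c(8, 12), observed = 10
)
trimmed <- keep_forecast_unit(df, forecast_unit = c("model", "location"))
names(trimmed)  # the "note" column is no longer present

# A mismatch between inferred and desired type raises an error
msg <- tryCatch(
  check_forecast_type("sample", desired = "quantile"),
  error = function(e) conditionMessage(e)
)
```

Storing the caller's expectation separately before overwriting `forecast_type`, as the function above does with `desired`, is what allows both values to appear in the error message.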

8 changes: 5 additions & 3 deletions README.Rmd
@@ -120,12 +120,14 @@ example_quantile %>%

### Scoring forecasts

Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually using `set_forecast_unit()` as this may help to avoid errors, especially when scoringutils is used in automated pipelines. The function `set_forecast_unit()` will simply drop unneeded columns. To verify everything is in order, the function `validate_forecast()` should be used. The result of that check can then passed directly into `score()`. `score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of additional functions from `scoringutils` to add empirical coverage-levels (`add_coverage()`), and scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally we summarise these scores by model and target type.
Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually by specifying the "forecast_unit" argument in `as_forecast()` as this may help to avoid errors. This will drop all columns that are neither part of the forecast unit nor part of the columns internally used by `scoringutils`. The function `as_forecast()` processes and validates the inputs.
`score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of additional functions from `scoringutils` to add empirical coverage-levels (`add_coverage()`), and scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally we summarise these scores by model and target type.

```{r score-example}
example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  as_forecast() %>%
  as_forecast(forecast_unit = c(
    "location", "target_end_date", "target_type", "horizon", "model"
  )) %>%
  add_coverage() %>%
  score() %>%
  add_pairwise_comparison(
20 changes: 10 additions & 10 deletions README.md
@@ -134,23 +134,23 @@ Forecasts can be easily and quickly scored using the `score()` function.
`score()` automatically tries to determine the `forecast_unit`, i.e. the
set of columns that uniquely defines a single forecast, by taking all
column names of the data into account. However, it is recommended to set
the forecast unit manually using `set_forecast_unit()` as this may help
to avoid errors, especially when scoringutils is used in automated
pipelines. The function `set_forecast_unit()` will simply drop unneeded
columns. To verify everything is in order, the function
`validate_forecast()` should be used. The result of that check can then
passed directly into `score()`. `score()` returns unsummarised scores,
which in most cases is not what the user wants. Here we make use of
additional functions from `scoringutils` to add empirical
the forecast unit manually by specifying the “forecast_unit” argument in
`as_forecast()` as this may help to avoid errors. This will drop all
columns that are neither part of the forecast unit nor part of the
columns internally used by `scoringutils`. The function `as_forecast()`
processes and validates the inputs. `score()` returns unsummarised
scores, which in most cases is not what the user wants. Here we make use
of additional functions from `scoringutils` to add empirical
coverage-levels (`add_coverage()`), and scores relative to a baseline
model (here chosen to be the EuroCOVIDhub-ensemble model). See the
getting started vignette for more details. Finally we summarise these
scores by model and target type.

``` r
example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  as_forecast() %>%
  as_forecast(forecast_unit = c(
    "location", "target_end_date", "target_type", "horizon", "model"
  )) %>%
  add_coverage() %>%
  score() %>%
  add_pairwise_comparison(
70 changes: 63 additions & 7 deletions man/as_forecast.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

38 changes: 38 additions & 0 deletions tests/testthat/test-as_forecast.R
@@ -7,6 +7,44 @@ test_that("Running `as_forecast()` twice returns the same object", {
  )
})

test_that("as_forecast() works as expected", {
  test <- na.omit(data.table::copy(example_quantile))
  expect_s3_class(as_forecast(test), "forecast_quantile")

  # expect error when arguments are not correct
  expect_error(as_forecast(test, observed = 3), "Must be of type 'character'")
  expect_error(as_forecast(test, quantile_level = c("1", "2")), "Must have length 1")
  expect_error(as_forecast(test, observed = "missing"), "Must be a subset of")

  # expect no condition with columns already present
  expect_no_condition(
    as_forecast(test, observed = "observed", predicted = "predicted",
                forecast_unit = c("location", "model", "target_type",
                                  "target_end_date", "horizon"),
                quantile_level = "quantile_level")
  )

  # additional test with renaming the model column
  test <- na.omit(data.table::copy(example_continuous))
  setnames(test, old = c("observed", "predicted", "sample_id", "model"),
           new = c("obs", "pred", "sample", "mod"))
  expect_no_condition(
    as_forecast(test,
                observed = "obs", predicted = "pred", model = "mod",
                forecast_unit = c("location", "model", "target_type",
                                  "target_end_date", "horizon"),
                sample_id = "sample")
  )

  # test if desired forecast type does not correspond to inferred one
  test <- na.omit(data.table::copy(example_continuous))
  expect_error(
    as_forecast(test, forecast_type = "quantile"),
    "Forecast type determined by scoringutils based on input"
  )
})


test_that("is_forecast() works as expected", {
  ex_binary <- suppressMessages(as_forecast(example_binary))
  ex_point <- suppressMessages(as_forecast(example_point))
