I'm just summing up the results of a mini investigation I did, which puts most of my concerns about our handling of floating point values to rest (i.e., it seems we do not have the problem I thought we might). I did file one issue related to this here, but this comment is just noting that some other aspects of our handling of floating point numbers work well. It might be worth adding some tests related to this, though, to ensure that currently working functionality keeps working if we change the implementation.
The thing I was worried about was handling of floating point issues in quantile levels in hubValidations (and then later in downstream analyses using hubEnsembles). To investigate, note that in R we get:
```r
> a <- 0.09999999999999998
> a
[1] 0.1
> print(a, digits=22)
[1] 0.09999999999999997779554
> print(as.character(a))
[1] "0.1"
> b <- round(a, 1)
> b
[1] 0.1
> print(b, digits=22)
[1] 0.1000000000000000055511
> print(as.character(b))
[1] "0.1"
```
Based on this, I was worried that hubValidations checks that work by converting values to characters would accept both of these representations of the quantile level (`output_type_id`) 0.1, and that this could have downstream implications. For example, if both representations ended up in the same data frame, we'd get the following error in hubEnsembles when ensembling two model outputs that used different floating point representations of the same quantile level:
```
Error in `validate_output_type_ids()`:
✖ `model_outputs` contains 2 invalid distributions.
ℹ Within each group defined by a combination of task id variables and output type, all models must provide the same set of output type ids
```
However, all seems to be fine, because arrow's cast functionality is more careful than plain `as.character`:
This means that validations should not pass when checking equality of 0.09999999999999997779554 with 0.1 (and I confirmed that they do not pass, as expected). One lingering question is whether the value conversions used here are fully "safe" across platforms, or whether different computing environments might treat them differently.