Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

Open
saraloo opened this issue Nov 1, 2024 · 14 comments · May be fixed by #405
Open

[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

saraloo opened this issue Nov 1, 2024 · 14 comments · May be fixed by #405
Assignees
Labels
bug Defects or errors in the code. gempyor Concerns the Python core. inference Concerns the parameter inference framework. low priority Low priority. quick issue Short or easy fix.

Comments

@saraloo
Copy link
Contributor

saraloo commented Nov 1, 2024

Label

bug, gempyor, inference, quick issue

Priority Label

low priority

Describe the bug/issue

There is a parsing error in calibrate/inference_parameter when modifiers (not sure if it's just in the check_in_bound function or also in parsing the whole parameter) are in E-notation. Config works by removing this notation so it's just a parsing error.

To Reproduce

Config: https://github.com/HopkinsIDD/RSV_USA/blob/fb7883eae7e52cb581cfc279a0978cc4b9152c7d/config_rsvnet_2024_1_emcee.yml
With notation where modifier is defined as

    IHRadj_age18to49:
      method: SinglePeriodModifier
      parameter: incidH_age18to49_frac
      subpop: "all"
      period_start_date: 1960-08-31
      period_end_date: 2025-09-01
      value:
        distribution: truncnorm
        mean: 3.645e-05
        sd: 7.29e-06
        a: 1e-10
        b: 0.003645

I get the error:

InferenceParameters: with 11 parameters:
    seir_modifiers: 5 parameters
    outcome_modifiers: 6 parameters

Traceback (most recent call last):
  File "/users/s/l/sloo/flepimop-env/bin/flepimop-calibrate", line 8, in <module>
    sys.exit(calibrate())
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/work/users/s/l/sloo/flepi_repos/flepiMoP/flepimop/gempyor_pkg/src/gempyor/calibrate.py", line 182, in calibrate
    assert gempyor_inference.inferpar.check_in_bound(
  File "/work/users/s/l/sloo/flepi_repos/flepiMoP/flepimop/gempyor_pkg/src/gempyor/inference_parameter.py", line 159, in check_in_bound
    if self.hit_lbs(proposal=proposal).any() or self.hit_ubs(proposal=proposal).any():
  File "/work/users/s/l/sloo/flepi_repos/flepiMoP/flepimop/gempyor_pkg/src/gempyor/inference_parameter.py", line 164, in hit_lbs
    return np.array((proposal < self.lbs))
numpy._core._exceptions._UFuncNoLoopError: ufunc 'less' did not contain a loop with signature matching types (<class 'numpy.dtypes.Float64DType'>, <class 'numpy.dtypes.StrDType'>) -> None

Environment, if relevant

No response

@TimothyWillard TimothyWillard added bug Defects or errors in the code. gempyor Concerns the Python core. low priority Low priority. quick issue Short or easy fix. inference Concerns the parameter inference framework. labels Nov 4, 2024
@emprzy
Copy link
Collaborator

emprzy commented Nov 8, 2024

@saraloo , just confirming that your ideal solution to this problem involves making it such that check_in_bound (and other potentially implicated functions) has the functionality to accept modifiers that are written in scientific notation?

@saraloo
Copy link
Contributor Author

saraloo commented Nov 8, 2024

Thanks emily - yeah that would be ideal! I think all the other value parsing accepts this notation. Thanks :)

@emprzy
Copy link
Collaborator

emprzy commented Nov 8, 2024

Not sure if this is common knowledge or not, but I'm just going to delineate my diagnosis of the issue here:

YAML files will correctly parse values input with E/scientific notation as floats, as long as there is a . (decimal) in the mantissa AND a sign (+ or -) in the exponent. Otherwise, YAML will parse that value as a str. Example:

  • 1e-10, 1.0e10, 1e10 will all evaluate as str
  • 1.0e-10 and 1.0e+10 will correctly evaluate as float

Therefore, the problematic line in the config you provided is the a: 1e-10 (it's evaluating as a str), but the other values should present no issues, because they adhere to the conventions that YAML is set up to read.

So, is it wrong to suggest that we should just stick to writing our modifiers with conventions that YAML parses correctly? Or is it important to you that we be able to write in E notation in whatever format?

@emprzy
Copy link
Collaborator

emprzy commented Nov 8, 2024

That is, all you would have to change to make this config work is add .0 after the 1 in the value for a.

a: 1.0e-10 would work (I think!!! 🤞 )

@pearsonca
Copy link
Contributor

So, is it wrong to suggest that we should just stick to writing our modifiers with conventions that YAML parses correctly? Or is it important to you that we be able to write in E notation in whatever format?

At the very least, the parsing needs to clearly state the problem for users - just saying "know how to do it" is a bit harsh, as is the above error spam (I'm guessing numpy._core._exceptions._UFuncNoLoopError: ufunc 'less' did not contain a loop with signature matching types (<class 'numpy.dtypes.Float64DType'>, <class 'numpy.dtypes.StrDType'>) -> None is expressing the same insight you've had about the spec, but wouldn't have made that connection without you highlighting the YAML standard).

The fix here could be either:

  • clear error (user, you gave me X, but I can't translate to a number)
  • translation (if no ., insert .0 before e; if no +/-, insert + after e)

Feels like second solution is likely same amount of work as first, and is nicer?

Also, does it handle E vs e?

@saraloo
Copy link
Contributor Author

saraloo commented Nov 8, 2024

Ah interesting, thanks this is useful information. The yaml files were automatically generated using my config writer in R so this is just how R writes them. I managed to get around it by forcing R to print the full decimal by setting the option options(scipen=999) .

But i think a translation if possible would be helpful to avoid this, thanks Carl for the input.

@emprzy
Copy link
Collaborator

emprzy commented Nov 8, 2024

Okay, I can work at implementing a translation feature (my knee-jerk inclination is to insert an if-else loop into inference_parameter.py somewhere, to catch instances of str values, but if that isn't a good route to the solution then I can pivot).

And yes, E is an equivalent (and therefore parse-able) substitution for e.

@emprzy
Copy link
Collaborator

emprzy commented Nov 18, 2024

@saraloo , I have another clarification question about how you would like the solution to this problem to look. I have written code for two different forms of solutions:

  1. An if loop that corrects un-parseable instances of scientific notation within inference_parameter.py after they have been loaded from the config (this code takes in the scientific notation value that is a str, inserts a decimal and a sign when necessary, then spits out the corrected value as a float)
  2. A function that would parse through a config and appropriately alter all un-parseable instances of scientific notation. This version of the solution, obviously, edits the config (which I'm not sure if that's something we want?), and would likely exist outside of inference_parameter.py, perhaps in another file that is used for checking validity of configs.

Let me know what you think!

@saraloo
Copy link
Contributor Author

saraloo commented Nov 18, 2024

Thanks emily. I think it's fine for it to edit the config - as long as it seems to be able to cover all the cases and you've checked that the parsing is right. Either of these work, would be good to implement both i think. So 2 would check things are right in a check or something similar, and then in case anything gets through or if a check isn't done, then 1 should cover other cases.

@emprzy
Copy link
Collaborator

emprzy commented Nov 18, 2024

Okay lovely. I will implement both solutions more explicitly and then we can take a closer look. I think perhaps it would be not useful/would be redundant to implement both solutions (because they would be checking for the same type of specific error, not just general parsing errors in config values), but I'm not averse to doing so!

@TimothyWillard
Copy link
Contributor

A function that would parse through a config and appropriately alter all un-parseable instances of scientific notation. This version of the solution, obviously, edits the config (which I'm not sure if that's something we want?), and would likely exist outside of inference_parameter.py, perhaps in another file that is used for checking validity of configs.

FWIW I'm not a big fan of this solution. I think there could be unexpected consequences to doing this. A third solution I'll offer, that I don't like either per se but works well here, is adding a as_scientific_number (or whatever name is best) method to the confuse.ConfigView class that tries to cast as a float first, falls back to parsing as scientific notation, and then fails if it can't parse that. We already do this here as an example:

@add_method(confuse.ConfigView)
def as_date(self) -> datetime.date:
"""
Evaluates an datetime.date or ISO8601 date string.
Returns:
A datetime.date data type of the date associated with the object.
"""
return self.get(ISO8601Date())
@add_method(confuse.ConfigView)
def as_evaled_expression(self):
"""
Evaluates an expression string, returning a float.
Returns:
A float data type of the value associated with the object.
Raises:
ValueError: On parsing errors.
"""
value = self.get()
if isinstance(value, numbers.Number):
return value
elif isinstance(value, str):
try:
return float(sympy.parsing.sympy_parser.parse_expr(value))
except TypeError as e:
raise ValueError(e) from e
else:
raise ValueError(f"expected numeric or string expression [got: {value}]")

We would have to reimplement this scientific number parsing with GH-254 though if we went with this approach.

@emprzy
Copy link
Collaborator

emprzy commented Nov 20, 2024

@TimothyWillard What do you mean by reimplement this scientific number parsing?

@TimothyWillard
Copy link
Contributor

@TimothyWillard What do you mean by reimplement this scientific number parsing?

We would have to reimplement said scientific number parsing extension in pydantic with GH-254 when we take that up since my suggestion was confuse specific.

@jcblemai
Copy link
Collaborator

A thing that might just work without doing anything is to force the cast as a float from confuse instead of the generic .get() in interence_parameters.py (see below). I think that might do the trick because it's not the first time we used scientific notation and it's the first time we have this error, so possibly it's just this code is not casting enough these things.

                     self.add_single_parameter(
                            ptype=ptype,
                            pname=pname,
                            subpop=sp,
                            pdist=parameter_config["value"].as_random_distribution(),
                            lb=parameter_config["value"]["a"].get(),
                            ub=parameter_config["value"]["b"].get(),
                        )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Defects or errors in the code. gempyor Concerns the Python core. inference Concerns the parameter inference framework. low priority Low priority. quick issue Short or easy fix.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants