[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

saraloo · 2024-11-01T19:28:25Z

Label

bug, gempyor, inference, quick issue

Priority Label

low priority

Describe the bug/issue

There is a parsing error in calibrate/inference_parameter when modifiers (not sure if it's just in the check_in_bound function or also in parsing the whole parameter) are in E-notation. Config works by removing this notation so it's just a parsing error.

To Reproduce

Config: https://github.com/HopkinsIDD/RSV_USA/blob/fb7883eae7e52cb581cfc279a0978cc4b9152c7d/config_rsvnet_2024_1_emcee.yml
With notation where modifier is defined as

    IHRadj_age18to49:
      method: SinglePeriodModifier
      parameter: incidH_age18to49_frac
      subpop: "all"
      period_start_date: 1960-08-31
      period_end_date: 2025-09-01
      value:
        distribution: truncnorm
        mean: 3.645e-05
        sd: 7.29e-06
        a: 1e-10
        b: 0.003645

I get the error:

InferenceParameters: with 11 parameters:
    seir_modifiers: 5 parameters
    outcome_modifiers: 6 parameters

Traceback (most recent call last):
  File "/users/s/l/sloo/flepimop-env/bin/flepimop-calibrate", line 8, in <module>
    sys.exit(calibrate())
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nas/longleaf/home/sloo/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/work/users/s/l/sloo/flepi_repos/flepiMoP/flepimop/gempyor_pkg/src/gempyor/calibrate.py", line 182, in calibrate
    assert gempyor_inference.inferpar.check_in_bound(
  File "/work/users/s/l/sloo/flepi_repos/flepiMoP/flepimop/gempyor_pkg/src/gempyor/inference_parameter.py", line 159, in check_in_bound
    if self.hit_lbs(proposal=proposal).any() or self.hit_ubs(proposal=proposal).any():
  File "/work/users/s/l/sloo/flepi_repos/flepiMoP/flepimop/gempyor_pkg/src/gempyor/inference_parameter.py", line 164, in hit_lbs
    return np.array((proposal < self.lbs))
numpy._core._exceptions._UFuncNoLoopError: ufunc 'less' did not contain a loop with signature matching types (<class 'numpy.dtypes.Float64DType'>, <class 'numpy.dtypes.StrDType'>) -> None

Environment, if relevant

No response

The text was updated successfully, but these errors were encountered:

emprzy · 2024-11-08T19:51:58Z

@saraloo , just confirming that your ideal solution to this problem involves making it such that check_in_bound (and other potentially implicated functions) has the functionality to accept modifiers that are written in scientific notation?

saraloo · 2024-11-08T21:12:07Z

Thanks emily - yeah that would be ideal! I think all the other value parsing accepts this notation. Thanks :)

emprzy · 2024-11-08T21:18:23Z

Not sure if this is common knowledge or not, but I'm just going to delineate my diagnosis of the issue here:

YAML files will correctly parse values input with E/scientific notation as floats, as long as there is a . (decimal) in the mantissa AND a sign (+ or -) in the exponent. Otherwise, YAML will parse that value as a str. Example:

1e-10, 1.0e10, 1e10 will all evaluate as str
1.0e-10 and 1.0e+10 will correctly evaluate as float

Therefore, the problematic line in the config you provided is the a: 1e-10 (it's evaluating as a str), but the other values should present no issues, because they adhere to the conventions that YAML is set up to read.

So, is it wrong to suggest that we should just stick to writing our modifiers with conventions that YAML parses correctly? Or is it important to you that we be able to write in E notation in whatever format?

emprzy · 2024-11-08T21:22:05Z

That is, all you would have to change to make this config work is add .0 after the 1 in the value for a.

a: 1.0e-10 would work (I think!!! 🤞 )

pearsonca · 2024-11-08T21:29:04Z

So, is it wrong to suggest that we should just stick to writing our modifiers with conventions that YAML parses correctly? Or is it important to you that we be able to write in E notation in whatever format?

At the very least, the parsing needs to clearly state the problem for users - just saying "know how to do it" is a bit harsh, as is the above error spam (I'm guessing numpy._core._exceptions._UFuncNoLoopError: ufunc 'less' did not contain a loop with signature matching types (<class 'numpy.dtypes.Float64DType'>, <class 'numpy.dtypes.StrDType'>) -> None is expressing the same insight you've had about the spec, but wouldn't have made that connection without you highlighting the YAML standard).

The fix here could be either:

clear error (user, you gave me X, but I can't translate to a number)
translation (if no ., insert .0 before e; if no +/-, insert + after e)

Feels like second solution is likely same amount of work as first, and is nicer?

Also, does it handle E vs e?

saraloo · 2024-11-08T21:31:03Z

Ah interesting, thanks this is useful information. The yaml files were automatically generated using my config writer in R so this is just how R writes them. I managed to get around it by forcing R to print the full decimal by setting the option options(scipen=999) .

But i think a translation if possible would be helpful to avoid this, thanks Carl for the input.

emprzy · 2024-11-08T21:39:38Z

Okay, I can work at implementing a translation feature (my knee-jerk inclination is to insert an if-else loop into inference_parameter.py somewhere, to catch instances of str values, but if that isn't a good route to the solution then I can pivot).

And yes, E is an equivalent (and therefore parse-able) substitution for e.

emprzy · 2024-11-18T19:59:15Z

@saraloo , I have another clarification question about how you would like the solution to this problem to look. I have written code for two different forms of solutions:

An if loop that corrects un-parseable instances of scientific notation within inference_parameter.py after they have been loaded from the config (this code takes in the scientific notation value that is a str, inserts a decimal and a sign when necessary, then spits out the corrected value as a float)
A function that would parse through a config and appropriately alter all un-parseable instances of scientific notation. This version of the solution, obviously, edits the config (which I'm not sure if that's something we want?), and would likely exist outside of inference_parameter.py, perhaps in another file that is used for checking validity of configs.

Let me know what you think!

saraloo · 2024-11-18T20:38:32Z

Thanks emily. I think it's fine for it to edit the config - as long as it seems to be able to cover all the cases and you've checked that the parsing is right. Either of these work, would be good to implement both i think. So 2 would check things are right in a check or something similar, and then in case anything gets through or if a check isn't done, then 1 should cover other cases.

emprzy · 2024-11-18T20:46:36Z

Okay lovely. I will implement both solutions more explicitly and then we can take a closer look. I think perhaps it would be not useful/would be redundant to implement both solutions (because they would be checking for the same type of specific error, not just general parsing errors in config values), but I'm not averse to doing so!

TimothyWillard · 2024-11-18T21:54:55Z

A function that would parse through a config and appropriately alter all un-parseable instances of scientific notation. This version of the solution, obviously, edits the config (which I'm not sure if that's something we want?), and would likely exist outside of inference_parameter.py, perhaps in another file that is used for checking validity of configs.

FWIW I'm not a big fan of this solution. I think there could be unexpected consequences to doing this. A third solution I'll offer, that I don't like either per se but works well here, is adding a as_scientific_number (or whatever name is best) method to the confuse.ConfigView class that tries to cast as a float first, falls back to parsing as scientific notation, and then fails if it can't parse that. We already do this here as an example:

flepiMoP/flepimop/gempyor_pkg/src/gempyor/utils.py

Lines 332 to 365 in f680be3

    
           @add_method(confuse.ConfigView) 
        
           def as_date(self) -> datetime.date: 
        
               """ 
        
               Evaluates an datetime.date or ISO8601 date string. 
        
               Returns: 
        
                   A datetime.date data type of the date associated with the object. 
        
               """ 
        
               return self.get(ISO8601Date()) 
        
           @add_method(confuse.ConfigView) 
        
           def as_evaled_expression(self): 
        
               """ 
        
               Evaluates an expression string, returning a float. 
        
               Returns: 
        
                   A float data type of the value associated with the object. 
        
               Raises: 
        
                   ValueError: On parsing errors. 
        
               """ 
        
               value = self.get() 
        
               if isinstance(value, numbers.Number): 
        
                   return value 
        
               elif isinstance(value, str): 
        
                   try: 
        
                       return float(sympy.parsing.sympy_parser.parse_expr(value)) 
        
                   except TypeError as e: 
        
                       raise ValueError(e) from e 
        
               else: 
        
                   raise ValueError(f"expected numeric or string expression [got: {value}]")

We would have to reimplement this scientific number parsing with GH-254 though if we went with this approach.

emprzy · 2024-11-20T14:55:49Z

@TimothyWillard What do you mean by reimplement this scientific number parsing?

TimothyWillard · 2024-11-20T16:49:23Z

@TimothyWillard What do you mean by reimplement this scientific number parsing?

We would have to reimplement said scientific number parsing extension in pydantic with GH-254 when we take that up since my suggestion was confuse specific.

jcblemai · 2024-11-20T19:34:13Z

A thing that might just work without doing anything is to force the cast as a float from confuse instead of the generic .get() in interence_parameters.py (see below). I think that might do the trick because it's not the first time we used scientific notation and it's the first time we have this error, so possibly it's just this code is not casting enough these things.

                     self.add_single_parameter(
                            ptype=ptype,
                            pname=pname,
                            subpop=sp,
                            pdist=parameter_config["value"].as_random_distribution(),
                            lb=parameter_config["value"]["a"].get(),
                            ub=parameter_config["value"]["b"].get(),
                        )

TimothyWillard added bug Defects or errors in the code. gempyor Concerns the Python core. low priority Low priority. quick issue Short or easy fix. inference Concerns the parameter inference framework. labels Nov 4, 2024

emprzy self-assigned this Nov 13, 2024

TimothyWillard added this to the Stabilize Configuration/Inputs Format milestone Nov 15, 2024

emprzy linked a pull request Nov 22, 2024 that will close this issue

Casting bound values as float in inference_parameter.py to prevent parsing errors #405

Draft

TimothyWillard linked a pull request Nov 25, 2024 that will close this issue

Casting bound values as float in inference_parameter.py to prevent parsing errors #405

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

saraloo commented Nov 1, 2024

emprzy commented Nov 8, 2024

saraloo commented Nov 8, 2024

emprzy commented Nov 8, 2024 •

edited

Loading

emprzy commented Nov 8, 2024 •

edited

Loading

pearsonca commented Nov 8, 2024

saraloo commented Nov 8, 2024

emprzy commented Nov 8, 2024

emprzy commented Nov 18, 2024 •

edited

Loading

saraloo commented Nov 18, 2024

emprzy commented Nov 18, 2024

TimothyWillard commented Nov 18, 2024

emprzy commented Nov 20, 2024

TimothyWillard commented Nov 20, 2024

jcblemai commented Nov 20, 2024

[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

[Bug]: Parsing error in modifiers/inference parameter (checking in bounds?) #380

Comments

saraloo commented Nov 1, 2024

Label

Priority Label

Describe the bug/issue

To Reproduce

Environment, if relevant

emprzy commented Nov 8, 2024

saraloo commented Nov 8, 2024

emprzy commented Nov 8, 2024 • edited Loading

emprzy commented Nov 8, 2024 • edited Loading

pearsonca commented Nov 8, 2024

saraloo commented Nov 8, 2024

emprzy commented Nov 8, 2024

emprzy commented Nov 18, 2024 • edited Loading

saraloo commented Nov 18, 2024

emprzy commented Nov 18, 2024

TimothyWillard commented Nov 18, 2024

emprzy commented Nov 20, 2024

TimothyWillard commented Nov 20, 2024

jcblemai commented Nov 20, 2024

emprzy commented Nov 8, 2024 •

edited

Loading

emprzy commented Nov 8, 2024 •

edited

Loading

emprzy commented Nov 18, 2024 •

edited

Loading