Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyudunits2 #1118

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from
Draft

Pyudunits2 #1118

wants to merge 9 commits into from

Conversation

ocefpaf
Copy link
Member

@ocefpaf ocefpaf commented Nov 5, 2024

Possible alternative for #1094

TODO:

  • add tests for optional cf-units
  • remove all the workarounds in compliance_checker/cfutil.py (_CCUnits and _units) when pyudunits2 is released.

@ocefpaf
Copy link
Member Author

ocefpaf commented Nov 7, 2024

@pelson Windows failures apart due to the micromamba bug, and the hacks I made to get the dates to work, this is great! Pyudunits2 can replace cf-units with just a few minor tweaks IMO.

@ocefpaf
Copy link
Member Author

ocefpaf commented Nov 7, 2024

This is a bit worrying. The test run time increased a lot with pyudunits2, that may be some of my hacks though.

Develop branch:

python -m pytest -s -rxs -v -k "not integration" compliance_checker  13.12s user 0.30s system 73% cpu 18.220 total

This branch:

python -m pytest -s -rxs -v -k "not integration" compliance_checker  529.36s user 0.62s system 99% cpu 8:54.87 total

Copy link

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 35.00000% with 39 lines in your changes missing coverage. Please review.

Project coverage is 71.25%. Comparing base (066a826) to head (5165974).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
compliance_checker/cfutil.py 31.91% 32 Missing ⚠️
compliance_checker/cf/util.py 42.85% 4 Missing ⚠️
compliance_checker/ioos.py 0.00% 2 Missing ⚠️
compliance_checker/cf/cf_1_6.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1118       +/-   ##
============================================
- Coverage    81.91%   71.25%   -10.66%     
============================================
  Files           25       25               
  Lines         5224     5263       +39     
  Branches      1163     1169        +6     
============================================
- Hits          4279     3750      -529     
- Misses         644     1166      +522     
- Partials       301      347       +46     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ocefpaf ocefpaf force-pushed the pyudunits2 branch 6 times, most recently from a5f04cc to 54fbfc8 Compare November 15, 2024 08:07
Copy link

@pelson pelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think things moved forward a bit in pyudunits2 since you made this MR. Would love to know how the performance compares with these changes (keeping in mind that pyudunits2 has not yet been optimised, and there is probably a lot of room for improvement)

@@ -2848,7 +2848,9 @@ def _cell_measures_core(self, ds, var, external_set, variable_template):
valid = False
reasoning.append(conversion_failure_msg)
else:
if not cell_measure_units.is_convertible(Unit(f"m{exponent}")):
if not cell_measure_units.is_convertible(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be interested to measure the performance of this vs for example:

>>> cm_dimensionality = cell_measure_units.dimensionality()
>>> cm_symbolic_dim = {basis_unit._names.symbols[0]: order for basis_unit, order in cm_dimensionality.items()}
>>> cm_symbolic_dim == {'m': 2}
True

(note the private API usage for now - would like to expose symbol in a friendly form ASAP)

Copy link
Member Author

@ocefpaf ocefpaf Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is quite faster (again, this is poor's man benchmark):

%timeit cm_dimensionality = cell_measure_units.dimensionality(); cm_symbolic_dim = {basis_unit._names.symbols[0]: order for basis_unit, order in cm_dimensionality.items()}; cm_symbolic_dim == {'m': 2}
5.3 μs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit cell_measure_units.is_convertible_to(ut_system.unit("m2"))
1.64 ms ± 24.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for measuring! I totally forgot that I actually made this easier:

>>> cell_measure_units.dimensionality() == {'meter': 2}
True

Performance shouldn't be any different.

self.units = self.ut_system.unit(units)
except (SyntaxError, UnresolvableUnitException) as err:
raise ValueError from err
self.definition = self.units.expanded()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would avoid doing this until you have to. On the pyudunits2 side we would cache this within the unit anyway (on first request).

It looks like this isn't actually used presently anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My memory is failing me here now but I believe I added this to create a correspondence with cf_units. However, if the changes above solve the slowdowns I created here with my messy code, I'll probably just ditch cf_units and use only pyudunits2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@ocefpaf ocefpaf Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very crude benchmarks points to this as the source of slowdowns.

%timeit ut_system.unit("meters").expanded()
90.6 μs ± 2.45 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

vs

%timeit cf_units.Unit("meter").definition
3.4 μs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

I believe all the pieces to return a faster definition, like cf-units does, is already in pyudunits2. I think that all we need is to expose the singular name from ._identifier_references of a unit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out where it is used!

That check intentionally doesn't support cm as a unit (even though meters is fine), but it does support inch, and any aliases for that (including in). Am I reading that right?

I guess we need to think about what equality of units means - I think m == meters, but m != 100 cm is probably the way to go. We may want some equivalence test (not the same as equality) where m is equivalent to 100 cm, to go alongside the convertible test that we already have.

With such a definition, I would be able to add hashability to the Unit, such that you can make a set of units {unit1, unit2}, and check that the unit you are testing against is in the set (which would handle the possible variations on spellings etc. out of the box)...

valid_units = {unit('meters'), unit('fathoms')}
assert unit('m') in valid_units  # Note the different spelling
assert unit('cm') not in valid_units

I think all of this needs to go into some documentation in pyudunits2 about common patterns.

All of the above doesn't solve the performance difference. The numbers are really useful, so thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants