-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pyudunits2 #1118
base: develop
Are you sure you want to change the base?
Pyudunits2 #1118
Conversation
@pelson Windows failures apart due to the micromamba bug, and the hacks I made to get the dates to work, this is great! Pyudunits2 can replace cf-units with just a few minor tweaks IMO. |
This is a bit worrying. The test run time increased a lot with Develop branch:
This branch:
|
c4a1619
to
85723cb
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1118 +/- ##
============================================
- Coverage 81.91% 71.25% -10.66%
============================================
Files 25 25
Lines 5224 5263 +39
Branches 1163 1169 +6
============================================
- Hits 4279 3750 -529
- Misses 644 1166 +522
- Partials 301 347 +46 ☔ View full report in Codecov by Sentry. |
a5f04cc
to
54fbfc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think things moved forward a bit in pyudunits2 since you made this MR. Would love to know how the performance compares with these changes (keeping in mind that pyudunits2
has not yet been optimised, and there is probably a lot of room for improvement)
@@ -2848,7 +2848,9 @@ def _cell_measures_core(self, ds, var, external_set, variable_template): | |||
valid = False | |||
reasoning.append(conversion_failure_msg) | |||
else: | |||
if not cell_measure_units.is_convertible(Unit(f"m{exponent}")): | |||
if not cell_measure_units.is_convertible( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be interested to measure the performance of this vs for example:
>>> cm_dimensionality = cell_measure_units.dimensionality()
>>> cm_symbolic_dim = {basis_unit._names.symbols[0]: order for basis_unit, order in cm_dimensionality.items()}
>>> cm_symbolic_dim == {'m': 2}
True
(note the private API usage for now - would like to expose symbol in a friendly form ASAP)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion is quite faster (again, this is poor's man benchmark):
%timeit cm_dimensionality = cell_measure_units.dimensionality(); cm_symbolic_dim = {basis_unit._names.symbols[0]: order for basis_unit, order in cm_dimensionality.items()}; cm_symbolic_dim == {'m': 2}
5.3 μs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit cell_measure_units.is_convertible_to(ut_system.unit("m2"))
1.64 ms ± 24.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for measuring! I totally forgot that I actually made this easier:
>>> cell_measure_units.dimensionality() == {'meter': 2}
True
Performance shouldn't be any different.
self.units = self.ut_system.unit(units) | ||
except (SyntaxError, UnresolvableUnitException) as err: | ||
raise ValueError from err | ||
self.definition = self.units.expanded() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would avoid doing this until you have to. On the pyudunits2
side we would cache this within the unit anyway (on first request).
It looks like this isn't actually used presently anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My memory is failing me here now but I believe I added this to create a correspondence with cf_units. However, if the changes above solve the slowdowns I created here with my messy code, I'll probably just ditch cf_units and use only pyudunits2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very crude benchmarks points to this as the source of slowdowns.
%timeit ut_system.unit("meters").expanded()
90.6 μs ± 2.45 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
vs
%timeit cf_units.Unit("meter").definition
3.4 μs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
I believe all the pieces to return a faster definition
, like cf-units does, is already in pyudunits2. I think that all we need is to expose the singular name from ._identifier_references
of a unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out where it is used!
That check intentionally doesn't support cm
as a unit (even though meters
is fine), but it does support inch
, and any aliases for that (including in
). Am I reading that right?
I guess we need to think about what equality of units means - I think m == meters
, but m != 100 cm
is probably the way to go. We may want some equivalence test (not the same as equality) where m is equivalent to 100 cm
, to go alongside the convertible test that we already have.
With such a definition, I would be able to add hashability to the Unit
, such that you can make a set of units {unit1, unit2}
, and check that the unit you are testing against is in the set (which would handle the possible variations on spellings etc. out of the box)...
valid_units = {unit('meters'), unit('fathoms')}
assert unit('m') in valid_units # Note the different spelling
assert unit('cm') not in valid_units
I think all of this needs to go into some documentation in pyudunits2
about common patterns.
All of the above doesn't solve the performance difference. The numbers are really useful, so thank you!
Possible alternative for #1094
TODO:
compliance_checker/cfutil.py
(_CCUnits
and_units
) whenpyudunits2
is released.