Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyudunits2 #1118

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions .github/workflows/default-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,11 @@ jobs:
matrix:
python-version: [ "3.10", "3.11", "3.12", "3.13" ]
os: [ windows-latest, ubuntu-latest, macos-latest ]
cf_units: [ false ]
include:
- python-version: "3.10"
os: "ubuntu-latest"
cf_units: true
fail-fast: false
defaults:
run:
Expand All @@ -30,6 +35,11 @@ jobs:
--file test_requirements.txt
--channel conda-forge

- name: Test with cf-units (optional)
if: matrix.cf_units == true
run: >-
micromamba install cf-units --channel conda-forge

- name: Install compliance-checker
run: |
python -m pip install -e . --no-deps --force-reinstall
Expand Down
6 changes: 4 additions & 2 deletions compliance_checker/cf/cf_1_6.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
import cftime
import numpy as np
import regex
from cf_units import Unit

from compliance_checker import cfutil
from compliance_checker.base import BaseCheck, Result, TestCtx
Expand All @@ -18,6 +17,7 @@
grid_mapping_dict16,
)
from compliance_checker.cf.cf_base import CFNCCheck, appendix_a_base
from compliance_checker.cfunits import Unit

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -2848,7 +2848,9 @@ def _cell_measures_core(self, ds, var, external_set, variable_template):
valid = False
reasoning.append(conversion_failure_msg)
else:
if not cell_measure_units.is_convertible(Unit(f"m{exponent}")):
if not cell_measure_units.is_convertible(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be interested to measure the performance of this vs for example:

>>> cm_dimensionality = cell_measure_units.dimensionality()
>>> cm_symbolic_dim = {basis_unit._names.symbols[0]: order for basis_unit, order in cm_dimensionality.items()}
>>> cm_symbolic_dim == {'m': 2}
True

(note the private API usage for now - would like to expose symbol in a friendly form ASAP)

Copy link
Member Author

@ocefpaf ocefpaf Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is quite faster (again, this is poor's man benchmark):

%timeit cm_dimensionality = cell_measure_units.dimensionality(); cm_symbolic_dim = {basis_unit._names.symbols[0]: order for basis_unit, order in cm_dimensionality.items()}; cm_symbolic_dim == {'m': 2}
5.3 μs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit cell_measure_units.is_convertible_to(ut_system.unit("m2"))
1.64 ms ± 24.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for measuring! I totally forgot that I actually made this easier:

>>> cell_measure_units.dimensionality() == {'meter': 2}
True

Performance shouldn't be any different.

Unit(f"m{exponent}"),
):
valid = False
reasoning.append(conversion_failure_msg)
if not set(cell_measure_var.dimensions).issubset(var.dimensions):
Expand Down
2 changes: 1 addition & 1 deletion compliance_checker/cf/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@
from typing import Union

import requests
from cf_units import Unit
from lxml import etree
from netCDF4 import Dataset, Dimension, Group, Variable

from compliance_checker.cfunits import Unit
from compliance_checker.cfutil import units_convertible

# copied from paegan
Expand Down
76 changes: 76 additions & 0 deletions compliance_checker/cfunits.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
import pyudunits2

try:
import cf_units
except ImportError:
cf_units = False

UT_SYSTEM = None


class PyUdunits2:
"""Workaround for the differences in pyudunits2 and cf-units.

NB: Some of these may change and/or get implemented upstream. Pyudunits2 is new and in-flux.

1/4 Raise the same ValueError to match cf-unit errors.
2/4 Creates an empty unit from None to mimic cf-unit's Unit('unknown')
3/4 Add a definition object that is ust units.expanded()
"""

def __init__(self, units: str | None):
"""Keep unit system so we can convert from string later."""
global UT_SYSTEM
if UT_SYSTEM is None:
UT_SYSTEM = pyudunits2.UnitSystem.from_udunits2_xml()

self.ut_system = UT_SYSTEM

if units is None:
units = ""

try:
self.units = self.ut_system.unit(units)
except (SyntaxError, pyudunits2.UnresolvableUnitException) as err:
raise ValueError from err
self.definition = self.units.expanded()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would avoid doing this until you have to. On the pyudunits2 side we would cache this within the unit anyway (on first request).

It looks like this isn't actually used presently anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My memory is failing me here now but I believe I added this to create a correspondence with cf_units. However, if the changes above solve the slowdowns I created here with my messy code, I'll probably just ditch cf_units and use only pyudunits2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@ocefpaf ocefpaf Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very crude benchmarks points to this as the source of slowdowns.

%timeit ut_system.unit("meters").expanded()
90.6 μs ± 2.45 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

vs

%timeit cf_units.Unit("meter").definition
3.4 μs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

I believe all the pieces to return a faster definition, like cf-units does, is already in pyudunits2. I think that all we need is to expose the singular name from ._identifier_references of a unit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out where it is used!

That check intentionally doesn't support cm as a unit (even though meters is fine), but it does support inch, and any aliases for that (including in). Am I reading that right?

I guess we need to think about what equality of units means - I think m == meters, but m != 100 cm is probably the way to go. We may want some equivalence test (not the same as equality) where m is equivalent to 100 cm, to go alongside the convertible test that we already have.

With such a definition, I would be able to add hashability to the Unit, such that you can make a set of units {unit1, unit2}, and check that the unit you are testing against is in the set (which would handle the possible variations on spellings etc. out of the box)...

valid_units = {unit('meters'), unit('fathoms')}
assert unit('m') in valid_units  # Note the different spelling
assert unit('cm') not in valid_units

I think all of this needs to go into some documentation in pyudunits2 about common patterns.

All of the above doesn't solve the performance difference. The numbers are really useful, so thank you!


def __eq__(self, other):
return self.units == other

def is_dimensionless(self):
return self.units.is_dimensionless()

def is_time_reference(self):
return isinstance(self.units, pyudunits2.DateUnit)

def is_convertible(self, other):
if isinstance(other, str):
other = self.ut_system.unit(other)
elif isinstance(other, (PyUdunits2)):
other = other.units
else:
msg = f"Expected valid unit string or pyudunits2 unit object. Got {other}."
raise ValueError(msg)

# FIXME: cf-units Workaround 1/4 -> cf_units.Unit(None) -> Unit('unknown').
if "" in (self.units.expanded(), other.expanded()):
return False

convertible = self.units.is_convertible_to(other)
# FIXME: cf-units Workaround 2/4 -> time is not convertible to time reference.

# Both are time reference confirm.
if self.is_time_reference() and isinstance(other, pyudunits2.DateUnit):
convertible = True

return convertible


if cf_units:
PyUdunits2 = cf_units.Unit


class Unit(PyUdunits2):
def __init__(self, units):
super().__init__(units)
8 changes: 5 additions & 3 deletions compliance_checker/cfutil.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
from functools import lru_cache, partial
from importlib.resources import files

from cf_units import Unit
from compliance_checker.cfunits import Unit

_UNITLESS_DB = None
_SEA_NAMES = None
Expand Down Expand Up @@ -111,7 +111,9 @@ def is_dimensionless_standard_name(standard_name_table, standard_name):
f".//entry[@id='{standard_name}']",
)
if found_standard_name is not None:
canonical_units = Unit(found_standard_name.find("canonical_units").text)
canonical_units = Unit(
found_standard_name.find("canonical_units").text,
)
return canonical_units.is_dimensionless()
# if the standard name is not found, assume we need units for the time being
else:
Expand Down Expand Up @@ -2039,6 +2041,6 @@ def units_convertible(units1, units2):
try:
u1 = Unit(units1)
u2 = Unit(units2)
except ValueError:
except (ValueError, NotImplementedError):
return False
return u1.is_convertible(u2)
3 changes: 1 addition & 2 deletions compliance_checker/ioos.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
from numbers import Number

import validators
from cf_units import Unit
from lxml.etree import XPath
from owslib.namespaces import Namespaces

Expand All @@ -23,6 +22,7 @@
)
from compliance_checker.cf import util as cf_util # not to be confused with cfutil.py
from compliance_checker.cf.cf import CF1_6Check, CF1_7Check
from compliance_checker.cfunits import Unit
from compliance_checker.cfutil import (
get_geophysical_variables,
get_instrument_variables,
Expand Down Expand Up @@ -1377,7 +1377,6 @@ def check_vertical_coordinates(self, ds):
"mile",
"fathom",
)

unit_def_set = {
Unit(unit_str).definition for unit_str in expected_unit_strs
}
Expand Down
4 changes: 2 additions & 2 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ requires = [
[project]
name = "compliance-checker"
description = "Checks Datasets and SOS endpoints for standards compliance"
readme = "README.md"
readme = { file = "README.md", content-type = "text/markdown" }
license = { text = "Apache-2.0" }
maintainers = [
{ name = "Dave Foster", email = "[email protected]" },
Expand Down Expand Up @@ -38,6 +38,7 @@ dynamic = [
"dependencies",
"version",
]
optional-dependencies.extras = [ "cf-units>=2" ]
urls.documentation = "https://ioos.github.io/compliance-checker"
urls.homepage = "https://compliance.ioos.us/index.html"
urls.repository = "https://github.com/ioos/compliance-checker"
Expand Down Expand Up @@ -79,7 +80,6 @@ compliance_checker = [
dependencies = { file = [
"requirements.txt",
] }
readme = { file = "README.md", content-type = "text/markdown" }

[tool.setuptools_scm]
write_to = "compliance_checker/_version.py"
Expand Down
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
cf-units>=2
cftime>=1.1.0
isodate>=0.6.1
jinja2>=2.7.3
Expand All @@ -9,6 +8,7 @@ packaging
pendulum>=1.2.4
pygeoif>=0.6
pyproj>=2.2.1
pyudunits2
regex>=2017.07.28
requests>=2.2.1
setuptools>=15.0
Expand Down
Loading