Skip to content

Floats are equal, but their hash is different #513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Darkdragon84 opened this issue Apr 4, 2025 · 7 comments
Open

Floats are equal, but their hash is different #513

Darkdragon84 opened this issue Apr 4, 2025 · 7 comments

Comments

@Darkdragon84
Copy link

It generally holds that if two hashable objects are equal, their hashes must be the same.

It seems that python builtin float and symengine.Float violate this rule:

from symengine import Float

Float(2.2) == 2.2 # True
hash(Float(2.2)) == hash(2.2) # False

Shouldn't the hash of a symengine.Float just use the python float's hash?

@certik
Copy link
Contributor

certik commented Apr 4, 2025

The SymEngine Float is not the same as Python float. Two different objects and Python uses a different hash functions to calculate its hash.

@Darkdragon84
Copy link
Author

I know, but if two objects equate, they should also yield the same hash, otherwise you get undefined behavior in e.g. hash tables

@bjodah
Copy link
Contributor

bjodah commented Apr 8, 2025

I think @Darkdragon84 is right, it is not immediately obvious to me how we would achieve this, but poking around with some python standard library classes, this promise seems to hold:

>>> import decimal
>>> decimal.Decimal('0.5') == 0.5
True
>>> hash(decimal.Decimal('0.5')) == hash(0.5)
True
>>> from fractions import Fraction
>>> Fraction(1, 2) == 0.5
True
>>> hash(Fraction(1, 2)) == hash(0.5)
True

If we support float's .as_integer_ratio(), we could use this as the basis for hash computation perhaps? (but it would be potentially be slower, but non-conformance for speed in the python bindings should probably be opt-in with a compile time flag).

EDIT:
Here's the relevant section in Python's documentation.

The only required property is that objects which compare equal have the same hash value;

@bjodah bjodah pinned this issue Apr 8, 2025
@bjodah bjodah unpinned this issue Apr 8, 2025
@rikardn
Copy link
Contributor

rikardn commented Apr 8, 2025

We need to be careful to not create a strong coupling to the Python hash implementation. So if this should be fixed it needs to be in the Python bindings. Other languages we want to bind to most probably does the hashing in a different way.

I feel that the only way would be to first convert to a Python float and then hash. This would avoid the coupling and generate the same hash if objects compare equal. Perhaps we also need to do this for int. I don't think we would need it for Rational since it doesn't compare equal with floatand cannot be integers.

@bjodah
Copy link
Contributor

bjodah commented Apr 8, 2025

@rikardn I agree, any overhead should solely be carried by the python bindings, and if there is any overhead I think we should make it simple to opt-out of at compile time (or runtime, if that's doable).

Converting to float, followed by hash sounds like a good approach to me.

Regarding Rational, I wonder if this is an oversight by us, I feel like this really should return True:

>>> symengine.Rational(1,2) == 0.5
False

(but that would be a separate issue I guess).

@rikardn
Copy link
Contributor

rikardn commented Apr 9, 2025

@bjodah The float not comparing equal to a rational is by design. We see 1/2 as an exact number and 0.5 being 0.5 + $\epsilon$, where espilon can be a small irrational number.

@bjodah
Copy link
Contributor

bjodah commented Apr 9, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants