gh-111545: Add Py_HashDouble() function #113115

vstinner · 2023-12-14T14:48:52Z

Add tests: Modules/_testcapi/hash.c and
Lib/test/test_capi/test_hash.py.

Issue: Make _Py_HashDouble public again as "unstable" API #111545

📚 Documentation preview 📚: https://cpython-previews--113115.org.readthedocs.build/


serhiy-storchaka · 2023-12-15T09:07:16Z

Python/pyhash.c

@@ -84,17 +84,20 @@ static Py_ssize_t hashstats[Py_HASH_STATS_MAX + 1] = {0};
   */

 Py_hash_t
-_Py_HashDouble(PyObject *inst, double v)
+Py_HashDouble(double v)
 {
    int e, sign;
    double m;
    Py_uhash_t x, y;

    if (!Py_IS_FINITE(v)) {


What if we remove this and keep only the Py_IS_INFINITY(v) check?

I prefer to have deterministic behavior and always return the same hash value (0) if the value is NaN. There are legitimate use cases for treating NaN as hash value 0.

With the following change, which only checks for infinity, Py_HashDouble() hangs (fails to exit the loop) if the value is NaN.

diff --git a/Python/pyhash.c b/Python/pyhash.c
index f64edde4043..23aa2dac7cc 100644
--- a/Python/pyhash.c
+++ b/Python/pyhash.c
@@ -90,14 +90,8 @@ Py_HashDouble(double v)
     double m;
     Py_uhash_t x, y;

-    if (!Py_IS_FINITE(v)) {
-        if (Py_IS_INFINITY(v)) {
-            return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
-        }
-        else {
-            assert(Py_IS_NAN(v));
-            return 0;
-        }
+    if (Py_IS_INFINITY(v)) {
+        return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
     }

     m = frexp(v, &e);

With the following change, Py_HashDouble() returns -_PyHASH_INF if value is NaN, since NaN > 0 is false:

diff --git a/Python/pyhash.c b/Python/pyhash.c
index f64edde4043..a853d6dad99 100644
--- a/Python/pyhash.c
+++ b/Python/pyhash.c
@@ -91,13 +91,8 @@ Py_HashDouble(double v)
     Py_uhash_t x, y;

     if (!Py_IS_FINITE(v)) {
-        if (Py_IS_INFINITY(v)) {
-            return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
-        }
-        else {
-            assert(Py_IS_NAN(v));
-            return 0;
-        }
+        // v can be NaN
+        return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
     }

     m = frexp(v, &e);
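The NaN result of this second patch follows from IEEE 754 comparison rules: every ordered comparison with NaN is false, so the ternary falls through to its second branch. A small standalone sketch, where SKETCH_HASH_INF is a hypothetical stand-in for _PyHASH_INF (assumed here to be 314159, the value reported as sys.hash_info.inf):

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for _PyHASH_INF; the value is an assumption. */
#define SKETCH_HASH_INF 314159L

/* Mirrors the body of the patched "if (!Py_IS_FINITE(v))" branch. */
static long
nonfinite_hash_sketch(double v)
{
    assert(!isfinite(v));
    /* NaN > 0 is false, so NaN takes the negative branch. */
    return (v > 0 ? SKETCH_HASH_INF : -SKETCH_HASH_INF);
}
```

So with this layout NaN and negative infinity would share a hash value, which is one more collision than the explicit NaN branch produces.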

What if we use Py_IS_INFINITY() instead of !Py_IS_FINITE()?

My first attempt (the first patch in my comment) leads to a hang if you pass NaN.

Why do you want to avoid the !Py_IS_FINITE + Py_IS_INFINITY check? Are you worried about performance?

Recipe from What's New in Python 3.13:

Py_hash_t
hash_double(PyObject *obj, double value)
{
    if (!Py_IS_NAN(value)) {
        return Py_HashDouble(value);
    }
    else {
        return Py_HashPointer(obj);
    }
}

Using this recipe and the current implementation, there are 3 code paths:

NaN: 1 test (Py_IS_NAN()), hash_double() calls Py_HashPointer().

infinity: 3 tests (!Py_IS_NAN(), !Py_IS_FINITE(), Py_IS_INFINITY()), return (v > 0 ? _PyHASH_INF : -_PyHASH_INF).

finite: 2 tests (!Py_IS_NAN(), Py_IS_FINITE()), the loop.
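The per-path test counts listed above can be checked with a small standalone sketch. It uses the C99 macros isnan(), isfinite() and isinf() in place of Py_IS_NAN(), Py_IS_FINITE() and Py_IS_INFINITY(), and the helper name count_checks is hypothetical. It only counts the checks; it does not compute a hash.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: number of classification tests performed for
   `value` by the recipe plus the current Py_HashDouble() layout. */
static int
count_checks(double value)
{
    int checks = 0;

    checks++;                 /* recipe: Py_IS_NAN(value) */
    if (isnan(value)) {
        return checks;        /* NaN: would call Py_HashPointer() */
    }
    checks++;                 /* Py_HashDouble(): !Py_IS_FINITE(v) */
    if (isfinite(value)) {
        return checks;        /* finite: would enter the loop */
    }
    checks++;                 /* Py_HashDouble(): Py_IS_INFINITY(v) */
    assert(isinf(value));
    return checks;            /* infinity: would return +/-_PyHASH_INF */
}
```

The common finite case pays for two checks, and only the rare infinity case pays for three.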

I don't think that it's a big deal to add 1 or 2 tests per floating-point number. I care more about the API and about having deterministic behavior for the 3 cases.

I want to avoid any promises about NaN. It should be recommended to not use this function for NaN.

serhiy-storchaka · 2023-12-15T09:12:29Z

Doc/c-api/hash.rst

+   * If *value* is positive infinity, return :data:`sys.hash_info.inf
+     <sys.hash_info>`.
+   * If *value* is negative infinity, return :data:`-sys.hash_info.inf
+     <sys.hash_info>`.
+   * If *value* is not-a-number (NaN), return :data:`sys.hash_info.nan
+     <sys.hash_info>` (``0``).
+   * Otherwise, return the hash value of the finite *value* number.
+
+   .. note::
+      Return the hash value ``0`` for the floating point numbers ``-0.0`` and
+      ``+0.0``, and for not-a-number (NaN). ``Py_IS_NAN(value)`` can be used to
+      check if *value* is not-a-number.


It exposes too many implementation details that are already exposed in a different place. Why not simply say that it is equivalent to hash() of a Python float object if the value is not a NaN? And if it is a NaN, you should use another value to avoid collisions.

Are you talking about the note, or about describing the 3 cases and return values? I can just remove the note. My idea is to suggest using Py_IS_NAN() to treat NaN differently, but I'm not sure which implementation to suggest.

@zooba says that if you have a Python object, just call PyObject_Hash(obj) on it 😁

About describing all 3 cases: this should already be described elsewhere (in the documentation for sys.hash_info, float, or hash()), and if it is not described in detail there, then users do not need it. You should only document that for non-NaN values it returns the same result as hash() for a Python float object.

vstinner · 2023-12-20T11:02:55Z

I created PR #112095 more than 1 month ago. I spent time running benchmarks, implementing different APIs, trying to collect feedback on each API, and discussing at length the advantages and disadvantages of each API. Sadly, we failed to reach a consensus on the API. Now another API is being discussed. The API looks simple to me; I didn't expect to spend more than one month on a single function.

I need to take a break from that topic. I don't have the energy to dig into these discussions. I prefer to close the PR for now.

pythongh-111545: Add Py_HashDouble() function

9b00e3e


vstinner requested a review from tiran as a code owner December 14, 2023 14:48

bedevere-app bot added the awaiting core review label Dec 14, 2023

bedevere-app bot mentioned this pull request Dec 14, 2023

Make _Py_HashDouble public again as "unstable" API #111545

Closed

vstinner mentioned this pull request Dec 14, 2023

gh-111545: Add Py_HashDouble() function #112449

Closed

Fix Sphinx syntax

67b4eb8

serhiy-storchaka approved these changes Dec 15, 2023


bedevere-app bot added awaiting merge and removed awaiting core review labels Dec 15, 2023

vstinner mentioned this pull request Dec 16, 2023

Add Py_HashDouble() function capi-workgroup/decisions#2

Closed


vstinner closed this Dec 20, 2023

vstinner deleted the hash_double4 branch December 20, 2023 11:02
