Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-111545: Add Py_HashDouble() function #113115

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions Doc/c-api/hash.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,16 @@ PyHash API

See also the :c:member:`PyTypeObject.tp_hash` member.

Types
^^^^^

.. c:type:: Py_hash_t

Hash value type: signed integer.

.. versionadded:: 3.2


.. c:type:: Py_uhash_t

Hash value type: unsigned integer.
Expand Down Expand Up @@ -41,6 +45,29 @@ See also the :c:member:`PyTypeObject.tp_hash` member.
.. versionadded:: 3.4


Functions
^^^^^^^^^

.. c:function:: Py_hash_t Py_HashDouble(double value)

Hash a C double number.

* If *value* is positive infinity, return :data:`sys.hash_info.inf
<sys.hash_info>`.
* If *value* is negative infinity, return :data:`-sys.hash_info.inf
<sys.hash_info>`.
* If *value* is not-a-number (NaN), return :data:`sys.hash_info.nan
<sys.hash_info>` (``0``).
* Otherwise, return the hash value of the finite *value* number.

.. note::
Return the hash value ``0`` for the floating point numbers ``-0.0`` and
``+0.0``, and for not-a-number (NaN). ``Py_IS_NAN(value)`` can be used to
check if *value* is not-a-number.
Comment on lines +55 to +66
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It exposes too much implementation details why already exposed in different place. Why not simply say that it is equivalent to hash() of Python float object if it is not a NaN? And if it is a NaN, you should use other value to avoid collisions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about the note, or describing the 3 cases and return values? I can just remove the note. My idea is to suggest using Py_IS_NAN() to treate NaN differently. But I'm not sure which implementation to suggest.

@zooba says that if you have a Python object, just call PyObject_Hash(obj) on it 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About describing all 3 cases. It should already be described in other place (documentation for sys.hash_info or float or hash()), and if it is not described in details, than it is not necessary for users. You should only document that for non-NaN values it returns the same result as for hash() for Python float object.


.. versionadded:: 3.13


.. c:function:: PyHash_FuncDef* PyHash_GetFuncDef(void)

Get the hash function definition.
Expand Down
17 changes: 17 additions & 0 deletions Doc/whatsnew/3.13.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1281,6 +1281,23 @@ New Features
* Add :c:func:`Py_HashPointer` function to hash a pointer.
(Contributed by Victor Stinner in :gh:`111545`.)

* Add :c:func:`Py_HashDouble` function to hash a C double number. Existing code
using the private ``_Py_HashDouble()`` function can be updated to:

.. code-block:: c

Py_hash_t hash_double(PyObject *obj, double value)
{
if (!Py_IS_NAN(value)) {
return Py_HashDouble(value);
}
else {
return Py_HashPointer(obj);
}
}

(Contributed by Victor Stinner in :gh:`111545`.)


Porting to Python 3.13
----------------------
Expand Down
1 change: 1 addition & 0 deletions Include/cpython/pyhash.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,3 +37,4 @@ typedef struct {
PyAPI_FUNC(PyHash_FuncDef*) PyHash_GetFuncDef(void);

PyAPI_FUNC(Py_hash_t) Py_HashPointer(const void *ptr);
PyAPI_FUNC(Py_hash_t) Py_HashDouble(double value);
41 changes: 41 additions & 0 deletions Lib/test/test_capi/test_hash.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import math
import sys
import unittest
from test.support import import_helper
Expand Down Expand Up @@ -77,3 +78,43 @@ def python_hash_pointer(x):
# Py_HashPointer((void*)(uintptr_t)-1) doesn't return -1 but -2
VOID_P_MAX = -1 & (2 ** (8 * SIZEOF_VOID_P) - 1)
self.assertEqual(hash_pointer(VOID_P_MAX), -2)

def test_hash_double(self):
# Test Py_HashDouble()
hash_double = _testcapi.hash_double

# test some integers
integers = [
*range(1, 30),
2**30 - 1,
2 ** 233,
int(sys.float_info.max),
]
for x in integers:
with self.subTest(x=x):
self.assertEqual(hash_double(float(x)), hash(x))
self.assertEqual(hash_double(float(-x)), hash(-x))

# test positive and negative zeros
self.assertEqual(hash_double(float(0.0)), 0)
self.assertEqual(hash_double(float(-0.0)), 0)

# test +inf and -inf
inf = float("inf")
self.assertEqual(hash_double(inf), sys.hash_info.inf)
self.assertEqual(hash_double(-inf), -sys.hash_info.inf)

# special float values: compare with Python hash() function
special_values = (
math.nextafter(0.0, 1.0), # smallest positive subnormal number
sys.float_info.min, # smallest positive normal number
sys.float_info.epsilon,
sys.float_info.max, # largest positive finite number
)
for x in special_values:
with self.subTest(x=x):
self.assertEqual(hash_double(x), hash(x))
self.assertEqual(hash_double(-x), hash(-x))

# test not-a-number (NaN)
self.assertEqual(hash_double(float('nan')), 0)
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add :c:func:`Py_HashDouble` function to hash a C double number. Patch by
Victor Stinner.
28 changes: 26 additions & 2 deletions Modules/_testcapi/hash.c
Original file line number Diff line number Diff line change
@@ -1,6 +1,17 @@
#include "parts.h"
#include "util.h"


static PyObject *
long_from_hash(Py_hash_t hash)
{
assert(hash != -1);

Py_BUILD_ASSERT(sizeof(long long) >= sizeof(hash));
return PyLong_FromLongLong(hash);
}


static PyObject *
hash_getfuncdef(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
{
Expand Down Expand Up @@ -54,14 +65,27 @@ hash_pointer(PyObject *Py_UNUSED(module), PyObject *arg)
}

Py_hash_t hash = Py_HashPointer(ptr);
Py_BUILD_ASSERT(sizeof(long long) >= sizeof(hash));
return PyLong_FromLongLong(hash);
return long_from_hash(hash);
}


static PyObject *
hash_double(PyObject *Py_UNUSED(module), PyObject *args)
{
double value;
if (!PyArg_ParseTuple(args, "d", &value)) {
return NULL;
}

Py_hash_t hash = Py_HashDouble(value);
return long_from_hash(hash);
}


static PyMethodDef test_methods[] = {
{"hash_getfuncdef", hash_getfuncdef, METH_NOARGS},
{"hash_pointer", hash_pointer, METH_O},
{"hash_double", hash_double, METH_VARARGS},
{NULL},
};

Expand Down
26 changes: 21 additions & 5 deletions Python/pyhash.c
Original file line number Diff line number Diff line change
Expand Up @@ -84,17 +84,20 @@ static Py_ssize_t hashstats[Py_HASH_STATS_MAX + 1] = {0};
*/

Py_hash_t
_Py_HashDouble(PyObject *inst, double v)
Py_HashDouble(double v)
{
int e, sign;
double m;
Py_uhash_t x, y;

if (!Py_IS_FINITE(v)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if remove this and keep only Py_IS_INFINITY(v) check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If prefer to have a deterministic behavior and always return the same hash value (0) if value is NaN. There are legit use cases to treat NaN as hash value 0.

With the following change, only check for, Py_HashDouble() hangs (fail to exit the loop) if value is NaN.

diff --git a/Python/pyhash.c b/Python/pyhash.c
index f64edde4043..23aa2dac7cc 100644
--- a/Python/pyhash.c
+++ b/Python/pyhash.c
@@ -90,14 +90,8 @@ Py_HashDouble(double v)
     double m;
     Py_uhash_t x, y;
 
-    if (!Py_IS_FINITE(v)) {
-        if (Py_IS_INFINITY(v)) {
-            return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
-        }
-        else {
-            assert(Py_IS_NAN(v));
-            return 0;
-        }
+    if (Py_IS_INFINITY(v)) {
+        return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
     }
 
     m = frexp(v, &e);

With the following change, Py_HashDouble() returns -_PyHASH_INF if value is NaN, since NaN > 0 is false:

diff --git a/Python/pyhash.c b/Python/pyhash.c
index f64edde4043..a853d6dad99 100644
--- a/Python/pyhash.c
+++ b/Python/pyhash.c
@@ -91,13 +91,8 @@ Py_HashDouble(double v)
     Py_uhash_t x, y;
 
     if (!Py_IS_FINITE(v)) {
-        if (Py_IS_INFINITY(v)) {
-            return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
-        }
-        else {
-            assert(Py_IS_NAN(v));
-            return 0;
-        }
+        // v can be NaN
+        return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
     }
 
     m = frexp(v, &e);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if use Py_IS_INFINITY() instead of !Py_IS_FINITE()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first attempt (first patch in my comment) leads to a hang if you pass NaN.

Why do you want to avoid !Py_IS_FINITE + Py_IS_INFINITY check? Are you worried about performance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recipe of What's New in Python 3.13:

     Py_hash_t hash_double(PyObject *obj, double value)
     {
         if (!Py_IS_NAN(value)) {
             return Py_HashDouble(value);
         }
         else {
             return Py_HashPointer(obj);
         }
     }

Using this recipe and the current implementation, there are 3 code paths:

  • NaN: 1 test (Py_IS_NAN()), hash_double() calls Py_HashPointer().
  • infinity: 3 tests (!Py_IS_NAN(), !Py_IS_FINITE(), Py_IS_INFINITY()), return (v > 0 ? _PyHASH_INF : -_PyHASH_INF).
  • finite: 2 tests (!Py_IS_NAN(), Py_IS_FINITE()), the loop.

I don't think that it's a big deal to add 1 or 2 tests per float point number. I care more about the API, having a deterministic behavior for the 3 cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid any promises about NaN. It should be recommended to not use this function for NaN.

if (Py_IS_INFINITY(v))
return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
else
return _Py_HashPointer(inst);
if (Py_IS_INFINITY(v)) {
return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
}
else {
assert(Py_IS_NAN(v));
return 0;
}
}

m = frexp(v, &e);
Expand Down Expand Up @@ -129,6 +132,19 @@ _Py_HashDouble(PyObject *inst, double v)
return (Py_hash_t)x;
}

Py_hash_t
_Py_HashDouble(PyObject *obj, double value)
{
assert(obj != NULL);

if (!Py_IS_NAN(value)) {
return Py_HashDouble(value);
}
else {
return Py_HashPointer(obj);
}
}

Py_hash_t
Py_HashPointer(const void *ptr)
{
Expand Down