Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-89083: add support for UUID version 7 (RFC 9562) #121119

Open
wants to merge 51 commits into
base: main
Choose a base branch
from

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Jun 28, 2024

Based on the discussion in #89083 and https://discuss.python.org/t/rfc-4122-9562-uuid-version-7-and-8-implementation/56725/2, this is the implementation that I suggest for the standard library.

The documentation is still missing because I don't have a good formulation for now.

In this PR, I did not include the following:

  • mutex guards
  • timestamp offsets

The reason is that I want to keep the first implementation simple for the sake of review. In addition, we did not give the add mutex for UUIDv1 so I don't want to do it only for v7.

@sergeyprokhorenko I don't know if you have the answer, but is there any safe guards if the timestamp overflows actually? or do we just don't care at all for now? (like, leave the problem for the future generations?)


📚 Documentation preview 📚: https://cpython-previews--121119.org.readthedocs.build/

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Jun 28, 2024

@sergeyprokhorenko I don't know if you have the answer, but is there any safe guards if the timestamp overflows actually? or do we just don't care at all for now? (like, leave the problem for the future generations?)

You already have three counter overflow protections:

  1. Very long counter (42 bits)
  2. Counter segment (MSB) initialized to 0
  3. Incremented timestamp on overflow

The timestamp will not be full for about 6900 years. If the system clock stops and the timestamp is used as a counter, it will also last a long time.

You can be absolutely calm

@picnixz
Copy link
Member Author

picnixz commented Jun 28, 2024

Yes, but I wanted to know whether the RFC actually considered the case when you use your own offset. Let's say we want to generate a future UUID for some obscure reason, I was wondering "is there anything on that topics?" But I think I'll just leave it to future generations.

What I meant is "what do you do if the operation of incrementing the timestamp itself overflows"?

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Jun 28, 2024

Yes, but I wanted to know whether the RFC actually considered the case when you use your own offset. Let's say we want to generate a future UUID for some obscure reason, I was wondering "is there anything on that topics?" But I think I'll just leave it to future generations.

What I meant is "what do you do if the operation of incrementing the timestamp itself overflows"?

Don't set offsets to 6900 years or minus 2k years, and everything will be OK. Foolproofing is an implementation detail.

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Jun 28, 2024

When the timestamp goes beyond the upper or lower limit of the acceptable range, a zero offset can be applied. This is how I would do it. The RFC does not cover this issue.

I think timestamp offset could be a competitive advantage of this implementation without significant cost.

UUIDv1 can be forgotten and no longer upgraded. This is an outdated version

@picnixz picnixz changed the title gh-89083: add ref. impl. for UUID version 7 (RFC 9562) gh-89083: add support for UUID version 7 (RFC 9562) Jun 28, 2024
@pretoriusdre
Copy link

Great job on this PR. One thing...

In the get_counter_and_tail method:
rand = int.from_bytes(os.urandom(10))

Might I suggest to explicitly specify the required byteorder using the byteorder argument?

Running this code in an older python env gives an error:
TypeError: from_bytes() missing required argument 'byteorder' (pos 2)

It seems like some default is now provided, but in my opinion, this could lead to some ambiguity. See below:
https://discuss.python.org/t/what-should-be-the-default-value-for-int-to-bytes-byteorder/10616

There is another usage of int.from_bytes in the same uuid module, perhaps if the above is being addressed, this could be put within same scope.

@picnixz
Copy link
Member Author

picnixz commented Jul 22, 2024

Running this code in an older python env gives an error:

This feature would only be put in 3.14 or later, so we can ignore this.

but in my opinion, this could lead to some ambiguity

It doesn't matter whether it's little or big endian here since we are only interested in randomness and not actual data. In addition, not specifying it might be a bit faster since the C implementation currently does:

    if (byteorder == NULL)
        little_endian = 0;
    else if (_PyUnicode_Equal(byteorder, &_Py_ID(little)))
        little_endian = 1;
    else if (_PyUnicode_Equal(byteorder, &_Py_ID(big)))
        little_endian = 0;

So, not specifying the byteorder, is equivalent to have byteorder being NULL out there, which saves a string comparison.

Copy link

@jnoring-pw jnoring-pw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments as I looked through this code (I'm interested in uuid7 suport in our project). Thanks for this! It's looking good.

@picnixz picnixz changed the title gh-89083: add support for UUID version 7 (RFC 9562) gh-89083: add support for UUID version 7 (RFC 9562) Aug 22, 2024
@Davidamgad2
Copy link

Hello,
Thank you so much for your efforts in advance!
May I ask is v7 will be merged soon and I will be able to use it?in django app
Thanks again!

@picnixz
Copy link
Member Author

picnixz commented Jan 13, 2025

I would be happy to but I would need more core devs to support this. We have roughly until May 2025 to include it for 3.14. Hopefully we'll manage to!

@picnixz picnixz added the type-feature A feature request or enhancement label Jan 13, 2025
@picnixz
Copy link
Member Author

picnixz commented Jan 13, 2025

I'll have another look at Rust and other languages' UUIDv7 implementations. Then, I'll ask other core devs to review the PR. And hopefully, we'll have it before the beta release.

@Davidamgad2
Copy link

I'm not sure if this allowed to but I found this repo for uuidv7 if this would help, for now

https://github.com/aminalaee/uuid-utils

Thanks again and keep it up! 🔥

@picnixz
Copy link
Member Author

picnixz commented Jan 13, 2025

I'm not sure if this allowed to but I found this repo for uuidv7 if this would help, for now

I actually found other implementations (see the issue thread) but I want to be as compliant as I can to the RFC and closer to what other standard libraries use. Currently, the implementation follows the implementation of Rust as it was last summer but I'll check if this has changed since then.

@picnixz picnixz requested review from vstinner and hugovk February 17, 2025 10:22
@picnixz
Copy link
Member Author

picnixz commented Feb 17, 2025

Following https://discuss.python.org/t/rfc-4122-9562-uuid-version-7-and-8-implementation/56725/, and considering that many core developers suggested to keep it aligned with the Rust implementation as a first iteration and not PostgreSQL, and that even the author of the UUIDv7 for PostgreSQL recommended that Python aligns itself with Rust to maintain portability on platforms lacking microsecond resolution system clocks, I decided to keep the current implementation.

Comment on lines +760 to +763
_last_timestamp_v7 = None
_last_counter_v7 = 0 # 42-bit counter

def uuid7():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slip in some PEP 8 whitespace:

Suggested change
_last_timestamp_v7 = None
_last_counter_v7 = 0 # 42-bit counter
def uuid7():
_last_timestamp_v7 = None
_last_counter_v7 = 0 # 42-bit counter
def uuid7():

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is not really PEP-8 friendly because some functions are separated by two empty lines, others aren't. I would prefer either having all uuid functions without PEP-8 or all of them with (it feels a bit weird that uuid1 to uuid4 won't have the double spacing but UUID 6, 7, and 8 would).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then add the rest too :) PEP 8: "although this is also an opportunity to clean up someone else’s mess (in true XP style)."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I can do it. The module is very very slow for updates so I don't think we'll have issues with backports :)

# by construction, the variant and version bits are already cleared
int_uuid_7 |= _RFC_4122_VERSION_7_FLAGS
return UUID._from_int(int_uuid_7)

def uuid8(a=None, b=None, c=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def uuid8(a=None, b=None, c=None):
def uuid8(a=None, b=None, c=None):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants