Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for uuid7 or ulid #528

Open
dahu33 opened this issue May 24, 2023 · 19 comments
Open

Support for uuid7 or ulid #528

dahu33 opened this issue May 24, 2023 · 19 comments

Comments

@dahu33
Copy link

dahu33 commented May 24, 2023

In addition to time and id range it would be nice to have support for uuid7 or ulid (Universally Unique Lexicographically Sortable Identifiers).

Thanks!

@dahu33 dahu33 changed the title Support for Universally Unique Lexicographically Sortable Identifiers Support for uuid7 or ulid May 24, 2023
@keithf4
Copy link
Collaborator

keithf4 commented May 24, 2023

I'm assuming something like this would generally not need ongoing partition generation? As in, you'd define your child tables up front for each block of values and then be done? That's pretty well supported without needing something like pg_partman which primarily provides ongoing maintenance features.

Or is there some scenario where you see that being needed with this data type?

@keithf4 keithf4 self-assigned this May 24, 2023
@keithf4
Copy link
Collaborator

keithf4 commented May 24, 2023

Also, is this data type supported in PG? Don't see it specifically listed here in the uuid data types.

https://www.postgresql.org/docs/15/uuid-ossp.htm

@dahu33
Copy link
Author

dahu33 commented May 24, 2023

Both UUIDv7 and ULID encode a unix timestamp which allow partitioning by time (hours, day, etc...) so the idea is to be able to use time partitioning using a unique identifier. Also, both UUIDv7 and ULID use UUID format and therefore are compatible with the PostgreSQL uuid type (but there is no built-in function to generate a UUIDv7 or a ULID yet).

@keithf4
Copy link
Collaborator

keithf4 commented May 24, 2023

Is it possible to decode the time value out of that? Otherwise, how do you see defining the partitioning interval on this? Not quite that familiar with UUID usage myself.

@dahu33
Copy link
Author

dahu33 commented May 24, 2023

Absolutely, see for the format: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-01.html#section-4.4

In a nutshell because the first 36-bit of the 128-bit UUID is a big-endian unsigned Unix Timestamp value, I suppose we can easily define the partitions range by time.

Also found https://github.com/fboulnois/pg_uuidv7

@keithf4
Copy link
Collaborator

keithf4 commented May 24, 2023

Ok, I see how it might work now, but it's a pretty specialized request. I have a lot of other stuff I'd like to get done first (see open issues) before I'd really have time to consider looking into adding this myself.

However, after 5.x is out, if you'd like to work on a PR to support this before I would have time to look into this, I'd be happy to review it.

@keithf4 keithf4 added this to the Future milestone May 24, 2023
@rauanmayemir
Copy link

I came here to ask for a similar feature, but from a different angle.

UUID7/ULID is a good identifier that is time-based and hence grows monotonically. It’s very natural to use it not just as Primary Key, but also for partitioning based on timestamp derived from it.

One way to solve this would have been maintaining a generated/unstored column that would extract timestamp from the UUID, but it would bring in extra effort for the application to account for the generated column on the read path, not to mention changing the PK to include it.

I thought a generic solution to this would be a way for pgpartman to support another way of declaring partitions: not by a date-type column, but by a function, so that the function would take the PK(s) and return the applicable time range for the partitioning.

@killua-eu
Copy link

@rauanmayemir declaring functions by a function seems like a brilliant idea.

@linglom1
Copy link

Hi, is there any solution to this problem?

@rauanmayemir
Copy link

@linglom1 you may want to consider timescaledb that supports the described scheme out of the box.

@akulapid
Copy link
Contributor

akulapid commented Aug 5, 2024

I have a similar requirement where we use time ordered uids similar to twitter's snowflake ids. This practice is becoming pretty common in many places. I was also thinking along the lines of @rauanmayemir 's suggestion to externalize the logic by a function, except something lightweight in the form of "encoder" function that encodes a timestamp as per one's id specification. This way the time range generation logic itself can continue to be leveraged from the extension.
@keithf4 I'm happy to work on a PR for this, if you can help review.

@akulapid
Copy link
Contributor

@keithf4 bumping this up, let me know if you're open for a PR along the above lines ^

@keithf4
Copy link
Collaborator

keithf4 commented Aug 12, 2024

I'm certainly open to a PR. Thank you!

@akulapid
Copy link
Contributor

akulapid commented Aug 19, 2024

@keithf4 have raised this - #683
I understand this is a big change, open to some scrutiny on this

@akulapid
Copy link
Contributor

Hey @keithf4, wondering if you had a chance to look at this?

@keithf4
Copy link
Collaborator

keithf4 commented Aug 28, 2024

Hey @keithf4, wondering if you had a chance to look at this?

Sorry I have not yet. Pretty busy with some other projects at the moment. Hoping to get back to partman more in the next month or so

@keithf4
Copy link
Collaborator

keithf4 commented Nov 22, 2024

Version 5.2.0 has been released. Thanks to everyone that helped get this feature implemented!

@calebj
Copy link

calebj commented Jan 6, 2025

This is a much better solution to what I opened #388 to discuss, especially since I have been using timestamptz encode/decode functions for other things. Having the extension ready to go for mainstream UUIDv7 support in PG18 is also fantastic.

Unfortunately, my snowflake ID columns are bigint, which is not a control column type allowed by the changes added in #683. Up to this point I've been using non-epoch ID partitioning, with large values for interval and retention. Enough data comes in that unexpected gaps almost never happen.

To make it time-based, I propose a special type value for p_epoch of 'func', which will make the extension use the same code path as text/uuid columns for types classified as id.

@keithf4
Copy link
Collaborator

keithf4 commented Jan 6, 2025

@calebj
Would you mind opening up another issue with an example of what you'd need supported? Will likely be closing this issue in the near future but would still like to look into what you're requesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants