You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This was a bit alarming, but it turns out that it's correct. The proquint method runs by selecting n_words * n 16 bit integers, then converting them into "words". This should give a keyspace of 2^32 (~4 billion) as you say above.
Skipping out any code in the package, and looking at selecting ~121k 32 bit integers, and looking for duplicates:
which is consistent with what I saw from proquint based on your example.
Using the "birthday paradox distribution", we only need to draw 77k samples to have a 50% chance of a collision:
qbirthday(classes = 2^32)
# [1] 77164
@r-ash found another way of looking at this problem here - the expected number of collisions can be computed by the probability that the i'th term appears twice (ignoring triple-collisions)
Typically when I've needed large keyspaces I've used ids::random_id which uses by default 16 bytes (so 2^128). That breaks qbirthday but gets us into healthy territory (based on log-log extrapolation I make it that we'd need to draw ~10^19 samples to have a 50% chance of duplicate).
Ok, good to know this isn't a bug. It would be nice if ids::proquint had an argument unique where it resamples any duplicates before returning the vector. For now I solved my problem by using n=3. I would use ids::random_id but find proquints to be fun.
If I'm not mistaken, running
proquint
with the default argumentn_words=2L
should be able to generate 2^32 (4,294,967,296) unique identifiers.When I run
proquint
with justn=121969
, I consistently get duplicates. You can test this with the below:proquint(121969) |> unique() |> length()
which in my case is always 2 to 3 less than 121,969. Am I missing something with how this should work?The text was updated successfully, but these errors were encountered: