Fabio Lima edited this page Dec 9, 2022 · 43 revisions

(Not So) Frequently Asked Questions

What is a TSID?

It's actually quite an old idea.

TSID Creator is just one implementation of that idea.

Why the name "Time Sortable Identifier"?

Because at the time I couldn't think of a better name. I don't even know if "sortable" is a dictionary word. So it's my fault. :)

This could be called, for example, Time-Sorted Unique Identifier, Twitter Snowflake Identifier, Time Sequence Identifier, etc.

You can call it whatever you see fit, as long as people know what you're talking about.

In the end, it's just a name.

Why does the timestamp start on January 1, 2020?

Because the TSID Creator was implemented that year and because there needed to be an easy-to-remember date.

You can use whatever start date you want in your application. You just need to be consistent: if you decide to use the date 2022-12-22, you have to use it everywhere in the application, always.
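To make the idea of a start date concrete, here is a minimal sketch of how a timestamp relative to a custom epoch could be computed. The class and method names are made up for illustration and are not part of TSID Creator; the sketch also assumes that the start date is midnight UTC and that time is counted in milliseconds.

```java
import java.time.Instant;

final class CustomEpochSketch {

    // Hypothetical constant: the one start date used everywhere in the application.
    // Assumption: midnight UTC on January 1, 2020.
    static final long CUSTOM_EPOCH_MILLIS = Instant.parse("2020-01-01T00:00:00Z").toEpochMilli();

    // Milliseconds elapsed since the custom epoch; this is what would go into the time bits.
    static long toTsidTime(long unixMillis) {
        return unixMillis - CUSTOM_EPOCH_MILLIS;
    }

    // The inverse: recover the Unix time from the time component.
    static long toUnixMillis(long tsidTime) {
        return tsidTime + CUSTOM_EPOCH_MILLIS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long time = toTsidTime(now);
        System.out.println("ms since custom epoch: " + time);
        System.out.println("round trip ok: " + (toUnixMillis(time) == now));
    }
}
```

If two parts of the application used different start dates, the same instant would produce different time components, which is why consistency matters.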

Why does the TSID's lifetime end in 69 or 139 years?

Because those are the durations that fit in 41 or 42 bits: 2^41 milliseconds is about 69 years and 2^42 milliseconds is about 139 years.

If your programming language or database only has signed integer data types, the first bit of the identifier will be used as a sign bit.

This means that, in these languages and databases, the limit will be reached in 69 years. This was the same limit as Twitter Snowflake.

Not all programming languages and not all databases have an unsigned integer data type. For example, Java and PostgreSQL only have signed integers.

In languages and databases that support unsigned integers, the limit will only be reached in 139 years.

If the identifier is stored in string format or in byte array format, which might not be common for a 64-bit value, the limit will also be 139 years.
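The two limits can be checked with a bit of arithmetic. This is only a sketch, assuming the timestamp counts milliseconds and using an average year of 365.25 days; the class name is made up for illustration.

```java
final class TsidLifetimeSketch {

    // Assumption: millisecond resolution and an average year of 365.25 days.
    static final double MILLIS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    // Number of years that can be represented with the given number of time bits.
    static double yearsFor(int timeBits) {
        return (double) (1L << timeBits) / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        // Roughly 69 years for 41 bits and 139 years for 42 bits.
        System.out.println("41 bits: " + yearsFor(41) + " years");
        System.out.println("42 bits: " + yearsFor(42) + " years");
    }
}
```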

What happens when the timestamp reaches its limit?

Honestly, I don't know what will happen in 69 or 139 years.

What is a node identifier?

It is a way to identify the ID generator.

The function of the node identifier is to prevent collisions between IDs produced by more than one ID generator.

When you have more than one process generating IDs, there is a relatively high probability of a collision in a 64-bit ID. But when you say that each generator will have a unique node identifier in your application, you eliminate that collision probability.

This can be a virtual machine ID, a container ID, a running process ID, etc. You are the one who says what it means within the context of your application.

If you have only one process generating IDs, you don't have to worry about collisions within your application.
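The sketch below shows why a unique node identifier removes the collision risk between generators. It assumes a Snowflake-like layout (time first, then node, then counter) with the default 10 node bits and 12 counter bits. The names are hypothetical and this is not the TSID Creator source code.

```java
final class NodeLayoutSketch {

    static final int RANDOM_BITS = 22;                      // node bits + counter bits
    static final int NODE_BITS = 10;                        // default node width
    static final int COUNTER_BITS = RANDOM_BITS - NODE_BITS;

    static final long NODE_MASK = (1L << NODE_BITS) - 1;
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    // Assumed layout: | time (42 bits) | node (10 bits) | counter (12 bits) |
    static long compose(long time, long node, long counter) {
        return (time << RANDOM_BITS)
                | ((node & NODE_MASK) << COUNTER_BITS)
                | (counter & COUNTER_MASK);
    }

    // Extracts the node identifier back from an ID.
    static long nodeOf(long id) {
        return (id >>> COUNTER_BITS) & NODE_MASK;
    }

    public static void main(String[] args) {
        long time = 123_456_789L;
        // Same millisecond, same counter value, different nodes: the IDs still differ.
        long a = compose(time, 1, 42);
        long b = compose(time, 2, 42);
        System.out.println(Long.toHexString(a) + " node=" + nodeOf(a));
        System.out.println(Long.toHexString(b) + " node=" + nodeOf(b));
    }
}
```

Because the node bits are part of the ID itself, two generators with different node identifiers can never produce the same value, whatever their clocks and counters do.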

What is a counter?

It's just a group of bits that is incremented each time a new identifier is generated. When the timestamp changes, these bits are reset to a random value.

The function of these bits is to ensure that the identifiers are always monotonic, that is, that each identifier is always greater than the previous one.

This prevents collisions between identifiers created by the same generator. Since identifiers never go backwards, there is no risk of collision with identifiers that were previously created by a single generator.

Of course, the system clock can go backwards, causing the risk of identifier collisions. But we can't do much about it.
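Here is a minimal sketch of the counter behavior described above: increment within the same millisecond, reset to a random value when the time advances. It leaves out the node bits to keep things short, and the way it handles counter overflow and a clock that goes backwards (it keeps the last time it saw and keeps incrementing) is just one possible choice that I assume for the example, not necessarily what TSID Creator does.

```java
import java.security.SecureRandom;

final class CounterSketch {

    private static final int COUNTER_BITS = 12;
    private static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    private final SecureRandom random = new SecureRandom();
    private long lastTime = -1;
    private long counter;

    // Returns an ID made of (time, counter) only; always greater than the previous one.
    synchronized long next(long timeMillis) {
        if (timeMillis > lastTime) {
            // The timestamp changed: reset the counter to a random value.
            lastTime = timeMillis;
            counter = random.nextLong() & COUNTER_MASK;
        } else {
            // Same millisecond, or the clock went backwards: keep counting up.
            counter++;
            if (counter > COUNTER_MASK) {
                // Counter overflow: borrow the next millisecond to stay monotonic.
                lastTime++;
                counter = 0;
            }
        }
        return (lastTime << COUNTER_BITS) | counter;
    }

    public static void main(String[] args) {
        CounterSketch generator = new CounterSketch();
        long now = System.currentTimeMillis();
        long first = generator.next(now);
        long second = generator.next(now);
        System.out.println("monotonic: " + (second > first));
    }
}
```

Note that this only protects a single generator from itself. After a restart the last timestamp is forgotten, so a clock that went backwards can still cause trouble.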

Why is the random component split in two?

I know it's a little confusing. When I first implemented TSID Creator, the last 22 bits were completely random. Later I decided to split this chunk of bits into two subcomponents: the node identifier and the counter. These 22 bits can be called the "tail" or something like that. I keep calling it "random" because that's how it was first implemented and because these subcomponents are still initialized randomly.

Why Crockford base-32 encoding?

Because it's ULID encoding and because it's efficient.

Nothing prevents you from encoding the TSID in an encoding of your choice, like base-62 for example.
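For reference, this is roughly what encoding a 64-bit number in Crockford base-32 looks like. A 64-bit value needs 13 characters at 5 bits per character, with the first character carrying only the top 4 bits. The class is a self-contained sketch with a made-up name, not the encoder used by TSID Creator.

```java
final class CrockfordSketch {

    // Crockford's alphabet: digits and upper case letters, without I, L, O and U.
    private static final char[] ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".toCharArray();

    // Encodes a 64-bit value as 13 characters, most significant character first.
    static String encode(long value) {
        char[] chars = new char[13];
        for (int i = 12; i >= 0; i--) {
            chars[i] = ALPHABET[(int) (value & 0x1F)]; // lowest 5 bits
            value >>>= 5;                              // unsigned shift
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(encode(0L));  // prints 0000000000000
        System.out.println(encode(32L)); // prints 0000000000010
        System.out.println(encode(-1L)); // prints FZZZZZZZZZZZZ
    }
}
```

Since the alphabet is in ascending character order and the strings have a fixed length, sorting the strings gives the same order as sorting the numbers as unsigned integers.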

In the current implementation of TSID Creator, this is the only encoding. Perhaps in the future it will be possible to create strings of TSIDs in other base-n encodings.

Why are there 3 TSID variants (256, 1024 and 4096)?

This too can seem confusing.

But, in fact, there is only one type of identifier implemented by TSID Creator.

What changes is the number of bits reserved for the node identifier and for the counter.

I could have just implemented the 1024 node variant, as that's what was used in Twitter Snowflake. I split it into 3 because I thought it would be convenient.

Today I realize that the 256 node variant is the most used.
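The trade-off between the three variants is easy to compute, given that the node identifier and the counter share the last 22 bits. The following sketch uses hypothetical names and only does the arithmetic; it does not call TSID Creator.

```java
final class VariantSketch {

    static final int RANDOM_BITS = 22; // bits shared by node identifier and counter

    // Node bits needed for a node count that is a power of two (256, 1024, 4096).
    static int nodeBitsFor(int nodeCount) {
        return Integer.numberOfTrailingZeros(nodeCount);
    }

    // Whatever is not used by the node identifier is left for the counter.
    static int counterBitsFor(int nodeCount) {
        return RANDOM_BITS - nodeBitsFor(nodeCount);
    }

    // Distinct counter values available to each node in one millisecond.
    static long counterValuesPerMillisecond(int nodeCount) {
        return 1L << counterBitsFor(nodeCount);
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {256, 1024, 4096}) {
            System.out.println(nodes + " nodes: "
                    + nodeBitsFor(nodes) + " node bits, "
                    + counterBitsFor(nodes) + " counter bits, "
                    + counterValuesPerMillisecond(nodes) + " counter values per ms");
        }
    }
}
```

Fewer nodes means more counter bits, so each node can generate more identifiers within the same millisecond.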

What is the difference between TsidCreator and TsidFactory classes?

The TsidCreator class is the easiest way to generate TSIDs.

The TsidFactory class is the class that actually creates the TSIDs. This class can be configured to create TSIDs however you see fit. For example, you can change the number of bits reserved for the node identifier, you can change the start date of the timestamp, you can change the random number generator, etc.

What is the Tsid class?

The Tsid class is a value object.

In some applications it may be more convenient to use a value object than a basic data type like long or String.

Why is the timestamp 42 bits?

Because it is based on the default in Twitter Snowflake IDs.

The timestamp was actually 41 bits in Twitter Snowflake. I added 1 bit to turn the TSID into an unsigned integer, doubling the lifetime of TSIDs. I also did this so that the ordering of the integer is consistent with the ordering of the ID in the Crockford base-32 string.
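In a language like Java, which only has signed integers, the extra bit shows up as a negative number once the first bit is set. The sketch below, with made-up names, shows the difference between comparing two such values as signed and as unsigned numbers. It uses only standard methods of java.lang.Long.

```java
final class UnsignedOrderSketch {

    // Compares two IDs as unsigned 64-bit integers.
    static int compareIds(long a, long b) {
        return Long.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        long earlier = 1L << 62; // first bit clear: positive as a signed long
        long later = 1L << 63;   // first bit set: negative as a signed long

        // Signed comparison puts the later ID before the earlier one.
        System.out.println("signed:   " + Long.compare(earlier, later));
        // Unsigned comparison keeps the expected order.
        System.out.println("unsigned: " + compareIds(earlier, later));
        // The value can still be printed as an unsigned number.
        System.out.println(Long.toUnsignedString(later)); // prints 9223372036854775808
    }
}
```

This only matters after the first 69 years, when the first bit starts being used.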

Some implementations of the concept have different bit counts for timestamps, for example Mastodon has 48 bits.

In the current implementation of TSID Creator, this number of bits cannot be changed. Maybe in the future.

Why is the node identifier 10 bits by default?

Because it was the default in Twitter Snowflake IDs.

If you want, you can use any number of bits between 0 and 20. But if you do that, you're also changing the number of bits in the counter.

In Twitter Snowflake, this node idea actually consisted of two parts: Datacenter ID and Worker ID. Together these two parts take up 10 bits (5 bits each).

Why is the counter 12 bits by default?

Because it was the default in Twitter Snowflake IDs.

If you want, you can use any number of bits between 2 and 22. But for that you have to change the number of bits of the node identifier.


Nobody asked me any of this. I use this text format to try to better explain the decisions I had to make during the implementation of the TSID Creator. The questions I've included here are ones I think I would ask myself if I saw this project for the first time. Hope this is helpful.
