Fabio Lima edited this page Dec 28, 2023 · 43 revisions

(Not So) Frequently Asked Questions

What is a TSID?

It's quite an old idea.

TSID Creator is just one implementation of that idea.

Why the name "Time Sortable Identifier"?

At the time, I couldn't come up with a more fitting name, and I'm not even certain if "sortable" is a recognized word. So, that's on me. :)

You might refer to it as a Time-Sorted Unique Identifier, Twitter Snowflake Identifier, Time Sequence Identifier, Time Stamp Identifier, or any other term that suits your preference, as long as it conveys the intended meaning.

Ultimately, it's just a name, and you're free to choose one that resonates with people and effectively communicates the concept.

It is now called Time-Sorted Unique Identifier. Sounds better. :)

Why does the timestamp start on January 1, 2020?

The TSID Creator was introduced in that particular year.

In your application you can choose any start date you like, but you must use it consistently. For instance, if you choose 1990-01-01, use that same date everywhere in the application.
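To illustrate why the start date has to be the same everywhere, here is a minimal sketch of how a timestamp relative to a custom epoch can be computed and reversed. The class and method names (`EpochOffset`, `toTsidTime`, `toInstant`) are made up for this example and are not part of TSID Creator.

```java
import java.time.Instant;

final class EpochOffset {
    // Hypothetical constant: the default start date discussed above.
    static final Instant CUSTOM_EPOCH = Instant.parse("2020-01-01T00:00:00Z");

    // Milliseconds elapsed between the custom epoch and the given instant.
    static long toTsidTime(Instant instant) {
        return instant.toEpochMilli() - CUSTOM_EPOCH.toEpochMilli();
    }

    // Inverse operation: only correct if the same epoch was used to create the value.
    static Instant toInstant(long tsidTime) {
        return Instant.ofEpochMilli(tsidTime + CUSTOM_EPOCH.toEpochMilli());
    }

    public static void main(String[] args) {
        long time = toTsidTime(Instant.now());
        System.out.println(time + " ms since " + CUSTOM_EPOCH);
        System.out.println(toInstant(time));
    }
}
```

If one part of the application decoded the value with a different epoch, `toInstant` would return a date shifted by the difference between the two epochs.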

Why does the TSID's lifetime end in 69 or 139 years?

These time durations are the longest periods that can be represented using either 41 or 42 bits.

If your programming language or database only works with signed integer data types, the first bit of the identifier is used as a sign bit. This means the limit will be reached in 69 years.

In languages and databases that allow for unsigned integers, the limit is extended to 139 years.

If the identifier is stored in a string or byte array format, which is not a common practice for a 64-bit value, the limit also goes up to 139 years.
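The 69 and 139 year figures can be checked with a little arithmetic. This is just a sketch, assuming a millisecond timestamp and an average year of 365.25 days; `TsidLifetime` is a made-up name.

```java
final class TsidLifetime {
    // Average year length in milliseconds (365.25 days); an assumption for this estimate.
    static final double MILLIS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    // Number of years that fit in a millisecond timestamp of the given width.
    static double years(int timestampBits) {
        return Math.pow(2, timestampBits) / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println((int) years(41)); // 69: one bit is given up to the sign
        System.out.println((int) years(42)); // 139: all 42 bits are usable
    }
}
```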

What happens when the timestamp reaches its limit?

Honestly, I don't know what will happen in 69 or 139 years.

What is a node identifier?

It provides a means to identify the ID generator, and its role is to prevent collisions among IDs generated by multiple ID generators.

When more than one process generates IDs, the likelihood of collisions increases, especially with IDs as small as 64 bits. However, by assigning a unique node identifier to each generator in your application, you eliminate collisions between generators.

This node identifier could take various forms, such as a virtual machine ID, container ID, running process ID, app instance ID, etc. Its interpretation within the context of your application is determined by you.

If your application has a single ID generation process, you don't need to worry about collisions between generators.

What is a counter?

It is a set of bits that is incremented each time a new identifier is generated. When the timestamp changes, these bits are reset to a random value.

The purpose of these bits is to maintain the monotonicity of identifiers, ensuring that each identifier is consistently greater than its predecessor. This mechanism effectively prevents collisions among identifiers generated by the same generator, as identifiers always progress forward and never regress to previous values.

There is still a potential risk of collisions if the system clock goes backward, and mitigating this issue is challenging.
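The counter behavior can be sketched roughly as follows. This is not the code of TSID Creator, only a minimal illustration under a few assumptions: a 12-bit counter, a caller that passes in the current time in milliseconds, and a counter overflow that is handled by moving on to the next millisecond. The name `CounterSketch` is made up.

```java
import java.security.SecureRandom;

final class CounterSketch {
    static final int COUNTER_BITS = 12;                      // assumed counter size
    static final int COUNTER_MASK = (1 << COUNTER_BITS) - 1; // 4095

    private final SecureRandom random = new SecureRandom();
    private long lastTime = -1;
    private int counter;

    // Returns the pair {time, counter} to be used for the next identifier.
    synchronized long[] next(long timeMillis) {
        if (timeMillis > lastTime) {
            // The timestamp changed: reset the counter to a random value.
            lastTime = timeMillis;
            counter = random.nextInt(COUNTER_MASK + 1);
        } else {
            // Same millisecond (or a clock that did not advance): keep counting.
            counter++;
            if (counter > COUNTER_MASK) {
                // Assumption of this sketch: on overflow, move to the next millisecond.
                counter = 0;
                lastTime++;
            }
        }
        return new long[] { lastTime, counter };
    }
}
```

Within one running generator, each returned pair is greater than the previous one. The sketch does nothing about a clock that goes backward across a restart, which is the hard case mentioned above.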

Why is the random component split in two?

I know it might be a bit confusing. Initially, when I created the TSID Creator, the last 22 bits were completely random. To make things clearer, I decided to split these bits into two parts. I still call them "random" because both parts are set up randomly.

These two parts work together to make sure each identifier in the application is unique. The node identifier stops any clashes between different generators, so identifiers from different generators won't be the same.

At the same time, the counter prevents clashes within a single generator. This way, an identifier generated by one generator won't be the same as others generated by the same generator.
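To make the split concrete, here is a small sketch of how the three parts could be packed into a single 64-bit value, assuming the default layout of 42 time bits, 10 node bits and 12 counter bits. The class `TsidLayout` and its methods are illustrative names, not the library's API.

```java
final class TsidLayout {
    static final int RANDOM_BITS = 22;                       // node bits + counter bits
    static final int NODE_BITS = 10;                         // assumed default
    static final int COUNTER_BITS = RANDOM_BITS - NODE_BITS; // 12

    static final long NODE_MASK = (1L << NODE_BITS) - 1;
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    // Packs time, node and counter into one long: [time | node | counter].
    static long compose(long time, int node, int counter) {
        return (time << RANDOM_BITS)
                | ((node & NODE_MASK) << COUNTER_BITS)
                | (counter & COUNTER_MASK);
    }

    static long time(long tsid) {
        return tsid >>> RANDOM_BITS;
    }

    static int node(long tsid) {
        return (int) ((tsid >>> COUNTER_BITS) & NODE_MASK);
    }

    static int counter(long tsid) {
        return (int) (tsid & COUNTER_MASK);
    }

    public static void main(String[] args) {
        long tsid = compose(1, 2, 3);
        System.out.println(tsid); // 4202499 = (1 << 22) + (2 << 12) + 3
        System.out.println(time(tsid) + " " + node(tsid) + " " + counter(tsid)); // 1 2 3
    }
}
```

Two generators with different node values can never produce the same result for the same time and counter, because the node bits differ.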

Why Crockford base-32 encoding?

Because it's the encoding used by ULID and because it's very efficient.

Nothing prevents you from encoding the TSID in an encoding of your choice, like base-62 for example.

In the current implementation of TSID Creator, this is the only encoding.
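For the curious, this is roughly what encoding a 64-bit value in Crockford's base-32 looks like: 13 characters, 5 bits each, with the first character carrying only the top 4 bits. It is a simplified sketch with made-up names (`CrockfordSketch`, `encode`, `decode`), not the library's code; for example, it does not accept the alias letters I, L and O that Crockford's scheme allows when decoding.

```java
final class CrockfordSketch {
    // Crockford's alphabet: digits and uppercase letters, without I, L, O and U.
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Encodes a 64-bit value as 13 characters, most significant bits first.
    static String encode(long value) {
        char[] chars = new char[13];
        for (int i = 12; i >= 0; i--) {
            chars[i] = ALPHABET.charAt((int) (value & 31));
            value >>>= 5; // unsigned shift, so the sign bit is treated as data
        }
        return new String(chars);
    }

    // Decodes 13 characters back into a 64-bit value (simplified: no alias letters).
    static long decode(String text) {
        if (text.length() != 13) {
            throw new IllegalArgumentException("expected 13 characters");
        }
        long value = 0;
        for (int i = 0; i < 13; i++) {
            int index = ALPHABET.indexOf(Character.toUpperCase(text.charAt(i)));
            if (index < 0) {
                throw new IllegalArgumentException("invalid character: " + text.charAt(i));
            }
            value = (value << 5) | index;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(0L));  // 0000000000000
        System.out.println(encode(-1L)); // FZZZZZZZZZZZZ
    }
}
```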

Why are there 3 TSID variants (256, 1024 and 4096)?

This can also seem confusing.

But, in fact, there is only one type of identifier implemented by TSID Creator.

What changes is the number of bits reserved for the node identifier and for the counter.

I could have just implemented the 1024 node variant, as that's what was used in Twitter Snowflake. I split it into 3 because I thought it would be convenient.

Today I realize that the 256 node variant is the most used.
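The trade-off between the variants is plain arithmetic on the 22 bits shared by the node identifier and the counter. A small sketch, with the made-up name `TsidVariants`:

```java
final class TsidVariants {
    static final int RANDOM_BITS = 22; // bits shared by node identifier and counter

    // Bits needed for the given number of nodes (must be a power of two).
    static int nodeBits(int nodes) {
        return Integer.numberOfTrailingZeros(nodes);
    }

    // Number of distinct counter values left per millisecond and per node.
    static int counterValues(int nodes) {
        return 1 << (RANDOM_BITS - nodeBits(nodes));
    }

    public static void main(String[] args) {
        for (int nodes : new int[] { 256, 1024, 4096 }) {
            System.out.println(nodes + " nodes: " + nodeBits(nodes) + " node bits, "
                    + counterValues(nodes) + " counter values per millisecond");
        }
    }
}
```

More nodes means fewer counter values per node in each millisecond: 16384 for 256 nodes, 4096 for 1024 nodes and 1024 for 4096 nodes.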

What is the difference between TsidCreator and TsidFactory classes?

TsidCreator class is the easiest way to generate TSIDs.

TsidFactory class is the factory that actually creates the TSIDs. This class can be configured to create TSIDs however you see fit. For example, you can change the number of bits reserved for the node identifier, the start date of the timestamp, the random number generator, etc.

What is the Tsid class?

Tsid class is a value object.

In some applications, it may be more convenient to use a value object than a basic data type like long or String.

Why is the timestamp 42 bits?

Because it is based on the default in Twitter Snowflake IDs.

In fact, the Twitter Snowflake timestamp is 41 bits long. I added 1 bit to turn the TSID into an unsigned integer, doubling the lifetime of TSIDs. It also made integer format sorting consistent with string format sorting.
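In Java, a `long` is always signed, so once the first bit is set the value has to be compared as unsigned to keep the expected order. A minimal sketch using standard `Long` methods; the class name `UnsignedOrder` and the method `isBefore` are made up for the example.

```java
final class UnsignedOrder {
    // True if a sorts before b when both are read as unsigned 64-bit values.
    static boolean isBefore(long a, long b) {
        return Long.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        long before = Long.MAX_VALUE; // last value with the first bit clear
        long after = Long.MIN_VALUE;  // same bits as 2^63: the first bit is now set

        System.out.println(Long.compare(before, after) < 0); // false: signed order is wrong here
        System.out.println(isBefore(before, after));         // true: unsigned order is preserved
        System.out.println(Long.toUnsignedString(after));    // 9223372036854775808
    }
}
```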

Some implementations of the concept have different bit counts for timestamps. For example, the Mastodon timestamp is 48 bits long.

In the current implementation of TSID Creator, this number of bits cannot be changed.

Why is the node identifier 10 bits by default?

Because it was the default in Twitter Snowflake IDs.

If you want, you can use any number of bits between 0 and 20. But if you do that, you're also changing the number of bits in the counter.

In Twitter Snowflake, the node identifier actually consisted of two parts: Datacenter ID and Worker ID. Together they add up to 10 bits.

Why is the counter 12 bits by default?

Because it was the default in Twitter Snowflake IDs.

If you want, you can use any number of bits between 2 and 22. But for that you have to change the number of bits of the node identifier.


Nobody asked me any of this. I use this text format to try to better explain the decisions I had to make during the implementation of the TSID Creator. The questions I've included here are ones I think I would ask myself if I saw this project for the first time. Hope this is helpful.
