
Kubernetes lease name truncation can lead to non-unique lease names #910

Closed
an-tex opened this issue May 26, 2021 · 7 comments

@an-tex

an-tex commented May 26, 2021

Versions used

Versions:
Akka: 2.6.13
Akka Management: 1.0.10

Expected Behavior

When using a kubernetes lease for cluster sharding (https://doc.akka.io/docs/akka/current/typed/cluster-sharding.html#lease), the kubernetes lease names should be unique.

Actual Behavior

In my case I've enabled a projection using a sharded daemon process, using a Kubernetes lease for coordination. The lease names end up being prefixed with application-shard-sharded-daemon-process-[...] (42 characters) and truncated to 63 characters, which leaves only 22 characters for the actual projection name + tag. When the name is truncated, the tag is the first part to disappear, so different sharding processes end up with the same lease name. This in turn leads to an error loop with the following logs:

Lease application-shard-sharded-daemon-process-[...] requested by client [email protected]:25520 is already owned by client. Previous lease was not released due to ungraceful shutdown.

Conflict during heartbeat to lease LeaseResource(Some([email protected]:25520),8609152,1622013422731). Lease assumed to be released

sharded-daemon-process-[...]: Shard id [2] lease lost, stopping shard and killing [1] entities.
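
For illustration only (not Akka's actual code), a minimal Scala sketch of how truncating to the 63-character limit collapses two distinct lease names into one; the projection name and tags below are made up:

```scala
object LeaseNameTruncationDemo extends App {
  // Kubernetes object names used for Lease resources are limited to
  // 63 characters per DNS label.
  val K8sLabelLimit = 63

  // Hypothetical full lease names: the same sharding prefix, with the
  // distinguishing tag at the very end.
  val prefix = "application-shard-sharded-daemon-process-"
  val nameA  = prefix + "order-events-projection-tag-1"
  val nameB  = prefix + "order-events-projection-tag-2"

  // Naive truncation drops exactly the part that made the names unique.
  def truncate(name: String): String = name.take(K8sLabelLimit)

  println(truncate(nameA) == truncate(nameB)) // prints: true -> collision
}
```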

@an-tex
Author

an-tex commented May 26, 2021

About possible solutions:

  1. When truncating, add a short hash suffix derived from the full name.
  2. Use multiple domain segments, since the 63-character limit applies to each segment individually and the whole name can be up to 253 characters. A lease name like application.shard-sharded-daemon-process.projectionName.tag would stay within the limits.
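
A rough Scala sketch of both ideas; the object and helper names are made up for illustration and are not Akka's API:

```scala
import java.security.MessageDigest

object LeaseNameIdeas {
  // Idea 1: truncate, but append a short hash of the full name so two
  // different full names can no longer end up as the same truncated value.
  def truncateWithHash(fullName: String, limit: Int = 63): String =
    if (fullName.length <= limit) fullName
    else {
      val digest = MessageDigest.getInstance("SHA-256").digest(fullName.getBytes("UTF-8"))
      val hash   = digest.take(4).map(b => f"${b & 0xff}%02x").mkString // 8 hex chars
      fullName.take(limit - hash.length - 1) + "-" + hash
    }

  // Idea 2: build a dot-separated name, since the 63-character limit applies
  // per DNS label while the full name may be up to 253 characters.
  def segmented(parts: String*): String = parts.mkString(".")
}

// e.g. LeaseNameIdeas.segmented("application", "shard-sharded-daemon-process", "projectionName", "tag")
// yields "application.shard-sharded-daemon-process.projectionName.tag"
```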

@manuelp

manuelp commented Apr 12, 2023

This is a serious problem, as:

  • This behaviour is not documented and happens implicitly; we found out about this problem the hard way.
  • Most of the characters are used up by Akka's internals.
  • Truncation is silent: there is no way to know beforehand whether it will happen.
  • More importantly, when a lease name is truncated there can be implicit naming collisions that cause the system to essentially stop working.

I understand that this is a limitation imposed by k8s, and Akka has no way around that. But the current solution is IMHO fragile and prone to error, so not a real solution.

@an-tex proposed some workarounds. A fixed-length encoding could be a real solution, at the expense of lease name legibility. But I think that an imperfect working system is better than a "legible" (not really the case with truncated lease names) but broken one.

Are we missing something? Is there a workaround that we could employ right now? How is this working in production for others?

@octonato
Member

octonato commented Apr 12, 2023

The prefix is a concatenation of different bits of information. As described here:

https://doc.akka.io/docs/akka/current/typed/cluster-sharding.html#lease

To use a lease for sharding set akka.cluster.sharding.use-lease to the configuration location of the lease to use. Each shard will try and acquire a lease with the name <actor system name>-shard-<type name>-<shard-id> and the owner is set to the Cluster(system).selfAddress.hostPort.
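
For reference, wiring this up looks roughly like the snippet below; akka.coordination.lease.kubernetes is the documented config location of the Kubernetes lease in Akka Management, but treat the snippet as an illustrative assumption rather than a drop-in config:

```scala
import com.typesafe.config.ConfigFactory

object ShardingLeaseConfig {
  // Point sharding at the Kubernetes lease implementation from Akka Management,
  // per the documentation quoted above.
  val config = ConfigFactory.parseString(
    """
    akka.cluster.sharding.use-lease = "akka.coordination.lease.kubernetes"
    """)
}
```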

Another alternative solution could be to make it configurable. But I think only the <actor system name>-shard can be made configurable. The <type name>-<shard-id> portion will need to be kept around.

What about going for solution 2 as suggested by @an-tex, with failure in case the final name is larger than 253 chars?

We can also apply solution 1 in case it surpasses 253 chars, but the problem I see with hashing is that it will make it harder to follow the logs.
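
A rough sketch of the fail-early check being discussed here; the object and method names are hypothetical, not the eventual implementation:

```scala
object LeaseNameValidation {
  // Kubernetes name limits: 63 characters per DNS label, 253 for the full name.
  private val MaxLabelLength = 63
  private val MaxNameLength  = 253

  // Fail fast with a descriptive error instead of silently truncating.
  def requireValidLeaseName(name: String): Unit = {
    require(
      name.length <= MaxNameLength,
      s"Lease name [$name] is ${name.length} characters long, exceeding the Kubernetes limit of $MaxNameLength")
    name.split('.').foreach { label =>
      require(
        label.nonEmpty && label.length <= MaxLabelLength,
        s"Lease name segment [$label] exceeds the Kubernetes label limit of $MaxLabelLength characters")
    }
  }
}
```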

@an-tex
Author

an-tex commented Apr 12, 2023

@manuelp my "workaround" is NOT to use the kubernetes lease implementation for sharding & singleton :/ it's still fine to use with SBR.

I'd also opt for my 2nd solution, for the same reasons @octonato mentioned, with either a failure or an error log & fallback to the default (no lease).

@manuelp

manuelp commented Apr 12, 2023

> What about going for solution 2 as suggested by @an-tex, with failure in case the final name is larger than 253 chars?

Early failure would be preferable to silent truncation. At least we'd have an exception that highlights the underlying problem instead of a failing cluster that can't acquire leases.

> We can also apply solution 1 in case it surpasses 253 chars, but the problem I see with hashing is that it will make it harder to follow the logs.

Certainly it'd be preferable to have readable names, but in that case a working system with unreadable lease names is still preferable to a failing system with readable ones, IMHO.

> my "workaround" is NOT to use the kubernetes lease implementation for sharding & singleton :/ it's still fine to use with SBR.

Yes, previously we weren't using the leasing part, but it's useful (in our context at least).

> I'd also opt for my 2nd solution, for the same reasons @octonato mentioned, with either a failure or an error log & fallback to the default (no lease).

Readable names and early failure with a fallback to no leasing seems to me the best solution so far, if possible.

@patriknw
Member

Fail early sounds good to me. Better to fail hard than to log an error and fall back, because some users don't even pay attention to errors.

Changing to multiple domain segments sounds good, but it would have to be an opt-in configuration because otherwise we would break rolling updates for existing users.

patriknw added a commit that referenced this issue Sep 21, 2023
* for backwards compatibility it's possible to allow old behavior
  with config, e.g. to support rolling update
patriknw added a commit to akka/akka that referenced this issue Sep 21, 2023
@patriknw patriknw self-assigned this Sep 21, 2023
johanandren pushed a commit that referenced this issue Sep 21, 2023
* for backwards compatibility it's possible to allow old behavior
  with config, e.g. to support rolling update
@patriknw
Member

Fail early check added in #1194

@patriknw patriknw added this to the 1.5.0-M1 milestone Sep 21, 2023
@patriknw patriknw added the bug label Oct 12, 2023