Reconsider trimming based on persistent lsn #2808

Open

tillrohrmann opened this issue Feb 28, 2025 · 3 comments
@tillrohrmann (Contributor)

Currently, Restate supports trimming when uploading a snapshot to an object store (archived lsn based trimming). If snapshotting is not enabled, Restate also supports trimming by requiring that all nodes that run a PP for a given partition have persisted the log up to a given point before trimming the log at that point (persistent lsn based trimming). This strategy is potentially very dangerous because, after the first trim operation, we can't run partition processors on a node that wasn't running the PP before trimming (e.g. when adding new nodes or moving a PP from one node to another). The problem is that the log is no longer complete and there is no way for the newly started PP to fetch the latest partition store state snapshot.
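
To make the difference between the two strategies concrete, here is a rough Rust sketch of the two trim-point computations. The `Lsn` alias and the function names are made up for illustration and are not Restate's actual code:

```rust
// Illustrative sketch only; types and names are hypothetical, not Restate's API.

/// Log sequence number (simplified to a plain u64 here).
type Lsn = u64;

/// Trim point when snapshotting is enabled: everything up to the LSN covered
/// by the latest snapshot uploaded to the object store is safe to trim.
fn archived_lsn_trim_point(latest_archived_snapshot: Option<Lsn>) -> Option<Lsn> {
    latest_archived_snapshot
}

/// Trim point without snapshots: only safe up to the smallest LSN that *every*
/// node running a PP for this partition has persisted. This is exactly what
/// breaks once a PP moves to a node outside this set after a trim: that node
/// has neither the trimmed log prefix nor a snapshot to bootstrap from.
fn persisted_lsn_trim_point(persisted_lsns_per_pp_node: &[Lsn]) -> Option<Lsn> {
    persisted_lsns_per_pp_node.iter().copied().min()
}

fn main() {
    // Three nodes run the PP; the slowest one has persisted up to LSN 90.
    assert_eq!(persisted_lsn_trim_point(&[120, 90, 110]), Some(90));
    // With a snapshot archived at LSN 150, trimming can go further.
    assert_eq!(archived_lsn_trim_point(Some(150)), Some(150));
}
```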

The persistent lsn based trimming strategy is primarily intended for single-node Restate deployments where the placement of PPs won't change. However, it is not disabled in a multi-node setup (which is also hard to do because a single-node deployment can later be turned into a multi-node one).

One way to mitigate the problem is to have an in-band mechanism to exchange partition store state snapshots. However, this requires that at least one of the nodes that has the latest partition store snapshot is still available. Alternatively, we can drop support for the persistent lsn trimming strategy and require users to configure a snapshot directory if they want to have support for log trimming.

Until the problem is fixed, we should update the documentation to make people aware of the limitations when using persistent lsn based log trimming.

@pcholakov (Contributor)

We could also print a warning on startup, or even periodically, if there are multiple nodes in the config but no snapshot repository is configured. Personally, I would prefer that we pause trimming if we detect that situation. We still have a potential window of vulnerability while nodes are joining, but we could make sure we simply don't trim at all for an initial window of LSNs to cover that.

Persisted LSN based trimming is only ever a good long-term strategy for single nodes. We can make an exception for "throwaway" clusters so that you don't need to set up snapshots for throwaway multi-node tests, but for those it should be okay, by definition, to stop working when the disks fill up.
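
A minimal sketch of the proposed safety check, assuming a hypothetical `ClusterConfig` with a node count and an optional snapshot repository (names and types are illustrative only, not Restate's configuration):

```rust
// Hypothetical sketch of the proposed check: pause persisted-LSN trimming and
// warn when multiple nodes are configured but no snapshot repository is set.

struct ClusterConfig {
    node_count: usize,
    snapshot_repository: Option<String>, // e.g. an object-store URI
}

enum TrimDecision {
    /// Trimming by persisted LSN is allowed (single node, or snapshots configured).
    Allow,
    /// Pause persisted-LSN trimming and surface a warning to the operator.
    PauseWithWarning(&'static str),
}

fn persisted_lsn_trim_policy(config: &ClusterConfig) -> TrimDecision {
    if config.node_count > 1 && config.snapshot_repository.is_none() {
        TrimDecision::PauseWithWarning(
            "multiple nodes configured but no snapshot repository; \
             pausing persisted-LSN based trimming to avoid stranding new partition processors",
        )
    } else {
        TrimDecision::Allow
    }
}

fn main() {
    let config = ClusterConfig { node_count: 3, snapshot_repository: None };
    match persisted_lsn_trim_policy(&config) {
        TrimDecision::Allow => println!("persisted-LSN trimming allowed"),
        TrimDecision::PauseWithWarning(msg) => eprintln!("warning: {msg}"),
    }
}
```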

@tillrohrmann (Contributor, Author)

Yeah, maybe we can start by disabling trimming once we detect that we are running in a multi-node setup, and check that the error we report on failing PPs helps users figure out what to do (configuring a snapshot repository). A potential next step could be the in-band mechanism to exchange snapshots.
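
As a rough illustration of what an actionable PP error could look like (the type and wording below are hypothetical, not Restate's actual error):

```rust
// Hypothetical sketch of an actionable error for a PP that cannot start
// because the log was trimmed and no snapshot repository is configured.

use std::fmt;

#[derive(Debug)]
struct TrimGapWithoutSnapshot {
    partition_id: u16,
    trim_point: u64,
}

impl fmt::Display for TrimGapWithoutSnapshot {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "partition processor for partition {} cannot start: the log was trimmed up to LSN {} \
             and no snapshot repository is configured. Configure a snapshot repository so that new \
             or relocated partition processors can bootstrap from a snapshot.",
            self.partition_id, self.trim_point
        )
    }
}

impl std::error::Error for TrimGapWithoutSnapshot {}

fn main() {
    let err = TrimGapWithoutSnapshot { partition_id: 7, trim_point: 4096 };
    eprintln!("{err}");
}
```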

@pcholakov (Contributor)

Opened #2814 (disables trimming by persisted LSN in clusters) and restatedev/documentation#556 for this.
