Reconsider trimming based on persistent lsn #2808
Comments
We could also print a warning on startup - or even periodically - if there are multiple nodes in the config but no snapshot repository is configured. Personally, I would prefer that we pause trimming if we detect that situation. We still have a potential window of vulnerability while nodes are still joining, but we could make sure we just don't trim at all for an initial window of LSNs to cover for that. Persisted-LSN-based trimming is only ever a good long-term strategy for single nodes. We can make an exception for "throwaway" clusters so that you don't need to set up snapshots for throwaway multi-node tests, but for those it should be okay, by definition, to stop working when the disks fill up.
Yeah, maybe we can start by disabling trimming once we detect that we are running in a multi-node setup, and check that the error we report on failing PPs helps users figure out what to do (configuring a snapshot repository). A potential next step could be the in-band mechanism to exchange snapshots.
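
For illustration, here is a minimal sketch of that check in Rust, assuming a simplified cluster configuration; the type and function names (`ClusterConfig`, `select_trim_mode`, etc.) are hypothetical and not the actual Restate API:

```rust
/// Simplified stand-ins for the cluster configuration; the names here are
/// hypothetical, not the actual Restate types.
struct ClusterConfig {
    node_count: usize,
    snapshot_repository: Option<String>, // e.g. an object-store URL
}

#[derive(Debug, PartialEq)]
enum TrimMode {
    /// Trim up to the LSN covered by the latest archived snapshot.
    ArchivedLsn,
    /// Trim up to the lowest LSN persisted by every node running the PP.
    PersistedLsn,
    /// Don't trim automatically; log a warning instead.
    Paused,
}

/// Decide which trimming strategy is safe for the configured topology.
fn select_trim_mode(config: &ClusterConfig) -> TrimMode {
    match (config.node_count, &config.snapshot_repository) {
        // Snapshots configured: archived-LSN trimming works for any topology.
        (_, Some(_)) => TrimMode::ArchivedLsn,
        // Single node, no snapshots: persisted-LSN trimming is acceptable,
        // because the PP placement never changes.
        (1, None) => TrimMode::PersistedLsn,
        // Multiple nodes, no snapshots: pause trimming and warn, since a PP
        // started on a new node would find neither a complete log nor a
        // snapshot to bootstrap from.
        (_, None) => {
            eprintln!(
                "warning: multiple nodes configured but no snapshot repository; \
                 pausing log trimming to avoid stranding new partition processors"
            );
            TrimMode::Paused
        }
    }
}

fn main() {
    let cluster = ClusterConfig { node_count: 3, snapshot_repository: None };
    // A 3-node cluster without a snapshot repository must not trim by persisted LSN.
    assert_eq!(select_trim_mode(&cluster), TrimMode::Paused);
}
```

Pausing rather than refusing to start keeps "throwaway" multi-node clusters usable without snapshots, at the cost of eventually filling the disk, which matches the trade-off discussed above.
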
Opened #2814 (disables trimming by persisted LSN in clusters) and restatedev/documentation#556 for this.
Currently, Restate supports trimming when a snapshot is uploaded to an object store (archived-LSN-based trimming). If snapshotting is not enabled, Restate also supports trimming by requiring that all nodes that run a PP for a given partition have persisted the log up to a given point before trimming the log at that point (persistent-LSN-based trimming). This strategy is potentially very dangerous because after the first trim operation we can't run partition processors on a node that wasn't running the PP before trimming (e.g. when adding new nodes or moving a PP from one node to another). The problem is that the log is no longer complete and there is no way for the newly started PP to fetch the latest partition store state snapshot.
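
To make the strategy concrete, here is a minimal sketch of how a persisted-LSN trim point could be derived, using hypothetical types that are not taken from the Restate codebase: the trim point is the minimum LSN persisted across the nodes that currently run a PP for the partition, which is exactly why a PP started on a new node after the trim has nothing left to catch up from.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers, not taken from the Restate codebase.
type NodeId = u32;
type Lsn = u64;

/// Persisted-LSN-based trimming: the log may only be trimmed up to the lowest
/// LSN that every node currently running a PP for this partition has durably
/// persisted in its partition store. Returns None if nothing was reported.
fn safe_trim_point(persisted_lsns: &HashMap<NodeId, Lsn>) -> Option<Lsn> {
    persisted_lsns.values().copied().min()
}

fn main() {
    let mut persisted = HashMap::new();
    persisted.insert(1, 1_200); // node 1 has persisted up to LSN 1200
    persisted.insert(2, 950);   // node 2 lags behind at LSN 950
    persisted.insert(3, 1_100);

    // Trimming beyond LSN 950 would leave node 2 unable to replay the log.
    assert_eq!(safe_trim_point(&persisted), Some(950));

    // The hazard described above: a node that starts a PP *after* the trim is
    // not part of this map at all, so it finds neither the trimmed log prefix
    // nor a snapshot to bootstrap from.
}
```
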
The persistent-LSN-based trimming strategy is primarily intended for single-node Restate deployments where the placement of PPs won't change. However, it is not disabled in a multi-node setup (which is also hard to do because a single-node deployment can be turned into a multi-node one).
One way to mitigate the problem is an in-band mechanism to exchange partition store state snapshots. However, this requires that at least one of the nodes holding the latest partition store snapshot is still available. Alternatively, we can drop support for the persistent-LSN trimming strategy and require users to configure a snapshot repository if they want support for log trimming.
Until the problem is fixed, we should update the documentation to make people aware of the limitations of persistent-LSN-based log trimming.