Currently, when a Liftbridge server shuts down, it stops being a leader for its partitions. If many partitions exist, that will result in a flurry of Raft events. Would it be possible to trigger a progressive shutdown to prevent this? Have you given this any thought, @tylertreat?
Yes, this is something I've thought a bit about, especially as it relates to rolling cluster upgrades. I think a graceful shutdown would make sense. There would be a few components to this:
If the server is leader for any partitions, transfer leadership to another replica (invoke a ChangeLeaderOp in Raft) and remove self from ISR (ShrinkISROp). This should be done gradually to avoid a flood of Raft ops. Also interrupt any clients currently subscribed.
If the server is a follower for any partitions, remove self from ISR (ShrinkISROp). This should be done gradually to avoid a flood of Raft ops. Also interrupt any clients currently subscribed.
At this point, probably reject any client requests, e.g. publish or subscribe.
If the server shutting down is the metadata leader, transfer leadership to another node. Perform a Raft barrier to ensure all preceding Raft ops have been applied.
Remove self from Raft group. Need to think through how this works when rejoining, e.g. in the case of restarting/upgrading a node.
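To make the ordering concrete, here is a rough Go sketch of the sequence described above. It is only an illustration, not Liftbridge code: `Partition`, `Step`, `planShutdown`, and `runPaced` are hypothetical names, the `Op` strings are plain labels borrowed from this discussion, and the pacing is a simple fixed delay between steps. It follows the order listed above without settling the open question about rejoining.

```go
package main

import (
	"fmt"
	"time"
)

// Role is this server's role for a partition (hypothetical type).
type Role int

const (
	Follower Role = iota
	Leader
)

// Partition pairs a partition name with this server's role for it.
type Partition struct {
	Name string
	Role Role
}

// Step is one action in the shutdown sequence. Op is only a label here;
// a real server would propose the matching Raft operation instead.
type Step struct {
	Op        string
	Partition string // empty for server-wide steps
}

// planShutdown lists the steps in the order discussed above: led partitions
// first, then followed partitions, then the server-wide steps.
func planShutdown(parts []Partition, isMetadataLeader bool) []Step {
	var steps []Step
	for _, p := range parts {
		if p.Role == Leader {
			steps = append(steps,
				Step{Op: "ChangeLeaderOp", Partition: p.Name},
				Step{Op: "ShrinkISROp", Partition: p.Name},
				Step{Op: "InterruptSubscribers", Partition: p.Name})
		}
	}
	for _, p := range parts {
		if p.Role == Follower {
			steps = append(steps,
				Step{Op: "ShrinkISROp", Partition: p.Name},
				Step{Op: "InterruptSubscribers", Partition: p.Name})
		}
	}
	steps = append(steps, Step{Op: "RejectClientRequests"})
	if isMetadataLeader {
		steps = append(steps, Step{Op: "TransferMetadataLeadership"})
	}
	return append(steps, Step{Op: "RaftBarrier"}, Step{Op: "LeaveRaftGroup"})
}

// runPaced applies the steps one at a time, sleeping for interval between
// consecutive steps so they do not arrive as a burst. It stops at the
// first error.
func runPaced(steps []Step, interval time.Duration, apply func(Step) error) error {
	for i, s := range steps {
		if i > 0 {
			time.Sleep(interval)
		}
		if err := apply(s); err != nil {
			return fmt.Errorf("step %d (%s): %w", i, s.Op, err)
		}
	}
	return nil
}

func main() {
	parts := []Partition{
		{Name: "foo-0", Role: Leader},
		{Name: "foo-1", Role: Follower},
	}
	steps := planShutdown(parts, true)
	err := runPaced(steps, 10*time.Millisecond, func(s Step) error {
		fmt.Printf("%s %s\n", s.Op, s.Partition)
		return nil
	})
	if err != nil {
		fmt.Println("shutdown aborted:", err)
	}
}
```

A fixed delay is the simplest way to pace the ops. A real implementation might instead wait for each Raft op to commit before proposing the next one, or cap the number of ops in flight.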