feat(rebaser): change quiescent shutdown to reduce missed activity #4707

Draft
wants to merge 1 commit into base: main
Conversation

Contributor

@fnichol fnichol commented Sep 26, 2024

This change alters the logic that lets a change set "process" task shut down when no Rebaser requests have been seen over our `quiescent_period`. Prior to this change there was a shutdown window during which the `ChangeSetProcessorTask` would not look for new Rebaser requests to process while it waited for the `SerialDvuTask` to end. As a hedge against this scenario, the process task handler checks the change set subject just before ending and, if at least one request message is waiting, does not ack/delete the task.
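The hedge described above can be sketched as a single decision made just before the handler ends. This is a minimal illustration only: `TaskOutcome` and `outcome_before_ending` are hypothetical names, and the pending-message count is assumed to come from whatever lookup the handler performs on the change set subject.

```rust
/// Hypothetical outcome of the final check made by the process task handler.
#[derive(Debug, PartialEq, Eq)]
pub enum TaskOutcome {
    /// No request messages are waiting, so the task can be acked/deleted.
    AckAndDelete,
    /// At least one request message is waiting, so the task is kept.
    KeepTask,
}

/// Decide what to do with the task given the number of request messages
/// still pending on the change set subject (an assumed input).
pub fn outcome_before_ending(pending_requests: u64) -> TaskOutcome {
    if pending_requests > 0 {
        TaskOutcome::KeepTask
    } else {
        TaskOutcome::AckAndDelete
    }
}
```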

In this altered version of a quiescent shutdown we notice the quiet period in the Rebaser requests subscription stream, as before. However, a `quiesced_notify` (a `tokio::sync::Notify`) is now fired to signal the `SerialDvuTask`. The `ChangeSetProcessorTask` then continues to process any further requests that may show up (remember that after running a "dvu" job, another Rebaser request is often submitted). Meanwhile, the `SerialDvuTask` continues to run "dvu" jobs as long as `run_dvu_notify` has been set (in effect "draining" any pending runs), and only then checks whether `quiesced_notify` has been set. If it has, the task cancels the `quiesced_token`, which causes `SerialDvuTask` to return with an `Ok(Shutdown::Quiesced)`; that same `CancellationToken` causes the Naxum app in `ChangeSetProcessorTask` to shut down gracefully.
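The ordering described above (drain pending "dvu" runs first, then check for quiescence) can be sketched with plain booleans standing in for the two notifications. This is not the code from this PR: `Signals`, `NextStep`, and `next_step` are hypothetical names, and the real task uses `tokio::sync::Notify` and a `CancellationToken` rather than flags.

```rust
/// Hypothetical next action for the serial "dvu" loop.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// A "dvu" run was requested, so run it before anything else.
    RunDvu,
    /// Nothing left to drain and quiescence was signaled: shut down.
    ShutdownQuiesced,
    /// Nothing to do yet; keep waiting for a signal.
    Wait,
}

/// Boolean stand-ins for `run_dvu_notify` and `quiesced_notify`.
#[derive(Debug, Default)]
pub struct Signals {
    pub run_dvu_pending: bool,
    pub quiesced: bool,
}

/// Pending "dvu" runs always win over the quiesced signal, which is what
/// "draining" means here. The quiesced flag is only consulted once no run
/// is pending.
pub fn next_step(signals: &mut Signals) -> NextStep {
    if signals.run_dvu_pending {
        // Consume the pending request, mirroring how a notification
        // permit is used up once it has been observed.
        signals.run_dvu_pending = false;
        return NextStep::RunDvu;
    }
    if signals.quiesced {
        return NextStep::ShutdownQuiesced;
    }
    NextStep::Wait
}
```

With both flags set, the first call yields `NextStep::RunDvu` and only the following call yields `NextStep::ShutdownQuiesced`, so a run requested during the quiet period is never skipped.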

With these changes, the one or two remaining "dvu" jobs will not cause the process task to stop processing further Rebaser requests. For example, let's assume that the last 2 "dvu" jobs take 8 minutes each. The process task is then in a quiescent shutdown for up to the next 8 * 2 = 16 minutes, during which time any further Rebaser requests will also be processed (whereas they may not have been prior to this change).
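The upper bound in the example above is simply the number of remaining jobs times the per-job duration. A small sketch of that arithmetic follows. `max_quiescent_shutdown` is a hypothetical helper, and it assumes every remaining job takes the same time.

```rust
use std::time::Duration;

/// Upper bound on how long the process task may stay in a quiescent
/// shutdown, assuming `remaining_jobs` "dvu" jobs that each take `per_job`.
pub fn max_quiescent_shutdown(remaining_jobs: u32, per_job: Duration) -> Duration {
    // `Duration` implements `Mul<u32>`, so this is a plain multiplication.
    per_job * remaining_jobs
}
```

For 2 jobs of 8 minutes each, this gives a 16 minute window, matching the figure above.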


Signed-off-by: Fletcher Nichol <[email protected]>
Contributor

johnrwatson commented Nov 22, 2024

/try [using pr as a dummy for CI testing changes]
