-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: Estimate lag when workflow fully throttled #16577
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16577 +/- ##
=======================================
Coverage ? 68.79%
=======================================
Files ? 1557
Lines ? 199895
Branches ? 0
=======================================
Hits ? 137515
Misses ? 62380
Partials ? 0 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this PR is still in Draft, it looks good to me. Thank you!
Signed-off-by: Matt Lord <[email protected]>
This did not have a practical impact but was obviously wrong Signed-off-by: Matt Lord <[email protected]>
b1e1e85
to
ee17b78
Compare
812297c
to
5ccf0d9
Compare
Revert "Fix vitessio#16565" Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
5ccf0d9
to
ee25994
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
throttledHeartbeatsRateLimiter.Do(func() error { | ||
return injectHeartbeat(true, checkResult.Summary()) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that we were getting double the heartbeats as we injected them here and via the timer. I consolidated them into the main loop with the timer. Otherwise we needed to do twice the work as the other heartbeats would get interspersed with these and those would indicate that we weren't throttled when in fact we were — so we needed to check the throttler there as well and reset the heartbeat timer in injectHeartbeat()
to limit the extra heartbeats and even then some extras still went through as it was a race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand some of the changes in the last force-push, asking for clarifications inline.
}) | ||
// we won't process events, until we're no longer throttling | ||
logger.Infof("throttled.") | ||
logger.Infof("vstreamer throttled%s: %s.", wfNameLog, checkResult.Summary()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this spam the logs when throttled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a throttled logger that logs the message at most once every 5 minutes. It was added in #14936. I thought about removing these altogether as the value was not clear to me, but instead I at least made this one much more useful IMO as it would state the component and the workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK cool!
@@ -394,7 +391,8 @@ func (vs *vstreamer) parseEvents(ctx context.Context, events <-chan mysql.Binlog | |||
case <-ctx.Done(): | |||
return nil | |||
case <-hbTimer.C: | |||
if err := injectHeartbeat(false, ""); err != nil { | |||
checkResult, ok := vs.vse.throttlerClient.ThrottleCheckOKOrWaitAppName(ctx, vs.throttlerApp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain what this throttler check is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added as part of consolidating the two heartbeat injectors: #16577 (comment)
We had two goroutines doing heartbeat injections. One that was sent only when we were throttled (removed here), and this one that would be sent both when we were throttled AND when we were not. This one would always say that we weren't throttled — even when in fact we were — so it prevented us from being able to estimate the lag in the vplayer when the vstreamer was fully throttled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I encourage you to use vs.vse.throttlerClient.ThrottleCheckOK
as opposed to vs.vse.throttlerClient.ThrottleCheckOKOrWaitAppName
. You're not interested in the extra sleep.
This makes sense now, thank you for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Thank you! 🙇 That makes total sense — no need for the sleep here as it's not done in a tight loop. I should have at least checked the function comments. 🙂 I've made this change.
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch and fix.
Description
We did not properly calculate the vreplication lag for a workflow when it was fully throttled — on the source/vstreamer side or the target/vplayer side.
This PR ensures that we account for these scenarios and estimate the vreplication lag for the workflow when they occur.
Related Issue(s)
Checklist