-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect tolerable replication lag even when the new primary has been provided in PRS #15090
Conversation
…en when the new primary is provided Signed-off-by: Manan Gupta <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #15090 +/- ##
===========================================
- Coverage 47.29% 40.88% -6.42%
===========================================
Files 1137 1453 +316
Lines 238684 368141 +129457
===========================================
+ Hits 112895 150516 +37621
- Misses 117168 201438 +84270
- Partials 8621 16187 +7566 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the current approach, what if you:
- Copied
tabletMap
(line 162), filtering out all tablets except for the one specified inopts.NewPrimaryAlias
, - Removed the conditional check on line 173 for
opts.NewPrimaryAlias
,
I think that would reduce code complexity and reduce logic drift in the future when new behaviors are added to the ChooseNewPrimary
path.
why does CodeCov say this?
|
Signed-off-by: Manan Gupta <[email protected]>
@deepthi @maxenglander I have made the required changes. I have chosen the approach that Deepthi pointed out, that ends up being a cleaner solution. I had to make a couple more changes to preserve the behaviour wherein we don't call I think the code looks good now, could you two please review it again?
I have no idea why codecov says this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Some minor comments.
Signed-off-by: Manan Gupta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@@ -114,9 +134,9 @@ func ChooseNewPrimary( | |||
return nil, err | |||
} | |||
|
|||
// return nothing if there are no valid tablets available | |||
// return an error if there are no valid tablets available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Description
This PR fixes the bug described in #15089.
Now when running PRS, if both the new primary and the tolerable replication lag has been provided, we will check that the new primary tablet actually has a replication lag lower than what we can tolerate, otherwise we'll fail the PRS quickly.
Related Issue(s)
Checklist
Deployment Notes