-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: improve reliability of log management #15374
Conversation
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
9ffd484
to
91a733e
Compare
Signed-off-by: Matt Lord <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15374 +/- ##
==========================================
+ Coverage 65.43% 65.45% +0.02%
==========================================
Files 1561 1562 +1
Lines 193638 193877 +239
==========================================
+ Hits 126704 126902 +198
- Misses 66934 66975 +41 ☔ View full report in Codecov by Sentry. |
64f12ba
to
f3507ff
Compare
Signed-off-by: Matt Lord <[email protected]>
f3507ff
to
186cd26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Had a deja vu about this: I thought we had added this truncation a long time ago, because we had seen this error before for long queries.
Turns out we had done it for _vt.vreplication
while setting the ubiquitous message
column:
func (ct *controller) setMessage(dbClient binlogplayer.DBClient, message string) error {
ct.blpStats.History.Add(&binlogplayer.StatsHistoryRecord{
Time: time.Now(),
Message: message,
})
query := fmt.Sprintf("update _vt.vreplication set message=%v where id=%v", encodeString(binlogplayer.MessageTruncate(message)), ct.id)
if _, err := dbClient.ExecuteFetch(query, 1); err != nil {
return fmt.Errorf("could not set message: %v: %v", query, err)
}
return nil
}
Maybe we can just update the MessageTruncate
function to use your idea of logging enough of the prefix and suffix and use it in this PR as well.
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
98921b8
to
1f80864
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work.
Description
In helping someone debug a VReplication failure, we had seen the following:
_vt.vreplication_log
record and the apparent cause was that the message was too long_vt.vreplication
record showed the beginning of avcopier
bulkINSERT
query that failed — truncated at 1000 bytes — but not the reason why / error it generated_vt.vreplication_log
record generation failure may have caused problems in the workflow (e.g. if this happens after the point of no return in aSwitchTraffic
then we'll be left in an inconsistent state that needs manual cleanup)We address all of these issues with the following changes in this PR:
_vt.vreplication_log
table_vt.vreplication_log
record as "fatal" — meaning that the workflow operation does not stop and return the log generation error — and instead only logs an error in thevttablet
logI also moved the existing truncation for the
_vt.vreplication.message
field over to this new truncation method so that we have the error message there as well.Related Issue(s)
I didn't create an issue since this felt more like a minor internal cleanup, but I can create one quickly if others disagree.
Checklist