-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: Support binlog_row_value_options=PARTIAL_JSON #17345
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
1859876
to
a44e7e5
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17345 +/- ##
========================================
Coverage 67.64% 67.65%
========================================
Files 1582 1583 +1
Lines 254000 254321 +321
========================================
+ Hits 171826 172068 +242
- Misses 82174 82253 +79 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
a44e7e5
to
c9cf71c
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
181d05e
to
7b75ed6
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
49ad080
to
30ce7d4
Compare
Signed-off-by: Matt Lord <[email protected]>
30ce7d4
to
bd3aa67
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
@@ -111,7 +111,7 @@ jobs: | |||
- name: Upload coverage reports to codecov.io | |||
if: steps.changes.outputs.changed_files == 'true' | |||
uses: codecov/codecov-action@v4 | |||
uses: codecov/codecov-action@015f24e6818733317a2da2edd6290ab26238649a # https://github.com/codecov/codecov-action/releases/tag/v5.0.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This didn't help with the periodic failures, but good to upgrade anyway as the latest GA is 5.1.
if !isBitSet(rowChange.DataColumns.Cols, i) { | ||
return nil, vterrors.Errorf(vtrpcpb.Code_INTERNAL, | ||
"binary log event missing a needed value for %s.%s due to not using binlog-row-image=FULL; you will need to re-run the workflow with binlog-row-image=FULL", | ||
tp.TargetName, field.Name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a bug we had with NOBLOB support. Without this, the blob/text field value would be silently lost (detected by vdiff). Since NOBLOB support is experimental (even in v22), I do not think we need to backport it.
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
if field.Type == querypb.Type_JSON { | ||
var jsVal *sqltypes.Value | ||
if vals[n].IsNull() { // An SQL NULL and not an actual JSON value | ||
jsVal = &sqltypes.NULL | ||
} else { // A JSON value (which may be a JSON null literal value) | ||
jsVal, err = vjson.MarshalSQLValue(vals[n].Raw()) | ||
if err != nil { | ||
return nil, err | ||
} | ||
} | ||
bindVar, err = tp.bindFieldVal(field, jsVal) | ||
} else { | ||
bindVar, err = tp.bindFieldVal(field, &vals[n]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug exposed in the VPlayerBatching, which was enabled by default in v22. Prior to v22 it was experimental and thus I don't think we need to backport this fix.
1. Use iota for op type 2. Optimize binary diff parsing when 0 bytes (don't allocate) Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Now we're covering all JSON types: - string - number - object (JSON object) - array - boolean - null Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Asked for some comment clarifications, see inline. Basically I need a braindump of your knowledge to get a better understanding of how this works (the code is mostly too complex to just figure this out from the code).
go/mysql/binlog/binlog_json.go
Outdated
for pos < len(data) { | ||
opType := jsonDiffOp(data[pos]) | ||
pos++ | ||
if outer { | ||
innerStr = diff.String() | ||
diff.Reset() | ||
} | ||
switch opType { | ||
case jsonDiffOpReplace: | ||
diff.WriteString("JSON_REPLACE(") | ||
case jsonDiffOpInsert: | ||
diff.WriteString("JSON_INSERT(") | ||
case jsonDiffOpRemove: | ||
diff.WriteString("JSON_REMOVE(") | ||
default: | ||
// Can be a JSON null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you generally clarify (by code comments) what's going on here? (And otherwise in the rest of the function). I feel like there's some intimate knowledge here that I'm missing. What exactly needs to be done? What are we solving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here: 5deec92
// IsDeleteRows returns true if this is a DELETE_ROWS_EVENT. | ||
IsDeleteRows() bool | ||
|
||
// IsPseudo is for custom implementations of GTID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain the purpose of this Pseudo() function? What are custom implementations of GTID? (Also, I'm not sure I follow the connection to partial JSON. Is this specific for testing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the answer to these things either. 🙂 I only moved it to more logically group the IsBlank
functions together. I did not add it here.
It looks like dead code as it's only ever set to false:
go/mysql/binlog_event.go: // IsPseudo is for custom implementations of GTID.
go/mysql/binlog_event.go: IsPseudo() bool
go/mysql/binlog_event_common.go:// IsPseudo is always false for a native binlogEvent.
go/mysql/binlog_event_common.go:func (ev binlogEvent) IsPseudo() bool {
go/mysql/binlog_event_filepos.go:func (ev filePosFakeEvent) IsPseudo() bool {
go/vt/binlog/binlog_streamer.go: case ev.IsPseudo():
It was added 7 years ago in #3950
Signed-off-by: Matt Lord <[email protected]>
I'd be curious how battle tested this actually is, and if there are any MySQL bugs that may cause drift. |
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
I have no idea how widely the option is used across the global set of MySQL installations. I don't see any known bugs:
It's worth noting that this feature has existed in MySQL GA releases now for over 6 years. |
Signed-off-by: Matt Lord <[email protected]>
This way we're testing this feature both with binlog-row-image=FULL and with binlog-row-image=NOBLOB. Signed-off-by: Matt Lord <[email protected]>
Description
This PR adds support for
--binlog-row-value-options=PARTIAL_JSON
. You can read more about this here.This can offer big wins in specific cases. For example...
$.price
from 99.99 to 102.99 (inflation) and the original JSON document is 5MiB.binlog_row_value_options=PARTIAL_JSON
the binlog event will have the full 5MiB document in the BEFORE (original) and AFTER (updated) images. So 10MiB now.JSON_REPLACE
in this case, along with the path of$.price
and the value of 102.99. So we've cut the bytes stored on disk, sent over the wire, and processed in memory by ~ 1/2.Now if you imagine that instead of updating a single row, you updated 10,000 rows... let's say to remove the
$.on_sale
flag on the documents that had it set (Black Friday is over), you can see the big savings that comes with it.Important
With this option enabled, JSON columns that are NOT updated when a row is modified are elided from the AFTER images (similar to the
binlog_row_image=NOBLOB
behavior withBLOB
/TEXT
columns). So this also offers a big potential savings as you may be updating 10,000 rows in a table that happens to have a JSON column, but you're only changing some other column value(s).Manual test:
With the final result:
Related Issue(s)
Checklist