-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VStreamer: For larger compressed transaction payloads, stream the internal contents #17239
VStreamer: For larger compressed transaction payloads, stream the internal contents #17239
Conversation
For larger payloads (> ZstdInMemoryDecompressorMaxSize) we were already streaming the internal events as we decompressed the payload, but in the vstreamer we were still reading the entire contents into memory before sending them to the consumer (vplayer). With this we stream the internal contents all the way from the binlog consumer to the vstream consumer so that we do not need to hold the entire contents, which can be 10s of GiBs, in memory all at once. Signed-off-by: Matt Lord <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Matt Lord <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17239 +/- ##
==========================================
- Coverage 67.49% 67.47% -0.03%
==========================================
Files 1577 1577
Lines 253404 253424 +20
==========================================
- Hits 171044 170999 -45
- Misses 82360 82425 +65 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
…cmp_trx_payload_contents Signed-off-by: Matt Lord <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes lgtm.
Are we testing binlog compression in unit tests or e2e atm?
Signed-off-by: Rohit Nayak <[email protected]>
We have the updated unit test for the lower level part. And it's enabled in these e2e tests:
|
@@ -292,6 +297,8 @@ func (tp *TransactionPayload) decompress() error { | |||
} | |||
compressedTrxPayloadsUsingStream.Add(1) | |||
tp.reader = streamDecoder | |||
// Signal the consumer to also stream the contents. | |||
tp.StreamingContents = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just always stream? (Meaning we would remove the conditional flag?)
I guess there's also a backwards-compatibility issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that too. It's more efficient to do it in a batch though so for smaller payloads we do that (one batch) at both layers, binlog and vstreamer.
Signed-off-by: Matt Lord <[email protected]>
844eedf
to
cad3138
Compare
cad3138
to
642219a
Compare
…cmp_trx_payload_contents Signed-off-by: Matt Lord <[email protected]>
Description
For larger compressed transaction payloads (> ZstdInMemoryDecompressorMaxSize) we were already streaming the internal events as we decompressed the payload, but in the vstreamer we were still reading the entire contents into memory before sending them to the consumer (vplayer).
In this PR, we stream the internal contents all the way from the binlog consumer to the vstream consumer so that we do not need to hold the entire contents, which can be 10s or even 100s of GiBs, in memory all at once. As you can see in the test/demonstration below, we allocate and use DRASTICALLY less memory when processing the payloads: in this case approximately 14-18 times less.
Here's a manual test/demonstration on macOS (note that we end up with over 40 million rows in the customer table):
Results on the PR branch:
Results on the main branch:
Related Issue(s)
Checklist