Normalize concurrently with sync flows #893
Merged
serprex force-pushed the normalize-split branch from abf99dd to e785a96 on December 24, 2023 03:40
serprex force-pushed the normalize-split branch 5 times, most recently from f05a857 to e1976c4 on January 3, 2024 00:34
This was referenced on Jan 3, 2024 (Closed)
serprex force-pushed the normalize-split branch 4 times, most recently from 1c351fa to dd40b2a on January 12, 2024 22:50
serprex force-pushed the normalize-split branch 4 times, most recently from 06613bd to 47ca6ac on January 13, 2024 21:45
serprex added a commit that referenced this pull request on Jan 15, 2024:
A sync batch should not be considered complete until its schema changes are processed. This prevents a failure after commit from dropping schema changes and, with normalize/sync decoupled in #893, prevents normalization from missing values.
serprex force-pushed the normalize-split branch from 2a3405b to 85c9ee3 on January 15, 2024 16:04
serprex force-pushed the normalize-split branch from 85c9ee3 to 758e10e on January 16, 2024 23:19
serprex force-pushed the normalize-split branch 10 times, most recently from 179b4ca to f2a5b7f on January 19, 2024 22:33
serprex force-pushed the normalize-split branch from 6eb8b93 to fdf6dce on January 19, 2024 23:06
serprex force-pushed the normalize-split branch from e390015 to 746f4dc on January 20, 2024 00:10
iskakaushik reviewed on Jan 24, 2024
iskakaushik reviewed on Jan 24, 2024
serprex force-pushed the normalize-split branch from d72dce1 to 5075598 on January 24, 2024 17:07
serprex force-pushed the normalize-split branch from 7033c90 to bcea38b on January 24, 2024 17:21
iskakaushik approved these changes on Jan 25, 2024
serprex force-pushed the normalize-split branch from b23c6db to b70abbb on January 25, 2024 16:28
Previously, after each sync we'd pause reading the slot to process table schema deltas & normalize.
This has two problems:
Now NormalizeFlow is created as a child workflow at the start of the CDC flow, & a signal is sent after each sync flow with schema updates. Normalize consumes all signals received since it last checked, merging them so that their processing runs in parallel with sync flows.
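To illustrate "consumes all signals since it last checked", here is a minimal sketch in plain Go, using a buffered channel in place of the workflow signal channel. The `NormalizeSignal` type, its fields, and `drainSignals` are hypothetical names for illustration only, not the identifiers used in this PR.

```go
package main

import "fmt"

// NormalizeSignal is a hypothetical stand-in for the payload sent to
// NormalizeFlow after each sync flow.
type NormalizeSignal struct {
	SyncBatchID  int64
	SchemaDeltas []string
}

// drainSignals consumes every signal that is already queued, without
// blocking, and merges them into one: the highest batch id wins and the
// schema deltas are concatenated in arrival order. The bool reports
// whether any signal was received.
func drainSignals(ch <-chan NormalizeSignal) (NormalizeSignal, bool) {
	var merged NormalizeSignal
	got := false
	for {
		select {
		case s, open := <-ch:
			if !open {
				return merged, got
			}
			got = true
			if s.SyncBatchID > merged.SyncBatchID {
				merged.SyncBatchID = s.SyncBatchID
			}
			merged.SchemaDeltas = append(merged.SchemaDeltas, s.SchemaDeltas...)
		default:
			// Nothing else queued right now.
			return merged, got
		}
	}
}

func main() {
	ch := make(chan NormalizeSignal, 8)
	// Three sync flows finished while normalize was busy.
	ch <- NormalizeSignal{SyncBatchID: 1, SchemaDeltas: []string{"add column a"}}
	ch <- NormalizeSignal{SyncBatchID: 2}
	ch <- NormalizeSignal{SyncBatchID: 3, SchemaDeltas: []string{"add column b"}}

	merged, ok := drainSignals(ch)
	fmt.Println(ok, merged.SyncBatchID, merged.SchemaDeltas)
}
```

Merging this way means a slow normalize handles several sync batches in one pass rather than one pass per signal.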
NormalizeFlow only reads up to the signal's batch id to avoid potentially syncing a batch without its schema. This creates a range `(normid..syncid]` in which normid is always catching up to syncid as we normalize `normid+1` to `syncid`. Normalize logic already handled this, so it goes untouched in this change.

`PEERDB_ENABLE_PARALLEL_SYNC_NORMALIZE` needs to be set to true. For now, keep this change behind a feature flag to avoid potentially increasing data warehouse costs.
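The `(normid..syncid]` range can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `pendingBatches` is a hypothetical helper, and batch ids are assumed to be consecutive integers.

```go
package main

import "fmt"

// pendingBatches returns the batch ids in the half-open range
// (normID..syncID], i.e. normID+1 through syncID inclusive.
// It returns nil when normalize has already caught up.
func pendingBatches(normID, syncID int64) []int64 {
	if syncID <= normID {
		return nil
	}
	ids := make([]int64, 0, syncID-normID)
	for id := normID + 1; id <= syncID; id++ {
		ids = append(ids, id)
	}
	return ids
}

func main() {
	normID := int64(4) // last batch already normalized
	// Batch ids carried by successive signals; normalize never reads past them.
	for _, signalled := range []int64{6, 6, 9} {
		batches := pendingBatches(normID, signalled)
		fmt.Printf("signal=%d normalize=%v\n", signalled, batches)
		if signalled > normID {
			normID = signalled // normid catches up to syncid
		}
	}
}
```

A repeated signal for an already normalized batch yields an empty range, so replaying a signal is harmless in this sketch.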