Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comparing RelationMessage against TableSchema, stop caching RelationMessages #1438

Merged
merged 17 commits into from
Mar 9, 2024

Conversation

heavycrystal
Copy link
Contributor

@heavycrystal heavycrystal commented Mar 5, 2024

Previously we diffed the previous and current RelationMessages received from Postgres during logical replication to ascertain if the schema changed or not. This fails in cases where the schema changed after mirror setup but before we get the first RelationMessage which can happen a) during initial load, before CDC and b) during CDC, if no rows are synced before schema changes are made.

This causes a desync between PeerDB's view of the table and the actual status of the table, causing issues. Changed to diff from the current TableSchema instead, which should always be present. Eliminates having to store and pass RelationMessageMapping around, it is scoped to reading from the replication connection.

Also fixes bugs with schema changes being handled incorrectly for PG and BQ, and schema audits being logged without schema deltas actually being present.

@heavycrystal
Copy link
Contributor Author

Since we now maintain the replication connection across syncs, we don't get a second RelationMessage for the same table unless its schema changes so some amount of caching is needed. Storing at the pgConn level instead of global state.

Also added defensive check to only update TableNameSchemaMapping of NormalizeFlow if not nil, fixes a test.

@@ -99,7 +87,6 @@ message CreateTablesFromExistingOutput {

message SyncFlowOptions {
uint32 batch_size = 1;
map<uint32, RelationMessage> relation_message_mapping = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only deprecate this field then we won't break pause/unpause upgrade

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need to retain the other Message types for this, also not sure if it's enough since CDC state also loses fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, mark it all deprecated & leave it unused, then we make an explicit breaking change PR afterwards

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please deprecate and not delete

@serprex
Copy link
Contributor

serprex commented Mar 8, 2024

lgtm, good to see a red PR, would want @iskakaushik to take a look over it too

@iskakaushik iskakaushik enabled auto-merge (squash) March 9, 2024 15:28
@iskakaushik iskakaushik merged commit 9432e24 into main Mar 9, 2024
7 checks passed
@iskakaushik iskakaushik deleted the schema-changes-diff-from-schema branch March 9, 2024 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants