[breaking] BQ SyncRecords now streams properly, code cleanup #909

heavycrystal · 2023-12-27T11:23:51Z

⚠️ This change can break existing CDC mirrors from Postgres to BigQuery!

Fixes a bug in BigQuery CDC where a batch with more than 2 ** 20 records causes Avro file generation to hang. All records were written to a bounded channel first and only then consumed by the Avro writer, instead of the two operations running in parallel. With a large enough batch, the channel filled up and the write blocked before the consumer ever started, so the loop deadlocked itself.

Fixed by switching BigQuery record generation to the mechanism used by Snowflake: records are generated in a separate goroutine, so the channel is consumed in parallel. As part of this change, some code was cleaned up and the BigQuery raw table schema was changed in a breaking manner to match the SF/PG equivalent. Specifically, the column _peerdb_timestamp of type TIMESTAMP was removed, and the column _peerdb_timestamp_nanos of type INTEGER was renamed to _peerdb_timestamp. Existing raw tables must be altered to match the new, simpler schema:

ALTER TABLE <...> DROP COLUMN _peerdb_timestamp;
ALTER TABLE <...> RENAME COLUMN _peerdb_timestamp_nanos TO _peerdb_timestamp;
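
For reviewers who want the streaming fix in miniature, here is a rough sketch of the producer/consumer pattern described above. It is not the PeerDB code: the `record` type and the `streamRecords` and `consume` functions are hypothetical names invented for illustration, and the channel capacity is deliberately tiny. The point is only that the producer runs in its own goroutine, so a bounded channel never has to hold the whole batch.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a CDC record (not a PeerDB type).
type record struct {
	id int
}

// streamRecords starts a producer goroutine that writes total records into
// a bounded channel, and returns the receive side right away so the caller
// can drain it while the producer is still running.
func streamRecords(total, capacity int) <-chan record {
	ch := make(chan record, capacity)
	go func() {
		defer close(ch) // lets the consumer's range loop terminate
		for i := 0; i < total; i++ {
			ch <- record{id: i} // blocks only until the consumer catches up
		}
	}()
	return ch
}

// consume drains the channel and counts records; it stands in for the
// Avro writer in this sketch.
func consume(ch <-chan record) int {
	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	// The batch is far larger than the channel capacity. Writing every
	// record before starting the consumer would block forever once the
	// channel is full; running the producer in a goroutine avoids that.
	const total, capacity = 1 << 12, 16
	fmt.Println(consume(streamRecords(total, capacity))) // prints 4096
}
```

If the loop inside `streamRecords` ran inline instead of in a goroutine, the send would block as soon as `capacity` records were buffered, which is the same shape as the hang described in this PR.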

Closes #908

Amogh-Bharadwaj

lgtm

[breaking] BQ SyncRecords now streams properly, code cleanup

60361a8

heavycrystal requested review from serprex, iskakaushik and Amogh-Bharadwaj December 27, 2023 11:23

Amogh-Bharadwaj approved these changes Dec 27, 2023

iskakaushik merged commit a3b2800 into main Dec 27, 2023

serprex deleted the bq-avro-streaming-fixes branch July 19, 2024 15:21
