[breaking] BQ SyncRecords now streams properly, code cleanup #909
Fixes a bug in BigQuery CDC where a batch with more than `2 ** 20` records causes the Avro file generation step to hang. All records were being written into a bounded channel first and only then consumed by the Avro writer, instead of the two operations happening in parallel. With a large number of records, the channel would fill up and block before all records were written, deadlocking the loop.

Fixed by switching BigQuery record generation to the mechanism already used by Snowflake, where record generation runs in a separate goroutine so the channel is consumed in parallel with production.
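A minimal sketch of the pattern for illustration only; `record`, `generateRecords`-style producers, and `writeAvro` are hypothetical stand-ins, not the actual PeerDB code:

```go
package main

import "fmt"

type record struct{ id int }

// Before: producing every record into a bounded channel before any consumer
// runs deadlocks once the buffer (e.g. 2**20) fills up.
//
//	ch := make(chan record, bufSize)
//	for _, r := range records {
//		ch <- r // blocks forever once len(records) exceeds bufSize
//	}
//	close(ch)
//	writeAvro(ch)

// After: produce in a separate goroutine so the Avro writer drains the
// channel concurrently and a full buffer can no longer wedge the producer.
func streamRecords(records []record, bufSize int) {
	ch := make(chan record, bufSize)
	go func() {
		defer close(ch)
		for _, r := range records {
			ch <- r
		}
	}()
	writeAvro(ch) // consumes while the goroutine above produces
}

func writeAvro(ch <-chan record) {
	for r := range ch {
		fmt.Println("writing record", r.id) // stand-in for Avro encoding
	}
}

func main() {
	// Buffer of 2 is smaller than the batch, yet nothing deadlocks.
	streamRecords([]record{{1}, {2}, {3}}, 2)
}
```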
As part of this change, some code was cleaned up and the BigQuery raw table schema was changed in a breaking manner to be similar to the SF/PG equivalent. Specifically, the column `_peerdb_timestamp` of type `TIMESTAMP` was removed and the column `_peerdb_timestamp_nanos` of type `INTEGER` was renamed to `_peerdb_timestamp`. Existing raw tables will need to be fixed up to match this new, simpler schema.

Closes #908