Normalize split #1088

Merged
58 commits merged into customer-docker on Jan 16, 2024
Conversation

@serprex (Contributor) commented Jan 16, 2024

Wanting to do testing as if in a customer environment.

serprex and others added 30 commits January 6, 2024 09:32
#1020 reverted #997

Reimplement so that only the func declaration moves into HeartbeatRoutine
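For context, a minimal sketch of what a heartbeat routine like this can look like with the Temporal Go SDK; the `message` callback and package layout here are assumptions for illustration, not PeerDB's actual code:

```golang
package shared

import (
	"context"
	"time"

	"go.temporal.io/sdk/activity"
)

// HeartbeatRoutine periodically records an activity heartbeat until the
// returned cancel func is called. message is a hypothetical callback
// supplying the current progress string.
func HeartbeatRoutine(
	ctx context.Context,
	interval time.Duration,
	message func() string,
) context.CancelFunc {
	ctx, cancel := context.WithCancel(ctx)
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				activity.RecordHeartbeat(ctx, message())
			}
		}
	}()
	return cancel
}
```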
Adds enum tests for Postgres, BigQuery and Snowflake. We want enums to be text on the destination.
json.Marshal does not support NaN, so we have to nullify them; but NULLs aren't allowed when inserting arrays in BigQuery, so in the MERGE statement we skip NULLs.
Fixes #1029
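A minimal sketch of the nullification step, assuming rows are held as `[]any` slices (the real record types differ):

```golang
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// nullifyNaN replaces NaN floats with nil, since json.Marshal returns
// an error ("unsupported value: NaN") when it encounters one.
func nullifyNaN(values []any) {
	for i, v := range values {
		if f, ok := v.(float64); ok && math.IsNaN(f) {
			values[i] = nil
		}
	}
}

func main() {
	row := []any{1.5, math.NaN(), "ok"}
	nullifyNaN(row)
	b, _ := json.Marshal(row)
	fmt.Println(string(b)) // [1.5,null,"ok"]
}
```

The resulting NULLs are then the ones skipped by the MERGE statement, since BigQuery arrays cannot contain them.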
Tests were flaking because enum tests were trying to create the mood enum in parallel.
Unfortunately, `CREATE TYPE IF NOT EXISTS` is not a thing,
and doing `DROP TYPE IF EXISTS` before every enum creation can be flaky.
The solution is to just ignore the specific unique typename constraint violation.
Also move PEERDB_CDC_IDLE_TIMEOUT_SECONDS to being specified by tests
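A sketch of that error-swallowing approach with pgx; the SQLSTATE checks (42710 duplicate_object, and 23505 on pg_type's typname index for the concurrent-creation race) are our reading of the constraint violation described above, not the exact test helper:

```golang
package e2e

import (
	"context"
	"errors"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgconn"
)

// createMoodEnum creates the enum and ignores "type already exists"
// errors so parallel tests can race to create it safely.
func createMoodEnum(ctx context.Context, conn *pgx.Conn) error {
	_, err := conn.Exec(ctx, "CREATE TYPE mood AS ENUM ('happy','sad','angry')")
	var pgErr *pgconn.PgError
	if errors.As(err, &pgErr) &&
		(pgErr.Code == "42710" || // duplicate_object: type already exists
			(pgErr.Code == "23505" && pgErr.ConstraintName == "pg_type_typname_nsp_index")) {
		return nil // another parallel test created it first
	}
	return err
}
```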
Remove force-dynamic in the UI mirrors status file
to allow better testing in the future. Also, soft-delete now only happens
when SoftDelete is set to true and a SoftDeleteColName is set.

Added merge statement tests for PG, SF and BQ.
Removed some whitespace in Postgres merge statements.
Cursor is mutable, so the type system already knows it has exclusive access.
Catalog's methods already handle synchronization,
except when running migrations, which already use exclusive connections

Also query source/destination peer in parallel
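A sketch of the parallel peer lookup using errgroup; `Peer` and the `fetchPeer` callback are hypothetical stand-ins for the actual catalog query:

```golang
package connectors

import (
	"context"

	"golang.org/x/sync/errgroup"
)

type Peer struct{ Name string }

// fetchPeers queries the source and destination peers concurrently and
// fails fast if either lookup errors.
func fetchPeers(
	ctx context.Context,
	fetchPeer func(context.Context, string) (*Peer, error),
	srcName, dstName string,
) (*Peer, *Peer, error) {
	var src, dst *Peer
	g, gctx := errgroup.WithContext(ctx)
	g.Go(func() error {
		var err error
		src, err = fetchPeer(gctx, srcName)
		return err
	})
	g.Go(func() error {
		var err error
		dst, err = fetchPeer(gctx, dstName)
		return err
	})
	if err := g.Wait(); err != nil {
		return nil, nil, err
	}
	return src, dst, nil
}
```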
Realised we can just create a qrep execution object for many of the
activities we're calling in xmin_flow.go.
Fixes a typo in qrep_flow.
Adds temporal query handlers in xmin_flow.go (the same ones as in
qrep_flow; these have been refactored into a common function). Without
these, XMIN mirror statuses cannot be viewed in the UI
Refactors our initial load status fetch to now perform a join instead of
iterating through flow names and running `len(clones)` number of
queries.

Uses `{cache: 'no-store'}` as an alternative to forcing dynamic
rendering. This is the more idiomatic way of using NextJS;
force-dynamic is mainly for easing migration from the Pages Router's
earlier getServerSideProps.
SchemaRef was used everywhere, but pgwire doesn't name the type,
so when the API wanted an `Arc<Vec<FieldInfo>>` while we had
`Arc<Schema { fields: Vec<FieldInfo> }>`, it required constructing
`Arc::new(schema.fields.clone())`, an unnecessary clone
Maps HStore to JSON in BigQuery.
Achieves this by transforming `'"a"=>"b"'` to `{"a":"b"}` via string
functions, in both qrep/initial load and CDC.
Test added.

Relies on the fact that hstore keys and values must be quoted strings
(although the key can be empty, and this case is supported),
and follows the syntax in the examples below:

```sql
postgres=# select 'a'::hstore;
ERROR:  Unexpected end of string
LINE 1: select 'a'::hstore;
               ^
postgres=# select 'a=>b'::hstore;
-[ RECORD 1 ]----
hstore | "a"=>"b"

postgres=# select 'a=>'::hstore;
ERROR:  Unexpected end of string
LINE 1: select 'a=>'::hstore;

postgres=# select 'a=>3434'::hstore;
-[ RECORD 1 ]-------
hstore | "a"=>"3434"
```
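For illustration, the same transform expressed in Go; the real conversion happens via SQL string functions on the warehouse side, and this sketch ignores edge cases like `=>` appearing inside a quoted value:

```golang
package qvalue

import "strings"

// hstoreToJSON relies on the property above: since hstore keys and
// values are quoted strings, replacing "=>" with ":" and wrapping the
// result in braces yields valid JSON.
func hstoreToJSON(hstore string) string {
	if hstore == "" {
		return "{}"
	}
	return "{" + strings.ReplaceAll(hstore, "=>", ":") + "}"
}
```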
Go 1.19 introduced atomic types similar to uber-go/atomic
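For example, the typed API removes the footgun of mixing atomic and plain access on a bare integer:

```golang
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	// before Go 1.19: a bare int64 plus atomic.AddInt64, which is easy
	// to misuse by accidentally reading the variable non-atomically
	var old int64
	atomic.AddInt64(&old, 1)

	// Go 1.19: the typed API makes every access atomic by construction
	var numRecords atomic.Int64
	numRecords.Add(1)

	fmt.Println(atomic.LoadInt64(&old), numRecords.Load())
}
```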
Must specify http or else the callback upon login fails
Add tests for QRep transform functions in Snowflake and BigQuery
If a table has neither a primary key nor a replica identity, it is
now not selectable in the Create CDC UI.

![Screenshot 2024-01-11 at 3 11 17 AM](https://github.com/PeerDB-io/peerdb/assets/65964360/c589802e-437f-4f62-b351-e1def921b4f8)
For tables with many columns, and especially many TOAST columns, some
batches can trigger MERGE statements so complex that BigQuery is unable
to process them, with errors like

`The query is too large. The maximum standard SQL query length is
1024.00K characters, including comments and white space characters.`
`Error 400: Resources exceeded during query execution: The query is too
complex., resourcesExceeded`

For now, the fix is splitting these complex MERGE statements into
smaller ones that act on different subsets of the raw table
(partitioning on the basis of `_peerdb_unchanged_toast_columns`). This
can lead to tables needing 10+ MERGE statements in a single batch, but
this is a compromise with our current design. Instead of sending MERGEs
for all tables at once, we now do it per table and update metadata at
the end, to avoid exceeding SQL query length limits.

---------

Co-authored-by: Kaushik Iska <[email protected]>
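A sketch of the splitting idea: group the raw table by its distinct `_peerdb_unchanged_toast_columns` values and emit one smaller MERGE per group. The SQL fragment here is illustrative, not the generated statement:

```golang
package connbigquery

import "fmt"

// mergeStatements emits one MERGE per distinct unchanged-toast-columns
// value instead of a single statement covering every combination.
func mergeStatements(rawTable string, toastGroups []string) []string {
	stmts := make([]string, 0, len(toastGroups))
	for _, cols := range toastGroups {
		// each statement only touches raw rows whose unchanged TOAST
		// column set matches this group
		stmts = append(stmts, fmt.Sprintf(
			`MERGE INTO target USING (
  SELECT * FROM %s WHERE _peerdb_unchanged_toast_columns = '%s'
) src ON ...`,
			rawTable, cols))
	}
	return stmts
}
```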
## Eventhubs: Enable Partitioning Via Column Value
This PR aims to achieve a feature where PG to Eventhub CDC now supports
partition-level routing. Suppose there is a column in the source table
T1 called `id`. We want T1 to be mapped to one eventhub E1, and rows
with the same `id` value V1 in T1 to be mapped to the same partition P1
of E1. Similarly, V2 of T1 goes to P2 of E1, and so on.

**Note**: The limit of 32 on the number of partitions is accounted
for by Eventhub's internal assignment; V100 does not map literally to P100.

### Mirror definition
- `ScopedEventhub`'s definition is now:
```golang
// Scoped eventhub is of the form peer_name.eventhub_name.partition_column.partition_key_value
// partition_column is the column in the table that is used to determine
// the partition key for the eventhub. Partition value is one such value of that column.
type ScopedEventhub struct {
	PeerName           string
	Eventhub           string
	PartitionKeyColumn string
	PartitionKeyValue  string
}
```

Therefore the semantics of the table mapping in the mirror command are
now:
```sql
WITH TABLE MAPPING (
-- src_table:<eventhub (not group) peer>.<eventhub name>.<column name>
public.oss1:ehpeer1.oss1_destination_eventhub.id 
)
```
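A sketch of how that three-part identifier might be parsed into the struct above; the actual parsing code may differ:

```golang
package model

import (
	"fmt"
	"strings"
)

// ScopedEventhub as shown above; PartitionKeyValue is filled in later,
// per row, from the partition column.
type ScopedEventhub struct {
	PeerName           string
	Eventhub           string
	PartitionKeyColumn string
	PartitionKeyValue  string
}

// ParseScopedEventhub splits "peer.eventhub.column" into its parts.
func ParseScopedEventhub(raw string) (ScopedEventhub, error) {
	parts := strings.SplitN(raw, ".", 3)
	if len(parts) != 3 {
		return ScopedEventhub{}, fmt.Errorf(
			"invalid destination %q, expected peer.eventhub.column", raw)
	}
	return ScopedEventhub{
		PeerName:           parts[0],
		Eventhub:           parts[1],
		PartitionKeyColumn: parts[2],
	}, nil
}
```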

### Routing
When creating a batch, we specify the partition key of the batch:
```golang
opts := &azeventhubs.EventDataBatchOptions{
	PartitionKey: &destination.PartitionKeyValue,
}
batch, err := hub.NewEventDataBatch(ctx, opts)
```

According to azeventhubs docs:
```
PartitionKey is hashed to calculate the partition assignment. Messages and message
batches with the same PartitionKey are guaranteed to end up in the same partition.
```
Amogh-Bharadwaj and others added 28 commits January 11, 2024 09:27
Maps HStore to a JSON-compatible Variant in Snowflake - for CDC and QRep
We weren't returning the current sync batch ID as part of Eventhub's
SyncRecords. As a result, we weren't able to get CDC logs for Eventhub.
This PR fixes that.
Removes metadatadb requirement for Eventhub Group peer as it can use
catalog on its own
Fixes a few logs
Refreshing the slot page was not updating the slot table data, since
we weren't disabling caching there like we are in the mirrors edit page.
Same goes for the mirror errors page. Also adds `cache: 'no-store'` in a
few other fetches for safety
Partitioning based on unchanged TOAST columns wasn't being done right
- Remove push parallelism
- Move heartbeat routine inside processBatch
- use atomic int for num records

---------

Co-authored-by: Philip Dubé <[email protected]>
In the UI we get data from `information_schema.table_constraints` for
primary-key-based filtering.
This can be unreliable: according to the [postgres
docs](https://www.postgresql.org/docs/current/infoschema-table-constraints.html),
it won't work for read-only users.

This PR modifies two of the queries we use for the table picker to not
use this view
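A hedged sketch of a pg_catalog-based replacement query; the actual queries in this PR may differ:

```golang
package peers

// selectableTablesQuery lists regular tables that have either a replica
// identity (FULL or USING INDEX) or a primary key. Unlike
// information_schema.table_constraints, pg_class and pg_index are
// visible to read-only users.
const selectableTablesQuery = `
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND (
    c.relreplident IN ('f', 'i')
    OR EXISTS (
      SELECT 1 FROM pg_index i
      WHERE i.indrelid = c.oid AND i.indisprimary
    )
  )`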
`NormalizeFlowCountQuery` is stunting decoupled sync/normalize workflows,
so replace it with `WaitFor`.

Besides, I just don't like this `ExitAfterRecords` way of doing things:
e2e tests are integration tests, so the implementation should be treated as a black box as much as possible.
Temporal has a bunch of capabilities to mock activities, so we can write unit tests for the more intrusive checks that'd be necessary to raise branch coverage, etc.

`WaitFor` presents the ideal mechanism for testing convergent processes:
update source, wait for destination to reflect change

In order to make this change work, however,
I needed to use `env.CancelWorkflow` after completing tests,
since I now want the workflow running indefinitely.
It turns out our code didn't adequately handle cancellation,
so I implemented that
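A minimal sketch of the WaitFor pattern, assuming a poll-based helper; the suite's actual helper may be shaped differently:

```golang
package e2e

import (
	"testing"
	"time"
)

// EnvWaitFor polls until the condition holds or the timeout elapses,
// failing the test with the given reason on timeout.
func EnvWaitFor(t *testing.T, timeout time.Duration, reason string, condition func() bool) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for !condition() {
		if time.Now().After(deadline) {
			t.Fatalf("timed out waiting for: %s", reason)
		}
		time.Sleep(time.Second)
	}
}
```

Typical usage then reads: update the source, then `EnvWaitFor(t, time.Minute, "row normalized", func() bool { return destRowCount() == 1 })`, where `destRowCount` is a hypothetical destination check.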
Holding back from pgwire 0.19 since tests were hanging with it
Closing allows for multiple receivers and never blocks the sender.
It's the correct choice when one wants to signal an irreversible state change
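A small self-contained illustration of why close fits this use:

```golang
package main

import "fmt"

func main() {
	done := make(chan struct{})

	// closing is a broadcast-style, irreversible signal: every receiver
	// unblocks, and close never blocks the sender the way a send on an
	// unbuffered channel can
	go func() {
		close(done)
	}()

	<-done // first receiver unblocks
	<-done // reads on a closed channel never block, so this works too
	fmt.Println("both receivers observed the signal")
}
```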
Changes necessary because of
1. sunng87/pgwire#144
2. sunng87/pgwire#147

Tests were failing due to hanging in 0.19.0,
0.19.1 fixed hang: sunng87/pgwire#148
A sync batch should not be considered complete until its schema changes are processed.
This avoids failures after commit causing schema changes to be dropped,
and it was causing normalization to be missing values when decoupling normalize/sync in #893
Seems like `peer_flow_bq_test.go` was missing the sync import
We now hash the partition key column value obtained from the destination
table name in create mirror for PG->EH.
This is to reduce the number of Eventhub batches we create; we noticed
that not doing so makes the mirror extremely slow.
Also fixes some code in the UI Graph component
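A sketch of the hashing idea: fold arbitrary column values into a bounded set of partition keys so fewer distinct Eventhub batches are created. `numBuckets` is an assumed tuning knob, not a real config value:

```golang
package conneventhub

import (
	"fmt"
	"hash/fnv"
)

// hashedPartitionKey maps a column value into one of numBuckets
// partition keys, bounding how many in-flight batches exist.
func hashedPartitionKey(value string, numBuckets uint32) string {
	h := fnv.New32a()
	_, _ = h.Write([]byte(value)) // fnv's Write never returns an error
	return fmt.Sprintf("%d", h.Sum32()%numBuckets)
}
```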
@serprex serprex requested a review from iskakaushik January 16, 2024 23:32
@serprex serprex merged commit 4101934 into customer-docker Jan 16, 2024
7 of 8 checks passed