Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cdc): support mapping json of pg to jsonb in rw in pg-cdc #20444

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

KeXiangWang
Copy link
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

As titled

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@KeXiangWang KeXiangWang force-pushed the wkx/support-json-as-jsonb branch from 8678878 to 1156c89 Compare February 11, 2025 02:15
@KeXiangWang KeXiangWang changed the title feat(cdc): support json in pg as jsonb in rw feat(cdc): support json of pg as jsonb in rw Feb 11, 2025
@xiangjinwu
Copy link
Contributor

xiangjinwu commented Feb 11, 2025

Note that json and jsonb have different semantics in PostgreSQL:

test=# select jsonb_each('{"foo": 1, "bar": 2, "foo": 3}');
 jsonb_each 
------------
 (bar,2)
 (foo,3)
(2 rows)

test=# select json_each('{"foo": 1, "bar": 2, "foo": 3}');
 json_each 
-----------
 (foo,1)
 (bar,2)
 (foo,3)
(3 rows)

Similarly, char(8) is different from varchar. But for PostgreSQL CDC source, it may be acceptable as long as we clarify this in doc.

@KeXiangWang KeXiangWang changed the title feat(cdc): support json of pg as jsonb in rw feat(cdc): support mapping json of pg to jsonb in rw in pg-cdc Feb 11, 2025
}
Ok(JsonbVal::from(Value::from_text(raw)?))
Ok(match *ty {
Type::JSON => JsonbVal::from(Value::from_text(raw)?),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's acknowledge in comments that treating JSON as JSONB can be wrong in special cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Will also add it in doc.

@@ -207,7 +207,7 @@ impl PostgresExternalTable {
SeaType::Boolean => Ok(PgType::BOOL),
SeaType::Point => Ok(PgType::POINT),
SeaType::Uuid => Ok(PgType::UUID),
SeaType::JsonBinary => Ok(PgType::JSONB),
SeaType::Json | SeaType::JsonBinary => Ok(PgType::JSONB),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this used in RisingWave? I see CHAR_ARRAY and TIMETZ_ARRAY here which do not exist in RisingWave.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of codes are used to load the schema of the table to be connected via source or sink.
timestamptz[] should be supported in RW?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant it should be either:

    // mapping into RisingWave supported data types
    SeaType::Char(_) | SeaType::Varchar(_) => Ok(PgType::VARCHAR),
    SeaType::Json | SeaType::JsonBinary => Ok(PgType::JSONB)

or

    // mapping into PostgreSQL data types, regardless of RisingWave support
    SeaType::Varchar(_) => Ok(PgType::VARCHAR),
    SeaType::Char(_) => Ok(PgType::CHAR),
    SeaType::JsonBinary => Ok(PgType::JSONB),
    SeaType::Json => Ok(PgType::JSON),

But after this PR it is:

    SeaType::Varchar(_) => Ok(PgType::VARCHAR),
    SeaType::Char(_) => Ok(PgType::CHAR),
    SeaType::Json | SeaType::JsonBinary => Ok(PgType::JSONB)

That's why I am asking how is this function used / what is its responsibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants