Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend): use purified definition for replacing table #20131

Merged
merged 11 commits into from
Jan 17, 2025

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Jan 13, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Adopt purified definition for table replacement like schema change and sink-into-table.

The main change is to introduce a new enum named SqlColumnStrategy, defining the behavior when the schema is defined in the SQL and derived from external system simultaneously. This was always rejected in the past. However, with the purified definition used for table replacement, we need to handle it with different strategies.

  1. For REFERSH, use the newly resolved schema.
  2. For SINK INTO TABLE, ensure schema unchanged by using the defined schema after purification.
  3. For [ADD | DROP] COLUMN, use the altered defined schema.
  4. For CREATE with schema registry, reject defined schema.

Case 2 actually fixes a long-standing issue caused by impurity. Previously, creating a sink into a table will accidentally refresh its schema.

Cases 3 and 4 adhere to the current behavior. However, we can now modify this behavior simply by passing a different SqlColumnStrategy, which will accomplish the objective in #12199.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from ece77bb to db328ec Compare January 14, 2025 07:11
@BugenZhao BugenZhao changed the title feat(frontend): use purified definition for ALTER TABLE REFRESH SCHEMA feat(frontend): use purified definition for schema change Jan 14, 2025
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-4 branch from df8fbed to 84c5ecc Compare January 15, 2025 06:05
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from 331e87a to 2247fdb Compare January 15, 2025 06:05
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-4 branch from 84c5ecc to f95d993 Compare January 15, 2025 06:54
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from 2247fdb to 78062c3 Compare January 15, 2025 06:54
Base automatically changed from bz/purify-sql-part-4 to main January 15, 2025 07:33
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from 78062c3 to b046e1f Compare January 15, 2025 08:07
@BugenZhao BugenZhao changed the base branch from main to bz/purify-sql-part-4-1 January 15, 2025 08:07
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from b046e1f to ff42a06 Compare January 15, 2025 08:19
@BugenZhao BugenZhao changed the title feat(frontend): use purified definition for schema change feat(frontend): use purified definition for replacing table Jan 15, 2025
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from cedcc25 to 503f070 Compare January 15, 2025 09:48
Base automatically changed from bz/purify-sql-part-4-1 to main January 16, 2025 03:51
@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from 407b212 to c8a839f Compare January 16, 2025 04:02
Comment on lines -287 to -293
if columns.is_empty() {
Err(ErrorCode::NotSupported(
"alter a table with empty column definitions".to_owned(),
"Please recreate the table with column definitions.".to_owned(),
))?
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously there were two cases where columns is empty:

  1. All columns are resolved from external source instead of defined in SQL.
  2. The table indeed has zero columns.

For case 1, as we always use a purified definition here, altering it is feasible. For case 2, we should have allowed it.

Comment on lines -65 to -69
// NOTE(st1page): since we have not implemented alter format encode for table, it is actually no use.
let definition = alter_definition_format_encode(
&original_table.definition,
format_encode.row_options.clone(),
)?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no-op. Don't get the point. Removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.. I used to find it in a refator but forget to delete it.

table_name,
definition,
&original_table,
SqlColumnStrategy::Ignore,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve columns from schema registry again and fully replace the current definition.

table_name,
definition,
&original_catalog,
SqlColumnStrategy::Follow,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly follow the (altered) columns defined in SQL and avoid refreshing.

@@ -558,6 +558,7 @@ pub(crate) async fn reparse_table_for_sink(
None,
include_column_options,
engine,
SqlColumnStrategy::Follow,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly follow the columns defined in SQL and avoid refreshing. Creating a new sink into a table should not alter the table's schema.

@@ -542,6 +550,7 @@ pub(crate) async fn gen_create_table_plan_with_source(
let db_name: &str = &session.database();
let (schema_name, _) = Binder::resolve_schema_qualified_name(db_name, table_name.clone())?;

// TODO: omit this step if `sql_column_strategy` is `Follow`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not, if we want to validate user-defined column against the resolved schema.

Comment on lines +48 to +49
# Before refreshing schema, we create a `SINK INTO TABLE` which involves table replacement,
# showing that this will NOT accidentally refresh the schema.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Showing that

2. For SINK INTO TABLE, ensure schema unchanged by using the defined schema after purification.

Case 2 actually fixes a long-standing issue caused by impurity. Previously, creating a sink into a table will accidentally refresh its schema.

This will panic the compute node in the current release.

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

src/frontend/src/handler/create_source.rs Show resolved Hide resolved
src/frontend/src/handler/create_source.rs Outdated Show resolved Hide resolved
Comment on lines +332 to +334
// It's possible that we don't follow the user defined columns in SQL but take the
// ones resolved from the source, thus missing some columns. Simply ignore them.
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds ok, but I'm wondering what will happen before this PR?

Copy link
Member Author

@BugenZhao BugenZhao Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only occurs when there are columns defined in SQL (as ast::ColumnDef) but not in the final schema (as ColumnCatalog). Therefore, this won't happen before this PR.

After this PR, we may encounter this when calling REFRESH SCHEMA on a purified definition with some columns to be removed. However, as we only care about generated columns here and generated columns will always be preserved, we can safely ignore the missing regular columns.

@BugenZhao BugenZhao force-pushed the bz/purify-sql-part-5 branch from 95ba2cd to 4014b51 Compare January 17, 2025 06:50
@BugenZhao BugenZhao enabled auto-merge January 17, 2025 06:55
@BugenZhao BugenZhao added this pull request to the merge queue Jan 17, 2025
auto-merge was automatically disabled January 17, 2025 08:07

Pull Request is not mergeable

Merged via the queue into main with commit f0db87b Jan 17, 2025
29 checks passed
@BugenZhao BugenZhao deleted the bz/purify-sql-part-5 branch January 17, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants