Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Modify Column Type #3701

Closed
wants to merge 5 commits into from
Closed

Conversation

KKould
Copy link
Collaborator

@KKould KKould commented Apr 14, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#3517
GreptimeTeam/greptime-proto#146

What's changed and what's your intention?

Support: ALTER TABLE <table_name> ALTER COLUMN <column_name> TYPE <data_type>;
ref: https://www.postgresql.org/docs/current/sql-altertable.html

Tips 1: sql parser does not support Modify Column temporarily, I have created a PR: apache/datafusion-sqlparser-rs#1216

Tips 2: I temporarily use my own fork of greptime-proto, and then replace it after merging it at GreptimeTeam/greptime-proto#146

public=> CREATE TABLE test(i INTEGER, j TIMESTAMP TIME INDEX);
OK 0
public=> ALTER TABLE test ALTER COLUMN I TYPE STRING;
OK 0
public=> DESCRIBE t2;
+--------+----------------------+-----+------+---------+---------------+
| Column | Type                 | Key | Null | Default | Semantic Type |
+--------+----------------------+-----+------+---------+---------------+
| i      | String               |     | YES  |         | FIELD         |
| j      | TimestampMillisecond | PRI | NO   |         | TIMESTAMP     |
+--------+----------------------+-----+------+---------+---------------+

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 14, 2024
@WenyXu
Copy link
Member

WenyXu commented Apr 14, 2024

Nice job🚀

Tips: sql parser does not support Modify Column temporarily, I have created a PR: apache/datafusion-sqlparser-rs#1216

BTW, Should we support both dialects(i.e., ALTER TABLE <table_name> ALTER COLUMN <column_name> TYPE <data_type>; and ALTER TABLE <table_name> MODIFY <column_name> INT )?

@KKould
Copy link
Collaborator Author

KKould commented Apr 14, 2024

#3517 mentioned that it supports MySQL dialects as much as possible. I think it is good to only support MySQL.

The current PostgreSQL syntax is only for testing

@WenyXu WenyXu self-requested a review April 14, 2024 09:16
@WenyXu
Copy link
Member

WenyXu commented Apr 14, 2024

Run make fmt for formatting the code.

Copy link

codecov bot commented Apr 14, 2024

Codecov Report

Attention: Patch coverage is 91.59120% with 65 lines in your changes are missing coverage. Please review.

Project coverage is 85.25%. Comparing base (02f806f) to head (cb0884a).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3701      +/-   ##
==========================================
- Coverage   85.57%   85.25%   -0.33%     
==========================================
  Files         957      962       +5     
  Lines      159226   160907    +1681     
==========================================
+ Hits       136258   137181     +923     
- Misses      22968    23726     +758     

src/store-api/src/metadata.rs Outdated Show resolved Hide resolved
src/store-api/src/metadata.rs Outdated Show resolved Hide resolved
src/store-api/src/metadata.rs Outdated Show resolved Hide resolved
@@ -458,6 +459,118 @@ impl TableMeta {
Ok(meta_builder)
}

fn modify_columns(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: Add more tests to cover branches of the modify_columns(Maybe in another PR)

Copy link
Member

@WenyXu WenyXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: Let's add the fuzz tests for column type modification(Maybe in another PR).

tests/cases/standalone/common/alter/modify_col.result Outdated Show resolved Hide resolved
@evenyag
Copy link
Contributor

evenyag commented Apr 15, 2024

Since altering column types is a big feature, we might need further discussions and implement it carefully. Could you please describe how you implemented this feature in the issue?

We also need to consider the following things:

  • Conversion rule
    • Do we allow altering a string column to an integer column?
    • Do we allow converting int16 to int8 or uint16?
  • Should we allow altering time index or tag column types? How to support that?
  • How it works?
    • Does it rewrite all SST files?
  • Query performance impact
  • Should other region engines such as metric and file support this?

What's more, I'd suggest implementing this step by step as it also depends on other crates. For example:

  • Support altering column type in the region engine first
  • Support this in the region server, which also modifies the greptime-proto crate
  • Support parsing SQL texts to alter the column type

@KKould
Copy link
Collaborator Author

KKould commented Apr 15, 2024

  • Conversion rule

    • Do we allow altering a string column to an integer column?
    • Do we allow converting int16 to int8 or uint16?
  • Should we allow altering time index or tag column types? How to support that?

  • How it works?

    • Does it rewrite all SST files?
  • Query performance impact

  • Should other region engines such as metric and file support this?

  • Conversion rule

    • Q 1: Do we allow altering a string column to an integer column?
    • A 2: String can theoretically be converted to any type, so it should be possible
    • Q 2: Do we allow converting int16 to int8 or uint16?
    • A 2: I think it should be allowed, specific reference: ConcreteDataType::can_arrow_type_cast_to & VectorOp::cast
  • Q 3: Should we allow altering time index or tag column types? How to support that?

  • A 3: src/table/src/metadata.rs L479 column verification will be performed here, and modification of timesindex and tag is not allowed.

  • How it works?

    • Q 4: Does it rewrite all SST files?
    • A 4: It is definitely not a good idea to completely modify the data, especially for time series databases. I implemented it with reference to add/drop column, and checked whether expect and actual are consistent in compat.rs. When the names are consistent but the types are inconsistent, type conversion will be performed based on IndexOrDefault::Index. Therefore, ModifyColumn does not modify the data, but temporarily performs type conversion when reading, and changes the type to the modified type when the next compaction is triggered to avoid type conversion (this can also be done when modify column is performed multiple times in a short period of time, No additional overhead will be incurred)
/// Index in source batch or a default value to fill a column.
#[derive(Debug)]
enum IndexOrDefault {
    /// Index of the column in source batch.
    Index {
        pos: usize,
        cast_type: Option<ConcreteDataType>,
    },
    /// Default value for the column.
    DefaultValue {
        /// Id of the column.
        column_id: ColumnId,
        /// Default value. The vector has only 1 element.
        default_vector: VectorRef,
    },
}
  • Q 5: Query performance impact?
  • A 5: as I said before, the data needs to be type converted before the next compaction, so there is a column type conversion performance problem when reading.
  • Q 6: Should other region engines such as metric and file support this?
  • A 6: I don't know how to use other engines. hope you can make more suggestions in this regard

@evenyag

@evenyag
Copy link
Contributor

evenyag commented Apr 16, 2024

Q 1: Do we allow altering a string column to an integer column?
A 2: String can theoretically be converted to any type, so it should be possible
Q 2: Do we allow converting int16 to int8 or uint16?
A 2: I think it should be allowed, specific reference: ConcreteDataType::can_arrow_type_cast_to & VectorOp::cast

What about error handling?

  • We can't cast a string like 'hello' to an integer
  • Converting int16 to int8 might overflows
  • arrow::cast may returns an error

Maybe we can only support a subset of conversion rules that arrow implements.

src/table/src/metadata.rs L479 column verification will be performed here, and modification of timesindex and tag is not allowed.

Modifying the data type of the time index might be helpful. e.g. Alter a timestamp(second) to timestamp(millisecond). But we can support it later.

For other engines (file, metric), you could refer to their implementation of the RegionEngine trait. Maybe we should disallow modifying the column type for those engines.

I'm not going to implement ALTER COLUMN in this single PR. We can convert #3517 into a tracking issue. You can summarize your design in that issue and list your steps to implement it. Then we can move the discussion to the issue.

It'd be better if we only modify a single crate in each step. e.g.

  • Add a new variant for modifying a column to the AlterRegionRequest in the store-api
  • Support that variant in mito2
  • Deal with other region engines
  • Add the variant to the gRPC request
  • Support it in the region server
  • Parser
  • Support ALTER COLUMN statement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants