Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support altering column's data type #3517

Closed
MichaelScofield opened this issue Mar 14, 2024 · 8 comments
Closed

Support altering column's data type #3517

MichaelScofield opened this issue Mar 14, 2024 · 8 comments

Comments

@MichaelScofield
Copy link
Collaborator

What problem does the new feature solve?

Sometimes user just want to modify the column's data type.

What does the feature do?

Make column's data type changeable via "alter column" stmt. For example, in mysql:

ALTER TABLE t1 MODIFY b INT

(see https://dev.mysql.com/doc/refman/8.0/en/alter-table.html). Or in postgresql:

ALTER TABLE t1 ALTER COLUMN b TYPE varchar(80)

(see https://www.postgresql.org/docs/current/sql-altertable.html).

We might want to go with mysql's "alter column" syntax for the conciseness.

One critical consideration is how to deal with un-convertable data. For example, when you modify the column from string to int, how to deal with the un-convertable data of "a"? In mysql, this results in an error:

mysql> select * from foo;
+------+
| s    |
+------+
| a    |
| 1    |
+------+
2 rows in set (0.01 sec)

mysql> alter table foo modify s int;
ERROR 1366 (HY000): Incorrect integer value: 'a' for column 's' at row 1
mysql> show create table foo;
+-------+----------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                       |
+-------+----------------------------------------------------------------------------------------------------+
| foo   | CREATE TABLE `foo` (
  `s` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

In postgresql, it's error, too:

postgres=# select * from foo;
 s
---
 a
 1
(2 rows)

postgres=# ALTER TABLE foo ALTER COLUMN s TYPE int;
ERROR:  column "s" cannot be cast automatically to type integer
HINT:  You might need to specify "USING s::integer".
postgres=# ALTER TABLE foo ALTER COLUMN s TYPE int using s::integer;
ERROR:  invalid input syntax for type integer: "a"
postgres=# \d foo
               Table "public.foo"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 s      | text |           |          |

(Interestingly, postgresql has a "hint" to let user specify the data convention method. We might want to do the same.)

So it's clear that we'd better follow their steps and return some errors complaining some data are just not convertable to the target type.

Implementation challenges

The performance is clearly the most challenging part. If we actually change the data's type in the underlying storage, we might need to rewrite all the affected ssts, which can be a lot of work. If we change the reader (so that the data is converted on the fly), it brings extra query cost. Neither is good.

@KKould
Copy link
Collaborator

KKould commented Apr 18, 2024

Step for modify column

How it works?: #3701 (comment)

There is still one issue that has not yet been resolved:

When modify column, it need to read all the data in this column of the table and check whether each piece of data can be converted e.g. String is type-convertible to Integer, but hello cannot be converted to Integer.

How do I implement it in src/store-api/src/region_request.rs ModifyColumn::validate(I think it's best handled here)?

@evenyag
Copy link
Contributor

evenyag commented Apr 18, 2024

Validating whether all data are valid is too expensive. We have some other options if we can't cast the value

  • fill by NULL (if the column is nullable)
  • fill by a default value

Maybe we can specify which value to fill in the ALTER statement.

@KKould
Copy link
Collaborator

KKould commented Apr 18, 2024

Maybe we can specify which value to fill in the ALTER statement.

I love this idea and will implement it when PR: apache/datafusion-sqlparser-rs#1216 is merged if you think my steps are feasible

@evenyag
Copy link
Contributor

evenyag commented Apr 18, 2024

We can build this feature in a bottom-up pattern. Let's modify the CompatReader first. Then we can implement altering column type in the mito2 engine and other engines. The parser support should be the last step. This ensures that we don't leak an unimplemented feature to users.

We can also set up a simple rule.

  • fill NULL if the column is nullable
  • fill default if the column is not null

PostgreSQL's syntax is enough so I don't think we must support MySQL's alter syntax.

@KKould
Copy link
Collaborator

KKould commented Apr 18, 2024

Got it, then I will implement it in the reverse order of steps

@WenyXu
Copy link
Member

WenyXu commented Apr 24, 2024

Hi @KKould, We support flushing and compacting a table via SQL. Would you like to add a fuzz test for column type alteration after #3796?

@evenyag
Copy link
Contributor

evenyag commented Apr 24, 2024

Hi @KKould, We support flushing and compacting a table via SQL. Would you like to add a fuzz test for column type alteration after #3796?

We might also add some tests in this file. For example: insert data, change column type, insert data, scan

https://github.com/GreptimeTeam/greptimedb/blob/main/src/mito2/src/engine/alter_test.rs

@KKould
Copy link
Collaborator

KKould commented Apr 29, 2024

Hi @KKould, We support flushing and compacting a table via SQL. Would you like to add a fuzz test for column type alteration after #3796?

yep, after #3796 merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants