You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Support for foreign tables in a multi-level partitioned table setup (see example).
Why are you requesting this feature?
When you have a lot of large files in one object store key/directory, scanning all of the files can be prohibitively time consuming. Changing the storage structure to support hive keys may not always be an option.
What is your proposed implementation for this feature?
This is an example of a multi-level partition table, but in many cases only the first partition level would be needed.
Example:
A “root” table that will have two partition “levels”:
CREATE TABLE my_table(
<column definitions>
) PARTITION BY list(id_1);
The table for the first partition level:
create table if not exists my_table_123
partition of my_table
for values in ('123') partition by list(id_2);
Many partition tables may be created at this level - one for each id_1 value in the root table.
Note: If these first two tables have to be created as foreign tables in order for the last partition level table to be a foreign table, they shouldn’t require a files option.
The table for the next partition level looks like this:
create foreign table if not exists my_table_123_abc
partition of my_table_123
for values in ('abc')
options (files 's3://<bucket_name>/.../<filename>.parquet');
Many partition tables may be created at this level as well - one for each id_2 value in the first partition level tables.
Note: when creating this final “leaf” partition table as a foreign table, a files option should be required.
So, when querying the root table with a where clause like this:
select *
from my_table
where id_1 = '123'
and id_2 = 'abc'
...only one parquet file should be scanned - the one specified by the files option of the “leaf” partition table. Or, if a glob pattern is used in the files option of the leaf partition table, only the (ideally) few number of matching files will be scanned.
Full Name:
Patrick Park
Affiliation:
Payzer
The text was updated successfully, but these errors were encountered:
What feature are you requesting?
Support for foreign tables in a multi-level partitioned table setup (see example).
Why are you requesting this feature?
When you have a lot of large files in one object store key/directory, scanning all of the files can be prohibitively time consuming. Changing the storage structure to support hive keys may not always be an option.
What is your proposed implementation for this feature?
This is an example of a multi-level partition table, but in many cases only the first partition level would be needed.
Example:
A “root” table that will have two partition “levels”:
The table for the first partition level:
Many partition tables may be created at this level - one for each
id_1
value in the root table.Note: If these first two tables have to be created as foreign tables in order for the last partition level table to be a foreign table, they shouldn’t require a files option.
The table for the next partition level looks like this:
Many partition tables may be created at this level as well - one for each
id_2
value in the first partition level tables.Note: when creating this final “leaf” partition table as a foreign table, a
files
option should be required.So, when querying the root table with a where clause like this:
...only one parquet file should be scanned - the one specified by the files option of the “leaf” partition table. Or, if a glob pattern is used in the
files
option of the leaf partition table, only the (ideally) few number of matching files will be scanned.Full Name:
Patrick Park
Affiliation:
Payzer
The text was updated successfully, but these errors were encountered: