Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi-level partition tables #56

Open
pdpark opened this issue Aug 7, 2024 · 2 comments
Open

Support multi-level partition tables #56

pdpark opened this issue Aug 7, 2024 · 2 comments
Labels
feature New feature or request good first issue Good for newcomers priority-low Low priority issue

Comments

@pdpark
Copy link

pdpark commented Aug 7, 2024

What feature are you requesting?

Support for foreign tables in a multi-level partitioned table setup (see example).

Why are you requesting this feature?

When you have a lot of large files in one object store key/directory, scanning all of the files can be prohibitively time consuming. Changing the storage structure to support hive keys may not always be an option.

What is your proposed implementation for this feature?

This is an example of a multi-level partition table, but in many cases only the first partition level would be needed.

Example:

A “root” table that will have two partition “levels”:

CREATE TABLE my_table(
  <column definitions>
) PARTITION BY list(id_1);

The table for the first partition level:

create table if not exists my_table_123
partition of my_table
for values in ('123') partition by list(id_2);

Many partition tables may be created at this level - one for each id_1 value in the root table.

Note: If these first two tables have to be created as foreign tables in order for the last partition level table to be a foreign table, they shouldn’t require a files option.

The table for the next partition level looks like this:

create foreign table if not exists my_table_123_abc
partition of my_table_123
for values in ('abc')
options (files 's3://<bucket_name>/.../<filename>.parquet');

Many partition tables may be created at this level as well - one for each id_2 value in the first partition level tables.

Note: when creating this final “leaf” partition table as a foreign table, a files option should be required.

So, when querying the root table with a where clause like this:

select *
from my_table
where id_1 = '123'
and   id_2 = 'abc'

...only one parquet file should be scanned - the one specified by the files option of the “leaf” partition table. Or, if a glob pattern is used in the files option of the leaf partition table, only the (ideally) few number of matching files will be scanned.

Full Name:

Patrick Park

Affiliation:

Payzer

@philippemnoel philippemnoel added the good first issue Good for newcomers label Aug 7, 2024
@philippemnoel philippemnoel transferred this issue from paradedb/paradedb Aug 7, 2024
@philippemnoel philippemnoel transferred this issue from paradedb/pg_analytics Aug 8, 2024
@philippemnoel philippemnoel transferred this issue from paradedb/paradedb Aug 8, 2024
@philippemnoel philippemnoel added feature New feature or request priority-high High priority issue labels Aug 8, 2024
@shamb0
Copy link

shamb0 commented Aug 15, 2024

Hi @philippemnoel , I am interested to work on this feature, could you please assign it to me

@philippemnoel
Copy link
Collaborator

Hi @philippemnoel , I am interested to work on this feature, could you please assign it to me

Absolutely, it is yours! Thank you for your work :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request good first issue Good for newcomers priority-low Low priority issue
Projects
None yet
3 participants