feat: Support generated/renamed columns in DuckDB #103

Merged
rebasedming merged 2 commits into dev from feat/configurable-columns
Aug 26, 2024

Conversation

rebasedming
Contributor

Ticket(s) Closed

  • Closes #

What

In DuckDB, a query or view can control which columns are selected and can generate new columns:

SELECT col1 as renamed_col, 1 as fixed FROM read_parquet('path/to/file.parquet');

We now expose this to users so they can do the same when creating foreign tables, via the new select option. (I couldn't name it columns because the CSV reader already has a columns option.)

CREATE FOREIGN TABLE trips ()
SERVER parquet_server
OPTIONS (files 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet', select 'vendorid as vendor_id, 2024 as year, 1 as month');

SELECT * FROM trips LIMIT 1;
 vendor_id | year | month
-----------+------+-------
         2 | 2024 |     1
(1 row)
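
For intuition, the foreign table above can be read as roughly equivalent to a DuckDB view whose projection is the select string. This is only a sketch under the assumption that the option's value replaces the default `*` projection over the file reader; the exact statement pg_analytics generates is not shown in this PR.

```sql
-- Sketch only (assumption): the value of the select option is used as the
-- projection over read_parquet, in place of the default `*`.
CREATE VIEW trips AS
SELECT
    vendorid AS vendor_id,  -- renamed column from the Parquet file
    2024     AS year,       -- generated constant column
    1        AS month       -- generated constant column
FROM read_parquet('s3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet');
```

Under that reading, only the columns named in select are visible through the foreign table, which matches the three-column result shown above.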

Why

This makes it possible to add generated columns to foreign tables backed by Parquet files, which can then be used as partition columns (to be documented in a future PR). See #56.

How

Tests

See added test.

@rebasedming rebasedming merged commit 1a7381f into dev Aug 26, 2024
5 checks passed
@rebasedming rebasedming deleted the feat/configurable-columns branch August 26, 2024 17:59
shamb0 pushed a commit to shamb0/pg_analytics that referenced this pull request Aug 29, 2024