
Interactions with S3 buckets (Parquet/Iceberg export, object reads) #321

Closed
alanpaulkwan opened this issue Dec 25, 2024 · 2 comments
alanpaulkwan commented Dec 25, 2024

The architecture many are going for is a data lake without lock-in. One issue with ApeCloud's approach, much as I love DuckDB's internal storage, is its reliance on the DuckDB file format (albeit a very open one).

Thus, it would be great to support Parquet exports as DuckDB does, including writing out to S3 buckets if possible. Later on, as Iceberg becomes easier to implement, Iceberg support as well.

In other words, support

select * from parquet_scan('s3://bucket/file.parquet')

Ideally, even if it deviates from DuckDB's syntax, it would also allow you to specify endpoints with a mnemonic prefix, such as

s3:endpoint_mnemonic://

and the same for exports.

Right now one can FUSE-mount an S3 bucket, but there is still no direct object export.
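
For reference, the closest thing in stock DuckDB today seems to be the secrets manager, which can scope credentials and an endpoint to a URL prefix. A rough sketch of what that looks like (the secret name, bucket, keys, and endpoint below are placeholders, and I haven't checked whether MyDuck exposes this):

CREATE SECRET lake_endpoint (
    TYPE S3,
    KEY_ID 'xxxxxxxxxxxxxxxxx',
    SECRET 'xxxxxxxxxxxxxxxxxxx',
    REGION 'us-east-1',
    ENDPOINT 'my-object-store.example.com',
    SCOPE 's3://lake-bucket/'
);

-- Reads under the scoped prefix pick up that endpoint and those credentials
SELECT * FROM parquet_scan('s3://lake-bucket/file.parquet');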

TianyuZhang1214 (Contributor) commented Dec 26, 2024

@alanpaulkwan
Thanks for your feedback! Let me answer your questions:

  1. Export/Import and Scan Data in Parquet Format:
    We already support this feature, just as DuckDB does. You can connect to MyDuck Server via the Postgres protocol and execute the EXPORT/IMPORT DATABASE commands (a single-table COPY variant is sketched after this list). Here's an example:
SET s3_region='ap-northeast-1';
SET s3_access_key_id='xxxxxxxxxxxxxxxxx';
SET s3_secret_access_key='xxxxxxxxxxxxxxxxxxx';
SET s3_endpoint='s3.ap-northeast-1.amazonaws.com';

-- Export data into Parquet format on S3
EXPORT DATABASE 's3://your-bucket-name/your-path-name/' (FORMAT PARQUET);

-- Import data from directory on S3
IMPORT DATABASE 's3://your-bucket-name/your-path-name/';

-- Read data in Parquet format on S3
SELECT * FROM parquet_scan('s3://your-bucket-name/your-file.parquet');
  2. Export Data in Iceberg Format:
    I'm currently investigating the implementation of #276 (Add S3 support for writing Iceberg-format files), including INSERT/UPDATE/DELETE operations. While append-only writes are relatively straightforward, modifications will take more time. As you mentioned, exporting files in Iceberg format is not particularly complex, so I plan to implement an Iceberg export strategy first. Issue #324 (Export Data into Iceberg Format on Object Storage) has been created for this task.

alanpaulkwan (Author) commented

Ah, my bad. I think I was confused by some things I read in the documentation, and I was working in the MySQL interface, where a lot of this wasn't working. But this is great!
