The architecture many are going for is a data lake without lock-in. One issue with Apecloud is its reliance on the DuckDB format (albeit a very open one), much as I love DuckDB's internal storage.
Thus, it would be great to support Parquet exports as DuckDB does, and writing out to S3 buckets if possible. Later on, as Iceberg becomes easier to implement, Iceberg as well.
In other words, support
select * from parquet_scan('s3://bucket/file.parquet')
Ideally, deviating from DuckDB's syntax, it would also allow you to specify endpoints like
s3:endpoint_mnemonic://
And the same for exports.
Right now one can FUSE-mount an S3 bucket, but there is still no way to export objects.
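To make the proposal concrete, here is a minimal sketch of how a mnemonic-prefixed URL such as `s3:myminio://bucket/file.parquet` could be resolved into an endpoint plus a plain `s3://` URL. The `s3:<mnemonic>://` syntax is only a proposal, and the registry entries below are invented examples:

```python
import re

# Hypothetical registry mapping endpoint mnemonics to hosts (assumed values).
ENDPOINTS = {
    "myminio": "minio.internal:9000",
    "awsapne1": "s3.ap-northeast-1.amazonaws.com",
}

def resolve(url: str) -> tuple[str, str]:
    """Split 's3:<mnemonic>://path' into (endpoint, 's3://path').

    A plain 's3://...' URL passes through unchanged with an empty
    endpoint, meaning "use the default configured endpoint".
    """
    m = re.match(r"s3:(\w+)://(.*)", url)
    if not m:
        return ("", url)
    mnemonic, path = m.groups()
    return (ENDPOINTS[mnemonic], f"s3://{path}")

endpoint, plain = resolve("s3:myminio://bucket/file.parquet")
# endpoint == "minio.internal:9000", plain == "s3://bucket/file.parquet"
```

The server could then issue the equivalent of `SET s3_endpoint=...` before running the rewritten query, so each query can target a different object store.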
@alanpaulkwan
Thanks for your feedback! Let me answer your questions:
Export/Import and Scan Data in Parquet Format:
We already support this feature, as DuckDB does. You can connect to MyDuck Server via the Postgres protocol and execute the EXPORT/IMPORT DATABASE command. Here’s an example:
SET s3_region='ap-northeast-1';
SET s3_access_key_id='xxxxxxxxxxxxxxxxx';
SET s3_secret_access_key='xxxxxxxxxxxxxxxxxxx';
SET s3_endpoint='s3.ap-northeast-1.amazonaws.com';
-- Export data into Parquet format on S3
EXPORT DATABASE 's3://your-bucket-name/your-path-name/' (FORMAT PARQUET);
-- Import data from directory on S3
IMPORT DATABASE 's3://your-bucket-name/your-path-name/';
-- Read data in Parquet format on S3
SELECT * FROM parquet_scan('s3://your-bucket-name/your-file.parquet');
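Since these statements are plain SQL sent over the Postgres protocol, they can also be generated programmatically before being executed by any Postgres client. A small sketch (the helper name and config keys are illustrative, not part of MyDuck Server):

```python
def s3_config_sql(cfg: dict) -> list[str]:
    """Build DuckDB-style SET statements for S3 settings from a dict.

    Keys map onto DuckDB's s3_* settings, e.g. 'region' -> 's3_region'.
    Values here would be placeholders, never real credentials in source.
    """
    return [f"SET s3_{key}='{value}';" for key, value in cfg.items()]

stmts = s3_config_sql({
    "region": "ap-northeast-1",
    "endpoint": "s3.ap-northeast-1.amazonaws.com",
})
# → ["SET s3_region='ap-northeast-1';",
#    "SET s3_endpoint='s3.ap-northeast-1.amazonaws.com';"]
```

Each generated statement would then be executed on the connection before the EXPORT/IMPORT command, exactly as in the manual example above.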
Export Data in Iceberg Format:
I’m currently investigating the implementation of #276 (Add S3 support for writing Iceberg-format files), including INSERT/UPDATE/DELETE operations. While append-only is relatively straightforward, modifications require more time. As you mentioned, exporting files in Iceberg format is not particularly complex, so I plan to implement a strategy to export data in Iceberg format first. Issue #324 (Export Data into Iceberg Format on Object Storage) has been created for this task.
Ah, my bad. I think I was confused because of some things I read in the documentation, and I was operating in the MySQL interface, where a lot of this wasn't working. But it's great!