feat(iceberg): support iceberg engine connection #20298

chenzl25 · 2025-01-24T09:28:13Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Related: Tracking: Iceberg engine table #19418
Introduce a session variable iceberg_engine_connection that lets users provide their own bucket for the Iceberg engine via an Iceberg connection. Currently, only warehouse information can be configured in the Iceberg engine connection; the Iceberg catalog is still managed by us in the meta SQL backend. This makes it much easier to share Iceberg tables with users, since the underlying warehouse is managed by the users themselves and they have better control over the warehouse credentials.
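The e2e test in this PR sets the variable to a schema-qualified name ('public.my_conn'). The sketch below shows one way such a value could be split into an optional schema and a connection name before looking the connection up. It assumes a hypothetical helper, parse_connection_name, and is not the PR's actual resolution code.

```rust
/// Hypothetical helper: splits an `iceberg_engine_connection` value such as
/// `public.my_conn` into an optional schema name and a connection name.
fn parse_connection_name(value: &str) -> Result<(Option<&str>, &str), String> {
    let value = value.trim();
    if value.is_empty() {
        return Err("iceberg_engine_connection is empty".to_owned());
    }
    match value.split_once('.') {
        // `schema.connection`: both parts must be non-empty, and the name
        // must not contain a further dot.
        Some((schema, name)) => {
            if schema.is_empty() || name.is_empty() || name.contains('.') {
                Err(format!("invalid connection name: `{value}`"))
            } else {
                Ok((Some(schema), name))
            }
        }
        // Bare name: assumed to be resolved against the session's search path.
        None => Ok((None, value)),
    }
}
```

Returning the schema as an Option keeps the "bare name" case explicit, so the caller decides how to resolve it.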

Checklist

I have written necessary rustdoc comments.
I have added necessary unit tests and integration tests.
I have added test labels as necessary.
I have added fuzzing tests or opened an issue to track them.
My PR contains breaking changes.
My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

My PR needs documentation updates.

Release note

xxchan · 2025-01-24T09:55:04Z

e2e_test/iceberg/test_case/iceberg_engine.slt

+);
+
+statement ok
+set iceberg_engine_connection = 'public.my_conn';


Why don't we use the connection = conn syntax instead of a session variable? 🤔

I'm looking for a way to let users set a default behavior for a whole database. Adding syntax would also work; however, our Iceberg table can be used together with a connector, which can also have its own connection. That might be confusing for users.

Oh, that's indeed a problem... So the WITH clause could apply to both the connector and the table.

crunchybridge https://arc.net/l/quote/vdcxffua
snowflake https://arc.net/l/quote/fomfghdw
Both of them allow setting such a variable at the database level.

The term connection in iceberg_engine_connection seems a little vague here, since it's not clear whether it's for the catalog or the volume. What about iceberg_engine_volume...?

Oh, we will allow users to specify both the volume and the catalog (or either one of them) in the same connection, right?

Yes. An Iceberg connection can specify the catalog and the bucket together, although for the Iceberg engine we only allow specifying the bucket.
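Because the engine keeps its catalog in the meta SQL backend, an engine connection that also carries catalog settings could be rejected explicitly. A minimal sketch, assuming catalog options share a catalog. key prefix (an assumption for illustration) and a hypothetical function name:

```rust
use std::collections::BTreeMap;

/// Hypothetical check: an iceberg engine connection may configure the
/// warehouse (bucket) but not the catalog. The `catalog.` prefix is assumed.
fn reject_catalog_properties(props: &BTreeMap<String, String>) -> Result<(), String> {
    // Collect every catalog-related key so the error lists all offenders.
    let catalog_keys: Vec<&str> = props
        .keys()
        .filter(|k| k.starts_with("catalog."))
        .map(|k| k.as_str())
        .collect();
    if catalog_keys.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "iceberg engine connection only supports warehouse settings; remove: {}",
            catalog_keys.join(", ")
        ))
    }
}
```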

gru-agent · 2025-01-24T10:04:01Z

This pull request has been modified. If you want me to regenerate unit test for any of the files related, please find the file in "Files Changed" tab and add a comment @gru-agent. (The github "Comment on this file" feature is in the upper right corner of each file in "Files Changed" tab.)

xxchan · 2025-01-28T08:26:05Z

src/frontend/src/catalog/root_catalog.rs

+    pub fn get_secret_by_id(
+        &self,
+        db_name: &str,
+        secret_id: u32,
+    ) -> CatalogResult<&Arc<SecretCatalog>> {
+        let secret_id = SecretId::new(secret_id);
+        for schema in self.get_database_by_name(db_name)?.iter_schemas() {
+            if let Some(secret) = schema.get_secret_by_id(&secret_id) {
+                return Ok(secret);
+            }
+        }
+        Err(CatalogError::NotFound("secret", secret_id.to_string()))
+    }
+


This seems unused

xxchan · 2025-01-28T08:42:52Z

e2e_test/iceberg/test_case/iceberg_engine.slt

+);
+
+statement ok
+set iceberg_engine_connection = 'public.my_conn';


The term connection in iceberg_engine_connection seems a little vague here, since it's not clear whether it's for the catalog or the volume. What about iceberg_engine_volume...?

xxchan · 2025-01-28T08:44:51Z

e2e_test/iceberg/test_case/iceberg_engine.slt

+);
+
+statement ok
+set iceberg_engine_connection = 'public.my_conn';


Oh, we will allow users to specify both the volume and the catalog (or either one of them) in the same connection, right?

xxchan · 2025-01-28T08:48:42Z

src/frontend/src/handler/create_table.rs

-    with.insert("warehouse.path".to_owned(), warehouse_path.clone());
+    if let Some(warehouse_path) = warehouse_path.clone() {
+        with.insert("warehouse.path".to_owned(), warehouse_path.clone());
+    }


The sink and source options are basically the same. It would be better to have a with_common map shared by with_source and with_sink.
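A minimal sketch of the suggested refactor: build the shared options once, then derive the source and sink maps from them. The function name and the sink-only type = upsert entry are assumptions for illustration; warehouse.path, s3.region and s3.endpoint come from the diff.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: options shared by the iceberg source and sink are
/// built once in `with_common`, then each side adds only what differs.
fn build_with_options(
    warehouse_path: Option<&str>,
    s3_region: &str,
    s3_endpoint: Option<&str>,
) -> (BTreeMap<String, String>, BTreeMap<String, String>) {
    let mut with_common = BTreeMap::new();
    with_common.insert("connector".to_owned(), "iceberg".to_owned());
    with_common.insert("s3.region".to_owned(), s3_region.to_owned());
    // Optional settings are inserted only when present, mirroring the
    // `if let Some(warehouse_path)` change in the diff.
    if let Some(path) = warehouse_path {
        with_common.insert("warehouse.path".to_owned(), path.to_owned());
    }
    if let Some(endpoint) = s3_endpoint {
        with_common.insert("s3.endpoint".to_owned(), endpoint.to_owned());
    }

    let with_source = with_common.clone();
    let mut with_sink = with_common;
    // Sink-only option (assumed key/value for illustration).
    with_sink.insert("type".to_owned(), "upsert".to_owned());
    (with_source, with_sink)
}
```

Any new shared option then only needs to be added in one place.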

xxchan · 2025-01-28T08:52:01Z

src/frontend/src/handler/create_table.rs

+                let _s3_region = params
+                    .properties
+                    .get("s3.region")
+                    .ok_or_else(|| anyhow!("`s3.region` must be set in iceberg engine connection"))?
+                    .to_owned();
+                let _s3_endpoint = params.properties.get("s3.endpoint").map(|s| s.to_owned());
+                let _warehouse_path = params
+                    .properties
+                    .get("warehouse.path")
+                    .map(|s| s.to_owned())
+                    .ok_or_else(|| {
+                        anyhow!("`warehouse.path` must be set in iceberg engine connection")
+                    })?;


Validating properties here isn't very elegant or user-friendly. Could we validate them when the connection is created?
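A minimal sketch of validating once at CREATE CONNECTION time, so a misconfigured connection is rejected up front rather than when a table is created. The required keys come from the checks in the diff above; the function name is hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical validation run when the connection is created.
/// Reports every missing required property at once.
fn validate_iceberg_engine_connection(props: &BTreeMap<String, String>) -> Result<(), String> {
    let missing: Vec<&str> = ["s3.region", "warehouse.path"]
        .iter()
        .copied()
        .filter(|key| !props.contains_key(*key))
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing required properties: {}", missing.join(", ")))
    }
}
```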

chenzl25 added 2 commits January 24, 2025 16:04

first version

cd77184

second version

4550012

github-actions bot added the type/feature label Jan 24, 2025

chenzl25 requested review from fuyufjh, hzxa21, xxchan, Li0k and BugenZhao January 24, 2025 09:28

xxchan reviewed Jan 24, 2025


fmt

0db7172

chenzl25 added ci/run-e2e-iceberg-engine-tests ci/run-e2e-iceberg-sink-v2-tests labels Jan 24, 2025

chenzl25 added 2 commits January 26, 2025 12:23

fmt

d5a3c50

fmt

144af50

chenzl25 requested a review from wenym1 January 26, 2025 09:51

fix

23d0584

chenzl25 mentioned this pull request Jan 26, 2025

feat(iceberg): support emr iceberg compaction for the external bucket #20315

Open

8 tasks

xxchan reviewed Jan 28, 2025



