v0.7.1 release (#321)
* add docs for running extension tests

* Delta Lake docs (#313)

* Create delta.mdx

* Update delta.mdx

* Update index.mdx

* Update delta.mdx

* Update delta.mdx

* Fixes

---------

Co-authored-by: Prashanth Rao <[email protected]>
Co-authored-by: prrao87 <[email protected]>

* Add Iceberg Extension Documentation (#314)

* add ice_berg docu

* Update src/content/docs/extensions/iceberg.mdx

Co-authored-by: Guodong Jin <[email protected]>

* Update src/content/docs/extensions/iceberg.mdx

Co-authored-by: Guodong Jin <[email protected]>

* restructure

* restructure

* restructure

* update table

* update table

* Apply suggestions from code review

* update table

* Fixes

---------

Co-authored-by: Guodong Jin <[email protected]>
Co-authored-by: Prashanth Rao <[email protected]>
Co-authored-by: prrao87 <[email protected]>

* Fix file extension

* Fix header

* Update sidebar

* Minor fixes

* bump version

---------

Co-authored-by: sterling <[email protected]>
Co-authored-by: ziyi chen <[email protected]>
Co-authored-by: Sterling Shi <[email protected]>
Co-authored-by: Guodong Jin <[email protected]>
5 people authored Dec 20, 2024
1 parent 252c198 commit c0f5ff4
Showing 6 changed files with 448 additions and 18 deletions.
2 changes: 2 additions & 0 deletions astro.config.mjs
@@ -196,6 +196,8 @@ export default defineConfig({
]
},
{ label: 'JSON', link: '/extensions/json' },
{ label: 'Iceberg', link: '/extensions/iceberg', badge: { text: 'New' }},
{ label: 'Delta Lake', link: '/extensions/delta', badge: { text: 'New' }},
],
autogenerate: { directory: 'reference' },
},
45 changes: 40 additions & 5 deletions src/content/docs/developer-guide/testing-framework.md
@@ -15,7 +15,7 @@
you must specify the dataset to be used and other optional
parameters such as `BUFFER_POOL_SIZE`.

:::caution[Note]
Avoid using the character `-` in test file names and case names. In the Google Test Framework, `-` has a special meaning that can inadvertently exclude a test case, leading to the test file being silently skipped. To prevent this issue, our `e2e_test` framework will throw an exception if a test file name contains `-`.
:::

Here is a basic example of a test:
@@ -36,15 +36,18 @@
The first three lines represent the header, separated by `--`. The testing
framework will parse the file and register a [GTest
programmatically](http://google.github.io/googletest/advanced.html#registering-tests-programmatically).
All e2e tests are registered with the prefix `e2e_test_`, which distinguishes them from other internal tests: an e2e test named `BasicTest` is registered as a GTest named `e2e_test_BasicTest`. In terms of the test case name, the example above is therefore equivalent to:

```
TEST_F(basic, e2e_test_BasicTest) {
...
}
```

For the main source code tests, the test group name will be the relative path of the file under the `test/test_files` directory, delimited by `~`, followed by a dot and the test case name.

For the extension code tests, the test group name will be the relative path of the file under the `extension/name_of_extension/test/test_files` directory, delimited by `~`, followed by a dot and the test case name.
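As a sketch, the naming scheme described above can be mimicked in a few lines of Python (`gtest_full_name` is a hypothetical helper for illustration, not part of the framework):

```python
from pathlib import PurePosixPath

def gtest_full_name(relative_path: str, case_name: str) -> str:
    # Join the path components under test_files with '~' (dropping the
    # .test suffix), then append '.' and the test case name.
    group = "~".join(PurePosixPath(relative_path).with_suffix("").parts)
    return f"{group}.{case_name}"

# A case DifferentTypesCheck defined in test/test_files/common/types/interval.test:
print(gtest_full_name("common/types/interval.test", "DifferentTypesCheck"))
# common~types~interval.DifferentTypesCheck
```

The resulting name is exactly what you pass to `ctest -R` to run a single case, as shown in the commands below.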

The testing framework will test each logical plan created from the prepared
statements and assert the result.
Expand Down Expand Up @@ -81,10 +84,29 @@ $ ctest -V -R common~types~interval.DifferentTypesCheck
$ ctest -j 10
```

To switch between the main tests and the extension tests, set the environment variable `E2E_TEST_FILES_DIRECTORY=extension` when calling `ctest`.

Example:

```
# First cd to build/relwithdebinfo/test (after running make extension-test)
$ cd build/relwithdebinfo/test
# Run all the extension tests (-R e2e_test is used to filter the extension tests, as all extension tests are e2e tests)
$ E2E_TEST_FILES_DIRECTORY=extension ctest -R e2e_test
```

:::caution[Note]
Windows uses a different syntax for setting environment variables. To run all extension tests on Windows, run:
```
$ set "E2E_TEST_FILES_DIRECTORY=extension" && ctest -R e2e_test
```
:::
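If you drive `ctest` from a Python script rather than the shell, the same variable can be set portably across POSIX and Windows (a hypothetical sketch, not part of the framework):

```python
import os

def extension_ctest_invocation():
    # Same effect as the shell one-liners above: copy the current
    # environment and add E2E_TEST_FILES_DIRECTORY=extension.
    env = dict(os.environ)
    env["E2E_TEST_FILES_DIRECTORY"] = "extension"
    return ["ctest", "-R", "e2e_test"], env

cmd, env = extension_ctest_invocation()
# Pass these to subprocess.run(cmd, env=env, cwd="build/relwithdebinfo/test")
print(cmd, env["E2E_TEST_FILES_DIRECTORY"])
```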

#### 2. Running directly from the `e2e_test` binary

The test binaries are available in the `build/relwithdebinfo[or debug or release]/test/runner`
folder. To run any of the main tests, run `e2e_test`, specifying the relative path of the file inside
`test_files`:

```
@@ -98,6 +120,19 @@
$ ./e2e_test long_string_pk/long_string_pk.test
$ ./e2e_test .
```

To run any of the extension tests, run `e2e_test` with the environment variable `E2E_TEST_FILES_DIRECTORY=extension` set, specifying the relative path of the file inside
`extension`:
```
# Run all tests inside extension/duckdb
$ E2E_TEST_FILES_DIRECTORY=extension ./e2e_test duckdb
# Run all tests from extension/json/test/copy_to_json.test file
$ E2E_TEST_FILES_DIRECTORY=extension ./e2e_test json/test/copy_to_json.test
# Run all extension tests
$ E2E_TEST_FILES_DIRECTORY=extension ./e2e_test .
```

:::caution[Note]
Some test files contain multiple test cases, and sometimes it is not easy
to find the output from a failed test. In this situation, the flag
145 changes: 145 additions & 0 deletions src/content/docs/extensions/delta.md
@@ -0,0 +1,145 @@
---
title: "Delta Lake"
---

## Usage

The `delta` extension adds support for scanning and copying from the [Delta Lake](https://delta.io/) open-source storage format.
Delta Lake is an open-source storage framework that enables building a format-agnostic Lakehouse architecture.
Using this extension, you can interact with Delta tables from within Kùzu using the `LOAD FROM` and `COPY FROM` clauses.

The Delta functionality is not available by default, so you first need to install the `DELTA`
extension by running the following commands:

```sql
INSTALL DELTA;
LOAD EXTENSION DELTA;
```

### Example dataset

Let's look at an example dataset to demonstrate how the Delta extension can be used.
First, let's create a Delta table containing student information using Python and save it in the `/tmp/student` directory.
Before running the script, make sure the `deltalake` and `pandas` Python packages are installed:
```shell
pip install deltalake pandas
```

```python
# create_delta_table.py
import pandas as pd
from deltalake import DeltaTable, write_deltalake

student = {
"name": ["Alice", "Bob", "Carol"],
"ID": [0, 3, 7]
}

write_deltalake("/tmp/student", pd.DataFrame.from_dict(student))
```

In the following sections, we will first scan the Delta table to query its contents in Cypher, and
then proceed to copy the data and construct a node table.

### Scan the Delta table
`LOAD FROM` is a Cypher clause that scans a file or object element by element, but doesn’t actually
move the data into a Kùzu table.

To scan the Delta table created above, you can do the following:

```cypher
LOAD FROM '/tmp/student' (file_format='delta') RETURN *;
```
```
┌────────┬───────┐
│ name │ ID │
│ STRING │ INT64 │
├────────┼───────┤
│ Alice │ 0 │
│ Bob │ 3 │
│ Carol │ 7 │
└────────┴───────┘
```
:::note[Note]
The `file_format` parameter explicitly specifies the file format of the given file instead of letting Kùzu autodetect it at runtime.
When scanning a Delta table, the `file_format` option must be provided, since Kùzu cannot autodetect Delta tables.
:::

### Copy the Delta table into a node table
You can then use a `COPY FROM` statement to directly copy the contents of the Delta table into a Kùzu node table.

```cypher
CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
COPY student FROM '/tmp/student' (file_format='delta');
```

As with `LOAD FROM`, the `file_format` parameter is also mandatory in the `COPY FROM` clause.

```cypher
// First, create the node table
CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
```
```
┌─────────────────────────────────┐
│ result │
│ STRING │
├─────────────────────────────────┤
│ Table student has been created. │
└─────────────────────────────────┘
```
```cypher
COPY student FROM '/tmp/student' (file_format='delta');
```
```
┌─────────────────────────────────────────────────┐
│ result │
│ STRING │
├─────────────────────────────────────────────────┤
│ 3 tuples have been copied to the student table. │
└─────────────────────────────────────────────────┘
```
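As a quick sanity check (a sketch, assuming the `student` table created above), you can query the copied rows:

```cypher
MATCH (s:student) RETURN s.name, s.ID ORDER BY s.ID;
```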

### Access Delta tables hosted on S3
Kùzu also supports scanning or copying a Delta table hosted on S3 in the same way as from a local file system.
Before reading from S3, you have to configure the connection using the [CALL](https://kuzudb.com/docusaurus/cypher/configuration) statement.

#### Supported options

| Option name | Description |
|----------|----------|
| `s3_access_key_id` | S3 access key id |
| `s3_secret_access_key` | S3 secret access key |
| `s3_endpoint` | S3 endpoint |
| `s3_url_style` | [S3 URL style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) (either `vhost` or `path`) |
| `s3_region` | S3 region |
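For example, the options above can be set as follows before scanning (a sketch: the values are placeholders to replace with your own credentials, and the endpoint and region shown are assumptions for illustration):

```cypher
CALL s3_access_key_id='<your_access_key_id>';
CALL s3_secret_access_key='<your_secret_access_key>';
CALL s3_endpoint='s3.us-east-1.amazonaws.com';
CALL s3_url_style='vhost';
CALL s3_region='us-east-1';
```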

#### Requirements on the S3 server API

| Feature | Required S3 API features |
|----------|----------|
| Public file reads | HTTP Range request |
| Private file reads | Secret key authentication|

#### Scan Delta table from S3
Reading or scanning a Delta table that's on S3 is as simple as reading from regular files:

```cypher
LOAD FROM 's3://kuzu-sample/sample-delta' (file_format='delta')
RETURN *
```

#### Copy Delta table hosted on S3 into a local node table

Copying from Delta tables on S3 is also as simple as copying from regular files:

```cypher
CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
COPY student FROM 's3://kuzu-sample/student-delta' (file_format='delta')
```

## Limitations

When using the Delta Lake extension in Kùzu, keep the following limitation in mind.

- Writing (i.e., exporting to) Delta files from Kùzu is currently not supported.
