kuzudb · prrao87 · Dec 19, 2024 · Dec 9, 2024 · Dec 9, 2024 · Dec 18, 2024
diff --git a/src/content/docs/extensions/delta.mdx b/src/content/docs/extensions/delta.mdx
@@ -0,0 +1,126 @@
+---
+title: "DELTA extension"
+---
+
+import { Tabs, TabItem } from '@astrojs/starlight/components';
+
+## Usage
+
+The `delta` extension adds support for scanning and copying from the open-source [Delta Lake storage format](https://delta.io/). Using this extension, you can
+interact with DELTA tables using [`LOAD FROM`](/cypher/query-clauses/load-from) and
+[`COPY FROM`](/import/copy-from-query-results), similar to how you would
+with CSV files.
+
+The DELTA functionality is not available by default, so you first need to install and load the DELTA
+extension by running the following commands:
+
+```sql
+INSTALL DELTA;
+LOAD EXTENSION DELTA;
+```
+
+### Example dataset
+
+Let's look at an example dataset to demonstrate how the DELTA extension can be used.
+First, let's use Python (with the `pandas` and `deltalake` packages) to create a DELTA table containing student information and save it in the `/tmp/student` directory:
+```python
+import pandas as pd
+from deltalake import write_deltalake
+
+# Column-oriented sample data: one list per column
+student = {
+    "name": ["Alice", "Bob", "Carol"],
+    "ID": [0, 3, 7]
+}
+
+# Write the DataFrame as a Delta table under /tmp/student
+write_deltalake("/tmp/student", pd.DataFrame.from_dict(student))
+```
+
+In the following sections, we will first scan the DELTA table to query its contents in Cypher, and
+then proceed to copy the data and construct a node table.
+
+### Scan the DELTA table
+`LOAD FROM` is a Cypher clause that scans a file or object element by element, but doesn't actually
+move the data into a Kùzu table.
+
+To scan the DELTA table created above, you can do the following:
+
+```cypher
+LOAD FROM '/tmp/student'(file_format='delta') RETURN *;
+```
+Note: The `file_format` parameter explicitly specifies the format of the given file instead of letting Kùzu detect it at runtime. When scanning a DELTA table, the `file_format` option must be provided because Kùzu cannot detect DELTA tables automatically.
+
+Result:
+```cypher
+kuzu> LOAD FROM '/tmp/student'(file_format='delta') RETURN *;
+┌────────┬───────┐
+│ name   │ ID    │
+│ STRING │ INT64 │
+├────────┼───────┤
+│ Alice  │ 0     │
+│ Bob    │ 3     │
+│ Carol  │ 7     │
+└────────┴───────┘
+```
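+
+Because `LOAD FROM` is an ordinary Cypher clause, its output can be filtered and projected before it is returned. The query below is a minimal sketch that assumes the `WHERE` filtering and column projection available to `LOAD FROM` on other file formats also apply to DELTA tables; the predicate `ID > 0` is only an illustration.
+
+```cypher
+// Scan the DELTA table, keep rows with a positive ID, and return only the names
+LOAD FROM '/tmp/student' (file_format='delta')
+WHERE ID > 0
+RETURN name;
+```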
+
+### Copy the DELTA table into a node table
+You can then use a `COPY FROM` statement to directly copy the contents of the DELTA table into a node table.
+
+```cypher
+CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
+COPY student FROM '/tmp/student' (file_format='delta');
+```
+Note: The `file_format` parameter is also required in the `COPY FROM` statement, for the same reason described in the `LOAD FROM` section above.
+
+Result:
+```cypher
+kuzu> CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
+┌─────────────────────────────────┐
+│ result                          │
+│ STRING                          │
+├─────────────────────────────────┤
+│ Table student has been created. │
+└─────────────────────────────────┘
+
+kuzu> COPY student FROM '/tmp/student' (file_format='delta');
+┌─────────────────────────────────────────────────┐
+│ result                                          │
+│ STRING                                          │
+├─────────────────────────────────────────────────┤
+│ 3 tuples have been copied to the student table. │
+└─────────────────────────────────────────────────┘
+```
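+
+To confirm that the copy succeeded, the node table can be queried like any other Kùzu table. A minimal check, using the `student` table and the column names from the example above:
+
+```cypher
+// Return every student node that was copied from the DELTA table
+MATCH (s:student)
+RETURN s.name, s.ID;
+```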
+
+### Access the DELTA table hosted on S3
+Kùzu also supports scanning and copying a DELTA table hosted on S3 in the same way as from a local file system.
+Before accessing S3, you must configure the connection options listed below using the [CALL](https://kuzudb.com/docusaurus/cypher/configuration) statement.
+
+### Supported options
+
+| Option name | Description |
+|----------|----------|
+| `s3_access_key_id` | S3 access key ID |
+| `s3_secret_access_key` | S3 secret access key |
+| `s3_endpoint` | S3 endpoint |
+| `s3_url_style` | [S3 URL style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) to use (either `vhost` or `path`) |
+| `s3_region` | S3 region |
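+
+As a sketch, the options above might be set as follows before running a query against S3. This assumes each option is assigned with a `CALL <option>=<value>` statement; every value shown is a placeholder, not a real credential or a required setting.
+
+```cypher
+// Placeholder values: substitute your own credentials, endpoint, and region
+CALL s3_access_key_id='<your-access-key-id>';
+CALL s3_secret_access_key='<your-secret-access-key>';
+CALL s3_endpoint='<your-s3-endpoint>';
+CALL s3_url_style='vhost';
+CALL s3_region='<your-region>';
+```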
+
+### Requirements on the S3 server API
+
+| Feature | Required S3 API features |
+|----------|----------|
+| Public file reads | HTTP Range request |
+| Private file reads | Secret key authentication |
+
+### Read DELTA table from S3
+Reading from S3 is as simple as reading from regular files:
+
+```cypher
+LOAD FROM 's3://kuzu-sample/sample-delta' (file_format='delta')
+RETURN *;
+```
+
+### Copy DELTA table hosted on S3 into a local node table
+```cypher
+CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
+COPY student FROM 's3://kuzu-sample/student-delta' (file_format='delta');
+```