Commit 257e2d4 by dzlab, Oct 10, 2024: added sections for ast and relation algebra steps (`_posts/2024-07-17-sql-processor-calcite.md`)

In a [previous article]({{ "database/2024/07/06/apache-calcite/" | absolute_url }}), we saw how to create an Adapter for Apache Calcite and then how to run SQL queries against a random data source. In this article we will see, [step by step](https://github.com/zabetak/slides/blob/master/2021/boss-workshop/apache-calcite-tutorial.pdf), how to use Apache Calcite to implement a SQL processor that parses an input query, validates it, and then executes it.

As an example, we will use the following simple `JOIN` query between two tables, `customer` and `orders`.

```sql
SELECT `C_NAME`, `O_ORDERKEY`, `O_ORDERDATE`
Expand All @@ -24,71 +24,98 @@ WHERE `CUSTOMER`.`c_custkey` < 3
ORDER BY `C_NAME`, `O_ORDERKEY`
```

## Catalog
We need to build the catalog of metadata for Calcite to be able to resolve the tables and columns referenced in the query.

First, we need to create the root schema and type factory:

```java
// Create the root schema and type factory
CalciteSchema schema = CalciteSchema.createRootSchema(false);
RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
```

Then, create the metadata of the two tables (columns and data types) and register them with the root schema:

```java
RelDataTypeFactory.Builder builder1 = typeFactory.builder();
builder1.add("c_custkey", typeFactory.createJavaType(Integer.class).getSqlTypeName());
builder1.add("c_name", typeFactory.createJavaType(String.class).getSqlTypeName());
schema.add("customer", new MyTable(builder1.build(), ...){...});

RelDataTypeFactory.Builder builder2 = typeFactory.builder();
builder2.add("o_orderkey", typeFactory.createJavaType(Integer.class).getSqlTypeName());
builder2.add("o_custkey", typeFactory.createJavaType(Integer.class).getSqlTypeName());
builder2.add("o_orderdate", typeFactory.createJavaType(Date.class).getSqlTypeName());
schema.add("orders", new MyTable(builder2.build(), ...){...});
```

> Note: `MyTable` is a placeholder for the actual class that accesses the data and implements Calcite's `Table` / `ScannableTable` interfaces.
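
For illustration, here is one way such a table class could look. This minimal sketch (the class name, constructor, and in-memory backing are assumptions for this example, not part of Calcite) implements `ScannableTable` over a list of rows:

```java
import java.util.List;

import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;

// Hypothetical minimal table: row type comes from the builder,
// data comes from an in-memory list of rows.
public class MyTable extends AbstractTable implements ScannableTable {
  private final RelDataType rowType;
  private final List<Object[]> rows;

  public MyTable(RelDataType rowType, List<Object[]> rows) {
    this.rowType = rowType;
    this.rows = rows;
  }

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return rowType;
  }

  @Override public Enumerable<Object[]> scan(DataContext root) {
    return Linq4j.asEnumerable(rows);
  }
}
```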

After that, configure and instantiate a catalog reader that Calcite can use to access the metadata:

```java
CalciteConnectionConfig readerConfig = CalciteConnectionConfig.DEFAULT
.set(CalciteConnectionProperty.CASE_SENSITIVE, "false");
CalciteCatalogReader catalogReader = new CalciteCatalogReader(schema, Collections.emptyList(), typeFactory, readerConfig);
```

> Note: we set case-sensitivity to false so that it is OK to use all-uppercase table and column names.

## Query to AST
To parse the text query into an Abstract Syntax Tree (AST), we first create a SQL parser:

```java
SqlParser parser = SqlParser.create(sqlQuery);
```

Then, we can use it to parse the query into an AST as follows:

```java
SqlNode parseAst = parser.parseQuery();
System.out.println("[Parsed query]");
System.out.println(parseAst.toString());
```

We can get back the original query from the AST with `parseAst.toString()`.


Once we have the AST, we can validate it against the catalog.
First, create a SQL validator using the standard operator table and default configuration.

```java
SqlValidator sqlValidator = SqlValidatorUtil.newValidator(SqlStdOperatorTable.instance(),
catalogReader, typeFactory, SqlValidator.Config.DEFAULT);
```

Now we can validate the initial AST:

```java
SqlNode validAst = sqlValidator.validate(parseAst);
System.out.println("[Validated query]");
System.out.println(validAst.toString());
```

Similarly, we can get back the original query from the validated AST with `validAst.toString()`.
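
As a side note, if the SQL text is needed in a specific dialect rather than Calcite's default, a `SqlNode` can unparse itself. A small sketch (the choice of `AnsiSqlDialect` here is an assumption for illustration):

```java
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.AnsiSqlDialect;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.util.SqlString;

public class UnparseExample {
  public static void main(String[] args) throws Exception {
    SqlNode ast = SqlParser.create("SELECT 1").parseQuery();
    // Unparse the AST back to SQL text using an explicit dialect.
    SqlString sqlString = ast.toSqlString(AnsiSqlDialect.DEFAULT);
    System.out.println(sqlString.getSql());
  }
}
```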

## AST to Logical plan
Query optimization cannot be applied directly to an AST; the latter must first be converted into a relational algebra expression.


First, create the optimization cluster that maintains planning information:

```java
RelOptPlanner planner = new VolcanoPlanner();
planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
RelOptCluster cluster = RelOptCluster.create(planner, new RexBuilder(typeFactory));
```

Then, configure and instantiate an AST-to-Logical-plan converter with the default configuration, standard expression normalization, and no view expansion (the `NOOP_EXPANDER`):

```java
RelOptTable.ViewExpander NOOP_EXPANDER = (type, query, schema, path) -> null;
SqlToRelConverter sqlToRelConverter = new SqlToRelConverter(NOOP_EXPANDER,
sqlValidator, catalogReader, cluster,
StandardConvertletTable.INSTANCE,
SqlToRelConverter.config());
```

Now, we can convert the validated AST into a logical plan and print it to standard output:
```java
RelNode logPlan = sqlToRelConverter.convertQuery(validAst, false, true).rel;
System.out.println(
  RelOptUtil.dumpPlan("[Logical plan]", logPlan, SqlExplainFormat.TEXT, SqlExplainLevel.EXPPLAN_ATTRIBUTES)
);
```

We should see a **Logical plan** that looks like this:

```
LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
  LogicalProject(C_NAME=[$1], O_ORDERKEY=[$8], O_ORDERDATE=[$12])
    LogicalFilter(condition=[<($0, 3)])
      LogicalJoin(condition=[=($0, $9)], joinType=[inner])
        LogicalTableScan(table=[[CUSTOMER]])
        LogicalTableScan(table=[[ORDERS]])
```

## Logical to Physical plan
We need to optimize the Logical Plan and convert it to a plan that can be executed by the underlying storage system.

First, initialize optimizer/planner with the necessary rules that will be used to transform the Logical Plan:

```java
RelOptPlanner planner = cluster.getPlanner();
planner.addRule(CoreRules.FILTER_TO_CALC);
Expand All @@ -120,23 +149,30 @@ planner.addRule(EnumerableRules.ENUMERABLE_CALC_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE);
```

Next, define the type of the output plan; in this case we want a physical plan in the `EnumerableConvention`:

```java
logPlan = planner.changeTraits(logPlan, logPlan.getTraitSet().replace(EnumerableConvention.INSTANCE));
planner.setRoot(logPlan);
```

Start the optimization process to obtain the most efficient physical plan based on the provided rule set:

```java
EnumerableRel phyPlan = (EnumerableRel) planner.findBestExp();
```

We can then display the **Physical plan**:

```java
System.out.println(
RelOptUtil.dumpPlan("[Physical plan]", phyPlan, SqlExplainFormat.TEXT, SqlExplainLevel.EXPPLAN_ATTRIBUTES)
);
```

Which will give us something like this:

```
EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
EnumerableCalc(expr#0..16=[{inputs}], C_NAME=[$t1], O_ORDERKEY=[$t8], O_ORDERDATE=[$t12])
    EnumerableHashJoin(condition=[=($0, $9)], joinType=[inner])
      EnumerableCalc(expr#0..8=[{inputs}], expr#9=[3], expr#10=[<($t0, $t9)], proj#0..8=[{exprs}], $condition=[$t10])
        EnumerableTableScan(table=[[CUSTOMER]])
      EnumerableTableScan(table=[[ORDERS]])
```

Or generate a [Graphviz graph](https://graphviz.org/), which would look like this:

![Physical Plan](/assets/2024/07/20240717-physical_plan.svg)
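
A sketch of how such a graph can be produced, assuming the `phyPlan` variable from the previous step: `RelOptUtil.dumpPlan` also supports `SqlExplainFormat.DOT`, whose output can then be rendered with Graphviz's `dot` tool:

```java
// Dump the physical plan in Graphviz DOT format instead of plain text.
String dot = RelOptUtil.dumpPlan("", phyPlan, SqlExplainFormat.DOT, SqlExplainLevel.EXPPLAN_ATTRIBUTES);
java.nio.file.Files.writeString(java.nio.file.Paths.get("physical_plan.dot"), dot);
// Render with: dot -Tsvg physical_plan.dot -o physical_plan.svg
```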

## Physical to Executable plan
The physical plan can now be compiled and executed. At runtime, Calcite needs a `DataContext` to access the schema; a simple schema-only implementation is enough here:
```java
/**
* A simple data context only with schema information.
*/
private static final class SchemaOnlyDataContext implements DataContext {
private final SchemaPlus schema;

SchemaOnlyDataContext(CalciteSchema calciteSchema) {
this.schema = calciteSchema.plus();
}

@Override public SchemaPlus getRootSchema() {
return schema;
}

@Override public JavaTypeFactory getTypeFactory() {
return new JavaTypeFactoryImpl();
}

@Override public QueryProvider getQueryProvider() {
return null;
}

@Override public Object get(final String name) {
return null;
}
}
```

Finally, we compile the generated code and obtain the executable program using Calcite's `EnumerableInterpretable`, then run it with an instance of the `SchemaOnlyDataContext` defined above:

```java
Bindable executablePlan = EnumerableInterpretable.toBindable(
    new HashMap<>(), null, phyPlan, EnumerableRel.Prefer.ARRAY);
for (Object row : executablePlan.bind(new SchemaOnlyDataContext(schema))) {
  System.out.println(row);
}
```
