diff --git a/docs/en/quickstart/sdk/java_sdk.md b/docs/en/quickstart/sdk/java_sdk.md index a74f4c98f3c..489dea47282 100644 --- a/docs/en/quickstart/sdk/java_sdk.md +++ b/docs/en/quickstart/sdk/java_sdk.md @@ -1,8 +1,10 @@ # Java SDK -## Java SDK package installation +In Java SDK, the default execution mode for JDBC Statements is online, while the default execution mode for SqlClusterExecutor is offline. Please keep this in mind. -- Installing Java SDK package on Linux +## Java SDK Installation + +- Install Java SDK on Linux Configure the maven pom: @@ -19,7 +21,7 @@ ``` -- Installing Java SDK package on Mac +- Install Java SDK on Mac Configure the maven pom @@ -36,16 +38,14 @@ ``` -Note: Since the openmldb-native package contains the C++ static library compiled for OpenMLDB, it is defaults to the Linux static library. For macOS, the version of openmldb-native should be changed to `0.8.3-macos`, while the version of openmldb-jdbc should remain unchanged. - -The macOS version of openmldb-native only supports macOS 12. To run it on macOS 11 or macOS 10.15, the openmldb-native package needs to be compiled from source code on the corresponding OS. For detailed compilation methods, please refer to [Concurrent Compilation of Java SDK](https://openmldb.ai/docs/zh/main/deploy/compile.html#java-sdk). - -To connect to the OpenMLDB service using the Java SDK, you can use JDBC (recommended) or connect directly through SqlClusterExecutor. The following will demonstrate both connection methods in order. +Note: Since the openmldb-native package contains the C++ static library compiled for OpenMLDB, it defaults to the Linux static library. For macOS, the version of openmldb-native should be changed to `0.8.3-macos`, while the version of openmldb-jdbc remains unchanged. -## JDBC method +The macOS version of openmldb-native only supports macOS 12. To run it on macOS 11 or macOS 10.15, the openmldb-native package needs to be compiled from the source code on the corresponding OS. For detailed compilation methods, please refer to [Java SDK](../../deploy/compile.md#Build-java-sdk-with-multi-processes). +When using a self-compiled openmldb-native package, it is recommended to install it into your local Maven repository using `mvn install`. After that, you can reference it in your project's pom.xml file. It's not advisable to reference it using `scope=system`. -The connection method using JDBC is as follows: +To connect to the OpenMLDB service using the Java SDK, you can use JDBC (recommended) or connect directly through SqlClusterExecutor. The following will demonstrate both connection methods. +## Connection with JDBC ```java Class.forName("com._4paradigm.openmldb.jdbc.SQLDriver"); // No database in jdbcUrl @@ -58,10 +58,10 @@ Connection connection1 = DriverManager.getConnection("jdbc:openmldb:///test_db?z The database specified in the Connection address must exist when creating the connection. ```{caution} -he default execution mode for JDBC Connection is `online`. +The default execution mode for JDBC Connection is `online`. ``` -### Usage overview +### Statement All SQL commands can be executed using `Statement`, both in online and offline modes. To switch between offline and online modes, use command `SET @@execute_mode='...';``. For example: @@ -77,17 +77,22 @@ res = stmt.executeQuery("SELECT * from t1"); // For online mode, select or execu The `LOAD DATA` command is an asynchronous command, and the returned ResultSet contains information such as the job ID and state. You can execute `show job ` to check if the job has been completed. Note that the ResultSet needs to execute `next()` method to move the cursor to the first row of data. -It is also possible to change it to a synchronous command: +In offline mode, the default behavior is asynchronous execution, and the ResultSet returned is a Job Info. You can change this behavior to synchronous execution using `SET @@sync_job=true;`. However, please note that the ResultSet returned can vary depending on the specific SQL command. For more details, please refer to the [Function Boundary](../function_boundary.md). Synchronous execution is recommended when using `LOAD DATA` or `SELECT INTO` commands. -```SQL -SET @@sync_job=true; -``` +If synchronous commands are timing out, you can adjust the configuration as described in the [Offline Command Configuration](../../openmldb_sql/ddl/SET_STATEMENT.md). -If the actual execution time of the synchronous command exceeds the default maximum idle wait time of 0.5 hours, please [adjust the configuration](https://openmldb.ai/docs/zh/main/openmldb_sql/ddl/SET_STATEMENT.html#id4). +```{caution} +When you execute `SET @@execute_mode='offline'` on a `Statement`, it not only affects the current `Statement` but also impacts all `Statement` objects created, both existing and yet to be created, within the same `Connection`. Therefore, it is not advisable to create multiple `Statement` objects and expect them to execute in different modes. If you need to execute SQL in different modes, it's recommended to create multiple `Connection`. +``` ### PreparedStatement -`PreparedStatement` supports `SELECT`, `INSERT`, and `DELETE` operations. Note that `INSERT` only supports online insertion. +`PreparedStatement` supports `SELECT`, `INSERT`, and `DELETE`. +```{warning} +Any `PreparedStatement` executes only in the **online mode** and is not affected by the state before the `PreparedStatement` is created. `PreparedStatement` does not support switching to the offline mode. If you need to execute SQL in the offline mode, you can use a `Statement`. + +There are three types of `PreparedStatement` created by a `Connection`, which correspond to `getPreparedStatement`, `getInsertPreparedStmt`, and `getDeletePreparedStm`t in SqlClusterExecutor. +``` ```java PreparedStatement selectStatement = connection.prepareStatement("SELECT * FROM t1 WHERE id=?"); @@ -95,9 +100,10 @@ PreparedStatement insertStatement = connection.prepareStatement("INSERT INTO t1 PreparedStatement insertStatement = connection.prepareStatement("DELETE FROM t1 WHERE id=?"); ``` -## SqlClusterExecutor method +## SqlClusterExecutor +`SqlClusterExecutor` is the most comprehensive Java SDK connection method. It not only provides the basic CRUD operations that you can use with JDBC but also offers additional features like request modes and more. -### Creating a SqlClusterExecutor +### Create a SqlClusterExecutor First, configure the OpenMLDB connection parameters. @@ -108,14 +114,13 @@ option.setZkPath("/openmldb"); option.setSessionTimeout(10000); option.setRequestTimeout(60000); ``` - Then, use SdkOption to create the Executor. ```java sqlExecutor = new SqlClusterExecutor(option); ``` -`SqlClusterExecutor` execution of SQL operations is thread-safe, and in actual environments, a single `SqlClusterExecutor` can be created. However, since the execution mode (execute_mode) is an internal variable of `SqlClusterExecutor`, if you want to execute an offline command and an online command at the same time, unexpected results may occur. In this case, please use multiple `SqlClusterExecutors`. +`SqlClusterExecutor` execution of SQL operations is thread-safe, and in actual environments, a single `SqlClusterExecutor` can be created. However, since the execution mode (`execute_mode`) is an internal variable of `SqlClusterExecutor`, if you want to execute an offline command and an online command at the same time, unexpected results may occur. In this case, please use multiple `SqlClusterExecutors`. ```{caution} The default execution mode for SqlClusterExecutor is offline, which is different from the default mode for JDBC. @@ -158,7 +163,7 @@ try { } ``` -#### Executing batch SQL queries with Statement +#### Execute Batch SQL Queries with Statement Use the `Statement::execute` interface to execute batch SQL queries: @@ -200,15 +205,15 @@ try { ### PreparedStatement -`SqlClusterExecutor` can also obtain `PreparedStatement`, but you need to specify which type of `PreparedStatement` to obtain. For example, when using InsertPreparedStmt for insertion operations, there are three ways to do it. +`SqlClusterExecutor` can also obtain `PreparedStatement`, but you need to specify which type of `PreparedStatement` to obtain. For example, when using `InsertPreparedStmt` for insertion operations, there are three ways to do it. ```{note} -Insert operation only supports online mode and is not affected by execution mode. The data will always be inserted into the online database. +Any `PreparedStatement` executes exclusively in the **online mode** and is not influenced by the state of the `SqlClusterExecutor` at the time of its creation. `PreparedStatement` does not support switching to the offline mode. If you need to execute SQL in the offline mode, you can use a `Statement`. ``` #### Common Insert -1. Use the `SqlClusterExecutor::getInsertPreparedStmt(db, insertSql)` method to get the InsertPrepareStatement. +1. Use the `SqlClusterExecutor::getInsertPreparedStmt(db, insertSql)` method to get the `InsertPrepareStatement`. 2. Use the `PreparedStatement::execute()` method to execute the insert statement. ```java @@ -232,14 +237,14 @@ try { } ``` -#### Insert With Placeholder +#### Insert with Placeholder -1. Get InsertPrepareStatement by calling `SqlClusterExecutor::getInsertPreparedStmt(db, insertSqlWithPlaceHolder)` interface. -2. Use `PreparedStatement::setType(index, value)` interface to fill in data to the InsertPrepareStatement. Note that the index starts from 1. +1. Get `InsertPrepareStatement` by calling `SqlClusterExecutor::getInsertPreparedStmt(db, insertSqlWithPlaceHolder)` interface. +2. Use `PreparedStatement::setType(index, value)` interface to fill in data to the `InsertPrepareStatement`. Note that the index starts from 1. 3. Use `PreparedStatement::execute()` interface to execute the insert statement. ```{note} -When the conditions of the PreparedStatement are the same, you can repeatedly call the set method of the same object to fill in data before executing execute(). There is no need to create a new PreparedStatement object. +When the conditions of the `PreparedStatement` are the same, you can repeatedly call the set method of the same object to fill in data before executing `execute`. There is no need to create a new `PreparedStatement` object. ``` ```java @@ -266,13 +271,13 @@ try { ``` ```{note} -After execute, the cached data will be cleared and it is not possible to retry execute. +After `execute`, the cached data will be cleared and it is not possible to rerun `execute`. ``` -#### Batch Insert With Placeholder +#### Batch Insert with Placeholder -1. To use batch insert, first obtain the InsertPrepareStatement using the `SqlClusterExecutor::getInsertPreparedStmt(db, insertSqlWithPlaceHolder)` interface. -2. Then use the `PreparedStatement::setType(index, value)` interface to fill data into the InsertPrepareStatement. +1. To use batch insert, first obtain the `InsertPrepareStatement` using the `SqlClusterExecutor::getInsertPreparedStmt(db, insertSqlWithPlaceHolder)` interface. +2. Then use the `PreparedStatement::setType(index, value)` interface to fill data into the `InsertPrepareStatement`. 3. Use the `PreparedStatement::addBatch()` interface to complete filling for one row. 4. Continue to use `setType(index, value)` and `addBatch()` to fill multiple rows. 5. Use the `PreparedStatement::executeBatch()` interface to complete the batch insertion. @@ -305,12 +310,12 @@ try { ``` ```{note} -After executeBatch(), all cached data will be cleared and it's not possible to retry executeBatch(). +After `executeBatch`, all cached data will be cleared and it's not possible to rerun `executeBatch`. ``` -### Execute SQL request query +### Execute SQL Query -`RequestPreparedStmt` is a unique query mode (not supported by JDBC). This mode requires both the selectSql and a request data, so you need to provide the SQL and set the request data using setType when calling `getRequestPreparedStmt`. +`RequestPreparedStmt` is a unique query mode (not supported by JDBC). This mode requires both the selectSql and a request data, so you need to provide the SQL and set the request data using `setType` when calling `getRequestPreparedStmt`. There are three steps to execute a SQL request query: @@ -359,7 +364,7 @@ try { Assert.assertEquals(resultSet.getInt(2), 24); Assert.assertEquals(resultSet.getLong(3), 34); - // The return result set of the ordinary request query contains only one row of results. Therefore, the result of the second call to resultSet. next() is false + // The return result set of the ordinary request query contains only one row of results. Therefore, the result of the second call to resultSet.next() is false Assert.assertFalse(resultSet.next()); } catch (SQLException e) { @@ -368,7 +373,7 @@ try { } finally { try { if (resultSet != null) { - // result用完之后需要close + // close result resultSet.close(); } if (pstmt != null) { @@ -379,16 +384,82 @@ try { } } ``` +### Execute Deployment +To execute a deployment, you can use the `SqlClusterExecutor::getCallablePreparedStmt(db, deploymentName)` interface to obtain a `CallablePreparedStatement`. In contrast to the SQL request-based queries mentioned earlier, deployments are already online on the server, which makes them faster compared to SQL request-based queries. + +The process of using a deployment consists of two steps: +- Online Deployment +```java +// Deploy online (use selectSql). In a real production environment, deployments are typically already online and operational. +java.sql.Statement state = sqlExecutor.getStatement(); +try { + String selectSql = String.format("SELECT c1, c3, sum(c4) OVER w1 as w1_c4_sum FROM %s WINDOW w1 AS " + + "(PARTITION BY %s.c1 ORDER BY %s.c7 ROWS_RANGE BETWEEN 2d PRECEDING AND CURRENT ROW);", table, + table, table); + // Deploy + String deploySql = String.format("DEPLOY %s OPTIONS(RANGE_BIAS='inf', ROWS_BIAS='inf') %s", deploymentName, selectSql); + // set return null rs, don't check the returned value, it's false + state.execute(deploySql); +} catch (Exception e) { + e.printStackTrace(); +} +``` +- Execute Deployment +When executing a deployment, recreating a `CallablePreparedStmt` can be time-consuming. It is recommended to reuse the `CallablePreparedStmt` whenever possible. The `executeQuery()` method will automatically clear the request row cache for `setXX` requests. + +```java +// Execute Deployment +PreparedStatement pstmt = null; +ResultSet resultSet = null; +try { + pstmt = sqlExecutor.getCallablePreparedStmt(db, deploymentName); + // Obtain preparedstatement with name + // pstmt = sqlExecutor.getCallablePreparedStmt(db, deploymentName); + ResultSetMetaData metaData = pstmt.getMetaData(); + // Execute request mode requires setting query data in RequestPreparedStatement + setData(pstmt, metaData); + // executeQuery will execute select sql, and put result in resultSet + resultSet = pstmt.executeQuery(); -### Delete all data of a key under the specified index + Assert.assertTrue(resultSet.next()); + Assert.assertEquals(resultSet.getMetaData().getColumnCount(), 3); + Assert.assertEquals(resultSet.getString(1), "bb"); + Assert.assertEquals(resultSet.getInt(2), 24); + Assert.assertEquals(resultSet.getLong(3), 34); + Assert.assertFalse(resultSet.next()); + + // reuse way + for (int i = 0; i < 5; i++) { + setData(pstmt, metaData); + pstmt.executeQuery(); + // skip result check + } +} catch (SQLException e) { + e.printStackTrace(); + Assert.fail(); +} finally { + try { + if (resultSet != null) { + // close result + resultSet.close(); + } + if (pstmt != null) { + pstmt.close(); + } + } catch (SQLException throwables) { + throwables.printStackTrace(); + } +} +``` + +### Delete All Data of a Key under the Specified Index There are two ways to delete data through the Java SDK: - Execute delete SQL directly - - Use delete PreparedStatement -Note that this can only delete data under one index, not all indexes. Refer to [DELETE function boundary](https://openmldb.ai/docs/zh/main/quickstart/function_boundary.html#delete) for details. +Note that this can only delete data under one index, not all indexes. Refer to [DELETE function boundary](../function_boundary.md#delete) for details. ```java java.sql.Statement state = router.getStatement(); @@ -412,7 +483,7 @@ try { } ``` -### A complete example of using SqlClusterExecutor +### A Complete Example of SqlClusterExecutor Refer to the [Java quickstart demo](https://github.com/4paradigm/OpenMLDB/tree/main/demo/java_quickstart/demo). If it is used on macOS, please use openmldb-native of macOS version and increase the dependency of openmldb-native. @@ -427,9 +498,9 @@ java -cp target/demo-1.0-SNAPSHOT.jar com.openmldb.demo.App You must fill in `zkCluster` and `zkPath` (set method or the configuration `foo=bar` after `?` in JDBC). -### Optional configuration +### Optional Configuration -| Optional configuration | Description | +| Optional Configuration | Description | | ---------------------- | ------------------------------------------------------------ | | enableDebug | The default is false. Enable the debug log of hybridse (note that it is not the global debug log). You can view more logs of sql compilation and operation. However, not all of these logs are collected by the client. You need to view the tablet server logs. | | requestTimeout | The default is 60000 ms. This timeout is the rpc timeout sent by the client, except for those sent to the taskmanager (the rpc timeout of the job is controlled by the variable `job_timeout`). | @@ -441,16 +512,18 @@ You must fill in `zkCluster` and `zkPath` (set method or the configuration `foo= | zkLogFile | The default is empty, which is printed to stdout. | | sparkConfPath | The default is empty. You can change the spark conf used by the job through this configuration without configuring the taskmanager to restart. | -## SQL verification +## SQL Validation -The Java client supports the correct verification of SQL to verify whether it is executable. It is divided into batch and request modes. +The Java client supports the verification of SQL to verify whether it is executable. It is divided into batch and request modes. -- `ValidateSQLInBatch` can verify whether SQL can be executed at the offline end. +- `ValidateSQLInBatch` can verify whether SQL can be executed offline. - `ValidateSQLInRequest` can verify whether SQL can be deployed online. -Both interfaces need to go through all table schemas required by SQL. Currently, only single db is supported. Please do not use `db.table` format in SQL statements. +Both interfaces require providing all the table schemas required by the SQL and support multiple databases. For backward compatibility, it's allowed not to specify the db (current database in use) in the parameters. In such cases, it's equivalent to using the first db listed in use schema. It's important to ensure that the `` format tables are from the first db, which doesn't affect SQL statements in the `.
` format. + +For example, verify SQL `select count (c1) over w1 from t3 window w1 as (partition by c1 order by c2 rows between unbounded preceding and current row);`, In addition to this statement, you need to go through in the schema of table `t3` as the second parameter schemaMaps. The format is Map, key is the name of the db, and value is all the table schemas (maps) of each db. In fact, only a single db is supported, so there is usually only one db here, as shown in db3 below. The table schema map key under db is table name, and the value is `com._ 4paradigm.openmldb.sdk.Schema`, consisting of the name and type of each column. -For example, verify SQL `select count (c1) over w1 from t3 window w1 as (partition by c1 order by c2 rows between unbounded preceding and current row);`, In addition to this statement, you need to go through in the schema of table `t3` as the second parameter schemaMaps. The format is Map, key is the name of the db, and value is all the table schemas (maps) of each db. In fact, only a single db is supported, so there is usually only one db here, as shown in db3 below. The table schema map key under db is table name, and the value is com._ 4paradigm.openmldb.sdk.Schema, consisting of the name and type of each column. +The return result is a `List`. If the validation is successful, it returns an empty list. If the validation fails, it returns a list of error messages, such as `[error_msg, error_trace]`. ```java Map> schemaMaps = new HashMap<>(); @@ -461,5 +534,66 @@ schemaMaps.put("db3", dbSchema); List ret = SqlClusterExecutor.validateSQLInRequest("select count(c1) over w1 from t3 window "+ "w1 as(partition by c1 order by c2 rows between unbounded preceding and current row);", schemaMaps); Assert.assertEquals(ret.size(), 0); + +Map> schemaMaps = new HashMap<>(); +Map dbSchema = new HashMap<>(); +dbSchema = new HashMap<>(); +dbSchema.put("t3", new Schema(Arrays.asList(new Column("c1", Types.VARCHAR), new Column("c2", Types.BIGINT)))); +schemaMaps.put("db3", dbSchema); +// Can use parameter format of no db. Make sure that there's only one db in schemaMaps,and only
format is used in sql. +// List ret = SqlClusterExecutor.validateSQLInRequest("select count(c1) over w1 from t3 window "+ +// "w1 as(partition by c1 order by c2 rows between unbounded preceding and current row);", schemaMaps); +List ret = SqlClusterExecutor.validateSQLInRequest("select count(c1) over w1 from t3 window "+ + "w1 as(partition by c1 order by c2 rows between unbounded preceding and current row);", "db3", schemaMaps); +Assert.assertEquals(ret.size(), 0); ``` +## DDL Generation + +The `public static List genDDL(String sql, Map> tableSchema)` method can help users generate table creation statements based on the SQL they want to deploy. It currently supports only a **single** database. The `sql` parameter should not be in the `.
` format. The `tableSchema` parameter should include the schemas of all tables that the SQL depends on. The format of `tableSchema` should be consistent with what was discussed earlier. Even if `tableSchema` contains multiple databases, the database information will be discarded, and all tables will be treated as if they belong to an unknown database. + +## SQL Output Schema + +The `public static Schema genOutputSchema(String sql, String usedDB, Map> tableSchema)` method allows you to obtain the Output Schema for SQL queries and supports multiple databases. If you specify the `usedDB`, you can use tables from that database within the SQL using the `
` format. For backward compatibility, there is also support for the` public static Schema genOutputSchema(String sql, Map> tableSchema)` method without specifying a database (usedDB). In this case, it is equivalent to using the first database listed as the used db. Therefore, you should ensure that tables in `
` format within the `SQ`L query are associated with this first database. + + +## SQL Table Lineage +The `public static List> getDependentTables(String sql, String usedDB, Map> tableSchema)` method allows you to retrieve all tables that the sql query depends on. Each `Pair` in the list corresponds to the database name and table name, with the first element being the primary table, and the rest `[1, end)` representing other dependent tables (excluding the primary table). If the input parameter `usedDB` is an empty string, it means the query is performed without specifying a database (use db) context, which is different from the compatibility rules mentioned earlier for methods like `genDDL`. + +## SQL Merge +The Java client supports merging multiple SQL statements and performs correctness validation in request mode using the `mergeSQL` interface. However, it's important to note that merging is only possible when all the input SQL statements have the same primary table. + +Input parameters: SQL group to be merged; the name of the current database being used; the join key(s) for the primary table (which can be multiple); the schema for all tables involved. + +For example, let's consider four SQL feature views: +``` +// Single-table direct feature +select c1 from main; +// Single-table aggregation feature +select sum(c1) over w1 of2 from main window w1 as (partition by c1 order by c2 rows between unbounded preceding and current row); +// Multi-table feature +select t1.c2 of4 from main last join t1 order by t1.c2 on main.c1==t1.c1; +// Multi-table aggregation feature +select sum(c2) over w1 from main window w1 as (union (select \"\" as id, * from t1) partition by c1 order by c2 rows between unbounded preceding and current row); +``` + +Since all of them have the same primary table, "main," they can be merged. The merging process is essentially a join operation. To perform this operation, you also need to specify a unique column in the "main" table that can be used to identify a unique row of data. For example, if the "id" column in the "main" table is not unique and there may be multiple rows with the same "id" values, you can use a combination of "id" and "c1" columns for the join. Similar to SQL validation, you would also provide a schema map for the tables involved in the merge. + + +```java +//To simplify the demonstration, we are using tables from a single database, so you only need to specify used db and table names if your SQL statements all use the
format. you can leave the used database parameter as an empty string. If your SQL statements use the .
format, you can leave the used db parameter as an empty string +String merged = SqlClusterExecutor.mergeSQL(sqls, "db", Arrays.asList("id", "c1"), schemaMaps); +``` + +The output is a single merged SQL statement, as shown below. The input SQL includes a total of four features, so the merged SQL will only output these four feature columns. (The join keys are automatically filtered.) + +``` +select `c1`, `of2`, `of4`, `sum(c2)over w1` from (select main.id as merge_id_0, c1 from main) as out0 last join (select main.id as merge_id_1, sum(c1) over w1 of2 from main window w1 as (partition by c1 order by c2 rows between unbounded preceding and current row)) as out1 on out0.merge_id_0 = out1.merge_id_1 last join (select main.id as merge_id_2, t1.c2 of4 from main last join t1 order by t1.c2 on main.c1==t1.c1) as out2 on out0.merge_id_0 = out2.merge_id_2 last join (select main.id as merge_id_3, sum(c2) over w1 from main window w1 as (union (select "" as id, * from t1) partition by c1 order by c2 rows between unbounded preceding and current row)) as out3 on out0.merge_id_0 = out3.merge_id_3; +``` + +```{note} +If you encounter an "Ambiguous column name" error during the merging process, it may be due to having the same column names in different feature groups. To resolve this, you should use aliases in your input SQL to distinguish between them. +``` + + +