docs: update-quickstart-concepts-folder #3536

Merged
2 changes: 1 addition & 1 deletion docs/en/quickstart/concepts/index.rst
@@ -5,4 +5,4 @@ Concept
.. toctree::
:maxdepth: 1

workflow
modes
@@ -6,7 +6,7 @@ OpenMLDB supports different execution modes at different stages of the feature e

The following diagram illustrates the typical process of using OpenMLDB for feature engineering development and deployment, as well as the execution modes used in the process:

![image-20220310170024349](https://openmldb.ai/docs/zh/main/_images/modes-flow.png)
![image-20220310170024349](images/modes-flow.png)

1. Offline Data Import: Import offline data for offline feature engineering development and debugging.
2. Offline Feature Development: Develop feature engineering scripts and debug them until satisfactory results are achieved. This step involves joint debugging of machine learning models (such as XGBoost, LightGBM, etc.), but this article mainly focuses on feature engineering development related to OpenMLDB.
@@ -18,32 +18,31 @@ The following diagram illustrates the typical process of using OpenMLDB for feat

## Overview of execution mode

As the data objects for offline and online scenarios are different, their underlying storage and computing nodes are also different. Therefore, OpenMLDB provides several built-in execution modes to support completing the above steps. The following table summarizes the execution modes and development tools used for each step, and three execution modes will be discussed in detail later.
As the data objects for offline and online scenarios are different, their underlying storage and computing nodes are also different. Therefore, OpenMLDB provides several built-in execution modes to support the above steps. The following table summarizes the execution modes and development tools used for each step, and three execution modes will be discussed in detail later.

| Steps | Execution Mode | Development Tool |
| ------------------------------ | ------------------- | ------------------------------------------------------------ |
| 1. Offline Data Import | Offline Mode | OpenMLDB CLI, SDKs |
| Offline Feature Development | Offline Mode | OpenMLDB CLI, SDKs |
| Feature Deployment | Offline Mode | OpenMLDB CLI, SDKs |
| Cold Start Online Data Import | Online Preview Mode | OpenMLDB CLI, SDKs, [Data Import Tool](https://openmldb.ai/docs/zh/main/tutorial/data_import.html) |
| Real-time Data Integration | Online Preview Mode | Connectors, SDKs |
| Online Data Preview (optional) | Online Preview Mode | OpenMLDB CLI, SDKs, [Data Export Tool](https://openmldb.ai/docs/zh/main/tutorial/data_export.html) |
| Real-time Feature Calculation | Online Request Mode | CLI (REST APIs), SDKs |
| 2. Offline Feature Development | Offline Mode | OpenMLDB CLI, SDKs |
| 3. Feature Deployment | Offline Mode | OpenMLDB CLI, SDKs |
| 4. Cold Start Online Data Import | Online Preview Mode | OpenMLDB CLI, SDKs, [Data Import Tool](../../tutorial/data_import.md) |
| 5. Real-time Data Integration | Online Preview Mode | Connectors, SDKs |
| 6. Online Data Preview (optional) | Online Preview Mode | OpenMLDB CLI, SDKs, [Data Export Tool](../../tutorial/data_export.md) |
| 7. Real-time Feature Calculation | Online Request Mode | CLI (REST APIs), SDKs |

### Offline Mode

After starting OpenMLDB CLI, the **default mode is offline mode**. Offline data import, offline feature development, and feature deployment are all executed in offline mode. The purpose of offline mode is to manage and compute offline data. The computing nodes involved are supported by OpenMLDB Spark optimized for feature engineering, and the storage nodes support commonly used storage systems such as HDFS.
After starting OpenMLDB CLI, the **default mode is offline mode**. Offline data import, offline feature development, and feature deployment are all executed in offline mode. The purpose of offline mode is to manage and compute offline data. The computing nodes involved are supported by [OpenMLDB Spark Distribution](../../tutorial/openmldbspark_distribution.md) optimized for feature engineering, and the storage nodes support commonly used storage systems such as HDFS.

Offline mode has the following main features:

- The offline mode supports most of the SQL syntax provided by OpenMLDB, including complex SQL syntaxes such as `LAST JOIN` and `WINDOW UNION`, which are optimized for feature engineering.
- The offline mode supports most of the SQL syntax provided by OpenMLDB, including complex SQL syntax such as `LAST JOIN` and `WINDOW UNION`.

- In offline mode, some SQL commands are executed asynchronously, such as `LOAD DATA`, `SELECT`, and `SELECT INTO` commands. Other SQL commands are executed synchronously.
- In offline mode, some SQL commands are executed asynchronously, such as `LOAD DATA`, `SELECT`, and `SELECT INTO`. Other SQL commands are executed synchronously.

- The asynchronous SQL is managed by the internal TaskManager and can be viewed and managed through commands such as `SHOW JOBS`, `SHOW JOB`, and `STOP JOB`.

```{tip}
:::
:::{tip}
Unlike many relational database systems, the `SELECT` command in offline mode is executed asynchronously by default. If you need to set it to synchronous execution, refer to setting the command to run synchronously in offline mode. During offline feature development, if asynchronous execution is used, it is strongly recommended to use the `SELECT INTO` statement for development and debugging, which can export the results to a file for easy viewing.
:::
```
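
As a rough illustration of the tip above, the following sketch assembles the statements one might run in the CLI to debug an offline query. The table name `demo_table`, the output path, and the helper `offline_debug_statements` are hypothetical. The system variables `@@execute_mode` and `@@sync_job` and the `INTO OUTFILE ... OPTIONS(...)` clause are assumed to match the OpenMLDB version in use; check the SQL reference before relying on them.

```python
def offline_debug_statements(query: str, out_path: str, sync: bool = True) -> list[str]:
    """Build the SQL statements for debugging a query in offline mode.

    Hypothetical helper for illustration only; it just formats strings
    and does not talk to an OpenMLDB cluster.
    """
    # Make sure the session is in offline mode (assumed variable name).
    stmts = ["SET @@execute_mode='offline';"]
    if sync:
        # Ask offline jobs to block until they finish (assumed variable name).
        stmts.append("SET @@sync_job=true;")
    # Export the result to a file so it can be inspected after the job ends.
    body = query.strip().rstrip(";")
    stmts.append(f"{body} INTO OUTFILE '{out_path}' OPTIONS(mode='overwrite');")
    return stmts


statements = offline_debug_statements(
    "SELECT col1, col2 FROM demo_table", "/tmp/feature_debug"
)
for s in statements:
    print(s)
```

With `sync=False`, the `SELECT INTO` statement is submitted as an asynchronous job, which can then be tracked with `SHOW JOBS`.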
@@ -78,7 +77,7 @@ The online request mode requires three inputs:

Based on the above inputs, for each real-time request row, the online request mode will return a feature extraction result. The computing logic is as follows: The request row is virtually inserted into the correct position of the online data table based on the logic in the SQL script (such as `PARTITION BY`, `ORDER BY`, etc.), and then only the feature aggregation computing is performed on that row, returning the unique corresponding extraction result. The following diagram intuitively explains the operation process of the online request mode.

![modes-request](https://openmldb.ai/docs/zh/main/_images/modes-request.png)
![modes-request](images/modes-request.png)
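
The computing logic described above can be sketched in plain Python. This is a simplified model, not OpenMLDB's implementation: the `Row` type, the `request_mode_feature` function, and the choice of a time-window `SUM` as the feature are assumptions made for illustration. `user` plays the role of the `PARTITION BY` key and `ts` the `ORDER BY` key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    user: str      # stands in for the PARTITION BY key
    ts: int        # stands in for the ORDER BY key (milliseconds)
    amount: float  # column being aggregated


def request_mode_feature(online_rows, request, window_ms):
    """Compute one feature row for a single request row.

    The request row is placed into its partition only for the duration of
    this call ("virtual insert"); online_rows itself is never modified.
    """
    # 1. Select the partition the request row belongs to.
    partition = [r for r in online_rows if r.user == request.user]
    # 2. Virtually insert the request row at its ordered position.
    partition.append(request)
    partition.sort(key=lambda r: r.ts)
    # 3. Aggregate only over the window that ends at the request row.
    window = [r for r in partition
              if request.ts - window_ms <= r.ts <= request.ts]
    return {
        "user": request.user,
        "ts": request.ts,
        "amount_sum": sum(r.amount for r in window),
        "row_count": len(window),
    }


online = [
    Row("a", 1000, 10.0),
    Row("a", 2000, 20.0),
    Row("b", 2500, 5.0),
    Row("a", 9000, 1.0),
]
request = Row("a", 3000, 7.0)
result = request_mode_feature(online, request, window_ms=1500)
print(result)
```

Only the request row produces an output row. Rows in other partitions and rows outside the window do not contribute, and the online table still holds its original four rows afterwards.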

Online request mode is supported in the following ways:
