chore: add pipeline and log doc (#1026)
Co-authored-by: Yiran <[email protected]>
Co-authored-by: shuiyisong <[email protected]>
Co-authored-by: Jeremyhi <[email protected]>
4 people authored Jul 15, 2024
1 parent 238d114 commit aa2f2ae
Showing 10 changed files with 1,511 additions and 0 deletions.
440 changes: 440 additions & 0 deletions docs/nightly/en/user-guide/log/log-pipeline.md

Large diffs are not rendered by default.

90 changes: 90 additions & 0 deletions docs/nightly/en/user-guide/log/manage-pipeline.md
@@ -0,0 +1,90 @@
# Managing Pipelines

In GreptimeDB, each `pipeline` is a collection of data processing units used for parsing and transforming the ingested log content. This document provides guidance on creating and deleting pipelines to efficiently manage the processing flow of log data.


For specific pipeline configurations, please refer to the [Pipeline Configuration](log-pipeline.md) documentation.

## Create a Pipeline

GreptimeDB provides a dedicated HTTP interface for creating pipelines.
Assuming you have prepared a pipeline configuration file `pipeline.yaml`, use the following command to upload the configuration file, where `test` is the name you specify for the pipeline:

```shell
## Upload the pipeline file. 'test' is the name of the pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```

## Delete a Pipeline

You can use the following HTTP interface to delete a pipeline:

```shell
## 'test' is the name of the pipeline
curl -X "DELETE" "http://localhost:4000/v1/events/pipelines/test?version=2024-06-27%2012%3A02%3A34.257312110Z"
```

In the example above, we deleted the pipeline named `test`. The `version` parameter is required and specifies which version of the pipeline to delete. Note that the version string must be URL-encoded when passed as a query parameter (spaces and colons become `%20` and `%3A`, as shown above).
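If you prefer not to percent-encode the version string by hand, `curl` can do it for you: with `-G`, values passed via `--data-urlencode` are appended to the URL as query parameters. A minimal sketch, equivalent to the request above:

```shell
## Let curl URL-encode the version string; -G appends it to the URL as a query parameter
curl -G -X "DELETE" "http://localhost:4000/v1/events/pipelines/test" \
  --data-urlencode "version=2024-06-27 12:02:34.257312110Z"
```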

## Query Pipelines

Currently, you can use SQL to query pipeline information.

```sql
SELECT * FROM greptime_private.pipelines;
```

Please note that if you are using the MySQL or PostgreSQL protocol to connect to GreptimeDB, the precision of the pipeline time information may vary, and nanosecond-level precision may be lost.

To work around this, you can cast the `created_at` field to `bigint` to view the pipeline's creation time with full nanosecond precision. For example, the following query returns `created_at` as a `bigint`:

```sql
SELECT name, pipeline, created_at::bigint FROM greptime_private.pipelines;
```

The query result is as follows:

```
name | pipeline | greptime_private.pipelines.created_at
------+-----------------------------------+---------------------------------------
test | processors: +| 1719489754257312110
| - date: +|
| field: time +|
| formats: +|
| - "%Y-%m-%d %H:%M:%S%.3f"+|
| ignore_missing: true +|
| +|
| transform: +|
| - fields: +|
| - id1 +|
| - id2 +|
| type: int32 +|
| - fields: +|
| - type +|
| - logger +|
| type: string +|
| index: tag +|
| - fields: +|
| - log +|
| type: string +|
| index: fulltext +|
| - field: time +|
| type: time +|
| index: timestamp +|
| |
(1 row)
```

Then, you can convert the `bigint` timestamp from the SQL result into a human-readable time string, for example with the following shell commands:

```shell
# Requires GNU date. The first 10 digits are whole seconds; the remaining 9 digits are the nanosecond fraction.
timestamp_ns="1719489754257312110"
readable_timestamp=$(TZ=UTC date -d @$((${timestamp_ns:0:10}+0)) +"%Y-%m-%d %H:%M:%S").${timestamp_ns:10}Z
echo "Readable timestamp (UTC): $readable_timestamp"
```

Output:

```shell
Readable timestamp (UTC): 2024-06-27 12:02:34.257312110Z
```

The output `Readable timestamp (UTC)` represents the creation time of the pipeline and also serves as the version number.
6 changes: 6 additions & 0 deletions docs/nightly/en/user-guide/log/overview.md
@@ -0,0 +1,6 @@
# Overview

- [Quick Start](./quick-start.md): Provides an introduction on how to quickly get started with GreptimeDB log service.
- [Pipeline Configuration](./log-pipeline.md): Provides in-depth information on each specific configuration of pipelines in GreptimeDB.
- [Managing Pipelines](./manage-pipeline.md): Explains how to create and delete pipelines.
- [Writing Logs with Pipelines](./write-log.md): Provides detailed instructions on efficiently writing log data by leveraging the pipeline mechanism.
185 changes: 185 additions & 0 deletions docs/nightly/en/user-guide/log/quick-start.md
@@ -0,0 +1,185 @@
# Quick Start


## Download, install, and start GreptimeDB

Follow the [Installation Guide](/getting-started/overview.md) to install and start GreptimeDB.

## Create a Pipeline

GreptimeDB provides a dedicated HTTP interface for creating Pipelines. Here's how to do it:

First, create a Pipeline file, for example, `pipeline.yaml`.

```yaml
# pipeline.yaml
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - fields:
      - id1
      - id2
    type: int32
  - fields:
      - type
      - logger
    type: string
    index: tag
  - fields:
      - log
    type: string
    index: fulltext
  - field: time
    type: time
    index: timestamp
```
Then, execute the following command to upload the configuration file:
```shell
## Upload the pipeline file. "test" is the name of the Pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```

After the successful execution of this command, a Pipeline named `test` will be created, and the result will be returned as: `{"name":"test","version":"2024-06-27 12:02:34.257312110Z"}`.
Here, `name` is the name of the Pipeline, and `version` is the Pipeline version.
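
If you want to keep the returned version for later use (for example, to delete exactly this pipeline version), you can parse it out of the JSON response. A minimal sketch, assuming the `jq` command-line JSON processor is installed:

```shell
## Upload the pipeline and capture the returned version (assumes jq is available)
version=$(curl -s -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
  -F "file=@pipeline.yaml" | jq -r '.version')
echo "Created pipeline version: $version"
```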

This Pipeline includes one Processor and four Transform rules. The `date` Processor uses the Rust time format string `%Y-%m-%d %H:%M:%S%.3f` to parse the timestamp field in the logs (for example, `2024-05-25 20:16:37.217`). The Transform rules then convert the `id1` and `id2` fields to the `int32` type, the `type` and `logger` fields to the `string` type with a `tag` index, the `log` field to the `string` type with a `fulltext` index, and the `time` field to a time type with a `timestamp` index.

Refer to the [Pipeline Introduction](log-pipeline.md) for specific syntax details.

## Query Pipelines

You can use SQL to query the pipeline content stored in the database. The example query is as follows:

```sql
SELECT * FROM greptime_private.pipelines;
```

The query result is as follows:

```sql
name | schema | content_type | pipeline | created_at
------+--------+--------------+-----------------------------------+----------------------------
test | public | yaml | processors: +| 2024-06-27 12:02:34.257312
| | | - date: +|
| | | field: time +|
| | | formats: +|
| | | - "%Y-%m-%d %H:%M:%S%.3f"+|
| | | ignore_missing: true +|
| | | +|
| | | transform: +|
| | | - fields: +|
| | | - id1 +|
| | | - id2 +|
| | | type: int32 +|
| | | - fields: +|
| | | - type +|
| | | - logger +|
| | | type: string +|
| | | index: tag +|
| | | - fields: +|
| | | - log +|
| | | type: string +|
| | | index: fulltext +|
| | | - field: time +|
| | | type: time +|
| | | index: timestamp +|
| | | |
(1 row)
```

## Write logs

The HTTP interface for writing logs is as follows:

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
-H 'Content-Type: application/json' \
-d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}'
```

The above command returns the following result:

```json
{"output":[{"affectedrows":4}],"execution_time_ms":22}
```

In the above example, we successfully wrote 4 log entries to the `public.logs` table.

Please refer to [Writing Logs Using a Pipeline](write-log.md) for the specific syntax for writing logs.

## `logs` table structure

We can use SQL to query the structure of the `public.logs` table.

```sql
DESC TABLE logs;
```

The query result is as follows:

```sql
Column | Type | Key | Null | Default | Semantic Type
--------+---------------------+-----+------+---------+---------------
id1 | Int32 | | YES | | FIELD
id2 | Int32 | | YES | | FIELD
type | String | PRI | YES | | TAG
logger | String | PRI | YES | | TAG
log | String | | YES | | FIELD
time | TimestampNanosecond | PRI | NO | | TIMESTAMP
(6 rows)
```

From the above result, we can see that, based on the pipeline's processing, the `public.logs` table contains six columns: `id1` and `id2` are stored as `Int32`; `type`, `log`, and `logger` as `String`; and `time` as `TimestampNanosecond`, indexed as the table's time index (`TIMESTAMP`). Note that `type` and `logger` are tag columns (part of the primary key), matching the `tag` index declared in the pipeline.

## Query logs

We can use standard SQL to query log data.

```shell
# Connect to GreptimeDB using MySQL or PostgreSQL protocol

# MySQL
mysql --host=127.0.0.1 --port=4002 public

# PostgreSQL
psql -h 127.0.0.1 -p 4003 -d public
```

You can query the log table using SQL:

```sql
SELECT * FROM public.logs;
```

The query result is as follows:

```sql
id1 | id2 | type | logger | log | time
------+------+------+------------------+--------------------------------------------+----------------------------
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
(4 rows)
```

As you can see, the logs have been stored as structured logs after applying type conversions using the pipeline. This provides convenience for further querying and analysis of the logs.
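
For example, the `tag` and `timestamp` indexes created by the pipeline make selective queries straightforward. A sketch of such a query against the `public.logs` table from this guide (the filter values are illustrative):

```sql
-- Filter by the tag columns and the time index created by the pipeline
SELECT time, logger, log
FROM public.logs
WHERE type = 'I'
  AND logger = 'INTERACT.MANAGER'
  AND time >= '2024-05-25 20:00:00'
ORDER BY time DESC
LIMIT 10;
```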

## Conclusion

By following the above steps, you have successfully created a pipeline, written logs, and performed queries. This is just the tip of the iceberg in terms of the capabilities offered by GreptimeDB.
Next, please continue reading [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) to learn more about advanced features and best practices.
31 changes: 31 additions & 0 deletions docs/nightly/en/user-guide/log/write-log.md
@@ -0,0 +1,31 @@
# Writing Logs Using a Pipeline

This document describes how to write logs to GreptimeDB by processing them through a specified pipeline using the HTTP interface.

Before writing logs, please read the [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) documents to complete the configuration setup and upload.

## HTTP API

You can use the following command to write logs via the HTTP interface:

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=<db-name>&table=<table-name>&pipeline_name=<pipeline-name>" \
-H 'Content-Type: application/json' \
-d "$<log-items>"
```

## Query parameters

This interface accepts the following parameters:

- `db`: The name of the database.
- `table`: The name of the table.
- `pipeline_name`: The name of the [pipeline](./log-pipeline.md).

## Body data format

The request body supports NDJSON and JSON Array formats, where each JSON object represents a log entry.
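
For example, the same two entries can be sent either as NDJSON (one JSON object per line) or as a JSON array. A sketch reusing the `public.logs` table and `test` pipeline from the [Quick Start](quick-start.md) guide; the `log` values are illustrative:

```shell
## NDJSON: one JSON object per line
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
  -H 'Content-Type: application/json' \
  -d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line one"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line two"}'

## JSON array: the same two entries in a single JSON document
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
  -H 'Content-Type: application/json' \
  -d '[{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line one"},{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line two"}]'
```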

## Example

Please refer to the "Writing Logs" section in the [Quick Start](quick-start.md#write-logs) guide for an example.