chore: add pipeline and log doc (#1026)
Co-authored-by: Yiran <[email protected]>
Co-authored-by: shuiyisong <[email protected]>
Co-authored-by: Jeremyhi <[email protected]>
4 people authored Jul 15, 2024
1 parent 238d114 commit aa2f2ae
Showing 10 changed files with 1,511 additions and 0 deletions.
440 changes: 440 additions & 0 deletions docs/nightly/en/user-guide/log/log-pipeline.md

Large diffs are not rendered by default.

90 changes: 90 additions & 0 deletions docs/nightly/en/user-guide/log/manage-pipeline.md
@@ -0,0 +1,90 @@
# Managing Pipelines

In GreptimeDB, each `pipeline` is a collection of data processing units used for parsing and transforming the ingested log content. This document provides guidance on creating and deleting pipelines to efficiently manage the processing flow of log data.


For specific pipeline configurations, please refer to the [Pipeline Configuration](log-pipeline.md) documentation.

## Create a Pipeline

GreptimeDB provides a dedicated HTTP interface for creating pipelines.
Assuming you have prepared a pipeline configuration file `pipeline.yaml`, use the following command to upload the configuration file, where `test` is the name you specify for the pipeline:

```shell
## Upload the pipeline file. 'test' is the name of the pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```

## Delete a Pipeline

You can use the following HTTP interface to delete a pipeline:

```shell
## 'test' is the name of the pipeline
curl -X "DELETE" "http://localhost:4000/v1/events/pipelines/test?version=2024-06-27%2012%3A02%3A34.257312110Z"
```

In the example above, we deleted the pipeline named `test`. The `version` parameter is required and specifies which version of the pipeline to delete. Note that the version string must be URL-encoded when passed as a query parameter (spaces and colons become `%20` and `%3A`, as shown above).
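If you prefer not to percent-encode the version string by hand, `curl` can do it for you: with `-G`, values passed via `--data-urlencode` are appended to the URL as query parameters. A minimal sketch, equivalent to the request above:

```shell
## Let curl URL-encode the version string; -G appends it to the URL as a query parameter
curl -G -X "DELETE" "http://localhost:4000/v1/events/pipelines/test" \
  --data-urlencode "version=2024-06-27 12:02:34.257312110Z"
```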

## Query Pipelines

Currently, you can use SQL to query pipeline information.

```sql
SELECT * FROM greptime_private.pipelines;
```

Please note that if you are using the MySQL or PostgreSQL protocol to connect to GreptimeDB, the precision of the pipeline time information may vary, and nanosecond-level precision may be lost.

To work around this, you can cast the `created_at` field to `bigint` to view the pipeline's creation time with full nanosecond precision. For example, the following query returns `created_at` as a `bigint`:

```sql
SELECT name, pipeline, created_at::bigint FROM greptime_private.pipelines;
```

The query result is as follows:

```
name | pipeline | greptime_private.pipelines.created_at
------+-----------------------------------+---------------------------------------
test | processors: +| 1719489754257312110
| - date: +|
| field: time +|
| formats: +|
| - "%Y-%m-%d %H:%M:%S%.3f"+|
| ignore_missing: true +|
| +|
| transform: +|
| - fields: +|
| - id1 +|
| - id2 +|
| type: int32 +|
| - fields: +|
| - type +|
| - logger +|
| type: string +|
| index: tag +|
| - fields: +|
| - log +|
| type: string +|
| index: fulltext +|
| - field: time +|
| type: time +|
| index: timestamp +|
| |
(1 row)
```

Then, you can convert the `bigint` timestamp from the SQL result into a human-readable time string, for example with the following shell commands:

```shell
# Requires GNU date. The first 10 digits are whole seconds; the remaining 9 digits are the nanosecond fraction.
timestamp_ns="1719489754257312110"
readable_timestamp=$(TZ=UTC date -d @$((${timestamp_ns:0:10}+0)) +"%Y-%m-%d %H:%M:%S").${timestamp_ns:10}Z
echo "Readable timestamp (UTC): $readable_timestamp"
```

Output:

```shell
Readable timestamp (UTC): 2024-06-27 12:02:34.257312110Z
```

The output `Readable timestamp (UTC)` represents the creation time of the pipeline and also serves as the version number.
6 changes: 6 additions & 0 deletions docs/nightly/en/user-guide/log/overview.md
@@ -0,0 +1,6 @@
# Overview

- [Quick Start](./quick-start.md): Provides an introduction on how to quickly get started with GreptimeDB log service.
- [Pipeline Configuration](./log-pipeline.md): Provides in-depth information on each specific configuration of pipelines in GreptimeDB.
- [Managing Pipelines](./manage-pipeline.md): Explains how to create and delete pipelines.
- [Writing Logs with Pipelines](./write-log.md): Provides detailed instructions on efficiently writing log data by leveraging the pipeline mechanism.
185 changes: 185 additions & 0 deletions docs/nightly/en/user-guide/log/quick-start.md
@@ -0,0 +1,185 @@
# Quick Start


## Download, install, and start GreptimeDB

Follow the [Installation Guide](/getting-started/overview.md) to install and start GreptimeDB.

## Create a Pipeline

GreptimeDB provides a dedicated HTTP interface for creating Pipelines. Here's how to do it:

First, create a Pipeline file, for example, `pipeline.yaml`.

```yaml
# pipeline.yaml
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - fields:
      - id1
      - id2
    type: int32
  - fields:
      - type
      - logger
    type: string
    index: tag
  - fields:
      - log
    type: string
    index: fulltext
  - field: time
    type: time
    index: timestamp
```
Then, execute the following command to upload the configuration file:
```shell
## Upload the pipeline file. "test" is the name of the Pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```

After the successful execution of this command, a Pipeline named `test` will be created, and the result will be returned as: `{"name":"test","version":"2024-06-27 12:02:34.257312110Z"}`.
Here, `name` is the name of the Pipeline, and `version` is the Pipeline version.
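
If you want to keep the returned version for later use (for example, to delete exactly this pipeline version), you can parse it out of the JSON response. A minimal sketch, assuming the `jq` command-line JSON processor is installed:

```shell
## Upload the pipeline and capture the returned version (assumes jq is available)
version=$(curl -s -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
  -F "file=@pipeline.yaml" | jq -r '.version')
echo "Created pipeline version: $version"
```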

This Pipeline includes one Processor and four Transform rules. The `date` Processor uses the Rust time format string `%Y-%m-%d %H:%M:%S%.3f` to parse the timestamp field in the logs (for example, `2024-05-25 20:16:37.217`). The Transform rules then convert the `id1` and `id2` fields to the `int32` type, the `type` and `logger` fields to the `string` type with a `tag` index, the `log` field to the `string` type with a `fulltext` index, and the `time` field to a time type with a `timestamp` index.

Refer to the [Pipeline Introduction](log-pipeline.md) for specific syntax details.

## Query Pipelines

You can use SQL to query the pipeline content stored in the database. The example query is as follows:

```sql
SELECT * FROM greptime_private.pipelines;
```

The query result is as follows:

```sql
name | schema | content_type | pipeline | created_at
------+--------+--------------+-----------------------------------+----------------------------
test | public | yaml | processors: +| 2024-06-27 12:02:34.257312
| | | - date: +|
| | | field: time +|
| | | formats: +|
| | | - "%Y-%m-%d %H:%M:%S%.3f"+|
| | | ignore_missing: true +|
| | | +|
| | | transform: +|
| | | - fields: +|
| | | - id1 +|
| | | - id2 +|
| | | type: int32 +|
| | | - fields: +|
| | | - type +|
| | | - logger +|
| | | type: string +|
| | | index: tag +|
| | | - fields: +|
| | | - log +|
| | | type: string +|
| | | index: fulltext +|
| | | - field: time +|
| | | type: time +|
| | | index: timestamp +|
| | | |
(1 row)
```

## Write logs

The HTTP interface for writing logs is as follows:

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
-H 'Content-Type: application/json' \
-d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}'
```

The above command returns the following result:

```json
{"output":[{"affectedrows":4}],"execution_time_ms":22}
```

In the above example, we successfully wrote 4 log entries to the `public.logs` table.

Please refer to [Writing Logs Using a Pipeline](write-log.md) for the specific syntax for writing logs.

## `logs` table structure

We can use SQL to query the structure of the `public.logs` table.

```sql
DESC TABLE logs;
```

The query result is as follows:

```sql
Column | Type | Key | Null | Default | Semantic Type
--------+---------------------+-----+------+---------+---------------
id1 | Int32 | | YES | | FIELD
id2 | Int32 | | YES | | FIELD
type | String | PRI | YES | | TAG
logger | String | PRI | YES | | TAG
log | String | | YES | | FIELD
time | TimestampNanosecond | PRI | NO | | TIMESTAMP
(6 rows)
```

From the above result, we can see that, based on the pipeline's processing, the `public.logs` table contains six columns: `id1` and `id2` are stored as `Int32`; `type`, `log`, and `logger` as `String`; and `time` as `TimestampNanosecond`, indexed as the table's time index (`TIMESTAMP`). Note that `type` and `logger` are tag columns (part of the primary key), matching the `tag` index declared in the pipeline.

## Query logs

We can use standard SQL to query log data.

```shell
# Connect to GreptimeDB using MySQL or PostgreSQL protocol

# MySQL
mysql --host=127.0.0.1 --port=4002 public

# PostgreSQL
psql -h 127.0.0.1 -p 4003 -d public
```

You can query the log table using SQL:

```sql
SELECT * FROM public.logs;
```

The query result is as follows:

```sql
id1 | id2 | type | logger | log | time
------+------+------+------------------+--------------------------------------------+----------------------------
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
| | | | |
(4 rows)
```

As you can see, the logs have been stored as structured logs after applying type conversions using the pipeline. This provides convenience for further querying and analysis of the logs.
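
For example, the `tag` and `timestamp` indexes created by the pipeline make selective queries straightforward. A sketch of such a query against the `public.logs` table from this guide (the filter values are illustrative):

```sql
-- Filter by the tag columns and the time index created by the pipeline
SELECT time, logger, log
FROM public.logs
WHERE type = 'I'
  AND logger = 'INTERACT.MANAGER'
  AND time >= '2024-05-25 20:00:00'
ORDER BY time DESC
LIMIT 10;
```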

## Conclusion

By following the above steps, you have successfully created a pipeline, written logs, and performed queries. This is just the tip of the iceberg in terms of the capabilities offered by GreptimeDB.
Next, please continue reading [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) to learn more about advanced features and best practices.
31 changes: 31 additions & 0 deletions docs/nightly/en/user-guide/log/write-log.md
@@ -0,0 +1,31 @@
# Writing Logs Using a Pipeline

This document describes how to write logs to GreptimeDB by processing them through a specified pipeline using the HTTP interface.

Before writing logs, please read the [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) documents to complete the configuration setup and upload.

## HTTP API

You can use the following command to write logs via the HTTP interface:

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=<db-name>&table=<table-name>&pipeline_name=<pipeline-name>" \
-H 'Content-Type: application/json' \
-d "$<log-items>"
```

## Query parameters

This interface accepts the following parameters:

- `db`: The name of the database.
- `table`: The name of the table.
- `pipeline_name`: The name of the [pipeline](./log-pipeline.md).

## Body data format

The request body supports NDJSON and JSON Array formats, where each JSON object represents a log entry.
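
For example, the same two entries can be sent either as NDJSON (one JSON object per line) or as a JSON array. A sketch reusing the `public.logs` table and `test` pipeline from the [Quick Start](quick-start.md) guide; the `log` values are illustrative:

```shell
## NDJSON: one JSON object per line
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
  -H 'Content-Type: application/json' \
  -d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line one"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line two"}'

## JSON array: the same two entries in a single JSON document
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
  -H 'Content-Type: application/json' \
  -d '[{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line one"},{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"line two"}]'
```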

## Example

Please refer to the "Writing Logs" section in the [Quick Start](quick-start.md#write-logs) guide for an example.