diff --git a/docs/nightly/en/user-guide/log/log-pipeline.md b/docs/nightly/en/user-guide/log/log-pipeline.md new file mode 100644 index 000000000..c45502239 --- /dev/null +++ b/docs/nightly/en/user-guide/log/log-pipeline.md @@ -0,0 +1,440 @@ +# Pipeline Configuration + +Pipeline is a mechanism in GreptimeDB for transforming log data. It consists of a unique name and a set of configuration rules that define how log data is formatted, split, and transformed. Currently, we support JSON (`application/json`) and plain text (`text/plain`) formats as input for log data. + +These configurations are provided in YAML format, allowing the Pipeline to process data during the log writing process according to the defined rules and store the processed data in the database for subsequent structured queries. + +## The overall structure + +Pipeline consists of two parts: Processors and Transform, both of which are in array format. A Pipeline configuration can contain multiple Processors and multiple Transforms. The data type described by Transform determines the table structure when storing log data in the database. + +- Processors are used for preprocessing log data, such as parsing time fields and replacing fields. +- Transform is used for converting log data formats, such as converting string types to numeric types. + +Here is an example of a simple configuration that includes Processors and Transform: + +```yaml +processors: + - urlencoding: + fields: + - string_field_a + - string_field_b + method: decode + ignore_missing: true +transform: + - fields: + - string_field_a + - string_field_b + type: string + # The written data must include the timestamp field + - fields: + - reqTimeSec, req_time_sec + # epoch is a special field type and must specify precision + type: epoch, ms + index: timestamp +``` + +## Processor + +The Processor is used for preprocessing log data, and its configuration is located under the `processors` field in the YAML file. The Pipeline processes data by applying multiple Processors in sequential order, where each Processor depends on the result of the previous Processor. A Processor consists of a name and multiple configurations, and different types of Processors have different fields in their configuration. + +We currently provide the following built-in Processors: + +- `date`: Used to parse formatted time string fields, such as `2024-07-12T16:18:53.048`. +- `epoch`: Used to parse numeric timestamp fields, such as `1720772378893`. +- `dissect`: Used to split log data fields. +- `gsub`: Used to replace log data fields. +- `join`: Used to merge array-type fields in logs. +- `letter`: Used to convert log data fields to letters. +- `regex`: Used to perform regular expression matching on log data fields. +- `urlencoding`: Used to perform URL encoding/decoding on log data fields. +- `csv`: Used to parse CSV data fields in logs. + +### `date` + +The `date` processor is used to parse time fields. Here's an example configuration: + +```yaml +processors: + - date: + fields: + - time + formats: + - '%Y-%m-%d %H:%M:%S%.3f' + ignore_missing: true + timezone: 'Asia/Shanghai' +``` + +In the above example, the configuration of the `date` processor includes the following fields: + +- `fields`: A list of time field names to be parsed. +- `formats`: Time format strings, supporting multiple format strings. Parsing is attempted in the order provided until successful. +- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. 
If the field is missing and this configuration is set to `false`, an exception will be thrown. +- `timezone`: Time zone. Use the time zone identifiers from the [tz_database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) to specify the time zone. Defaults to `UTC`. + +### `epoch` + +The `epoch` processor is used to parse timestamp fields. Here's an example configuration: + +```yaml +processors: + - epoch: + fields: + - reqTimeSec + resolution: millisecond + ignore_missing: true +``` + +In the above example, the configuration of the `epoch` processor includes the following fields: + +- `fields`: A list of timestamp field names to be parsed. +- `resolution`: Timestamp precision, supports `s`, `sec`, `second`, `ms`, `millisecond`, `milli`, `us`, `microsecond`, `micro`, `ns`, `nanosecond`, `nano`. Defaults to `ms`. +- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown. + +### `dissect` + +The `dissect` processor is used to split log data fields. Here's an example configuration: + +```yaml +processors: + - dissect: + fields: + - message + patterns: + - '%{key1} %{key2}' + ignore_missing: true + append_separator: '-' +``` + +In the above example, the configuration of the `dissect` processor includes the following fields: + +- `fields`: A list of field names to be split. +- `patterns`: The dissect pattern for splitting. +- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown. +- `append_separator`: Specifies the separator for concatenating multiple fields with same field name together. Defaults to an empty string. See `+` modifier below. + +#### Dissect pattern + +Similar to Logstash's dissect pattern, the dissect pattern consists of `%{key}`, where `%{key}` is a field name. For example: + +``` +"%{key1} %{key2} %{+key3} %{+key4/2} %{key5->} %{?key6} %{*key7} %{&key8}" +``` + +#### Dissect modifiers + +The dissect pattern supports the following modifiers: + +| Modifier | Description | Example | +| ---------- | ---------------------------------------------------- | -------------------- | +| `+` | Concatenates two or more fields together | `%{+key} %{+key}` | +| `+` and `/n` | Concatenates two or more fields in the specified order | `%{+key/2} %{+key/1}` | +| `->` | Ignores any repeating characters on the right side | `%{key1->} %{key2->}` | +| `?` | Ignores matching values | `%{?key}` | +| `*` and `&` | Sets the output key as \* and the output value as & | `%{*key} %{&value}` | + +#### `dissect` examples + +For example, given the following log data: + +``` +"key1 key2 key3 key4 key5 key6 key7 key8" +``` + +Using the following Dissect pattern: + +``` +"%{key1} %{key2} %{+key3} %{+key3/2} %{key5->} %{?key6} %{*key7} %{&key8}" +``` + +The result will be: + +``` +{ + "key1": "key1", + "key2": "key2", + "key3": "key3 key4", + "key5": "key5", + "key7": "key8" +} +``` + +### `gsub` + +The `gsub` processor is used to replace values in log data fields. Here's an example configuration: + +```yaml +processors: + - gsub: + fields: + - message + pattern: 'old' + replacement: 'new' + ignore_missing: true +``` + +In the above example, the configuration of the `gsub` processor includes the following fields: + +- `fields`: A list of field names to be replaced. +- `pattern`: The string to be replaced. Supports regular expressions. 
+- `replacement`: The string to replace with.
+- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown.
+
+### `join`
+
+The `join` processor is used to merge array-type fields in log data. Here's an example configuration:
+
+```yaml
+processors:
+  - join:
+      fields:
+        - message
+      separator: ','
+      ignore_missing: true
+```
+
+In the above example, the configuration of the `join` processor includes the following fields:
+
+- `fields`: A list of field names to be merged. Each listed field must already be of array type. Each field's array is merged on its own; the contents of different fields are not combined.
+- `separator`: The separator for the merged result.
+- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown.
+
+#### `join` example
+
+For example, given the following log data:
+
+```json
+{
+  "message": ["a", "b", "c"]
+}
+```
+
+Using the following configuration:
+
+```yaml
+processors:
+  - join:
+      fields:
+        - message
+      separator: ','
+```
+
+The result will be:
+
+```json
+{
+  "message": "a,b,c"
+}
+```
+
+### `letter`
+
+The `letter` processor is used to convert the case of characters in log data fields. Here's an example configuration:
+
+```yaml
+processors:
+  - letter:
+      fields:
+        - message
+      method: upper
+      ignore_missing: true
+```
+
+In the above example, the configuration of the `letter` processor includes the following fields:
+
+- `fields`: A list of field names to be transformed.
+- `method`: The transformation method, supports `upper`, `lower`, `capital`. Defaults to `lower`.
+- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown.
+
+### `regex`
+
+The `regex` processor is used to perform regular expression matching on log data fields. Here's an example configuration:
+
+```yaml
+processors:
+  - regex:
+      fields:
+        - message
+      pattern: ':(?<id>[0-9])'
+      ignore_missing: true
+```
+
+In the above example, the configuration of the `regex` processor includes the following fields:
+
+- `fields`: A list of field names to be matched.
+- `pattern`: The regular expression pattern to match. Named capture groups are required to extract the corresponding data from the respective field.
+- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown.
+
+#### Rules for named capture groups in regex
+
+The `regex` processor supports the syntax `(?<group-name>...)` to define named capture groups. The data will be processed into the following format:
+
+```json
+{
+  "<field>_<group-name>": "<value>"
+}
+```
+
+For example, if the field name specified in the `regex` processor is `message`, and the corresponding content is `"[ERROR] error message"`, you can set the pattern as `\[(?<level>[A-Z]+)\] (?<content>.+)`, and the data will be processed as:
+
+```json
+{
+  "message_level": "ERROR",
+  "message_content": "error message"
+}
+```
+
+### `urlencoding`
+
+The `urlencoding` processor is used to perform URL encoding/decoding on log data fields.
Here's an example configuration: + +```yaml +processors: + - urlencoding: + fields: + - string_field_a + - string_field_b + method: decode + ignore_missing: true +``` + +In the above example, the configuration of the `urlencoding` processor includes the following fields: + +- `fields`: A list of field names to be encoded. +- `method`: The encoding method, supports `encode`, `decode`. Defaults to `encode`. +- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown. + +### `csv` + +The `csv` processor is used to parse CSV-type fields in log data that do not have a header. Here's an example configuration: + +```yaml +processors: + - csv: + fields: + - message + separator: ',' + quote: '"' + trim: true + ignore_missing: true +``` + +In the above example, the configuration of the `csv` processor includes the following fields: + +- `fields`: A list of field names to be parsed. +- `separator`: The separator. +- `quote`: The quotation mark. +- `trim`: Whether to trim whitespace. Defaults to `false`. +- `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown. + + +## Transform + +Transform is used to convert log data, and its configuration is located under the `transform` field in the YAML file. + +A Transform consists of one or more configurations, and each configuration contains the following fields: + +- `fields`: A list of field names to be transformed. +- `type`: The transformation type. +- `index`: Index type (optional). +- `on_failure`: Handling method for transformation failures (optional). +- `default`: Default value (optional). + +### The `fields` field + +Each field name is a string. When a field name contains `,`, the field will be renamed. For example, `reqTimeSec, req_time_sec` means renaming the `reqTimeSec` field to `req_time_sec`, and the final data will be written to the `req_time_sec` column in GreptimeDB. + +### The `type` field + +GreptimeDB currently provides the following built-in transformation types: + +- `int8`, `int16`, `int32`, `int64`: Integer types. +- `uint8`, `uint16`, `uint32`, `uint64`: Unsigned integer types. +- `float32`, `float64`: Floating-point types. +- `string`: String type. +- `time`: Time type, which will be converted to GreptimeDB `timestamp(9)` type. +- `epoch`: Timestamp type, which will be converted to GreptimeDB `timestamp(n)` type. The value of `n` depends on the precision of the epoch. When the precision is `s`, `n` is 0; when the precision is `ms`, `n` is 3; when the precision is `us`, `n` is 6; when the precision is `ns`, `n` is 9. + +If a field obtains an illegal value during the transformation process, the Pipeline will throw an exception. For example, when converting a string `abc` to an integer, an exception will be thrown because the string is not a valid integer. + +### The `index` field + +The `Pipeline` will write the processed data to the automatically created table in GreptimeDB. To improve query efficiency, GreptimeDB creates indexes for certain columns in the table. The `index` field is used to specify which fields need to be indexed. For information about GreptimeDB column types, please refer to the [Data Model](/user-guide/concepts/data-model.md) documentation. + +GreptimeDB supports the following three types of index for fields: + +- `tag`: Specifies a column as a Tag column. 
+- `fulltext`: Specifies a column to use the fulltext index type. The column must be of string type. +- `timestamp`: Specifies a column as a timestamp index column. + +When `index` field is not provided, GreptimeDB treats the field as a `Field` column. + +In GreptimeDB, a table must include one column of type `timestamp` as the time index column. Therefore, a Pipeline can have only one time index column. + +#### The Timestamp column + +Specify which field is the timestamp index column using `index: timestamp`. Refer to the [Transform Example](#transform-example) below for syntax. + +#### The Tag column + +Specify which field is the Tag column using `index: tag`. Refer to the [Transform Example](#transform-example) below for syntax. + +#### The Fulltext column + +Specify which field will be used for full-text search using `index: fulltext`. This index greatly improves the performance of [log search](./log-query.md). Refer to the [Transform Example](#transform-example) below for syntax. + +### The `on_failure` field + +The `on_failure` field is used to specify the handling method when a transformation fails. It supports the following methods: + +- `ignore`: Ignore the failed field and do not write it to the database. +- `default`: Write the default value. The default value is specified by the `default` field. + +### The `default` field + +The `default` field is used to specify the default value when a transformation fails. + +### Transform Example + +For example, with the following log data: + +```json +{ + "num_field_a": "3", + "string_field_a": "john", + "string_field_b": "It was snowing when he was born.", + "time_field_a": 1625760000 +} +``` + +Using the following configuration: + +```yaml +transform: + - fields: + - string_field_a, name + type: string + index: tag + - fields: + - num_field_a, age + type: int32 + - fields: + - string_field_b, description + type: string + index: fulltext + - fields: + - time_field_a, bron_time + type: epoch, s + index: timestamp +``` + +The result will be: + +``` +{ + "name": "john", + "age": 3, + "description": "It was snowing when he was born.", + "bron_time": 2021-07-08 16:00:00 +} +``` \ No newline at end of file diff --git a/docs/nightly/en/user-guide/log/manage-pipeline.md b/docs/nightly/en/user-guide/log/manage-pipeline.md new file mode 100644 index 000000000..72cca643e --- /dev/null +++ b/docs/nightly/en/user-guide/log/manage-pipeline.md @@ -0,0 +1,90 @@ +# Managing Pipelines + +In GreptimeDB, each `pipeline` is a collection of data processing units used for parsing and transforming the ingested log content. This document provides guidance on creating and deleting pipelines to efficiently manage the processing flow of log data. + + +For specific pipeline configurations, please refer to the [Pipeline Configuration](log-pipeline.md) documentation. + +## Create a Pipeline + +GreptimeDB provides a dedicated HTTP interface for creating pipelines. +Assuming you have prepared a pipeline configuration file `pipeline.yaml`, use the following command to upload the configuration file, where `test` is the name you specify for the pipeline: + +```shell +## Upload the pipeline file. 
'test' is the name of the pipeline +curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml" +``` + +## Delete a Pipeline + +You can use the following HTTP interface to delete a pipeline: + +```shell +## 'test' is the name of the pipeline +curl -X "DELETE" "http://localhost:4000/v1/events/pipelines/test?version=2024-06-27%2012%3A02%3A34.257312110Z" +``` + +In the above example, we deleted a pipeline named `test`. The `version` parameter is required to specify the version of the pipeline to be deleted. + +## Query Pipelines + +Currently, you can use SQL to query pipeline information. + +```sql +SELECT * FROM greptime_private.pipelines; +``` + +Please note that if you are using the MySQL or PostgreSQL protocol to connect to GreptimeDB, the precision of the pipeline time information may vary, and nanosecond-level precision may be lost. + +To address this issue, you can cast the `created_at` field to a timestamp to view the pipeline's creation time. For example, the following query displays `created_at` in `bigint` format: + +```sql +SELECT name, pipeline, created_at::bigint FROM greptime_private.pipelines; +``` + +The query result is as follows: + +``` + name | pipeline | greptime_private.pipelines.created_at +------+-----------------------------------+--------------------------------------- + test | processors: +| 1719489754257312110 + | - date: +| + | field: time +| + | formats: +| + | - "%Y-%m-%d %H:%M:%S%.3f"+| + | ignore_missing: true +| + | +| + | transform: +| + | - fields: +| + | - id1 +| + | - id2 +| + | type: int32 +| + | - fields: +| + | - type +| + | - logger +| + | type: string +| + | index: tag +| + | - fields: +| + | - log +| + | type: string +| + | index: fulltext +| + | - field: time +| + | type: time +| + | index: timestamp +| + | | +(1 row) +``` + +Then, you can use a program to convert the bigint type timestamp from the SQL result into a time string. + +```shell +timestamp_ns="1719489754257312110"; readable_timestamp=$(TZ=UTC date -d @$((${timestamp_ns:0:10}+0)) +"%Y-%m-%d %H:%M:%S").${timestamp_ns:10}Z; echo "Readable timestamp (UTC): $readable_timestamp" +``` + +Output: + +```shell +Readable timestamp (UTC): 2024-06-27 12:02:34.257312110Z +``` + +The output `Readable timestamp (UTC)` represents the creation time of the pipeline and also serves as the version number. \ No newline at end of file diff --git a/docs/nightly/en/user-guide/log/overview.md b/docs/nightly/en/user-guide/log/overview.md new file mode 100644 index 000000000..5e8ef1ae9 --- /dev/null +++ b/docs/nightly/en/user-guide/log/overview.md @@ -0,0 +1,6 @@ +# Overview + +- [Quick Start](./quick-start.md): Provides an introduction on how to quickly get started with GreptimeDB log service. +- [Pipeline Configuration](./log-pipeline.md): Provides in-depth information on each specific configuration of pipelines in GreptimeDB. +- [Managing Pipelines](./manage-pipeline.md): Explains how to create and delete pipelines. +- [Writing Logs with Pipelines](./write-log.md): Provides detailed instructions on efficiently writing log data by leveraging the pipeline mechanism. 
\ No newline at end of file diff --git a/docs/nightly/en/user-guide/log/quick-start.md b/docs/nightly/en/user-guide/log/quick-start.md new file mode 100644 index 000000000..343847efc --- /dev/null +++ b/docs/nightly/en/user-guide/log/quick-start.md @@ -0,0 +1,185 @@ +# Quick Start + + +## Download and install & start GreptimeDB + +Follow the [Installation Guide](/getting-started/overview.md) to install and start GreptimeDB. + +## Create a Pipeline + +GreptimeDB provides a dedicated HTTP interface for creating Pipelines. Here's how to do it: + +First, create a Pipeline file, for example, `pipeline.yaml`. + +```yaml +# pipeline.yaml +processors: + - date: + field: time + formats: + - "%Y-%m-%d %H:%M:%S%.3f" + ignore_missing: true + +transform: + - fields: + - id1 + - id2 + type: int32 + - fields: + - type + - logger + type: string + index: tag + - fields: + - log + type: string + index: fulltext + - field: time + type: time + index: timestamp +``` + +Then, execute the following command to upload the configuration file: + +```shell +## Upload the pipeline file. "test" is the name of the Pipeline +curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml" +``` + +After the successful execution of this command, a Pipeline named `test` will be created, and the result will be returned as: `{"name":"test","version":"2024-06-27 12:02:34.257312110Z"}`. +Here, `name` is the name of the Pipeline, and `version` is the Pipeline version. + +This Pipeline includes one Processor and three Transforms. The Processor uses the Rust time format string `%Y-%m-%d %H:%M:%S%.3f` to parse the timestamp field in the logs, and then the Transforms convert the `id1` and `id2` fields to `int32` type, the `type` and `logger` fields to `string` type with an index of "tag", the `log` field to `string` type with an index of "fulltext", and the `time` field to a time type with an index of "timestamp". + +Refer to the [Pipeline Introduction](log-pipeline.md) for specific syntax details. + +## Query Pipelines + +You can use SQL to query the pipeline content stored in the database. 
The example query is as follows: + +```sql +SELECT * FROM greptime_private.pipelines; +``` + +The query result is as follows: + +```sql + name | schema | content_type | pipeline | created_at +------+--------+--------------+-----------------------------------+---------------------------- + test | public | yaml | processors: +| 2024-06-27 12:02:34.257312 + | | | - date: +| + | | | field: time +| + | | | formats: +| + | | | - "%Y-%m-%d %H:%M:%S%.3f"+| + | | | ignore_missing: true +| + | | | +| + | | | transform: +| + | | | - fields: +| + | | | - id1 +| + | | | - id2 +| + | | | type: int32 +| + | | | - fields: +| + | | | - type +| + | | | - logger +| + | | | type: string +| + | | | index: tag +| + | | | - fields: +| + | | | - log +| + | | | type: string +| + | | | index: fulltext +| + | | | - field: time +| + | | | type: time +| + | | | index: timestamp +| + | | | | +(1 row) +``` + +## Write logs + +The HTTP interface for writing logs is as follows: + +```shell +curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \ + -H 'Content-Type: application/json' \ + -d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"} +{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"} +{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"} +{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}' +``` + +The above command returns the following result: + +```json +{"output":[{"affectedrows":4}],"execution_time_ms":22} +``` + +In the above example, we successfully wrote 4 log entries to the `public.logs` table. + +Please refer to [Writing Logs with Pipeline](write-log.md) for specific syntax for writing logs. + +## `logs` table structure + +We can use SQL to query the structure of the `public.logs` table. + +```sql +DESC TABLE logs; +``` + +The query result is as follows: + +```sql + Column | Type | Key | Null | Default | Semantic Type +--------+---------------------+-----+------+---------+--------------- + id1 | Int32 | | YES | | FIELD + id2 | Int32 | | YES | | FIELD + type | String | PRI | YES | | TAG + logger | String | PRI | YES | | TAG + log | String | | YES | | FIELD + time | TimestampNanosecond | PRI | NO | | TIMESTAMP +(6 rows) +``` + +From the above result, we can see that based on the processed result of the pipeline, the `public.logs` table contains 6 fields: `id1` and `id2` are converted to the `Int32` type, `type`, `log`, and `logger` are converted to the `String` type, and time is converted to a `TimestampNanosecond` type and indexed as Timestamp. + +## Query logs + +We can use standard SQL to query log data. 
+ +```shell +# Connect to GreptimeDB using MySQL or PostgreSQL protocol + +# MySQL +mysql --host=127.0.0.1 --port=4002 public + +# PostgreSQL +psql -h 127.0.0.1 -p 4003 -d public +``` + +You can query the log table using SQL: + +```sql +SELECT * FROM public.logs; +``` + +The query result is as follows: + +```sql + id1 | id2 | type | logger | log | time +------+------+------+------------------+--------------------------------------------+---------------------------- + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | +(4 rows) +``` + +As you can see, the logs have been stored as structured logs after applying type conversions using the pipeline. This provides convenience for further querying and analysis of the logs. + +## Conclusion + +By following the above steps, you have successfully created a pipeline, written logs, and performed queries. This is just the tip of the iceberg in terms of the capabilities offered by GreptimeDB. +Next, please continue reading [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) to learn more about advanced features and best practices. \ No newline at end of file diff --git a/docs/nightly/en/user-guide/log/write-log.md b/docs/nightly/en/user-guide/log/write-log.md new file mode 100644 index 000000000..fb80b1423 --- /dev/null +++ b/docs/nightly/en/user-guide/log/write-log.md @@ -0,0 +1,31 @@ +# Writing Logs Using a Pipeline + +This document describes how to write logs to GreptimeDB by processing them through a specified pipeline using the HTTP interface. + +Before writing logs, please read the [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) documents to complete the configuration setup and upload. + +## HTTP API + +You can use the following command to write logs via the HTTP interface: + +```shell +curl -X "POST" "http://localhost:4000/v1/events/logs?db=&table=&pipeline_name=" \ + -H 'Content-Type: application/json' \ + -d "$" +``` + +## Query parameters + +This interface accepts the following parameters: + +- `db`: The name of the database. +- `table`: The name of the table. +- `pipeline_name`: The name of the [pipeline](./log-pipeline.md). + +## Body data format + +The request body supports NDJSON and JSON Array formats, where each JSON object represents a log entry. + +## Example + +Please refer to the "Writing Logs" section in the [Quick Start](quick-start.md#write-logs) guide for an example. 
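+
+For quick reference, here is a minimal sketch of the same call using the JSON Array body format. The `test` pipeline, the `public.logs` table, and the field values are the ones from the Quick Start guide:
+
+```shell
+# Write two log entries in one request using a JSON Array body.
+# Assumes the `test` pipeline and the `public.logs` table from the Quick Start already exist.
+curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
+  -H 'Content-Type: application/json' \
+  -d $'[
+    {"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"},
+    {"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
+  ]'
+```
+
+On success, the response reports the number of rows written, as shown in the Quick Start guide.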
\ No newline at end of file diff --git a/docs/nightly/zh/user-guide/log/log-pipeline.md b/docs/nightly/zh/user-guide/log/log-pipeline.md new file mode 100644 index 000000000..8e2f1e513 --- /dev/null +++ b/docs/nightly/zh/user-guide/log/log-pipeline.md @@ -0,0 +1,444 @@ +# Pipeline 配置 + +Pipeline 是 GreptimeDB 中对 log 数据进行转换的一种机制, 由一个唯一的名称和一组配置规则组成,这些规则定义了如何对日志数据进行格式化、拆分和转换。目前我们支持 JSON(`application/json`)和纯文本(`text/plain`)格式的日志数据作为输入。 + +这些配置以 YAML 格式提供,使得 Pipeline 能够在日志写入过程中,根据设定的规则对数据进行处理,并将处理后的数据存储到数据库中,便于后续的结构化查询。 + +## 整体结构 + +Pipeline 由两部分组成:Processors 和 Transform,这两部分均为数组形式。一个 Pipeline 配置可以包含多个 Processor 和多个 Transform。Transform 所描述的数据类型会决定日志数据保存到数据库时的表结构。 + +- Processor 用于对 log 数据进行预处理,例如解析时间字段,替换字段等。 +- Transform 用于对 log 数据进行格式转换,例如将字符串类型转换为数字类型。 + +一个包含 Processor 和 Transform 的简单配置示例如下: + +```yaml +processors: + - urlencoding: + fields: + - string_field_a + - string_field_b + method: decode + ignore_missing: true +transform: + - fields: + - string_field_a + - string_field_b + type: string + # 写入的数据必须包含 timestamp 字段 + - fields: + - reqTimeSec, req_time_sec + # epoch 是特殊字段类型,必须指定精度 + type: epoch, ms + index: timestamp +``` + +## Processor + +Processor 用于对 log 数据进行预处理,其配置位于 YAML 文件中的 `processors` 字段下。 +Pipeline 会按照多个 Processor 的顺序依次加工数据,每个 Processor 都依赖于上一个 Processor 处理的结果。 +Processor 由一个 name 和多个配置组成,不同类型的 Processor 配置有不同的字段。 + +我们目前内置了以下几种 Processor: + +- `date`: 用于解析格式化的时间字符串字段,例如 `2024-07-12T16:18:53.048`。 +- `epoch`: 用于解析数字时间戳字段,例如 `1720772378893`。 +- `dissect`: 用于对 log 数据字段进行拆分。 +- `gsub`: 用于对 log 数据字段进行替换。 +- `join`: 用于对 log 中的 array 类型字段进行合并。 +- `letter`: 用于对 log 数据字段进行字母转换。 +- `regex`: 用于对 log 数据字段进行正则匹配。 +- `urlencoding`: 用于对 log 数据字段进行 URL 编解码。 +- `csv`: 用于对 log 数据字段进行 CSV 解析。 + +### `date` + +`date` Processor 用于解析时间字段。示例配置如下: + +```yaml +processors: + - date: + fields: + - time + formats: + - '%Y-%m-%d %H:%M:%S%.3f' + ignore_missing: true + timezone: 'Asia/Shanghai' +``` + +如上所示,`date` Processor 的配置包含以下字段: + +- `fields`: 需要解析的时间字段名列表。 +- `formats`: 时间格式化字符串,支持多个时间格式化字符串。按照提供的顺序尝试解析,直到解析成功。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 +- `timezone`: 时区。使用[tz_database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) 中的时区标识符来指定时区。默认为 `UTC`。 + +### `epoch` + +`epoch` Processor 用于解析时间戳字段,示例配置如下: + +```yaml +processors: + - epoch: + fields: + - reqTimeSec + resolution: millisecond + ignore_missing: true +``` + +如上所示,`epoch` Processor 的配置包含以下字段: + +- `fields`: 需要解析的时间戳字段名列表。 +- `resolution`: 时间戳精度,支持 `s`, `sec` , `second` , `ms`, `millisecond`, `milli`, `us`, `microsecond`, `micro`, `ns`, `nanosecond`, `nano`。默认为 `ms`。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + +### `dissect` + +`dissect` Processor 用于对 log 数据字段进行拆分,示例配置如下: + +```yaml +processors: + - dissect: + fields: + - message + patterns: + - '%{key1} %{key2}' + ignore_missing: true + append_separator: '-' +``` + +如上所示,`dissect` Processor 的配置包含以下字段: + +- `fields`: 需要拆分的字段名列表。 +- `patterns`: 拆分的 dissect 模式。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 +- `append_separator`: 对于多个追加到一起的字段,指定连接符。默认是一个空字符串。 + +#### Dissect 模式 + +和 Logstash 的 Dissect 模式类似,Dissect 模式由 `%{key}` 组成,其中 `%{key}` 为一个字段名。例如: + +``` +"%{key1} %{key2} %{+key3} %{+key4/2} %{key5->} %{?key6} %{*key7} %{&key8}" +``` + +#### Dissect 修饰符 + +Dissect 模式支持以下修饰符: + +| 修饰符 | 说明 | 示例 | +| ----------- | ---------------------------------------- | --------------------- | +| `+` | 将两个或多个字段追加到一起 | `%{+key} %{+key}` | +| `+` 和 `/n` | 按照指定的顺序将两个或多个字段追加到一起 | `%{+key/2} 
%{+key/1}` | +| `->` | 忽略右侧的任何重复字符 | `%{key1->} %{key2->}` | +| `?` | 忽略匹配的值 | `%{?key}` | +| `*` 和 `&` | 将输出键设置为 \*,输出值设置为 &。 | `%{*key} %{&value}` | + +#### `dissect` 示例 + +例如,对于以下 log 数据: + +``` +"key1 key2 key3 key4 key5 key6 key7 key8" +``` + +使用以下 Dissect 模式: + +``` +"%{key1} %{key2} %{+key3} %{+key3/2} %{key5->} %{?key6} %{*key7} %{&key8}" +``` + +将得到以下结果: + +``` +{ + "key1": "key1", + "key2": "key2", + "key3": "key3 key4", + "key5": "key5", + "key7": "key8" +} +``` + +### `gsub` + +`gsub` Processor 用于对 log 数据字段进行替换,示例配置如下: + +```yaml +processors: + - gsub: + fields: + - message + pattern: 'old' + replacement: 'new' + ignore_missing: true +``` + +如上所示,`gsub` Processor 的配置包含以下字段: + +- `fields`: 需要替换的字段名列表。 +- `pattern`: 需要替换的字符串。支持正则表达式。 +- `replacement`: 替换后的字符串。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + +### `join` + +`join` Processor 用于对 log 中的 Array 类型字段进行合并,示例配置如下: + +```yaml +processors: + - join: + fields: + - message + separator: ',' + ignore_missing: true +``` + +如上所示,`join` Processor 的配置包含以下字段: + +- `fields`: 需要合并的字段名列表。注意,这里每行字段的值需要是 Array 类型,每行字段会单独合并自己数组内的值,所有行的字段不会合并到一起。 +- `separator`: 合并后的分隔符。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + +#### `join` 示例 + +例如,对于以下 log 数据: + +```json +{ + "message": ["a", "b", "c"] +} +``` + +使用以下配置: + +```yaml +processors: + - join: + fields: + - message + separator: ',' +``` + +将得到以下结果: + +```json +{ + "message": "a,b,c" +} +``` + +### `letter` + +`letter` Processor 用于对 log 数据字段进行字母转换,示例配置如下: + +```yaml +processors: + - letter: + fields: + - message + method: upper + ignore_missing: true +``` + +如上所示,`letter` Processor 的配置包含以下字段: + +- `fields`: 需要转换的字段名列表。 +- `method`: 转换方法,支持 `upper`, `lower` ,`capital`。默认为 `lower`。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + +### `regex` + +`regex` Processor 用于对 log 数据字段进行正则匹配,示例配置如下: + +```yaml +processors: + - regex: + fields: + - message + pattern: ':(?[0-9])' + ignore_missing: true +``` + +如上所示,`regex` Processor 的配置包含以下字段: + +- `fields`: 需要匹配的字段名列表。 +- `pattern`: 要进行匹配的正则表达式,需要使用命名捕获组才可以从对应字段中取出对应数据。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + +#### regex 命名捕获组的规则 + +`regex` Processor 支持使用 `(?...)` 的语法来命名捕获组,最终将数据处理为这种形式: + +```json +{ + "_": "" +} +``` + +例如 `regex` Processor 中 field 填写的字段名为 `message`,对应的内容为 `"[ERROR] error message"`, +你可以将 pattern 设置为 `\[(?[A-Z]+)\] (?.+)`, +最终数据会被处理为: +```json +{ + "message_level": "ERROR", + "message_content": "error message" +} +``` + +### `urlencoding` + +`urlencoding` Processor 用于对 log 数据字段进行 URL 编码,示例配置如下: + +```yaml +processors: + - urlencoding: + fields: + - string_field_a + - string_field_b + method: decode + ignore_missing: true +``` + +如上所示,`urlencoding` Processor 的配置包含以下字段: + +- `fields`: 需要编码的字段名列表。 +- `method`: 编码方法,支持 `encode`, `decode`。默认为 `encode`。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + +### `csv` + +`csv` Processor 用于对 log 数据中没有携带 header 的 CSV 类型字段解析,示例配置如下: + +```yaml +processors: + - csv: + fields: + - message + separator: ',' + quote: '"' + trim: true + ignore_missing: true +``` + +如上所示,`csv` Processor 的配置包含以下字段: + +- `fields`: 需要解析的字段名列表。 +- `separator`: 分隔符。 +- `quote`: 引号。 +- `trim`: 是否去除空格。默认为 `false`。 +- `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 + + +## Transform + +Transform 用于对 log 数据进行转换,其配置位于 YAML 文件中的 `transform` 字段下。 + +Transform 由一个或多个配置组成,每个配置包含以下字段: + +- `fields`: 需要转换的字段名列表。 +- `type`: 转换类型 +- `index`: 索引类型(可选) +- `on_failure`: 
转换失败时的处理方式(可选) +- `default`: 默认值(可选) + +### `fields` 字段 + +每个字段名都是一个字符串,当字段名称包含 `,` 时,会进行字段重命名。例如,`reqTimeSec, req_time_sec` 表示将 `reqTimeSec` 字段重命名为 `req_time_sec`, +最终数据将被写入到 GreptimeDB 的 `req_time_sec` 列。 + +### `type` 字段 + +GreptimeDB 目前内置了以下几种转换类型: + +- `int8`, `int16`, `int32`, `int64`: 整数类型。 +- `uint8`, `uint16`, `uint32`, `uint64`: 无符号整数类型。 +- `float32`, `float64`: 浮点数类型。 +- `string`: 字符串类型。 +- `time`: 时间类型。将被转换为 GreptimeDB `timestamp(9)` 类型。 +- `epoch`: 时间戳类型。将被转换为 GreptimeDB `timestamp(n)` 类型。n 为时间戳精度,n 的值视 epoch 精度而定。当精度为 `s` 时,n 为 0;当精度为 `ms` 时,n 为 3;当精度为 `us` 时,n 为 6;当精度为 `ns` 时,n 为 9。 + +如果字段在转换过程中获得了非法值,Pipeline 将会抛出异常。例如将一个字符串 `abc` 转换为整数时,由于该字符串不是一个合法的整数,Pipeline 将会抛出异常。 + +### `index` 字段 + +`Pipeline` 会将处理后的数据写入到 GreptimeDB 自动创建的数据表中。为了提高查询效率,GreptimeDB 会为表中的某些列创建索引。`index` 字段用于指定哪些字段需要被索引。关于 GreptimeDB 的列类型,请参考[数据模型](/user-guide/concepts/data-model.md)文档。 + +GreptimeDB 支持以下三种字段的索引类型: + +- `tag`: 用于指定某列为 Tag 列 +- `fulltext`: 用于指定某列使用 fulltext 类型的索引,该列需要是字符串类型 +- `timestamp`: 用于指定某列是时间索引列 + +不提供 `index` 字段时,GreptimeDB 会将该字段作为 `Field` 列。 + +在 GreptimeDB 中,一张表里必须包含一个 `timestamp` 类型的列作为该表的时间索引列,因此一个 Pipeline 有且只有一个时间索引列。 + +#### 时间戳列 + +通过 `index: timestamp` 指定哪个字段是时间索引列,写法请参考下方的 [Transform 示例](#transform-示例)。 + +#### Tag 列 + +通过 `index: tag` 指定哪个字段是 Tag 列,写法请参考下方的 [Transform 示例](#transform-示例)。 + +#### Fulltext 列 + +通过 `index: fulltext` 指定哪个字段将会被用于全文搜索,该索引可大大提升 [日志搜索](./log-query.md) 的性能,写法请参考下方的 [Transform 示例](#transform-示例)。 + +### `on_failure` 字段 + +`on_failure` 字段用于指定转换失败时的处理方式,支持以下几种方式: + +- `ignore`: 忽略转换失败的字段,不写入数据库。 +- `default`: 写入默认值。默认值由 `default` 字段指定。 + +### `default` 字段 + +`default` 字段用于指定转换失败时的默认值。 + +### Transform 示例 + +例如,对于以下 log 数据: + +```json +{ + "num_field_a": "3", + "string_field_a": "john", + "string_field_b": "It was snowing when he was born.", + "time_field_a": 1625760000 +} +``` + +使用以下配置: + +```yaml +transform: + - fields: + - string_field_a, name + type: string + index: tag + - fields: + - num_field_a, age + type: int32 + - fields: + - string_field_b, description + type: string + index: fulltext + - fields: + - time_field_a, bron_time + type: epoch, s + index: timestamp +``` + +将得到以下结果: + +``` +{ + "name": "john", + "age": 3, + "description": "It was snowing when he was born.", + "bron_time": 2021-07-08 16:00:00 +} +``` \ No newline at end of file diff --git a/docs/nightly/zh/user-guide/log/manage-pipeline.md b/docs/nightly/zh/user-guide/log/manage-pipeline.md new file mode 100644 index 000000000..dfef4bfa8 --- /dev/null +++ b/docs/nightly/zh/user-guide/log/manage-pipeline.md @@ -0,0 +1,90 @@ +# 管理 Pipeline + +在 GreptimeDB 中,每个 `pipeline` 是一个数据处理单元集合,用于解析和转换写入的日志内容。本文档旨在指导您如何创建和删除 Pipeline,以便高效地管理日志数据的处理流程。 + + +有关 Pipeline 的具体配置,请阅读 [Pipeline 配置](log-pipeline.md)。 + +## 创建 Pipeline + +GreptimeDB 提供了专用的 HTTP 接口用于创建 Pipeline。 +假设你已经准备好了一个 Pipeline 配置文件 pipeline.yaml,使用以下命令上传配置文件,其中 `test` 是你指定的 Pipeline 的名称: + +```shell +## 上传 pipeline 文件。test 为 Pipeline 的名称 +curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml" +``` + +## 删除 Pipeline + +可以使用以下 HTTP 接口删除 Pipeline: + +```shell +## test 为 Pipeline 的名称 +curl -X "DELETE" "http://localhost:4000/v1/events/pipelines/test?version=2024-06-27%2012%3A02%3A34.257312110Z" +``` + +上面的例子中,我们删除了一个名为 `test` 的 Pipeline。`version` 参数是必须的,用于指定要删除的 Pipeline 的版本号。 + +## 查询 Pipeline + +目前可以使用 SQL 来查询 Pipeline 的信息。 + +```sql +SELECT * FROM greptime_private.pipelines; +``` + +请注意,如果您使用 MySQL 或者 PostgreSQL 协议作为连接 GreptimeDB 的方式,查询出来的 Pipeline 时间信息精度可能有所不同,可能会丢失纳秒级别的精度。 + +为了解决这个问题,可以将 
`created_at` 字段强制转换为 timestamp 来查看 Pipeline 的创建时间。例如,下面的查询将 `created_at` 以 `bigint` 的格式展示: + +```sql +SELECT name, pipeline, created_at::bigint FROM greptime_private.pipelines; +``` + +查询结果如下: + +``` + name | pipeline | greptime_private.pipelines.created_at +------+-----------------------------------+--------------------------------------- + test | processors: +| 1719489754257312110 + | - date: +| + | field: time +| + | formats: +| + | - "%Y-%m-%d %H:%M:%S%.3f"+| + | ignore_missing: true +| + | +| + | transform: +| + | - fields: +| + | - id1 +| + | - id2 +| + | type: int32 +| + | - fields: +| + | - type +| + | - logger +| + | type: string +| + | index: tag +| + | - fields: +| + | - log +| + | type: string +| + | index: fulltext +| + | - field: time +| + | type: time +| + | index: timestamp +| + | | +(1 row) +``` + +然后可以使用程序将 SQL 结果中的 bigint 类型的时间戳转换为时间字符串。 + +```shell +timestamp_ns="1719489754257312110"; readable_timestamp=$(TZ=UTC date -d @$((${timestamp_ns:0:10}+0)) +"%Y-%m-%d %H:%M:%S").${timestamp_ns:10}Z; echo "Readable timestamp (UTC): $readable_timestamp" +``` + +输出: + +```shell +Readable timestamp (UTC): 2024-06-27 12:02:34.257312110Z +``` + +输出的 `Readable timestamp (UTC)` 即为 Pipeline 的创建时间同时也是版本号。 \ No newline at end of file diff --git a/docs/nightly/zh/user-guide/log/overview.md b/docs/nightly/zh/user-guide/log/overview.md new file mode 100644 index 000000000..22d08d2a0 --- /dev/null +++ b/docs/nightly/zh/user-guide/log/overview.md @@ -0,0 +1,6 @@ +# 概述 + +- [快速开始](./quick-start.md):介绍了如何快速开始使用 GreptimeDB 日志服务。 +- [Pipeline 配置](./log-pipeline.md):深入介绍 GreptimeDB 中的 Pipeline 的每项具体配置。 +- [管理 Pipeline](./manage-pipeline.md):介绍了如何创建、删除 Pipeline。 +- [配合 Pipeline 写入日志](./write-log.md): 详细说明了如何结合 Pipeline 机制高效写入日志数据。 \ No newline at end of file diff --git a/docs/nightly/zh/user-guide/log/quick-start.md b/docs/nightly/zh/user-guide/log/quick-start.md new file mode 100644 index 000000000..da2fa5726 --- /dev/null +++ b/docs/nightly/zh/user-guide/log/quick-start.md @@ -0,0 +1,186 @@ +# 快速开始 + + +## 下载并安装 GreptimeDB & 启动 GreptimeDB + +请遵循[安装指南](/getting-started/overview.md) 来安装并启动 GreptimeDB。 + +## 创建 Pipeline + +GreptimeDB 提供了专门的 HTTP 接口用于创建 Pipeline,具体操作如下: + +首先创建一个 Pipeline 文件,例如 `pipeline.yaml`。 + +```yaml +# pipeline.yaml +processors: + - date: + field: time + formats: + - "%Y-%m-%d %H:%M:%S%.3f" + ignore_missing: true + +transform: + - fields: + - id1 + - id2 + type: int32 + - fields: + - type + - logger + type: string + index: tag + - fields: + - log + type: string + index: fulltext + - field: time + type: time + index: timestamp + +然后执行以下命令上传配置文件: + +```shell +## 上传 pipeline 文件。test 为 Pipeline 的名称 +curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml" +``` + +该命令执行成功后,会创建了一个名为 `test` 的 Pipeline,并返回结果:`{"name":"test","version":"2024-06-27 12:02:34.257312110Z"}`。 +其中 `name` 为 Pipeline 名称,`version` 为 Pipeline 版本号。 + +此 Pipeline 包含一个 Processor 和三个 Transform。Processor 使用了 Rust 的时间格式化字符串 `%Y-%m-%d %H:%M:%S%.3f` 解析日志中的 timestamp 字段,然后 Transform 将 id1 和 id2 字段转换为 int32 类型,将 level、content、logger 字段转换为 string 类型,最后将 timestamp 字段转换为时间类型,并将其设置为 Timestamp 索引。 + +请参考 [Pipeline 介绍](log-pipeline.md)查看具体的语法。 + + + +## 查询 Pipeline + +可以使用 SQL 查询保存在数据库中的 pipeline 内容,请求示例如下: + +```sql +SELECT * FROM greptime_private.pipelines; +``` + +查询结果如下: + +```sql + name | schema | content_type | pipeline | created_at +------+--------+--------------+-----------------------------------+---------------------------- + test | public | yaml | processors: +| 2024-06-27 12:02:34.257312 
+ | | | - date: +| + | | | field: time +| + | | | formats: +| + | | | - "%Y-%m-%d %H:%M:%S%.3f"+| + | | | ignore_missing: true +| + | | | +| + | | | transform: +| + | | | - fields: +| + | | | - id1 +| + | | | - id2 +| + | | | type: int32 +| + | | | - fields: +| + | | | - type +| + | | | - logger +| + | | | type: string +| + | | | index: tag +| + | | | - fields: +| + | | | - log +| + | | | type: string +| + | | | index: fulltext +| + | | | - field: time +| + | | | type: time +| + | | | index: timestamp +| + | | | | +(1 row) +``` + +## 写入日志 + +可以使用 HTTP 接口写入日志,请求示例如下: + +```shell +curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \ + -H 'Content-Type: application/json' \ + -d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"} +{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"} +{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"} +{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}' +``` +上述命令返回结果如下: + +```json +{"output":[{"affectedrows":4}],"execution_time_ms":22} +``` + +上面的例子中,我们向 `public.logs` 表中成功写入了 4 条日志。 + +请参考[配合 Pipeline 写入日志](write-log.md)获取具体的日志写入语法。 + +## `logs` 表结构 + +我们可以使用 SQL 查询来查看 `public.logs` 表的结构。 + +```sql +DESC TABLE logs; +``` + +查询结果如下: + +```sql + Column | Type | Key | Null | Default | Semantic Type +--------+---------------------+-----+------+---------+--------------- + id1 | Int32 | | YES | | FIELD + id2 | Int32 | | YES | | FIELD + type | String | PRI | YES | | TAG + logger | String | PRI | YES | | TAG + log | String | | YES | | FIELD + time | TimestampNanosecond | PRI | NO | | TIMESTAMP +(6 rows) +``` + +从上述结果可以看出,根据 Pipeline 处理后的结果,`public.logs` 表包含了 6 个字段:id1 和 id2 都被转换为 int32 类型,type、log、logger 都被转换为 string 类型,time 被转换为时间戳类型,并且设置为 Timestamp 索引。 + +## 查询日志 + +就像任何其他数据一样,我们可以使用标准 SQL 来查询日志数据。 + +```shell +# 使用 MySQL 或者 PostgreSQL 协议连接 GreptimeDB + +# mysql +mysql --host=127.0.0.1 --port=4002 public + +# postgresql +psql -h 127.0.0.1 -p 4003 -d public +``` + +可通过 SQL 查询日志表: + +```sql +SELECT * FROM public.logs; +``` + +查询结果如下: + +```sql + + id1 | id2 | type | logger | log | time +------+------+------+------------------+--------------------------------------------+---------------------------- + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | + 2436 | 2528 | I | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000 + | | | | | +(4 rows) +``` + +可以看出,通过 Pipeline 将 Log 进行类型转换后存储为结构化的日志,为日志的进一步查询和分析带来了便利。 + +## 结语 + +通过以上步骤,您已经成功创建了 Pipeline,写入日志并进行了查询。这只是 GreptimeDB 提供功能的冰山一角。 +接下来请继续阅读 [Pipeline 配置](log-pipeline.md)和[管理 Pipeline](manage-pipeline.md) 来了解更多高级特性和最佳实践。 \ No newline at end of file diff --git a/docs/nightly/zh/user-guide/log/write-log.md b/docs/nightly/zh/user-guide/log/write-log.md new file mode 100644 index 000000000..7e392d661 --- /dev/null +++ 
b/docs/nightly/zh/user-guide/log/write-log.md @@ -0,0 +1,33 @@ +# 使用 Pipeline 写入日志 + +本文档介绍如何通过 HTTP 接口使用指定的 Pipeline 进行处理后将日志写入 GreptimeDB。 + +在写入日志之前,请先阅读 [Pipeline 配置](log-pipeline.md)和[管理 Pipeline](manage-pipeline.md) 完成配置的设定和上传。 + +## HTTP API + +您可以使用以下命令通过 HTTP 接口写入日志: + +```shell +curl -X "POST" "http://localhost:4000/v1/events/logs?db=&table=&pipeline_name=" \ + -H 'Content-Type: application/json' \ + -d "$" +``` + + +## Query 参数 + +此接口接受以下参数: + +- `db`:数据库名称。 +- `table`:表名称。 +- `pipeline_name`:[Pipeline](./log-pipeline.md) 名称。 + +## Body 数据格式 + +请求体支持 NDJSON 和 JSON Array 格式,其中每个 JSON 对象代表一条日志记录。 + + +## 示例 + +请参考快速开始中的[写入日志](quick-start.md#写入日志)部分。 \ No newline at end of file
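+
+作为参考,下面给出一个使用 JSON Array 格式请求体的简单示例,其中名为 `test` 的 Pipeline、`public.logs` 表以及示例数据均沿用快速开始中的内容:
+
+```shell
+# 使用 JSON Array 格式的请求体一次写入两条日志
+# 假设已按照快速开始创建了 test Pipeline,数据将写入 public.logs 表
+curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
+  -H 'Content-Type: application/json' \
+  -d $'[
+    {"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"},
+    {"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
+  ]'
+```
+
+写入成功后,接口会像快速开始中一样返回受影响的行数。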