diff --git a/.gitbook/assets/FluentBitDocumentation-01-01.png b/.gitbook/assets/FluentBitDocumentation-01-01.png new file mode 100644 index 000000000..c440b9f1b Binary files /dev/null and b/.gitbook/assets/FluentBitDocumentation-01-01.png differ diff --git a/README.md b/README.md index 09fc3778e..49bef07be 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ description: High Performance Log and Metrics Processor # Fluent Bit v2.1 Documentation -![](.gitbook/assets/logo\_documentation\_2.1.png) +
[Fluent Bit](http://fluentbit.io) is a Fast and Lightweight **Telemetry Agent** for Logs, Metrics, and Traces for Linux, macOS, Windows, and BSD family operating systems. It has been made with a strong focus on performance to allow the collection and processing of telemetry data from different sources without complexity.![](https://static.scarf.sh/a.png?x-pxid=71f0e011-761f-4c6f-9a89-38817887faae) @@ -12,22 +12,22 @@ description: High Performance Log and Metrics Processor * High Performance: High throughput with low resources consumption * Data Parsing - * Convert your unstructured messages using our parsers: [JSON](broken-reference), [Regex](broken-reference), [LTSV](broken-reference) and [Logfmt](broken-reference) -* Metrics Support: Prometheus and OpenTelemetry compatible + * Convert your unstructured messages using our parsers: [JSON](broken-reference/), [Regex](broken-reference/), [LTSV](broken-reference/) and [Logfmt](broken-reference/) +* Metrics Support: Prometheus and OpenTelemetry compatible * Reliability and Data Integrity - * [Backpressure](broken-reference) Handling - * [Data Buffering](broken-reference) in memory and file system + * [Backpressure](broken-reference/) Handling + * [Data Buffering](broken-reference/) in memory and file system * Networking * Security: built-in TLS/SSL support * Asynchronous I/O -* Pluggable Architecture and [Extensibility](broken-reference): Inputs, Filters and Outputs +* Pluggable Architecture and [Extensibility](broken-reference/): Inputs, Filters and Outputs * More than 100 built-in plugins are available * Extensibility * Write any input, filter or output plugin in C language - * WASM: [WASM Filter Plugins](broken-reference) or [WASM Input Plugins](broken-reference) - * Bonus: write [Filters in Lua](broken-reference) or [Output plugins in Golang](broken-reference) -* [Monitoring](broken-reference): expose internal metrics over HTTP in JSON and [Prometheus](https://prometheus.io/) format -* [Stream Processing](broken-reference): Perform data selection and transformation using simple SQL queries + * WASM: [WASM Filter Plugins](broken-reference/) or [WASM Input Plugins](broken-reference/) + * Bonus: write [Filters in Lua](broken-reference/) or [Output plugins in Golang](broken-reference/) +* [Monitoring](broken-reference/): expose internal metrics over HTTP in JSON and [Prometheus](https://prometheus.io/) format (see the configuration sketch below) +* [Stream Processing](broken-reference/): Perform data selection and transformation using simple SQL queries * Create new streams of data using query results * Aggregation Windows * Data analysis and prediction: Timeseries forecasting @@ -35,6 +35,6 @@ description: High Performance Log and Metrics Processor ## Fluent Bit, Fluentd and CNCF -[Fluent Bit](http://fluentbit.io) is a [CNCF](https://cncf.io) **graduated** sub-project under the umbrella of [Fluentd](http://fluentd.org), it's licensed under the terms of the [Apache License v2.0](http://www.apache.org/licenses/LICENSE-2.0). +[Fluent Bit](http://fluentbit.io) is a [CNCF](https://cncf.io) **graduated** sub-project under the umbrella of [Fluentd](http://fluentd.org); it's licensed under the terms of the [Apache License v2.0](http://www.apache.org/licenses/LICENSE-2.0). Fluent Bit was originally created by [Eduardo Silva](https://www.linkedin.com/in/edsiper/); as a CNCF-hosted project, it is fully **vendor-neutral** and community-driven. 
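The Monitoring feature listed above can be exercised with a very small configuration. The following is a minimal sketch, assuming the classic configuration format and the built-in HTTP server's default port (2020); the dummy input is only there to generate sample data:

```
[SERVICE]
    # Enable the embedded HTTP server that exposes internal metrics
    http_server  On
    http_listen  0.0.0.0
    http_port    2020

[INPUT]
    name dummy

[OUTPUT]
    name  stdout
    match *
```

With this running, internal metrics are served on the configured port in JSON at `/api/v1/metrics` and in Prometheus format at `/api/v1/metrics/prometheus`.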
diff --git a/about/history.md b/about/history.md index 0b8630a26..aaca6fc2f 100644 --- a/about/history.md +++ b/about/history.md @@ -4,7 +4,7 @@ description: Every project has a story # A Brief History of Fluent Bit -On 2014, the [Fluentd](https://fluentd.org) team at [Treasure Data](https://www.treasuredata.com) forecasted the need of a lightweight log processor for constraint environments like Embedded Linux and Gateways, the project aimed to be part of the Fluentd Ecosystem and we called it [Fluent Bit](https://fluentbit.io), fully open source and available under the terms of the [Apache License v2.0](http://www.apache.org/licenses/LICENSE-2.0). +In 2014, the [Fluentd](https://fluentd.org/) team at [Treasure Data](https://www.treasuredata.com/) was forecasting the need for a lightweight log processor for constrained environments like Embedded Linux and Gateways; the project aimed to be part of the Fluentd Ecosystem. At that moment, Eduardo created [Fluent Bit](https://fluentbit.io/), a new open source solution written from scratch and available under the terms of the [Apache License v2.0](http://www.apache.org/licenses/LICENSE-2.0).\ -After the project was around for some time, it got some traction in the Embedded market but we also started getting requests for several features from the Cloud community like more inputs, filters, and outputs. Not so long after that, Fluent Bit becomes one of the preferred solutions to solve the logging challenges in Cloud environments. +After the project had been around for some time, it gained more traction on regular Linux systems; with the new containerized world, the Cloud Native community also asked to extend the project scope to support more sources, filters, and destinations. Not long after, Fluent Bit became one of the preferred solutions to solve the logging challenges in Cloud environments. diff --git a/concepts/data-pipeline/buffer.md b/concepts/data-pipeline/buffer.md index c13f904e4..194d97bd7 100644 --- a/concepts/data-pipeline/buffer.md +++ b/concepts/data-pipeline/buffer.md @@ -8,7 +8,7 @@ Previously defined in the [Buffering](../buffering.md) concept section, the `buf The `buffer` phase already contains the data in an immutable state, meaning, no other filter can be applied. -![](<../../.gitbook/assets/logging\_pipeline\_buffer (1) (1) (2) (2) (2) (2) (2) (2) (2) (1).png>) +![](<../../.gitbook/assets/logging\_pipeline\_buffer (1) (1) (2) (2) (2) (2) (2) (2) (2) (2) (1).png>) {% hint style="info" %} Note that buffered data is not raw text, it's in Fluent Bit's internal binary representation. diff --git a/concepts/data-pipeline/filter.md b/concepts/data-pipeline/filter.md index 2323d165e..803767fca 100644 --- a/concepts/data-pipeline/filter.md +++ b/concepts/data-pipeline/filter.md @@ -6,7 +6,7 @@ description: Modify, Enrich or Drop your records In production environments we want to have full control of the data we are collecting, filtering is an important feature that allows us to **alter** the data before delivering it to some destination. -![](<../../.gitbook/assets/logging\_pipeline\_filter (1) (2) (2) (2) (2) (2) (2) (1).png>) +![](<../../.gitbook/assets/logging\_pipeline\_filter (1) (2) (2) (2) (2) (2) (2) (2).png>) Filtering is implemented through plugins, so each filter available could be used to match, exclude or enrich your logs with some specific metadata. 
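A minimal sketch of that idea, combining the `grep` filter (to keep only matching records) and the `record_modifier` filter (to enrich them); the file path, tag, and key/value pair below are illustrative assumptions:

```
[INPUT]
    name tail
    path /var/log/app.log
    tag  app.log

[FILTER]
    # Keep only records whose 'log' field matches 'error'
    name  grep
    match app.*
    regex log error

[FILTER]
    # Enrich every remaining record with a static key/value pair
    name   record_modifier
    match  app.*
    record environment production

[OUTPUT]
    name  stdout
    match *
```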
diff --git a/concepts/data-pipeline/input.md b/concepts/data-pipeline/input.md index ca8500b0b..33cc29962 100644 --- a/concepts/data-pipeline/input.md +++ b/concepts/data-pipeline/input.md @@ -6,7 +6,7 @@ description: The way to gather data from your sources [Fluent Bit](http://fluentbit.io) provides different _Input Plugins_ to gather information from different sources, some of them just collect data from log files while others can gather metrics information from the operating system. There are many plugins for different needs. -![](<../../.gitbook/assets/logging\_pipeline\_input (1) (2) (2) (2) (2) (2) (2) (2) (1).png>) +![](<../../.gitbook/assets/logging\_pipeline\_input (1) (2) (2) (2) (2) (2) (2) (2) (2) (1).png>) When an input plugin is loaded, an internal _instance_ is created. Every instance has its own and independent configuration. Configuration keys are often called **properties**. diff --git a/concepts/data-pipeline/output.md b/concepts/data-pipeline/output.md index 5a96f7ee6..fb281551c 100644 --- a/concepts/data-pipeline/output.md +++ b/concepts/data-pipeline/output.md @@ -6,7 +6,7 @@ description: 'Destinations for your data: databases, cloud services and more!' The output interface allows us to define destinations for the data. Common destinations are remote services, local file system or standard interface with others. Outputs are implemented as plugins and there are many available. -![](<../../.gitbook/assets/logging\_pipeline\_output (1) (1).png>) +![](<../../.gitbook/assets/logging\_pipeline\_output (1).png>) When an output plugin is loaded, an internal _instance_ is created. Every instance has its own independent configuration. Configuration keys are often called **properties**. diff --git a/concepts/data-pipeline/parser.md b/concepts/data-pipeline/parser.md index 034376606..c6b7ea6d4 100644 --- a/concepts/data-pipeline/parser.md +++ b/concepts/data-pipeline/parser.md @@ -6,7 +6,7 @@ description: Convert Unstructured to Structured messages Dealing with raw strings or unstructured messages is a constant pain; having a structure is highly desired. Ideally we want to set a structure to the incoming data by the Input Plugins as soon as they are collected: -![](<../../.gitbook/assets/logging\_pipeline\_parser (1) (1) (1) (1) (2) (2) (2) (3) (3) (3) (3) (3) (1).png>) +![](<../../.gitbook/assets/logging\_pipeline\_parser (1) (1) (1) (1) (2) (2) (2) (3) (3) (3) (3) (3) (2).png>) The Parser allows you to convert from unstructured to structured data. As a demonstrative example consider the following Apache (HTTP Server) log entry: diff --git a/concepts/data-pipeline/router.md b/concepts/data-pipeline/router.md index 416bdf8cc..cb3cd20ea 100644 --- a/concepts/data-pipeline/router.md +++ b/concepts/data-pipeline/router.md @@ -6,7 +6,7 @@ description: Create flexible routing rules Routing is a core feature that allows to **route** your data through Filters and finally to one or multiple destinations. 
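As a sketch of what a routing rule looks like in practice (the plugins chosen here are illustrative; Tags and Matching rules are described next):

```
[INPUT]
    name cpu
    tag  my_cpu

[OUTPUT]
    # Only records tagged 'my_cpu' are delivered to this destination
    name  stdout
    match my_cpu
```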
The router relies on the concept of [Tags](../key-concepts.md) and [Matching](../key-concepts.md) rules -![](<../../.gitbook/assets/logging\_pipeline\_routing (1) (1) (2) (2) (2) (2) (2) (2) (2) (1) (1).png>) +![](<../../.gitbook/assets/logging\_pipeline\_routing (1) (1) (2) (2) (2) (2) (2) (2) (2) (1) (2) (1).png>) There are two important concepts in Routing: diff --git a/installation/kubernetes.md b/installation/kubernetes.md index f2b547893..1cb52e40a 100644 --- a/installation/kubernetes.md +++ b/installation/kubernetes.md @@ -4,7 +4,7 @@ description: Kubernetes Production Grade Log Processor # Kubernetes -![](<../.gitbook/assets/fluentbit\_kube\_logging (1).png>) +![](<../.gitbook/assets/fluentbit\_kube\_logging (1) (1).png>) [Fluent Bit](http://fluentbit.io) is a lightweight and extensible **Log Processor** that comes with full support for Kubernetes: diff --git a/installation/windows.md b/installation/windows.md index 9e48f6f7b..581c90176 100644 --- a/installation/windows.md +++ b/installation/windows.md @@ -1,7 +1,6 @@ # Windows -Fluent Bit is distributed as **fluent-bit** package for Windows and as a [Windows container on Docker Hub](./docker.md). -Fluent Bit has two flavours of Windows installers: a ZIP archive (for quick testing) and an EXE installer (for system installation). +Fluent Bit is distributed as **fluent-bit** package for Windows and as a [Windows container on Docker Hub](docker.md). Fluent Bit has two flavours of Windows installers: a ZIP archive (for quick testing) and an EXE installer (for system installation). ## Configuration @@ -74,15 +73,14 @@ Make sure to provide a valid Windows configuration with the installation, a samp ## Migration to Fluent Bit -From version 1.9, `td-agent-bit` is a deprecated package and was removed after 1.9.9. -The correct package name to use now is `fluent-bit`. +From version 1.9, `td-agent-bit` is a deprecated package and was removed after 1.9.9. The correct package name to use now is `fluent-bit`. ## Installation Packages The latest stable version is 2.0.9, each version is available on the Github release as well as at `https://releases.fluentbit.io//fluent-bit--win[32|64].[exe|zip]`: -| INSTALLERS | SHA256 CHECKSUMS | -| ------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------- | +| INSTALLERS | SHA256 CHECKSUMS | +| ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------- | | [fluent-bit-2.0.9-win32.exe](https://releases.fluentbit.io/2.0/fluent-bit-2.0.9-win32.exe) | [a6c1a74acc00ce6211694f4f0a037b1b6ce3ab8dd4e6d857ea7d0d4cbadec682](https://releases.fluentbit.io/2.0/fluent-bit-2.0.9-win32.exe.sha256) | | [fluent-bit-2.0.9-win32.zip](https://releases.fluentbit.io/2.0/fluent-bit-2.0.9-win32.zip) | [8c0935a89337d073d4eae3440c65f55781bc097cdefa8819d2475db6c1befc9c](https://releases.fluentbit.io/2.0/fluent-bit-2.0.9-win32.zip.sha256) | | [fluent-bit-2.0.9-win64.exe](https://releases.fluentbit.io/2.0/fluent-bit-2.0.9-win64.exe) | [7970350f5bd0212be7d87ad51046a6d1600f3516c6209cd69af6d95759d280df](https://releases.fluentbit.io/2.0/fluent-bit-2.0.9-win64.exe.sha256) | @@ -106,7 +104,7 @@ PS> Expand-Archive fluent-bit-2.0.9-win64.zip The ZIP package contains the following set of files. 
-```text +``` fluent-bit ├── bin │ ├── fluent-bit.dll @@ -154,17 +152,13 @@ To halt the process, press CTRL-C in the terminal. ## Installing from EXE installer -Download an EXE installer from the [download page](https://fluentbit.io/download/). -It has both 32-bit and 64-bit builds. -Choose one which is suitable for you. +Download an EXE installer from the [download page](https://fluentbit.io/download/). It has both 32-bit and 64-bit builds. Choose one which is suitable for you. -Double-click the EXE installer you've downloaded. -The installation wizard will automatically start. +Double-click the EXE installer you've downloaded. The installation wizard will automatically start. -![Installation wizard screenshot](<../.gitbook/assets/windows_installer (1) (1).png>) +![Installation wizard screenshot](<../.gitbook/assets/windows\_installer (1).png>) -Click Next and proceed. -By default, Fluent Bit is installed into `C:\Program Files\fluent-bit\`, so you should be able to launch fluent-bit as follows after installation. +Click Next and proceed. By default, Fluent Bit is installed into `C:\Program Files\fluent-bit\`, so you should be able to launch fluent-bit as follows after installation. ```powershell PS> C:\Program Files\fluent-bit\bin\fluent-bit.exe -i dummy -o stdout ``` ### Installer options -The Windows installer is built by [`CPack` using NSIS() and so supports the [default options](https://nsis.sourceforge.io/Docs/Chapter3.html#3.2.1) that all NSIS installers do for silent installation and the directory to install to. +The Windows installer is built by [`CPack` using NSIS](https://cmake.org/cmake/help/latest/cpack\_gen/nsis.html) and so supports the [default options](https://nsis.sourceforge.io/Docs/Chapter3.html#3.2.1) that all NSIS installers do for silent installation and the directory to install to. To silently install to `C:\fluent-bit` directory here is an example: @@ -180,8 +174,7 @@ To silently install to `C:\fluent-bit` directory here is an example: PS> /S /D=C:\fluent-bit ``` -The uninstaller automatically provided also supports a silent un-install using the same `/S` flag. -This may be useful for provisioning with automation like Ansible, Puppet, etc. +The uninstaller automatically provided also supports a silent un-install using the same `/S` flag. This may be useful for provisioning with automation like Ansible, Puppet, etc. ## Windows Service Support @@ -189,7 +182,7 @@ Windows services are equivalent to "daemons" in UNIX (i.e. long-running backgrou Suppose you have the following installation layout: -```text +``` C:\fluent-bit\ ├── conf │ ├── fluent-bit.conf @@ -226,20 +219,19 @@ To halt the Fluent Bit service, just execute the "stop" command. To start Fluent Bit automatically on boot, execute the following: -```text +``` % sc.exe config fluent-bit start= auto ``` -### [FAQ] Fluent Bit fails to start up when installed under `C:\Program Files` +### \[FAQ] Fluent Bit fails to start up when installed under `C:\Program Files` -Quotations are required if file paths contain spaces. -Here is an example: +Quotations are required if file paths contain spaces. Here is an example: -```text +``` % sc.exe create fluent-bit binpath= "\"C:\Program Files\fluent-bit\bin\fluent-bit.exe\" -c \"C:\Program Files\fluent-bit\conf\fluent-bit.conf\"" ``` -### [FAQ] How can I manage Fluent Bit service via PowerShell? 
+### \[FAQ] How can I manage Fluent Bit service via PowerShell? Instead of `sc.exe`, PowerShell can be used to manage Windows services. diff --git a/pipeline/outputs/cloudwatch.md b/pipeline/outputs/cloudwatch.md index c15bb9b9f..dee7ede5e 100644 --- a/pipeline/outputs/cloudwatch.md +++ b/pipeline/outputs/cloudwatch.md @@ -4,7 +4,7 @@ description: Send logs and metrics to Amazon CloudWatch # Amazon CloudWatch -![](<../../.gitbook/assets/image (3) (2) (2) (4) (4) (3) (1).png>) +![](<../../.gitbook/assets/image (3) (2) (2) (4) (4) (3) (2) (1).png>) The Amazon CloudWatch output plugin allows to ingest your records into the [CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) service. Support for CloudWatch Metrics is also provided via [EMF](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch\_Embedded\_Metric\_Format\_Specification.html). @@ -32,7 +32,7 @@ See [here](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b | metric\_dimensions | A list of lists containing the dimension keys that will be applied to all metrics. The values within a dimension set MUST also be members on the root-node. For more information about dimensions, see [Dimension](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API\_Dimension.html) and [Dimensions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch\_concepts.html#Dimension). In the fluent-bit config, metric\_dimensions is a comma and semicolon separated string. If you have only one list of dimensions, put the values as a comma separated string. If you want to put list of lists, use the list as semicolon separated strings. For example, if you set the value as 'dimension\_1,dimension\_2;dimension\_3', we will convert it as \[\[dimension\_1, dimension\_2],\[dimension\_3]] | | sts\_endpoint | Specify a custom STS endpoint for the AWS STS API. | | auto\_retry\_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. This option defaults to `true`. | -| external\_id | Specify an external ID for the STS API, can be used with the role_arn parameter if your role requires an external ID. | +| external\_id | Specify an external ID for the STS API, can be used with the role\_arn parameter if your role requires an external ID. | ## Getting Started @@ -149,26 +149,28 @@ If the kubernetes structure is not found in the log record, then the `log_group_ [2022/06/30 06:09:29] [ warn] [record accessor] translation failed, root key=kubernetes ``` -#### Limitations of record_accessor syntax +#### Limitations of record\_accessor syntax -Notice in the example above, that the template values are separated by dot characters. This is important; the Fluent Bit record_accessor library has a limitation in the characters that can separate template variables- only dots and commas (`.` and `,`) can come after a template variable. This is because the templating library must parse the template and determine the end of a variable. +Notice in the example above, that the template values are separated by dot characters. This is important; the Fluent Bit record\_accessor library has a limitation in the characters that can separate template variables- only dots and commas (`.` and `,`) can come after a template variable. 
This is because the templating library must parse the template and determine the end of a variable. Assume that your log records contain the metadata keys `container_name` and `task`. The following would be invalid templates because the two template variables are not separated by commas or dots: -- `$task-$container_name` -- `$task/$container_name` -- `$task_$container_name` -- `$taskfooo$container_name` +* `$task-$container_name` +* `$task/$container_name` +* `$task_$container_name` +* `$taskfooo$container_name` However, the following are valid: -- `$task.$container_name` -- `$task.resource.$container_name` -- `$task.fooo.$container_name` + +* `$task.$container_name` +* `$task.resource.$container_name` +* `$task.fooo.$container_name` And the following are valid since they only contain one template variable with nothing after it: -- `fooo$task` -- `fooo____$task` -- `fooo/bar$container_name` + +* `fooo$task` +* `fooo____$task` +* `fooo/bar$container_name` ### Metrics Tutorial diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md index d4fc248da..de6ace52f 100644 --- a/pipeline/outputs/s3.md +++ b/pipeline/outputs/s3.md @@ -4,7 +4,7 @@ description: Send logs, data, metrics to Amazon S3 # Amazon S3 -![](<../../.gitbook/assets/image (9).png>) +![](<../../.gitbook/assets/image (9) (1).png>) The Amazon S3 output plugin allows you to ingest your records into the [S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) cloud object store. @@ -16,38 +16,38 @@ Records are stored in files in S3 as newline delimited JSON. See [here](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b0edb2f9acd7cdfdbc3/administration/aws-credentials.md) for details on how AWS credentials are fetched. -**NOTE**: *The [Prometheus success/retry/error metrics values](administration/monitoring.md) outputted by Fluent Bit's built-in http server are meaningless for the S3 output*. This is because S3 has its own buffering and retry mechanisms. The Fluent Bit AWS S3 maintainers apologize for this feature gap; you can [track our progress fixing it on GitHub](https://github.com/fluent/fluent-bit/issues/6141). +**NOTE**: _The_ [_Prometheus success/retry/error metrics values_](administration/monitoring.md) _outputted by Fluent Bit's built-in http server are meaningless for the S3 output_. This is because S3 has its own buffering and retry mechanisms. The Fluent Bit AWS S3 maintainers apologize for this feature gap; you can [track our progress fixing it on GitHub](https://github.com/fluent/fluent-bit/issues/6141). 
## Configuration Parameters -| Key | Description | Default | -| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- | -| region | The AWS region of your S3 bucket | us-east-1 | -| bucket | S3 Bucket name | None | -| json\_date\_key | Specify the name of the time key in the output record. To disable the time key just set the value to `false`. | date | -| json\_date\_format | Specify the format of the date. Supported formats are _double_, _epoch_, _iso8601_ (eg: _2018-05-30T09:39:52.000681Z_) and _java\_sql\_timestamp_ (eg: _2018-05-30 09:39:52.000681_) | iso8601 | -| total\_file\_size | Specifies the size of files in S3. Minimum size is 1M. With `use_put_object On` the maximum size is 1G. With multipart upload mode, the maximum size is 50G. | 100M | -| upload\_chunk\_size | The size of each 'part' for multipart uploads. Max: 50M | 5,242,880 bytes | -| upload\_timeout | Whenever this amount of time has elapsed, Fluent Bit will complete an upload and create a new file in S3. For example, set this value to 60m and you will get a new file every hour. | 10m | -| store\_dir | Directory to locally buffer data before sending. When multipart uploads are used, data will only be buffered until the `upload_chunk_size` is reached. S3 will also store metadata about in progress multipart uploads in this directory; this allows pending uploads to be completed even if Fluent Bit stops and restarts. It will also store the current $INDEX value if enabled in the S3 key format so that the $INDEX can keep incrementing from its previous value after Fluent Bit restarts. | /tmp/fluent-bit/s3 | -| store\_dir\_limit\_size | The size of the limitation for disk usage in S3. Limit the amount of s3 buffers in the `store_dir` to limit disk usage. Note: Use `store_dir_limit_size` instead of `storage.total_limit_size` which can be used to other plugins, because S3 has its own buffering system. | 0, which means unlimited | -| s3\_key\_format | Format string for keys in S3. This option supports a UUID, strftime time formatters, a syntax for selecting parts of the Fluent log tag using a syntax inspired by the rewrite\_tag filter. Add $UUID in the format string to insert a random string. Add $INDEX in the format string to insert an integer that increments each upload. The $INDEX value will be saved in the store_dir so that if Fluent Bit restarts the value will keep incrementing from the previous run. Add $TAG in the format string to insert the full log tag; add $TAG\[0] to insert the first part of the tag in the s3 key. The tag is split into “parts” using the characters specified with the `s3_key_format_tag_delimiters` option. 
Add extension directly after the last piece of the format string to insert a key suffix. If you want to specify a key suffix and you are in `use_put_object` mode, you must specify $UUID as well. More explanations can be found in the S3 Key Format explainer section further down in this document. See the in depth examples and tutorial in the documentation. Time in s3_key is the timestamp of the first record in the S3 file. | /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S | -| s3\_key\_format\_tag\_delimiters | A series of characters which will be used to split the tag into 'parts' for use with the s3\_key\_format option. See the in depth examples and tutorial in the documentation. | . | -| static\_file\_path | Disables behavior where UUID string is automatically appended to end of S3 key name when $UUID is not provided in s3\_key\_format. $UUID, time formatters, $TAG, and other dynamic key formatters all work as expected while this feature is set to true. | false | -| use\_put\_object | Use the S3 PutObject API, instead of the multipart upload API. When this option is on, key extension is only available when $UUID is specified in `s3_key_format`. If $UUID is not included, a random string will be appended at the end of the format string and the key extension cannot be customized in this case. | false | -| role\_arn | ARN of an IAM role to assume (ex. for cross account access). | None | -| endpoint | Custom endpoint for the S3 API. An endpoint can contain scheme and port. | None | -| sts\_endpoint | Custom endpoint for the STS API. | None | -| canned\_acl | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | None | -| compression | Compression type for S3 objects. 'gzip' is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can also use 'arrow'. For gzip compression, the Content-Encoding HTTP Header will be set to 'gzip'. Gzip compression can be enabled when `use_put_object` is 'on' or 'off' (PutObject and Multipart). Arrow compression can only be enabled with `use_put_object On`. | None | -| content\_type | A standard MIME type for the S3 object; this will be set as the Content-Type HTTP header. | None | -| send\_content\_md5 | Send the Content-MD5 header with PutObject and UploadPart requests, as is required when Object Lock is enabled. | false | -| auto\_retry\_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. | true | -| log\_key | By default, the whole log record will be sent to S3. If you specify a key name with this option, then only the value of that key will be sent to S3. For example, if you are using Docker, you can specify log\_key log and only the log message will be sent to S3. | None | -| preserve\_data\_ordering | Normally, when an upload request fails, there is a high chance for the last received chunk to be swapped with a later chunk, resulting in data shuffling. This feature prevents this shuffling by using a queue logic for uploads. | true | -| storage\_class | Specify the [storage class](https://docs.aws.amazon.com/AmazonS3/latest/API/API\_PutObject.html#AmazonS3-PutObject-request-header-StorageClass) for S3 objects. 
If this option is not specified, objects will be stored with the default 'STANDARD' storage class. | None | -| retry\_limit | Integer value to set the maximum number of retries allowed. Note: this configuration is released since version 1.9.10 and 2.0.1. For previous version, the number of retries is 5 and is not configurable. | 1 | -| external\_id | Specify an external ID for the STS API, can be used with the role_arn parameter if your role requires an external ID. | None | +| Key | Description | Default | +| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- | +| region | The AWS region of your S3 bucket | us-east-1 | +| bucket | S3 Bucket name | None | +| json\_date\_key | Specify the name of the time key in the output record. To disable the time key just set the value to `false`. | date | +| json\_date\_format | Specify the format of the date. Supported formats are _double_, _epoch_, _iso8601_ (eg: _2018-05-30T09:39:52.000681Z_) and _java\_sql\_timestamp_ (eg: _2018-05-30 09:39:52.000681_) | iso8601 | +| total\_file\_size | Specifies the size of files in S3. Minimum size is 1M. With `use_put_object On` the maximum size is 1G. With multipart upload mode, the maximum size is 50G. | 100M | +| upload\_chunk\_size | The size of each 'part' for multipart uploads. Max: 50M | 5,242,880 bytes | +| upload\_timeout | Whenever this amount of time has elapsed, Fluent Bit will complete an upload and create a new file in S3. For example, set this value to 60m and you will get a new file every hour. | 10m | +| store\_dir | Directory to locally buffer data before sending. When multipart uploads are used, data will only be buffered until the `upload_chunk_size` is reached. S3 will also store metadata about in progress multipart uploads in this directory; this allows pending uploads to be completed even if Fluent Bit stops and restarts. It will also store the current $INDEX value if enabled in the S3 key format so that the $INDEX can keep incrementing from its previous value after Fluent Bit restarts. | /tmp/fluent-bit/s3 | +| store\_dir\_limit\_size | The size of the limitation for disk usage in S3. Limit the amount of s3 buffers in the `store_dir` to limit disk usage. Note: Use `store_dir_limit_size` instead of `storage.total_limit_size` which can be used to other plugins, because S3 has its own buffering system. 
| 0, which means unlimited | +| s3\_key\_format | Format string for keys in S3. This option supports a UUID, strftime time formatters, a syntax for selecting parts of the Fluent log tag using a syntax inspired by the rewrite\_tag filter. Add $UUID in the format string to insert a random string. Add $INDEX in the format string to insert an integer that increments each upload. The $INDEX value will be saved in the store\_dir so that if Fluent Bit restarts the value will keep incrementing from the previous run. Add $TAG in the format string to insert the full log tag; add $TAG\[0] to insert the first part of the tag in the s3 key. The tag is split into “parts” using the characters specified with the `s3_key_format_tag_delimiters` option. Add extension directly after the last piece of the format string to insert a key suffix. If you want to specify a key suffix and you are in `use_put_object` mode, you must specify $UUID as well. More explanations can be found in the S3 Key Format explainer section further down in this document. See the in depth examples and tutorial in the documentation. Time in s3\_key is the timestamp of the first record in the S3 file. | /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S | +| s3\_key\_format\_tag\_delimiters | A series of characters which will be used to split the tag into 'parts' for use with the s3\_key\_format option. See the in depth examples and tutorial in the documentation. | . | +| static\_file\_path | Disables behavior where UUID string is automatically appended to end of S3 key name when $UUID is not provided in s3\_key\_format. $UUID, time formatters, $TAG, and other dynamic key formatters all work as expected while this feature is set to true. | false | +| use\_put\_object | Use the S3 PutObject API, instead of the multipart upload API. When this option is on, key extension is only available when $UUID is specified in `s3_key_format`. If $UUID is not included, a random string will be appended at the end of the format string and the key extension cannot be customized in this case. | false | +| role\_arn | ARN of an IAM role to assume (ex. for cross account access). | None | +| endpoint | Custom endpoint for the S3 API. An endpoint can contain scheme and port. | None | +| sts\_endpoint | Custom endpoint for the STS API. | None | +| canned\_acl | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | None | +| compression | Compression type for S3 objects. 'gzip' is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can also use 'arrow'. For gzip compression, the Content-Encoding HTTP Header will be set to 'gzip'. Gzip compression can be enabled when `use_put_object` is 'on' or 'off' (PutObject and Multipart). Arrow compression can only be enabled with `use_put_object On`. | None | +| content\_type | A standard MIME type for the S3 object; this will be set as the Content-Type HTTP header. | None | +| send\_content\_md5 | Send the Content-MD5 header with PutObject and UploadPart requests, as is required when Object Lock is enabled. | false | +| auto\_retry\_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. | true | +| log\_key | By default, the whole log record will be sent to S3. 
If you specify a key name with this option, then only the value of that key will be sent to S3. For example, if you are using Docker, you can specify log\_key log and only the log message will be sent to S3. | None | +| preserve\_data\_ordering | Normally, when an upload request fails, there is a high chance for the last received chunk to be swapped with a later chunk, resulting in data shuffling. This feature prevents this shuffling by using a queue logic for uploads. | true | +| storage\_class | Specify the [storage class](https://docs.aws.amazon.com/AmazonS3/latest/API/API\_PutObject.html#AmazonS3-PutObject-request-header-StorageClass) for S3 objects. If this option is not specified, objects will be stored with the default 'STANDARD' storage class. | None | +| retry\_limit | Integer value to set the maximum number of retries allowed. Note: this configuration is released since version 1.9.10 and 2.0.1. For previous version, the number of retries is 5 and is not configurable. | 1 | +| external\_id | Specify an external ID for the STS API, can be used with the role\_arn parameter if your role requires an external ID. | None | ## TLS / SSL @@ -72,20 +72,19 @@ The plugin requires the following AWS IAM permissions: ## Differences between S3 and other Fluent Bit outputs -The s3 output plugin is special because its use case is to upload files of non-trivial size to an Amazon S3 bucket. This is in contrast to most other outputs which send many requests to upload data in batches of a few Megabytes or less. +The s3 output plugin is special because its use case is to upload files of non-trivial size to an Amazon S3 bucket. This is in contrast to most other outputs which send many requests to upload data in batches of a few Megabytes or less. -When Fluent Bit recieves logs, it stores them in chunks, either in memory or the filesystem depending on your settings. A chunk is usually around 2 MB in size. Fluent Bit sends the chunks in order to each output that matches their tag. Most outputs then send the chunk immediately to their destination. A chunk is sent to the output's "flush callback function", which must return one of `FLB_OK`, `FLB_RETRY`, or `FLB_ERROR`. Fluent Bit keeps count of the return values from each outputs "flush callback function"; these counters are the data source for Fluent Bit's error, retry, and success metrics available in prometheus format via its monitoring interface. +When Fluent Bit receives logs, it stores them in chunks, either in memory or the filesystem depending on your settings. A chunk is usually around 2 MB in size. Fluent Bit sends the chunks in order to each output that matches their tag. Most outputs then send the chunk immediately to their destination. A chunk is sent to the output's "flush callback function", which must return one of `FLB_OK`, `FLB_RETRY`, or `FLB_ERROR`. Fluent Bit keeps count of the return values from each output's "flush callback function"; these counters are the data source for Fluent Bit's error, retry, and success metrics available in Prometheus format via its monitoring interface. -The S3 output plugin is a Fluent Bit output plugin and thus it conforms to the Fluent Bit output plugin specification. However, since the S3 use case is to upload large files, generally much larger than 2 MB, its behavior is different. The S3 "flush callback function" simply buffers the incoming chunk to the filesystem, and returns an `FLB_OK`. 
*Consequently, the prometheus metrics available via the Fluent Bit http server are meaningless for S3.* In addition, the `storage.total_limit_size` parameter is not meaningful for S3 since it has its own buffering system in the `store_dir`. Instead, use `store_dir_limit_size`. +The S3 output plugin is a Fluent Bit output plugin and thus it conforms to the Fluent Bit output plugin specification. However, since the S3 use case is to upload large files, generally much larger than 2 MB, its behavior is different. The S3 "flush callback function" simply buffers the incoming chunk to the filesystem, and returns an `FLB_OK`. _Consequently, the prometheus metrics available via the Fluent Bit http server are meaningless for S3._ In addition, the `storage.total_limit_size` parameter is not meaningful for S3 since it has its own buffering system in the `store_dir`. Instead, use `store_dir_limit_size`. -S3 uploads are primarily initiated via the S3 "timer callback function", which runs separately from its "flush callback function". Because S3 has its own system of buffering and its own callback to upload data, the normal sequential data ordering of chunks provided by the Fluent Bit engine may be compromised. Consequently, S3 has the `presevere_data_ordering` option which will ensure data is uploaded in the original order it was collected by Fluent Bit. +S3 uploads are primarily initiated via the S3 "timer callback function", which runs separately from its "flush callback function". Because S3 has its own system of buffering and its own callback to upload data, the normal sequential data ordering of chunks provided by the Fluent Bit engine may be compromised. Consequently, S3 has the `preserve_data_ordering` option which will ensure data is uploaded in the original order it was collected by Fluent Bit. -### Summary: Uniqueness in S3 Plugin - -1. *The HTTP Monitoring interface output metrics are not meaningful for S3*: AWS understands that this is non-ideal; we have [opened an issue with a design](https://github.com/fluent/fluent-bit/issues/6141) that will allow S3 to manage its own output metrics. -2. *You must use `store_dir_limit_size` to limit the space on disk used by S3 buffer files*. -3. *The original ordering of data inputted to Fluent Bit may not be preserved unless you enable `preserve_data_ordering On`*. +### Summary: Uniqueness in S3 Plugin +1. _The HTTP Monitoring interface output metrics are not meaningful for S3_: AWS understands that this is non-ideal; we have [opened an issue with a design](https://github.com/fluent/fluent-bit/issues/6141) that will allow S3 to manage its own output metrics. +2. _You must use `store_dir_limit_size` to limit the space on disk used by S3 buffer files_. +3. _The original ordering of data inputted to Fluent Bit may not be preserved unless you enable `preserve_data_ordering On`_. ## S3 Key Format and Tag Delimiters @@ -122,11 +121,12 @@ The Fluent Bit S3 output was designed to ensure that previous uploads will never For files uploaded with the PutObject API, the S3 output requires that a unique random string be present in the S3 key. This is because many of the use cases for PutObject uploads involve a short time period between uploads such that a timestamp in the S3 key may not be unique enough between uploads. For example, if you only specify minute granularity timestamps in the S3 key, with a small upload size, it is possible to have two uploads that have timestamps set in the same minute. This "requirement" can be disabled with `static_file_path On`. 
There are three cases where the PutObject API is used: + 1. When you explicitly set `use_put_object On` -2. On startup when the S3 output finds old buffer files in the `store_dir` from a previous run and attempts to send all of them at once. -3. On shutdown, when to prevent data loss the S3 output attempts to send all currently buffered data at once. +2. On startup when the S3 output finds old buffer files in the `store_dir` from a previous run and attempts to send all of them at once. +3. On shutdown, when to prevent data loss the S3 output attempts to send all currently buffered data at once. -Consequently, you should always specify `$UUID` somewhere in your S3 key format. Otherwise, if the PutObject API is used, S3 will append a random 8 character UUID to the end of your S3 key. This means that a file extension set at the end of an S3 key will have the random UUID appended to it. This behavior can be disabled with `static_file_path On`. +Consequently, you should always specify `$UUID` somewhere in your S3 key format. Otherwise, if the PutObject API is used, S3 will append a random 8 character UUID to the end of your S3 key. This means that a file extension set at the end of an S3 key will have the random UUID appended to it. This behavior can be disabled with `static_file_path On`. Let's walk through this via an example. First case, we attempt to set a `.gz` extension without specifying `$UUID`. @@ -148,7 +148,7 @@ In the case where pending data is uploaded on shutdown, if the tag was `app`, th /app/2022/12/25/00_00_00.gz-apwgylqg ``` -The S3 output appended a random string to the "extension", since this upload on shutdown used the PutObject API. +The S3 output appended a random string to the "extension", since this upload on shutdown used the PutObject API. There are two ways of disabling this behavior. Option 1, use `static_file_path`: @@ -166,6 +166,7 @@ There are two ways of disabling this behavior. Option 1, use `static_file_path`: ``` Option 2, explicitly define where the random UUID will go in the S3 key format: + ``` [OUTPUT] Name s3 diff --git a/pipeline/outputs/vivo-exporter.md b/pipeline/outputs/vivo-exporter.md index 1b0244778..1f8cd5698 100644 --- a/pipeline/outputs/vivo-exporter.md +++ b/pipeline/outputs/vivo-exporter.md @@ -4,13 +4,16 @@ Vivo Exporter is an output plugin that exposes logs, metrics, and traces through ### Configuration Parameters -| Key | Description | Default | -| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | ------- | -| `empty_stream_on_read` | If enabled, when an HTTP client consumes the data from a stream, the stream content will be removed. | Off | -| `stream_queue_size` | Specify the maximum queue size per stream. Each specific stream for logs, metrics and traces can hold up to `stream_queue_size` bytes. | 20M | +| Key | Description | Default | +| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------- | ------- | +| `empty_stream_on_read` | If enabled, when an HTTP client consumes the data from a stream, the stream content will be removed. | Off | +| `stream_queue_size` | Specify the maximum queue size per stream. Each specific stream for logs, metrics and traces can hold up to `stream_queue_size` bytes. | 20M | +| `http_cors_allow_origin` | Specify the value for the HTTP Access-Control-Allow-Origin header (CORS). 
| | ### Getting Started +Here is a simple configuration of Vivo Exporter; note that this example is not based on defaults. + ```python [INPUT] name dummy @@ -18,9 +21,10 @@ Vivo Exporter is an output plugin that exposes logs, metrics, and traces through rate 2 [OUTPUT] - name vivo_exporter - empty_stream_on_read off - stream_queue_size 20M
 + name vivo_exporter + empty_stream_on_read off + stream_queue_size 20M + 
http_cors_allow_origin * ``` ### How it works