From aeaf178424dffd3e46f986dc05057d6fcfcd6fb5 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 11 Oct 2024 15:17:32 +0900
Subject: [PATCH] in_tail: Add a description and note for Unicode.Encoding parameter

Signed-off-by: Hiroshi Hatake
---
 pipeline/inputs/tail.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/pipeline/inputs/tail.md b/pipeline/inputs/tail.md
index c5019a75c..343314618 100644
--- a/pipeline/inputs/tail.md
+++ b/pipeline/inputs/tail.md
@@ -37,9 +37,18 @@ The plugin supports the following configuration parameters:
 | Static\_Batch\_Size | Set the maximum number of bytes to process per iteration for the monitored static files (files that already exists upon Fluent Bit start). | 50M |
 | File\_Cache\_Advise | Set the posix_fadvise in POSIX_FADV_DONTNEED mode. This will reduce the usage of the kernel file cache. This option is ignored if not running on Linux. | On |
 | Threaded | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` |
+| Unicode.Encoding | Set the character encoding of the source file. Currently, `UTF-16LE`, `UTF-16BE`, and `auto` are supported. | |
 
 Note that if the database parameter `DB` is **not** specified, by default the plugin will start reading each target file from the beginning. This also might cause some unwanted behavior, for example when a line is bigger that `Buffer_Chunk_Size` and `Skip_Long_Lines` is not turned on, the file will be read from the beginning of each `Refresh_Interval` until the file is rotated.
 
+Note that `Unicode.Encoding` depends on the simdutf library, which is written in C++11 or later.
+As a result, this feature is not supported on older platforms.
+In addition, `Unicode.Encoding auto` does not work reliably in every case.
+This is because automatic detection sometimes guesses the wrong character encoding.
+We recommend using `UTF-16LE` or `UTF-16BE` when the encoding of the target file is known beforehand.
+In detail, this parameter requires chunk and buffer sizes that are aligned to 2 bytes.
+If they are not 2-byte aligned, Fluent Bit automatically aligns them to 2 bytes to avoid splitting characters at read boundaries.
+
 ## Multiline Support
 
 Starting from Fluent Bit v1.8 we have introduced a new Multiline core functionality. For Tail input plugin, it means that now it supports the **old** configuration mechanism but also the **new** one. In order to avoid breaking changes, we will keep both but encourage our users to use the latest one. We will call the two mechanisms as: