
Parser Filter on Docker reduces Fluent Bit benchmark performance from 100MB/s to 15MB/s #7633

Closed
mselinger75 opened this issue Jun 30, 2023 · 12 comments


mselinger75 commented Jun 30, 2023

Bug Report

Describe the bug
I have a log generator that writes 30 MB/s of logs to a file. Fluent Bit is set up to tail this file and send the records to the null output.
When I look at the Fluent Bit metrics, the number of logs processed by the tail input is far lower than the number of logs generated.
I've tried different log volumes: 100 MB/s, 50 MB/s, 30 MB/s, and 15 MB/s. The only rate Fluent Bit is able to keep up with is 15 MB/s.

I tested versions 1.9.8 and 2.1.6; on 2.1.6 I also tried threaded on, which didn't help.
On 2.1.6 with threaded on, I see a lot of these errors in the Fluent Bit log:

[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0

To Reproduce
My config:

[SERVICE]
    Flush         1
    Log_Level     info
    Daemon        off
    Parsers_File  parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020
    storage.metrics            on
    storage.sync               normal
    storage.checksum           off
    storage.max_chunks_up      128
    storage.backlog.mem_limit  100M
    storage.path               /var/log/flb/storage/

[FILTER]
    Name parser
    Match k8s_audit
    Key_Name log
    Parser docker

[INPUT]
    Name               tail
    Tag                k8s_audit
    Path               /usr/local/home/test/fluentbit/output.txt
    DB                 /var/log/flb/fluent-bit.db
    DB.locking         true
    Read_from_Head     false
    Buffer_Max_Size    20M
    Buffer_Chunk_Size  2MB
    Mem_Buf_Limit      100MB
    Refresh_Interval   5
    #Rotate_Wait 15
    storage.type       filesystem
    threaded on
    Skip_Long_Lines    On

[Output]
    Name null
    Match k8s_audit
    Retry_Limit               no_limits
    workers 8
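
For reference, the docker parser used by the FILTER above is assumed here to be the stock parser shipped in Fluent Bit's default parsers.conf (the actual parsers.conf in use may differ):

[PARSER]
    # stock docker parser: decodes each log line as JSON and keeps the original time field
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
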
  • Steps to reproduce the problem:
    1. Run Fluent Bit with the config above: bin/fluent-bit -c /usr/local/home/test/fluentbit/fluent-bit.conf
    2. Generate logs to /usr/local/home/test/fluentbit/output.txt
    3. Check that the Fluent Bit metrics fall behind the generated log volume by looking at http://localhost:2020/api/v1/metrics

Expected behavior
Fluent Bit should be able to keep up with the log volume.

Your Environment
Debian 6.1.25-1rodete1 (2023-05-11) x86_64 GNU/Linux

  • Version used:
    Both 1.9.8 and 2.1.6

agup006 (Member) commented Jun 30, 2023

What application is writing to the filesystem? Also, is it running on the same operating system as where Fluent Bit is running? Lastly, what are the specifications of your server?

mselinger75 (Author) commented Jun 30, 2023

What application is writing to the filesystem?

https://github.com/GoogleCloudPlatform/ops-agent/blob/master/integration_test/soak_test/cmd/launcher/log_generator.py

is it running on the same Operating System as where Fluent Bit is running?

yes

what are the specifications of your server?

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  128
  On-line CPU(s) list:   0-127
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7B12


MemTotal:       528330084 kB
MemFree:        328793528 kB
MemAvailable:   494561172 kB

agup006 (Member) commented Jun 30, 2023

What disk is it using?

mselinger75 (Author) commented:

What disk is it using?

Which command do you want me to run for that?

Also, I forgot to mention: once I remove the docker parser FILTER from the Fluent Bit config, I can process 100 MB/s with no problem.

agup006 (Member) commented Jun 30, 2023

What disk is it using?

Which command do you want me to run for that?

No specific command; I mostly want to know whether it is an SSD or an HDD and whether we are hitting I/O limitations. Based on your next comment, though, it's probably not relevant.

Also, I forgot to mention: once I remove the docker parser FILTER from the Fluent Bit config, I can process 100 MB/s with no problem.

Yes, very helpful :). What happens if you add the docker parser in the tail input directly?
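
For example, a minimal sketch of what I mean, reusing your existing paths and relying on the tail plugin's Parser option in place of the separate [FILTER] block:

[INPUT]
    Name               tail
    Tag                k8s_audit
    Path               /usr/local/home/test/fluentbit/output.txt
    DB                 /var/log/flb/fluent-bit.db
    # parse each line at the input instead of via the parser filter
    Parser             docker
    storage.type       filesystem
    threaded           on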

agup006 changed the title from "FluentBit can't keep up with logs volume" to "Parser Filter on Docker reduces Fluent Bit benchmark performance from 100MB/s to 15MB/s" on Jun 30, 2023

mselinger75 (Author) commented Jun 30, 2023

Yes, very helpful :). What happens if you add the docker parser in the tail input directly?

Same issue.

mselinger75 (Author) commented:

One more note:
For 2.1.6, when I set threaded on, the metrics seem to match the log generator numbers, but I see a lot of these errors in the Fluent Bit log:

[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0

Not sure what this means.

agup006 (Member) commented Jul 2, 2023

One more note: For 2.1.6, when I set threaded on, the metrics seem to match the log generator numbers, but I see a lot of these errors in the Fluent Bit log:

Could you elaborate on what you mean by "the metrics seem to match the log generator numbers"? Is this saying you are not seeing a performance bottleneck?

[tail.0] failed buffer write, retries=0 (repeated)

Not sure what this means.

mselinger75 (Author) commented Jul 5, 2023

For 2.1.6, when I set threaded on at 100 MB/s and run for a few minutes, I see:

  1. Backpressure on the file system.
  2. In the Fluent Bit logs:
[2023/07/05 14:22:12] [ info] [input:tail:tail.0] inode=15370259 handle rotation(): /usr/local/home/fbit/output.txt => /usr/local/home/fbit/output.txt.old
[2023/07/05 14:22:12] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=15370259 watch_fd=15
[2023/07/05 14:22:12] [ info] [input:tail:tail.0] inotify_fs_add(): inode=15370259 watch_fd=16 name=/usr/local/home/fbit/output.txt.old
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=1
[tail.0] failed buffer write, retries=2
[tail.0] failed buffer write, retries=3
[tail.0] failed buffer write, retries=4
[tail.0] failed buffer write, retries=5
[tail.0] failed buffer write, retries=6
[tail.0] failed buffer write, retries=7
[tail.0] failed buffer write, retries=8
[tail.0] failed buffer write, retries=9
[2023/07/05 14:22:16] [error] [input:tail:tail.0] could not enqueue records into the ring buffer
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=1
[tail.0] failed buffer write, retries=2
[tail.0] failed buffer write, retries=3
[tail.0] failed buffer write, retries=4
[tail.0] failed buffer write, retries=5
[tail.0] failed buffer write, retries=6
[tail.0] failed buffer write, retries=7
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[2023/07/05 14:22:19] [ info] [input:tail:tail.0] inotify_fs_add(): inode=15370063 watch_fd=17 name=/usr/local/home/fbit/output.txt
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[2023/07/05 14:22:20] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=15370259 watch_fd=16
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0
[tail.0] failed buffer write, retries=0

  3. When I stop the log generator, the backpressure resolves within ~1 minute. When I check http://localhost:2020/api/v1/metrics, the number of records matches the number of logs generated.

I also see the fbit-pipeline thread consuming 100% CPU.

On a side note: I have noticed several times that when I stop the log generator, Fluent Bit also stops processing the tail input, even though the metrics show it hasn't processed all the records yet. If I restart Fluent Bit, it immediately processes the rest of the log file.

lecaros (Contributor) commented Aug 16, 2023

Hi @mselinger75, could you please try with a build from this PR? #7815

github-actions bot commented:

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label on Dec 11, 2023
github-actions bot commented:

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned on Dec 18, 2023