
alert on failed to write data into buffer by buffer overflow action=:block #104

Open
GAHila opened this issue Jul 15, 2019 · 5 comments


GAHila commented Jul 15, 2019

We run pretty much everything on kubernetes/prometheus/fluentd/elasticsearch, and we are currently using

k8s.gcr.io/fluentd-elasticsearch:v2.4.0

which seems to include this plugin.

We sometimes have bursts of logs in our environments (basically, something spamming the logs), which causes the fluentd output plugin that sends logs to elasticsearch to block, given that our overflow_action is block (we do not want drop_oldest_chunk or throw_exception, as we cannot tolerate log loss). However, once the buffers are full, no more logs are sent, and since the situation can only be fixed by stopping the log spammer, we cannot hope to solve it by increasing the following values:

      flush_interval 
      chunk_limit_size 
      queue_limit_length
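For context, a minimal sketch of where these parameters live, assuming a file buffer on an elasticsearch output (the values shown are illustrative placeholders, not recommendations):

```
<match **>
  @type elasticsearch
  # ... host/port/index settings elided ...
  <buffer>
    @type file
    path /var/log/fluentd-buffers/es.buffer
    flush_interval 5s         # how often chunks are flushed
    chunk_limit_size 8MB      # max size of a single chunk
    queue_limit_length 256    # max number of queued chunks
    overflow_action block     # blocks input when the queue is full
  </buffer>
</match>
```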

I do not see any way to monitor this particular scenario using this plugin. The blockage is not complete: some logs still move through, so I cannot rely on the counter of outgoing records (fluentd_output_status_num_records_total). I also cannot rely on the buffer size, since that gauge fluctuates a lot, and the error count does not reflect this situation either (which, strangely enough, is only logged as a warning).

However, this situation has caused us to miss logs for days on a couple of occasions, and having to manually remove the big logs is quite a headache.

Am I missing something about how to alert on this scenario using this plugin?

@kazegusuri
Collaborator

I'm not sure whether you want to detect exceeding the chunk limit size (by a single very large message) or exceeding the queue limit (by slow flushing).
The former case is difficult to detect because fluentd does not provide metrics for it yet (AFAIK).
For the latter case, you can use the prometheus_output_monitor plugin. It provides the status of each output plugin as prometheus metrics. With fluentd_output_status_buffer_queue_length you can set a threshold to alert on slow flushing.
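As an illustration, here is a sketch of a Prometheus alerting rule on that metric; the alert name, threshold, and duration are made-up examples and would need tuning against your own queue_limit_length:

```yaml
groups:
  - name: fluentd
    rules:
      - alert: FluentdBufferQueueNotDraining
        # 20 queued chunks is an arbitrary example threshold
        expr: fluentd_output_status_buffer_queue_length > 20
        for: 10m
        annotations:
          summary: "fluentd output buffer queue has stayed high for 10m (slow flushing?)"
```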


sb1975 commented Feb 14, 2020

We use the following in the buffer plugin: overflow_action drop_oldest_chunk (https://docs.fluentd.org/configuration/buffer-section).
However, we would like a metric so that we can alert when a buffer-overflow drop happens and see how many times it happens per day.
This is an important problem; I hope someone picks it up.

@tirelibirefe

v1.0 still has the same problem.
Has anybody found a workaround?

@nikhilagrawal577

I am facing the same issue. Is there any workaround?

@cosmo0920
Collaborator

According to the Fluentd documentation, overflow_action block is not intended for improving write performance:

  • overflow_action [enum: throw_exception/block/drop_oldest_chunk]
    • Default: throw_exception
    • How does output plugin behave when its buffer queue is full?
      • throw_exception: raises an exception to show the error in log
      • block: wait until buffer can store more data.
        After buffer is ready for storing more data, writing buffer is retried.
        Because of such behavior, block is suitable for processing batch execution,
        so do not use for improving processing throughput or performance.
      • drop_oldest_chunk: drops/purges the oldest chunk to accept newly
        incoming chunk

ref: https://docs.fluentd.org/configuration/buffer-section#flushing-parameters

Using overflow_action throw_exception or overflow_action drop_oldest_chunk is how this case should be handled instead.
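A minimal sketch of that suggestion, assuming fluent-plugin-prometheus is installed (the @type prometheus and @type prometheus_output_monitor source plugins come from that plugin; the match pattern and buffer values here are illustrative):

```
# Expose fluentd metrics, including output plugin status, for Prometheus to scrape
<source>
  @type prometheus
</source>
<source>
  @type prometheus_output_monitor
</source>

<match **>
  @type elasticsearch
  # ... elasticsearch settings elided ...
  <buffer>
    @type file
    path /var/log/fluentd-buffers/es.buffer
    # drop the oldest chunk instead of blocking the whole pipeline
    overflow_action drop_oldest_chunk
  </buffer>
</match>
```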
