Describe the bug
Pending tasks fail to flush during a hot reload when using external Go output plugins. If any chunks are pending when a hot reload starts, those chunks fail to flush when the output is a Go plugin. The error message is:
[2024/12/16 12:15:29] [ warn] [engine] failed to flush chunk '1-1734351323.147762458.flb', retry in 1 seconds: task_id=0, input=fluent-tail-input > output=new_alias_3
Impact of the issue
When using external Go output plugins, any chunks that are pending during a hot reload fail continuously until their retries are exhausted. They are then dropped, so all of those pending chunks are lost.
Because fluent-bit keeps retrying and keeps failing to flush these chunks, the hot reload is delayed by several minutes, depending on how many chunks were pending and on the value set for Retry_Limit. This can in turn delay log processing or even lose logs, because the new config is not loaded for several minutes.
To Reproduce
Steps to reproduce the problem:
I prepared a repository with all the required files and steps to reproduce the problem. Please clone the repository https://github.com/imankurpatel000/fluent-bit-hot-reload-issue/ and follow the steps in the README. It only requires running a few commands to replicate the issue. Let me know if you need any further help reproducing it.
Expected behavior
When fluent-bit is hot reloaded, it should allow Go plugins to flush the pending chunks without any error.
Additional context
This issue is caused by the changes made in #7997, specifically commit 25b470d, which starts returning FLB_RETRY during a hot reload and does not actually call the plugin to flush the remaining chunks. Because it returns FLB_RETRY every time, fluent-bit keeps retrying all the chunks with exponential backoff, which delays the hot reload, and the pending chunks are eventually lost. The comment in this code says:
/* To prevent flush callback executions, we need to check the
 * status of hot-reloading. The actual problem is: we don't have
 * pause procedure/mechanism for output plugin. For now, we just halt the
 * flush callback here during hot-reloading is in progress.
 */
So maybe there is a reason behind it, but I can't see what it is. Fluent-bit already pauses all the inputs, so no new logs are being ingested anyway. Why not let the output plugin flush the pending chunks and then continue with the actual reload? I also don't understand why this was added for external Go plugins but not for internal output plugins, which continue to flush pending chunks.
I tested with this code removed, and I no longer see any problem with Go output plugins during hot reloading, so I am going to raise a PR to remove it. Please let me know if this code is actually required and, if so, how else we can solve this problem.
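For illustration, the difference between the two behaviors can be modeled in a few lines of Go. This is only a sketch of the idea, not the fluent-bit source: `proxyFlush`, `guardEnabled`, and `reloading` are hypothetical names, and the constants are local stand-ins that echo FLB_OK and FLB_RETRY.

```go
package main

import "fmt"

// Local stand-ins for the flush return codes (illustrative only).
const (
	flbOK    = 1
	flbRetry = 2
)

// proxyFlush models the wrapper between the engine and a Go output plugin.
// ASSUMPTION: guardEnabled and reloading are hypothetical flags for this sketch.
func proxyFlush(guardEnabled, reloading bool, pluginFlush func(chunk string) int, chunk string) int {
	if guardEnabled && reloading {
		// Guarded path: the plugin callback is never invoked during a
		// reload, so the engine only ever sees "retry".
		return flbRetry
	}
	// Unguarded path: the plugin gets a chance to flush the chunk.
	return pluginFlush(chunk)
}

func main() {
	calls := 0
	plugin := func(chunk string) int {
		calls++ // count how often the plugin is actually called
		return flbOK
	}
	chunk := "1-1734351323.147762458.flb"

	// With the guard, the plugin is skipped while reloading.
	fmt.Println("guard on: ", proxyFlush(true, true, plugin, chunk), "plugin calls:", calls)
	// Without the guard, the pending chunk reaches the plugin.
	fmt.Println("guard off:", proxyFlush(false, true, plugin, chunk), "plugin calls:", calls)
}
```

In this model the guarded call can never succeed while a reload is in progress, which matches the repeated "failed to flush chunk" warnings, while the unguarded call lets the pending chunk drain before the reload continues.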