"[ Error] timerfd: Too many open files, errno=24 at /tmp/fluent-bit/lib/monkey/mk_core/mk_event_epoll.c:221" #9966
Comments
I ran into this recently. I think that on RHEL8, services default to running with a max open file ulimit of 1024. I had to add LimitNOFILE=12345 (or some other big number) to the systemd unit file to get around it. Fluent Bit doesn't have any code to try to change its own (soft) limit of open files; it just crashes out if open() syscalls etc. fail. So the LimitNOFILESoft=1024 setting might be causing your problems. Each input plugin spins up one or more POSIX pipes for internal signalling, and each pipe consumes 2 file descriptors (one for the in side, one for the out side). epoll and inotify usage consumes descriptors too, so it all adds up to more file descriptors than you might first think.
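To make the descriptor arithmetic above concrete, here is a minimal Python sketch (not Fluent Bit code) that reads the soft and hard RLIMIT_NOFILE values of the current process and measures how many descriptors a single pipe costs. It assumes Linux, since it counts entries in /proc/self/fd; the helper names open_fd_count, pipe_fd_cost and nofile_limits are made up for this illustration.

```python
import os
import resource


def open_fd_count() -> int:
    """Count descriptors open in this process (Linux-only: lists /proc/self/fd).

    The listing itself briefly opens one directory descriptor, so the absolute
    number is off by one, but differences between two calls are exact.
    """
    return len(os.listdir("/proc/self/fd"))


def pipe_fd_cost() -> int:
    """Return how many descriptors one pipe adds to the process."""
    before = open_fd_count()
    read_end, write_end = os.pipe()  # one descriptor per end of the pipe
    try:
        return open_fd_count() - before
    finally:
        os.close(read_end)
        os.close(write_end)


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"descriptors consumed by one pipe: {pipe_fd_cost()}")
```

Run inside the same environment as the service (same systemd unit or container), the soft value shows the limit the process actually started with, which is the number that matters for the errno=24 error in the title.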
For same scenario in my test case, adding
However, I am still confused about the case where the output (Loki) is down but fluent-bit keeps running and buffering data:
1> I noticed that the maximum number of task_ids for fluent-bit is still 2048 (0-2047). Although there is no "Too many open files" error this time, I am not sure whether this value is fixed, as "ulimit -n" for a regular user still returns 1024.
2> With the memory + filesystem storage strategy, the number of local chunk files (total_chunks) that I can list in the data/tail.xx/ folder is 2050 at maximum. I assumed that one task_id deals with one fs_chunk for the flush action (I set retry_limit to no_limits), but the two numbers differ. Is the number of total_chunks related to task_id or "LimitNOFILE"?
3> In the same case, once total_chunks reaches 2050, only the numbers of up and down fs_chunks change, while their sum stays at 2050.
Any comments would be appreciated.
Hi @nuclearpidgeon, after trying the suggestion you offered, FLB is working as expected for now. It does raise several other questions, as my colleague mentioned; it would be much appreciated if you could help check these at your convenience.
Bug Report
Describe the bug
We are trying to understand how much data Fluent-Bit can store when the downstream (Loki) is out of service. Below is our configuration:
The logs below then started to appear:
Output of storage api:
Below is the kernel config:
Per the suggestions in the links below, it seems I need to tune the kernel configuration mentioned above:
#1777 (comment)
#9151 (comment)
Your Environment
And here are my questions:
task_id will not be higher than 2047; may I know if this is by design? And is there any mapping between the task_id and the chunk file stored in the filesystem?