-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fluent-bit hanging and stopping operation #9927
Comments
I can confirm that I am experiencing the same issue with the same configuration and versions. The behavior is identical to what has been reported. |
@Snak3GMS @nikitamishagin any logs that you can attach ? Seems like behavior isn't exactly perfect, but these might all be related: These all exhibit behaviors of hanging |
[pid 18] 00:12:40.898547 <... restart_syscall resumed>) = 0 <0.891804>
[pid 18] 00:12:40.898699 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 1] 00:12:41.064008 <... clock_nanosleep resumed>0x7fff3e6c4b10) = 0 <1.000104>
[pid 1] 00:12:41.064069 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 18] 00:12:41.898923 <... clock_nanosleep resumed>0x7f2a635fb760) = 0 <1.000146>
[pid 18] 00:12:41.899006 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 1] 00:12:42.064434 <... clock_nanosleep resumed>0x7fff3e6c4b10) = 0 <1.000231>
[pid 1] 00:12:42.064566 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 16] 00:12:42.152983 <... epoll_wait resumed>[{EPOLLIN, {u32=1684037792, u64=139820049391776}}], 64, -1) = 1 <1.498995>
[pid 15] 00:12:42.153083 <... epoll_wait resumed>[{EPOLLIN, {u32=1688232096, u64=139820053586080}}], 64, -1) = 1 <1.499113>
[pid 14] 00:12:42.153209 <... epoll_wait resumed>[{EPOLLIN, {u32=1692426400, u64=139820057780384}}], 64, -1) = 1 <1.499217>
[pid 13] 00:12:42.153285 <... epoll_wait resumed>[{EPOLLIN, {u32=1696620704, u64=139820061974688}}], 64, -1) = 1 <1.499286>
[pid 12] 00:12:42.153319 <... epoll_wait resumed>[{EPOLLIN, {u32=1742758048, u64=139820108112032}}], 64, -1) = 1 <1.499237>
[pid 11] 00:12:42.153364 <... epoll_wait resumed>[{EPOLLIN, {u32=1746952352, u64=139820112306336}}], 64, -1) = 1 <1.499269>
[pid 10] 00:12:42.153389 <... epoll_wait resumed>[{EPOLLIN, {u32=1751146656, u64=139820116500640}}], 64, -1) = 1 <1.499238>
[pid 9] 00:12:42.153414 <... epoll_wait resumed>[{EPOLLIN, {u32=1755340960, u64=139820120694944}}], 64, -1) = 1 <1.499250>
[pid 16] 00:12:42.153447 read(184, <unfinished ...>
[pid 15] 00:12:42.153461 read(180, <unfinished ...>
[pid 14] 00:12:42.153474 read(176, <unfinished ...>
[pid 13] 00:12:42.153501 read(172, <unfinished ...>
[pid 12] 00:12:42.153515 read(135, <unfinished ...>
[pid 11] 00:12:42.153534 read(137, <unfinished ...>
[pid 10] 00:12:42.153548 read(136, <unfinished ...>
[pid 16] 00:12:42.153574 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000111>
[pid 15] 00:12:42.153629 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000158>
[pid 14] 00:12:42.153650 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000167>
[pid 13] 00:12:42.153669 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000160>
[pid 12] 00:12:42.153690 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000166>
[pid 11] 00:12:42.153712 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000168>
[pid 9] 00:12:42.153734 read(134, <unfinished ...>
[pid 16] 00:12:42.153754 epoll_wait(161, <unfinished ...>
[pid 15] 00:12:42.153803 epoll_wait(158, <unfinished ...>
[pid 14] 00:12:42.153828 epoll_wait(149, <unfinished ...>
[pid 13] 00:12:42.153851 epoll_wait(140, <unfinished ...>
[pid 12] 00:12:42.153871 epoll_wait(121, <unfinished ...>
[pid 11] 00:12:42.153893 epoll_wait(118, <unfinished ...>
[pid 10] 00:12:42.153908 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000349>
[pid 16] 00:12:42.153966 <... epoll_wait resumed>[], 64, 0) = 0 <0.000189>
[pid 15] 00:12:42.153986 <... epoll_wait resumed>[], 64, 0) = 0 <0.000168>
[pid 14] 00:12:42.154003 <... epoll_wait resumed>[], 64, 0) = 0 <0.000159>
[pid 13] 00:12:42.154020 <... epoll_wait resumed>[], 64, 0) = 0 <0.000157>
[pid 12] 00:12:42.154037 <... epoll_wait resumed>[], 64, 0) = 0 <0.000152>
[pid 11] 00:12:42.154055 <... epoll_wait resumed>[], 64, 0) = 0 <0.000153>
[pid 9] 00:12:42.154072 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000329>
[pid 16] 00:12:42.154126 epoll_wait(161, <unfinished ...>
[pid 15] 00:12:42.154161 epoll_wait(158, <unfinished ...>
[pid 14] 00:12:42.154176 epoll_wait(149, <unfinished ...>
[pid 13] 00:12:42.154188 epoll_wait(140, <unfinished ...>
[pid 12] 00:12:42.154203 epoll_wait(121, <unfinished ...>
[pid 11] 00:12:42.154216 epoll_wait(118, <unfinished ...>
[pid 10] 00:12:42.154229 epoll_wait(109, <unfinished ...>
[pid 9] 00:12:42.154242 epoll_wait(106, <unfinished ...>
[pid 10] 00:12:42.154258 <... epoll_wait resumed>[], 64, 0) = 0 <0.000020>
[pid 9] 00:12:42.154278 <... epoll_wait resumed>[], 64, 0) = 0 <0.000027>
[pid 10] 00:12:42.154297 epoll_wait(109, <unfinished ...>
[pid 9] 00:12:42.154313 epoll_wait(106, <unfinished ...>
[pid 19] 00:12:42.652866 <... epoll_wait resumed>[{EPOLLIN, {u32=1644290128, u64=139820009644112}}], 256, -1) = 1 <2.646141>
[pid 19] 00:12:42.652957 read(202, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000025>
[pid 19] 00:12:42.653034 epoll_wait(196, <unfinished ...>
[pid 18] 00:12:42.899288 <... clock_nanosleep resumed>0x7f2a635fb760) = 0 <1.000156>
[pid 18] 00:12:42.899357 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 1] 00:12:43.064814 <... clock_nanosleep resumed>0x7fff3e6c4b10) = 0 <1.000198>
[pid 1] 00:12:43.064876 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 15] 00:12:43.652891 <... epoll_wait resumed>[{EPOLLIN, {u32=1688232096, u64=139820053586080}}], 64, -1) = 1 <1.498676>
[pid 13] 00:12:43.653018 <... epoll_wait resumed>[{EPOLLIN, {u32=1696620704, u64=139820061974688}}], 64, -1) = 1 <1.498813>
[pid 11] 00:12:43.653115 <... epoll_wait resumed>[{EPOLLIN, {u32=1746952352, u64=139820112306336}}], 64, -1) = 1 <1.498886>
[pid 9] 00:12:43.653176 <... epoll_wait resumed>[{EPOLLIN, {u32=1755340960, u64=139820120694944}}], 64, -1) = 1 <1.498851>
[pid 16] 00:12:43.653209 <... epoll_wait resumed>[{EPOLLIN, {u32=1684037792, u64=139820049391776}}], 64, -1) = 1 <1.499074>
[pid 15] 00:12:43.653234 read(180, <unfinished ...>
[pid 14] 00:12:43.653252 <... epoll_wait resumed>[{EPOLLIN, {u32=1692426400, u64=139820057780384}}], 64, -1) = 1 <1.499068>
[pid 13] 00:12:43.653276 read(172, <unfinished ...>
[pid 12] 00:12:43.653292 <... epoll_wait resumed>[{EPOLLIN, {u32=1742758048, u64=139820108112032}}], 64, -1) = 1 <1.499079>
[pid 11] 00:12:43.653315 read(137, <unfinished ...>
[pid 10] 00:12:43.653328 <... epoll_wait resumed>[{EPOLLIN, {u32=1751146656, u64=139820116500640}}], 64, -1) = 1 <1.499021>
[pid 16] 00:12:43.653356 read(184, <unfinished ...>
[pid 15] 00:12:43.653372 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000127>
[pid 14] 00:12:43.653395 read(176, <unfinished ...>
[pid 13] 00:12:43.653409 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000124>
[pid 11] 00:12:43.653429 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000105>
[pid 9] 00:12:43.653450 read(134, <unfinished ...>
[pid 16] 00:12:43.653474 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000108>
[pid 15] 00:12:43.653504 epoll_wait(158, <unfinished ...>
[pid 14] 00:12:43.653567 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000161>
[pid 13] 00:12:43.653590 epoll_wait(140, <unfinished ...>
[pid 12] 00:12:43.653603 read(135, <unfinished ...>
[pid 11] 00:12:43.653617 epoll_wait(118, <unfinished ...>
[pid 10] 00:12:43.653631 read(136, <unfinished ...>
[pid 9] 00:12:43.653641 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000182>
[pid 16] 00:12:43.653672 epoll_wait(161, <unfinished ...>
[pid 15] 00:12:43.653685 <... epoll_wait resumed>[], 64, 0) = 0 <0.000171>
[pid 14] 00:12:43.653702 epoll_wait(149, <unfinished ...>
[pid 13] 00:12:43.653717 <... epoll_wait resumed>[], 64, 0) = 0 <0.000118>
[pid 12] 00:12:43.653732 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000121>
[pid 11] 00:12:43.653754 <... epoll_wait resumed>[], 64, 0) = 0 <0.000127>
[pid 10] 00:12:43.653801 <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8 <0.000162>
[pid 9] 00:12:43.653825 epoll_wait(106, <unfinished ...>
[pid 16] 00:12:43.653848 <... epoll_wait resumed>[], 64, 0) = 0 <0.000167>
[pid 15] 00:12:43.653865 epoll_wait(158, <unfinished ...>
[pid 14] 00:12:43.653878 <... epoll_wait resumed>[], 64, 0) = 0 <0.000166>
[pid 13] 00:12:43.653894 epoll_wait(140, <unfinished ...>
[pid 12] 00:12:43.653908 epoll_wait(121, <unfinished ...>
[pid 11] 00:12:43.653922 epoll_wait(118, <unfinished ...>
[pid 10] 00:12:43.653938 epoll_wait(109, <unfinished ...>
[pid 9] 00:12:43.653952 <... epoll_wait resumed>[], 64, 0) = 0 <0.000118>
[pid 16] 00:12:43.653974 epoll_wait(161, <unfinished ...>
[pid 14] 00:12:43.653986 epoll_wait(149, <unfinished ...>
[pid 12] 00:12:43.653998 <... epoll_wait resumed>[], 64, 0) = 0 <0.000081>
[pid 10] 00:12:43.654015 <... epoll_wait resumed>[], 64, 0) = 0 <0.000069>
[pid 12] 00:12:43.654038 epoll_wait(121, <unfinished ...>
[pid 9] 00:12:43.654051 epoll_wait(106, <unfinished ...>
[pid 10] 00:12:43.654067 epoll_wait(109, <unfinished ...>
[pid 18] 00:12:43.899534 <... clock_nanosleep resumed>0x7f2a635fb760) = 0 <1.000131>
[pid 18] 00:12:43.899609 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 1] 00:12:44.065226 <... clock_nanosleep resumed>0x7fff3e6c4b10) = 0 <1.000236>
[pid 1] 00:12:44.065374 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 19] 00:12:44.241955 <... epoll_wait resumed>[{EPOLLIN, {u32=1644212352, u64=139820009566336}}], 256, -1) = 1 <1.588873>
[pid 19] 00:12:44.242051 accept4(199, {sa_family=AF_INET, sin_port=htons(33386), sin_addr=inet_addr("10.99.10.35")}, [16], SOCK_CLOEXEC|SOCK_NONBLOCK) = 46 <0.000034>
[pid 19] 00:12:44.242162 epoll_ctl(196, EPOLL_CTL_ADD, 46, {EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=1644781568, u64=139820010135552}}) = 0 <0.000017>
[pid 19] 00:12:44.242232 epoll_wait(196, [{EPOLLIN, {u32=1644781568, u64=139820010135552}}], 256, -1) = 1 <0.000014>
[pid 19] 00:12:44.242305 recvfrom(46, "GET /api/v1/metrics/prometheus H"..., 4096, 0, NULL, NULL) = 1733 <0.000017>
[pid 19] 00:12:44.242478 newfstatat(AT_FDCWD, "/api/v1/metrics/prometheus", 0x7f2a62bfafd0, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory) <0.000034>
[pid 19] 00:12:44.242809 writev(46, [{iov_base="HTTP/1.1 200 OK\r\n", iov_len=17}, {iov_base="Server: Monkey/1.7.2\r\nDate: Sat,"..., iov_len=59}, {iov_base="Transfer-Encoding: chunked\r\n", iov_len=28}], 3) = 104 <0.000050>
[pid 19] 00:12:44.242926 writev(46, [{iov_base="Content-Type: text/plain; versio"..., iov_len=41}, {iov_base="\r\n", iov_len=2}], 2) = 43 <0.000034>
[pid 19] 00:12:44.243018 sendto(46, "2255\r\n", 6, 0, NULL, 0) = 6 <0.000054>
[pid 19] 00:12:44.243136 epoll_ctl(196, EPOLL_CTL_DEL, 46, NULL) = 0 <0.000019>
[pid 19] 00:12:44.243221 sendto(46, "# HELP fluentbit_filter_add_reco"..., 8789, 0, NULL, 0) = 8789 <0.000075>
[pid 19] 00:12:44.243357 sendto(46, "\r\n", 2, 0, NULL, 0) = 2 <0.000019>
[pid 19] 00:12:44.243430 epoll_ctl(196, EPOLL_CTL_ADD, 46, {EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=1644781568, u64=139820010135552}}) = 0 <0.000014>
[pid 19] 00:12:44.243504 epoll_wait(196, [{EPOLLOUT, {u32=1644781568, u64=139820010135552}}], 256, -1) = 1 <0.000016>
[pid 19] 00:12:44.243588 epoll_ctl(196, EPOLL_CTL_DEL, 46, NULL) = 0 <0.000045>
[pid 19] 00:12:44.243699 sendto(46, "0\r\n\r\n", 5, 0, NULL, 0) = 5 <0.000015>
[pid 19] 00:12:44.243811 epoll_ctl(196, EPOLL_CTL_ADD, 46, {EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=1644781568, u64=139820010135552}}) = 0 <0.000021>
[pid 19] 00:12:44.243883 epoll_wait(196, [{EPOLLOUT, {u32=1644781568, u64=139820010135552}}], 256, -1) = 1 <0.000029>
[pid 19] 00:12:44.243958 epoll_ctl(196, EPOLL_CTL_DEL, 46, NULL) = 0 <0.000015>
[pid 19] 00:12:44.244051 epoll_ctl(196, EPOLL_CTL_ADD, 46, {EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=1644781568, u64=139820010135552}}) = 0 <0.000015>
[pid 19] 00:12:44.244122 epoll_wait(196, <unfinished ...>
[pid 18] 00:12:44.899854 <... clock_nanosleep resumed>0x7f2a635fb760) = 0 <1.000158>
[pid 18] 00:12:44.899931 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 1] 00:12:45.065666 <... clock_nanosleep resumed>0x7fff3e6c4b10) = 0 <1.000178>
[pid 1] 00:12:45.065735 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>
[pid 16] 00:12:45.154860 <... epoll_wait resumed>[{EPOLLIN, {u32=1684037792, u64=139820049391776}}], 64, -1) = 1 <1.500826>
[pid 15] 00:12:45.154910 <... epoll_wait resumed>[{EPOLLIN, {u32=1688232096, u64=139820053586080}}], 64, -1) = 1 <1.501036>
[pid 13] 00:12:45.154936 <... epoll_wait resumed>[{EPOLLIN, {u32=1696620704, u64=139820061974688}}], 64, -1) = 1 <1.501033>
[pid 12] 00:12:45.154972 <... epoll_wait resumed>[{EPOLLIN, {u32=1742758048, u64=139820108112032}}], 64, -1) = 1 <1.500919>
[pid 11] 00:12:45.154999 <... epoll_wait resumed>[{EPOLLIN, {u32=1746952352, u64=139820112306336}}], 64, -1) = 1 <1.501068>
[pid 10] 00:12:45.155025 <... epoll_wait resumed>[{EPOLLIN, {u32=1751146656, u64=139820116500640}}], 64, -1) = 1 <1.500950>
[pid 9] 00:12:45.155050 <... epoll_wait resumed>[{EPOLLIN, {u32=1755340960, u64=139820120694944}}], 64, -1) = 1 <1.500989>
{
"storage_layer": {
"chunks": {
"total_chunks": 4,
"mem_chunks": 4,
"fs_chunks": 0,
"fs_chunks_up": 0,
"fs_chunks_down": 0
}
},
"input_chunks": {
"systemd.0": {
"status": {
"overlimit": false,
"mem_size": "0b",
"mem_limit": "0b"
},
"chunks": {
"total": 0,
"up": 0,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
},
"tail.1": {
"status": {
"overlimit": false,
"mem_size": "0b",
"mem_limit": "0b"
},
"chunks": {
"total": 0,
"up": 0,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
},
"tail.2": {
"status": {
"overlimit": false,
"mem_size": "0b",
"mem_limit": "0b"
},
"chunks": {
"total": 0,
"up": 0,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
},
"storage_backlog.3": {
"status": {
"overlimit": false,
"mem_size": "0b",
"mem_limit": "0b"
},
"chunks": {
"total": 0,
"up": 0,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
},
"emitter_for_rewrite_tag.4": {
"status": {
"overlimit": false,
"mem_size": "0b",
"mem_limit": "9.5M"
},
"chunks": {
"total": 0,
"up": 0,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
},
"emitter_for_rewrite_tag.5": {
"status": {
"overlimit": false,
"mem_size": "22.2K",
"mem_limit": "9.5M"
},
"chunks": {
"total": 4,
"up": 4,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
},
"emitter_for_rewrite_tag.6": {
"status": {
"overlimit": false,
"mem_size": "0b",
"mem_limit": "9.5M"
},
"chunks": {
"total": 0,
"up": 0,
"down": 0,
"busy": 0,
"busy_size": "0b"
}
}
}
}
State: Running
Started: Mon, 30 Dec 2024 03:39:27 +0300
Ready: True
Restart Count: 2
Limits:
cpu: 200m
memory: 128Mi
Requests:
cpu: 200m
memory: 128Mi
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
[2025/01/09 15:24:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=337644498 watch_fd=4787 name=/var/log/containers/xxx-5784f8cbcd-2txq5_xxx_migrations-4b50a78e51a4e07aa11245e624fc4e67bfd0d4b34edb965388eb0eaf05c2dca5.log
[2025/01/09 15:24:28] [ warn] [engine] failed to flush chunk '1-1736436267.677796148.flb', retry in 9 seconds: task_id=7, input=emitter_for_rewrite_tag.6 > output=es.3 (out_id=3)
[2025/01/09 15:24:37] [error] [engine] chunk '1-1736436267.677796148.flb' cannot be retried: task_id=7, input=emitter_for_rewrite_tag.6 > output=es.3
[2025/01/09 15:25:26] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=433188932 watch_fd=4506
[2025/01/09 15:25:27] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=370864192 watch_fd=4775
[2025/01/09 15:25:28] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=332235753 watch_fd=4509
[2025/01/09 15:25:29] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=296558453 watch_fd=4756
[2025/01/09 15:25:29] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=449752668 watch_fd=4500
[2025/01/09 15:25:29] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=386694775 watch_fd=4748
[2025/01/09 15:25:29] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=72887180 watch_fd=4542
[2025/01/09 15:25:29] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=5697947 watch_fd=4383
[2025/01/09 15:27:27] [ info] [filter:kubernetes:kubernetes.2] token updated
[2025/01/09 15:27:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=391506156 watch_fd=4788 name=/var/log/containers/xxx-b57647789-gfbvk_xxx-static-dc5b651c143f9e36af9c19da6cde65e55d252417df909c551aad241f4aef5a64.log
[2025/01/09 15:27:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=223083475 watch_fd=4789 name=/var/log/containers/xxx-7b8c7bd55f-bn7ch_xxx-server-6520de96062c8f54116157f179ab0c8460c31ac6da042a33fd5f0cb2c59b57f7.log
[2025/01/09 15:27:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=400043213 watch_fd=4790 name=/var/log/containers/xxx-bob-7b8c7bd55f-bn7ch_xxx-xstatic-2a429bca73887c93233634ca80462272fc2669bb6cdf95da84c687d6dbab4ff9.log
[2025/01/09 15:27:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=244744155 watch_fd=4791 name=/var/log/containers/xxx-bob-7b8c7bd55f-rqxh2_xxx-xnews-server-5c8fc7ae42b6924bb0c2bfe9ed9ea4c4f82d46907696bfd9262f99f042b0f0bf.log
[2025/01/09 15:27:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=386711596 watch_fd=4792 name=/var/log/containers/xxx-bob-7b8c7bd55f-rqxh2_xxx-xstatic-e0ba58f5aca4b17063df13b4efd2018ecad2f896fdda82a612fb5875b6f01b22.log
[2025/01/09 15:27:27] [ info] [input:tail:tail.1] inotify_fs_add(): inode=213421478 watch_fd=4793 name=/var/log/containers/xxx-bob-b57647789-gfbvk_xxx-xserver-1c25bac499bfecc754d112857ec94805ced64f35db604af02c1d6a8c5c9b6423.log
[2025/01/09 15:27:28] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=391506156 watch_fd=4788
[2025/01/09 15:27:28] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=213421478 watch_fd=4793 |
these seem similar to #9903 as well, could you also check the cpu usage when fluent-bit hangs? |
@AzureLeMoon, ok. I will record the cpu usage and send it when the problem repeats |
@pawel-lmcb, there is nothing unusual in the logs. They look something like this:
And when a problem occurs the logging just stops. |
@AzureLeMoon [xxx@xxx containers]$ mpstat 1 5
Linux 6.10.10-1.el9.elrepo.x86_64 (xxx) 02/10/2025 _x86_64_ (16 CPU)
12:02:14 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
12:02:15 PM all 8.98 0.00 1.32 0.00 0.00 0.06 0.00 0.00 0.00 89.64
12:02:16 PM all 7.76 0.00 0.94 0.00 0.00 0.25 0.00 0.00 0.00 91.05
12:02:17 PM all 5.47 0.00 0.82 0.00 0.00 0.06 0.00 0.00 0.00 93.65
12:02:18 PM all 6.03 0.00 1.38 0.00 0.00 0.06 0.00 0.00 0.00 92.52
12:02:19 PM all 7.83 0.00 2.19 0.00 0.00 0.06 0.00 0.00 0.00 89.91
Average: all 7.22 0.00 1.33 0.00 0.00 0.10 0.00 0.00 0.00 91.35
kubectl top pod fluent-bit-5gx94 --namespace kube-system
NAME CPU(cores) MEMORY(bytes)
fluent-bit-5gx94 1m 106Mi |
Bumping also this, looks similar |
Bug Report
Describe the bug
From time to time, the fluent-bit application hangs and stops collecting logs. No activity is performed, there are no logs from fluent-bit itself, and metrics stop being generated. The Kubernetes pod does not crash or restart when this issue occurs, and it remains in the
Ready
state. This problem can only be detected manually by noticing that the application's logs have not been delivered.To Reproduce
Expected behavior
Stable operation and collection of all logs without hangs.
Your Environment
Additional context
The issue does not occur frequently. It happens a few times per week on one of over 400 instances. It might occur more often, but we do not notice it due to the difficulty of detecting the problem.
The text was updated successfully, but these errors were encountered: