You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
When running a zookeeper cluster, the health probe operation may take more than 1 minute to complete and timeout. At the same time, the kb-agent may not 'waitpid' for the probe script properly, finally, the periodic probe scripts incurs lots of 'defunct' processes in the container.
I observe and trace the probe process in the container, find that in the process of health probe:
when I use nc to replace 'java' cmd for health probe, the timeout and defunct problems disappear.
And this defection may occur frequently on VM with low-performance CPUs.
So I suggest using health probe cmd as belows:
- bash
- -c
- |
ZK_CLIENT_PORT=2181
echo "ruok" | timeout 2 nc localhost ${ZK_CLIENT_PORT}; if [ $? -eq 0 ]; then echo "yes"; fi
To Reproduce
Steps to reproduce the behavior:
run a zookeeper cluster in GCP 4C16G vm
wait for the defunct processes to come out
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Version [e.g. 22]
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered:
Describe the bug
When running a zookeeper cluster, the health probe operation may take more than 1 minute to complete and timeout. At the same time, the kb-agent may not 'waitpid' for the probe script properly, finally, the periodic probe scripts incurs lots of 'defunct' processes in the container.
I observe and trace the probe process in the container, find that in the process of health probe:
the jvm takes a very long time to do the JIT work
when I use nc to replace 'java' cmd for health probe, the timeout and defunct problems disappear.
And this defection may occur frequently on VM with low-performance CPUs.
So I suggest using health probe cmd as belows:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: