Huge number of watches in ZooKeeper causes ZooKeeper full GC #6647

Closed
kaijianding opened this issue Nov 20, 2018 · 6 comments · May be fixed by #17482
@kaijianding
Contributor

I noticed about 100M watches in ZooKeeper in my production environment. When I restart some of the realtime tasks, ZooKeeper can go into full GC, even though it is configured with a 32GB heap and Druid is configured to use the HTTP server view.

After investigation, I found that this is caused by the Announcer.java implementation.
The Announcer tries its best to cope with temporary disconnections from the ZooKeeper server, so it watches every child path under the specified base path. For base paths like /druid/prod/announcements and /druid/prod/listeners/lookups/__default/, however, there are many hosts registered as child paths.

If there are 10,000 realtime tasks (spread across roughly 50 jobs, not a single realtime job), each task creates 2 * 10,000 watches, so we end up with 2 * 10,000 * 10,000 = 200M watches!
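To make the arithmetic above concrete, here is a minimal sketch of the estimate. It assumes that every task registers one child under each of the two base paths and that every task watches every child; the class and method names are hypothetical and are not part of Druid or Curator.

```java
/** Back-of-the-envelope estimate of ZooKeeper watch counts (illustrative only, not Druid code). */
final class WatchEstimate {
    private WatchEstimate() {}

    /**
     * Assumes every task watches every child under each base path, so the
     * total is tasks * basePaths * childrenPerBasePath.
     * multiplyExact throws ArithmeticException instead of silently overflowing.
     */
    static long totalWatches(long tasks, long basePaths, long childrenPerBasePath) {
        return Math.multiplyExact(Math.multiplyExact(tasks, basePaths), childrenPerBasePath);
    }

    public static void main(String[] args) {
        long tasks = 10_000L;     // realtime tasks, as in the example above
        long basePaths = 2L;      // announcements and lookup listeners
        long children = tasks;    // assumption: one child per task under each base path
        System.out.println(totalWatches(tasks, basePaths, children)); // prints 200000000
    }
}
```

The count grows quadratically with the number of tasks, which is why restarting even a fraction of them can put noticeable pressure on the ZooKeeper heap.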

I'm not familiar with Curator. Can Curator watch only a particular leaf path? If so, we could implement a simpler Announcer for these two paths to reduce the number of watches in ZooKeeper and avoid ZK OOM. @gianm

@jihoonson
Contributor

> I'm not familiar with Curator. Can Curator watch only a particular leaf path? If so, we could implement a simpler Announcer for these two paths to reduce the number of watches in ZooKeeper and avoid ZK OOM.

It sounds possible, but I wonder what solution you have in mind. Could you elaborate? What would the simpler Announcer implementation look like?

@kaijianding
Contributor Author

My thought is to watch only the leaf path instead of all the child paths under the base path.
That is, rather than relying on childEvent, watch the path itself. I don't know whether that is possible, since I'm not familiar with Curator.
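As a rough illustration of the difference between the two strategies, here is a toy in-memory model that counts registered watches. It is not Curator or ZooKeeper code: the `WatchModel` class, its methods, and the `host-N` leaf names are all hypothetical, and it assumes one leaf per task under each base path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of watch registration; it does not talk to ZooKeeper. */
final class WatchModel {
    // path -> number of watches registered on that path
    private final Map<String, Integer> watchesByPath = new HashMap<>();

    void watch(String path) {
        watchesByPath.merge(path, 1, Integer::sum);
    }

    long totalWatches() {
        long total = 0;
        for (int count : watchesByPath.values()) {
            total += count;
        }
        return total;
    }

    /** Hypothetical leaf names: one leaf per task under the base path. */
    static List<String> leafPaths(String basePath, int tasks) {
        List<String> leaves = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            leaves.add(basePath + "/host-" + i);
        }
        return leaves;
    }

    /** Behaviour described in the issue: every task watches every child. */
    static long watchAllChildren(List<String> basePaths, int tasks) {
        WatchModel model = new WatchModel();
        for (String base : basePaths) {
            List<String> leaves = leafPaths(base, tasks);
            for (int task = 0; task < tasks; task++) {
                for (String leaf : leaves) {
                    model.watch(leaf);
                }
            }
        }
        return model.totalWatches();
    }

    /** Proposed behaviour: every task watches only its own leaf. */
    static long watchOwnLeaf(List<String> basePaths, int tasks) {
        WatchModel model = new WatchModel();
        for (String base : basePaths) {
            List<String> leaves = leafPaths(base, tasks);
            for (int task = 0; task < tasks; task++) {
                model.watch(leaves.get(task));
            }
        }
        return model.totalWatches();
    }

    public static void main(String[] args) {
        List<String> bases = Arrays.asList(
            "/druid/prod/announcements",
            "/druid/prod/listeners/lookups/__default");
        System.out.println("all children: " + watchAllChildren(bases, 100)); // all children: 20000
        System.out.println("own leaf: " + watchOwnLeaf(bases, 100));         // own leaf: 200
    }
}
```

In this model the leaf-only strategy grows linearly with the number of tasks instead of quadratically. ZooKeeper itself does allow a watch to be set on a single znode (for example through an exists or getData call), which is what the leaf-only strategy relies on; how that would fit into the Announcer is left open here.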

@love1693294577

I also encountered this problem. How did you solve it?

@love1693294577

What I want is for the directory under discovery to be deleted once the task has finished.

@stale

stale bot commented Apr 11, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

@stale stale bot added the stale label Apr 11, 2020
@stale

stale bot commented May 9, 2020

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
