Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Huge Number of Watches in ZooKeeper #17482

Draft
wants to merge 19 commits into
base: master
Choose a base branch
from
Draft

Conversation

GWphua
Copy link
Contributor

@GWphua GWphua commented Nov 15, 2024

Fixes #6647

Description

This PR is built upon #6683 and #9172 and aims to reduce the number of ZooKeeper watch counts.

Fixed Huge Number of Watches in ZooKeeper

Previously, Announcer.java causes all child nodes under the parent path to be watched by the ZooKeeper ensemble. This causes an unnecessarily large number of ZooKeeper watches to be produced.

The new NodeAnnouncer.java class, which is heavily modelled on the Announcer.java, aims to address this issue by announcing a single node within a ZooKeeper ensemble. By eliminating the watches on child nodes, this approach significantly reduces the total number of watch counts in ZooKeeper. Tests conducted on the production server also indicate a decrease in watch counts resulting from this change.

ZK Watch Count

The original Announcer.java still provides better performance for segment announcements, and hence will be retained in the codebase.

Documentation

Refactoring

  • Shift Announceable class in Announcer.java to Announceable.java.
  • Refactor long methods by creating helper functions.
  • Add ZKPathsUtils.java to abstract the retrieval of ZooKeeper path and ZooKeeper node.

Release note

Improved: ZooKeeper no longer spins up an unnecessary large number of watches when running realtime tasks.


Key changed/added classes in this PR
  • Announcer.java
  • NodeAnnouncer.java
  • Announceable.java
  • AnnouncerModule.java
  • ZKPathsUtils.java

Further Actionable

I plan to create issues for the following follow-up actions after this PR:

Deprecated Code

The PathChildrenCache, NodeCache classes have been deprecated since Curator 5.0.0.

We can look into replacing these deprecated classes with CuratorCache. CuratorCache requires ZooKeeper 3.6+, and we are currently using ZooKeeper 3.8.4.

Concurrency Flow

Add a concurrent control flow documentation for NodeAnnouncer.

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@GWphua GWphua marked this pull request as draft November 21, 2024 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

huge number of watch in zookeeper cause zookeeper full gc
2 participants