Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rules): rule triggering on late-connecting targets #1646

Merged
merged 35 commits into from
Sep 18, 2023

Conversation

andrewazores
Copy link
Member

@andrewazores andrewazores commented Aug 31, 2023

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PR's, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with GPG signature git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes #1494
Fixes #1497
Related to #1593

Description of the change:

This changeset:

  1. handles MODIFIED JVM discovery events in a couple of places. Primarily to avoid an UnsupportedOperationException that was wrongly thrown and could cause rule activations to fail, and then secondarily to use these events to trigger a proactive re-check of non-connectable targets in case the discovery update made the target connectable.
  2. Enhances the non-connectable target recheck logic. Previously these were always checked on a fixed 15 second timer. Now, the timer fires every 2 seconds, but each non-connectable target will only be rechecked according to some backoff timing logic, and after 1 minute of retry failures, that target will not be rechecked again.
  3. Performs rules processing on a dedicated executor (thread pool), rather than the Vert.x worker pool, to help isolate connection failure exceptions or other connectivity issues, and also to simplify the internal logic to use simple Futures rather than lists of Vert.x timer IDs
  4. Updates the Rule equals/hashCode to ignore the enabled state - this equality bug actually was a root cause for the broken deactivate/reactivate workflows

Motivation for the change:

This should fix the bug where rules may not activate against targets that appear but are not immediately connectable, and should also shorten the time it takes for Cryostat to detect that a slowly-connectable target becomes connectable.

How to manually test:

Deploy test image in crc or other OpenShift/k8s. Deploy a sample application. Quite often, the k8s server notifies Cryostat of the new Endpoints object before the target JVM is actually ready to accept JMX connections, so the initial JVM ID retrieval attempt fails and so does the initial attempt to trigger automated rules. After this change, this workflow should again work as expected and Cryostat should notice that the target becomes connectable, and then triggers automated rules against it.

@andrewazores
Copy link
Member Author

/build_test

@andrewazores
Copy link
Member Author

@andrewazores andrewazores requested a review from tthvo August 31, 2023 18:54
@andrewazores andrewazores marked this pull request as ready for review August 31, 2023 18:54
@andrewazores
Copy link
Member Author

/build_test

@andrewazores
Copy link
Member Author

@github-actions
Copy link
Contributor

ARCH IMAGE
amd64 ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-amd64
arm64 ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

ARCH IMAGE
amd64 ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-amd64
arm64 ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-arm64 sh smoketest.sh

@andrewazores
Copy link
Member Author

/build_test

@andrewazores
Copy link
Member Author

@github-actions
Copy link
Contributor

ARCH IMAGE
amd64 ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-amd64
arm64 ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-arm64 sh smoketest.sh

@andrewazores andrewazores force-pushed the gh1494 branch 3 times, most recently from 99bd4f0 to 4bc613f Compare September 5, 2023 17:26
@andrewazores
Copy link
Member Author

/request_review

Copy link
Contributor

@aali309 aali309 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Everything working as expected
image

image

aali309
aali309 previously approved these changes Sep 6, 2023
@andrewazores
Copy link
Member Author

/build_test

@andrewazores
Copy link
Member Author

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

ARCH IMAGE
amd64 ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-amd64
arm64 ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-arm64 sh smoketest.sh

…activation already done on another duplicate target definition
…al for activation already done on another duplicate target definition
@mwangggg
Copy link
Member

regarding the issue I was having with the vertx-fib-demo-2 targets, it seems to have been resolved for the most part. I just tested it by making the same automated rule /vertx-fib-demo/.test(target.alias) and most of the time, the error clears once the credentials are acknowledged, but once there was the thread-blocked errors in the cryostat logs.

@andrewazores
Copy link
Member Author

Okay, I think that would indeed be some deeper piece of networking code then, not specifically related to the rules triggering system, so we should track that as a separate issue and try to get more details about what is causing the thread block.

@andrewazores
Copy link
Member Author

Probably the same as #1669 ^

@mwangggg
Copy link
Member

other than that, this PR looks good to me 👍

@andrewazores andrewazores merged commit 5f578f9 into cryostatio:main Sep 18, 2023
@andrewazores andrewazores deleted the gh1494 branch September 18, 2023 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
4 participants