fix(rules): rule triggering on late-connecting targets #1646

andrewazores · 2023-08-31T18:40:43Z

Welcome to Cryostat! 👋

Before contributing, make sure you have:

Read the contributing guidelines
Linked a relevant issue which this PR resolves
Linked any other relevant issues, PRs, or documentation, if any
Resolved all conflicts, if any
Rebased your PR branch on top of the latest upstream main branch
Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
Signed all commits using a GPG signature

To recreate commits with a GPG signature, run: git fetch upstream && git rebase --force --gpg-sign upstream/main

Fixes #1494
Fixes #1497
Related to #1593

Description of the change:

This changeset:

Handles MODIFIED JVM discovery events in a couple of places, primarily to avoid a wrongly thrown UnsupportedOperationException that could cause rule activations to fail, and secondarily to use these events to trigger a proactive recheck of non-connectable targets in case the discovery update made the target connectable.
Enhances the non-connectable target recheck logic. Previously, these targets were always rechecked on a fixed 15-second timer. Now the timer fires every 2 seconds, but each non-connectable target is only rechecked according to a backoff schedule, and once a target has failed retries for 1 minute it is not rechecked again.
Performs rules processing on a dedicated executor (thread pool) rather than the Vert.x worker pool, to help isolate connection failures and other connectivity issues, and to simplify the internal logic by using plain Futures rather than lists of Vert.x timer IDs.
Updates Rule equals/hashCode to ignore the enabled state. This equality bug was actually a root cause of the broken deactivate/reactivate workflows.
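One plausible way an enabled-dependent equality breaks deactivate/reactivate is shown in this minimal sketch. It assumes a hypothetical, heavily simplified Rule with only a name, a match expression, and an enabled flag (the real class has more fields). Because equals/hashCode ignore enabled, a disabled copy of a rule still finds and removes the enabled original in a hash-based collection; if enabled were part of hashCode, that lookup would miss and the stale entry would linger.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified Rule for illustration only; not Cryostat's actual class.
final class Rule {
    final String name;
    final String matchExpression;
    final boolean enabled;

    Rule(String name, String matchExpression, boolean enabled) {
        this.name = name;
        this.matchExpression = matchExpression;
        this.enabled = enabled;
    }

    // Returns a copy with a different enabled state, as a deactivate/reactivate would.
    Rule withEnabled(boolean enabled) {
        return new Rule(name, matchExpression, enabled);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Rule)) {
            return false;
        }
        Rule other = (Rule) o;
        // 'enabled' is deliberately excluded: toggling a rule must not change its identity.
        return name.equals(other.name) && matchExpression.equals(other.matchExpression);
    }

    @Override
    public int hashCode() {
        // Must be computed from exactly the same fields that equals() compares.
        return Objects.hash(name, matchExpression);
    }
}
```

With this definition, removing `rule.withEnabled(false)` from a `Set<Rule>` that holds the enabled rule succeeds, which is the behavior a deactivate step relies on.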

Motivation for the change:

This should fix the bug where rules may not activate against targets that appear but are not immediately connectable, and should also shorten the time it takes for Cryostat to detect that a slowly-connectable target becomes connectable.
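The faster detection comes from the recheck timer described in the change list. Below is a minimal sketch of a 2-second tick with per-target backoff and a 1-minute give-up window. The class and method names, the doubling backoff, and the 16-second cap are assumptions, since the PR does not specify the backoff formula. Time is passed in explicitly so the schedule can be tested without sleeping; in a running service, tick would be driven by something like ScheduledExecutorService.scheduleAtFixedRate with a 2-second period.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: names, the doubling formula, and the 16 s cap are assumptions.
final class RecheckScheduler {
    static final long TICK_MS = 2_000;       // timer period described in the PR
    static final long GIVE_UP_MS = 60_000;   // stop rechecking after 1 minute of failures
    static final long MAX_DELAY_MS = 16_000; // assumed cap on the backoff delay

    private static final class State {
        final long firstFailureMs;
        long nextAttemptMs;
        int failures = 1;

        State(long nowMs) {
            this.firstFailureMs = nowMs;
            this.nextAttemptMs = nowMs + TICK_MS;
        }
    }

    private final Map<String, State> pending = new HashMap<>();
    private final Set<String> abandoned = new HashSet<>();

    /** Records a target whose initial connection attempt failed. */
    void markNonConnectable(String targetId, long nowMs) {
        pending.putIfAbsent(targetId, new State(nowMs));
    }

    /** Runs on every timer tick; only targets whose backoff has elapsed are retried. */
    void tick(long nowMs, Predicate<String> tryConnect) {
        Iterator<Map.Entry<String, State>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, State> e = it.next();
            State s = e.getValue();
            if (nowMs < s.nextAttemptMs) {
                continue; // still backing off
            }
            if (tryConnect.test(e.getKey())) {
                it.remove(); // connectable now: hand off to rule activation
            } else if (nowMs - s.firstFailureMs >= GIVE_UP_MS) {
                it.remove();
                abandoned.add(e.getKey()); // never rechecked again
            } else {
                s.failures++;
                // Double the delay per failure (shift bounded to avoid overflow), then cap it.
                long delay = Math.min(TICK_MS << Math.min(s.failures - 1, 10), MAX_DELAY_MS);
                s.nextAttemptMs = nowMs + delay;
            }
        }
    }

    boolean isPending(String targetId) {
        return pending.containsKey(targetId);
    }

    boolean isAbandoned(String targetId) {
        return abandoned.contains(targetId);
    }
}
```

With these assumed constants, a target that never becomes connectable is retried at 2, 6, 14, 30, 46 and 62 seconds after its first failure and then abandoned, so the cheap 2-second tick gives quick detection for targets that come up soon without hammering ones that never do.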

How to manually test:

Deploy the test image in crc or another OpenShift/k8s cluster, then deploy a sample application. Quite often the k8s API server notifies Cryostat of the new Endpoints object before the target JVM is ready to accept JMX connections, so the initial JVM ID retrieval fails, and so does the initial attempt to trigger automated rules. With this change, the workflow should work as expected again: Cryostat should notice when the target becomes connectable and then trigger automated rules against it.
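The scenario above, where the Endpoints notification arrives before the JVM accepts JMX connections, is where MODIFIED events matter. The following is a minimal sketch assuming hypothetical EventKind and RuleTrigger types, not Cryostat's actual classes. FOUND and MODIFIED both queue a connection and activation attempt on a dedicated executor instead of MODIFIED throwing UnsupportedOperationException, and each target's pending work is tracked as a plain Future, so a LOST event can simply cancel it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical event kinds mirroring the discovery events discussed in the PR.
enum EventKind { FOUND, MODIFIED, LOST }

// Hypothetical sketch: class and method names are illustrative only.
final class RuleTrigger implements AutoCloseable {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final Map<String, Future<?>> inFlight = new ConcurrentHashMap<>();
    private final Consumer<String> activateRules; // tries to connect, then activates rules

    RuleTrigger(Consumer<String> activateRules) {
        this.activateRules = activateRules;
    }

    void onDiscoveryEvent(EventKind kind, String targetId) {
        switch (kind) {
            case FOUND:
            case MODIFIED: // handled instead of throwing: the update may make the target connectable
                inFlight.compute(targetId, (id, prev) -> {
                    if (prev != null && !prev.isDone()) {
                        return prev; // an attempt for this target is already queued or running
                    }
                    return executor.submit(() -> activateRules.accept(id));
                });
                break;
            case LOST: {
                Future<?> pending = inFlight.remove(targetId);
                if (pending != null) {
                    pending.cancel(true); // plain Future cancellation, no timer IDs to track
                }
                break;
            }
            default:
                throw new IllegalArgumentException("unknown event kind: " + kind);
        }
    }

    @Override
    public void close() {
        executor.shutdown(); // already-queued attempts still run to completion
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Keeping this work on its own pool means a slow or failing JMX connection ties up only these threads rather than the Vert.x worker pool.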

andrewazores · 2023-08-31T18:41:28Z

/build_test

andrewazores · 2023-08-31T18:41:55Z

https://github.com/cryostatio/cryostat/actions/runs/6041221285

andrewazores · 2023-08-31T19:00:52Z

/build_test

andrewazores · 2023-08-31T19:01:06Z

https://github.com/cryostatio/cryostat/actions/runs/6041376728

github-actions · 2023-08-31T19:02:35Z

ARCH	IMAGE
amd64	ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-amd64
arm64	ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-72075bbd3b75be7b5eaaa63e98022fb15dcdbbab-linux-arm64 sh smoketest.sh

github-actions · 2023-08-31T19:21:31Z

ARCH	IMAGE
amd64	ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-amd64
arm64	ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-9ac9583f0ecd0040949a97f1b87eeb7205a49f45-linux-arm64 sh smoketest.sh

andrewazores · 2023-08-31T20:06:32Z

/build_test

andrewazores · 2023-08-31T20:15:43Z

https://github.com/cryostatio/cryostat/actions/runs/6041942311

github-actions · 2023-08-31T20:29:07Z

ARCH	IMAGE
amd64	ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-amd64
arm64	ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-93ca66f09cd119384bc7a36442043f0644a0292c-linux-arm64 sh smoketest.sh

andrewazores · 2023-09-05T18:37:06Z

/request_review

aali309

Looking good. Everything working as expected

andrewazores · 2023-09-06T20:32:56Z

/build_test

andrewazores · 2023-09-06T20:33:09Z

https://github.com/cryostatio/cryostat/actions/runs/6102135156

github-actions · 2023-09-06T20:56:08Z

ARCH	IMAGE
amd64	ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-amd64
arm64	ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1646-4d923a4ed07b0e4f5cf2db59e6c22cd32930b68f-linux-arm64 sh smoketest.sh


mwangggg · 2023-09-18T18:46:48Z

Regarding the issue I was having with the vertx-fib-demo-2 targets, it seems to have been resolved for the most part. I just tested it by creating the same automated rule, /vertx-fib-demo/.test(target.alias), and most of the time the error clears once the credentials are acknowledged, but once there were thread-blocked errors in the Cryostat logs.

andrewazores · 2023-09-18T18:48:18Z

Okay, I think that would indeed be some deeper piece of networking code then, not specifically related to the rules triggering system, so we should track that as a separate issue and try to get more details about what is causing the thread block.

andrewazores · 2023-09-18T18:48:56Z

Probably the same as #1669 ^

mwangggg · 2023-09-18T18:49:00Z

other than that, this PR looks good to me 👍

andrewazores added the fix label Aug 31, 2023

mergify bot added the safe-to-test label Aug 31, 2023

andrewazores requested a review from tthvo August 31, 2023 18:54

andrewazores marked this pull request as ready for review August 31, 2023 18:54

andrewazores force-pushed the gh1494 branch from 636baa2 to 9ac9583 August 31, 2023 19:00

andrewazores force-pushed the gh1494 branch from 9ac9583 to 93ca66f August 31, 2023 20:06

andrewazores requested review from aali309 and mwangggg August 31, 2023 20:18

andrewazores force-pushed the gh1494 branch 3 times, most recently from 99bd4f0 to 4bc613f September 5, 2023 17:26

github-actions bot added the review-requested label Sep 5, 2023

aali309 reviewed Sep 6, 2023


aali309 previously approved these changes Sep 6, 2023


andrewazores force-pushed the gh1494 branch from 4bc613f to 4d923a4 September 6, 2023 20:32

mwangggg reviewed Sep 7, 2023

src/main/java/io/cryostat/discovery/DiscoveryStorage.java (outdated, resolved)

andrewazores dismissed aali309’s stale review via 8364518 September 7, 2023 17:28

andrewazores added 23 commits September 18, 2023 14:45

perform rule triggering when credentials are added on executor pool

d10c3fc

evaluate rule cleanup against all discovered targets, and be tolerant of already-stopped condition

bac8dff

refactor rule activation for concurrency

139b096

attempt rule activation against all targets and handle potential for activation already done on another duplicate target definition

81ad2f8

eliminate bugged method

dc843a9

fixup! attempt rule activation against all targets and handle potential for activation already done on another duplicate target definition

7764568

perform task executions on rule processor pool, not target connection manager pool

2396ebb

use ScheduledExecutorService for background rules processing

3fa1640

use multithreaded pool

b08eec2

re-trigger rules on credential removal

b1579f4

track rule activations using JVM IDs, not whole ServiceRefs

f079f3b

avoid useless thread context switch

73a11a5

rule uniqueness should not depend on currently enabled state

6de553c

remove unused field

9799cf2

fixup! remove unused field

a9f958e

Revert "avoid useless thread context switch"

8102a70

This reverts commit 3f0d699.

perform discovery background tasks using dedicated scheduler thread and worker pool

bc0745c

try to add more timeout handling to JVM ID retrieval

987af7b

Revert "perform task executions on rule processor pool, not target connection manager pool"

143eace

This reverts commit a675f4d.

restore execution on TargetConnectionManager pool, and lift timeout logic higher

03cf148

fixup! Revert "perform task executions on rule processor pool, not target connection manager pool"

46082ae

add tolerance to flaky wall-time-checking unit test

b71dbb7

cleanup

8720914
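Several commits above (tracking rule activations by JVM ID rather than whole ServiceRefs, and handling activation already done on another duplicate target definition) address the case where one JVM is discovered under more than one target definition. A minimal sketch of that idea follows, assuming a hypothetical ActivationTracker keyed by rule name and JVM ID: the first caller to claim a pair performs the activation, and later duplicates are skipped.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the same JVM may appear under several target definitions,
// so activations are deduplicated by (rule name, JVM ID) rather than by target.
final class ActivationTracker {
    // rule name -> JVM IDs the rule has already been activated against
    private final Map<String, Set<String>> activations = new ConcurrentHashMap<>();

    /** Returns true only for the first caller to claim this (rule, JVM) pair. */
    boolean tryClaim(String ruleName, String jvmId) {
        // Set.add is atomic on a concurrent key set, so two threads racing on
        // duplicate target definitions cannot both win the claim.
        return activations
                .computeIfAbsent(ruleName, k -> ConcurrentHashMap.newKeySet())
                .add(jvmId);
    }

    /** Forgets a (rule, JVM) pair so the rule may be activated against that JVM again. */
    void release(String ruleName, String jvmId) {
        Set<String> ids = activations.get(ruleName);
        if (ids != null) {
            ids.remove(jvmId);
        }
    }
}
```

The release method is included so that a rule can fire again for the same JVM, for example after the target is lost; exactly when Cryostat clears this state is not described in this PR.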

andrewazores force-pushed the gh1494 branch from d0b7509 to 8720914 September 18, 2023 18:45

andrewazores merged commit 5f578f9 into cryostatio:main Sep 18, 2023

andrewazores deleted the gh1494 branch September 18, 2023 18:51
