only mark tokens as unsupported based on metrics for a limited time #3205
Conversation
```rust
let stats = self.counter.get(token)?;
if stats
    .flagged_unsupported_at
    .is_some_and(|t| now.duration_since(t) > self.token_freeze_time)
```
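Read in isolation, the condition can be sketched as follows; the struct and helper name are illustrative stand-ins for the PR's actual code, but the `is_some_and` check is the same: it is true exactly when the token was flagged and the freeze window has already elapsed.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the PR's per-token statistics.
struct TokenStats {
    flagged_unsupported_at: Option<Instant>,
}

// True exactly when the token was flagged as unsupported AND the
// freeze window has already elapsed, i.e. the token should get
// another chance instead of staying marked as unsupported.
fn freeze_expired(stats: &TokenStats, now: Instant, freeze_time: Duration) -> bool {
    stats
        .flagged_unsupported_at
        .is_some_and(|t| now.duration_since(t) > freeze_time)
}

fn main() {
    let freeze_time = Duration::from_secs(600);
    let base = Instant::now();
    let now = base + Duration::from_secs(3600);

    // Flagged 11 minutes before `now`: the 10 minute freeze has elapsed.
    let old = TokenStats {
        flagged_unsupported_at: Some(base + Duration::from_secs(2940)),
    };
    assert!(freeze_expired(&old, now, freeze_time));

    // Flagged 1 minute before `now`: still frozen.
    let recent = TokenStats {
        flagged_unsupported_at: Some(base + Duration::from_secs(3540)),
    };
    assert!(!freeze_expired(&recent, now, freeze_time));

    // Never flagged: nothing to expire.
    let never = TokenStats { flagged_unsupported_at: None };
    assert!(!freeze_expired(&never, now, freeze_time));
}
```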
Is this correct? If `flagged_unsupported_at` is some and the time between the freezing moment and now is bigger than the token freeze time, should it return `None`?
This is explained in the comment. I think the confusion might come from the interface: all but one strategy (the hardcoded list) can only really return whether a token should be dropped, not whether it needs to be kept.
The reason is that a single metric indicating that a token is bad is enough to drop it, but a single strategy saying the token is good is not enough to keep it.
This could maybe be improved in a follow-up PR by adjusting these functions to return `Quality` instead of `Option<Quality>` and having the wrapping detector only pay attention to `Quality::Unsupported` results in the short-circuiting logic.
Yeah, the follow-up PR you proposed would be really nice! Thanks for the explanation!
Actually, after thinking about this more, `Option<Quality>` seems correct to me. That way we can express:
- not enough information to make a decision
- information indicates good
- information indicates bad

I think I'll just adjust the comment and make it more explicit what gets returned, because the current code focuses only on whether or not we have enough information to mark the token as unsupported.
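The tri-state verdict and the short-circuiting combination discussed here could look roughly like the sketch below; the enum variants, the `combine` helper, and its semantics are illustrative assumptions, not the PR's actual API.

```rust
// Hypothetical sketch of the tri-state verdict; names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Quality {
    Supported,
    Unsupported,
}

// Each strategy returns:
//   None                       -> not enough information to decide
//   Some(Quality::Supported)   -> information indicates the token is good
//   Some(Quality::Unsupported) -> information indicates the token is bad
//
// A single `Unsupported` verdict is enough to drop the token, so the
// wrapping detector can short-circuit on the first one it sees.
fn combine(verdicts: impl IntoIterator<Item = Option<Quality>>) -> Option<Quality> {
    let mut best = None;
    for verdict in verdicts {
        match verdict {
            Some(Quality::Unsupported) => return Some(Quality::Unsupported),
            Some(Quality::Supported) => best = Some(Quality::Supported),
            None => {}
        }
    }
    best
}

fn main() {
    // One bad verdict dominates everything else.
    assert_eq!(
        combine([Some(Quality::Supported), Some(Quality::Unsupported)]),
        Some(Quality::Unsupported)
    );
    // Only "good" and "no data" verdicts: the token is kept.
    assert_eq!(
        combine([None, Some(Quality::Supported)]),
        Some(Quality::Supported)
    );
    // No strategy has enough information.
    assert_eq!(combine([None, None]), None);
}
```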
LG. Good unit test.
Force-pushed from 5a8fb19 to b3cadea
Thanks for the explanation, now it is clear. LGTM.
Good decision to remove `DetectorBuilder`.
```diff
@@ -742,3 +751,7 @@ fn default_settle_queue_size() -> usize {
 fn default_metrics_bad_token_detector_log_only() -> bool {
     true
 }
+
+fn default_metrics_bad_token_detector_freeze_time() -> Duration {
+    Duration::from_secs(60 * 10)
```
Why not use `from_mins(10)`?
AFAIK, it doesn't exist.
Ah, you're right, it's a nightly API.
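For context: `Duration::from_mins` is indeed gated behind the nightly-only `duration_constructors` feature, so stable code has to spell the value out, as in this minimal sketch:

```rust
use std::time::Duration;

fn main() {
    // Stable equivalent of the nightly-only `Duration::from_mins(10)`.
    let freeze_time = Duration::from_secs(60 * 10);
    assert_eq!(freeze_time.as_secs(), 600);
}
```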
Description
Currently the bad token detection based on metrics marks tokens as unsupported forever. This is problematic for tokens which only have issues temporarily, for example when the most important pool for a token gets into a weird state or when a token gets paused for a while.
Changes
Adjusts the logic to freeze tokens for a configurable period of time. Once the freeze period is over, we give the token another chance (even if the stats indicate that it's currently unsupported). To not run into issues when a token is always bad, the logic was built such that one more `bad` measurement is enough to freeze the token again. That way we can safely configure a very high `min_measurements` without having periods where a token that was flagged as bad can cause issues again because we would need a lot of new measurements to mark it as unsupported again.

Additionally, the PR simplifies how the metrics-based bad token detector gets instantiated and gives each solver its own completely separate instance (as originally communicated, because each solver may support different tokens).
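The freeze-and-retry behaviour described above can be sketched roughly as follows; the struct fields, method name, and threshold handling are hypothetical simplifications of the PR's actual implementation, meant only to show why a single additional bad measurement re-freezes the token.

```rust
use std::time::{Duration, Instant};

// Illustrative sketch; field and method names are hypothetical.
struct TokenStats {
    bad_measurements: u32,
    flagged_unsupported_at: Option<Instant>,
}

struct Detector {
    min_measurements: u32,
    freeze_time: Duration,
}

impl Detector {
    // Record one measurement and report whether the token is frozen.
    fn record(&self, stats: &mut TokenStats, bad: bool, now: Instant) -> bool {
        if bad {
            stats.bad_measurements += 1;
        }
        match stats.flagged_unsupported_at {
            // Still inside the freeze window: keep the token frozen.
            Some(t) if now.duration_since(t) <= self.freeze_time => true,
            // Freeze window elapsed (or never frozen): the token gets
            // another chance, but because the accumulated stats already
            // exceed the threshold, a single additional bad measurement
            // is enough to freeze it again immediately.
            _ if stats.bad_measurements >= self.min_measurements && bad => {
                stats.flagged_unsupported_at = Some(now);
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let detector = Detector {
        min_measurements: 3,
        freeze_time: Duration::from_secs(600),
    };
    let base = Instant::now();
    let mut stats = TokenStats { bad_measurements: 0, flagged_unsupported_at: None };

    // Three bad measurements reach the threshold and freeze the token.
    assert!(!detector.record(&mut stats, true, base));
    assert!(!detector.record(&mut stats, true, base));
    assert!(detector.record(&mut stats, true, base));

    // After the freeze window the token gets another chance...
    let later = base + Duration::from_secs(700);
    assert!(!detector.record(&mut stats, false, later));
    // ...but one more bad measurement is enough to freeze it again.
    assert!(detector.record(&mut stats, true, later));
}
```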
How to test
Added a unit test.