Concurrent snapshot experiment seems to affect healthcheck #5

Closed
wants to merge 1 commit into from

Conversation

agourlay
Member

@agourlay agourlay commented Apr 6, 2023

cargo run -r -- --drills-to-run high_concurrency

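Taking concurrent snapshots from time to time seems to systematically affect the healthcheck: each "Creating snapshot" line in the output below is followed by healthcheck timeouts (about 103 ms, "Timeout expired") on one of the nodes.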

[2023-04-06T20:26:47.278Z INFO  coach::drill_runner] Coach is scheduling 1 drills against ["http://localhost:6334", "http://localhost:6344", "http://localhost:6354"] (by batch of 2):
[2023-04-06T20:26:47.278Z INFO  coach::drill_runner] - high_concurrency (repeating after 10 seconds)
Creating snapshot because 0
[2023-04-06T20:26:49.406Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6354 after 103.704523ms [min: 0ms, p50: 0ms, p95: 103ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:49.682Z INFO  coach::healthcheck] http://localhost:6354 is healthy again after 1 consecutive failures [min: 0ms, p50: 0ms, p95: 103ms, p99: 103ms, max: 103ms]
Creating snapshot because 2000
[2023-04-06T20:26:51.254Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6354 after 103.481038ms [min: 0ms, p50: 1ms, p95: 103ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:51.510Z INFO  coach::healthcheck] http://localhost:6354 is healthy again after 1 consecutive failures [min: 0ms, p50: 1ms, p95: 103ms, p99: 103ms, max: 103ms]
[2023-04-06T20:26:51.814Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6354 after 103.422898ms [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:52.027Z INFO  coach::healthcheck] http://localhost:6354 is healthy again after 1 consecutive failures [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms]
Creating snapshot because 4000
[2023-04-06T20:26:53.495Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6344 after 103.246612ms [min: 0ms, p50: 1ms, p95: 13ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:54.413Z INFO  coach::healthcheck] http://localhost:6344 is healthy again after 3 consecutive failures [min: 0ms, p50: 1ms, p95: 103ms, p99: 103ms, max: 103ms]
Creating snapshot because 6000
[2023-04-06T20:26:56.222Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6344 after 103.247113ms [min: 0ms, p50: 1ms, p95: 103ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:56.432Z INFO  coach::healthcheck] http://localhost:6344 is healthy again after 1 consecutive failures [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms]
[2023-04-06T20:26:56.737Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6344 after 103.841972ms [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:57.033Z INFO  coach::healthcheck] http://localhost:6344 is healthy again after 1 consecutive failures [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms]
[2023-04-06T20:26:57.336Z ERROR coach::healthcheck] healthcheck failed for http://localhost:6344 after 103.62555ms [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms] (status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} })
[2023-04-06T20:26:57.545Z INFO  coach::healthcheck] http://localhost:6344 is healthy again after 1 consecutive failures [min: 0ms, p50: 2ms, p95: 103ms, p99: 103ms, max: 103ms]
Creating snapshot because 8000
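
To help read the `[min, p50, p95, p99, max]` and "healthy again after N consecutive failures" fields in the log above, here is a minimal sketch of how such bookkeeping could work. It is not taken from the coach code: `HealthTracker`, `Event`, and the nearest-rank percentile method are hypothetical names and choices made for illustration, and it assumes that the latency of a timed-out check is recorded alongside successful ones (which would be consistent with p95 jumping to 103 ms after a single timeout).

```rust
use std::time::Duration;

/// Hypothetical outcome of recording one healthcheck result.
#[derive(Debug, PartialEq)]
enum Event {
    StillHealthy,
    Failed,
    Recovered { after_failures: u32 },
}

/// Hypothetical tracker; NOT the actual coach implementation.
#[derive(Default)]
struct HealthTracker {
    latencies: Vec<Duration>,
    consecutive_failures: u32,
}

impl HealthTracker {
    /// Record one check. Assumption: failed (timed-out) checks also
    /// contribute their latency to the statistics.
    fn record(&mut self, latency: Duration, ok: bool) -> Event {
        self.latencies.push(latency);
        if ok {
            // Reset the failure streak and report a recovery if there was one.
            let streak = std::mem::take(&mut self.consecutive_failures);
            if streak > 0 {
                Event::Recovered { after_failures: streak }
            } else {
                Event::StillHealthy
            }
        } else {
            self.consecutive_failures += 1;
            Event::Failed
        }
    }

    /// Nearest-rank percentile for `p` in 0..=100; `None` without samples.
    fn percentile(&self, p: usize) -> Option<Duration> {
        if self.latencies.is_empty() {
            return None;
        }
        let mut sorted = self.latencies.clone();
        sorted.sort();
        // rank = ceil(p * n / 100), clamped to at least 1.
        let rank = (p * sorted.len() + 99) / 100;
        Some(sorted[rank.max(1) - 1])
    }
}

fn main() {
    let mut tracker = HealthTracker::default();
    // Nine fast checks, then one check that hits a ~103 ms timeout.
    for _ in 0..9 {
        tracker.record(Duration::from_millis(1), true);
    }
    let failed = tracker.record(Duration::from_millis(103), false);
    println!("{:?}", failed);
    println!(
        "p50: {:?}, p95: {:?}",
        tracker.percentile(50),
        tracker.percentile(95)
    );
    // The next successful check ends the failure streak.
    println!("{:?}", tracker.record(Duration::from_millis(2), true));
}
```

With only a handful of samples, one timeout is enough to dominate the upper percentiles, so p95 and p99 pinned at 103 ms do not by themselves say how frequent the timeouts are; the count of consecutive failures is the more telling field.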

@agourlay agourlay closed this Apr 12, 2023