argus.client.base.ArgusClientError: 'API Error encountered using endpoint on submit_results #492
Comments
The problem is that, for some reason, this run was not created in Argus at all. I don't know the root cause yet.
This could mean that the test failed creation in a separate stage but was then re-created by SCT itself, hence the "run not found" exception. The next response indicates that the run was created, but something may have dropped it later. It could be ID re-use or a consistency issue. We need to check all stages that interacted with Argus.
It could also mean that another test (read/write) did this, but mixed didn't.
@k0machi what's up with this one? It seems we don't have a "Create Argus Test Run" stage in pefRegressionParallelPipeline.groovy. Regardless, we don't expect creation to fail, and we need to be able to look up the logs for this failure (on the Argus end) to understand it.
Without the explicit stage, the run is created inside the "Run SCT Stages" stage, specifically during ClusterTester init, so the cause of the failure should be visible in the logs for that particular SCT run.
That is very disturbing. Did you check each node separately? This is not the first time we have run into such cases; last time it was during a Scylla upgrade or a node replacement. What additional information can we log to help figure this out next time? (We should collect logs as soon as issues are reported, or archive logs periodically to S3 or something like that.)
I haven't checked individual nodes yet; I'll do that. We should collect logs periodically. Maybe we could have a GitHub Action that takes a snapshot of the last N hours of production logs each time an issue is reported?
I suspect a Scylla MV issue: both id and test_id are indexed columns, so when the MV fails to update (which is not guaranteed by the CQL insert/update request), we may hit a case where the insert succeeds but a query for the row does not find it. @k0machi can you confirm whether this ID exists without using an indexed column in the query?
No, the id doesn't exist at all - only two runs that have
What next? What can we do to capture more data when it happens again?
I will add a context dump to the exceptions raised in the submit requests (or we could just dump the full context on every error to improve readability: session data, request body, etc.).
This commit adds additional information and trace ids to the exceptions that occur inside the API calls, making it possible to collect more information about the error, including the request data. Fixes scylladb#492
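As a rough illustration of the idea discussed above (attaching a trace id and the request data to API errors), here is a minimal Python sketch. It is not the actual Argus client code: the class `ApiErrorWithContext`, the helper `check_response`, and the response fields `status` and `trace_id` are all hypothetical names chosen for this example.

```python
import json
import uuid


class ApiErrorWithContext(Exception):
    """Hypothetical exception carrying request context and a trace id."""

    def __init__(self, message, *, endpoint, request_body, trace_id=None):
        self.endpoint = endpoint
        self.request_body = request_body
        # Fall back to a locally generated id if the server did not send one.
        self.trace_id = trace_id or uuid.uuid4().hex
        super().__init__(message)

    def __str__(self):
        # Dump the context alongside the message so it shows up in logs.
        context = {
            "trace_id": self.trace_id,
            "endpoint": self.endpoint,
            "request_body": self.request_body,
        }
        dumped = json.dumps(context, default=str, sort_keys=True)
        return f"{self.args[0]} | context: {dumped}"


def check_response(endpoint, request_body, response):
    """Raise with full context when the (assumed) response dict reports an error."""
    if response.get("status") != "ok":
        raise ApiErrorWithContext(
            f"API error encountered using endpoint {endpoint}",
            endpoint=endpoint,
            request_body=request_body,
            trace_id=response.get("trace_id"),
        )
    return response
```

With something like this in place, a failed call such as `check_response("submit_results", {"run_id": "r1"}, {"status": "error", "trace_id": "abc123"})` would raise an error whose string form includes the trace id and the request body, which can then be matched against server-side logs.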
We can't identify the root cause, so we are adding new logging to help identify such failures.
The scylla-enterprise-perf-regression-predefined-throughput-steps-vnodes test runs 3 load stages. The write and read stages passed, but the mixed stage failed: https://jenkins.scylladb.com/job/scylla-enterprise/job/perf-regression/job/scylla-enterprise-perf-regression-predefined-throughput-steps-vnodes/20/