Parallel request to bulk service failed with a few jobs in IN_PROGRESS STATE #1374

Open
chandrams opened this issue Nov 15, 2024 · 0 comments
Labels: bug (Something isn't working), remote_monitoring

Describe the bug
The following issues were observed when sending parallel requests to the bulk service:

  • A few jobs are stuck in the IN_PROGRESS state
  • A few jobs reported a status of COMPLETED even though their experiments were UNPROCESSED and their notifications were null

How to reproduce it

  • Configure a Thanos datasource with a single experiment and 15 days of usage results
  • Deploy Kruize
  • Invoke the bulk service in parallel, without any filters and with the datasource set to thanos
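The reproduction steps can be scripted roughly as below. This is a minimal sketch, not the script used for this report: the base URL, the `/bulk` path, the payload shape, and the `job_id` field in the POST response are assumptions and should be checked against the Kruize bulk API documentation. `submit_bulk_job` and `submit_in_parallel` are hypothetical helper names.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Kruize is reachable here; adjust for your cluster route.
KRUIZE_URL = "http://localhost:8080"


def submit_bulk_job(base_url=KRUIZE_URL):
    """POST one bulk request with no filters and return its job id.

    Assumptions: the endpoint is POST /bulk, the body names the
    datasource, and the response is JSON containing "job_id".
    """
    payload = json.dumps({"datasource": "thanos"}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/bulk",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]


def submit_in_parallel(count, submit=submit_bulk_job):
    """Fire `count` bulk requests concurrently and collect the job ids."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(submit) for _ in range(count)]
        return [future.result() for future in futures]
```

Calling `submit_in_parallel(5)` against a running instance would start five bulk jobs at once; the `submit` parameter exists so that the request function can be swapped out when testing without a cluster.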

Expected behavior

  • Parallel requests to the bulk service should either complete or fail; no job should remain stuck in the IN_PROGRESS state
  • A job whose experiments are unprocessed currently reports its status as completed. Should the job status be failed instead?
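One way to turn the first expectation into a check is to poll each job until it reaches a terminal state or a deadline passes. The sketch below assumes a caller-supplied `fetch_status(job_id)` function that returns the job status as a dict; `wait_for_terminal` is a hypothetical helper, and `"FAILED"` is an assumed name for the failure state, since this report only shows `IN_PROGRESS` and `COMPLETED`.

```python
import time

# "COMPLETED" appears in the logs of this report; "FAILED" is an assumed name.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_terminal(fetch_status, job_id, timeout_s=600.0, interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the job leaves IN_PROGRESS, or raise TimeoutError.

    fetch_status is any callable that takes a job id and returns the
    job status as a dict with a "status" key.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')} after {timeout_s}s"
            )
        sleep(interval_s)
```

A job like the one in the logs, which never leaves `IN_PROGRESS`, would surface as a `TimeoutError` instead of hanging the test run.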

Relevant logs

{"status":"IN_PROGRESS","total_experiments":1,"processed_experiments":1,"message":null,"job_id":"f918009a-2e3d-486a-ad22-aa9204ae8c0b","job_start_time":"2024-11-13T08:19:17.990Z","job_end_time":null}

Unprocessed experiments

{
    "status": "COMPLETED",
    "total_experiments": 1,
    "processed_experiments": 1,
    "notifications": null,
    "experiments": {
        "thanos|default|msc-0|tfb-qrh-sample-0(deployment)|tfb-0": {
            "recommendations": {
                "status": "UNPROCESSED",
                "notifications": null
            }
        }
    },
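
Both symptoms can be detected from a status response alone. The sketch below uses only the fields visible in the two log excerpts above; `find_inconsistencies` is a hypothetical helper, and the rules it applies are one reading of the expected behavior, not Kruize's own validation.

```python
def find_inconsistencies(job):
    """Return a list of human-readable problems found in a job status dict."""
    problems = []
    status = job.get("status")
    total = job.get("total_experiments")
    processed = job.get("processed_experiments")

    # Symptom 1: every experiment is counted as processed, yet the job
    # never left IN_PROGRESS.
    if status == "IN_PROGRESS" and total is not None and processed == total:
        problems.append("all experiments processed but job is still IN_PROGRESS")

    # Symptom 2: the job says COMPLETED while an experiment's
    # recommendations are still UNPROCESSED.
    if status == "COMPLETED":
        for name, experiment in (job.get("experiments") or {}).items():
            recommendations = experiment.get("recommendations") or {}
            if recommendations.get("status") == "UNPROCESSED":
                problems.append(
                    f"job is COMPLETED but experiment {name} is UNPROCESSED"
                )
    return problems
```

Running this over the responses of every job started in parallel gives a quick count of how many jobs hit each symptom.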

Environment:

  • Kubernetes cluster: OpenShift

chandrams added the bug label on Nov 15, 2024
chandrams added this to the Kruize 0.2 Release milestone on Nov 15, 2024
dinogun moved this to In Progress in Monitoring on Nov 15, 2024