This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

Job that cannot start pod -> orphaned #235

Open
pchila opened this issue May 3, 2022 · 3 comments · Fixed by #249
Labels
type:bug Something is not working as intended/documented

Comments

@pchila
Contributor

pchila commented May 3, 2022

When Job Executor Service creates a k8s Job that cannot spawn a pod (for example, because the task specifies a wrong serviceAccount), the sequence fails with a timeout, but no logs can be retrieved because there are no pods to fetch them from.
Furthermore, the created job is never collected by the k8s TTL controller because it never finishes. It keeps trying to spawn pods long after Job Executor Service has given up on it (possibly indefinitely if there is a configuration error), so the user has to remove it manually.

The Job Executor Service should detect that the job failed to start, report relevant information extracted from the job status/events to the user, and explicitly delete the job if it did not spawn any pods.
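
To make the proposed behavior concrete, here is a minimal sketch of the decision logic. It is not the Job Executor Service implementation. The `JobSnapshot` and `JobEvent` types and both function names are hypothetical stand-ins for whatever the service would read from the Job's status counters and events. The idea is that a job with zero active, succeeded and failed pods plus at least one Warning event is treated as "failed to start", and the warnings are turned into a message for the user.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobEvent:
    # Hypothetical, simplified view of a Kubernetes event attached to a Job.
    type: str     # "Normal" or "Warning"
    reason: str   # e.g. "FailedCreate"
    message: str


@dataclass
class JobSnapshot:
    # Hypothetical, simplified view of a Job's pod counters and events.
    name: str
    active: int = 0
    succeeded: int = 0
    failed: int = 0
    events: List[JobEvent] = field(default_factory=list)


def never_spawned_pods(job: JobSnapshot) -> bool:
    """Return True if the Job has no active, succeeded or failed pods."""
    return job.active == 0 and job.succeeded == 0 and job.failed == 0


def failure_summary(job: JobSnapshot) -> Optional[str]:
    """Return a user-facing message if the Job looks stuck before pod
    creation, or None if it has pods or no warning has been recorded."""
    if not never_spawned_pods(job):
        return None
    warnings = [e for e in job.events if e.type == "Warning"]
    if not warnings:
        # No pods and no warnings: the Job may simply be too young to judge.
        return None
    details = "\n".join(f"{e.reason}: {e.message}" for e in warnings)
    return f"Job {job.name} did not start any pod:\n{details}"


if __name__ == "__main__":
    stuck = JobSnapshot(
        name="example-job",
        events=[JobEvent("Warning", "FailedCreate",
                         'serviceaccount "inexistentServiceAccount" not found')],
    )
    print(failure_summary(stuck))
```

A caller would only delete the job when `failure_summary` returns a message, so that jobs which did spawn pods are still left to the normal log retrieval and TTL cleanup.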

How to reproduce:

Use a job config with a wrong service account:

apiVersion: v2
actions:
  - name: "Hello World e2e test"
    events:
      - name: "sh.keptn.event.deployment.triggered"
    tasks:
      - name: "Greet the world"
        image: "alpine"
        serviceAccount: "inexistentServiceAccount"
        cmd:
          - echo
        args:
          - "Hello World"

christian-kreuzberger-dtx added the type:bug label May 16, 2022
@christian-kreuzberger-dtx
Contributor

christian-kreuzberger-dtx commented May 16, 2022

Nice catch. We will address this as soon as possible. Related to #234

christian-kreuzberger-dtx changed the title from "Job that cannot start pods are orphaned" to "Job that cannot start -> pods are orphaned" May 16, 2022
christian-kreuzberger-dtx changed the title from "Job that cannot start -> pods are orphaned" to "Job that cannot start pod -> orphaned" May 16, 2022
@christian-kreuzberger-dtx
Contributor

christian-kreuzberger-dtx commented May 16, 2022

Example output of describe job:

Name:           job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1
Namespace:      keptn-jes
Selector:       controller-uid=1da7d309-aab9-4a91-9487-bc710dfea8a9
Labels:         controller-uid=1da7d309-aab9-4a91-9487-bc710dfea8a9
                job-name=job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1
Annotations:    <none>
Parallelism:    1
Completions:    1
Pods Statuses:  0 Active / 0 Succeeded / 0 Failed
...
Events:
  Type     Reason        Age                 From            Message
  ----     ------        ----                ----            -------
  Warning  FailedCreate  56s (x4 over 2m6s)  job-controller  Error creating: pods "job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1-" is forbidden: error looking up service account keptn-jes/inexistentServiceAccount: serviceaccount "inexistentServiceAccount" not found
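
For anyone triaging this from the command line, the "Pods Statuses" line above is the quickest signal that the job never produced a pod. The sketch below reads the three counters out of captured describe output. It is only an illustration: the function name `parse_pod_statuses` is made up here, the pattern is written against the output quoted above (with an optional parenthetical after "Active" tolerated in case the format differs between kubectl versions), and a real implementation would more likely read the Job's status fields through the Kubernetes API than parse text.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Pods Statuses:  0 Active / 0 Succeeded / 0 Failed".
# The optional "(...)" group after "Active" is an assumption to tolerate
# extra detail that some kubectl versions may print there.
_PODS_STATUSES = re.compile(
    r"^Pods Statuses:\s+(\d+) Active(?:\s*\([^)]*\))? / (\d+) Succeeded / (\d+) Failed\s*$",
    re.MULTILINE,
)


def parse_pod_statuses(describe_output: str) -> Optional[Dict[str, int]]:
    """Extract the pod counters from describe output, or None if absent."""
    match = _PODS_STATUSES.search(describe_output)
    if match is None:
        return None
    active, succeeded, failed = (int(g) for g in match.groups())
    return {"active": active, "succeeded": succeeded, "failed": failed}


if __name__ == "__main__":
    sample = (
        "Parallelism:    1\n"
        "Completions:    1\n"
        "Pods Statuses:  0 Active / 0 Succeeded / 0 Failed\n"
    )
    counters = parse_pod_statuses(sample)
    # All three counters at zero, combined with a FailedCreate warning,
    # is the situation described in this issue.
    print(counters)
```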

@christian-kreuzberger-dtx
Contributor

christian-kreuzberger-dtx commented May 17, 2022

Re-opening this, as the issue still exists. We can probably solve this via refactoring (#244).

2 participants