Job that cannot start pod -> orphaned #235

pchila · 2022-05-03T07:49:21Z

When the Job Executor Service creates a k8s Job that cannot spawn a pod (for example, if the wrong serviceAccount is specified for the task), the sequence fails with a timeout, but no logs can be retrieved because there are no pods to fetch them from.
Furthermore, the created job is never collected by the k8s TTL controller because it never finishes. It keeps trying to spawn pods long after the Job Executor Service has given up on it (possibly indefinitely, if the cause is a configuration error), so the user has to remove it manually.

The Job Executor Service should detect that the job failed to start, report relevant information extracted from the job status and events to the user, and explicitly delete the job if it did not spawn any pods.
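The detection step proposed above could look roughly like the following minimal sketch. It assumes the service already has the job's pod counters (the "Pods Statuses" line of the describe output) and its events. The types `JobStatus` and `JobEvent` and the function `failedToStart` are hypothetical and are not the Job Executor Service or client-go API. A job counts as "failed to start" when no pod was ever created and the job controller reported at least one `FailedCreate` warning.

```go
package main

import "fmt"

// JobStatus mirrors the pod counters from `kubectl describe job`
// ("Pods Statuses: 0 Active / 0 Succeeded / 0 Failed").
// Hypothetical type for illustration only.
type JobStatus struct {
	Active, Succeeded, Failed int32
}

// JobEvent is a hypothetical, trimmed-down view of a Kubernetes event.
type JobEvent struct {
	Type    string // "Normal" or "Warning"
	Reason  string // e.g. "FailedCreate"
	Message string
}

// failedToStart reports whether the job never created a pod while the job
// controller emitted at least one FailedCreate warning. It also returns the
// warning messages so they can be shown to the user instead of pod logs.
func failedToStart(status JobStatus, events []JobEvent) (bool, []string) {
	if status.Active+status.Succeeded+status.Failed > 0 {
		return false, nil // at least one pod was created, so logs may exist
	}
	var reasons []string
	for _, e := range events {
		if e.Type == "Warning" && e.Reason == "FailedCreate" {
			reasons = append(reasons, e.Message)
		}
	}
	return len(reasons) > 0, reasons
}

func main() {
	status := JobStatus{} // 0 Active / 0 Succeeded / 0 Failed
	events := []JobEvent{{
		Type:    "Warning",
		Reason:  "FailedCreate",
		Message: `serviceaccount "inexistentServiceAccount" not found`,
	}}
	if stuck, reasons := failedToStart(status, events); stuck {
		// Assumption: a real implementation would delete the job here
		// (e.g. through client-go) instead of waiting for the timeout.
		fmt.Println("job failed to start:", reasons[0])
	}
}
```

Requiring a `FailedCreate` warning, rather than just zero pods, avoids deleting a job that simply has not been scheduled yet.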

How to reproduce:

Use a job config with a wrong service account:

apiVersion: v2
actions:
  - name: "Hello World e2e test"
    events:
      - name: "sh.keptn.event.deployment.triggered"
    tasks:
      - name: "Greet the world"
        image: "alpine"
        serviceAccount: "inexistentServiceAccount"
        cmd:
          - echo
        args:
          - "Hello World"

The text was updated successfully, but these errors were encountered:

christian-kreuzberger-dtx · 2022-05-16T08:31:28Z

Nice catch. We will address this as soon as possible. Related to #234

christian-kreuzberger-dtx · 2022-05-16T11:14:22Z

Example output of kubectl describe job:

Name:           job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1
Namespace:      keptn-jes
Selector:       controller-uid=1da7d309-aab9-4a91-9487-bc710dfea8a9
Labels:         controller-uid=1da7d309-aab9-4a91-9487-bc710dfea8a9
                job-name=job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1
Annotations:    <none>
Parallelism:    1
Completions:    1
Pods Statuses:  0 Active / 0 Succeeded / 0 Failed
...
Events:
  Type     Reason        Age                 From            Message
  ----     ------        ----                ----            -------
  Warning  FailedCreate  56s (x4 over 2m6s)  job-controller  Error creating: pods "job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1-" is forbidden: error looking up service account keptn-jes/inexistentServiceAccount: serviceaccount "inexistentServiceAccount" not found
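Since there are no pod logs in this situation, the warning events above are the only useful output. The following minimal sketch turns them into log lines. It is not the code from #249. The `Event` type and the `warningLogLines` function are hypothetical, and `Count` stands for the repeat count shown as "(x4 over 2m6s)". The Normal event in `main` is made-up sample data that demonstrates the filtering.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical subset of a Kubernetes event, holding the columns
// shown in the Events table of `kubectl describe job`.
type Event struct {
	Type    string
	Reason  string
	Count   int32 // how often the event was seen, e.g. 4 for "(x4 over 2m6s)"
	Message string
}

// warningLogLines formats each Warning event as one log line, so the lines
// can be attached to the task output when no pod logs are available.
func warningLogLines(events []Event) []string {
	var lines []string
	for _, e := range events {
		if e.Type != "Warning" {
			continue // Normal events do not explain why the job is stuck
		}
		lines = append(lines, fmt.Sprintf("%s %s (x%d): %s", e.Type, e.Reason, e.Count, e.Message))
	}
	return lines
}

func main() {
	events := []Event{
		{Type: "Normal", Reason: "SuccessfulCreate", Count: 1, Message: "illustrative normal event"},
		{Type: "Warning", Reason: "FailedCreate", Count: 4,
			Message: `serviceaccount "inexistentServiceAccount" not found`},
	}
	// Prints: Warning FailedCreate (x4): serviceaccount "inexistentServiceAccount" not found
	fmt.Println(strings.Join(warningLogLines(events), "\n"))
}
```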

christian-kreuzberger-dtx · 2022-05-17T12:59:15Z

Re-opening this, as the issue still exists; we can probably solve this via refactoring (#244).

christian-kreuzberger-dtx added the type:bug Something is not working as intended/documented label May 16, 2022

christian-kreuzberger-dtx changed the title from "Job that cannot start pods are orphaned" to "Job that cannot start -> pods are orphaned" May 16, 2022

christian-kreuzberger-dtx self-assigned this May 16, 2022

christian-kreuzberger-dtx changed the title from "Job that cannot start -> pods are orphaned" to "Job that cannot start pod -> orphaned" May 16, 2022

christian-kreuzberger-dtx mentioned this issue in "fix: Add output of failed events to logs #249" (Merged) May 16, 2022

christian-kreuzberger-dtx closed this as completed in #249 May 17, 2022

christian-kreuzberger-dtx reopened this May 17, 2022

christian-kreuzberger-dtx removed their assignment May 17, 2022

christian-kreuzberger-dtx mentioned this issue in "Refactor K8sImpl structure to simplify the interaction with kubernetes #244" (Open) May 20, 2022
