Job that cannot start pod -> orphaned #235
Labels: type:bug (Something is not working as intended/documented)
Comments
christian-kreuzberger-dtx added the type:bug label on May 16, 2022
Nice catch. We will address this as soon as possible. Related to #234
christian-kreuzberger-dtx changed the title from "Job that cannot start pods are orphaned" to "Job that cannot start -> pods are orphaned" on May 16, 2022
christian-kreuzberger-dtx changed the title from "Job that cannot start -> pods are orphaned" to "Job that cannot start pod -> orphaned" on May 16, 2022
Example output of
Re-opening this, as the issue still exists - we can probably solve this via refactoring (#244)
When the Job Executor Service creates a k8s Job that cannot spawn a pod (for example, if the wrong serviceAccount is specified for the task), the sequence fails with a timeout, but no logs can be retrieved because there are no pods to fetch them from.
Furthermore, the created job is not collected by the k8s TTL controller because it never finishes, so it keeps trying to spawn pods long after the Job Executor Service has given up on it (possibly indefinitely if there is a configuration error) and has to be removed manually by the user.
The Job Executor Service should detect that the job failed to start, add relevant information for the user extracted from the job status/events, and explicitly delete the job if it did not spawn any pods.
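A rough sketch of what such a check could look like with client-go (illustrative only, not the actual service code; the function and variable names are made up):

```go
package jobexecutor

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupStuckJob is a sketch of the suggested behaviour: if a job produced
// no pods at all, surface the events explaining why and delete the job so it
// does not keep retrying after the service has given up on it.
func cleanupStuckJob(ctx context.Context, clientset kubernetes.Interface, namespace, jobName string) error {
	// Pods created by a job carry the "job-name" label.
	pods, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "job-name=" + jobName,
	})
	if err != nil {
		return fmt.Errorf("listing pods for job %s: %w", jobName, err)
	}
	if len(pods.Items) > 0 {
		// At least one pod exists, so logs can be fetched the normal way.
		return nil
	}

	// No pods at all: report the job's events (e.g. a missing serviceAccount)
	// so the user sees why the pod could not be created.
	events, err := clientset.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{
		FieldSelector: "involvedObject.name=" + jobName,
	})
	if err == nil {
		for _, e := range events.Items {
			fmt.Printf("job %s: %s: %s\n", jobName, e.Reason, e.Message)
		}
	}

	// Delete the job explicitly, since the TTL controller will never collect it.
	propagation := metav1.DeletePropagationBackground
	return clientset.BatchV1().Jobs(namespace).Delete(ctx, jobName, metav1.DeleteOptions{
		PropagationPolicy: &propagation,
	})
}
```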
How to reproduce:
Use a job config with a wrong service account:
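A minimal sketch of such a config (field names are from memory of the v2 config layout, so treat them as illustrative; the only important part is that serviceAccount references an account that does not exist in the namespace):

```yaml
apiVersion: v2
actions:
  - name: "Reproduce orphaned job"
    events:
      - name: "sh.keptn.event.test.triggered"
    tasks:
      - name: "task-with-broken-service-account"
        image: "alpine:3.15"
        cmd: ["echo"]
        args: ["hello"]
        # this service account does not exist, so the job can never create its pod
        serviceAccount: "this-account-does-not-exist"
```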