Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TRON-2277: Pass along non_retryable_exit_codes to KubernetesCluster objects #998

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

jfongatyelp
Copy link
Contributor

Looks like we missed one place where we need to pass along the k8s_options from MASTER configs, when we instantiate our KubernetesClusters from KubernetesClusterRepository.

Applying this change to a local instance and infrastage now do what we expect and do not retry failed actions if their exit codes are in non_retryable_exit_codes, plus logs what we expect:

2024-09-25 17:00:22,786 tron.core.actionrun INFO ActionRun: MASTER.testjob0.14.zeroth skipping auto-retries, received non-retryable exit code (72).
2024-09-25 17:00:22,786 tron.core.actionrun INFO ActionRun: MASTER.testjob0.14.zeroth completed with fail, transitioned to running, exit status: 72
2024-09-25 17:00:22,786 tron.utils.state DEBUG transitioning from running to failed
2024-09-25 17:00:22,786 tron.utils.observer DEBUG Notifying 1 listeners for new event 'failed'
2024-09-25 17:00:22,786 tron.core.jobrun INFO JobRun:MASTER.testjob0.14 got an event: failed

(i.e. no retries after the non-retryable exit code line + goes to jobrun failed)

This also adds 2 fairly dumb tests (the actionrun test alone wouldn't have caught the fix since we're mocking get_cluster's result itself, hence why I also added a really dumb KubernetesClusterRepository->KubernetesCluster one to make sure we don't skip passing along new configs here in the future.

Let me know if you have a better idea for any tests!

Copy link
Member

@nemacysts nemacysts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! maybe one day we'll reduce some of the layers here :(

and ++ for the new options test!

@jfongatyelp jfongatyelp merged commit c061971 into master Sep 26, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants