-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A quick script to check if a job took longer than x time to run and update python in precommit #930
base: master
Are you sure you want to change the base?
A quick script to check if a job took longer than x time to run and update python in precommit #930
Conversation
87e47d9
to
3f7fe09
Compare
.pre-commit-config.yaml
Outdated
language_version: python3.6 | ||
args: [--target-version, py36] | ||
language_version: python3.8 | ||
args: [--target-version, py38] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want this to still use [--target-version, py36]
so that it doesn't try to format anything in an incompatible way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its failed the test below cause:
ERROR: InvocationError for command '/home/runner/work/Tron/Tron/.tox/py36/bin/pre-commit run --all-files' (exited with code 1)
but when I try to run this locally I don't have py36 on this branch, its .tox/py38/...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then when I run make test after make clean:
(py38) emanelsabban@dev208-uswest1adevc:~/pg/Tron$ make test
tox -e py36
py36 create: /nail/home/emanelsabban/pg/Tron/.tox/py36
ERROR: InterpreterNotFound: python3.6
________________________________________________________________________________________________________ summary _________________________________________________________________________________________________________
ERROR: py36: InterpreterNotFound: python3.6
make: *** [Makefile:64: test] Error 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, that's unfortunate :/
I'm just a tad worried about breaking the py36 build and/or somehow allowing py36+ code to sneak in and break at runtime (on py36)
tron/bin/check_job_exceeding_time.py
Outdated
log = logging.getLogger("check_exceeding_time") | ||
|
||
|
||
def parse_cli(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the yelpy naming convention for this is usually parse_args
:)
probably a good idea to have new code by typed as well, so we probably want this to be something like:
def parse_cli(): | |
def parse_args() -> argparse.Namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and depending on how easy it is to trace the tron code to get the type of some of these things, it might be worth using reveal_type
(see https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#when-you-re-puzzled-or-when-things-are-complicated) to have mypy tell you what it thinks the types are
tron/bin/check_job_exceeding_time.py
Outdated
check_if_time_exceeded(job_runs, time_limit, result) | ||
|
||
|
||
def main(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def main(): | |
def main() -> Optional[int]: |
would match what sys.exit expects :)
tron/bin/check_job_exceeding_time.py
Outdated
if args.job is None: | ||
jobs = client.jobs(include_job_runs=True) | ||
for job in jobs: | ||
check_job_time(job=job, time_limit=args.time_limit, result=result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking suggestion: could we make this a pure function by having it return a list of affected runs?
(and then do something like results.extend(check_job_time(...))
?
tron/bin/check_job_exceeding_time.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-nit: should this be something like get_jobs_exceeding_runtime
? usually check_*
functions are intended to be used as monitoring checks and will return non-zero if things are failing checks (versus the current behavior here, which is just printing)
tron/bin/check_job_exceeding_time.py
Outdated
|
||
|
||
def check_if_time_exceeded(job_runs, job_expected_runtime, result): | ||
states_to_check = {"queued", "scheduled", "cancelled", "skipped"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we repeat this - might be worth extracting this to a named constant
tron/bin/check_job_exceeding_time.py
Outdated
and job_run.get( | ||
"state", | ||
"unknown", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to do this check in both places? looks like this check is also being done by the function that calls this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, Idk why I initially thought the state might change so I added it in both places
tron/bin/check_job_exceeding_time.py
Outdated
if duration_seconds and duration_seconds > job_expected_runtime: | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if duration_seconds and duration_seconds > job_expected_runtime: | |
return True | |
return duration_seconds and duration_seconds > job_expected_runtime: |
tron/bin/check_job_exceeding_time.py
Outdated
job_runs = sorted( | ||
job.get("runs", []), | ||
key=lambda k: (k["end_time"] is None, k["end_time"], k["run_time"]), | ||
reverse=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious: are we sorting so that the output is ordered later on? should we sort at printing time so that we only need to sort once if so?
tron/bin/check_job_exceeding_time.py
Outdated
job = client.job_runs(job_url) | ||
check_job_time(job=job, client=client, url_index=url_index, result=result) | ||
|
||
if result is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be if not result
? we default result
to []
- which is not None
6ede8b6
to
0275846
Compare
0275846
to
00287f2
Compare
…pdate python in precommit
00287f2
to
9dcc4cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've got some minor comments re: the types - but if i'm blocking you feel free to ignore them as they're not particularly serious and we can always come back and improve them :)
return args | ||
|
||
|
||
def check_if_time_exceeded(job_runs, job_expected_runtime) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def check_if_time_exceeded(job_runs, job_expected_runtime) -> list: | |
def check_if_time_exceeded(job_runs: List[Dict[str, Any]], job_expected_runtime: int) -> List[str]: |
i'm sort of guessing about the job_runs
typing here - that said, if we're just using this script for the current project it's probably fine to be a little lax with the types and just do the easy ones (e.g.,
def check_if_time_exceeded(job_runs, job_expected_runtime) -> list: | |
def check_if_time_exceeded(job_runs: List, job_expected_runtime: int) -> List[str]: |
return result | ||
|
||
|
||
def is_job_run_exceeding_expected_runtime(job_run, job_expected_runtime) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def is_job_run_exceeding_expected_runtime(job_run, job_expected_runtime) -> bool: | |
def is_job_run_exceeding_expected_runtime(job_run: Dict[str, Any], job_expected_runtime: int) -> bool: |
return False | ||
|
||
|
||
def check_job_time(job, time_limit) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def check_job_time(job, time_limit) -> list: | |
def check_job_time(job: Dict[str, Any], time_limit: int) -> List[str]: |
…ck-script-to-check-jobs-running
@EmanElsaban is this still something we're interested in merging? |
This script outputs the jobs that takes longer than x time to run. We can specify the time using
--time
. It's an easy way to get the jobs than to look at the Tron UI