Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A quick script to check if a job took longer than x time to run and update python in precommit #930

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

EmanElsaban
Copy link
Contributor

This script outputs the jobs that takes longer than x time to run. We can specify the time using --time. It's an easy way to get the jobs than to look at the Tron UI

@EmanElsaban EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch 4 times, most recently from 87e47d9 to 3f7fe09 Compare November 16, 2023 21:54
language_version: python3.6
args: [--target-version, py36]
language_version: python3.8
args: [--target-version, py38]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want this to still use [--target-version, py36] so that it doesn't try to format anything in an incompatible way

Copy link
Contributor Author

@EmanElsaban EmanElsaban Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its failed the test below cause:

ERROR: InvocationError for command '/home/runner/work/Tron/Tron/.tox/py36/bin/pre-commit run --all-files' (exited with code 1)

but when I try to run this locally I don't have py36 on this branch, its .tox/py38/...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then when I run make test after make clean:

(py38) emanelsabban@dev208-uswest1adevc:~/pg/Tron$ make test
tox -e py36
py36 create: /nail/home/emanelsabban/pg/Tron/.tox/py36
ERROR: InterpreterNotFound: python3.6
________________________________________________________________________________________________________ summary _________________________________________________________________________________________________________
ERROR:  py36: InterpreterNotFound: python3.6
make: *** [Makefile:64: test] Error 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that's unfortunate :/

I'm just a tad worried about breaking the py36 build and/or somehow allowing py36+ code to sneak in and break at runtime (on py36)

log = logging.getLogger("check_exceeding_time")


def parse_cli():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the yelpy naming convention for this is usually parse_args :)

probably a good idea to have new code by typed as well, so we probably want this to be something like:

Suggested change
def parse_cli():
def parse_args() -> argparse.Namespace:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and depending on how easy it is to trace the tron code to get the type of some of these things, it might be worth using reveal_type (see https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#when-you-re-puzzled-or-when-things-are-complicated) to have mypy tell you what it thinks the types are

check_if_time_exceeded(job_runs, time_limit, result)


def main():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def main():
def main() -> Optional[int]:

would match what sys.exit expects :)

if args.job is None:
jobs = client.jobs(include_job_runs=True)
for job in jobs:
check_job_time(job=job, time_limit=args.time_limit, result=result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking suggestion: could we make this a pure function by having it return a list of affected runs?

(and then do something like results.extend(check_job_time(...))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super-nit: should this be something like get_jobs_exceeding_runtime? usually check_* functions are intended to be used as monitoring checks and will return non-zero if things are failing checks (versus the current behavior here, which is just printing)



def check_if_time_exceeded(job_runs, job_expected_runtime, result):
states_to_check = {"queued", "scheduled", "cancelled", "skipped"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we repeat this - might be worth extracting this to a named constant

Comment on lines 48 to 51
and job_run.get(
"state",
"unknown",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to do this check in both places? looks like this check is also being done by the function that calls this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, Idk why I initially thought the state might change so I added it in both places

Comment on lines 55 to 56
if duration_seconds and duration_seconds > job_expected_runtime:
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if duration_seconds and duration_seconds > job_expected_runtime:
return True
return duration_seconds and duration_seconds > job_expected_runtime:

Comment on lines 61 to 65
job_runs = sorted(
job.get("runs", []),
key=lambda k: (k["end_time"] is None, k["end_time"], k["run_time"]),
reverse=True,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious: are we sorting so that the output is ordered later on? should we sort at printing time so that we only need to sort once if so?

job = client.job_runs(job_url)
check_job_time(job=job, client=client, url_index=url_index, result=result)

if result is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be if not result? we default result to [] - which is not None

@EmanElsaban EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch 2 times, most recently from 6ede8b6 to 0275846 Compare November 21, 2023 22:18
@EmanElsaban EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch from 0275846 to 00287f2 Compare November 21, 2023 22:51
@EmanElsaban EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch from 00287f2 to 9dcc4cb Compare November 21, 2023 22:57
Copy link
Member

@nemacysts nemacysts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've got some minor comments re: the types - but if i'm blocking you feel free to ignore them as they're not particularly serious and we can always come back and improve them :)

return args


def check_if_time_exceeded(job_runs, job_expected_runtime) -> list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def check_if_time_exceeded(job_runs, job_expected_runtime) -> list:
def check_if_time_exceeded(job_runs: List[Dict[str, Any]], job_expected_runtime: int) -> List[str]:

i'm sort of guessing about the job_runs typing here - that said, if we're just using this script for the current project it's probably fine to be a little lax with the types and just do the easy ones (e.g.,

Suggested change
def check_if_time_exceeded(job_runs, job_expected_runtime) -> list:
def check_if_time_exceeded(job_runs: List, job_expected_runtime: int) -> List[str]:

return result


def is_job_run_exceeding_expected_runtime(job_run, job_expected_runtime) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_job_run_exceeding_expected_runtime(job_run, job_expected_runtime) -> bool:
def is_job_run_exceeding_expected_runtime(job_run: Dict[str, Any], job_expected_runtime: int) -> bool:

return False


def check_job_time(job, time_limit) -> list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def check_job_time(job, time_limit) -> list:
def check_job_time(job: Dict[str, Any], time_limit: int) -> List[str]:

@nemacysts
Copy link
Member

@EmanElsaban is this still something we're interested in merging?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants