need a metric to see if overlap is being hit #18055
Labels
stage/accepted
Confirmed, and intend to work on. No timeline commitment though.
theme/batch
Issues related to batch jobs and scheduling
theme/job-summary
theme/metrics
type/enhancement
Proposal
For periodic jobs that have prohibit_overlap set to true, we need a way to detect that the overlap threshold is being hit. With such a metric it would be easy to tell whether the configured schedule is too frequent relative to the job's runtime.
Also, if the threshold is being hit often, it could indicate that something in the job itself has changed for the worse.
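To make concrete what such a metric would count, here is a minimal Python sketch that simulates a fixed-interval schedule. It is not Nomad code: the names `simulate_overlap` and `OverlapStats` are hypothetical, and it assumes that a scheduled launch which finds the previous run still active is simply skipped. The `skipped` counter stands in for the value the proposed metric would expose.

```python
from dataclasses import dataclass


@dataclass
class OverlapStats:
    launched: int = 0
    skipped: int = 0  # ticks suppressed because the previous run was still active


def simulate_overlap(interval_s: float, runtimes_s: list[float]) -> OverlapStats:
    """Simulate a fixed-interval schedule with overlap prohibited.

    Assumption (not taken from Nomad): a tick that fires while the previous
    run is still active is skipped rather than queued. Each entry in
    runtimes_s is the runtime of the next run that actually launches.
    """
    stats = OverlapStats()
    busy_until = 0.0
    pending = iter(runtimes_s)
    tick = 0
    while True:
        now = tick * interval_s
        if now < busy_until:
            # Previous run has not finished: this is an overlap hit.
            stats.skipped += 1
        else:
            runtime = next(pending, None)
            if runtime is None:
                break  # no more runs to simulate
            stats.launched += 1
            busy_until = now + runtime
        tick += 1
    return stats


if __name__ == "__main__":
    print(simulate_overlap(60, [30, 90, 150]))
```

With a 60-second interval and runtimes of 30, 90, and 150 seconds, the sketch reports 3 launches and 3 skipped ticks. A rising skipped count relative to launches is the signal that either the schedule is too tight or the job has become slower.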
Use-cases
If there were an explicit metric indicating that job FOO was hitting the threshold often, we could lengthen its schedule interval and set up an alert in a monitoring system such as DataDog.
Attempted Solutions
Currently we monitor pending allocations, but we cannot tell whether an allocation is pending due to insufficient resources or because the job is hitting its overlap threshold.