-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add snapshot event for STOP_LONG_RUNNING #10025
Conversation
eca7f83
to
253ffc4
Compare
CodSpeed Performance ReportMerging #10025 will not alter performanceComparing Summary
|
253ffc4
to
ca89d29
Compare
Screenshot is outdated with respect to the actual text (check source code). Also the screenshot displays a wrong Duration. This has also been fixed in the PR. |
src/ert/scheduler/scheduler.py
Outdated
@@ -142,12 +148,17 @@ async def _stop_long_running_jobs( | |||
> long_running_factor * self._average_job_runtime | |||
and not task.done() | |||
): | |||
logger.info( | |||
logger.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it considered an error? I think that this was "asked" by the user when specifying STOP_LONG_RUNNING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. From the perspective of the scheduler, it is not an error, I agree. From the perspective of the realization (or job.py), it is an error. For, reasons, the corresponding logging for max_runtime is logged as an error, but this happens inside job.py which you could argue has a different perspective (the realization).
It makes sense to have the same logleven for stop_long_running and max_runtime, and whether it is logged from scheduler.py or job.py does not matter for the user.
Maybe warning is a fair compromise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed both to warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning sounds good!
stop_long_running_events_found = 0 | ||
while not sch._events.empty(): | ||
event = await sch._events.get() | ||
print(event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover print statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
This makes the effect of STOP_LONG_RUNNING visible in the GUI.
When a realization is killed due to max_runtime, is it per instruction from the user and should not be considered an error.
ca89d29
to
105f891
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR @berland ! 🚀
@@ -315,6 +318,24 @@ def update_from_event( | |||
"reaching MAX_RUNTIME", | |||
) | |||
) | |||
elif e_type is RealizationStoppedLongRunning: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate issue but this long if-else seems like a good candidate for a match-case:
match event:
...
case RealizationStoppedLongRunning(real=real):
for fm_step_id, fm_step in source_snapshot.get_fm_steps_for_real(real).items():
if fm_step.get(ids.STATUS) != state.FORWARD_MODEL_STATE_FINISHED:
fm_idx = (real, fm_step_id)
if fm_idx not in source_snapshot._fm_step_snapshots:
self._fm_step_snapshots[fm_idx] = FMStepSnapshot()
self._fm_step_snapshots[fm_idx].update(
FMStepSnapshot(
status=state.FORWARD_MODEL_STATE_FAILURE,
end_time=end_time,
error="The run is cancelled due to "
"excessive runtime, 25% more than the average "
"runtime (check keyword STOP_LONG_RUNNING)",
)
)
This makes the effect of STOP_LONG_RUNNING visible in the GUI.
Issue
Resolves #10016
Approach
Replicate the behaviour and code for MAX_RUNTIME.
git rebase -i main --exec 'just rapid-tests'
)When applicable