Add snapshot event for STOP_LONG_RUNNING #10025

berland · 2025-02-10T13:07:35Z

This makes the effect of STOP_LONG_RUNNING visible in the GUI.

Issue
Resolves #10016

Approach
Replicate the behaviour and code for MAX_RUNTIME.

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure unit tests pass locally after every commit (git rebase -i main --exec 'just rapid-tests')

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

codspeed-hq · 2025-02-10T13:50:49Z

CodSpeed Performance Report

Merging #10025 will not alter performance

Comparing berland:stop_long_running_gui_event (105f891) with main (dd9be56)

Summary

✅ 25 untouched benchmarks

berland · 2025-02-10T14:30:05Z

Screenshot is outdated with respect to the actual text (check source code).

Also the screenshot displays a wrong Duration. This has also been fixed in the PR.

xjules · 2025-02-10T14:50:32Z

src/ert/scheduler/scheduler.py

@@ -142,12 +148,17 @@ async def _stop_long_running_jobs(
                        > long_running_factor * self._average_job_runtime
                        and not task.done()
                    ):
-                        logger.info(
+                        logger.error(


Is it considered an error? I think that this was "asked" by the user when specifying STOP_LONG_RUNNING

Good question. From the perspective of the scheduler, it is not an error, I agree. From the perspective of the realization (or job.py), it is an error. For some reason, the corresponding message for max_runtime is logged as an error, but this happens inside job.py, which you could argue has a different perspective (the realization).

It makes sense to have the same log level for stop_long_running and max_runtime; whether it is logged from scheduler.py or job.py does not matter to the user.

Maybe warning is a fair compromise?

I have changed both to warning.

warning sounds good!
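
For readers following the thread, here is a minimal sketch of the check under discussion, logged at warning level as agreed above. It is not the code in scheduler.py: the function name `find_long_running`, the logger name, and the dictionary of runtimes are hypothetical, and the default factor of 1.25 is an assumption taken from the "25% more than the average runtime" wording in the snapshot error message.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("scheduler_sketch")


def find_long_running(
    runtimes: dict[int, float],
    average_runtime: float,
    long_running_factor: float = 1.25,  # assumed from the "25% more" wording
) -> list[int]:
    """Return the realizations whose runtime exceeds factor * average."""
    to_stop = []
    for real, runtime in sorted(runtimes.items()):
        if runtime > long_running_factor * average_runtime:
            # The user asked for this via STOP_LONG_RUNNING, so it is
            # reported as a warning rather than an error.
            logger.warning(
                "Stopping realization %d: runtime %.1f s exceeds "
                "%.2f x average runtime %.1f s",
                real,
                runtime,
                long_running_factor,
                average_runtime,
            )
            to_stop.append(real)
    return to_stop
```

With an average of 10 seconds, only realizations running strictly longer than 12.5 seconds are selected; a realization at exactly the threshold is left alone.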

xjules · 2025-02-10T14:52:57Z

tests/ert/unit_tests/scheduler/test_scheduler.py

+    stop_long_running_events_found = 0
+    while not sch._events.empty():
+        event = await sch._events.get()
+        print(event)


leftover print statement
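
A sketch of how the test loop could count events without printing them, assuming the events sit in an `asyncio.Queue` as the `sch._events` usage in the hunk suggests. The dataclasses, `count_events`, and `demo` are hypothetical stand-ins, not ert's own classes or test helpers.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RealizationStoppedLongRunning:  # hypothetical stand-in for the real event
    real: str


@dataclass
class OtherEvent:  # hypothetical unrelated event
    real: str


async def count_events(queue: asyncio.Queue, event_type: type) -> int:
    """Drain the queue and count the events of the given type."""
    count = 0
    while not queue.empty():
        event = await queue.get()
        if isinstance(event, event_type):
            count += 1
    return count


def demo() -> int:
    async def _run() -> int:
        queue: asyncio.Queue = asyncio.Queue()
        for event in (
            RealizationStoppedLongRunning(real="0"),
            OtherEvent(real="1"),
            RealizationStoppedLongRunning(real="2"),
        ):
            queue.put_nowait(event)
        return await count_events(queue, RealizationStoppedLongRunning)

    return asyncio.run(_run())
```

Asserting on the returned count keeps the test output quiet and makes the expectation explicit.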


xjules

Nice PR @berland ! 🚀

eivindjahren · 2025-02-11T11:10:03Z

src/ert/ensemble_evaluator/snapshot.py

@@ -315,6 +318,24 @@ def update_from_event(
                                "reaching MAX_RUNTIME",
                            )
                        )
+            elif e_type is RealizationStoppedLongRunning:


Separate issue but this long if-else seems like a good candidate for a match-case:

    match event:
        ...
        case RealizationStoppedLongRunning(real=real):
            for fm_step_id, fm_step in source_snapshot.get_fm_steps_for_real(real).items():
                if fm_step.get(ids.STATUS) != state.FORWARD_MODEL_STATE_FINISHED:
                    fm_idx = (real, fm_step_id)
                    if fm_idx not in source_snapshot._fm_step_snapshots:
                        self._fm_step_snapshots[fm_idx] = FMStepSnapshot()
                    self._fm_step_snapshots[fm_idx].update(
                        FMStepSnapshot(
                            status=state.FORWARD_MODEL_STATE_FAILURE,
                            end_time=end_time,
                            error="The run is cancelled due to "
                            "excessive runtime, 25% more than the average "
                            "runtime (check keyword STOP_LONG_RUNNING)",
                        )
                    )

berland added the release-notes:bug-fix label (Automatically categorise as bug fix in release notes) Feb 10, 2025

berland force-pushed the stop_long_running_gui_event branch 2 times, most recently from eca7f83 to 253ffc4 Compare February 10, 2025 13:24

berland force-pushed the stop_long_running_gui_event branch from 253ffc4 to ca89d29 Compare February 10, 2025 14:14

berland changed the title ~~Add GUI event for STOP_LONG_RUNNING~~ Add snapshot event for STOP_LONG_RUNNING Feb 10, 2025

xjules reviewed Feb 10, 2025


berland added 2 commits February 11, 2025 09:43

Add snapshot event for STOP_LONG_RUNNING

f7ac8b4

This makes the effect of STOP_LONG_RUNNING visible in the GUI.

Downgrade log message from error to warning in max_runtime

105f891

When a realization is killed due to max_runtime, it is per instruction from the user and should not be considered an error.

berland force-pushed the stop_long_running_gui_event branch from ca89d29 to 105f891 Compare February 11, 2025 08:44

xjules approved these changes Feb 11, 2025


berland merged commit 6f98b41 into equinor:main Feb 11, 2025
27 checks passed

eivindjahren reviewed Feb 11, 2025

