Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add snapshot event for STOP_LONG_RUNNING #10025

Merged
merged 2 commits into from
Feb 11, 2025

Conversation

berland
Copy link
Contributor

@berland berland commented Feb 10, 2025

This makes the effect of STOP_LONG_RUNNING visible in the GUI.

Issue
Resolves #10016

Approach
Replicate the behaviour and code for MAX_RUNTIME.

stop_long_running

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'just rapid-tests')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@berland berland added the release-notes:bug-fix Automatically categorise as bug fix in release notes label Feb 10, 2025
@berland berland force-pushed the stop_long_running_gui_event branch 2 times, most recently from eca7f83 to 253ffc4 Compare February 10, 2025 13:24
Copy link

codspeed-hq bot commented Feb 10, 2025

CodSpeed Performance Report

Merging #10025 will not alter performance

Comparing berland:stop_long_running_gui_event (105f891) with main (dd9be56)

Summary

✅ 25 untouched benchmarks

@berland berland force-pushed the stop_long_running_gui_event branch from 253ffc4 to ca89d29 Compare February 10, 2025 14:14
@berland berland changed the title Add GUI event for STOP_LONG_RUNNING Add snapshot event for STOP_LONG_RUNNING Feb 10, 2025
@berland
Copy link
Contributor Author

berland commented Feb 10, 2025

Screenshot is outdated with respect to the actual text (check source code).

Also the screenshot displays a wrong Duration. This has also been fixed in the PR.

@@ -142,12 +148,17 @@ async def _stop_long_running_jobs(
> long_running_factor * self._average_job_runtime
and not task.done()
):
logger.info(
logger.error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it considered an error? I think that this was "asked" by the user when specifying STOP_LONG_RUNNING

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. From the perspective of the scheduler, it is not an error, I agree. From the perspective of the realization (or job.py), it is an error. For, reasons, the corresponding logging for max_runtime is logged as an error, but this happens inside job.py which you could argue has a different perspective (the realization).

It makes sense to have the same logleven for stop_long_running and max_runtime, and whether it is logged from scheduler.py or job.py does not matter for the user.

Maybe warning is a fair compromise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed both to warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning sounds good!

stop_long_running_events_found = 0
while not sch._events.empty():
event = await sch._events.get()
print(event)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover print statement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

This makes the effect of STOP_LONG_RUNNING visible in the GUI.
When a realization is killed due to max_runtime, is it per instruction
from the user and should not be considered an error.
@berland berland force-pushed the stop_long_running_gui_event branch from ca89d29 to 105f891 Compare February 11, 2025 08:44
Copy link
Contributor

@xjules xjules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR @berland ! 🚀

@berland berland merged commit 6f98b41 into equinor:main Feb 11, 2025
27 checks passed
@@ -315,6 +318,24 @@ def update_from_event(
"reaching MAX_RUNTIME",
)
)
elif e_type is RealizationStoppedLongRunning:
Copy link
Contributor

@eivindjahren eivindjahren Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate issue but this long if-else seems like a good candidate for a match-case:

match event:
  ...
  case RealizationStoppedLongRunning(real=real):
    for fm_step_id, fm_step in source_snapshot.get_fm_steps_for_real(real).items():
      if fm_step.get(ids.STATUS) != state.FORWARD_MODEL_STATE_FINISHED:
          fm_idx = (real, fm_step_id)
          if fm_idx not in source_snapshot._fm_step_snapshots:
              self._fm_step_snapshots[fm_idx] = FMStepSnapshot()
          self._fm_step_snapshots[fm_idx].update(
              FMStepSnapshot(
                  status=state.FORWARD_MODEL_STATE_FAILURE,
                  end_time=end_time,
                  error="The run is cancelled due to "
                  "excessive runtime, 25% more than the average "
                  "runtime (check keyword STOP_LONG_RUNNING)",
              )
          )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:bug-fix Automatically categorise as bug fix in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Realizations killed by STOP_LONG_RUNNING are not properly cancelled
3 participants