Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create end point for events #9691

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

oyvindeide
Copy link
Collaborator

@oyvindeide oyvindeide commented Jan 9, 2025

Issue
Resolves #9001

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

Copy link

codspeed-hq bot commented Jan 9, 2025

CodSpeed Performance Report

Merging #9691 will improve performances by 10.62%

Comparing oyvindeide:add_websockets_endpoint (a0827e3) with main (ee6a12a)

Summary

⚡ 1 improvements
✅ 24 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
test_load_from_context[gen_x: 20, sum_x: 20 reals: 10] 7.1 ms 6.4 ms +10.62%

@oyvindeide oyvindeide force-pushed the add_websockets_endpoint branch 4 times, most recently from 06dfbd1 to 79c008e Compare January 28, 2025 11:25
@oyvindeide oyvindeide force-pushed the add_websockets_endpoint branch 2 times, most recently from c396805 to e734a28 Compare January 29, 2025 07:22
@oyvindeide oyvindeide force-pushed the add_websockets_endpoint branch from e734a28 to c373dd0 Compare January 29, 2025 14:43
Copy link
Contributor

@DanSava DanSava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and comments

src/everest/bin/utils.py Outdated Show resolved Hide resolved
Comment on lines +244 to +283
if message:
event_dict = json.loads(message)
if "snapshot" in event_dict:
event_dict["snapshot"] = EnsembleSnapshot.from_nested_dict(
event_dict["snapshot"]
)
try:
event = EventWrapper(event=event_dict).event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you rewrite this:

                if message:
                    event_dict = json.loads(message)
                    if "snapshot" in event_dict:
                        event_dict["snapshot"] = EnsembleSnapshot.from_nested_dict(
                            event_dict["snapshot"]
                        )
                    try:
                        event = EventWrapper(event=event_dict).event

as

if message:
    try:
        event = status_event_from_json(message)

will make it a bit more clear and you no longer need EventWrapper

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this then 🤔

pydantic_core._pydantic_core.ValidationError: 1 validation error for tagged-union[ForwardModelStepStart,ForwardModelStepRunning,ForwardModelStepSuccess,ForwardModelStepFailure,ForwardModelStepChecksum,RealizationPending,RealizationRunning,RealizationSuccess,RealizationFailed,RealizationTimeout,RealizationUnknown,RealizationWaiting,RealizationResubmit,EESnapshot,EESnapshotUpdate,EETerminated,EEUserCancel,EEUserDone,EnsembleStarted,EnsembleSucceeded,EnsembleFailed,EnsembleCancelled]
  Input tag 'FullSnapshotEvent' found using 'event_type' does not match any of the expected tags: 'forward_model_step.start', 'forward_model_step.running', 'forward_model_step.success', 'forward_model_step.failure', 'forward_model_step.checksum', 'realization.pending', 'realization.running', 'realization.success', 'realization.failure', 'realization.timeout', 'realization.unknown', 'realization.waiting', 'realization.resubmit', 'ee.snapshot', 'ee.snapshot_update', 'ee.terminated', 'ee.user_cancel', 'ee.user_done', 'ensemble.started', 'ensemble.succeeded', 'ensemble.failed', 'ensemble.cancelled' [type=union_tag_invalid, input_value={'iteration_label': 'Runn...e': 'FullSnapshotEvent'}, input_type=dict]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out your branch and tested the replacement before I made the suggestion. When I did it again after the latest change I could still not reproduce the issue you encountered.

It is possible you replaced it with event_from_json(message) imported form _ert.events import event_from_json and not status_event_from_json from ert.run_models.event import status_event_from_json.

When I do the suggested above I get the same issue.

src/everest/detached/__init__.py Show resolved Hide resolved
src/everest/detached/jobs/everserver.py Outdated Show resolved Hide resolved
src/everest/detached/jobs/everserver.py Outdated Show resolved Hide resolved
@oyvindeide oyvindeide force-pushed the add_websockets_endpoint branch 3 times, most recently from b9d4f73 to 66155f7 Compare January 31, 2025 08:04
@oyvindeide oyvindeide force-pushed the add_websockets_endpoint branch from 66155f7 to 6221630 Compare February 4, 2025 11:40
@oyvindeide oyvindeide force-pushed the add_websockets_endpoint branch from 6221630 to c92e178 Compare February 4, 2025 11:55
@oyvindeide oyvindeide self-assigned this Feb 4, 2025
@@ -244,42 +192,22 @@ def mocked_server(url, verify, auth, timeout, proxies):
# The server should fail and store a user-friendly message.
assert status["status"] == ServerStatus.failed
assert OPT_FAILURE_REALIZATIONS in status["message"]
assert "job1 Failed with: job 1 error 1" in status["message"]
assert "job1 Failed with: job 1 error 2" in status["message"]
assert "job2 Failed with: job 2 error 1" in status["message"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not have this information about the failure anymore?

Copy link
Collaborator Author

@oyvindeide oyvindeide Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will if it exists in the snapshot, but the main difference here from before is that we dont read the error file from disk, as that relies more on the shared file system. If the information is in the snapshot, we will have it. If a job fails, the monitoring will give out whatever information is in the snapshot, so in this specific instance, if the snapshot property error had: job 2 error 1, that would be given. It would still be possible to read the error file and get the full error, but the reason for example ert does not do that, and the reason it is not in the snapshot is because it can be quite large. We could potentially tail the file, and it would be smaller. But that should be done on the server side then.

@frode-aarstad
Copy link
Contributor

Seems like quite a few of the tests are testing less (asserting less) are we sure the changed tests tests everything we want ?

@oyvindeide
Copy link
Collaborator Author

Seems like quite a few of the tests are testing less (asserting less) are we sure the changed tests tests everything we want ?

Are they? If you mean: test_monitor_entry_no_show_all_jobs and test_monitor_entry_show_all_jobs they are replaced with faster unit tests checking just the monitor, the coverage should be the same, or slightly higher.

@oyvindeide oyvindeide added the release-notes:improvement Automatically categorise as improvement in release notes label Feb 5, 2025
Running these tests will cause a lot of dangling threads because
the server is started, but not stopped, leading to potential flakyness.

This mocks the server completely, which has the drawback that there
is less testing of the actual server, but that is also tested in other
tests.
Copy link
Contributor

@DanSava DanSava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good!

Not sure if this is a perception illusion because I didn't do any actual testing for this but it seems running the examples case in the cli seem to run faster.

It is unclear if it caused by the current changes but I noticed that test_simulator_cache is very slow. I will add an issue to investigate.

Comment on lines +244 to +283
if message:
event_dict = json.loads(message)
if "snapshot" in event_dict:
event_dict["snapshot"] = EnsembleSnapshot.from_nested_dict(
event_dict["snapshot"]
)
try:
event = EventWrapper(event=event_dict).event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out your branch and tested the replacement before I made the suggestion. When I did it again after the latest change I could still not reproduce the issue you encountered.

It is possible you replaced it with event_from_json(message) imported form _ert.events import event_from_json and not status_event_from_json from ert.run_models.event import status_event_from_json.

When I do the suggested above I get the same issue.

@DanSava
Copy link
Contributor

DanSava commented Feb 7, 2025

I did more testing and test_simulator_cache is not just slow it tends to hang when running the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:improvement Automatically categorise as improvement in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pass events from the Everest run model
4 participants