Add missing features in resource settings tabs #946

edan-bainglass · 2024-11-26T16:57:21Z

Based on #945

This PR looks to clean up the recently implemented plugin-based tab system in the resources/submission step (#939).
The idea is to leverage the newly developed abstraction in #945 to encapsulate the behavior of ANY resource settings panel, leaving the global resource settings panel to override based on its unique added features.

A few added items addressed in this PR:

PW code parallelization widget syncing between tabs
Support for syncing newly setup codes
Warning notification for overridden plugin resources
Syncing back with global resources on un-override

Remaining issues:

The new approach somehow causes three tests, which pass in isolation, to fail when running the full suite.

FAILED tests/test_codes.py::test_check_submission_blockers - assert 0 == 1
 +  where 0 = len([])
 +    where [] = <aiidalab_qe.app.submission.model.SubmissionStepModel object at 0x7fa29d55edc0>.internal_submission_blockers
FAILED tests/test_codes.py::test_qeapp_computational_resources_widget - AssertionError: assert 'none' == 'block'
  - block
  + none
FAILED tests/test_submit_qe_workchain.py::test_create_builder_default - AssertionError: FILES DIFFER:

The last one I suspect may simply require a regeneration of the stored references.

But the first two, I suspect here it is a more serious issue that ONLY shows up in testing. I think the issue is that in testing, we generate more than a single instance of certain classes that have dict and/or list caches. We define these as class variables, which is "fine" for the app, but raises issues when various tests create instances of the class. A quick solution (if indeed this is the issue) is to define these as instance variables. However, this is not straight forward for the codes class variable shared by instances of ResourceSettingsModels.

There is no guarantee that this is the issue. Just a suspicion. Will test further as soon as I am not sick 🤒

@superstar54 good if you take a look, see if I missed something obvious. We can chat when I'm better.

Update

My suspicions were correct. For the last test regarding the differing files, the issue is that part of implementing synchronization of newly setup codes required adding code options to the returned dictionary from CodeModel.get_model_state. This method, however, is also used when constructing builder parameters. Though it does not harm the builder process, it does affect the regression test part of test_create_builder_default. Simply regenerating the cached file does not help, as code options include UUIDs, which change every time the test runs. So I added a remove_code_options method to the test to discard the option keys, which resolves the problem.

Regarding the other two failing tests, indeed the issue here was the use of dictionary class variables. Both plugin_mapping and codes dictionary class variables are now changed to instance variables to avoid sharing state across class instances. Again, this is not an issue for the app, as there is only one instance per class. But this is not true for testing.

codecov · 2024-11-28T17:47:48Z

Codecov Report

Attention: Patch coverage is 84.42029% with 43 lines in your changes missing coverage. Please review.

Project coverage is 68.29%. Comparing base (7bb67ae) to head (0aa53f5).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/aiidalab_qe/common/panel.py	67.07%	27 Missing ⚠️
src/aiidalab_qe/app/submission/model.py	78.57%	6 Missing ⚠️
...iidalab_qe/app/submission/global_settings/model.py	91.48%	4 Missing ⚠️
src/aiidalab_qe/app/submission/__init__.py	92.30%	2 Missing ⚠️
...dalab_qe/app/submission/global_settings/setting.py	92.85%	2 Missing ⚠️
src/aiidalab_qe/common/code/model.py	89.47%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   67.81%   68.29%   +0.47%     
==========================================
  Files         110      110              
  Lines        6221     6210      -11     
==========================================
+ Hits         4219     4241      +22     
+ Misses       2002     1969      -33

Flag	Coverage Δ
python-3.11	`68.29% <84.42%> (+0.47%)`	⬆️
python-3.9	`68.31% <84.42%> (+0.49%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

edan-bainglass · 2024-11-28T17:48:36Z

@superstar54 ready for review 🙏

edan-bainglass · 2024-12-05T06:03:55Z

@superstar54 ready for review 🙏

Pleeeeeeeeeeeeeeease @superstar54 🙏

superstar54 · 2024-12-05T11:06:45Z

@superstar54 ready for review 🙏

Hi @edan-bainglass , my apologies for overlooking your comment earlier. I’m reviewing it now. Thanks for your patience! 🙏

superstar54 · 2024-12-05T12:10:24Z

One more missing feature: the panel shows a wrong warning message when an HPC is selected. This is probably because it checks the resource before the model.selected gets updated, so the app thinks the localhost is used.

In the screenshot, I selected the merlin hpc with 44 cpus for each node, but it give a warning that it only has 28 cpus (which is the number of my localhost).

superstar54 · 2024-12-05T12:24:21Z

src/aiidalab_qe/common/panel.py

-    dependencies = ["global.global_codes"]
-    codes = {}  # To be defined by subclasses
+    dependencies = [
+        "global.global_codes",


One concern: the base class hardcodes a dependency on "global.global_codes," where "global" is an identifier defined in the subclass or the context where the class is used.

Yeah, this is not great. It appears to be non-breaking, as linking a model's trait to its own same trait does not raise an error and also doesn't do anything. However, it is still a thorn in principal.

Two ways to decouple:

Make the global panel unique. I don't like this - regardless of its special treatment, it IS conceptually a resources panel

Link the dependency explicitly during plugin discovery (in _fetch_plugin_resource_settings) rather than via the _link_model method:

global_model = self._model.get_model("global") ... # then later in the for loop... self._model.add_model(identifier, model) tl.dlink( (global_model, "global_codes"), (model, "global_codes"), )

Comments:

This is the only thing _link_model is doing, so by applying the above workaround, we render the method unnecessary

Other future dependencies between the plugin panels and the global panel may require similar workarounds

I'm honestly not too bothered by it. There are more pros than this one conceptual con. But I am open to discussion on the matter.

Another solution is to keep the dependencies empty in the ResourceSettingsModel and create another PluginResourceSettingsModel for the plugin, in which we add this global.global_codes; in this way, we follow the oop and avoid making _fetch_plugin_resource_settings special.

Very nice 💪

Implemented in latest push!

superstar54

Hi @edan-bainglass , thanks for the nice work.

One issue when loading the previous calculations, the code setting and the override box are not loaded correctly.

src/aiidalab_qe/common/panel.py

tests/test_submit_qe_workchain/test_create_builder_default.yml

edan-bainglass · 2024-12-06T06:15:29Z

One issue when loading the previous calculations, the code setting and the override box are not loaded correctly.

Thanks @superstar54 for the review. Indeed, I discovered this issue yesterday morning but didn't have much time to think about it. I tried first to understand how set_model_state works in resource models. I had some questions, but maybe it is best we discuss your implementation in person next week. For now, I'll try to adjust for your other comments, then play with this issue a bit more to see if I can resolve it already 🙂

edan-bainglass · 2024-12-07T13:06:17Z

One more missing feature: the panel shows a wrong warning message when an HPC is selected. This is probably because it checks the resource before the model.selected gets updated, so the app thinks the localhost is used.

In the screenshot, I selected the merlin hpc with 44 cpus for each node, but it give a warning that it only has 28 cpus (which is the number of my localhost).

Right, indeed, it appears to be so that the check is done prior to the update of model.selected. But I'm having difficulty understanding why this is so. In the global settings panel, we do

    def _render_code_widget(
        self,
        code_model: CodeModel,
        code_widget: QEAppComputationalResourcesWidget,
    ):
        super()._render_code_widget(code_model, code_widget)
        if code_model.default_calc_job_plugin == "quantumespresso.pw":
            code_model.observe(
                self._on_pw_code_resource_change,
                [
                    "num_cpus",
                    "num_nodes",
                    "ntasks_per_node",
                    "cpus_per_task",
                    "max_wallclock_seconds",
                ],
            )

where in super()._render_code_widget(...), we set up a two-way link between the dropdown and selected. This means in the observation order, selected should sync prior to triggering the _on_pw_code_resource_change handler. But this is not the case. Not sure why at the moment.

A temporary hack would be to add selected to the event trigger above. This appears to work, albeit in an annoying order where the check is performed prior to the selected update, which shows the warning, then once more after the update, which removes the warning. Clearly not ideal!

edan-bainglass · 2024-12-08T05:24:34Z

@superstar54 could you please have a look at my comments? I think this is closer to a solution, but we need to see if it sufficient. My main concern at the moment is the mistimed firing of check_resources. I'm still uncertain as to how this is happening.

Note that some part of the code is only there to handle changes in options occurring on the widget side. This happens when one introduces a new code using the "new code setup" button. However, we discussed removing this button in favor of a dedicated notebook for creating new codes. If and when this gets merged, some code introduced in this PR will be removed.

superstar54 · 2024-12-08T08:43:21Z

Hi @edan-bainglass , thank you for the update. Regarding the dependencies issue, I suggest creating a PluginResourceSettingsModel class. For the check_resources issue, I don't know the reason either, we need more debugging to find the details.

edan-bainglass · 2024-12-08T13:26:51Z

Hi @edan-bainglass , thank you for the update. Regarding the dependencies issue, I suggest creating a PluginResourceSettingsModel class. For the check_resources issue, I don't know the reason either, we need more debugging to find the details.

@superstar54 I found out the issue with check_resources misfiring. The observations I set up in the app only come after those set up by the QEAppComputationalResourcesWidget. There, the selector's value is observed and triggers a change in num_cpus, which in turn also syncs up ntasks_per_node. As these fire prior to the my links, we see num_cpus change before the selector's value synchronizes with the model's selected.

I'm afraid there is nothing to be done here short of redesigning QEAppComputationalResourcesWidget, which would likely be a minor change.

update

Realizing that's our widget, i.e. not in AWB 😅 Working on a fix...

Fixed!

superstar54

Thanks for the nice work! LGTM!

edan-bainglass force-pushed the fix-resources-step branch from e3ca31b to c788d27 Compare November 27, 2024 09:22

edan-bainglass requested a review from superstar54 November 27, 2024 09:54

edan-bainglass force-pushed the fix-resources-step branch 2 times, most recently from 3552823 to 82113b8 Compare November 27, 2024 10:41

edan-bainglass marked this pull request as draft November 28, 2024 06:27

edan-bainglass force-pushed the fix-resources-step branch 3 times, most recently from f5c4e5f to 888d042 Compare November 28, 2024 17:44

edan-bainglass marked this pull request as ready for review November 28, 2024 17:44

superstar54 reviewed Dec 5, 2024

View reviewed changes

superstar54 requested changes Dec 5, 2024

View reviewed changes

src/aiidalab_qe/common/panel.py Show resolved Hide resolved

tests/test_submit_qe_workchain/test_create_builder_default.yml Outdated Show resolved Hide resolved

edan-bainglass requested a review from superstar54 December 9, 2024 05:15

edan-bainglass added 6 commits December 9, 2024 05:26

Add missing features in resource settings tabs

8550b4e

Rename code modules as resources

d3699ec

Restore override in get/set model state

f577fef

Fix code displays on loading previous process

9c12a9e

Abstract plugin resource panels

fca8140

Fix issue with misfiring code widget observation

0aa53f5

edan-bainglass force-pushed the fix-resources-step branch from 0c7c80a to 0aa53f5 Compare December 9, 2024 05:26

edan-bainglass mentioned this pull request Dec 9, 2024

Fix process-node-related bugs #967

Merged

superstar54 approved these changes Dec 9, 2024

View reviewed changes

edan-bainglass merged commit 9a814ac into aiidalab:main Dec 9, 2024
8 checks passed

edan-bainglass mentioned this pull request Dec 10, 2024

The codes and resources widgets arent uploaded when loading a previous job #947

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add missing features in resource settings tabs #946

Add missing features in resource settings tabs #946

edan-bainglass commented Nov 26, 2024 •

edited

Loading

codecov bot commented Nov 28, 2024 •

edited

Loading

edan-bainglass commented Nov 28, 2024

edan-bainglass commented Dec 5, 2024

superstar54 commented Dec 5, 2024

superstar54 commented Dec 5, 2024

superstar54 Dec 5, 2024

edan-bainglass Dec 6, 2024

superstar54 Dec 8, 2024

edan-bainglass Dec 8, 2024

edan-bainglass Dec 8, 2024

superstar54 left a comment

edan-bainglass commented Dec 6, 2024

edan-bainglass commented Dec 7, 2024 •

edited

Loading

edan-bainglass commented Dec 8, 2024

superstar54 commented Dec 8, 2024

edan-bainglass commented Dec 8, 2024 •

edited

Loading

superstar54 left a comment

Add missing features in resource settings tabs #946

Add missing features in resource settings tabs #946

Conversation

edan-bainglass commented Nov 26, 2024 • edited Loading

Update

codecov bot commented Nov 28, 2024 • edited Loading

Codecov Report

edan-bainglass commented Nov 28, 2024

edan-bainglass commented Dec 5, 2024

superstar54 commented Dec 5, 2024

superstar54 commented Dec 5, 2024

superstar54 Dec 5, 2024

Choose a reason for hiding this comment

edan-bainglass Dec 6, 2024

Choose a reason for hiding this comment

superstar54 Dec 8, 2024

Choose a reason for hiding this comment

edan-bainglass Dec 8, 2024

Choose a reason for hiding this comment

edan-bainglass Dec 8, 2024

Choose a reason for hiding this comment

superstar54 left a comment

Choose a reason for hiding this comment

edan-bainglass commented Dec 6, 2024

edan-bainglass commented Dec 7, 2024 • edited Loading

edan-bainglass commented Dec 8, 2024

superstar54 commented Dec 8, 2024

edan-bainglass commented Dec 8, 2024 • edited Loading

update

Fixed!

superstar54 left a comment

Choose a reason for hiding this comment

edan-bainglass commented Nov 26, 2024 •

edited

Loading

codecov bot commented Nov 28, 2024 •

edited

Loading

edan-bainglass commented Dec 7, 2024 •

edited

Loading

edan-bainglass commented Dec 8, 2024 •

edited

Loading