Proof of concept - `ParallelAnalysisBase` and extensible backend support #4269

p-j-smith · 2023-08-29T07:49:22Z

This is a proof of concept of how to implement ParallelAnalysisBase and allow downstream developers and end users to easily add their own backends.

I've opened this PR only to help with discussion in #4158 - I do not intend to go forward with it and will close it and continue the discussion in #4158.

Changes compared to #4162 ([GSoC] Parallelisation of AnalysisBase with multiprocessing and dask):

- define a `ParallelAnalysisBase` that inherits from `AnalysisBase`
- change the way backends are defined to make it easier to add new backends in the future (for MDAnalysis, downstream developers, and end users). I've removed the `ParallelExecutor` class in #4162 and added a new `mda.analysis.backends` module that defines supported backends.
- the `ResultsGroup` class is identical to that in #4162
- the `_setup_computation_groups`, `_compute`, and `_get_aggregator` methods of `ParallelAnalysisBase` are identical to those of `AnalysisBase` in #4162. Only the `run` method has been changed to reflect the new way of defining backends.
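To make the "easy to add new backends" idea concrete, here is a minimal sketch of what a pluggable backend interface could look like. All names (`BackendBase`, `BackendSerial`, `BackendMultiprocessing`, `BackendThreads`, `register_backend`, `get_backend`) are hypothetical and are not the API of the `mda.analysis.backends` module in this PR. The assumption is that a backend only needs to know how to apply one function to a list of independent computations, so a user can add one by subclassing and registering it.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool
from typing import Any, Callable, Dict, List, Type


class BackendBase:
    """Hypothetical base class: a backend maps `func` over independent computations."""

    def __init__(self, n_workers: int = 1):
        if n_workers < 1:
            raise ValueError("n_workers must be >= 1")
        self.n_workers = n_workers

    def apply(self, func: Callable[[Any], Any], computations: List[Any]) -> List[Any]:
        raise NotImplementedError


class BackendSerial(BackendBase):
    def apply(self, func, computations):
        # Run each computation in the current process, in order.
        return [func(task) for task in computations]


class BackendMultiprocessing(BackendBase):
    def apply(self, func, computations):
        # `func` and every task must be picklable (no lambdas) for a process pool.
        with Pool(processes=self.n_workers) as pool:
            return pool.map(func, computations)


# Registry keyed by name, so downstream packages and users can add backends.
_BACKENDS: Dict[str, Type[BackendBase]] = {
    "serial": BackendSerial,
    "multiprocessing": BackendMultiprocessing,
}


def register_backend(name: str, cls: Type[BackendBase]) -> None:
    if not (isinstance(cls, type) and issubclass(cls, BackendBase)):
        raise TypeError(f"{cls!r} must subclass BackendBase")
    _BACKENDS[name] = cls


def get_backend(backend, n_workers: int = 1) -> BackendBase:
    # Accept either a registered name or a ready-made backend instance.
    if isinstance(backend, BackendBase):
        return backend
    try:
        return _BACKENDS[backend](n_workers=n_workers)
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; choose from {sorted(_BACKENDS)}"
        ) from None


# Example of a user-defined backend added without touching library code.
class BackendThreads(BackendBase):
    def apply(self, func, computations):
        with ThreadPoolExecutor(max_workers=self.n_workers) as executor:
            return list(executor.map(func, computations))


register_backend("threads", BackendThreads)
```

With this shape, an analysis's `run()` would only call `get_backend(...).apply(...)` and never needs to know which backend it is talking to.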

I've also:

- updated the RMSD analysis to use `ParallelAnalysisBase` and added a single test to illustrate that the multiprocessing backend can be used with this implementation
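RMSD parallelizes naturally because each frame's value depends only on that frame and the reference structure, so frames can be split into groups, computed independently by any backend, and concatenated back in order (loosely the roles of the computation-group methods listed above). The toy sketch below illustrates that split-apply-combine idea under loud assumptions: `rmsd`, `split_frames`, `compute_group`, and `aggregate` are hypothetical helpers, coordinates are plain tuples, and unlike MDAnalysis's RMSD analysis there is no superposition step.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coords = List[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Plain RMSD between two equal-length coordinate sets (no superposition)."""
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def split_frames(frames: Sequence[int], n_parts: int) -> List[List[int]]:
    """Split frames into contiguous, near-equal groups (like numpy.array_split)."""
    n_parts = max(1, min(n_parts, len(frames)))
    base, extra = divmod(len(frames), n_parts)
    groups, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        groups.append(list(frames[start:start + size]))
        start += size
    return groups


def compute_group(traj: List[Coords], ref: Coords, frames: List[int]) -> Dict[str, list]:
    # Each group needs only its own frames plus the reference structure.
    return {"frames": list(frames), "rmsd": [rmsd(traj[f], ref) for f in frames]}


def aggregate(parts: List[Dict[str, list]]) -> Dict[str, list]:
    # Concatenate per-group lists in group order so rows line up with frames.
    return {key: [v for part in parts for v in part[key]] for key in ("frames", "rmsd")}


# Two atoms, translated rigidly along x by d in each frame, so RMSD == d.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
traj = [[(x + d, y, z) for (x, y, z) in ref] for d in (0.0, 1.0, 2.0, 3.0)]

serial = compute_group(traj, ref, list(range(len(traj))))
groups = split_frames(range(len(traj)), 3)
grouped = aggregate([compute_group(traj, ref, g) for g in groups])
```

The list comprehension over `groups` is the step a backend's `apply` would replace; the serial and grouped results should match exactly.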

Edit: To reduce the amount of duplicated code between `AnalysisBase` and `ParallelAnalysisBase`, the `_compute` method has been moved into `AnalysisBase` and the `_setup_computation_groups` method has been simplified to remove code duplicated in `_setup_frames`.
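The effect of moving `_compute` into the base class can be illustrated with a toy sketch (not MDAnalysis code): the per-frame loop exists once, the serial `run()` calls it for all frames, and the parallel subclass calls the same method once per group, so only the orchestration differs. `Base`, `ParallelBase`, the list-of-floats "trajectory", and the doubling "analysis" are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


class Base:
    """Toy stand-in for AnalysisBase: the per-frame loop exists only in _compute()."""

    def __init__(self, trajectory: Sequence[float]):
        self._trajectory = trajectory

    def _prepare(self, n_frames: int) -> None:
        # Allocate per-frame result storage.
        self.frames = [0] * n_frames
        self.values = [0.0] * n_frames

    def _single_frame(self, position: float) -> float:
        # Hypothetical per-frame quantity.
        return 2.0 * position

    def _compute(self, frame_indices: List[int]) -> "Base":
        # Shared inner loop, used by both the serial and the parallel run().
        self._prepare(len(frame_indices))
        for i, frame in enumerate(frame_indices):
            self.frames[i] = frame
            self.values[i] = self._single_frame(self._trajectory[frame])
        return self

    def run(self, frames: Optional[List[int]] = None) -> "Base":
        frames = list(range(len(self._trajectory))) if frames is None else list(frames)
        return self._compute(frames)


class ParallelBase(Base):
    def run(self, frames: Optional[List[int]] = None, n_parts: int = 2) -> "ParallelBase":
        frames = list(range(len(self._trajectory))) if frames is None else list(frames)
        n_parts = max(1, n_parts)
        size = max(1, -(-len(frames) // n_parts))  # ceiling division, at least 1
        groups = [frames[i:i + size] for i in range(0, len(frames), size)]
        # Each group gets a fresh copy; a real backend would run these in workers.
        parts = [type(self)(self._trajectory)._compute(group) for group in groups]
        # Combine the partial results in group order.
        self._prepare(0)
        for part in parts:
            self.frames.extend(part.frames)
            self.values.extend(part.values)
        return self


traj = [0.5, 1.0, 1.5, 2.0]
serial = Base(traj).run()
parallel = ParallelBase(traj).run(n_parts=3)
```

Because both paths go through the same `_compute`, a change to the per-frame logic cannot make the serial and parallel results drift apart.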

PR Checklist

- Tests?
- Docs?
- CHANGELOG updated?
- Issue raised/referenced?

Developer's Certificate of Origin

I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.

📚 Documentation preview 📚: https://mdanalysis--4269.org.readthedocs.build/en/4269/



github-actions · 2023-08-29T07:52:16Z

Linter Bot Results:

Hi @p-j-smith! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

| Code Location | Outcome |
|---|---|
| main package | ⚠️ Possible failure |
| testsuite | ✅ Passed |

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/6026902586/job/16350879245

Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

codecov · 2023-08-29T08:03:21Z

Codecov Report

Patch coverage: 60.55% and project coverage change: -0.15% ⚠️

Comparison is base (957430b) 93.40% compared to head (f3927b7) 93.26%.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #4269      +/-   ##
===========================================
- Coverage    93.40%   93.26%   -0.15%     
===========================================
  Files          169      184      +15     
  Lines        22204    23416    +1212     
  Branches      4064     4088      +24     
===========================================
+ Hits         20740    21839    +1099     
- Misses         948     1057     +109     
- Partials       516      520       +4
```

| Files Changed | Coverage Δ | |
|---|---|---|
| package/MDAnalysis/analysis/__init__.py | `100.00% <ø> (ø)` | |
| package/MDAnalysis/analysis/base.py | `82.77% <58.97%> (-14.18%)` | ⬇️ |
| package/MDAnalysis/analysis/backends.py | `59.25% <59.25%> (ø)` | |
| package/MDAnalysis/analysis/rms.py | `93.10% <100.00%> (+0.08%)` | ⬆️ |

... and 14 files with indirect coverage changes


orbeckst

Very useful to have code for discussion.

I am going to block the PR just to be able to enforce your stated intentions:

> I've opened this pr only to help with discussion in #4158 - I do not intend to go forwards with it and will close it and continue the discussion in #4158.

orbeckst · 2023-08-29T17:32:08Z

package/MDAnalysis/analysis/base.py

```diff
+        if frames is None:
+            start, stop, step = self._trajectory.check_slice_indices(start, stop, step)
+            used_frames = np.arange(start, stop, step)
+        elif not all(opt is None for opt in [start, stop, step]):
+            raise ValueError("start/stop/step cannot be combined with frames")
+        else:
+            used_frames = frames
+
+        if all((isinstance(obj, bool) for obj in used_frames)):
+            arange = np.arange(len(used_frames))
+            used_frames = arange[used_frames]
+
```


This looks like code that's duplicated in AnalysisBase.run() and would eventually be refactored.

p-j-smith (reply, Aug 30, 2023): Oh, I hadn't spotted that duplication. I've updated `ParallelAnalysisBase` to remove this duplication, as well as the duplication in the inner loop in `ParallelAnalysisBase._compute` and `AnalysisBase.run`.
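One way the frame-selection logic in the diff quoted above could live in a single helper that both classes call is sketched below. It is a pure-Python sketch under stated assumptions: the name `select_frames` is hypothetical, and `slice(start, stop, step).indices(n_frames)` stands in for the trajectory's `check_slice_indices`, whose edge-case handling may differ. A real implementation would probably also want to accept NumPy boolean masks, whose `numpy.bool_` elements the `isinstance(obj, bool)` test does not match.

```python
from typing import List, Optional, Sequence, Union


def select_frames(
    n_frames: int,
    start: Optional[int] = None,
    stop: Optional[int] = None,
    step: Optional[int] = None,
    frames: Optional[Sequence[Union[int, bool]]] = None,
) -> List[int]:
    """Return the frame indices to analyse (hypothetical shared helper)."""
    if frames is None:
        # Normalise None/negative start/stop/step against the trajectory length.
        return list(range(*slice(start, stop, step).indices(n_frames)))
    if not all(opt is None for opt in (start, stop, step)):
        raise ValueError("start/stop/step cannot be combined with frames")
    frames = list(frames)
    if frames and all(isinstance(f, bool) for f in frames):
        # Boolean mask: keep the positions whose entry is True.
        return [i for i, keep in enumerate(frames) if keep]
    # Explicit frame indices, kept in the order given.
    return [int(f) for f in frames]
```

Both `run()` implementations could then call this once and pass the resulting indices on, instead of each carrying its own copy of the branching.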

orbeckst · 2023-08-29T17:34:47Z

package/MDAnalysis/analysis/base.py

```diff
+        for idx, ts in enumerate(ProgressBar(trajectory, verbose=verbose, **progressbar_kwargs)):
+            i = frame_indices[idx]
+            self._frame_index = i
+            self._ts = ts
+            self.frames[i] = ts.frame
+            self.times[i] = ts.time
+            self._single_frame()
+        logger.info("Finishing up")
+        return self
```


Could we refactor the inner loop so that the same code is shared between AnalysisBase and ParallelAnalysisBase?

I am trying to get a sense of whether we can mitigate the class proliferation problem by keeping as much code as possible shared.

orbeckst · 2023-08-29T17:40:11Z

package/MDAnalysis/analysis/base.py

```diff
+        if backend is None:
+            return super().run(
```


I like that.
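The pattern approved here, where the parallel subclass defers to the unchanged serial `run()` when no backend is requested, can be sketched as follows. `SerialAnalysis`, `ParallelAnalysis`, and the `backend(func, parts)` calling convention are hypothetical; the point is that the default code path remains exactly the serial implementation that is already tested.

```python
from typing import Callable, List, Optional


class SerialAnalysis:
    def __init__(self, data: List[float]):
        self.data = data

    def run(self, start: Optional[int] = None, stop: Optional[int] = None):
        self.result = sum(self.data[start:stop])
        self.path = "serial"
        return self


class ParallelAnalysis(SerialAnalysis):
    def run(self, start: Optional[int] = None, stop: Optional[int] = None,
            backend: Optional[Callable] = None, n_parts: int = 2):
        if backend is None:
            # No backend requested: reuse the serial implementation unchanged.
            return super().run(start=start, stop=stop)
        chunk = self.data[start:stop]
        size = max(1, -(-len(chunk) // max(1, n_parts)))  # ceiling division
        parts = [chunk[i:i + size] for i in range(0, len(chunk), size)]
        # The backend maps a function over the parts; partial sums are combined here.
        self.result = sum(backend(sum, parts))
        self.path = "parallel"
        return self


serial = ParallelAnalysis([1.0, 2.0, 3.0, 4.0]).run()
parallel = ParallelAnalysis([1.0, 2.0, 3.0, 4.0]).run(
    backend=lambda func, parts: list(map(func, parts))
)
```

Existing callers that never pass `backend` get identical behavior, which keeps the new class low-risk.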

orbeckst · 2023-08-29T17:41:07Z

package/MDAnalysis/analysis/base.py

```diff
+            )
+
+        # Start preparing the run
+        super()._setup_frames(trajectory=self._trajectory, start=start, stop=stop, step=step, frames=frames)
```


Could be self._setup_frames() I think (?).
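The difference between `super()._setup_frames(...)` and `self._setup_frames(...)` only matters when a further subclass overrides `_setup_frames`: `super()` goes straight to the next class in the method resolution order, while `self.` picks up the override. When nothing overrides it, both resolve to the same method. A minimal demonstration with hypothetical classes:

```python
class Base:
    def _setup_frames(self):
        return "Base._setup_frames"


class Parallel(Base):
    def run_with_super(self):
        # Always skips to the next class after Parallel in the MRO.
        return super()._setup_frames()

    def run_with_self(self):
        # Normal attribute lookup: honours overrides in subclasses.
        return self._setup_frames()


class CustomAnalysis(Parallel):
    def _setup_frames(self):
        return "CustomAnalysis._setup_frames"


obj = CustomAnalysis()
print(obj.run_with_super())  # → Base._setup_frames (override bypassed)
print(obj.run_with_self())   # → CustomAnalysis._setup_frames
```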


p-j-smith · 2023-08-30T15:27:00Z

Closing, as this PR won't be developed any further, but I think comments should still be possible once it's closed.

p-j-smith added 4 commits August 29, 2023 08:32

- Add new ParallelAnalysisBase that inherits from AnalysisBase (716c6e1)
- Add new mda.analysis.backends module that defines parallel backends for analyses (1658722)
- Update RMSD analysis to use ParallelAnalysisBase (f614245)
- Add single test to check that the parallel RMSD can be used with the Multiprocessing backend (f3927b7)

github-actions bot added the Component-Analysis label Aug 29, 2023

p-j-smith mentioned this pull request Aug 29, 2023 in #4158 (Introducing dask-based parallel backend for the AnalysisBase.run(), Closed)

orbeckst requested changes Aug 29, 2023


Commit: reduce amount of code duplicated between AnalysisBase and ParallelAnalysisBase (7097753)

p-j-smith closed this Aug 30, 2023
