-
Notifications
You must be signed in to change notification settings - Fork 677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept - ParallelAnalysisBase
and extensible backend support
#4269
Conversation
…Multiprocessing backend
Linter Bot Results:Hi @p-j-smith! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #4269 +/- ##
===========================================
- Coverage 93.40% 93.26% -0.15%
===========================================
Files 169 184 +15
Lines 22204 23416 +1212
Branches 4064 4088 +24
===========================================
+ Hits 20740 21839 +1099
- Misses 948 1057 +109
- Partials 516 520 +4
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package/MDAnalysis/analysis/base.py
Outdated
if frames is None: | ||
start, stop, step = self._trajectory.check_slice_indices(start, stop, step) | ||
used_frames = np.arange(start, stop, step) | ||
elif not all(opt is None for opt in [start, stop, step]): | ||
raise ValueError("start/stop/step cannot be combined with frames") | ||
else: | ||
used_frames = frames | ||
|
||
if all((isinstance(obj, bool) for obj in used_frames)): | ||
arange = np.arange(len(used_frames)) | ||
used_frames = arange[used_frames] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like code that's duplicated in AnalysisBase.run()
and would eventually be refactored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I hadn't spotted that duplication. I've updated ParallelAnalysisBase
to remove this duplication as well as that in the inner loop the in ParallelAnalysisBase._compute
and AnalysisBase.run
package/MDAnalysis/analysis/base.py
Outdated
for idx, ts in enumerate(ProgressBar(trajectory, verbose=verbose, **progressbar_kwargs)): | ||
i = frame_indices[idx] | ||
self._frame_index = i | ||
self._ts = ts | ||
self.frames[i] = ts.frame | ||
self.times[i] = ts.time | ||
self._single_frame() | ||
logger.info("Finishing up") | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we refactor the inner loop so that the same code is shared between AnalysisBase
and ParallelAnalysisBase
?
I am trying to get a sense if we can mitigate the class proliferation problem by keeping as much code as possible shared.
if backend is None: | ||
return super().run( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that.
package/MDAnalysis/analysis/base.py
Outdated
) | ||
|
||
# Start preparing the run | ||
super()._setup_frames(trajectory=self._trajectory, start=start, stop=stop, step=step, frames=frames) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be self._setup_frames()
I think (?).
closing as this pr won't be developed any further, but I think comments should still be possible once closed |
Related to #4158, #4162
This is a proof of concept of how to implement
ParallelAnalysisBase
and allow downstream developers and end users to easily add their own backends.I've opened this pr only to help with discussion in #4158 - I do not intend to go forwards with it and will close it and continue the discussion in #4158.
Changes compared to #4162:
ParallelAnalysisBase
that inherits fromAnalysisBase
ParallelExecutor
class in [GSoC] Parallelisation of AnalysisBase with multiprocessing and dask #4162 and added a newmda.analysis.backends
module that defines supported backends.ResultsGroup
class is identical to that in [GSoC] Parallelisation of AnalysisBase with multiprocessing and dask #4162_setup_computation_groups
,_compute
, and_get_aggregator
methods ofParallelAnalysisBase
are identical to those ofAnalysisBase
in [GSoC] Parallelisation of AnalysisBase with multiprocessing and dask #4162. Only therun
method has been changed to reflect the changed way of defining backendsI've also:
RMSD
analysis to useParallelAnalysisBase
and added a single test to illustrate that the multiprocessing backend can used with this implementationEdit:To reduce the amount of duplicated code between
AnalysisBase
andParallelAnalysisBase
, the_compute
method has been moved intoAnalysisBase
and the_setup_computation_groups
method had been simplified to remove code duplicated in_setup_frames
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4269.org.readthedocs.build/en/4269/