Refactor pmda after universe can be serialized #132

yuxuanzhuang · 2020-07-15T12:25:47Z

Fixes #133

Changes made in this Pull Request:

Refactor each part of pmda (tests passed):
- parallel.py
- custom.py
- rmsd
- rmsf
- contact
- Hbond
- RDF
- density
- leaflet

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

pep8speaks · 2020-07-15T12:25:50Z

Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file setup.py:

Line 16:80: E501 line too long (84 > 79 characters)
Line 16:84: W504 line break after binary operator
Line 58:80: E501 line too long (104 > 79 characters)
Line 69:80: E501 line too long (115 > 79 characters)

Comment last updated at 2021-05-12 17:44:51 UTC

orbeckst

This makes the code simpler, nice!

See initial comments.

Docs will also need an update, especially everything that shows how to use ParallelAnalysisBase.

orbeckst · 2020-07-15T18:09:23Z

pmda/custom.py

        self.kwargs = kwargs

    def _prepare(self):
        self.results = []

-    def _single_frame(self, ts, atomgroups):


Hm, cool that this works.

orbeckst · 2020-07-15T18:11:45Z

pmda/density.py

@@ -259,10 +256,15 @@ def __init__(self, atomgroup, delta=1.0, atomselection=None,
        elif not updating and atomselection is not None:
            raise ValueError("""With updating=False, the atomselection='{}' is
                        not used and should be None""".format(atomselection))
+        elif updating and atomselection is not None:
+            self._select_atomgroup = atomgroup.select_atoms(atomselection,
+                                                            updating=True)


Do updating AtomGroups work with the serialization?

Yes! Thanks to what has already been implemented by Richard:)

orbeckst · 2020-07-15T18:14:24Z

pmda/parallel.py

+            np.array([el[4] for el in res]),
+            np.array([el[5] for el in res]))
+
+        #  this is crucial if the analysis does not iterate over


Why is this crucial? What happens otherwise?

Because (I'm not sure it should be classed as a bug) some analyses depend on the current ts of the universe. Density analysis, for example, in both MDAnalysis and this PR:

    def _prepare(self):
        coord = self._select_atomgroup.positions  # changes with ts
        ...

Currently, the universe stays at its last frame after the analysis unless it is rewound.

oh because this does not return a copy. I would not do the rewind. If people want a copy they should take one. That can be fixed in the density analysis class.

The thing is, we are not using FrameIteratorSliced here (which rewinds after iteration) because we want accurate timing via self._ts = self._trajectory[i]. So there is a discrepancy between AnalysisBase and ParallelAnalysisBase:

    u = mda.Universe(GRO, XTC)
    serial_analysis(u.atoms).run(stop=3)
    u.trajectory.ts.frame == 0
    ...
    parallel_analysis(u.atoms).run(stop=3)
    u.trajectory.ts.frame == 3
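The discrepancy can be reproduced without MDAnalysis. The following is a minimal sketch, assuming a toy reader: `ToyTrajectory`, `run_serial` and `run_indexed` are hypothetical names for illustration and are not part of MDAnalysis or PMDA. Sliced iteration rewinds when it finishes, whereas plain indexing leaves the reader on the last frame it touched.

```python
class ToyTrajectory:
    """Hypothetical stand-in for a trajectory reader (not the MDAnalysis API)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frame = 0  # index of the currently loaded frame

    def __getitem__(self, i):
        # Random access moves the reader to frame i and leaves it there.
        if not 0 <= i < self.n_frames:
            raise IndexError(i)
        self.frame = i
        return i

    def iterate(self, start, stop):
        # Sliced iteration rewinds to the first frame once it is done.
        try:
            for i in range(start, stop):
                yield self[i]
        finally:
            self.frame = 0


def run_serial(traj, stop):
    """Mimics an analysis that loops over a sliced iterator."""
    for _ in traj.iterate(0, stop):
        pass
    return traj.frame


def run_indexed(traj, stop):
    """Mimics an analysis that indexes frames one by one (no rewind)."""
    for i in range(stop):
        traj[i]
    return traj.frame


serial_frame = run_serial(ToyTrajectory(10), stop=3)
indexed_frame = run_indexed(ToyTrajectory(10), stop=3)
print(serial_frame, indexed_frame)
```

Any code that reads frame-dependent state after `run()` sees different data in the two cases, which is why the choice between rewinding in the base class and fixing the individual analysis matters.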

orbeckst · 2020-07-15T18:21:44Z

To get the tests going, change Travis to build and install MDAnalysis from yuxuanzhuang:serialize_io in PR MDAnalysis/mdanalysis#2723 – there's a pip command line/url way to directly use a git branch. I think we used it for PMDA in the past.

yuxuanzhuang · 2020-07-17T11:15:33Z

pmda/parallel.py

+            if(isinstance(item, mda.Universe)):
+                universe_dict[key] = item
+        universe_dict.update(base_dict)
+        return universe_dict


Until we settle how AtomGroup is handled, I hack the order of the attribute dict here so that the Universe is always pickled before the AtomGroup (the dict should not need to be ordered, but the order somehow matters).
I'm not sure how we should deal with unpicklable attributes. Note that cloudpickle, which dask uses, can literally pickle an open file handle.
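The reordering in the diff above can be sketched in isolation. This is an illustration under assumptions: `Universe` and `AtomGroup` are bare stand-in classes, and `universe_first` is a hypothetical helper name, not the function in `pmda/parallel.py`. It relies on two facts: Python dicts (3.7+) keep insertion order, and `dict.update` leaves existing keys in place while appending new ones.

```python
class Universe:
    """Hypothetical stand-in; not mda.Universe."""

    def __init__(self, name):
        self.name = name


class AtomGroup:
    """Hypothetical stand-in that refers back to its universe."""

    def __init__(self, universe):
        self.universe = universe


def universe_first(attrs):
    """Return a copy of attrs with all Universe values moved to the front."""
    ordered = {k: v for k, v in attrs.items() if isinstance(v, Universe)}
    # Keys already present keep their position; the rest are appended in order.
    ordered.update(attrs)
    return ordered


u = Universe("demo")
attrs = {"n_frames": 3, "atomgroup": AtomGroup(u), "universe": u}
state = universe_first(attrs)
print(list(state))
```

Because pickle walks a dict in insertion order, a state dict built this way has its Universe entries serialized ahead of everything else.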

kain88-de · 2020-07-18T20:47:20Z

pmda/parallel.py

        else:
            # raise HalError("I'm sorry Dave, I'm afraid I can't do that")
-            raise AttributeError("Can't set attribute at this time")
+            raise AttributeError("Can't set '{}' at this time".format(key))


Ah, just use Python 3.6 or newer here: f"Can't set {key} at this time"
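For comparison, a small sketch of the two spellings side by side. The variable `key` and its value are illustrative; the quotes around the attribute name are kept as in the diff line rather than as in the shorter suggestion.

```python
key = "results"  # illustrative attribute name

# str.format, as in the diff
old_style = "Can't set '{}' at this time".format(key)

# f-string (Python 3.6+), as suggested in the review
new_style = f"Can't set '{key}' at this time"

print(new_style)
```

Both expressions build the same message, so the change only affects readability and the minimum Python version.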

kain88-de

Looks good. Great it uses less code now. Is PMDA now actually faster?

kain88-de · 2020-07-18T20:49:10Z

pmda/parallel.py

+            np.array([el[4] for el in res]),
+            np.array([el[5] for el in res]))
+
+        #  this is crucial if the analysis does not iterate over


oh because this does not return a copy. I would not do the rewind. If people want a copy they should take one. That can be fixed in the density analysis class.

orbeckst · 2020-07-20T19:28:19Z

You'll also have to update PMDA docs and setup.py to say that this requires MDA 2.0.0 and therefore ≥ python 3.6.
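A minimal sketch of how those two constraints could be expressed in `setup.py`, assuming the standard setuptools keywords `python_requires` and `install_requires`. The name `SETUP_KWARGS` is hypothetical, and PMDA's other dependencies and metadata are deliberately left out.

```python
# Hypothetical fragment: only the two constraints discussed above are shown.
SETUP_KWARGS = dict(
    python_requires=">=3.6",
    install_requires=["MDAnalysis>=2.0.0"],
)

# In setup.py these would be merged into the existing call, for example:
#     from setuptools import setup
#     setup(name="pmda", **SETUP_KWARGS)
print(SETUP_KWARGS)
```

With `python_requires` set, pip refuses to install the package on older interpreters instead of failing at import time.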

There's a question if we want to also do a PMDA 1.0 with the old MDA 1.0 and then PMDA 2.0 to be in sync with MDA 2.0.

yuxuanzhuang · 2020-08-09T16:35:34Z

I have a question regarding starting a PR based on this PR: is it possible? (A quick search indicates it's not possible on GitHub.)
The reason is that the other PR (introducing dask mixin) is still experimental; I opt to separate that from this one.

orbeckst · 2020-08-09T22:20:27Z

I think you can do a PR that is relative to this one and that would be merged into this one. Check the settings for base branch when you create a new PR.

yuxuanzhuang · 2020-08-10T09:57:49Z

I think the problem is this branch is not under MDAnalysis but my private one, so that PR will be created under my own repo.

yuxuanzhuang · 2020-08-19T11:02:17Z

I disabled DeprecationWarning in this PR temporarily.

The failed test in rdf_s here seems to be related to the discrepancy between PR MDAnalysis/mdanalysis#2812 and #121

VOD555 · 2020-08-23T00:03:04Z

It's very cool that this PR gets rid of rebuilding the universe and makes the code much simpler.

Yeah, the test failed because we changed the definition of the density option in the MDAnalysis PR but didn't do the same in PMDA.

orbeckst

This is looking pretty good. I could only do a superficial read.

Can we do the 0.4 release (#116) as the last one compatible with MDA 1.x, and then merge this PR?

EDIT: Other comments from previous code reviews (docs, density analysis) should still be addressed.

orbeckst · 2021-05-12T17:35:10Z

.travis.yml

+    # - CONDA_DEPENDENCIES="mdanalysis mdanalysistests dask joblib pytest-pep8 mock codecov cython hypothesis sphinx"
+    # - CONDA_MDANALYSIS_DEPENDENCIES="cython mmtf-python six biopython networkx scipy griddataformats gsd hypothesis"
+    - CONDA_MDANALYSIS_DEPENDENCIES="mmtf-python biopython networkx cython matplotlib scipy griddataformats hypothesis gsd"
+    - CONDA_DEPENDENCIES="${CONDA_MDANALYSIS_DEPENDENCIES} dask distributed joblib pytest-pep8 mock codecov"
    - CONDA_CHANNELS='conda-forge'
    - CONDA_CHANNEL_PRIORITY=True
    # install development version of MDAnalysis (needed until the test
    # files for analysis.rdf are available in release 0.19.0)


the comment is outdated

yuxuanzhuang added 2 commits July 15, 2020 14:22

refactor parallel.py

1e3d27b

refactor custom

4435f29

yuxuanzhuang added 8 commits July 15, 2020 14:32

pep8

1981673

refactor rmsd

0a70857

refactor rmsf

cafe65f

refactor contacts

185d19a

refactor density

ef86b9d

refactor rdf

c0b0bd6

refactor HBonds

bfa629c

leaflet broken

495033f

orbeckst mentioned this pull request Jul 15, 2020

PMDA with refactored _single_frame #128

Draft

4 tasks

orbeckst linked an issue Jul 15, 2020 that may be closed by this pull request

use serializable Universe #133

Open

orbeckst requested changes Jul 15, 2020


yuxuanzhuang added 10 commits July 16, 2020 13:00

build mdanalysis on serialize_io

8700223

build mdanalysis on serialize_io fix

c8e9973

push leaflet fix back

356cfb9

travis fix

58fea5d

travis fix

dfd3588

test parallel

0921882

timing test

49288e8

leaflet fix

9a0e0c5

pep8

187463b

make sure universe before atomgroup

8a42040

yuxuanzhuang commented Jul 17, 2020


kain88-de reviewed Jul 18, 2020


change travis back

053225b

travis to develop

83becd7

yuxuanzhuang mentioned this pull request Aug 17, 2020

Turn ParallelAnalysisBase into dask custom collection #136

Open

4 tasks

orbeckst mentioned this pull request Aug 19, 2020

release 0.4 #116

Open

yuxuanzhuang added 4 commits August 19, 2020 12:01

remove getstate

f9c89e6

pep8

61bce8f

update setup.py

cb99fc8

pep8 warning

d95add1

yuxuanzhuang added 6 commits August 19, 2020 13:09

travis

a505282

setup reverse

608a803

pep

18988c5

setup py3

61a34c7

rewind doc

7bf68f5

doc

5ceaebf

merge to develop

d424f6d

yuxuanzhuang mentioned this pull request May 12, 2021

Bad Performance of Parallelization with On-the-fly Transformation #144

Open

orbeckst reviewed May 12, 2021


Merge branch 'master' into refactor_pmda

b649c04

orbeckst mentioned this pull request May 12, 2021

Issue with AnalysisFromFunction() when reading Amber .nc trajectory files #119

Open
