Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: IMDReader Integration #4923

Draft
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

amruthesht
Copy link

@amruthesht amruthesht commented Feb 19, 2025

Fixes #4827

This draft PR addresses the feature request discussed in #4827.

Note:
The IMDReader feature which was previously a part of the imdclient package has been moved into MDAnalysis below. Any other modules have been in retained in imdclient, which has been added as an optional dependency here. We are currently in the process of splitting the imdclient package as mentioned above. (Issue, PR)

Major changes made in this Pull Request:

  • IMDReader, other associated base classes and a utility function were added to coordinates in the main package.
  • Appropriate test cases for the reader were added as a part of test-imd.py
  • Other corresponding changes were made to CI and environment *.yaml files
  • imdclient was added as an optional dependency

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--4923.org.readthedocs.build/en/4923/

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

Copy link

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 33.91813% with 113 lines in your changes missing coverage. Please review.

Project coverage is 85.51%. Comparing base (b79c77d) to head (a67cbfb).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
package/MDAnalysis/coordinates/base.py 28.20% 56 Missing ⚠️
package/MDAnalysis/coordinates/IMD.py 38.15% 47 Missing ⚠️
package/MDAnalysis/coordinates/util.py 37.50% 9 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (b79c77d) and HEAD (a67cbfb). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (b79c77d) HEAD (a67cbfb)
7 3
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4923      +/-   ##
===========================================
- Coverage    93.66%   85.51%   -8.16%     
===========================================
  Files          177      179       +2     
  Lines        21850    22021     +171     
  Branches      3079     3102      +23     
===========================================
- Hits         20466    18831    -1635     
- Misses         933     2746    +1813     
+ Partials       451      444       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@amruthesht
Copy link
Author

Your thoughts on this are appreciated - @orbeckst

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the initial PR.

  • The first big step is to get the tests running properly so that the CI uses an imdclient without IMDReader. Otherwise we are not sure we're testing the code here.
  • Minor initial comments while I skimmed.
  • Simple thing: run black over all files to get the formatting and ordering of imports right

Comment on lines +85 to +86
imdclient:
default: 'imdclient'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to take time to get imdclient to a stage where the imdclient package does not actually affect MDAnalysis.

Is there a way that we could temporarily (for initial CI testing) install imdclient from a branch or tarball, e.g., in a pip section? Then we could fairly rapidly create a preliminary (unpublished) imdclient package without IMDReader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By initial CI testing, do you mean "in this PR"?

There's a pip section just below, which should work if you put in the git location for pip install, but also you can just temporarily modify the CI script to do an additional pip install if it's for "testing within the PR itself".

If it's "after merge", this would require more discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for right now to bootstrap the PR.

I don't want to merge without a solid conda-forge imdclient package in place.


import logging

logger = logging.getLogger("imdclient.IMDClient")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to MDAnalysis.coordinates.IMDReader

@@ -1841,3 +1841,194 @@ def __repr__(self):

def convert(self, obj):
raise NotImplementedError

class StreamReaderBase(ReaderBase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think coordinates.base is the right place. @hmacdope ??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep correct!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just put this code to where it's needed, e.g., directly into IMD.py.

Do not add a util.py here, we need to keep this module as clean as possible because it's already quite crowded.

import select
import time

logger = logging.getLogger("imdclient.IMDClient")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust logger

@@ -79,6 +79,7 @@ extra_formats = [
"pytng>=0.2.3",
"gsd>3.0.0",
"rdkit>=2020.03.1",
"imdclient",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We almost certainly need to add a minimal version.

assert_timestep_almost_equal,
)

logger = logging.getLogger("imdclient.IMDClient")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust logger

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we need logging in tests???

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No logging in tests

Comment on lines +40 to +46
file_handler = logging.FileHandler("test.log")
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this or can this go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

logger = logging.getLogger("imdclient.IMDClient")


class IMDReader(StreamReaderBase):
Copy link
Member

@IAlibay IAlibay Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm getting confused about which PR is for which thing. @orbeckst given our discussion earlier this week, and your comment above which I take to be "IMDClient is still in flux", does it not make sense for the IMDReader to exist upstream and then just import it here? (edit: here my intent is "well then you could make releases and it wouldn't be limited to MDA release frequency").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to split IMDReader from imdclient and make a version of imdclient without IMDReader (which is in the works Becksteinlab/imdclient#54 ). At the same time we are moving what was split off into coordinates.IMD.

Amru is working on both at the moment.

The way IMDReader depends on imdclient is not the problem, and imdclient itself is also pretty stable, it's just that the tests for imdclient have made use of a lot of MDAnalysis/IMDReader for convenience, and we now have to rewrite some of these tests to use bare-bones python.

@orbeckst orbeckst marked this pull request as draft February 19, 2025 23:48
@orbeckst
Copy link
Member

I set the PR to Work in progress for the time being, just to indicate that we're not yet at the stage where the CI is working. Once the tests run properly, we can update the status.

Obviously, this shouldn't discourage anyone from contributing and commenting.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first quick look

return ts


@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not isntalled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not isntalled")
@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not installed")

@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not isntalled")
class TestIMDReaderBaseAPI(MultiframeReaderTest):

@pytest.fixture()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.fixture()
@pytest.fixture(scope='function')

If you need it at every test, then it's best to be explicit about the fixture scope

logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)

@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not isntalled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not isntalled")
@pytest.mark.skipif(not HAS_IMDCLIENT, reason="IMDClient not installed")

@pytest.mark.skip(
reason="Stream-based reader cannot determine n_frames until EOF"
)
def test_n_frames(self, reader, ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all these that have skips, could you reduce the code to just a pass?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, skip normally denotes an issue outside of repo, e.g dependency issue.

if ref.dimensions is None:
assert reader.ts.dimensions is None
else:
assert_array_almost_equal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please switch to assert_allclose for all these

Copy link
Contributor

@jaclark5 jaclark5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through and nothing jumped out at me that wasn't already mentioned, except that I didn't see these changes in the documentation that was linked. The following needs to be added:

Add doc/sphinx/source/documentation_pages/coordinates/IMD.rst
.. automodule:: MDAnalysis.coordinates.IMD

doc/sphinx/source/documentation_pages/coordinate_modules.rst
coordinates/IMD

doc/sphinx/source/documentation_pages/references.rst
If you use IMD capability...

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, had a lookover, I will try and push some changes addressing some of these myself also, but would be good to pick up the momentum here again if possible.


try:
import imdclient
from imdclient.IMDClient import IMDClient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need test coverage with mocks

except ImportError:
HAS_IMDCLIENT = False

# Allow building doucmnetation without imdclient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Allow building doucmnetation without imdclient
# Allow building documentation without imdclient

is the port number.
n_atoms : int (optional)
number of atoms in the system. defaults to number of atoms
in the topology. don't set this unless you know what you're doing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in the topology. don't set this unless you know what you're doing.
in the topology. Don't set this unless you know what you're doing.


Parameters
----------
filename : a string of the form "host:port" where host is the hostname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this format checked on __init__

if imdf.positions is not None:
# must call copy because reference is expected to reset
# see 'test_frame_collect_all_same' in MDAnalysisTests.coordinates.base
self.ts.positions = imdf.positions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use np.copyto here and below

@@ -1841,3 +1841,194 @@ def __repr__(self):

def convert(self, obj):
raise NotImplementedError

class StreamReaderBase(ReaderBase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep correct!

assert_timestep_almost_equal,
)

logger = logging.getLogger("imdclient.IMDClient")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No logging in tests

Comment on lines +40 to +46
file_handler = logging.FileHandler("test.log")
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@pytest.mark.skip(
reason="Stream-based reader cannot determine n_frames until EOF"
)
def test_n_frames(self, reader, ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, skip normally denotes an issue outside of repo, e.g dependency issue.

@pytest.mark.skip(
reason="Timeseries currently requires n_frames to be known"
)
@pytest.mark.parametrize(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, remove boilerplate otherwise confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adopt a reader for live-streamed Interactive MD (IMD) version 3 trajectories
6 participants