Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alchemical_introduction tutorial is broken on try.openbiosim.org #5

Closed
jmichel80 opened this issue Jul 28, 2023 · 44 comments
Closed

alchemical_introduction tutorial is broken on try.openbiosim.org #5

jmichel80 opened this issue Jul 28, 2023 · 44 comments
Assignees

Comments

@jmichel80
Copy link
Contributor

Runtime error for exercise 4.5

image

Manually running analyse_freenrg on the somd outputs used for the exercise gives

(base) jovyan@jupyter-jmichel80:~/biosimspace_tutorials/04_fep/01_intro_to_alchemy/o_xylene_benzene_for_analysis/free$ analyse_freenrg mbar -i lambda_*/simfile.dat
(...)
INFO:pymbar.mbar:MBAR initialization complete.
INFO:pymbar.mbar_solvers:Reached a solution to within tolerance with BFGS
INFO:pymbar.mbar_solvers:Solution found within tolerance!
INFO:pymbar.mbar_solvers:Final gradient norm: 6.26e-06
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/sire/legacy/Tools/FreeEnergyAnalysis.py", line 114, in run_mbar
    ) = MBAR_obj.getFreeEnergyDifferences(return_theta=True)
AttributeError: 'MBAR' object has no attribute 'getFreeEnergyDifferences'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/share/Sire/scripts/analyse_freenrg.py", line 799, in <module>
    free_energy_obj.run_mbar(test_overlap)
  File "/opt/conda/lib/python3.10/site-packages/sire/legacy/Tools/FreeEnergyAnalysis.py", line 126, in run_mbar
    ) = MBAR_obj.getFreeEnergyDifferences(return_theta=True)
AttributeError: 'MBAR' object has no attribute 'getFreeEnergyDifferences'

I suspect the problem is that pymbar 4 is used in the environment

ene_benzene_for_analysis/free$ conda list | grep pymb
pymbar                    4.0.1           py310h0a54255_2    conda-forge

Is the solution to build the server image against pymbar3.1.1 (https://anaconda.org/conda-forge/pymbar/files) and also update the installation instructions at https://github.com/OpenBioSim/biosimspace_tutorials#readme ?

@lohedges
Copy link
Contributor

Yes, the analyse_freenrg script doesn't work with pymbar version 4+. I think we've removed the pin against pymbar since Exscientia need to be able to build packages with it in order to use recent versions of alchemlyb. It should also be easy to update analyse_freenrg, but this is a maintenance burden that we don't need if we are going to switch the analysis over soon.

@jmichel80
Copy link
Contributor Author

is then best option for the time being to update the tutorials installation instructions and rebuild the notebook container to pin pymbar<4 so we have a working BSS.FreeEnergy.Relative.Analysis over the Summer ?

@lohedges
Copy link
Contributor

Well, the installation instructions currently specifies alchemlyb=2.1.0. This is unsatisfiable using pymbar<4. We'd need to work out whether we can get away with using an older version of alchemlyb for the analysis that is required. I guess @fjclark and @annamherz might have a better idea than me.

If I do:

mamba create -n test -c conda-forge pymbar=3 alchemlyb  

I get:

alchemlyb               1.0.1  pyhd8ed1ab_0         conda-forge/noarch       Cached
pymbar                  3.1.1  py311h4c7f6c3_2      conda-forge/linux-64      160kB

This is resolvable if I add cinnabar=0.3.0 too, but I've not tested to see if the tutorials still work.

@annamherz
Copy link

I think that should work? Alchemlyb was mainly added for the AMBER analysis, but as there is currently no native/alchemlyb analysis in BSS used for the tutorials downgrading alchemlyb shouldn't cause any issues I think.

@chryswoods
Copy link
Contributor

Yes - you lost the pymbar<4 pin as you wanted alchemlyb>=2 installed. The pymbar pin is still in the sire recipe, but we do need to remove it as it is clear that you do need and want alchemlyb>=2.

It would be easy to switch back to the old notebook image that didn't have alchemylb>=2 installed, but then nothing you built that needed this would then work?

The sustainable solution is to do the porting work in analyse_freenrg so that it supports pymbar>=4.

@jmichel80
Copy link
Contributor Author

@chryswoods - we plan to deprecate use of analyse_freenrg in BioSimSpace and switch to alchemlyb >2 so porting analyse_freenrg would only be a temporary fix.

@anna - the intro tutorial uses BSS.FreeEnergy.Relative.analyse in section 5.1

then again in section 1 of the ABFE analysis tutorial.

@chryswoods
Copy link
Contributor

Ok - does this mean removing the mbar parts of analyse_freenrg? We still need to keep the original non-mbar code because it is used to calculate free energies for waterswap simulations.

@fjclark
Copy link
Contributor

fjclark commented Jul 28, 2023

Since it seems to have been a quick fix, I've updated analyse_freenrg to use pymbar 4 (OpenBioSim/sire#91). My tests gave identical results before and after the updates. Hopefully this makes things easier.

@lohedges
Copy link
Contributor

Great, glad it was easy. Are there any tests for the waterswap analysis anywhere? I'm just wondering if there are parts other parts of the code that aren't covered by the tests above that would need updating to the new API too.

@jmichel80
Copy link
Contributor Author

Ok - does this mean removing the mbar parts of analyse_freenrg? We still need to keep the original non-mbar code because it is used to calculate free energies for waterswap simulations.

We wouldn't touch the analyse_freenrg code in sire, we would just stop using it when calling BSS.FreeEnergy.Analyse. It will still be possible to use analyse_freenrg from the command line.

@fjclark
Copy link
Contributor

fjclark commented Jul 28, 2023

Great, glad it was easy. Are there any tests for the waterswap analysis anywhere? I'm just wondering if there are parts other parts of the code that aren't covered by the tests above that would need updating to the new API too.

As far as I could tell from searching "pymbar", AnalyseFreeEnergy.py seemed to be the only section of code where pymbar was used directly.

@annamherz
Copy link

@anna - the intro tutorial uses BSS.FreeEnergy.Relative.analyse in section 5.1

yes - this uses the native analysis only, so is currently dependant on the analyse_freenrg from sire

@jmichel80
Copy link
Contributor Author

So changing the environment build instructions to

mamba install -n bsstutorials -c openbiosim -c conda-forge biosimspace=2023.3.0 gromacs=2023.1 ambertools=23.3 plumed=2.9.0 cinnabar=0.3.0 pymbar=3 alchemlyb=1.0.1

allows me to call without errors BSS.FreeEnergy.Relative.analyse in section 5.1 and to call BSS.FreeEnergy.AlchemicalFreeEnergy.analyse in 02_analysis_abfe.ipynb

I do get a bunch of warnings from alchemlyb however

Could not calculate statistical inefficiency.
Running without calculating the statistical inefficiency and without subsampling...

But the resulting free energy estimates still appear reasonable.

Note that requesting gromacs=2023.1 will not currently work on osx-arm64 .

I will update the tutorials installation docs and report that this has only been tested on linux-64.

@chryswoods - when you have a chance could you also rebuild the notebook container with the new environment ?

@chryswoods
Copy link
Contributor

chryswoods commented Jul 28, 2023

Thanks for clarifying about analyse_freenrg. The history of the tool is that it used to be two scripts; analyse_freenrg.py and analyse_freenrg_mbar.py. We merged them into one script using the mbar command line option. If this was present, then only the code from analyse_freenrg_mbar.py was run, else only the code from the original analyse_freenrg.py was run.

The port to new pymbar (thanks @fjclark!) doesn't impact the non-mbar code, so shouldn't have any impact on waterswap. There is a test in sire_bigtests, which I'll run. It just verifies that the code can still read the freenrgs.s3 files that waterswap produces, and gives the same Bennetts free energy.

Longer term, we could remove the mbar part of analyse_freenrg with a printout suggesting people use the new tool. Or, we could have analyse_freenrg call the new tool for them (printing out what the command would be, to try to encourage people to migrate).

@akalpokas
Copy link
Contributor

Just a note that this still seems to be broken on https://try.openbiosim.org server (exercise 5.1 in https://github.com/OpenBioSim/biosimspace_tutorials/blob/main/04_fep/01_intro_to_alchemy/alchemical_introduction.ipynb)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1                                                                                    │
│                                                                                                  │
│ ❱ 1 pmf_free, overlap_matrix_free = BSS.FreeEnergy.Relative.analyse(f'o_xylene_benzene_for_a     │
│   2 pmf_bound, overlap_matrix_bound = BSS.FreeEnergy.Relative.analyse(f'o_xylene_benzene_for     │
│   3 freenrg_rel = BSS.FreeEnergy.Relative.difference(pmf_bound, pmf_free)                        │
│   4                                                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/BioSimSpace/FreeEnergy/_relative.py:462 in analyse       │
│                                                                                                  │
│    459 │   │                                                                                     │
│    460 │   │   # SOMD.                                                                           │461 │   │   if len(data) > 0:                                                                 │
│ ❱  462 │   │   │   return Relative._analyse_somd(work_dir)                                       │
│    463 │   │                                                                                     │
│    464 │   │   # Now check for GROMACS output.                                                   │465 │   │   else:                                                                             │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/BioSimSpace/FreeEnergy/_relative.py:636 in _analyse_somd │
│                                                                                                  │
│    633 │   │   │   stderr=_subprocess.PIPE,                                                      │
│    634 │   │   )                                                                                 │
│    635 │   │   if proc.returncode != 0:                                                          │
│ ❱  636 │   │   │   raise _AnalysisError("SOMD free-energy analysis failed!")                     │
│    637 │   │                                                                                     │
│    638 │   │   # Re-run without subsampling if the subsampling has resulted in less than 50 sam  │639 │   │   with open("%s/mbar.txt" % work_dir) as file:                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AnalysisError: SOMD free-energy analysis failed!

@lohedges
Copy link
Contributor

Thanks for the update. I assume that the fixed code hasn't yet made it's way to the server. Hopefully this will be resolved when the server image is rebuilt.

@chryswoods
Copy link
Contributor

Are we planning on rebuilding? The server currently has both sire and BioSimSpace 2023.3.0 installed. I hadn't upgraded anything as I assumed that everything was working. Do you want us to update the server to the latest stable sire and BioSimSpace versions (2023.3.2 and 2023.3.1 respectively)?

I can do that tonight if this is desired. @jmichel80 - what do you want?

@jmichel80
Copy link
Contributor Author

Yes can you rebuild ?
@fjclark - would you mind checking again the ABFE notebook on try.openbiosim.org when that’s done
@akalpokas - could you test again quickly the other notebooks after the rebuild ?

@chryswoods
Copy link
Contributor

chryswoods commented Sep 20, 2023

All updated - the notebook server should be identical to before, except that sire 2023.3.2 and BioSimSpace 2023.3.1 are now installed

@akalpokas
Copy link
Contributor

From what I can tell the analysis part of the FEP introduction notebook is still failing with the same error as before.
image
image

@lohedges
Copy link
Contributor

This is because this PR was never backported into main, so isn't available in a release.

@lohedges
Copy link
Contributor

Is it easier to just change the version of pymbar in the image? I believe this was tested above without issue.

@chryswoods
Copy link
Contributor

Ah yes - I remember now that that this PR changes the dependencies and features of sire, so is saved for a major release (2023.4.0) not a point release.

@chryswoods
Copy link
Contributor

Since it is a change in a single file, I could monkey-patch the notebook server to have the new version of FreeEnergyAnalysis.py. I'll give that a go now and will update the server. We will be updating the server properly to 2023.4.0 when that is released in early October.

@lohedges
Copy link
Contributor

Ah yes, good point, this will change the dependency resolution so probably best to save for the next release. As you say, it's a single file patch (just do a patch in the docker build) or simply pin to the old version of pymbar, which still allows the required version of cinnabar to be resolved.

@chryswoods
Copy link
Contributor

Ok, it is patched and updated. Your Jupyter kernels will need to restart (I think you have had them interrupted). I've just updated FreeEnergyAnalysis.py to be newest version from devel.

@chryswoods
Copy link
Contributor

This now looks like it is working

pmf_free is: 2.3962 kcal/mol and the MBAR statistical uncertainty is 0.1011 kcal/mol .
pmf_bound is: 1.7108 kcal/mol and the MBAR statistical uncertainty is 0.1487 kcal/mol .
The RBFE is -0.6854 kcal/mol and the error is 0.1798 kcal/mol

@akalpokas
Copy link
Contributor

The free energy analysis seems to be working, but the LOMAP network generation in 01_setup_RBFE notebook (https://github.com/OpenBioSim/biosimspace_tutorials/blob/main/04_fep/02_RBFE/01_setup_rbfe.ipynb) (Section 2.2) seems to be broken now:

image

@lohedges
Copy link
Contributor

lohedges commented Sep 20, 2023

I you add BSS.setVerbose(True) before running the cell you can get the full traceback. In this case, it's:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/conda/lib/python3.10/site-packages/pydot.py:1856 in create                                  │
│                                                                                                  │
│   1853 │   │   env['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '')                    │
│   1854 │   │   cmdline = [prog, '-T' + format] + args + [tmp_name]                               │
│   1855 │   │   try:                                                                              │
│ ❱ 1856 │   │   │   p = subprocess.Popen(                                                         │
│   1857 │   │   │   │   cmdline,                                                                  │
│   1858 │   │   │   │   env=env,                                                                  │
│   1859 │   │   │   │   cwd=tmp_dir,                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.10/subprocess.py:971 in __init__                                          │
│                                                                                                  │
│    968 │   │   │   │   │   self.stderr = io.TextIOWrapper(self.stderr,                           │
│    969 │   │   │   │   │   │   │   encoding=encoding, errors=errors)                             │
│    970 │   │   │                                                                                 │
│ ❱  971 │   │   │   self._execute_child(args, executable, preexec_fn, close_fds,                  │
│    972 │   │   │   │   │   │   │   │   pass_fds, cwd, env,                                       │
│    973 │   │   │   │   │   │   │   │   startupinfo, creationflags, shell,                        │
│    974 │   │   │   │   │   │   │   │   p2cread, p2cwrite,                                        │
│                                                                                                  │
│ /opt/conda/lib/python3.10/subprocess.py:1863 in _execute_child                                   │
│                                                                                                  │
│   1860 │   │   │   │   │   │   err_filename = orig_executable                                    │
│   1861 │   │   │   │   │   if errno_num != 0:                                                    │
│   1862 │   │   │   │   │   │   err_msg = os.strerror(errno_num)                                  │
│ ❱ 1863 │   │   │   │   │   raise child_exception_type(errno_num, err_msg, err_filename)          │
│   1864 │   │   │   │   raise child_exception_type(err_msg)                                       │
│   1865                                                                                           │
│   1866                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [Errno 2] No such file or directory: 'dot'

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/conda/lib/python3.10/site-packages/BioSimSpace/Align/_align.py:692 in generateNetwork       │
│                                                                                                  │
│    689 │   │   │                                                                                 │
│    690 │   │   │   # Write to a PNG.                                                             │
│    691 │   │   │   network_plot = f"{work_dir}/images/network.png"                               │
│ ❱  692 │   │   │   dot_graph.write_png(network_plot)                                             │
│    693 │   │   │                                                                                 │
│    694 │   │   │   if _is_notebook:                                                              │
│    695 │   │   │   │   # Create a plot of the network.                                           │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/pydot.py:1671 in new_method                              │
│                                                                                                  │
│   1668 │   │   │   │   │   path, f=frmt, prog=self.prog,                                         │
│   1669 │   │   │   │   │   encoding=None):                                                       │
│   1670 │   │   │   │   """Refer to docstring of method `write.`"""                               │
│ ❱ 1671 │   │   │   │   self.write(                                                               │
│   1672 │   │   │   │   │   path, format=f, prog=prog,                                            │
│   1673 │   │   │   │   │   encoding=encoding)                                                    │
│   1674 │   │   │   name = 'write_{fmt}'.format(fmt=frmt)                                         │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/pydot.py:1756 in write                                   │
│                                                                                                  │
│   1753 │   │   │   with io.open(path, mode='wt', encoding=encoding) as f:                        │
│   1754 │   │   │   │   f.write(s)                                                                │
│   1755 │   │   else:                                                                             │
│ ❱ 1756 │   │   │   s = self.create(prog, format, encoding=encoding)                              │
│   1757 │   │   │   with io.open(path, mode='wb') as f:                                           │
│   1758 │   │   │   │   f.write(s)                                                                │
│   1759 │   │   return True                                                                       │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/pydot.py:1863 in create                                  │
│                                                                                                  │
│   1860 │   │   │   │   shell=False,                                                              │
│   1861 │   │   │   │   stderr=subprocess.PIPE, stdout=subprocess.PIPE)                           │
│   1862 │   │   except OSError as e:                                                              │
│ ❱ 1863 │   │   │   if e.errno == os.errno.ENOENT:                                                │
│   1864 │   │   │   │   args = list(e.args)                                                       │
│   1865 │   │   │   │   args[1] = '"{prog}" not found in path.'.format(                           │
│   1866 │   │   │   │   │   prog=prog)                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'os' has no attribute 'errno'

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:15                                                                                   │
│                                                                                                  │
│   12 │   # append the molecule name to another list so that we can use the name of each molec    │
│   13 │   ligand_names.append(filepath.split("/")[-1].replace(".sdf", ""))                        │
│   14                                                                                             │
│ ❱ 15 transformations, lomap_scores = BSS.Align.generateNetwork(                                  │
│   16 │   ligands,                                                                                │
│   17 │   plot_network=True,                                                                      │
│   18 │   names=ligand_names,                                                                     │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/BioSimSpace/Align/_align.py:705 in generateNetwork       │
│                                                                                                  │
│    702 │   │   │   msg = "Unable to create network plot!"                                        │
│    703 │   │   │   if _isVerbose():                                                              │
│    704 │   │   │   │   msg += ": " + getattr(e, "message", repr(e))                              │
│ ❱  705 │   │   │   │   raise _AlignmentError(msg) from e                                         │
│    706 │   │   │   else:                                                                         │
│    707 │   │   │   │   raise _AlignmentError(msg) from None                                      │
│    708                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Here it looks like the pydot package hasn't installed the command line dot executable for some reason. It also seems to be using an invalid version of the os API.

In my local environment I have:

mamba list pydot                                                                             feature_runner ✱ ◼
# packages in environment at /home/lester/.conda/envs/openbiosim:
#
# Name                    Version                   Build  Channel
pydot                     1.4.2           py310hff52083_3    conda-forge

On the server I see:

mamba list pydot
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
pydot                     1.2.4                      py_0    conda-forge

I'll try updating the version to see if it works. It's weird that a more recent build has resolved an older version of the package. Hopefully this isn't a general problem, which might mean that we need to add a pin to the BIoSImSpace recipe.

@lohedges
Copy link
Contributor

Yes, simply doing the following in the terminal fixes things. (Not sure if it breaks anything else, though.)

mamba install -c conda-forge pydot=1.4.2

@lohedges
Copy link
Contributor

I've just checked and pydot version 1.4.2 is installed when creating fresh BioSimSpace environments from dev or main, so I assume one of the other tutorial dependencies is causing an older version to be pulled in.

@chryswoods
Copy link
Contributor

I'm uploading an image with pydot 1.4.2 installed...

@chryswoods
Copy link
Contributor

This seems to work now, but the debug messages get printed to the output of the first cell (where the first debug message about RDKit gets printed on import). It is a bit confusion, but the output is right.

@lohedges
Copy link
Contributor

Yes, I noticed that too. I hadn't seen that crazy long INFO output from RDKit before. I normally suppress this in BioSimSpace. Perhaps the way the logger is enabled/disabled has changed?

@fjclark
Copy link
Contributor

fjclark commented Sep 20, 2023

I've attempted to test the ABFE notebooks many times now, and each time it stops with the same error:

image

I have been running all cells initially, during which I haven't yet seen this error on either notebook. So far I've only hit this when running the exercises after the initial run through, but sometimes on cells which I wouldn't expect to be computationally intense, e.g. reading the restraint correction from the restraint.

@chryswoods
Copy link
Contributor

This is most likely an issue relating to running out of memory. While the notebook is limited to 4GB per user, there is only 8 GB of memory on the server. This means that two people using it at the same time can exhaust the memory on the server, and crash each other's kernels.

It is worth clearing / closing one notebook before starting another, to minimise memory usage.

It may also be worth coordinating to ensure that only one person today is going through and testing everything (or you take turns)

@akalpokas
Copy link
Contributor

I have tested all FEP notebooks (except ABFE) and everything seems to work now, thanks for your help!

@chryswoods
Copy link
Contributor

(on the day the server will have 256 GB so should be able to support ~60 simultaneous users)

@chryswoods
Copy link
Contributor

I've deleted your pod @akalpokas so now only @fjclark's is running on the server. Assuming no-one starts another pod, there should now be enough memory (assuming it fits under 4GB)

@fjclark
Copy link
Contributor

fjclark commented Sep 20, 2023

Thanks for this. I can run the analysis notebook no problem, but the final exercises in the setup notebook consistently result in the above error. Monitoring top immediately before the crash shows it grabbing a lot of memory:

image

I'll create a branch of the tutorials with these exercises removed.

@fjclark
Copy link
Contributor

fjclark commented Sep 20, 2023

I've created the branch here. However, it might be good to keep the exercises for users running on their own machines. If that sounds reasonable, could someone please create a new branch for me to create a PR to (I don't seem to have permission)? Otherwise, I'll reopen my PR to devel. Thanks.

@chryswoods
Copy link
Contributor

I think 4GB is about what you could expect from people's laptops. The spread will likely be 4-16GB, with 8GB being the norm. So it may be worth keeping them as examples and not having people run them locally, as out of memory on a laptop will cause more issues than on the server. Best case will be swapping, but worst case will be slowing down the laptops a lot or randomly crashing things.

@fjclark
Copy link
Contributor

fjclark commented Sep 21, 2023

Ok, thanks for the comment. I've reopened the PR.

@jmichel80
Copy link
Contributor Author

closing due to inactivity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants