Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature proposal: Adding a ResidenceTime Analysis Routine #4868

Open
mattiafelice-palermo opened this issue Dec 27, 2024 · 5 comments
Open

Comments

@mattiafelice-palermo
Copy link
Contributor

Summary

Dear MDAnalysis community, I would like to propose a new analysis routine that I have developed, ResidenceTime, for MDAnalysis. The routine would allow to analyze how long specific groups of molecules (called "probe groups") stay within a defined distance from another group of molecules (called "reference groups"). By calculating these "residence times," the tool helps quantify solvation dynamics, binding interactions, and local environment stability over time.

I couldn't find this routine among the analysis available in MDAnalysis (hopefully I didn't miss it). The main metric it computes is:

  • Residence Time Distribution:
    Histograms (both raw and normalized) of residence times with customizable bin sizes or counts.

Additionally, the routine computes the Average Coordination Number as a function of time as a secondary analysis. This metric is a natural byproduct of the residence time calculations, as it leverages the same underlying data (probes within the cutoff distance). It tracks how many probe groups stay near the reference groups over time, offering additional insight into coordination dynamics.


Motivation

I developed this routine to assess the strength of coordination of solvent molecules around ions, but it can also be used to study other dynamic molecular interactions, such as ligand binding or protein-ligand residence times. This work is linked to an upcoming publication where I demonstrate its utility in understanding solvation dynamics.


Key Features

  1. Inputs:

    • reference_group: Atoms to calculate residence times around.
    • probe_group: Atoms whose residence times are measured.
    • radius: Cutoff distance in Ångstroms.
    • Optional binning: Choose either bin_size or bin_count for the histogram.
  2. Outputs:

    • results.histogram: Dictionary containing raw and normalized histograms, and bin edges.
    • results.coordination: Time series data for average coordination numbers.
  3. Example Use Cases:

    • Analyze how long water sticks to ions.
    • Track ion coordination in solvation shells or binding sites.

Parallelization Potential

The routine is designed to be parallelization-friendly. Two approaches could be used to compute the residence time distribution, but I've implemented the second version described here:

  1. Step-by-Step Approach (Not Used Here): This method involves analyzing one frame at a time, where the computation of a frame depends on the results of the previous frame to check if a probe molecule is still within the cutoff from the reference. While straightforward in concept, this dependency makes it challenging to divide the workload across multiple processors, limiting its scalability for parallel computation.

  2. Global Monitoring (Implemented Here):
    This implementation collects residence data for all probes in one pass and processes it later. This setup could be chunked for trajectory-based parallelization, making it scalable for large systems. While speculative for now, this design leaves the door open for parallel implementation.


Bonus Ideas: Fitting Residence Time Histograms, Probabilistic Criteria for Coordination and Grace time

Two additional features could enhance the utility of this routine in the future:

  • Fitting Residence Time Histograms:
    Residence time histograms can be fitted with exponential or biexponential decay functions to extract solvation/desolvation timescales. For example:
    P(t) = A e^{-t/\tau}
    or
    P(t) = A_1 e^{-t/\tau_1} + A_2 e^{-t/\tau_2}
    where ( \tau ) represents the characteristic residence time for single processes, and ( \tau_1 ) and ( \tau_2 ) represent distinct timescales for systems with multiple processes. The biexponential fit helps identify fast and slow dynamics in complex systems. While not implemented yet, this feature could serve as a valuable post-processing guide for users.

  • Probabilistic Criteria for Coordination:
    Instead of using a fixed cutoff to determine coordination, a probabilistic approach could be implemented. This might involve using radial distribution functions (RDFs) or Boltzmann distributions to account for temperature and system-specific variations. Such an approach could provide a more nuanced assessment of coordination and may align better with experimental observations.

  • Grace time:
    Alternatively or complementary, in the fixed cutoff version a grace time can be included to account for probes exiting the solvation shell for very short times.


Example Code

Example Script:
You can find a script for analyzing residence times and coordination numbers:

import MDAnalysis as mda
from MDAnalysis.analysis.rdt import ResidenceTime
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('nvt.tpr', 'nvt.trr')

reference = u.select_atoms("resname NA")
probe = u.select_atoms("resname WAT")


rdt_analysis = ResidenceTime(reference, probe, 4, bin_count = 20)
rdt_analysis.run()

# Generate and save residence time distribution
results = rdt_analysis.results.histogram
raw_hist = results["raw_hist"]
bin_edges = results["bin_edges"]
normalized_hist = results["normalized_hist"]

# Compute bin midpoints
bin_midpoints = (bin_edges[:-1] + bin_edges[1:]) / 2

# Save histogram data to a single CSV file
data = np.column_stack((bin_edges[:-1], bin_edges[1:], bin_midpoints, raw_hist, normalized_hist))
np.savetxt(
    "residence_times_histogram.csv",
    data,
    delimiter=",",
    header="#Bin_Start (ps), Bin_End (ps), Bin_Midpoint (ps), Raw_Count, Normalized_Value",
    fmt="%.3f,%.3f,%.3f,%d,%.6f",
    comments=""
)

# Generate and save histogram plot
plt.figure(figsize=(8, 6))
plt.bar(bin_midpoints, normalized_hist, width=np.diff(bin_edges), align="center", edgecolor="black")
plt.xlabel("Residence Time (ps)")
plt.ylabel("Normalized Frequency")
plt.title("Residence Time Distribution")
plt.grid(True, linestyle="--", alpha=0.7)
plt.savefig("residence_times_histogram.png", dpi=300)
plt.show()

Sample Outputs:

  • Residence Time Distribution:
    image

Let me know if you think this analysis routine could be a valuable addition to MDAnalysis. I have already developed the code and can launch a pull request for your review. I’d love to hear your feedback, suggestions, or ideas for improving it, and whether this aligns with the needs of the community.

Looking forward to your thoughts!

Mattia

@RMeli
Copy link
Member

RMeli commented Dec 27, 2024

Hi @mattiafelice-palermo. This looks like a very useful analysis class! Thank you for the detailed explanation and information.

It is definitely an addition we should consider. However, I should point out that for the 3.0 release we have been working on streamlining the MDAnalysis library and deprecated some analysis classes. Have you heard of MDAKits? The MDAKit Registry is where we moved some of these analysis classes and where new ones are constantly being added. There are some more details on the Why and MDAKit page.

I'll let others chime in, but I'm wondering if your work is more suited for such format. What do you think?

@RMeli
Copy link
Member

RMeli commented Dec 27, 2024

BTW, parallelization can now be handled by AnalysisBase. Since you used the second approach, it might be reasonably straightforward to refactor your class to use what MDAnalysis provides.

@mattiafelice-palermo
Copy link
Contributor Author

Hello Rocco, thank you for the kind feedback!

I've been out of the loop with MDAnalysis for a while, so I have definitely missed the MDAKit part :) In fact, I'm having a look at the library and seeing there a Kit by Richard Gowers with a similar aim (residence time, coordination number) - so I guess my analysis routine is a bit redundant!

Of course, if for any reason this kind of contribution is still of interest for MDAnalysis, I'll be happy to contribute, otherwise I will look more into MDAKits and contribute in that form!

P.s.: thank you for the tip on the parallelization part, which I definitely missed.

@orbeckst
Copy link
Member

orbeckst commented Jan 7, 2025

@mattiafelice-palermo residence times are super-useful (we are using them for protein-lipid interactions https://github.com/Becksteinlab/basicrta ). If you have code that makes it easy to calculate them in various contexts then I'd really encourage you to create a MDAKit. It doesn't matter if there's already something out there that looks similar — yours probably has a unique approach and it's possible that the way you do it will attract more users, especially if you keep working on it.

@orbeckst
Copy link
Member

orbeckst commented Jan 7, 2025

Btw, I do like your example code a lot, it does look like a useful basic building block for analysis tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants