Feature proposal: Adding a ResidenceTime Analysis Routine #4868

mattiafelice-palermo · 2024-12-27T13:45:55Z

Summary

Dear MDAnalysis community, I would like to propose a new analysis routine that I have developed, ResidenceTime, for MDAnalysis. The routine would allow to analyze how long specific groups of molecules (called "probe groups") stay within a defined distance from another group of molecules (called "reference groups"). By calculating these "residence times," the tool helps quantify solvation dynamics, binding interactions, and local environment stability over time.

I couldn't find this routine among the analysis available in MDAnalysis (hopefully I didn't miss it). The main metric it computes is:

Residence Time Distribution:
Histograms (both raw and normalized) of residence times with customizable bin sizes or counts.

Additionally, the routine computes the Average Coordination Number as a function of time as a secondary analysis. This metric is a natural byproduct of the residence time calculations, as it leverages the same underlying data (probes within the cutoff distance). It tracks how many probe groups stay near the reference groups over time, offering additional insight into coordination dynamics.

Motivation

I developed this routine to assess the strength of coordination of solvent molecules around ions, but it can also be used to study other dynamic molecular interactions, such as ligand binding or protein-ligand residence times. This work is linked to an upcoming publication where I demonstrate its utility in understanding solvation dynamics.

Key Features

Inputs:
- reference_group: Atoms to calculate residence times around.
- probe_group: Atoms whose residence times are measured.
- radius: Cutoff distance in Ångstroms.
- Optional binning: Choose either bin_size or bin_count for the histogram.
Outputs:
- results.histogram: Dictionary containing raw and normalized histograms, and bin edges.
- results.coordination: Time series data for average coordination numbers.
Example Use Cases:
- Analyze how long water sticks to ions.
- Track ion coordination in solvation shells or binding sites.

Parallelization Potential

The routine is designed to be parallelization-friendly. Two approaches could be used to compute the residence time distribution, but I've implemented the second version described here:

Step-by-Step Approach (Not Used Here): This method involves analyzing one frame at a time, where the computation of a frame depends on the results of the previous frame to check if a probe molecule is still within the cutoff from the reference. While straightforward in concept, this dependency makes it challenging to divide the workload across multiple processors, limiting its scalability for parallel computation.
Global Monitoring (Implemented Here):
This implementation collects residence data for all probes in one pass and processes it later. This setup could be chunked for trajectory-based parallelization, making it scalable for large systems. While speculative for now, this design leaves the door open for parallel implementation.

Bonus Ideas: Fitting Residence Time Histograms, Probabilistic Criteria for Coordination and Grace time

Two additional features could enhance the utility of this routine in the future:

Fitting Residence Time Histograms:
Residence time histograms can be fitted with exponential or biexponential decay functions to extract solvation/desolvation timescales. For example:
P(t) = A e^{-t/\tau}
or
P(t) = A_1 e^{-t/\tau_1} + A_2 e^{-t/\tau_2}
where ( \tau ) represents the characteristic residence time for single processes, and ( \tau_1 ) and ( \tau_2 ) represent distinct timescales for systems with multiple processes. The biexponential fit helps identify fast and slow dynamics in complex systems. While not implemented yet, this feature could serve as a valuable post-processing guide for users.
Probabilistic Criteria for Coordination:
Instead of using a fixed cutoff to determine coordination, a probabilistic approach could be implemented. This might involve using radial distribution functions (RDFs) or Boltzmann distributions to account for temperature and system-specific variations. Such an approach could provide a more nuanced assessment of coordination and may align better with experimental observations.
Grace time:
Alternatively or complementary, in the fixed cutoff version a grace time can be included to account for probes exiting the solvation shell for very short times.

Example Code

Example Script:
You can find a script for analyzing residence times and coordination numbers:

import MDAnalysis as mda
from MDAnalysis.analysis.rdt import ResidenceTime
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('nvt.tpr', 'nvt.trr')

reference = u.select_atoms("resname NA")
probe = u.select_atoms("resname WAT")


rdt_analysis = ResidenceTime(reference, probe, 4, bin_count = 20)
rdt_analysis.run()

# Generate and save residence time distribution
results = rdt_analysis.results.histogram
raw_hist = results["raw_hist"]
bin_edges = results["bin_edges"]
normalized_hist = results["normalized_hist"]

# Compute bin midpoints
bin_midpoints = (bin_edges[:-1] + bin_edges[1:]) / 2

# Save histogram data to a single CSV file
data = np.column_stack((bin_edges[:-1], bin_edges[1:], bin_midpoints, raw_hist, normalized_hist))
np.savetxt(
    "residence_times_histogram.csv",
    data,
    delimiter=",",
    header="#Bin_Start (ps), Bin_End (ps), Bin_Midpoint (ps), Raw_Count, Normalized_Value",
    fmt="%.3f,%.3f,%.3f,%d,%.6f",
    comments=""
)

# Generate and save histogram plot
plt.figure(figsize=(8, 6))
plt.bar(bin_midpoints, normalized_hist, width=np.diff(bin_edges), align="center", edgecolor="black")
plt.xlabel("Residence Time (ps)")
plt.ylabel("Normalized Frequency")
plt.title("Residence Time Distribution")
plt.grid(True, linestyle="--", alpha=0.7)
plt.savefig("residence_times_histogram.png", dpi=300)
plt.show()

Sample Outputs:

Residence Time Distribution:

Let me know if you think this analysis routine could be a valuable addition to MDAnalysis. I have already developed the code and can launch a pull request for your review. I’d love to hear your feedback, suggestions, or ideas for improving it, and whether this aligns with the needs of the community.

Looking forward to your thoughts!

Mattia

The text was updated successfully, but these errors were encountered:

RMeli · 2024-12-27T15:52:26Z

Hi @mattiafelice-palermo. This looks like a very useful analysis class! Thank you for the detailed explanation and information.

It is definitely an addition we should consider. However, I should point out that for the 3.0 release we have been working on streamlining the MDAnalysis library and deprecated some analysis classes. Have you heard of MDAKits? The MDAKit Registry is where we moved some of these analysis classes and where new ones are constantly being added. There are some more details on the Why and MDAKit page.

I'll let others chime in, but I'm wondering if your work is more suited for such format. What do you think?

RMeli · 2024-12-27T15:56:47Z

BTW, parallelization can now be handled by AnalysisBase. Since you used the second approach, it might be reasonably straightforward to refactor your class to use what MDAnalysis provides.

mattiafelice-palermo · 2024-12-27T16:06:48Z

Hello Rocco, thank you for the kind feedback!

I've been out of the loop with MDAnalysis for a while, so I have definitely missed the MDAKit part :) In fact, I'm having a look at the library and seeing there a Kit by Richard Gowers with a similar aim (residence time, coordination number) - so I guess my analysis routine is a bit redundant!

Of course, if for any reason this kind of contribution is still of interest for MDAnalysis, I'll be happy to contribute, otherwise I will look more into MDAKits and contribute in that form!

P.s.: thank you for the tip on the parallelization part, which I definitely missed.

orbeckst · 2025-01-07T00:02:17Z

@mattiafelice-palermo residence times are super-useful (we are using them for protein-lipid interactions https://github.com/Becksteinlab/basicrta ). If you have code that makes it easy to calculate them in various contexts then I'd really encourage you to create a MDAKit. It doesn't matter if there's already something out there that looks similar — yours probably has a unique approach and it's possible that the way you do it will attract more users, especially if you keep working on it.

orbeckst · 2025-01-07T00:04:50Z

Btw, I do like your example code a lot, it does look like a useful basic building block for analysis tools.

orbeckst added enhancement Component-Analysis proposal MDAKit suitable for MDAKit labels Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature proposal: Adding a ResidenceTime Analysis Routine #4868

Feature proposal: Adding a ResidenceTime Analysis Routine #4868

mattiafelice-palermo commented Dec 27, 2024

RMeli commented Dec 27, 2024 •

edited

Loading

RMeli commented Dec 27, 2024 •

edited

Loading

mattiafelice-palermo commented Dec 27, 2024

orbeckst commented Jan 7, 2025

orbeckst commented Jan 7, 2025

Feature proposal: Adding a ResidenceTime Analysis Routine #4868

Feature proposal: Adding a ResidenceTime Analysis Routine #4868

Comments

mattiafelice-palermo commented Dec 27, 2024

Summary

Motivation

Key Features

Parallelization Potential

Bonus Ideas: Fitting Residence Time Histograms, Probabilistic Criteria for Coordination and Grace time

Example Code

RMeli commented Dec 27, 2024 • edited Loading

RMeli commented Dec 27, 2024 • edited Loading

mattiafelice-palermo commented Dec 27, 2024

orbeckst commented Jan 7, 2025

orbeckst commented Jan 7, 2025

RMeli commented Dec 27, 2024 •

edited

Loading

RMeli commented Dec 27, 2024 •

edited

Loading