Store unique masks in a dictionary to avoid duplication (FormatNXmx.py) #789

yash4karan · 2025-02-20T11:02:38Z

Since static masks tend to be constant for a given experiment, it is more efficient to store unique entries in a dictionary.

This PR adds a MaskDict singleton, which inherits from the Python dict. This class provides a .insert() method that inserts a static mask object only if it is not already present and then returns a reference to the corresponding dictionary entry.
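For readers who want a concrete picture, here is a minimal sketch of the idea described above. It is not the code in this PR: masks are represented as plain `bytes` for illustration, and keying the dictionary on a SHA-256 digest of the mask contents is an assumption.

```python
import hashlib


class MaskDict(dict):
    """Illustrative sketch: a singleton dict holding one copy of each distinct mask."""

    _instance = None

    def __new__(cls):
        # Singleton: every MaskDict() call returns the same shared dictionary.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def insert(self, mask: bytes) -> bytes:
        # Key on the mask contents (assumption: a SHA-256 digest of the raw bytes).
        key = hashlib.sha256(mask).hexdigest()
        # setdefault stores the mask only if the key is new and always
        # returns the object already held in the dictionary.
        return self.setdefault(key, mask)
```

Callers would keep the object returned by `insert()` and drop their own copy, so that every file with an identical mask shares a single object.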

Motivation: dials.import seems to require a surprising amount of ram (sometimes) #2227

When trying to dials.import a large number of experiment files, the RAM usage on Linux was unnecessarily high due to separate static masks being created for each file. Since masks tend to be limited in number for each experiment, it makes more sense to store masks with unique values only, which is what this PR aims to do.

Before and after:

I profiled dials.import using memray on a Linux workstation with the following VMXi dataset:
/dls/mx/data/nt30330/nt30330-162/VMXi-AB2776/well_*/images/*.nxs.

Before

I found that originally, peak RAM usage was ~18 GB, with the mask objects taking up ~16 GB for 3836 files (i.e. ~4.3 MB per .nxs file, as expected). The RAM usage vs. time plot is shown below:

After

After the PR changes were made, total RAM usage was <900 MB, with the mask objects taking up <10 MB at any given time. The RAM usage vs. time plot is shown below:

graeme-winter

Thank you for the contribution: this looks like exactly the fix that is needed here. The changes I propose are largely cosmetic, though handling mask tuples of length > 1 would be important before we merge this.

For others: I have already discussed these changes with @yash4karan 🙂

newsfragments/789.feature

src/dxtbx/format/FormatNXmx.py

ndevenish · 2025-02-20T11:56:14Z

Is it worth storing the masks as weak references? As it stands, if you ever do load anything with a million separate masks, you'll keep them around forever even if the original format class is long gone.

If so, I haven't used it before, but it looks like the weakref module has a WeakValueDictionary that may do exactly what we need?

ndevenish · 2025-02-20T11:58:47Z

ohey

A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping.

ndevenish · 2025-02-21T09:14:25Z

I edited the news fragment, as I tend to do for everyone. The news fragment is aimed at end users of DIALS, so I try to describe the effect rather than give a technical description of what changed; it isn't a commit message.

ndevenish · 2025-02-21T09:17:03Z

Add yourself to dxtbx AUTHORS also

Stores static masks in FormatNXmx.py in a dictionary to avoid duplication

4ddbd74

yash4karan changed the title ~~Stores static masks in FormatNXmx.py in a dict to avoid duplication~~ Store static masks in FormatNXmx.py in a dict to avoid duplication Feb 20, 2025

Yash Karan and others added 2 commits February 20, 2025 11:09

Added newsfragment

42d7a7c

Rename newsfragments/XXX.feature to newsfragments/789.feature

68df1e2

graeme-winter reviewed Feb 20, 2025

Yash Karan added 2 commits February 20, 2025 16:26

dict to weakref dict, type information, naming/structural changes

84a5579

Fixed a major bug in _MaskCache

6d218f7

yash4karan changed the title ~~Store static masks in FormatNXmx.py in a dict to avoid duplication~~ Store unique masks in a dictionary to avoid duplication (FormatNXmx.py) Feb 20, 2025

Update and rename 789.feature to 789.bugfix

c6221d5

Added name to AUTHORS

7883e29

ndevenish merged commit 45f50b2 into cctbx:main Feb 21, 2025
9 checks passed
