-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store unique masks in a dictionary to avoid duplication (FormatNXmx.py) #789
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution: this looks exactly like the fix which is needed here - the changes I propose are largely cosmetic though dealing with mask tuples of length > 1 would be important before we merge this.
For others: I have already discussed these changes with @yash4karan 🙂
Is it worth storing the masks as weak references; this way, if you ever do load anything with a million separate masks you'll keep them around forever even if the original format class is long gone. If so, I haven't used it before, but it looks like that module has a |
ohey
|
I edited the news; as I tend to do for everyone. The news fragment is aimed at end-users of DIALS, so I try to describe the effect rather than the technical description of what changed; it isn't a commit message. |
Add yourself to dxtbx AUTHORS also |
Since static masks tend to be constant for a given experiment, it is more efficient to store unique entries in a dictionary.
This PR adds a
MaskDict
singleton, which inherits from the Pythondict
. This class provides a.insert()
method that inserts a static mask object only if it is not already present and then returns a reference to the corresponding dictionary entry.Motivation: dials.import seems to require a surprising amount of ram (sometimes) #2227
When trying to
dials.import
a large number of experiment files, the RAM usage on Linux was unnecessarily high due to separate static masks being created for each file. Since masks tend to be limited in number for each experiment, it makes more sense to store masks with unique values only, which is what this PR aims to do.Before and after:
I profiled
dials.import
usingmemray
on a Linux workstation with the following VMXi dataset:/dls/mx/data/nt30330/nt30330-162/VMXi-AB2776/well_*/images/*.nxs
.Before
I found that originally, peak RAM usage was ~18 GB, with the mask objects taking up ~16 GB for 3836 files (i.e. ~4.3 MB per
.nxs
file, as expected). The RAM usage vs. time plot is shown below:After
After the PR changes were made, total RAM usage was <900 MB, with the mask objects taking up <10 MB at any given time. The RAM usage vs. time plot is shown below: