Populate image metadata without allocating memory for the entire image content #19830

Merged 18 commits on Mar 26, 2025

@kostrykin kostrykin commented Mar 17, 2025

The num_unique_values metadata element for images was added in #18951. It was later observed that populating this element loaded the image into memory, which required allocating memory for the entire image. #19760 therefore proposed removing the num_unique_values metadata element.

As described here, it is also possible to populate the num_unique_values metadata element without allocating memory for the entire image. This PR does so with two separate strategies, one for TIFF images and one for PNG images.

A hybrid strategy extracts the necessary information from TIFF files without loading them entirely into memory. TIFF files with multiple segments (tiles or strips) are processed by reading the segments one by one, so only the memory for a single segment needs to be allocated. TIFF files that have only a single segment are instead memory-mapped into virtual memory, and the image data is then read and processed chunk by chunk. In this case, only memory for a single chunk needs to be allocated.
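The single-segment case can be illustrated with a minimal sketch. It is not the code from this PR: it uses only the Python standard library, and it assumes the pixel data is an uncompressed, contiguous run of native-byte-order samples starting at a known byte offset. The function name `count_unique_chunked` and its parameters are hypothetical.

```python
import mmap
from array import array


def count_unique_chunked(path, offset=0, typecode="B", chunk_items=65536):
    """Count distinct sample values in a raw pixel buffer, chunk by chunk.

    Assumptions (hypothetical, for illustration): the samples are stored
    uncompressed and contiguously from `offset` to the end of the file,
    in native byte order, with the type given by `typecode`.
    The file must not be empty, since an empty file cannot be mapped.
    """
    itemsize = array(typecode).itemsize
    unique = set()
    with open(path, "rb") as fh:
        # Map the file into virtual memory; pages are loaded on demand.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Ignore trailing bytes that do not form a whole sample.
            end = len(mm) - (len(mm) - offset) % itemsize
            step = chunk_items * itemsize
            for start in range(offset, end, step):
                # Slicing copies only this chunk into process memory.
                chunk = array(typecode)
                chunk.frombytes(mm[start:min(start + step, end)])
                unique.update(chunk)
    return len(unique)
```

Peak memory in this sketch is bounded by one chunk plus the set of distinct values seen so far, rather than by the image size. The multi-segment case follows the same accumulation pattern, with one segment taking the place of one chunk.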

In addition, this PR cleans up some terminology: pages and series in TIFF files had previously been confused.

For PNG images, the pypng library is used to read and process the image data row by row. This should allocate only as much memory as a single row of the image requires. The pypng library is added as a dependency (it is written entirely in Python, has no further dependencies, and is only 58 kB in size; see PyPI).
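A minimal sketch of the row-by-row approach, not the code from this PR, might look as follows. The helper names `count_unique_in_rows` and `count_unique_in_png` are hypothetical. The sketch relies on `png.Reader(...).read()` from pypng, which returns the width, the height, an iterator over rows, and an info dictionary.

```python
def count_unique_in_rows(rows):
    """Count distinct values across an iterable of rows, one row at a time."""
    unique = set()
    for row in rows:
        # Only the current row and the set of seen values are held in memory.
        unique.update(row)
    return len(unique)


def count_unique_in_png(path):
    """Count distinct sample values in a PNG file using pypng."""
    import png  # pypng, imported lazily so the helper above has no dependency

    with open(path, "rb") as fh:
        # read() yields the rows lazily; consume them while the file is open.
        _width, _height, rows, _info = png.Reader(file=fh).read()
        return count_unique_in_rows(rows)
```

For multi-channel images, pypng delivers each row as a flat sequence of channel samples, so this sketch counts distinct sample values rather than distinct colors. Whether that matches the intended semantics of num_unique_values is an assumption here.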

For images other than TIFF and PNG, the num_unique_values metadata element is currently not computed.

cc @mvdbeek

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@kostrykin kostrykin marked this pull request as ready for review March 18, 2025 09:35
@github-actions github-actions bot added this to the 25.0 milestone Mar 18, 2025
@mvdbeek mvdbeek force-pushed the num_unique_values branch from b76cafa to 8a71919 Compare March 25, 2025 13:37
@mvdbeek mvdbeek left a comment

Thank you!

@mvdbeek mvdbeek merged commit 76d6fdf into galaxyproject:dev Mar 26, 2025
53 of 56 checks passed
@kostrykin kostrykin deleted the num_unique_values branch March 26, 2025 08:11
@galaxyproject galaxyproject deleted a comment from github-actions bot Mar 26, 2025