Faster chunk checking for backend datasets #9808

dcherian · 2024-11-21T18:33:28Z

Closes #8902, "opening a zarr dataset taking so much time" (shaves 30s off the runtime; dask is responsible for the rest)
User visible changes (including notable bug fixes) are documented in whats-new.rst

max-sixty

I didn't re-think through the logic but the cache idea makes sense, thanks!

dcherian · 2024-11-21T22:21:37Z

> I didn't re-think through the logic but the cache idea makes sense, thanks!

The core of it didn't change; I just avoided materializing a huge iterable in memory and instead stop at the first disagreement.
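For readers following along, here is a minimal sketch of the two ideas mentioned in this thread: caching the check, and stopping at the first disagreement rather than building the full list of mismatches. This is not the code in this PR. The function name `first_misaligned_break`, its parameters, and the cache size are hypothetical, and it assumes the backend's preferred chunking along a dimension is a single integer.

```python
from functools import lru_cache
from itertools import accumulate


# Hypothetical helper, for illustration only (not xarray's implementation).
# Arguments must be hashable so lru_cache can key on them, hence a tuple
# for chunk_sizes.
@lru_cache(maxsize=512)
def first_misaligned_break(size, chunk_sizes, preferred_chunk_size):
    """Return the first requested chunk boundary that is not a multiple of
    preferred_chunk_size, or None if every interior boundary is aligned."""
    # accumulate() is lazy: boundaries are produced one at a time.
    boundaries = accumulate(chunk_sizes)
    # next() stops consuming the generator at the first disagreement, so the
    # full list of misaligned positions is never materialized.
    return next(
        (
            pos
            for pos in boundaries
            # Skip the final boundary, which is just the end of the dimension.
            if pos < size and pos % preferred_chunk_size != 0
        ),
        None,
    )
```

Many variables in a dataset often share the same dimension size and chunking, so repeated calls with identical arguments are answered from the cache instead of being recomputed.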

Faster chunk checking for backend datasets

96b621a

dcherian mentioned this pull request Nov 21, 2024

opening a zarr dataset taking so much time #8902

Closed

limit size

2482fc4

dcherian marked this pull request as draft November 21, 2024 18:47

dcherian added the topic-chunked-arrays Managing different chunked backends, e.g. dask label Nov 21, 2024

dcherian added 2 commits November 21, 2024 11:56

fix test

5e07bbb

optimize

e2aad24

dcherian force-pushed the cache-chunking-backend-dataset branch from 8e80fb9 to e2aad24 Compare November 21, 2024 18:59

dcherian marked this pull request as ready for review November 21, 2024 19:07

dcherian requested a review from max-sixty November 21, 2024 19:34

max-sixty approved these changes Nov 21, 2024


dcherian added the plan to merge Final call for comments label Nov 21, 2024
