async chunk writing #125

kongdd · 2023-09-28T01:54:35Z

kongdd
Sep 28, 2023

Just wondering whether Zarr.jl supports parallel writing?
If yes, could you give one small example?

meggart · 2023-10-02T13:07:02Z

meggart
Oct 2, 2023
Collaborator

I am not really sure what the context is. It is completely fine if multiple processes write simultaneously into the same Zarr dataset, but it is up to the user/library to make sure that different processes/threads never write into the same chunk. So the following is fine:

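```julia
using Zarr

# 10x10 array on disk; each chunk holds exactly one column
a = zzeros(Float64,10,10,chunks=(10,1),path=tempname())

# One task per column, so no two tasks ever write to the same chunk
@sync for i=1:10
  @async a[:,i] = rand(10)
end

# Read the whole array back
a[:,:]
```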
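The rule that no two writers may touch the same chunk can be checked before any tasks are launched. Below is a minimal sketch in plain Julia. The helper names `chunk_span`, `touched_chunks`, and `writes_are_disjoint` are hypothetical and not part of Zarr.jl, and the sketch assumes rectangular selections given as tuples of unit ranges.

```julia
# Hypothetical helpers, not part of Zarr.jl.

# Chunk indices (1-based) along one dimension that the range `r` touches,
# given a chunk length `c` along that dimension.
chunk_span(r::AbstractUnitRange, c::Integer) =
    ((first(r) - 1) ÷ c + 1):((last(r) - 1) ÷ c + 1)

# All N-dimensional chunk indices touched by a rectangular selection.
touched_chunks(ranges::Tuple, chunks::Tuple) =
    Set(Iterators.product(map(chunk_span, ranges, chunks)...))

# True if no chunk is touched by more than one selection.
function writes_are_disjoint(selections, chunks)
    seen = Set{Any}()
    for sel in selections
        t = touched_chunks(sel, chunks)
        isdisjoint(seen, t) || return false
        union!(seen, t)
    end
    return true
end

chunks = (10, 1)                        # same chunking as the example above
cols = [(1:10, i:i) for i in 1:10]      # one writer per column
rows = [(i:i, 1:10) for i in 1:10]      # one writer per row

writes_are_disjoint(cols, chunks)       # true: each column is its own chunk
writes_are_disjoint(rows, chunks)       # false: every row crosses all 10 chunks
```

With chunks of shape `(10, 1)`, column-wise writers are safe to run concurrently, while row-wise writers would all modify every chunk and must not run at the same time.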

However, when using multiple threads there may be problems coming from the different compression/decompression libraries, which are not thread-safe, so be careful when using compression.
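One way to sidestep the compression libraries when experimenting with threads is to create the array without a compressor. The following is a sketch under assumptions, not a guarantee of thread safety. It assumes that `Zarr.NoCompressor()` is accepted via the `compressor` keyword, and that the default directory backend tolerates concurrent writes to distinct chunk files.

```julia
using Zarr

# Assumption: with no compressor, the non-thread-safe C codecs are never called.
b = zzeros(Float64, 10, 10, chunks=(10, 1), path=tempname(),
           compressor=Zarr.NoCompressor())

# Each iteration writes exactly one whole chunk (one column),
# so no two threads touch the same chunk.
Threads.@threads for i in 1:10
    b[:, i] = fill(Float64(i), 10)
end

# Check that every column holds the value written by its iteration.
ok = all(b[:, i] == fill(Float64(i), 10) for i in 1:10)
```

Start Julia with several threads (for example `julia -t 4`) for the loop to actually run in parallel. With a single thread it simply runs sequentially.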

Otherwise, when writing across several chunks in a single call to setindex!, whether chunks are written sequentially or simultaneously depends on the backend. There is a storage backend trait which is set to SequentialRead by default, but several backends like HTTP or S3 do concurrent reads and writes by default, so if you do

```julia
a[:,:] = rand(10,10)
```

whether the writes are concurrent depends on the backend, but compression always happens sequentially because of the thread-safety problems with the underlying C libraries mentioned above.
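To see which strategy a given store uses, the trait can be queried directly. This is a sketch that assumes the trait function is `Zarr.store_read_strategy` and that the array exposes its store as the `storage` field. Both are internal, unexported details that may change between Zarr.jl versions.

```julia
using Zarr

# A local, directory-backed array
a = zzeros(Float64, 10, 10, chunks=(10, 1), path=tempname())

# Assumption: internal trait function; expected to return the sequential
# default for a local directory store.
strategy = Zarr.store_read_strategy(a.storage)
println(typeof(strategy))
```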

bjarthur · 2024-02-08T23:44:29Z

bjarthur
Feb 8, 2024
Collaborator

> However, when using multiple threads there may be problems coming from the different compression/decompression libraries, which are not thread-safe, so be careful when using compression.

Perhaps it's time to add Blosc2 support to Zarr? I believe that is thread-safe, but I could be wrong.
