ProteinStructureStore uses HDF5 for lazy IO of structures, but HDF5.jl is a wrapper around a pre-built HDF5 binary that doesn't support parallelization. This effectively bottlenecks IO speed. In a rough test on a dataset with ~20 properties and ~300 KB per structure, reading 100 structures took ~1 second.
At the moment, this format and structure are viable for repositories of protein structures with chain- and residue-wise information that is otherwise expensive to gather. They might not be optimal for direct use in workflows that require high-throughput protein data look-ups.
Programs that require fast IO might have to serialize into some faster intermediate format.
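
As a rough illustration of that workaround, here is a minimal sketch that caches already-loaded structures as one file each, using Julia's `Serialization` standard library, so that later reads can be spread across threads. It does not use the ProteinStructureStore API. `fake_structure`, `build_cache` and `load_cached` are hypothetical names made up for this example, and the `Dict` layout is only an assumed stand-in for a real structure. Whether this is actually faster than reading from HDF5 would need to be measured.

```julia
using Serialization

# Hypothetical stand-in for one loaded structure: property name => array.
# In practice these would be read once from the HDF5-backed store.
function fake_structure(n_residues::Int)
    return Dict{String,Any}(
        "coords" => rand(Float32, 3, n_residues),
        "residue_index" => collect(1:n_residues),
    )
end

# Write each structure to its own serialized file inside `cache_dir`.
function build_cache(structures::Dict{String,<:Any}, cache_dir::AbstractString)
    mkpath(cache_dir)
    for (id, s) in structures
        serialize(joinpath(cache_dir, id * ".jls"), s)
    end
    return cache_dir
end

# Read cached structures back. The files are independent of each other,
# so the loop can run on several threads (start Julia with `-t N`).
function load_cached(ids::Vector{String}, cache_dir::AbstractString)
    out = Vector{Any}(undef, length(ids))
    Threads.@threads for i in eachindex(ids)
        out[i] = deserialize(joinpath(cache_dir, ids[i] * ".jls"))
    end
    return out
end

structures = Dict("prot$i" => fake_structure(100) for i in 1:5)
cache_dir = build_cache(structures, mktempdir())
ids = sort(collect(keys(structures)))
loaded = load_cached(ids, cache_dir)
```

Note that the `Serialization` format is not guaranteed to be readable across Julia versions, so a cache like this is best treated as disposable and rebuilt from the HDF5 store when needed.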