ProteinStructureStore speed #15

Open
AntonOresten opened this issue Nov 18, 2024 · 0 comments

Comments

@AntonOresten
Member

ProteinStructureStore uses HDF5 for lazy IO of structures, but HDF5.jl is only a wrapper around a prebuilt HDF5 binary that does not support parallelized access. This effectively bottlenecks IO speed. A rough test showed that for a dataset with ~20 properties and ~300 KB per structure, reading 100 structures takes ~1 second.
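As a back-of-envelope reading of those figures (assuming 1 KB = 1000 bytes and treating the ~1 s as pure read time), the observed throughput works out to roughly:

```math
\frac{100 \times 300\ \text{KB}}{1\ \text{s}} \approx 30\ \text{MB/s}
\quad\Longleftrightarrow\quad
\approx 100\ \text{structures/s} \approx 10\ \text{ms per structure}
```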

For now, this format and layout are viable for repositories of protein structures with chain- and residue-wise information that is otherwise expensive to gather. They may not be optimal for direct use in workflows that require high-throughput lookups of protein data.

Programs that require fast IO may need to serialize the data into a faster intermediate format.
