
perf: reduce memory footprint of cyrcular graph breakends command #7

Open

dlaehnemann wants to merge 1 commit into base: main
Conversation

dlaehnemann
Collaborator

  • reduce the maximum usable read coverage to 32767 by switching `Count` from `i32` to `i16`; this just about halves memory usage (rough sketch below)
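
In effect (a minimal sketch; `Count` is the name from the commit message, while the saturating increment and everything else here are illustrative assumptions, not the actual implementation):

```rust
// Minimal sketch of the idea behind this commit; only the `Count` alias
// is taken from the commit message, the rest is illustrative.
type Count = i16; // previously i32; i16::MAX = 32767 now caps usable coverage

/// Saturate instead of overflowing once coverage exceeds 32767.
fn increment(count: &mut Count) {
    *count = count.saturating_add(1);
}

fn main() {
    let mut c: Count = Count::MAX - 1;
    increment(&mut c);
    increment(&mut c); // stays at 32767 rather than wrapping around
    assert_eq!(c, Count::MAX);
    // Each counter now needs 2 bytes instead of 4, roughly halving the
    // footprint of a per-position coverage array.
    println!("count = {c}, {} bytes per counter", std::mem::size_of::<Count>());
}
```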

@dlaehnemann
Collaborator Author

Also, in a second step, could we simply use mosdepth for the depth calculation and have it output a D4 file? From what I understand, writing these in mosdepth is really fast, and they can be read with random access; there is even [a whole Rust crate built for them](https://docs.rs/d4/latest/d4/).
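
If I remember correctly, mosdepth exposes this via a `--d4` output flag, i.e. something like `mosdepth --d4 <prefix> <sample.bam>`, although the exact invocation would need double-checking.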

@tedil
Owner

tedil commented Nov 4, 2024

> Also, in a second step, could we simply use mosdepth for the depth calculation and have it output a D4 file? From what I understand, writing these in mosdepth is really fast, and they can be read with random access; there is even [a whole Rust crate built for them](https://docs.rs/d4/latest/d4/).

Yes, I don't see why not (apart from requiring additional external input files)

@dlaehnemann
Collaborator Author

Yeah, but it would also be more in the spirit of specialized tools that do one thing and do it well... But I have to admit that the D4 crate does not come with any examples in its docs, so one would probably have to scour the d4tools implementation to get the random-access file reading set up.
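
From a quick look, the random-access reading could end up roughly like the sketch below. All of the names used here (`D4TrackReader::open`, `split`, `make_decoder`, `DecodeResult`, the secondary-table fallback) are my best reading of the d4tools sources, so this would need verifying against whatever d4 crate version we'd pin:

```rust
// Unverified sketch: stream per-base depths from a D4 file with the d4 crate.
// All API names here are assumptions based on the d4tools sources.
use d4::ptab::{DecodeResult, PTablePartitionReader};
use d4::stab::STablePartitionReader;
use d4::D4TrackReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "coverage.d4" is a placeholder, e.g. a mosdepth output file.
    let mut reader = D4TrackReader::open("coverage.d4")?;
    // Split the track into per-region partitions (primary + secondary table).
    for (mut primary, mut secondary) in reader.split(None)? {
        let (chrom, begin, end) = primary.region();
        let chrom = chrom.to_string();
        let mut decoder = primary.make_decoder();
        for pos in begin..end {
            // Depths that don't fit the primary table's bit width fall back
            // to the secondary table.
            let depth = match decoder.decode(pos as usize) {
                DecodeResult::Definitely(value) => value,
                DecodeResult::Maybe(fallback) => {
                    secondary.decode(pos).unwrap_or(fallback)
                }
            };
            let _ = (&chrom, depth); // e.g. feed into the breakend graph here
        }
    }
    Ok(())
}
```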

@dlaehnemann
Collaborator Author

Also, with this PR, all of my current data just barely fit into the respective HPC's memory, but it worked. So I am assuming there's probably another code path that uses a lot of memory somewhere. I'll have to run heaptrack with a large sample on the HPC at some point.
