Single threaded optimization #15

Merged
merged 8 commits into from
Jan 16, 2025

Conversation

marissafujimoto
Collaborator

@marissafujimoto marissafujimoto commented Jan 7, 2025

Description

Addresses some of #13. I ran pgmap under a profiler (vizviewer) and saw that the hamming function and the file parsing were taking the most time. We have been using seq.io for fastx parsing, and it is not optimized for our use case: it parses and then discards a lot of information we never use (quality, coordinates, etc.). I switched to parsing fastx with Python built-ins only, which gave a ~2x improvement on my machine. I also switched the sequence alignment functions to the levenshtein package, which added a small boost as well.
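For readers who want a concrete picture of the two changes described above, here is a minimal sketch; it is not the code in this PR. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and assumes that "the levenshtein package" is the PyPI package `Levenshtein`, with a pure-Python fallback if it is not installed. The names `iter_fastq_sequences` and `hamming_distance` are hypothetical.

```python
import io
from typing import Iterator, TextIO

try:
    # Assumption: the PyPI package `Levenshtein`, which ships a
    # C-implemented hamming() function.
    from Levenshtein import hamming as _fast_hamming
except ImportError:
    _fast_hamming = None


def iter_fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield only the sequence line of each 4-line FASTQ record.

    The header, '+' separator, and quality lines are skipped without
    being parsed into objects, which is where the time is saved.
    """
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of every record
            yield line.rstrip("\r\n")


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if _fast_hamming is not None:
        return _fast_hamming(a, b)
    # Fallback used only when the optional package is missing.
    return sum(x != y for x, y in zip(a, b))


# Usage with an in-memory file; a real run would pass an open file handle.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGA\n+\nIIII\n")
reads = list(iter_fastq_sequences(example))
print(reads, hamming_distance(reads[0], reads[1]))
```

Streaming line by line keeps memory use flat regardless of file size, which matters for multi-gigabyte inputs.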

To give a rough idea of performance: we are looking at < 15 minutes to run on ~10 GB fastq files on my MacBook Air M3. On the Hutch cluster it may take about 5x that, so a bit over an hour for a typical file.

Dependent on #14

Type of change

Optimization. No functional change.

How Has This Been Tested?

Test coverage unchanged.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@marissafujimoto marissafujimoto marked this pull request as ready for review January 7, 2025 00:58
@marissafujimoto marissafujimoto mentioned this pull request Jan 16, 2025
12 tasks
Collaborator

@cansavvy cansavvy left a comment

This looks great and sounds like it's a great start to making the code quicker!

@cansavvy
Collaborator

I'm going to go ahead and merge unless you have any issues, @marissafujimoto!

@cansavvy cansavvy merged commit bc8dfca into main Jan 16, 2025
5 checks passed
@cansavvy cansavvy deleted the optimization branch January 16, 2025 16:29
2 participants