Skip to content

Improve cigarstring performance #1295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

limwz01
Copy link

@limwz01 limwz01 commented Jun 25, 2024

Use direct string manipulation instead of inefficient Python joins

@limwz01
Copy link
Author

limwz01 commented Apr 22, 2025

@AndreasHeger @jmarshall @bioinformed any chance of merging this? Thanks!

@jmarshall
Copy link
Member

This PR is certainly on my shortlist to merge in due course, though I'd probably prefer if it used kstring rather than manual memory management. Have you done any timing testing?

@limwz01
Copy link
Author

limwz01 commented Apr 22, 2025

I did timing testing quite long ago, and there was substantial improvement, but I've just tested with the latest version on Python 3.12.1 using a ~100MB BAM file:

    tm = timeit.default_timer()
    with pysam.AlignmentFile(args.bam) as in_bam_f:
        for read in in_bam_f:
            read.cigarstring
    print("finished reading BAM in %f seconds"%(timeit.default_timer() - tm))
without optimisation: 5.821680s
with optimisation: 3.519799s

limwz01 added 2 commits April 22, 2025 19:38
Use direct string manipulation instead of inefficient Python joins
@limwz01
Copy link
Author

limwz01 commented Apr 22, 2025

nice, kstring is even faster even though I don't see it using a pre-allocated array:

without optimisation: 5.821680s
with optimisation: 2.439370s

There was a problem using the pattern in to_string:

            ret = force_str(buf.s[:buf.l])

            if buf.m:
                free(buf.s)

            return ret

I had to revert to using try:

            try:
                return buf.s[:buf.l].decode("utf8")
            finally:
                if buf.m:
                    free(buf.s)

If anyone can help to make the original pattern work, that would be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants