vcztools view performance #93

Merged 2 commits on Oct 22, 2024

Will-Tyler
Contributor

Overview

This pull request adds the view command to the set of performance testing commands.

This pull request also closes #92.

Testing

I ran the new performance command manually.

I verified the #92 fix by running vcztools view performance/data/chr22.vcz -o /dev/null.

Results

cd performance
python -m compare 1
bcftools view data/chr22.vcf.gz
1.67GiB 0:00:10 [ 156MiB/s]

real    0m10.931s
user    0m10.356s
sys     0m0.522s

vcztools view data/chr22.vcz
1.67GiB 0:00:40 [42.4MiB/s]

real    0m40.290s
user    0m31.387s
sys     0m9.024s
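
To put the two runs side by side, here is a minimal sketch that computes the slowdown from the numbers reported above. The dictionary layout and the function names are illustrative and not part of the performance package. On these figures, vcztools view comes out roughly 3.7 times slower than bcftools view by both wall-clock time and throughput.

```python
# Wall-clock ("real") times and throughput copied from the output above.
results = {
    "bcftools": {"real_s": 10.931, "mib_per_s": 156.0},
    "vcztools": {"real_s": 40.290, "mib_per_s": 42.4},
}


def slowdown(results, baseline="bcftools", candidate="vcztools"):
    """Ratio of candidate to baseline wall-clock time (higher is slower)."""
    return results[candidate]["real_s"] / results[baseline]["real_s"]


def throughput_ratio(results, baseline="bcftools", candidate="vcztools"):
    """Ratio of baseline to candidate throughput (higher is slower)."""
    return results[baseline]["mib_per_s"] / results[candidate]["mib_per_s"]


if __name__ == "__main__":
    print(f"wall-clock slowdown: {slowdown(results):.2f}x")
    print(f"throughput ratio:    {throughput_ratio(results):.2f}x")
```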

Profiling

cProfile.run('write_vcf("performance/data/chr22.vcz", os.devnull)', sort=SortKey.TIME)
         182103 function calls in 39.822 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    21541   20.241    0.001   20.241    0.001 {method 'encode' of '_vcztools.VcfEncoder' objects}
      164    7.864    0.048    7.870    0.048 core.py:2343(_decode_chunk)
       26    6.919    0.266    6.919    0.266 {method 'astype' of 'numpy.ndarray' objects}
      220    2.659    0.012   10.533    0.048 core.py:2013(_process_chunk)
        1    0.810    0.810   39.822   39.822 vcf_writer.py:80(write_vcf)
      140    0.498    0.004   11.178    0.080 core.py:2108(_chunk_getitems)
      140    0.310    0.002   11.490    0.082 core.py:1316(_get_selection)
        3    0.162    0.054   38.963   12.988 vcf_writer.py:284(c_chunk_to_vcf)
      499    0.159    0.000    0.159    0.000 {method 'read' of '_io.BufferedReader' objects}
    25005    0.115    0.000    0.115    0.000 {built-in method builtins.print}
      500    0.011    0.000    0.011    0.000 {built-in method io.open}
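
The cProfile.run line above needs a few imports before it will run on its own. The following is a hedged sketch of one way to reproduce a profile like this; profile_call and busy_work are hypothetical helpers written for illustration, and the vcztools import path shown in the comment is an assumption inferred from the vcf_writer.py frames in the output.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return a report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # SortKey.TIME orders rows by tottime, as in the table above.
    stats.sort_stats(SortKey.TIME).print_stats(top)
    return buffer.getvalue()


def busy_work(n):
    # Stand-in workload so the sketch runs without vcztools installed.
    return sum(i * i for i in range(n))


# For the run in this pull request, the call would look roughly like this
# (the import path is an assumption, not confirmed by the source):
#   import os
#   from vcztools.vcf_writer import write_vcf
#   print(profile_call(write_vcf, "performance/data/chr22.vcz", os.devnull))

if __name__ == "__main__":
    print(profile_call(busy_work, 100_000))
```

In the table above, the VcfEncoder encode method accounts for 20.241 of the 39.822 seconds, about 51% of the run. Together with _decode_chunk and astype, the top three entries cover about 88% of the total.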

@Will-Tyler
Contributor Author

I'll take a look at the implementation to see if I can identify any optimizations we can make. I'll use GitHub issues to discuss any proposals.

@jeromekelleher
Contributor

Very nice.

@jeromekelleher jeromekelleher merged commit d152af1 into sgkit-dev:main Oct 22, 2024
11 checks passed
Development

Successfully merging this pull request may close these issues.

vcztools view: allow writing to /dev/null