Overview
This pull request partially implements the query --format functionality from bcftools. This pull request closes #50.
Approach
The approach consists of two components: a parser and a generator. The parser processes the query format string and produces a format specifiers list. The generator is a function that takes the root VCF Zarr group and generates the result of the query one line at a time. The generator's initializer composes the generator according to the structure of the format specifiers list.
Parser
I implement the parser using PyParsing. We used PyParsing to implement a parser in #49 as well.
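To make the parser's output concrete, here is a minimal sketch of turning a format string into a format specifiers list. It deliberately uses the standard library's re module instead of PyParsing so that it runs without extra dependencies, and it is not the grammar from this pull request. The token pattern, the (kind, value) tuple layout, and the name parse_format are all assumptions made for illustration.

```python
import re

# Hypothetical token pattern: a %TAG specifier, or a run of literal text.
# This is an illustrative stand-in, not the PyParsing grammar in this PR.
_TOKEN = re.compile(r"%(?P<tag>[A-Za-z_][A-Za-z0-9_]*)|(?P<literal>[^%]+)")


def parse_format(fmt):
    """Split a query format string into a list of (kind, value) pairs."""
    specifiers = []
    pos = 0
    while pos < len(fmt):
        match = _TOKEN.match(fmt, pos)
        if match is None:
            raise ValueError(f"cannot parse format string at offset {pos}")
        if match.group("tag") is not None:
            specifiers.append(("tag", match.group("tag")))
        else:
            # Expand the two-character escapes \t and \n as typed on a command line.
            text = match.group("literal").replace("\\t", "\t").replace("\\n", "\n")
            specifiers.append(("literal", text))
        pos = match.end()
    return specifiers


if __name__ == "__main__":
    print(parse_format(r"%CHROM\t%POS\n"))
```

A flat list like this is enough for the generator to walk in order; a format language with nested constructs, such as sample loops, would need a nested structure instead.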
Generator
The generator uses Python generators to yield query results one variant position at a time. This approach allows Python to iterate over each Zarr array's chunks independently. The high-level generator zips generators for each of the format specifiers and joins the results to produce a line for each variant position.
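The zipping idea described above can be sketched as follows. This is a simplified illustration rather than the code in this pull request: a plain dict of lists stands in for the root VCF Zarr group, and the names literal_generator, field_generator, and query_lines, along with the (kind, value) specifier layout, are hypothetical.

```python
def literal_generator(text, num_variants):
    """Yield the same literal text once per variant position."""
    for _ in range(num_variants):
        yield text


def field_generator(values):
    """Yield one formatted value per variant position."""
    for value in values:
        yield str(value)


def query_lines(specifiers, fields):
    """Compose one generator per specifier, then zip them into lines.

    `fields` is a hypothetical stand-in for the Zarr group: a dict that
    maps a tag name to a sequence with one entry per variant position.
    """
    num_variants = len(next(iter(fields.values())))
    generators = []
    for kind, value in specifiers:
        if kind == "literal":
            generators.append(literal_generator(value, num_variants))
        else:
            generators.append(field_generator(fields[value]))
    # Each zip step pulls one item from every generator, i.e. one variant.
    for parts in zip(*generators):
        yield "".join(parts)


if __name__ == "__main__":
    specifiers = [("tag", "CHROM"), ("literal", "\t"), ("tag", "POS"), ("literal", "\n")]
    fields = {"CHROM": ["1", "1", "2"], "POS": [100, 200, 50]}
    for line in query_lines(specifiers, fields):
        print(line, end="")
```

Because each per-specifier generator advances on its own, a real implementation could have each one read its array chunk by chunk without the others needing to know about its chunk layout.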
Query format language
This implementation does not support the full query format language that bcftools supports.
Here is what this implementation should support:
This implementation does not support looping over samples at a variant site. Additionally, some format specifiers supported by bcftools are recognized by this implementation's parser but lead to an error in the generator (e.g. %END0).
Testing
I added unit tests and validation tests along with my changes, and I ran the test suite to check that the changes have good coverage.
Example usage
References