Add basic support for '--targets/-t' #18
Merged
115cbea Add basic support for '--targets/-t' (tomwhite)
3d85446 Cover cases of missing start/end for targets string (tomwhite)
94c84dd Add a unit test for targets filter (tomwhite)
9d8f88e Fix off-by-one error for VCF regions (tomwhite)
516e960 Add support for targets in numba too (tomwhite)
2c6c696 Implement using pyranges (tomwhite)
@@ -0,0 +1,18 @@
from typing import Optional
import pytest
from vcztools.regions import parse_region


@pytest.mark.parametrize(
    "targets, expected",
    [
        ("chr1", ("chr1", None, None)),
        ("chr1:12", ("chr1", 12, 12)),
        ("chr1:12-", ("chr1", 12, None)),
        ("chr1:12-103", ("chr1", 12, 103)),
    ],
)
def test_parse_region(
    targets: str, expected: tuple[str, Optional[int], Optional[int]]
):
    assert parse_region(targets) == expected
@@ -0,0 +1,60 @@
import re
from typing import Any, List, Optional

import numpy as np
import pandas as pd
import pyranges


def parse_region(region: str) -> tuple[str, Optional[int], Optional[int]]:
    """Return the contig, start position and end position from a region string."""
    if re.search(r":\d+-\d*$", region):
        contig, start_end = region.rsplit(":", 1)
        start, end = start_end.split("-")
        return contig, int(start), int(end) if len(end) > 0 else None
    elif re.search(r":\d+$", region):
        contig, start = region.rsplit(":", 1)
        return contig, int(start), int(start)
    else:
        contig = region
        return contig, None, None


def parse_targets(targets: str) -> list[tuple[str, Optional[int], Optional[int]]]:
    return [parse_region(region) for region in targets.split(",")]


def regions_to_selection(
    all_contigs: List[str],
    variant_contig: Any,
    variant_position: Any,
    regions: list[tuple[str, Optional[int], Optional[int]]],
):
    # subtract 1 from start coordinate to convert intervals
    # from VCF (1-based, fully-closed) to Python (0-based, half-open)

    df = pd.DataFrame({"Chromosome": variant_contig, "Start": variant_position - 1, "End": variant_position})
    # save original index as column so we can retrieve it after finding overlap
    df["index"] = df.index
    variants = pyranges.PyRanges(df)

    chromosomes = []
    starts = []
    ends = []
    for contig, start, end in regions:
        if start is None:
            start = 0
        else:
            start -= 1

        if end is None:
            end = np.iinfo(np.int64).max

        chromosomes.append(all_contigs.index(contig))
        starts.append(start)
        ends.append(end)

    query = pyranges.PyRanges(chromosomes=chromosomes, starts=starts, ends=ends)

    overlap = variants.overlap(query)
    return overlap.df["index"].to_numpy()
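To make the coordinate handling in `regions_to_selection` easier to review, here is a minimal pure-Python sketch of the same idea: convert each VCF region (1-based, fully closed) to a 0-based half-open interval, then keep the variants whose single-base interval overlaps any region. This is not the PR's implementation, which delegates the overlap search to pyranges. The names `to_half_open` and `select_variants` are hypothetical, and the sketch assumes `variant_contig` holds integer indexes into `all_contigs`, as the diff implies.

```python
from typing import Optional

# Same value as np.iinfo(np.int64).max, used for an open-ended region.
INT64_MAX = 2**63 - 1

RegionTuple = tuple[str, Optional[int], Optional[int]]


def to_half_open(start: Optional[int], end: Optional[int]) -> tuple[int, int]:
    """Convert a VCF interval (1-based, closed) to 0-based, half-open."""
    py_start = 0 if start is None else start - 1
    py_end = INT64_MAX if end is None else end
    return py_start, py_end


def select_variants(
    all_contigs: list[str],
    variant_contig: list[int],
    variant_position: list[int],
    regions: list[RegionTuple],
) -> list[int]:
    """Return the indexes of variants that overlap at least one region."""
    selected = []
    for i, (contig_index, position) in enumerate(
        zip(variant_contig, variant_position)
    ):
        # A variant at 1-based position p covers the half-open interval [p - 1, p).
        v_start, v_end = position - 1, position
        for contig, start, end in regions:
            if all_contigs.index(contig) != contig_index:
                continue
            r_start, r_end = to_half_open(start, end)
            # Two half-open intervals overlap when each starts before the other ends.
            if v_start < r_end and r_start < v_end:
                selected.append(i)
                break
    return selected


contigs = ["chr1", "chr2"]
variant_contig = [0, 0, 0, 1]
variant_position = [11, 12, 103, 12]
print(select_variants(contigs, variant_contig, variant_position, [("chr1", 12, 103)]))
# prints [1, 2]
```

With the region `chr1:12-103`, the variant at position 11 is excluded and the variants at positions 12 and 103 are both included, which is the boundary behaviour the off-by-one fix is concerned with. The nested loop is quadratic and only meant for illustration.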
Ideally this would return a `Region` object, like the one in bio2zarr, but I'm not sure which way the dependency is between the two projects (if indeed there is one). So I left it as a tuple for now.
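For discussion, a local `Region` type could look roughly like the sketch below. It is a hypothetical stand-in written for this thread, not bio2zarr's class, and its field names and methods are assumptions. The parsing logic mirrors `parse_region` from this PR, and `as_tuple` keeps it compatible with the tuple-based callers.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical region type; not the bio2zarr implementation."""

    contig: str
    start: Optional[int] = None  # 1-based, inclusive, as in VCF
    end: Optional[int] = None  # 1-based, inclusive, as in VCF

    @classmethod
    def parse(cls, region: str) -> "Region":
        """Parse 'contig', 'contig:start', 'contig:start-' or 'contig:start-end'."""
        if re.search(r":\d+-\d*$", region):
            contig, start_end = region.rsplit(":", 1)
            start, end = start_end.split("-")
            return cls(contig, int(start), int(end) if end else None)
        if re.search(r":\d+$", region):
            contig, start = region.rsplit(":", 1)
            return cls(contig, int(start), int(start))
        return cls(region)

    def as_tuple(self) -> tuple[str, Optional[int], Optional[int]]:
        """Return the (contig, start, end) tuple used elsewhere in this PR."""
        return self.contig, self.start, self.end


regions = [Region.parse(r) for r in "chr1:12-103,chr2".split(",")]
print([r.as_tuple() for r in regions])
# prints [('chr1', 12, 103), ('chr2', None, None)]
```

Defining the type locally would avoid settling the dependency question between the two projects, at the cost of some duplication.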