support filtering a VariantCollection according to an intervals list file #59

Open
timodonnell opened this issue Apr 15, 2015 · 2 comments

Comments

@timodonnell
Contributor

For some of our analyses it would be helpful to be able to filter a VariantCollection to only those variants that fall within the intended capture targets.

Example intervals list:

[odonnt02@minerva4 ~]$ head /sc/orga/projects/ngs/resources/captures/2.3/Human_All_Exon_V5.hg19.interval_list
@HD     VN:1.4  SO:unsorted
@SQ     SN:chrM LN:16571        UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ     SN:chr1 LN:249250621    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ     SN:chr2 LN:243199373    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:a0d9851da00400dec1098a9255ac712e
@SQ     SN:chr3 LN:198022430    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ     SN:chr4 LN:191154276    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:23dccd106897542ad87d2765d28a19a1
@SQ     SN:chr5 LN:180915260    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:0740173db9ffd264d728f32784845cd7
@SQ     SN:chr6 LN:171115067    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:1d3a93a248d92a729ee764823acbbc6b
@SQ     SN:chr7 LN:159138663    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:618366e953d6aaad97dbe4777c29375e
@SQ     SN:chr8 LN:146364022    UR:file:/gs01/projects/ngs/resources/gatk/2.3/ucsc.hg19.parmasked.fasta M5:96f514a9929e410c6651697bded59aec

GATK also supports a few other formats (probably not needed here though): https://www.broadinstitute.org/gatk/guide/article?id=1319
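
For reference, a minimal sketch of reading a file like this in Python. It assumes the Picard-style layout: header lines start with `@`, and each record is tab-separated as contig, start, end, strand, name, with 1-based inclusive coordinates. The function names (`parse_interval_list`, `merge_intervals`, `contains`) and the example records are made up for illustration and are not part of any existing library.

```python
import bisect
from collections import defaultdict


def merge_intervals(sorted_intervals):
    """Merge overlapping or adjacent (start, end) pairs; input must be sorted."""
    merged = []
    for start, end in sorted_intervals:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_interval_list(lines):
    """Return {contig: [(start, end), ...]} with sorted, merged intervals.

    Assumes Picard-style records: contig, start, end, strand, name,
    tab-separated, 1-based inclusive. Header lines ('@...') are skipped.
    """
    by_contig = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        by_contig[fields[0]].append((int(fields[1]), int(fields[2])))
    return {c: merge_intervals(sorted(ivs)) for c, ivs in by_contig.items()}


def contains(intervals, contig, position):
    """True if the 1-based position falls inside any interval on the contig."""
    ivs = intervals.get(contig, [])
    # Index of the last interval whose start is <= position.
    i = bisect.bisect_right(ivs, (position, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= position <= ivs[i][1]


# Made-up records for illustration; a real file would be read with open(path).
example_lines = [
    "@HD\tVN:1.4\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:249250621",
    "chr1\t100\t200\t+\ttarget_1",
    "chr1\t150\t300\t+\ttarget_2",
    "chr2\t50\t60\t+\ttarget_3",
]
targets = parse_interval_list(example_lines)
```

Merging first means a single binary search per lookup is enough. Contig names in the file (`chr1`) may need normalizing to match whatever naming the variants use.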

@iskandr
Contributor

iskandr commented Apr 15, 2015

I'm also thinking about the best way to filter variants by expression level (of either genes, transcripts, or allele-specific read count). Do you think these two use cases have enough in common to suggest a filtering API?

@timodonnell
Contributor Author

Maybe just a VariantCollection.filter function that takes a variant -> bool callable and returns a new VariantCollection?

Then we could add a new module with filter implementations, including your example and mine.
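
A rough sketch of what that could look like. The `Variant` and `VariantCollection` classes here are simplified stand-ins written for this example, not the library's actual classes, and `within_intervals` and `expressed_above` are hypothetical names for the two filters discussed above.

```python
from collections import namedtuple

# Simplified stand-ins for illustration only; not the real classes.
Variant = namedtuple("Variant", ["contig", "start", "ref", "alt"])


class VariantCollection:
    def __init__(self, variants):
        self.variants = list(variants)

    def __len__(self):
        return len(self.variants)

    def __iter__(self):
        return iter(self.variants)

    def filter(self, predicate):
        """Return a new collection of variants for which predicate(variant) is true."""
        return VariantCollection(v for v in self.variants if predicate(v))


def within_intervals(intervals):
    """Predicate factory: intervals is {contig: [(start, end), ...]}, 1-based inclusive."""
    def predicate(variant):
        return any(start <= variant.start <= end
                   for start, end in intervals.get(variant.contig, []))
    return predicate


def expressed_above(expression_of, threshold):
    """Predicate factory: expression_of is a hypothetical variant -> float lookup."""
    def predicate(variant):
        return expression_of(variant) >= threshold
    return predicate


# Made-up data to show that the two filters compose through the same API.
variants = VariantCollection([
    Variant("1", 150, "A", "T"),
    Variant("1", 500, "G", "C"),
    Variant("2", 55, "C", "G"),
])
targets = {"1": [(100, 300)], "2": [(50, 60)]}
expression = {("1", 150): 12.0, ("2", 55): 0.5}

on_target = variants.filter(within_intervals(targets))
expressed = on_target.filter(
    expressed_above(lambda v: expression.get((v.contig, v.start), 0.0), 1.0))
```

Because `filter` returns a new collection, filters chain without mutating the original, and each use case only has to supply a predicate.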
