feat: add snpeff program to bin
shihabdider committed Oct 17, 2024
1 parent 002fc75 commit 92c7be5
Showing 116 changed files with 666,716 additions and 0 deletions.
12 changes: 12 additions & 0 deletions bin/snpEff/LICENSE.md
@@ -0,0 +1,12 @@

License: https://opensource.org/licenses/MIT

Copyright 2021, Pablo Cingolani

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Binary file added bin/snpEff/SnpSift.jar
Binary file not shown.
Binary file not shown.
Binary file added bin/snpEff/examples/1kg.head_chr1.vcf.gz
Binary file not shown.
8 changes: 8 additions & 0 deletions bin/snpEff/examples/cancer.ann.vcf
@@ -0,0 +1,8 @@
00:00:39.673 Reading cancer samples pedigree from file 'examples/samples_cancer_one.txt'.
##SnpEffVersion="4.1 (build 2015-01-07), by Pablo Cingolani"
##SnpEffCmd="SnpEff -cancer -cancerSamples examples/samples_cancer_one.txt GRCh37.75 examples/cancer.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Patient_01_Germline Patient_01_Somatic
1 69091 . A C,G . PASS AC=1;ANN=G|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>G|p.Met1?|1/918|1/918|1/305||,G-C|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>G|p.Leu1?|1/918|1/918|1/305||WARNING_REF_DOES_NOT_MATCH_GENOME,C|initiator_codon_variant|LOW|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>C|p.Met1?|1/918|1/918|1/305||;LOF=(OR4F5|ENSG00000186092|1|1.00) GT 1/0 2/1
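Reviewer note: the ANN header in this file lists sixteen pipe-separated subfields per annotation, with multiple annotations separated by commas. As a minimal sketch of how a record such as the one above could be unpacked, the following Python splits the INFO column by hand. The function name `parse_ann` is hypothetical and is not part of SnpEff or SnpSift, and the sample string is abbreviated to two of the three annotations.

```python
# Subfield names copied from the ##INFO=<ID=ANN,...> header in cancer.ann.vcf.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def parse_ann(info):
    """Return one dict per annotation found in a VCF INFO string.

    Hypothetical helper for illustration; it assumes no subfield contains
    a literal ';' or ','.
    """
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            records = entry[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, rec.split("|"))) for rec in records]
    return []


# INFO column abbreviated from the record above (first and third annotation).
info = (
    "AC=1;ANN=G|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|"
    "ENST00000335137|protein_coding|1/1|c.1A>G|p.Met1?|1/918|1/918|1/305||,"
    "C|initiator_codon_variant|LOW|OR4F5|ENSG00000186092|transcript|"
    "ENST00000335137|protein_coding|1/1|c.1A>C|p.Met1?|1/918|1/918|1/305||"
    ";LOF=(OR4F5|ENSG00000186092|1|1.00)"
)
annotations = parse_ann(info)
for ann in annotations:
    print(ann["Allele"], ann["Annotation"], ann["Annotation_Impact"])
```

For real pipelines, SnpSift extractFields (used in examples.sh in this commit) is probably the better tool; the sketch only makes the field layout concrete.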
7 changes: 7 additions & 0 deletions bin/snpEff/examples/cancer.eff.vcf
@@ -0,0 +1,7 @@
##SnpEffVersion="4.1 (build 2015-01-07), by Pablo Cingolani"
##SnpEffCmd="SnpEff -classic -cancer -cancerSamples examples/samples_cancer_one.txt testHg3775Chr1 examples/cancer.vcf "
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank | Genotype_Number [ | ERRORS | WARNINGS ] )' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Patient_01_Germline Patient_01_Somatic
1 69091 . A C,G . PASS AC=1;EFF=START_LOST(HIGH|MISSENSE|Atg/Gtg|M1V|305|OR4F5|protein_coding|CODING|ENST00000335137|1|G),START_LOST(HIGH|MISSENSE|Ctg/Gtg|L1V|305|OR4F5|protein_coding|CODING|ENST00000335137|1|G-C|WARNING_REF_DOES_NOT_MATCH_GENOME),NON_SYNONYMOUS_START(LOW|MISSENSE|Atg/Ctg|M1L|305|OR4F5|protein_coding|CODING|ENST00000335137|1|C);LOF=(OR4F5|ENSG00000186092|1|1.00) GT 1/0 2/1
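Reviewer note: the classic EFF format wraps its subfields in parentheses after the effect name, unlike ANN. The sketch below shows one way to unpack it, following the field order given in the EFF header of this file. `parse_eff` is a hypothetical helper, and storing the optional trailing ERRORS/WARNINGS entries under an "Extra" key is a choice made here for illustration only.

```python
import re

# Field order copied from the ##INFO=<ID=EFF,...> header in cancer.eff.vcf.
EFF_FIELDS = [
    "Effect_Impact", "Functional_Class", "Codon_Change", "Amino_Acid_Change",
    "Amino_Acid_length", "Gene_Name", "Transcript_BioType", "Gene_Coding",
    "Transcript_ID", "Exon_Rank", "Genotype_Number",
]
# Matches "NAME(body)"; the name may not contain '(', ')' or ','.
EFF_PATTERN = re.compile(r"([^(),]+)\(([^)]*)\)")


def parse_eff(eff_value):
    """Split the value of an EFF= entry into one dict per effect (illustrative)."""
    effects = []
    for name, body in EFF_PATTERN.findall(eff_value):
        parts = body.split("|")
        record = {"Effect": name}
        record.update(zip(EFF_FIELDS, parts))
        # Anything past the eleven fixed fields is an optional error or warning.
        record["Extra"] = parts[len(EFF_FIELDS):]
        effects.append(record)
    return effects


eff_value = (
    "START_LOST(HIGH|MISSENSE|Atg/Gtg|M1V|305|OR4F5|protein_coding|CODING|"
    "ENST00000335137|1|G),"
    "START_LOST(HIGH|MISSENSE|Ctg/Gtg|L1V|305|OR4F5|protein_coding|CODING|"
    "ENST00000335137|1|G-C|WARNING_REF_DOES_NOT_MATCH_GENOME),"
    "NON_SYNONYMOUS_START(LOW|MISSENSE|Atg/Ctg|M1L|305|OR4F5|protein_coding|"
    "CODING|ENST00000335137|1|C)"
)
effects = parse_eff(eff_value)
for eff in effects:
    print(eff["Effect"], eff["Amino_Acid_Change"], eff["Genotype_Number"], eff["Extra"])
```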
2 changes: 2 additions & 0 deletions bin/snpEff/examples/cancer.vcf
@@ -0,0 +1,2 @@
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Patient_01_Germline Patient_01_Somatic
1 69091 . A C,G . PASS AC=1 GT 1/0 2/1
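Reviewer note: in the GT column, index 0 refers to REF and indices 1, 2, ... refer to the comma-separated ALT alleles, per the VCF convention. A small sketch of decoding the two sample genotypes in the record above follows. `decode_genotype` is a hypothetical helper. Treating alleles present in the somatic sample but absent from the germline sample as "somatic-only" is an illustrative simplification, not a description of SnpEff's internal logic.

```python
def decode_genotype(ref, alt, gt):
    """Map GT allele indices (e.g. '2/1') to bases; '.' becomes None."""
    alleles = [ref] + alt.split(",")
    separator = "|" if "|" in gt else "/"
    return [None if i == "." else alleles[int(i)] for i in gt.split(separator)]


ref, alt = "A", "C,G"
germline = decode_genotype(ref, alt, "1/0")  # Patient_01_Germline
somatic = decode_genotype(ref, alt, "2/1")   # Patient_01_Somatic

# Alleles seen in the somatic sample but not in the germline sample.
somatic_only = [allele for allele in somatic if allele not in germline]
print(germline, somatic, somatic_only)
```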
10 changes: 10 additions & 0 deletions bin/snpEff/examples/cancer_pedigree.ann.vcf
@@ -0,0 +1,10 @@
##PEDIGREE=<Derived=Patient_01_Somatic,Original=Patient_01_Germline>
##SnpEffVersion="4.1 (build 2015-01-07), by Pablo Cingolani"
##SnpEffCmd="SnpEff -cancer testHg3775Chr1 examples/cancer_pedigree.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Patient_01_Germline Patient_01_Somatic
1 69091 . A C,G . PASS AF=0.1122;ANN=G|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>G|p.Met1?|1/918|1/918|1/305||,G-C|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>G|p.Leu1?|1/918|1/918|1/305||WARNING_REF_DOES_NOT_MATCH_GENOME,C|initiator_codon_variant|LOW|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>C|p.Met1?|1/918|1/918|1/305||;LOF=(OR4F5|ENSG00000186092|1|1.00) GT 1/0 2/1
1 69849 . G A,C . PASS AF=0.1122;ANN=A|stop_gained|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.759G>A|p.Trp253*|759/918|759/918|253/305||,C-A|stop_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.759G>C|p.Ter253Cysext*?|759/918|759/918|253/305||WARNING_REF_DOES_NOT_MATCH_GENOME,C|missense_variant|MODERATE|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.759G>C|p.Trp253Cys|759/918|759/918|253/305|| GT 1/0 2/1
1 69511 . A C,G . PASS AF=0.3580;ANN=C|missense_variant|MODERATE|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.421A>C|p.Thr141Pro|421/918|421/918|141/305||,G|missense_variant|MODERATE|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.421A>G|p.Thr141Ala|421/918|421/918|141/305||,G-C|missense_variant|MODERATE|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.421A>G|p.Pro141Ala|421/918|421/918|141/305||WARNING_REF_DOES_NOT_MATCH_GENOME GT 1/1 2/2
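Reviewer note: several annotations above carry a compound Allele value such as 'G-C' together with WARNING_REF_DOES_NOT_MATCH_GENOME. My reading, which should be checked against the SnpEff cancer documentation, is that 'G-C' means the effect was computed with the somatic allele G as ALT and the germline allele C as the reference, which would explain why the reported reference no longer matches the genome. Under that assumption, a hypothetical helper to separate the two parts could look like this:

```python
def split_cancer_allele(allele):
    """Split 'G-C' into (somatic_alt, germline_ref); plain alleles get None.

    Assumption: the part before '-' is the somatic allele and the part
    after it is the germline allele used as reference.
    """
    if "-" in allele:
        somatic_alt, germline_ref = allele.split("-", 1)
        return somatic_alt, germline_ref
    return allele, None


for allele in ["G", "G-C", "C"]:
    print(allele, split_cancer_allele(allele))
```

Annotations whose second element is not None are the somatic-versus-germline comparisons; the rest are ordinary comparisons against the reference genome.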
5 changes: 5 additions & 0 deletions bin/snpEff/examples/cancer_pedigree.vcf
@@ -0,0 +1,5 @@
##PEDIGREE=<Derived=Patient_01_Somatic,Original=Patient_01_Germline>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Patient_01_Germline Patient_01_Somatic
1 69091 . A C,G . PASS AF=0.1122 GT 1/0 2/1
1 69849 . G A,C . PASS AF=0.1122 GT 1/0 2/1
1 69511 . A C,G . PASS AF=0.3580 GT 1/1 2/2
7 changes: 7 additions & 0 deletions bin/snpEff/examples/example_motif.vcf
@@ -0,0 +1,7 @@
1 1060235 . G A . PASS AC=16
1 1250957 . G A . PASS AC=9
1 1310924 . T C . PASS AC=948
1 1368599 . A C . PASS AC=920
1 2182470 . G A . PASS AC=743
1 2466633 . A G . PASS AC=770
1 2480337 . G A . PASS AC=16
86 changes: 86 additions & 0 deletions bin/snpEff/examples/examples.sh
@@ -0,0 +1,86 @@
#!/bin/sh

#-------------------------------------------------------------------------------
#
# Command lines for SnpEff's manual (examples)
#
#
# Pablo Cingolani
#-------------------------------------------------------------------------------

genome="GRCh37.75"
genome="testHg3775Chr1" # Note: Sometimes we can use testHg3775Chr1 instead of GRCh37.75 ('testHg3775Chr1' only loads chr1 so it's faster)
genome22="testHg3775Chr22" # Note: Sometimes we can use testHg3775Chr22 instead of GRCh37.75 ('testHg3775Chr22' only loads chr22 so it's faster)

#---
# Multiple annotations per variant examples
#---

# java -Xmx4g -jar snpEff.jar $genome examples/variants_1.vcf > examples/variants_1.ann.vcf
#
# java -Xmx4g -jar snpEff.jar $genome examples/variants_2.vcf > examples/variants_2.ann.vcf

#---
# Cancer examples
#---

# java -Xmx4g -jar snpEff.jar -v -cancer -cancerSamples examples/samples_cancer_one.txt $genome examples/cancer.vcf > examples/cancer.ann.vcf
#
# java -Xmx4g -jar snpEff.jar -v -classic -cancer -cancerSamples examples/samples_cancer_one.txt $genome examples/cancer.vcf > examples/cancer.eff.vcf
#
# java -Xmx4g -jar snpEff.jar -v -cancer $genome examples/cancer_pedigree.vcf > examples/cancer_pedigree.ann.vcf

#---
# Regulatory variants
#---

# java -Xmx4g -jar snpEff.jar -v -reg HeLa-S3 -reg NHEK $genome examples/test.1KG.vcf > examples/test.1KG.ann_reg.vcf

#---
# Encode example
#---

# # Create a directory for ENCODE files
# mkdir -p db/encode
#
# # Download ENCODE experimental results (BigBed file)
# cd db/encode
# wget "http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/byDataType/openchrom/jan2011/fdrPeaks/wgEncodeDukeDnase8988T.fdr01peaks.hg19.bb"
# cd -
#
# # Annotate using ENCODE's data:
# java -Xmx4g -jar snpEff.jar -v -interval db/encode/wgEncodeDukeDnase8988T.fdr01peaks.hg19.bb $genome examples/test.1KG.vcf > examples/test.1KG.ann_encode.vcf

#---
# Annotation example
#---

# java -Xmx4g -jar snpEff.jar -v $genome22 examples/test.chr22.vcf > examples/test.chr22.ann.vcf

#---
# SnpSift Filter examples
#---

#java -jar SnpSift.jar filter "ANN[0].EFFECT = 'missense_variant'" examples/test.chr22.ann.vcf > examples/test.chr22.ann.filter_missense_first.vcf

#java -jar SnpSift.jar filter "ANN[*].EFFECT = 'missense_variant'" examples/test.chr22.ann.vcf > examples/test.chr22.ann.filter_missense_any.vcf

#java -jar SnpSift.jar filter "(ANN[*].EFFECT = 'missense_variant') && (ANN[*].GENE = 'TRMT2A')" examples/test.chr22.ann.vcf > examples/test.chr22.ann.filter_missense_any_TRMT2A.vcf

#java -jar SnpSift.jar filter "( GEN[HG00096].DS > 0.2 ) & ( GEN[HG00097].DS > 0.5 )" examples/1kg.head_chr1.vcf.gz > examples/1kg.head_chr1.filtered.vcf
#gzip examples/1kg.head_chr1.filtered.vcf

#---
# SnpSift extractFields examples
#---

#java -jar SnpSift.jar extractFields -s "," -e "." examples/test.chr22.ann.vcf CHROM POS REF ALT "ANN[*].EFFECT" "ANN[*].HGVS_P" > examples/test.chr22.ann.txt

#java -jar SnpSift.jar extractFields examples/test.chr22.ann.vcf CHROM POS REF ALT "ANN[*].EFFECT" > examples/test.chr22.ann.txt

#cat examples/test.chr22.ann.vcf \
# | ./scripts/vcfEffOnePerLine.pl \
# | java -jar SnpSift.jar extractFields - CHROM POS REF ALT "ANN[*].EFFECT" \
# > examples/test.chr22.ann.one_per_line.txt

# java -jar SnpSift.jar extractFields examples/1kg.head_chr1.vcf.gz CHROM POS REF ALT "GEN[HG00096].DS" "GEN[HG00097].DS" #> examples/1kg.head_chr1.txt
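Reviewer note: the SnpSift filter examples in this script differ in whether they test `ANN[0]` (only the first annotation) or `ANN[*]` (any annotation). The Python below is only a sketch of that first-versus-any distinction, not SnpSift's implementation. The helper names are hypothetical and the annotation records are shortened to four subfields for readability.

```python
def effects(ann_value):
    """Return the Annotation subfield (second column) of every ANN record."""
    return [record.split("|")[1] for record in ann_value.split(",")]


def matches_first(ann_value, effect):
    """Rough analogue of ANN[0].EFFECT = '<effect>'."""
    return effects(ann_value)[0] == effect


def matches_any(ann_value, effect):
    """Rough analogue of ANN[*].EFFECT = '<effect>'."""
    return effect in effects(ann_value)


# Shortened records: Allele | Annotation | Annotation_Impact | Gene_Name
ann_value = "A|stop_gained|HIGH|OR4F5,C|missense_variant|MODERATE|OR4F5"
print(matches_first(ann_value, "missense_variant"))
print(matches_any(ann_value, "missense_variant"))
```

A variant whose missense annotation is not listed first would therefore pass the `ANN[*]` filter but not the `ANN[0]` one.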
15,428 changes: 15,428 additions & 0 deletions bin/snpEff/examples/file.vcf

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions bin/snpEff/examples/intervals.bed
@@ -0,0 +1,5 @@
2L 10000 10999
2L 12000 12999
2L 14000 14999
2L 16000 16999
2L 18000 18999
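Reviewer note: when checking VCF positions against a BED file like this one, the coordinate systems differ. The sketch below assumes the standard BED convention (0-based start, exclusive end) and 1-based VCF positions. Whether SnpEff and SnpSift interpret these intervals in exactly this way is not stated in this commit, so treat it as an assumption. `parse_bed` and `contains` are hypothetical helpers.

```python
def parse_bed(lines):
    """Parse whitespace-separated BED lines into (chrom, start, end) tuples."""
    intervals = []
    for line in lines:
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def contains(intervals, chrom, pos):
    """True if 1-based position `pos` falls in any interval.

    Assumes BED semantics: start is 0-based, end is exclusive, so the
    covered 1-based positions are start + 1 through end.
    """
    return any(c == chrom and start < pos <= end for c, start, end in intervals)


bed_lines = [
    "2L 10000 10999",
    "2L 12000 12999",
    "2L 14000 14999",
    "2L 16000 16999",
    "2L 18000 18999",
]
intervals = parse_bed(bed_lines)
print(contains(intervals, "2L", 10001), contains(intervals, "2L", 11500))
```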
1 change: 1 addition & 0 deletions bin/snpEff/examples/my_annotations.bed
@@ -0,0 +1 @@
1 10000 20000 MY_ANNOTATION
4 changes: 4 additions & 0 deletions bin/snpEff/examples/samples_cancer.txt
@@ -0,0 +1,4 @@
Patient_01_Germline Patient_01_Somatic
Patient_02_Germline Patient_02_Somatic
Patient_03_Germline Patient_03_Somatic
Patient_04_Germline Patient_04_Somatic
1 change: 1 addition & 0 deletions bin/snpEff/examples/samples_cancer_one.txt
@@ -0,0 +1 @@
Patient_01_Germline Patient_01_Somatic
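Reviewer note: this commit pairs germline and somatic samples in two ways: a two-column samples file passed via -cancerSamples, and a ##PEDIGREE header line inside the VCF (see cancer_pedigree.vcf). As a sketch of how the two relate, the following turns samples-file rows into header lines of the form used in cancer_pedigree.vcf. It assumes the first column is the germline (Original) sample and the second the somatic (Derived) one, which is inferred from the sample names rather than stated in the source. `pedigree_headers` is a hypothetical helper.

```python
def pedigree_headers(lines):
    """Convert 'germline somatic' rows into ##PEDIGREE header lines."""
    headers = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        germline, somatic = fields
        headers.append(f"##PEDIGREE=<Derived={somatic},Original={germline}>")
    return headers


sample_lines = [
    "Patient_01_Germline Patient_01_Somatic",
    "Patient_02_Germline Patient_02_Somatic",
    "",
]
headers = pedigree_headers(sample_lines)
for header in headers:
    print(header)
```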