Citation and References
When using grenepipe, please cite:
Lucas Czech, Moises Exposito-Alonso.
grenepipe: A flexible, scalable, and reproducible pipeline to automate variant and frequency calling from sequence reads.
arXiv. 2021.
arXiv:2103.15167
Please also cite every tool that you selected to run in your analysis. Their references are listed below.
AdapterRemoval
Lindgreen S.
AdapterRemoval: Easy cleaning of next-generation sequencing reads.
BMC Res Notes. 2012.
doi:10.1186/1756-0500-5-337
Schubert M, Lindgreen S, Orlando L.
AdapterRemoval v2: Rapid adapter trimming, identification, and read merging.
BMC Res Notes. 2016.
doi:10.1186/s13104-016-1900-2
Cutadapt
Martin M.
Cutadapt removes adapter sequences from high-throughput sequencing reads.
EMBnet journal. 2011.
doi:10.14806/ej.17.1.200
fastp
Chen S, Zhou Y, Chen Y, Gu J.
fastp: an ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics. 2018.
doi:10.1093/bioinformatics/bty560
SeqPrep
St. John J.
SeqPrep: Tool for stripping adaptors and/or merging paired reads with overlap into single reads.
https://github.com/jstjohn/SeqPrep
skewer
Jiang H, Lei R, Ding S-W, Zhu S.
Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads.
BMC Bioinformatics. 2014.
doi:10.1186/1471-2105-15-182
trimmomatic
Bolger AM, Lohse M, Usadel B.
Trimmomatic: A flexible trimmer for Illumina sequence data.
Bioinformatics. 2014.
doi:10.1093/bioinformatics/btu170
Bowtie 2
Langmead B, Salzberg SL.
Fast gapped-read alignment with Bowtie 2.
Nat Methods. 2012.
doi:10.1038/nmeth.1923
bwa mem and bwa aln
Li H, Durbin R.
Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp324
bwa mem2
Vasimuddin M, Misra S, Li H, Aluru S.
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems.
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2019.
doi:10.1109/IPDPS.2019.00041
BamUtil clipOverlap
Jun G, Wing MK, Abecasis GR, Kang HM.
An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data.
Genome Research, 25(6), gr.176552.114. 2015.
doi:10.1101/GR.176552.114
Picard MarkDuplicates
Broad Institute.
Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/
DeDup
Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J, et al.
EAGER: efficient ancient genome reconstruction.
Genome Biol. 2016.
doi:10.1186/s13059-016-0918-z
GATK BaseRecalibrator
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
Genome Res. 2010.
doi:10.1101/GR.107524.110
samtools merge and samtools mpileup
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352
mapDamage
Ginolhac A, Rasmussen M, Gilbert MTP, Willerslev E, Orlando L.
mapDamage: testing for damage patterns in ancient DNA sequences.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr347
Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L.
mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.
Bioinformatics. 2013.
doi:10.1093/bioinformatics/btt193
DamageProfiler
Neukamm J, Peltzer A, Nieselt K.
DamageProfiler: Fast damage pattern calculation for ancient DNA.
bioRxiv. 2020.
doi:10.1101/2020.10.01.322206
bcftools call
Li H.
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509
freebayes
Garrison E, Marth G.
Haplotype-based variant detection from short-read sequencing.
arXiv. 2012.
arXiv:1207.3907
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, et al.
BEDOPS: high-performance genomic feature operations.
Bioinformatics. 2012.
doi:10.1093/bioinformatics/bts277
GATK HaplotypeCaller, GATK SelectVariants, GATK VariantFiltration, GATK VariantRecalibrator
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
Genome Res. 2010.
doi:10.1101/GR.107524.110
FastQC
Andrews S.
FastQC: a quality control tool for high throughput sequence data.
Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
Online: https://www.bioinformatics.babraham.ac.uk/projects/fastqc
samtools stats and samtools flagstat
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352
QualiMap
Okonechnikov K, Conesa A, García-Alcalde F.
Qualimap 2: Advanced multi-sample quality control for high-throughput sequencing data.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btv566
Picard CollectMultipleMetrics
Broad Institute.
Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/
snpEff
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al.
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.
Fly. 2012.
doi:10.4161/fly.19695
VEP
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F.
The Ensembl Variant Effect Predictor.
Genome Biology. 2016.
doi:10.1186/s13059-016-0974-4
SeqKit
Shen W, Le S, Li Y, Hu F.
SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation.
PLOS ONE 11(10), e0163962. 2016.
doi:10.1371/journal.pone.0163962
bcftools stats
Li H.
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509
MultiQC
Ewels P, Magnusson M, Lundin S, Käller M.
MultiQC: Summarize analysis results for multiple tools and samples in a single report.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btw354
Snakemake
Köster J, Rahmann S.
Snakemake--a scalable bioinformatics workflow engine.
Bioinformatics. 2012.
doi:10.1093/bioinformatics/bts480
Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J.
Sustainable data analysis with Snakemake.
F1000Res 10, 33. 2021.
doi:10.12688/f1000research.29032.2
Bioconda
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, et al.
Bioconda: A sustainable and comprehensive software distribution for the life sciences.
Nat Methods. 2018.
doi:10.1038/s41592-018-0046-7
FASTQ file format
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM.
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Nucleic Acids Res. 2009.
doi:10.1093/nar/gkp1137
FASTA file format
Pearson WR, Lipman DJ.
Improved tools for biological sequence comparison.
Proceedings of the National Academy of Sciences. 1988.
doi:10.1073/pnas.85.8.2444
SAM/BAM file format
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352
VCF file format
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al.
The variant call format and VCFtools.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr330
GrENE-net
Genomics of rapid Evolution in Novel Environments network (GrENE-net).
Online: https://grenenet.org/