Citation and References

Lucas Czech edited this page Aug 12, 2022 · 23 revisions

Reference

When using grenepipe, please cite:

Lucas Czech, Moises Exposito-Alonso.
grenepipe: A flexible, scalable, and reproducible pipeline to automate variant and frequency calling from sequence reads.
arXiv. 2021.
arXiv:2103.15167

Please also cite every tool that you selected to run in your analysis. Their references are listed below.

Read trimming

AdapterRemoval

Lindgreen S.
AdapterRemoval: Easy cleaning of next-generation sequencing reads.
BMC Res Notes. 2012.
doi:10.1186/1756-0500-5-337

Schubert M, Lindgreen S, Orlando L.
AdapterRemoval v2: Rapid adapter trimming, identification, and read merging.
BMC Res Notes. 2016.
doi:10.1186/s13104-016-1900-2

Cutadapt

Martin M.
Cutadapt removes adapter sequences from high-throughput sequencing reads.
EMBnet.journal. 2011.
doi:10.14806/ej.17.1.200

fastp

Chen S, Zhou Y, Chen Y, Gu J.
fastp: an ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics. 2018.
doi:10.1093/bioinformatics/bty560

SeqPrep

John JS.
SeqPrep: Tool for stripping adaptors and/or merging paired reads with overlap into single reads.
Online: https://github.com/jstjohn/SeqPrep

skewer

Jiang H, Lei R, Ding S-W, Zhu S.
Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads.
BMC Bioinformatics. 2014.
doi:10.1186/1471-2105-15-182

trimmomatic

Bolger AM, Lohse M, Usadel B.
Trimmomatic: A flexible trimmer for Illumina sequence data.
Bioinformatics. 2014.
doi:10.1093/bioinformatics/btu170

Read mapping, duplication removal, and quality score recalibration

Bowtie 2

Langmead B, Salzberg SL.
Fast gapped-read alignment with Bowtie 2.
Nat Methods. 2012.
doi:10.1038/nmeth.1923

bwa mem and bwa aln

Li H, Durbin R.
Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp324

bwa-mem2

Vasimuddin M, Misra S, Li H, Aluru S.
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems.
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2019.
doi:10.1109/IPDPS.2019.00041

BamUtil clipOverlap

Jun G, Wing MK, Abecasis GR, Kang HM.
An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data.
Genome Res. 2015;25(6).
doi:10.1101/gr.176552.114

Picard MarkDuplicates

Broad Institute.
Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/

DeDup

Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J, et al.
EAGER: efficient ancient genome reconstruction.
Genome Biol. 2016.
doi:10.1186/s13059-016-0918-z

GATK BaseRecalibrator

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
Genome Res. 2010.
doi:10.1101/GR.107524.110

samtools merge and samtools mpileup

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352

Damage profiling

mapDamage

Ginolhac A, Rasmussen M, Gilbert MTP, Willerslev E, Orlando L.
mapDamage: testing for damage patterns in ancient DNA sequences.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr347

Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L.
mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.
Bioinformatics. 2013.
doi:10.1093/bioinformatics/btt193

DamageProfiler

Neukamm J, Peltzer A, Nieselt K.
DamageProfiler: Fast damage pattern calculation for ancient DNA.
bioRxiv. 2020.
doi:10.1101/2020.10.01.322206

Variant calling, genotyping, and filtering

bcftools call

Li H.
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509

freebayes

Garrison E, Marth G.
Haplotype-based variant detection from short-read sequencing.
arXiv. 2012.
arXiv:1207.3907

Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, et al.
BEDOPS: high-performance genomic feature operations.
Bioinformatics. 2012;28.
doi:10.1093/bioinformatics/bts277

GATK HaplotypeCaller, GATK SelectVariants, GATK VariantFiltration, GATK VariantRecalibrator

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
Genome Res. 2010.
doi:10.1101/GR.107524.110

Frequency calling

HAF-pipe

Tilk S, Bergland A, Goodman A, Schmidt P, Petrov D, Greenblum S.
Accurate Allele Frequencies from Ultra-low Coverage Pool-Seq Samples in Evolve-and-Resequence Experiments.
G3: Genes|Genomes|Genetics. 2019.
doi:10.1534/g3.119.400755

Kessner D, Turner T, Novembre J.
Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data.
Molecular Biology and Evolution. 2013.
doi:10.1093/molbev/mst016

Quality control, statistics, SNP annotation, reporting

FastQC

Andrews S.
FastQC: a quality control tool for high throughput sequence data.
Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
Online: https://www.bioinformatics.babraham.ac.uk/projects/fastqc

samtools stats and samtools flagstat

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352

QualiMap

Okonechnikov K, Conesa A, García-Alcalde F.
Qualimap 2: Advanced multi-sample quality control for high-throughput sequencing data.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btv566

Picard CollectMultipleMetrics

Broad Institute.
Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/

snpEff

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al.
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.
Fly. 2012.
doi:10.4161/fly.19695

VEP

McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F.
The Ensembl Variant Effect Predictor.
Genome Biology. 2016.
doi:10.1186/s13059-016-0974-4

SeqKit

Shen W, Le S, Li Y, Hu F.
SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation.
PLOS ONE. 2016;11(10):e0163962.
doi:10.1371/journal.pone.0163962

bcftools stats

Li H.
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509

MultiQC

Ewels P, Magnusson M, Lundin S, Käller M.
MultiQC: Summarize analysis results for multiple tools and samples in a single report.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btw354

Additional references

Snakemake

Köster J, Rahmann S.
Snakemake--a scalable bioinformatics workflow engine.
Bioinformatics. 2012.
doi:10.1093/bioinformatics/bts480

Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J.
Sustainable data analysis with Snakemake.
F1000Res. 2021;10:33.
doi:10.12688/f1000research.29032.2

Bioconda

Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, et al.
Bioconda: A sustainable and comprehensive software distribution for the life sciences.
Nat Methods. 2018.
doi:10.1038/s41592-018-0046-7

FASTQ file format

Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM.
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Nucleic Acids Res. 2009.
doi:10.1093/nar/gkp1137

FASTA file format

Pearson WR, Lipman DJ.
Improved tools for biological sequence comparison.
Proceedings of the National Academy of Sciences. 1988.
doi:10.1073/pnas.85.8.2444

SAM/BAM file format

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352

VCF file format

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al.
The variant call format and VCFtools.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr330

GrENE-net

Genomics of rapid Evolution in Novel Environments network (GrENE-net).
Online: https://grenenet.org/