by Alexander Keller (LMU Munich)
A simple script to process metabarcoding (e.g. 16S V4) data, with amplicons generated by
- 16S: Kozich et al. 2013 AEM
- ITS2: Sickel et al. 2015 BMC Ecology
If you use this script, please kindly cite this article: https://doi.org/10.1098/rstb.2021.0171
- VSEARCH https://github.com/torognes/vsearch
- SeqFilter https://github.com/BioInf-Wuerzburg/SeqFilter
- (USEARCH Python scripts are deprecated; a workaround is now integrated: https://drive5.com/python/ )
- Also check the _DBs folder for Databases
- Un-gzipping files
- Individual sample preparation
- Merging forward and reverse reads
- Quality filtering
- Backup option: use forward reads only in case of poor-quality reverse reads
- Community level processing
- Dereplication
- Denoising
- ASV generation
- Chimera (de novo) removal
- Taxonomic classification
- allows for multiple reference databases (iterative) with decreasing priority
- all unclassified reads are hierarchically classified
- Creation of a community table
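
The steps listed above could be sketched with VSEARCH roughly as follows. This is an illustrative outline only, not the commands of the actual script: the file names, the function names (`run`, `process_sample`, `process_community`, `classify`), the `DRY_RUN` switch, and all thresholds (`--fastq_maxee 1`, `--id 0.97`, `--sintax_cutoff 0.8`) are assumptions chosen for the example. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Illustrative sketch; names and thresholds are assumptions, not the script's settings.

# Print each command; execute it only when DRY_RUN=0 (requires vsearch on PATH).
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Individual sample preparation: merge read pairs, then quality filter.
process_sample() {
  local s="$1"   # hypothetical sample prefix, e.g. sampleA
  run vsearch --fastq_mergepairs "${s}_R1.fastq" --reverse "${s}_R2.fastq" \
    --fastqout "${s}.merged.fastq"
  run vsearch --fastq_filter "${s}.merged.fastq" --fastq_maxee 1 \
    --fastaout "${s}.filtered.fasta"
}

# Community level processing on the pooled, filtered reads of all samples.
process_community() {
  local all="$1"
  run vsearch --derep_fulllength "$all" --sizeout --output derep.fasta
  run vsearch --cluster_unoise derep.fasta --sizein --sizeout --centroids denoised.fasta
  run vsearch --uchime3_denovo denoised.fasta --nonchimeras asvs.fasta
  # Map reads back to the ASVs to obtain the community table.
  run vsearch --usearch_global "$all" --db asvs.fasta --id 0.97 \
    --otutabout community_table.txt
}

# Iterative classification: databases are tried in decreasing priority, and
# whatever is still unmatched at the end is classified hierarchically.
classify() {
  local query="$1" hier_db="$2"
  shift 2   # remaining arguments: reference databases, highest priority first
  local i=0 db
  for db in "$@"; do
    i=$((i + 1))
    run vsearch --usearch_global "$query" --db "$db" --id 0.97 \
      --blast6out "hits.${i}.txt" --notmatched "unclassified.${i}.fasta"
    query="unclassified.${i}.fasta"   # only unmatched sequences go to the next database
  done
  run vsearch --sintax "$query" --db "$hier_db" --tabbedout sintax.txt --sintax_cutoff 0.8
}
```

For example, `process_sample sampleA` followed by `process_community all.filtered.fasta` and `classify asvs.fasta hier_db.fa db1.fa db2.fa` prints the full command sequence; setting `DRY_RUN=0` would execute it instead.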
- Put all your raw sequencing files (.fastq or .fastq.gz) into a subfolder of the directory that contains this script (do not use full paths).
- Copy a config.txt from the resources folder, adapt it to your needs, and copy it into your data folder. Consider checking the paths to binaries in the script file.
- You also need to add a config.txt file, in which information about the databases is stored. An example is in the example directory.
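
Before starting a run, the preparation steps above can be verified with a small pre-flight check. The sketch below is not part of the pipeline; the function name `check_run_folder` and its messages are hypothetical. It only tests what the text states: a relative folder name, a `config.txt` inside it, and at least one `.fastq` or `.fastq.gz` file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the pipeline.
check_run_folder() {
  local folder="$1" n

  # The script expects a subfolder name, not a full path.
  case "$folder" in
    /*) echo "use a relative folder name, not a full path: $folder" >&2; return 1 ;;
  esac

  [ -d "$folder" ] || { echo "missing folder: $folder" >&2; return 1; }
  [ -f "$folder/config.txt" ] || { echo "missing $folder/config.txt" >&2; return 1; }

  # Count raw read files directly inside the folder.
  n=$(( $(find "$folder" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | wc -l) ))
  [ "$n" -gt 0 ] || { echo "no .fastq or .fastq.gz files in $folder" >&2; return 1; }

  echo "ok: $n read file(s) and config.txt found in $folder"
}
```

Called as `check_run_folder <FOLDER>`, it returns a non-zero status and prints the reason if something is missing.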
Then you are ready to run:
bash _processing_MB_0.2a.sh <FOLDER>
Results will be in a new subfolder of your current directory called <FOLDER>.<DATE>
If the analysis needs to be reverted, run the following command, which removes the generated files and restores the original folder structure:
bash _revert_analysis_1.sh <FOLDER>
In the <FOLDER>.<DATE> folder, there will be an R script for data import and basic ecological analyses.