Some basic steps in microbial ecology, focusing on the processing of second-generation Illumina fastq data into either amplicon (e.g. 16S) or metagenomic (e.g. shotgun) datasets, followed by ecology-based analysis of the communities and patterns we find in that data.
The metagenomic tutorial covers the following steps:
- Setting up your analysis - `bash` and friends
- Checking your sequence data - `FastQC` & `MultiQC`
- Sequencing QC - filtering and trimming your sequences - `Trimmomatic`
- Sequencing QC - purifying your sequences - `Bowtie2`
- Metagenomic community profiling - `Kraken2` & `Bracken` (or `Kaiju` if you like)
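To give a feel for what the sequence-checking step looks at, here is a minimal Python sketch that computes the mean Phred quality of each read in a FASTQ file. It is not part of the tutorial itself, which relies on `FastQC` & `MultiQC` for this. It assumes four-line records and the Phred+33 quality encoding used by current Illumina output; the function names and the toy reads are made up for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop reading
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Keep only the read ID (text after '@', up to the first space)
        yield header[1:].split(" ")[0], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(char) - offset for char in qual) / len(qual)


# Hypothetical toy data: 'I' encodes Q40 and '!' encodes Q0 under Phred+33
example = io.StringIO(
    "@read1 sample=demo\nACGT\n+\nIIII\n"
    "@read2 sample=demo\nACGT\n+\n!!II\n"
)
for read_id, seq, qual in read_fastq(example):
    print(read_id, len(seq), mean_phred(qual))
```

Real data would be read from a (usually gzipped) file rather than a string, and the dedicated tools report far more than a per-read mean, so treat this only as a picture of the underlying idea.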
We also move through importing output from `Kaiju` or `Kraken2`+`Bracken` into `R` (bare-bones):
- importing data into `R`
- generating a count matrix, taxonomic table, and phyloseq object from metagenomic data
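As a rough picture of what "generating a count matrix and taxonomic table" involves, the sketch below merges per-sample abundance tables into a taxa-by-sample matrix. It is written in Python purely for illustration (the tutorial does this in `R`), and it assumes tab-separated input with `name`, `taxonomy_id`, `taxonomy_lvl` and `new_est_reads` columns, as in typical `Bracken` output; check the column names against your own files. The function name, sample names and counts are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def build_count_matrix(
    samples: Dict[str, TextIO],
    count_col: str = "new_est_reads",  # assumed Bracken-style column name
) -> Tuple[List[str], List[str], List[List[int]], Dict[str, Dict[str, str]]]:
    """Merge per-sample tables into (taxon_ids, sample_ids, matrix, tax_table).

    matrix[i][j] is the count of taxon_ids[i] in sample_ids[j];
    taxa missing from a sample are filled with 0.
    """
    counts: Dict[str, Dict[str, int]] = {}
    tax_table: Dict[str, Dict[str, str]] = {}
    for sample_id, handle in samples.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            taxid = row["taxonomy_id"]
            # Record the taxon's name and rank the first time it is seen
            tax_table.setdefault(
                taxid, {"name": row["name"], "level": row["taxonomy_lvl"]}
            )
            counts.setdefault(taxid, {})[sample_id] = int(row[count_col])
    taxon_ids = sorted(counts)  # plain string sort, for a stable row order
    sample_ids = list(samples)
    matrix = [[counts[t].get(s, 0) for s in sample_ids] for t in taxon_ids]
    return taxon_ids, sample_ids, matrix, tax_table


# Hypothetical toy input standing in for two per-sample output files
HEADER = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
)
toy_samples = {
    "sample_a": io.StringIO(
        HEADER
        + "Escherichia coli\t562\tS\t90\t10\t100\t0.5\n"
        + "Staphylococcus aureus\t1280\tS\t95\t5\t100\t0.5\n"
    ),
    "sample_b": io.StringIO(HEADER + "Escherichia coli\t562\tS\t40\t2\t42\t1.0\n"),
}
taxa, sample_names, matrix, tax_table = build_count_matrix(toy_samples)
for taxid, row in zip(taxa, matrix):
    print(taxid, tax_table[taxid]["name"], row)
```

The taxa-as-rows orientation is a deliberate choice here: it matches what phyloseq's `otu_table()` expects with `taxa_are_rows = TRUE`, so the same layout carries over once the data is in `R`.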
This metagenomic workflow is also available as simple, no-nonsense, raw code (note there might be differences from the complete workflow above):
- raw code only of the metagenomic shotgun assembly - as above, less explanation
The amplicon workflow is forthcoming. The initial steps (setup, get data, QC) are very similar in most cases (remember to cut off your primers!), but are followed by a denoising step (`DADA2`) and optionally an attempt to predict the metabolic capabilities of the communities at hand (`PICRUSt2`).
The ecology-based analysis section is still to be done. Although it's a simply enormous topic, it is also the real magic, and we get to make pictures. Until this section is properly fleshed out, consider instead this comprehensive methods (F1000) paper from DADA2's Callahan et al., this guide from AstroBioMike - Bioinformatics for beginners, and the steady pace of phyloseq, which is an excellent on-ramp.
This guide to metagenomic analysis continues to be updated (~~April, 3023~~ April 5^th^ 3,024!). All (+/-) feedback is welcome: simply throw objects/comments directly at me, or drop us a line at the related repo.
all the best!