Computational pipelines for filtering host DNA and estimating microbiome abundance from whole genome sequencing (WGS) of biopsy samples.
Zhang C, Cleveland K, Schnoll-Sussman F, McClure B, Bigg M, Thakkar P, Schultz M, Shah M, and Betel D. Identification of the gastric microbiome from endoscopic biopsy samples using whole genome sequencing. Genome Biology. 2015
Download: BWA 0.6.2
hg19 Human reference genome: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/
Additional human genomes: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens//ARCHIVE/BUILD.37.3/Assembled_chromosomes/seq/
Download: RepeatMasker 4.0.5
Database: [Repbase for RepeatMasker](http://www.girinst.org/)
Download: BLAST+ 2.2.27 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.27/)
Database: [Combined human sequence database](http://www.girinst.org/)
Download: Bowtie2 2.2.5
NCBI complete bacterial genomes: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
Download: PathoScope 2.0
Download: R 3.0.2
Download: samtools 0.1.19
Set the correct paths to the above software and databases in configs.sh.
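For orientation, a path configuration of this kind might look like the sketch below. This is not the shipped configs.sh: every variable name and path here is a hypothetical placeholder, and `check_paths` is an illustrative helper that only reports which entries are missing.

```shell
# Hypothetical layout; the real variable names are defined in configs.sh.
BWA="/opt/bwa-0.6.2/bwa"                 # placeholder path
SAMTOOLS="/opt/samtools-0.1.19/samtools" # placeholder path
HG19_REF="/data/ref/hg19.fa"             # placeholder path

# check_paths PATH...
# Print every path that does not exist; return 1 if any is missing.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_paths "$BWA" "$SAMTOOLS" "$HG19_REF"` before the first step would surface a wrong path early rather than midway through a long alignment.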
Step 1. Align all reads to the hg19 human reference genome with BWA
./Filter_S1.sh
Step 2. Align the remaining unmapped reads to additional human genomes with BWA
./Filter_S2.sh
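Steps 1 and 2 follow the same pattern: align with BWA, then keep only the reads that did not map. The sketch below illustrates that pattern and is not the content of Filter_S1.sh or Filter_S2.sh. It assumes single-end FASTQ input and an already indexed reference; the function names `run` and `filter_host_bwa` and the output file naming are hypothetical. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# run CMD ARGS...
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# filter_host_bwa REF READS PREFIX
# Align READS to REF, then keep the alignments flagged as unmapped.
# Assumes single-end reads and a reference indexed with `bwa index`.
filter_host_bwa() {
    # Find suffix-array coordinates for each read.
    run bwa aln -f "$3.sai" "$1" "$2" &&
    # Convert them to single-end SAM alignments.
    run bwa samse -f "$3.sam" "$1" "$3.sai" "$2" &&
    # -f 4 keeps only records with the "read unmapped" flag set.
    run samtools view -S -b -f 4 -o "$3.unmapped.bam" "$3.sam"
}
```

For example, `DRY_RUN=1 filter_host_bwa hg19.fa reads.fq s1` lists the three commands that would run for a sample with prefix s1.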
Step 3. Filter the remaining unmapped reads with RepeatMasker
./Filter_S3.sh
Step 4. Align the remaining reads to the human sequence database with BLAST
./Filter_S4.sh
Step 5. Align the remaining reads to the human sequence database with MegaBLAST
./Filter_S5.sh
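In the BLAST-based steps, a read that hits the human sequence database is treated as host and discarded. One way to do the discarding is sketched below, assuming the hits were written in tabular format (`-outfmt 6`, query ID in the first column) and the reads are in FASTA. The function name `drop_hit_reads` and the file names are hypothetical, and Filter_S4.sh and Filter_S5.sh may do this differently.

```shell
# Hits could be produced with something like (database name is a placeholder):
#   blastn -task megablast -query reads.fa -db human_db -outfmt 6 -out hits.tsv

# drop_hit_reads HITS_TSV READS_FASTA
# Print the FASTA records whose ID does not appear in column 1 of HITS_TSV.
drop_hit_reads() {
    awk '
        # First file: remember every query ID that had a hit.
        FILENAME == ARGV[1] { hit[$1] = 1; next }
        # Header line: strip ">" and decide whether to keep this record.
        /^>/ { id = substr($1, 2); keep = !(id in hit) }
        # Print header and sequence lines of kept records.
        keep
    ' "$1" "$2"
}
```

Usage: `drop_hit_reads hits.tsv reads.fa > reads.nohuman.fa`. Matching on `FILENAME` rather than on `NR == FNR` keeps the function correct when the hits file is empty.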
Step 1. Align the fully filtered reads to the NCBI complete bacterial genomes with Bowtie2
Step 2. Calculate genome coverage to remove likely false identifications
Step 3. Assign the multiply mapped reads with PathoScope
Step 4. Re-evaluate the relative abundances of the microbiome
./Profiling.sh
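The coverage filter and the final abundance step can be pictured with the sketch below. It is an illustration, not what Profiling.sh does internally: it assumes a tab-separated table with the columns genome, assigned read count, and covered fraction of the genome, and it takes the coverage cutoff as an argument because the cutoff used by the pipeline is not stated here. The function name `relative_abundance` is hypothetical.

```shell
# relative_abundance TABLE_TSV MIN_COVERED_FRACTION
# TABLE_TSV columns (assumed): genome <TAB> read_count <TAB> covered_fraction
# Drop genomes below the cutoff, then renormalize read counts to sum to 1.
relative_abundance() {
    awk -F '\t' -v min="$2" '
        # Keep genomes whose covered fraction reaches the cutoff.
        $3 >= min { reads[$1] = $2; total += $2; order[++k] = $1 }
        END {
            if (total > 0)
                for (i = 1; i <= k; i++)
                    printf "%s\t%.4f\n", order[i], reads[order[i]] / total
        }
    ' "$1"
}
```

Usage: `relative_abundance coverage.tsv 0.10` prints one line per retained genome with its relative abundance, in input order.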