Environment Setup

Objective

The tools and resources presented here are needed to carry out the bioinformatic pipeline(s) used in the Van Bael Lab for ITS and 16S sequence data. The main repository can be found here. The pipeline below focuses mainly on the DADA2 pipeline presented by Benjamin Callahan et al., in its ITS and 16S variants. It closely follows the DADA2 pipeline, the work of Mareli Sánchez Juliá for her MAMF project, and the work of the Farrer Lab at Tulane University. A hybrid approach is also possible: process the reads with the USEARCH and VSEARCH tools, then assign taxonomy with the dada2 function assignTaxonomy.

Resources

The list below is to help guide your search and aid your bioinformatics journey. Each project and sequencing run is different and should be treated as such. Determine the parameters to filter, cut, trim and truncate accordingly.

Van Bael Files

  • VBL_Bioinformatics folder in the Google Drive, where you can find scripts and notes on bioinformatic pipelines from past graduate students and post-docs.
  • Farrer Lab repository. They work on similar data sets.

DADA2

Bioinformatic software tools and environments

R and Python3

Many of the bioinformatic pipelines run in a Unix/Linux environment. If you have macOS, half the trouble is gone. If you have Windows, you will need to install a virtual machine to run a Linux platform (e.g. Ubuntu). Although various Unix-like environments and command-line interfaces exist for Windows, they have limitations, and not all packages or modules are supported by them.

Python comes preinstalled on most operating systems, and different versions can coexist; apps and other software often use several of them simultaneously. Verify the minimum version needed for your application. MultiQC, Cutadapt and other Python-based bioinformatics tools require Python 3 (e.g. > 3.7), whereas FastQC runs on Java and Bioconductor runs on R. If you are not familiar with the Python ecosystem, stop and take a step back. Think about where you want to install these tools: in your virtual machine, laptop, or lab computer? Under which file paths (locations on the computer)? Decide these things before installing, so that tools do not end up in random places where you cannot find them or call their commands. It is best to execute the bioinformatic pipelines on an HPC cluster or a local computer. Avoid keeping the files in the lab's Google Drive and accessing them from there. It is possible, but the shortcuts provided by Google Drive make for long file paths and are not accessible through virtual machines.
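The advice above about knowing which Python you are running and where each tool lives can be checked with a few lines of standard-library Python. This is a minimal sketch, not part of the lab pipeline: the minimum version is taken from the "> 3.7" example above (treated here as "3.7 or newer"), and the list of executable names is an assumption that you should adapt to your own installation.

```python
import shutil
import sys

# Assumption: minimum version borrowed from the "> 3.7" example in the text;
# check each tool's own documentation for its real requirement.
MIN_PYTHON = (3, 7)

# Hypothetical list of executables; edit to match the tools you installed.
TOOLS = ["cutadapt", "fastqc", "multiqc", "vsearch", "Rscript"]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def locate_tools(names=TOOLS):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Python version OK:", python_ok())
    for name, path in locate_tools().items():
        print(f"{name:10s} -> {path or 'NOT FOUND on PATH'}")
```

Running the script inside the environment you plan to use (virtual machine, cluster login node, laptop) shows which interpreter is active and which tools that shell can actually call.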

Cutadapt

  • Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. Sequencing cores usually demultiplex your data but do not remove the primers and adapters. This is the tool for the job.

  • cutadapt installation: You will need to install this application to complete this pipeline.
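As an illustration of primer removal on paired-end reads, the sketch below only builds a cutadapt command line; it does not run it. It follows a common pattern in which each read is trimmed of its own primer at the 5' end and of the reverse complement of the opposite primer at the 3' end (useful when short amplicons are read through). The primer sequences and file names in the usage lines are placeholders, not the lab's settings; replace them with the primers used in your sequencing run.

```python
import shlex

# IUPAC complement table so degenerate primer bases (M, H, V, W, ...) are handled.
_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, including IUPAC ambiguity codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cutadapt_command(fwd_primer, rev_primer, r1_in, r2_in, r1_out, r2_out):
    """Build (but do not run) a paired-end cutadapt primer-trimming command."""
    return [
        "cutadapt",
        "-g", fwd_primer,                      # forward primer at the 5' end of R1
        "-a", reverse_complement(rev_primer),  # read-through of reverse primer on R1
        "-G", rev_primer,                      # reverse primer at the 5' end of R2
        "-A", reverse_complement(fwd_primer),  # read-through of forward primer on R2
        "-n", "2",                             # allow two adapter removals per read
        "-o", r1_out,                          # trimmed R1 output
        "-p", r2_out,                          # trimmed R2 output
        r1_in, r2_in,
    ]


# Placeholder primers and hypothetical file names, for illustration only.
example = cutadapt_command(
    "CTTGGTCATTTAGAGGAAGTAA", "GCTGCGTTCTTCATCGATGC",
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
print(shlex.join(example))
```

Printing the command before running it makes it easy to inspect the primer orientations; once it looks right, the same list can be passed to `subprocess.run`.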

FIGARO

  • FIGARO: "FIGARO will quickly analyze error rates in a directory of FASTQ files to determine optimal trimming parameters for high-resolution targeted microbiome sequencing pipelines, such as those utilizing DADA2 and Deblur."
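Whatever truncation lengths you settle on, the forward and reverse reads must still overlap after truncation so that they can be merged. The arithmetic can be sketched as below. This is not how FIGARO works (it also weighs error rates); it only checks the overlap constraint. The default of 12 matches the default `minOverlap` of dada2's `mergePairs`; the amplicon length and truncation lengths in the usage lines are hypothetical numbers. The check is most meaningful for amplicons of roughly fixed length such as 16S regions; ITS amplicons vary in length, so a single amplicon length is only a rough guide there.

```python
def overlap_after_truncation(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse reads after truncation.

    amplicon_len is the expected amplicon length after primer removal.
    A negative result means the reads no longer reach each other.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def truncation_is_mergeable(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=12):
    """True if the truncated reads keep at least min_overlap bases in common."""
    return overlap_after_truncation(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical example: a 253 bp amplicon with two candidate truncation pairs.
print(overlap_after_truncation(253, 240, 160))
print(truncation_is_mergeable(253, 240, 160))
print(truncation_is_mergeable(253, 130, 130))
```

Leaving some margin above the minimum is sensible, since real amplicons vary in length between taxa.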

Bioconductor

The Bioconductor project's purpose is to develop, support, and disseminate free open source software that facilitates rigorous and reproducible analysis of data from current and emerging biological assays.

  • Installation

USEARCH and VSEARCH search and clustering algorithms

  • VSEARCH: From their website: "[...] supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads."
  • USEARCH: From their website "USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST."

FastQC and MultiQC

Other resources