
PHB-v0.1.0

frankambrosio3 released this on 03 Jan 13:31
d816661

PHB v0.1.0 Release Notes

This first stable release introduces the PHB repository, the TheiaEuk workflow, the Snippy_Variants and Snippy_Tree workflows, and the Nullarbor workflow.

The TheiaEuk Workflow Series

TheiaEuk_PE

Organism-specific modules

  • Candida auris
    • Cladetyping: clade assignment is performed using a specialized GAMBIT database consisting of the five clade-specific reference sequences for C. auris
    • Detection of SNPs in genes associated with antifungal resistance (FKS1, ERG11, FUR1)
  • Candida albicans
    • Detection of SNPs in genes associated with antifungal resistance (ERG11, FKS1, FUR1, RTA2)
  • Aspergillus fumigatus
    • Detection of SNPs in genes associated with antifungal resistance (CYP51a, HAPE, COX10)
  • Cryptococcus neoformans
    • Detection of SNPs in genes associated with antifungal resistance (ERG11)
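The cladetyping step above assigns each C. auris sample to the clade whose reference it most resembles. The sketch below illustrates that nearest-reference idea with a plain k-mer Jaccard distance. It is a simplified stand-in, not GAMBIT's implementation or database, and the function names and toy sequences are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of k-mers found in a DNA sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance 1 - |a & b| / |a | b|; 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign_clade(query: str, references: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Return (clade name, distance) for the reference closest to the query."""
    query_kmers = kmers(query, k)
    distances = {
        name: jaccard_distance(query_kmers, kmers(ref, k))
        for name, ref in references.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical toy references, NOT the real C. auris clade genomes.
toy_references = {
    "clade_I": "AAAACCCCGGGGTTTT",
    "clade_II": "ACGTACGTACGTACGT",
    "clade_III": "AGAGAGAGCTCTCTCT",
}
clade, dist = assign_clade("ACGTACGTACGTACGA", toy_references)
print(clade, round(dist, 2))  # clade_II 0.2
```

With real genomes k would be far larger than 4; the small value only keeps the toy example readable.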

The Snippy Workflow Series

Snippy_Variants

The Snippy_Variants workflow aligns single-end or paired-end reads against a reference genome, then identifies single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), and insertions/deletions (INDELs) across the alignment. If a GenBank file is used as the reference, mutations associated with user-specified query strings (e.g. genes of interest) can additionally be reported to the Terra data table.
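The variant classes named above can be told apart from the reference (REF) and alternate (ALT) alleles. This is a minimal sketch under stated assumptions: the Variant record and its gene field are hypothetical, alleles are assumed to be already trimmed, Snippy itself also reports a "complex" type that this sketch does not model, and the substring match only illustrates reporting mutations that match user-specified query strings, not the workflow's exact matching rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Variant:
    pos: int
    ref: str   # reference allele
    alt: str   # alternate allele
    gene: str  # hypothetical annotation taken from a GenBank reference


def classify(v: Variant) -> str:
    """Label a variant as snp, mnp, ins, or del from its allele lengths."""
    # Assumes alleles are already trimmed (no shared padding bases beyond VCF convention).
    if len(v.ref) == len(v.alt):
        return "snp" if len(v.ref) == 1 else "mnp"
    return "ins" if len(v.alt) > len(v.ref) else "del"


def filter_by_query(variants: list[Variant], queries: list[str]) -> list[Variant]:
    """Keep variants whose gene annotation contains any query string (case-insensitive)."""
    lowered = [q.lower() for q in queries]
    return [v for v in variants if any(q in v.gene.lower() for q in lowered)]


calls = [
    Variant(100, "A", "G", "ERG11"),
    Variant(200, "AC", "GT", "FKS1"),
    Variant(300, "A", "ATT", "FUR1"),
    Variant(400, "ATG", "A", "ERG3"),
]
print([classify(v) for v in calls])                                # ['snp', 'mnp', 'ins', 'del']
print([v.pos for v in filter_by_query(calls, ["erg11", "FKS1"])])  # [100, 200]
```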

  • Finding mutations: identify mutations (SNPs, MNPs, and INDELs) in your own sample’s reads relative to a reference, e.g. mutations in genes of phenotypic interest.
  • Quality control: When undertaking quality control of sequenced isolates, contamination between closely related genomes (e.g. isolates from an outbreak or transmission cluster) is difficult to detect with the conventional approaches in TheiaProk. Such contamination may appear as allele heterogeneity at a significant number of genome positions. Snippy_Variants can identify these heterogeneous positions when the reads are aligned either to an assembly of the same reads or to a closely related reference genome, with the SNP-calling thresholds lowered.
  • Assessing support for a mutation: Snippy_Variants produces a BAM file of the reads aligned to the reference genome. This BAM file can be visualized in IGV (see Theiagen Office Hours recordings) to assess where a mutation falls within its supporting reads or, if an assembly of the same reads was used as the reference, where it falls within the contig.
    • Mutations that are only found at the ends of supporting reads may be sequencing errors.
    • Mutations found at the end of contigs may be assembly errors.
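The quality-control use case relies on spotting positions where reads disagree. A minimal sketch, assuming per-position allele counts have already been extracted from the BAM file: a site is flagged when both its depth and its minor-allele fraction pass thresholds. The names min_depth and min_minor_fraction and their defaults are hypothetical illustrations, not Snippy options.

```python
from __future__ import annotations


def minor_allele_fraction(counts: dict[str, int]) -> float:
    """Fraction of reads that do not carry the most common allele."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - max(counts.values()) / total


def heterogeneous_sites(
    pileup: dict[int, dict[str, int]],
    min_depth: int = 10,
    min_minor_fraction: float = 0.2,
) -> list[int]:
    """Return positions with enough depth and a high minor-allele fraction."""
    return [
        pos
        for pos, counts in sorted(pileup.items())
        if sum(counts.values()) >= min_depth
        and minor_allele_fraction(counts) >= min_minor_fraction
    ]


# Hypothetical per-position allele counts (position -> {base: read count}).
toy_pileup = {
    1: {"A": 30},           # clean site
    2: {"A": 18, "G": 12},  # mixed site: 40% minor allele
    3: {"C": 4, "T": 4},    # mixed, but too shallow to trust
    4: {"G": 45, "T": 5},   # 10% minor allele, below threshold
}
print(heterogeneous_sites(toy_pileup))  # [2]
```

Many flagged sites spread across the genome, rather than a handful, is the pattern that points to a mixed sample.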

Snippy_Tree

  • Task 1: Snippy multi
    • Subtask 1a: Snippy variants: Determine all variants relative to a reference genome
    • Subtask 1b: Snippy core: Combine the per-sample results into a core-genome SNP alignment for phylogenetic analysis
    • Subtask 1c: Snippy clean: Replace every character in the alignment other than A, G, C, T, and - with N
  • Task 2: Gubbins (optional): Remove recombinant sites from the alignment
  • Task 3: Tree inference (optional): Infer a phylogeny from alignments that were not processed with Gubbins (e.g. TB or outbreak samples), or when users want a tree inferred separately from the final tree produced by Gubbins
  • Task 4: SNP-dists (optional): Calculate pairwise SNP distances between samples
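Subtask 1c and Task 4 can be sketched together: clean the alignment, then count pairwise differences. This is a hypothetical Python stand-in for Snippy clean and SNP-dists, assuming equal-length aligned sequences. It counts only positions where both sequences carry a different A, C, G, or T, so gaps and N never add to a distance.

```python
from __future__ import annotations

from itertools import combinations

ALLOWED = set("AGCT-")
BASES = set("ACGT")


def clean(seq: str) -> str:
    """Replace every character other than A, G, C, T and '-' with 'N'."""
    return "".join(c if c in ALLOWED else "N" for c in seq)


def snp_distance(a: str, b: str) -> int:
    """Count sites where both sequences have a base (A/C/G/T) and the bases differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x in BASES and y in BASES)


def pairwise_snp_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Clean each sequence, then compute distances for every unordered pair."""
    cleaned = {name: clean(seq) for name, seq in alignment.items()}
    return {
        (a, b): snp_distance(cleaned[a], cleaned[b])
        for a, b in combinations(sorted(cleaned), 2)
    }


# Hypothetical toy alignment; "R" (an IUPAC ambiguity code) is cleaned to "N".
toy_alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTTCGTAC",
    "s3": "ACGRACG-AN",
}
print(pairwise_snp_distances(toy_alignment))
# {('s1', 's2'): 1, ('s1', 's3'): 0, ('s2', 's3'): 1}
```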

The Nullarbor Workflow

Nullarbor
A single-task WDL workflow that wraps the Nullarbor bioinformatics pipeline, which generates complete public health microbiology reports from sequenced isolates.
https://github.com/tseemann/nullarbor

Follow Theiagen on Twitter!