Releases: theiagen/public_health_bioinformatics
v1.0.0
Public Health Bioinformatics v1.0.0 Release Notes
This major release offers stable and validated versions of Theiagen's Terra-accessible WDL workflows in a single repository.
About Public Health Bioinformatics
The Public Health Bioinformatics repository hosts bioinformatics workflows for characterization, epidemiology, and sharing of pathogen genomes. More information about these workflows is available via the Theiagen Public Resources Documentation.
Due to numerous code redundancies across the PHVG, PHBG, and Terra Utilities repositories, updating and maintaining these repositories became error-prone and time-consuming. By (1) consolidating these repositories, (2) implementing stricter organization, and (3) enforcing the style guide, the PHB repository is now easier to read, maintain, and modify.
Major changes
All workflows now include the suffix _PHB to differentiate them from their previous incarnations in the PHBG, PHVG, and Terra Utilities repositories. A PHB Dockstore collection has been made to host these workflows. When importing these workflows from Dockstore, please remember to import the version with the _PHB suffix.
New workflows
Several new workflows have been created (please see the linked documentation for more information):
- Augur_PHB (and Augur_Prep_PHB): these workflows perform phylogenetic inference using Nextstrain's Augur pipeline. Unlike the PHVG versions (TheiaCoV_Augur_Prep, TheiaCoV_Augur_DistanceTree, and TheiaCoV_Augur_Run), which were restricted to SARS-CoV-2, the PHB versions can also be run on other viral pathogens, e.g., West Nile virus or mpox.
- TheiaProk_FASTA_PHB and TheiaProk_ONT_PHB: these workflows extend the TheiaProk workflow series to accept assemblies and Oxford Nanopore read data as input.
- Assembly_Fetch_PHB: this workflow downloads a reference assembly from NCBI from either (1) a provided assembly accession number, or (2) the closest identified reference genome to a query assembly.
- Snippy_Variants_PHB and Snippy_Tree_PHB: these workflows use Snippy to identify variants (Snippy_Variants) and use those variants to produce a phylogenetic tree (Snippy_Tree).
- Snippy_Streamline_PHB: this workflow is an all-in-one approach to generating a reference-based phylogeny using the Snippy tools. By default, it runs Snippy_Variants and Snippy_Tree, but will optionally run Assembly_Fetch if a reference genome is not provided.
- Lyve_Set_PHB: this workflow runs the Lyve-SET pipeline developed by Lee Katz.
- TheiaValidate_PHB: this workflow performs basic comparisons between user-designated columns in two separate tables. It is intended to determine whether any differences exist between version releases or between two workflows. It produces a summary PDF and an Excel spreadsheet that lists, for each sample, the values of any columns whose content does not match.
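The column comparison that TheiaValidate_PHB is described as performing can be illustrated with a minimal sketch. This is not the workflow's implementation: the function names, the "sample" key column, the tab-separated input, and the example values are all assumptions made for illustration, and the sketch does not produce the PDF or Excel outputs.

```python
import csv
import io


def read_table(text, key="sample"):
    """Parse tab-separated text into {sample_id: row_dict}.

    The "sample" key column is a hypothetical name used for this sketch.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row for row in reader}


def compare_columns(table_a, table_b, columns):
    """Return (sample, column, value_a, value_b) for every mismatch.

    Only samples present in both tables are compared; a column missing
    from a row is treated as an empty string.
    """
    mismatches = []
    for sample in sorted(table_a.keys() & table_b.keys()):
        for column in columns:
            value_a = table_a[sample].get(column, "")
            value_b = table_b[sample].get(column, "")
            if value_a != value_b:
                mismatches.append((sample, column, value_a, value_b))
    return mismatches


# Made-up example tables standing in for results from two releases.
old = "sample\tlineage\tn_count\nS1\tBA.2\t10\nS2\tXBB.1\t4\n"
new = "sample\tlineage\tn_count\nS1\tBA.2\t12\nS2\tXBB.1\t4\n"

diffs = compare_columns(read_table(old), read_table(new), ["lineage", "n_count"])
for sample, column, a, b in diffs:
    print(f"{sample}\t{column}\t{a}\t{b}")
```

Comparing values as strings keeps the sketch simple; a real comparison might need per-column rules (for example, numeric tolerance), which are out of scope here.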
Deprecated workflows
Several workflows will not be included in the PHB repository and will be excluded from future development updates. However, these workflows will remain available in perpetuity in their original repositories.
- Mercury_PE_Prep, Mercury_SE_Prep, and Mercury_Batch (PHVG); the Mercury_Prep_N_Batch_PHB workflow offers similar functionality and capabilities
- TheiaCoV_WWVC (PHVG); the Freyja workflows are available for wastewater sequencing analysis
- TheiaCoV_Validate (PHVG); TheiaValidate_PHB workflow offers expanded capabilities
- TheiaCoV_Augur_Prep, TheiaCoV_Augur_DistanceTree, and TheiaCoV_Augur_Run (PHVG); the Augur_Prep_PHB and Augur_PHB workflows offer expanded capabilities
- Import_SE_reads, Import_PE_reads, BAM_to_FASTQ_SE, and BAM_to_FASTQ_PE (Terra Utilities)
- The Kleborate, SerotypeFinder, TBProfiler_Illumina_PE, and TBProfiler_ONT standalone workflows (PHBG)
Implementation of a style guide
To ensure consistency across the repository, a style guide has been implemented and will continue to be enforced.
Documentation updates
The documentation for PHB v1.0.0 has been reorganized to help users identify what workflows may suit their needs. Documentation has been created for every workflow in the repository and includes lists of required and optional inputs, all potential outputs, details regarding the workflows, and tips for successful analysis and usage.
What's Changed
- v0.2.0 by @kevinlibuit, @michellescribner, @cimendes, @kapsakcj, @jrotieno, @rpetit3, @emmadoughty, @frankambrosio3
- add theiacov gha by @rpetit3 in #77
- Update default dockers by @sage-wright in #86
- Prevent assemblies & ONT data from Vibrio submodules by @sage-wright in #87
- make assembly_fasta optional by @sage-wright in #89
- allow not present columns in validation criteria to be ignored by @sage-wright in #90
- Quasitools Bug Discovery and Destruction by @sage-wright in #91
- fix path by @sage-wright in #93
Full Changelog: v0.2.0...v1.0.0
v0.2.0
This release consolidates PHVG v2.3.2, PHBG v1.3.0, and Terra Utilities v1.4.1 into the PHB repository.
Workflows not present in PHVG, PHBG, or Terra_Utilities have also been added:
- Snippy_Streamline_PHB: for performing phylogenetic reconstruction using various Snippy functions
- Lyve_SET_PHB: for performing phylogenetic reconstruction using Lyve-SET
- Augur_PHB: for performing phylogenetic reconstruction using the Augur pipeline (derived from the PHVG TheiaCoV_Augur_Run workflow to ensure compatibility for non-SC2 data)
- TheiaProk_ONT_PHB and TheiaProk_FASTA_PHB: for performing bacterial characterization from input ONT or FASTA data
Please note that, as a v0 release, minimal functional tests have been performed. More stringent validation will be conducted prior to a PHB v1.0.0 release.
What's Changed
- Lyve-SET Workflow & update docker for task_version by @kevinlibuit in #21
- Augur Rework by @sage-wright in #31
- PHBG v1.2.0 changes by @sage-wright in #42
- Adding the read_screen task to the TheiaCoV workflows by @sage-wright in #35
- Add BUSCO and QC_Check to TheiaEuk by @michellescribner in #50
- Fix snp_sites in snippy_tree workflow by @cimendes in #55
- add frame work for GHA, working theiaprok_illumina_pe workflow by @rpetit3 in #39
- basespace_fetch_PHB: account for underscores in sample name by @kapsakcj in #36
- Revamp qc_check task, add to TheiaCoV by @michellescribner in #56
- updated SRA_Fetch_PHB workflow to use fastq-dl v2.0.1 by @kapsakcj in #61
- PHBG v1.3.0 changes - vibrio subworkflow by @cimendes in #52
- Add consortium to Mercury_Prep_N_Batch by @sage-wright in #67
- add optional String input fastq_dl_opts for sra_fetch workflow by @kapsakcj in #68
- add Snippy_Streamline and Assembly_Fetch workflows by @kapsakcj in #46
- Split snippy variants into two tasks and fix TheiaEuk gene query by @michellescribner in #62
- Add emm-typing-tool and hicap tasks by @jrotieno in #63
- Split AMRFinderPlus string outputs by scope by @michellescribner in #73
- DRAFT: Pull request template by @cimendes in #66
- Adding Pathogen: environmental package to Terra_2_NCBI by @sage-wright in #80
- PHVG v2.3.2 concordance by @sage-wright in #83
- Renaming and old workflow removal by @sage-wright in #82
- The TheiaValidate Workflow by @sage-wright in #79
- Adding VirulenceFinder to TheiaProk by @sage-wright in #76
- kSNP3 updates and a new metadata output for phylogenetic workflows by @sage-wright in #81
- CDPH TBprofiler parsing changes by @cimendes in #53
- Removing C.albicans subworkflow by @kevinlibuit in #84
- Add nextclade_clade to Augur_Prep by @sage-wright in #85
Full Changelog: PHB-v0.1.0-theiaeuk-manuscript...v0.2.0
PHB-v0.1.0-theiaeuk-manuscript
This release tags a version of the Public Health Bioinformatics repository for use in testing and validation of the following manuscript: "TheiaEuk: A Species-Agnostic Bioinformatics Workflow for Fungal Genomic Characterization".
This release is not intended for workflows other than TheiaEuk and should not be used for routine analysis.
Changes in this release:
- Exposes runtime parameters for the TheiaEuk, Nullarbor, and MycoSNP workflows to allow apples-to-apples comparison
- Updates fungal GAMBIT database files to v0.2 and updates the file locations to the rp bucket
- Adds the product name for ERG11 to the resistance gene query because the "ERG11" gene name is not found in all reference genome files for Candida auris
- Updates the reference files used in the C. auris cladetyper task and the subsequent snippy task
PHB-v0.1.0
PHB v0.1.0 Release Notes
This first stable release introduces the PHB repository, the TheiaEuk workflow, the Snippy_Variants and Snippy_Tree workflows, and the Nullarbor workflow.
The TheiaEuk Workflow Series
TheiaEuk_PE
Organism-specific modules
- Candida auris
- Cladetyping: clade assignment will be performed using a specialized GAMBIT database consisting of the five clade-specific reference sequences for C. auris
- Detection of SNPs in genes associated with antifungal resistance (FKS1, ERG11, FUR1)
- Candida albicans
- Detection of SNPs in genes associated with antifungal resistance (ERG11, FKS1, FUR1, RTA2)
- Aspergillus fumigatus
- Detection of SNPs in genes associated with antifungal resistance (CYP51a, HAPE, COX10)
- Cryptococcus neoformans
- Detection of SNPs in genes associated with antifungal resistance (ERG11)
The Snippy Workflow Series
Snippy_Variants
The Snippy_Variants workflow aligns single-end or paired-end reads against a reference genome, then identifies single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), and insertions/deletions (INDELs) across the alignment. If a GenBank file is used as the reference, mutations associated with user-specified query strings (e.g. genes of interest) can additionally be reported to the Terra data table.
- Finding mutations (SNPs, MNPs, and INDELs) in your own sample’s reads relative to a reference, e.g. mutations in genes of phenotypic interest.
- Quality control: When undertaking quality control of sequenced isolates, it is difficult to identify contamination between multiple closely related genomes (e.g. isolates from an outbreak or transmission cluster) using the conventional approaches in TheiaProk. Such contamination may be identified as allele heterogeneity at a significant number of genome positions. Snippy_Variants may be used to identify these heterogeneous positions by aligning reads to the assembly of the same reads, or to a closely related reference genome, and lowering the thresholds to call SNPs.
- Assessing support for a mutation: Snippy_Variants produces a BAM file of the reads aligned to the reference genome. This BAM file can be visualized in IGV (see Theiagen Office Hours recordings) to assess the position of a mutation in supporting reads, or if the assembly of the reads was used as a reference, the position in the contig.
- Mutations that are only found at the ends of supporting reads may be sequencing errors.
- Mutations found at the end of contigs may be assembly errors.
Snippy_Tree
- Task 1: Snippy multi
- Subtask 1a: Snippy variants: Determine all variants relative to a reference genome
- Subtask 1b: Snippy core: Perform core genome phylogeny SNP analysis
- Subtask 1c: Snippy clean: Replaces every character in the alignment other than A, G, C, T, and - with N
- Task 2: Gubbins (optional): Remove recombinant sites from the alignment
- Task 3: Tree inference (optional): Tree inference for alignments that have not used gubbins (e.g. TB, outbreak samples etc), or for users who choose to undertake separate tree inference from the final tree produced by gubbins
- Task 4: SNP-dists (optional): Provision of pairwise SNP distances
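Two of the steps listed above, the alignment cleaning in Subtask 1c and the pairwise SNP distances in Task 4, can be illustrated with a minimal Python sketch. This is not the code used by Snippy or SNP-dists: the function names and the toy alignment are hypothetical, and the choice to count only positions where both sequences carry A, C, G, or T is an assumption of this sketch.

```python
import re
from itertools import combinations

BASES = set("ACGT")


def clean_alignment(seq):
    """Replace every character other than A, G, C, T, or '-' with N.

    This is a literal reading of the Subtask 1c description, so
    lowercase bases are also replaced.
    """
    return re.sub(r"[^AGCT-]", "N", seq)


def snp_distance(a, b):
    """Count positions where both sequences have a base and they differ.

    Assumption for this sketch: N and gap positions are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x in BASES and y in BASES)


# Toy alignment (made-up data); "R" is an ambiguity code, "-" a gap.
alignment = {"ref": "ACGTACGT", "s1": "ACGTACGA", "s2": "ACRTAC-A"}

cleaned = {name: clean_alignment(seq) for name, seq in alignment.items()}
distances = {
    (a, b): snp_distance(cleaned[a], cleaned[b])
    for a, b in combinations(sorted(cleaned), 2)
}

for (a, b), d in distances.items():
    print(f"{a}\t{b}\t{d}")
```

Cleaning first means every sequence uses the same small alphabet, so the distance function only has to distinguish bases from everything else.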
The Nullarbor Workflow
Nullarbor
A single-task WDL workflow that wraps the Nullarbor bioinformatics pipeline to generate complete public health microbiology reports from sequenced isolates.
https://github.com/tseemann/nullarbor