v2.1.0

@cimendes cimendes released this 26 Jun 14:14
d0377e1

Public Health Bioinformatics v2.1.0 Minor Release Notes

This minor release improves the utility and usability of several workflows dedicated to Oxford Nanopore Technologies (ONT) data for viral and bacterial genomic characterization (TheiaCoV and TheiaProk). Additionally, support for new organisms has been added to several workflows.

Full release notes can be found here!

Find our documentation here!

🚀 Changes to existing workflows:

  • All TheiaProk Workflows

    • General Abricate is now available through the call_abricate and abricate_db optional inputs.
    • Abricate specifically for Vibrio cholerae is now available. It launches automatically if the gambit_predicted_taxon or expected_taxon is Vibrio cholerae.
    • A new optional parameter separate_betalactam_genes is now available that splits AMRFinderPlus beta-lactam hits into new columns.
    • The call_midas optional input is now set to false by default.
  • TheiaProk_Illumina_PE

    • New read quality-control outputs have been added: r1_mean_q_clean, r2_mean_q_clean, r1_mean_readlength_clean and r2_mean_readlength_clean.
  • TheiaProk_ONT

    • New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
    • Kraken2 is now available through the call_kraken and kraken_db optional inputs.
    • The estimated genome size is now capped at 10 Mbp to prevent excessive runtimes.
  • All TheiaCoV Workflows

    • RSV-A and RSV-B are now able to be analyzed with the TheiaCoV workflows. Nextclade characterization and Kraken taxonomic analysis will now be run on RSV samples.
    • The default Nextclade dataset tags have been updated for the following organisms:
      Organism          New default Nextclade dataset tag
      SARS-CoV-2        "2024-06-13--23-42-47Z"
      mpox              "2024-04-19--07-50-39Z"
      Flu H1N1 HA       "2024-04-19--07-50-39Z"
      Flu H1N1 NA       "2024-04-19--07-50-39Z"
      Flu H3N2 HA       "2024-04-19--07-50-39Z"
      Flu H3N2 NA       "2024-04-19--07-50-39Z"
      Flu Victoria HA   "2024-04-19--07-50-39Z"
      Flu Victoria NA   "2024-04-19--07-50-39Z"
  • TheiaCoV_ONT

    • New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
  • TheiaCoV Flu Track

    • All of the flu-specific tasks now live in their own sub-workflow, flu_track. This has no effect on the end-user.
    • In TheiaCoV_ONT, flu samples will now have the assembly mean coverage of both the HA and NA segments appear in the assembly_mean_coverage output variable. This reflects the behaviour already present in TheiaCoV_Illumina_PE.
    • The all-segments FASTA header lines now include samplename.
    • The new output irma_subtype_notes now indicates whether IRMA was able to determine the flu subtype.
    • All workflows now use abricate_flu_subtype (instead of irma_subtype) for selecting the appropriate nextclade_dataset_tag.
    • Nextclade output columns for flu now explicitly state either HA or NA.
    • Padded assemblies are now provided to MAFFT and VADR to prevent task failures. In these assemblies, any - characters in the final assembly file are removed and any . characters are replaced by N.
  • Terra_2_NCBI

    • Skipping BioSample submission via the skip_biosample optional input now also removes the requirement to have BioSample metadata in your data table.
  • Augur_Prep_PHB and Augur_PHB

    • RSV-A and RSV-B can now be analyzed with the Augur workflows.
    • Metadata is no longer required to run Augur. If metadata is not provided, only a distance tree will be created.
  • kSNP3 and other phylogenetic inference workflows

    • Outputs from phylogenetic workflows (SNP matrices) and the summarize_data task will now have a properly toggleable Phandango coloring suffix.
    • The phandango_coloring optional input is now off by default.
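The padded-assembly handling described under TheiaCoV Flu Track above can be illustrated with a minimal Python sketch. This is not the workflow's actual implementation; the function name sanitize_padded_assembly and the example sequence are hypothetical, and the sketch assumes that only sequence lines (not FASTA header lines) should be modified.

```python
def sanitize_padded_assembly(fasta_text: str) -> str:
    """Remove '-' and replace '.' with 'N' in sequence lines only.

    Hypothetical helper for illustration; header lines (starting with '>')
    are left untouched because they may legitimately contain '-' or '.'.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(line.replace("-", "").replace(".", "N"))
    return "\n".join(cleaned) + "\n"


# Made-up example record, not real data.
example = ">sample-1_HA\nATG--CC..GTA\n"
print(sanitize_padded_assembly(example), end="")
```

Leaving the header line untouched matters here because sample names often contain dashes or dots that should not be altered.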

Docker container updates:

  • IRMA has been updated to version v1.1.5
  • AMRFinderPlus has been updated to version v3.12.8-2024-05-02.2
  • ts_mlst database has been updated as of 2024-06-01
  • Pangolin database has been updated to pdata v1.27

🐛 Bug fixes and small improvements:

  • TheiaProk_ONT and TheiaProk_FASTA: Hicap was being run in TheiaProk_ONT but the outputs were never appearing in the data table! This has been fixed.
  • All TheiaCoV workflows: Unsupported organisms will no longer cause workflow failures.
  • Terra_2_NCBI: Fixed a typo when using the Wastewater Biosample package that was causing an error.
  • Freyja_Dashboard: The freyja_dasbhoard output variable now correctly says freyja_dashboard.
  • Workflows that accept String inputs used for naming: Several input variables, such as cluster_name, now accept Strings with whitespace; spaces are automatically converted to dashes.
  • All workflows: Runtime parameters have been adjusted for several tasks.
  • TheiaCoV Flu Track: A bug that caused IRMA to run out of disk space has been fixed. Additionally, a bug related to empty HA segment FASTA files, which affected Flu B samples, has been fixed.
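
The whitespace handling for naming inputs such as cluster_name, listed among the fixes above, converts spaces to dashes. A minimal Python sketch of that idea follows. The function name dashify is hypothetical, and the sketch assumes only literal space characters are converted, one dash per space; the workflows' actual behaviour for tabs or repeated spaces is not stated in these notes.

```python
def dashify(name: str) -> str:
    """Replace each literal space with a dash (hypothetical helper).

    Assumption: tabs and other whitespace are left as-is, and runs of
    spaces are not collapsed.
    """
    return name.replace(" ", "-")


print(dashify("my cluster name"))
```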

What's Changed

  • TheiaCoV wf support for RSV - run nextclade by default and small optimizations (kraken_target_organism, genome_length) by @kapsakcj in #436
  • [New workflow - internal] Gambitcore for assembly quality assessment with GAMBIT by @cimendes in #466
  • [TheiaProk_ONT and TheiaCoV_ONT] Expose additional QC metrics from nanoplot for both raw and clean reads by @cimendes in #452
  • Exposing r1 and r2 mean_q_clean and mean_readlength_clean by @jrotieno in #455
  • [TheiaProk_ONT] add patch fix to kmc estimated genome size to not go over 10Mbp by @cimendes in #459
  • Add abricate as optional module by @jrotieno in #431
  • [TheiaProk_ONT] Add Kraken2 as part of read_qc by @cimendes in #438
  • [Flu] Assembly mean coverage & read screen clean-up by @sage-wright in #469
  • [Freyja_Dashboard] fix typo in freyja_dashboard output File variable name by @AndrewLangvt in #482
  • [Terra_2_NCBI] remove metadata requirements with skip_biosample == true by @sage-wright in #475
  • Augur Updates for RSV-A and RSV-B by @jrotieno in #478
  • [kSNP3] fix behaviour when phandango colouring is set to false by @cimendes in #496
  • [Internal] Updating runtime parameters by @sage-wright in #494
  • Automatically convert spaces to dashes in workflows that accept strings by @AndrewLangvt in #498
  • [TheiaCoV] Enable user to run TheiaCoV with an unsupported organism by @sage-wright in #501
  • [AMRFinderPlus] parse BETA-LACTAM genes and subclasses into individual output columns by @sage-wright in #505
  • IRMA bug fixes & improvements; theiacov_illumina_pe wf updates for Flu by @kapsakcj in #468
  • Augur_PHB: Set sample_metadata_tsvs input to optional by @jrotieno in #503
  • [Internal - Gambitcore] Downgrade database to stable 1.3.0 version by @cimendes in #473
  • [TheiaCoV_Illumina_PE & _ONT] Create sub-workflow for flu-specific modules by @sage-wright in #502
  • [TheiaProk] Add abricate module for vibrio characterization by @cimendes in #429
  • [TheiaProk] expose hicap outputs in theiaprok_fasta and theiaprok_ont by @cimendes in #508
  • Fix typo in Terra_2_NCBI Wastewater metadata by @michellescribner in #519
  • [TheiaProk] Update amrfinderplus to v3.12.8; DB: v2024-05-02.2; reduce compute resources by @kapsakcj in #514
  • [TheiaProk] upgrade mlst docker image to 2024-06-01 staphb build; reduced runtime parameters; enable preemptible by @kapsakcj in #516
  • update default pangolin docker (pdata 1.27) & nextclade dataset tags for SC2, Flu, Mpox by @kapsakcj in #521
  • [all workflows] upgrade PHB version to 2.1.0 by @kapsakcj in #517

Full Changelog: v2.0.1...v2.1.0