Merge pull request #106 from phac-nml/dev
Update to version 0.4.0
mattheww95 authored Sep 4, 2024
2 parents 31c494a + 7093f01 commit c32ce85
Showing 92 changed files with 41,825 additions and 247 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/spellcheck.yml
@@ -0,0 +1,31 @@

name: Spellcheck Action

on:
  # Enabling manual test
  # REF: https://stackoverflow.com/questions/58933155/manual-workflow-triggers-in-github-actions
  workflow_dispatch:
  push:

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:

      # The checkout step
      - uses: actions/checkout@v4

      - uses: rojopolis/spellcheck-github-actions@v0
        name: Spellcheck
        with:
          source_files: README.md CHANGELOG.md nextflow_schema.json assets/schema_input.json
          task_name: Markdown
          output_file: spellcheck-output.txt

      - uses: actions/upload-artifact@v4
        if: '!cancelled()'
        name: Archive spellcheck output
        with:
          name: Spellcheck artifact
          path: spellcheck-output.txt
23 changes: 23 additions & 0 deletions .spellcheck.yaml
@@ -0,0 +1,23 @@
matrix:
- name: Markdown
  sources:
  - '!venv/**/*.md|**/*.md'
  default_encoding: utf-8
  aspell:
    lang: en
    ignore-case: true
  dictionary:
    encoding: utf-8
    wordlists:
    - .wordlist.txt
  pipeline:
  - pyspelling.filters.markdown:
      markdown_extensions:
      - pymdownx.superfences
      - pymdownx.striphtml
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
      - small
171 changes: 171 additions & 0 deletions .wordlist.txt
@@ -0,0 +1,171 @@
ECTyper
IRIDA
SISTR
StarAMR
JSON
UI
Bakta
Nextflow
fastp
pointfinder
mikrokondo
subtyping
phac
nml
metagenomic
mikrokondo
StarAMR
readme
nf
iridanext
contig
contigs
Quast
AMR
kat
genomic
bakta
SeqtkBaseCount
locidex
Kraken
params
param
minimap
irida
config
StackOverflow
maxRetries
samtools
QCSummary
QCMessage
Changelog
Biotechnol
doi
Shigella
Vibrio
Alneberg
Andreas
Ewels
Fillinger
Harshil
Maxime
Nahnsen
Paolo
Peltzer
Tommaso
Ulysse
Wilm
bioinformatics
https
oschwengers
Flye
pacbio
Nanopore
Illumina
FinalReads
FinalReports
PostProcessing
FinalAssembly
Subworkflow
outdir
useage
FOFN
fasta
fastq
gzip
speciation
GTDB
Apptainer
arounds
Apptainer
apptainer
charliecloud
gitpod
podman
shifter
aspc
gc
https
genotypic
bioinformatic
pre
AminoAcid
Archae
CPUs
CheckM
Eukaryotic
Gloval
Kmer
MLST
Miniumum
PathCheck
Seemann's
Torsten
Za
abricate
allOf
ambig
autodetection
cgMLST
cgmlst
checkm
coli
configs
copyNoFollow
cov
cpus
csv
dedup
deduplication
dehosting
enum
evalue
fas
fp
githubusercontent
hifi
hpcov
hpid
ident
identiy
idx
kitome
kmer
longread
lx
mh
mimetype
mlst
mobrecon
nNote
nTypically
opcov
opid
phred
plaintext
plasmid
polyG
polyX
polyg
polyx
rellink
repo
runtime
seqs
serotyping
slurm
snps
symlink
tracedir
unicycler
validationFailUnrecognisedParams
validationShowHiddenParams
zA
kondo
gz
fq
ast
dependentRequired
errorMessage
Samplesheet
TSeemann's
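
Review note: the word list above repeats a few entries verbatim (`mikrokondo`, `StarAMR`, `Apptainer`, `https`), and because `.spellcheck.yaml` sets `ignore-case: true`, pairs such as `Bakta`/`bakta` are probably redundant as well. The following is a minimal sketch of a check for such repeats, not part of this commit. The helper names `find_duplicates` and `read_wordlist` are hypothetical, and the script assumes the list holds one word per line.

```python
from collections import Counter
from pathlib import Path


def find_duplicates(words, ignore_case=False):
    """Return entries that occur more than once, in first-seen order.

    With ignore_case=True the entries are compared (and returned) lowercased.
    """
    keys = [w.lower() if ignore_case else w for w in words]
    counts = Counter(keys)
    seen = set()
    duplicates = []
    for key in keys:
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            duplicates.append(key)
    return duplicates


def read_wordlist(path):
    """Read one word per line, skipping blank lines (assumed file layout)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Assumed location: the repository root, as referenced in .spellcheck.yaml.
    wordlist = Path(".wordlist.txt")
    if wordlist.exists():
        for word in find_duplicates(read_wordlist(wordlist), ignore_case=True):
            print(word)
```

Run from the repository root, this prints each repeated entry once. Duplicates do not change the spellcheck result, so removing them is only a tidiness improvement.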
38 changes: 36 additions & 2 deletions CHANGELOG.md
@@ -3,6 +3,39 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.4.0] - 2024-09-04

### `Changed`

- Removed quay.io docker repo tags [PR 94](https://github.com/phac-nml/mikrokondo/pull/94)

### `Updated`

- Added QCMessage and QCSummary fields for metagenomic sequencing runs. See [PR 103](https://github.com/phac-nml/mikrokondo/pull/103)

- Updated TSeemann's MLST default container to use version 2.23.0 of `mlst`. See [PR 97](https://github.com/phac-nml/mikrokondo/pull/97)

- Moved allele schema parameters under one option in the nextflow_schema.json. See [PR 104](https://github.com/phac-nml/mikrokondo/pull/104)


### `Fixed`

- Fixed typo in metagenomic QC message. See [PR 103](https://github.com/phac-nml/mikrokondo/pull/103)

- Fixed spelling issues issues in config values. See [PR 95](https://github.com/phac-nml/mikrokondo/pull/95)

- Fixed the headers specified in the nextflow.config file for Kraken2. See [PR 96](https://github.com/phac-nml/mikrokondo/pull/96)

### `Added`

- Added additional organism QC parameters to defaults. See [PR 105](https://github.com/phac-nml/mikrokondo/pull/105)

- Updated locidex to version 0.2.3. See [PR 96](https://github.com/phac-nml/mikrokondo/pull/96)

- Added module for automatic selection of locidex databases through configuration of a locidex database collection. See [PR 96](https://github.com/phac-nml/mikrokondo/pull/96)

- Added module for summary of basic allele metrics, listing of missing alleles and reporting of specific alleles. See [PR 96](https://github.com/phac-nml/mikrokondo/pull/96)

## [0.3.0] - 2024-07-04

### `Changed`
@@ -57,7 +90,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Updated StarAMR point finder DB selection to resolve error when in db selection when a database is not selected addressing issue. See [PR 74](https://github.com/phac-nml/mikrokondo/pull/74)

-- Fixed calculation of SeqtkBaseCount value include counts for both pairs of paird-end reads. See [PR 65](https://github.com/phac-nml/mikrokondo/pull/65).
+- Fixed calculation of SeqtkBaseCount value include counts for both pairs of paired-end reads. See [PR 65](https://github.com/phac-nml/mikrokondo/pull/65).

## `Changed`

@@ -72,7 +105,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- Changed default values for database parameters `--dehosting_idx`, `--mash_sketch`, `--kraken2_db`, and `--bakta_db` to null. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71)
-- Enabled checking for existance of database files in JSON Schema to avoid issues with staging non-existent files in Azure. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71).
+- Enabled checking for existence of database files in JSON Schema to avoid issues with staging non-existent files in Azure. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71).
- Set `--kraken2_db` to be a required parameter for the pipeline. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71)
- Hide bakta parameters from IRIDA Next UI. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71)

@@ -110,6 +143,7 @@ Initial release of phac-nml/mikrokondo. Mikrokondo currently supports: read trim

- Added integration testing using [nf-test](https://www.nf-test.com/).

[0.4.0]: https://github.com/phac-nml/mikrokondo/releases/tag/0.4.0
[0.3.0]: https://github.com/phac-nml/mikrokondo/releases/tag/0.3.0
[0.2.1]: https://github.com/phac-nml/mikrokondo/releases/tag/0.2.1
[0.2.0]: https://github.com/phac-nml/mikrokondo/releases/tag/0.2.0
16 changes: 8 additions & 8 deletions README.md
@@ -69,14 +69,14 @@ Nextflow is required to run mikrokondo (requires Linux), and instructions for it

## Step 2: Choose a Container Engine

-Nextflow and Mikrokondo only supports running the pipeline using containers such as: Docker, Singularity (now apptainer), podman, gitpod, shifter and charliecloud. Currently only usage with Singularity has been fully tested, (Docker and Apptainer have only been partially tested) but support for each of the container services exists.
+Nextflow and Mikrokondo only supports running the pipeline using containers such as: Docker, Singularity (now apptainer), podman, gitpod, shifter and charliecloud. Currently only usage with Singularity has been fully tested, (Docker and Apptainer have only been partially tested) but support for each of the container services exists.

->[!Note]
+>[!Note]
>Singularity was adopted by the Linux Foundation and is now called Apptainer. Singularity still exists, but it is likely newer installs will use Apptainer.
### Docker or Singularity?

-Docker or Singularity (Apptainer) Docker requires root privileges which can can make it a hassle to install on computing clusters (there are work arounds). Apptainer/Singularity does not, so running the pipeline using Apptainer/Singularity is the recommended method for running the pipeline.
+Docker or Singularity (Apptainer) Docker requires root privileges which can can make it a hassle to install on computing clusters (there are workarounds). Apptainer/Singularity does not, so running the pipeline using Apptainer/Singularity is the recommended method for running the pipeline.

## Step 3: Install dependencies

@@ -93,7 +93,7 @@ Besides the Nextflow run time (requires Java), and container engine the dependen

- [GTDB Mash Sketch](https://zenodo.org/record/8408361): required for speciation and determination if sample is metagenomic
- [Decontamination Index](https://zenodo.org/record/8408557): Required for decontamination of reads (it is simply a minimap2 index)
-- [Kraken2 nt database](https://benlangmead.github.io/aws-indexes/k2): Required for binning of metagenommic data and is an alternative to using Mash for speciation
+- [Kraken2 nt database](https://benlangmead.github.io/aws-indexes/k2): Required for binning of metagenomic data and is an alternative to using Mash for speciation
- [Bakta database](https://zenodo.org/record/7669534): Running Bakta is optional and there is a light database option, however the full one is recommended. You will have to unzip and un-tar the database for usage. You can skip running Bakta however making the requirement of downloading this database **optional**.
- [StarAMR database](https://github.com/phac-nml/staramr#database-build): Running StarAMR is optional and requires downloading the StarAMR databases. Also if you wish to avoid downloading the database, the container for StarAMR has a database included which mikrokondo will default to using if one is not specified making this requirement **optional**.

@@ -143,7 +143,7 @@ Under the usage section you can find example commands, instructions for configur

### Data Input/formats

-Mikrokondo requires two things as input:
+Mikrokondo requires two things as input:
1. **Sample files** - fastq and fasta must be in gzip format
2. **Sample sheet** - this FOFN (file of file names) contains sample names and allows user to combine read-sets. The following header fields are accepted:
- sample
@@ -152,8 +152,8 @@ Mikrokondo requires two things as input:
- long_reads
- assembly

-For more information see the [useage docs](https://phac-nml.github.io/mikrokondo/usage/useage/).
+For more information see the [usage docs](https://phac-nml.github.io/mikrokondo/usage/useage/).

### Output/Results

All output files will be written into the `outdir` (specified by the user). More explicit tool results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. Here is a brief description of the outdir structure (though in brief the further into the structure you head, the further in the workflow the tool has been run):
@@ -206,7 +206,7 @@ Add `--profile singularity` to switch from using docker by default to using sing

## Troubleshooting and FAQs:

-Within release 0.1.0, Bakta is currently skipped however it can be enabled from the command line or within the nextflow.config (please check the docs for more information). It has been disabled by default due issues in using the latest bakta database releases due to an issue with `amr_finder` there are fixes available and older databases still work however they have not been tested. A user can still enable Bakta themselves or fix the database. More information is provided here: https://github.com/oschwengers/bakta/issues/268
+Within release 0.1.0, Bakta is currently skipped however it can be enabled from the command line or within the nextflow.config (please check the docs for more information). It has been disabled by default due issues in using the latest bakta database releases due to an issue with `amr_finder` there are fixes available and older databases still work however they have not been tested. A user can still enable Bakta themselves or fix the database. More information is provided here: https://github.com/oschwengers/bakta/issues/268

For a list of common issues or errors and their solutions, please read our [FAQ section](https://phac-nml.github.io/mikrokondo/troubleshooting/FAQ/).

13 changes: 13 additions & 0 deletions conf/irida_next.config
@@ -33,6 +33,7 @@ iridanext {
"**/Assembly/Subtyping/SISTR/*.sistr.allele.subtyping.json",
"**/Assembly/Subtyping/SISTR/*.sistr.cgmlst.subtyping.csv",
"**/Assembly/Subtyping/Locidex/Report/*.mlst.subtyping.json.gz",
"**/Assembly/Subtyping/Locidex/Summary/*.locidex.summary.json",
"**/FinalReports/FlattenedReports/*.flat_sample.json.gz",
"**/Assembly/Annotation/Abricate/*abricate.annotation.txt",
"**/Assembly/Annotation/Mobsuite/Recon/*/*mobtyper_results*.txt",
@@ -64,6 +65,12 @@ iridanext {
"meta.downsampled" : "Downsampled",
"SpeciesTopHit" : "predicted_identification_name",
"IdentificationMethod" : "predicted_identification_method",
"LocidexDatabaseInformation.db_name" : "locidex_db_name",
"LocidexDatabaseInformation.db_date" : "locidex_db_date",
"LocidexDatabaseInformation.db_version" : "locidex_db_version",
"LocidexSummary.TotalLoci" : "total_loci",
"LocidexSummary.AllelesPresent" : "count_loci_found",
"LocidexSummary.MissingAllelesCount" : "count_loci_missing",
"ECTyperSubtyping.0.Database" : "ECTyper Database",
"ECTyperSubtyping.0.Evidence" : "ECTyper Evidence",
"ECTyperSubtyping.0.GeneCoverages(%)" : "ECTyper GeneCoverages (%)",
@@ -145,6 +152,12 @@ iridanext {
"IdentificationMethod",
"meta.downsampled",
"SpeciesTopHit",
"LocidexDatabaseInformation.db_name",
"LocidexDatabaseInformation.db_date",
"LocidexDatabaseInformation.db_version",
"LocidexSummary.TotalLoci",
"LocidexSummary.AllelesPresent",
"LocidexSummary.MissingAllelesCount",
"ECTyperSubtyping.0.Database",
"ECTyperSubtyping.0.Evidence",
"ECTyperSubtyping.0.GeneCoverages(%)",