[New Workflow] AMR-search for neisseria gonorrhoeae samples #743

Draft
wants to merge 14 commits into
base: main
5 changes: 5 additions & 0 deletions .dockstore.yml
@@ -296,5 +296,10 @@ workflows:
- name: Clair3_Variants_ONT_PHB
subclass: WDL
primaryDescriptorPath: /workflows/standalone_modules/wf_clair3_variants_ont.wdl
testParameterFiles:
- /tests/inputs/empty.json
- name: AMR_Search_PHB
subclass: WDL
primaryDescriptorPath: /workflows/utilities/wf_amr_search.wdl
testParameterFiles:
- /tests/inputs/empty.json
Binary file added docs/assets/figures/AMR_Search.png
24 changes: 24 additions & 0 deletions docs/workflows/genomic_characterization/theiaprok.md
@@ -1462,6 +1462,30 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after

??? toggle "_Neisseria_ spp."
##### _Neisseria_ spp. {#neisseria}

??? task "`amr_search`: _Neisseria gonorrhoeae_ antimicrobial resistance profiling"

This task performs *in silico* antimicrobial resistance (AMR) profiling for *Neisseria gonorrhoeae* using **AMRsearch**, the primary tool used by [Pathogenwatch](https://pathogen.watch/) to genotype assembled microbial genomes and infer AMR phenotypes from them.

**AMRsearch** screens against Pathogenwatch's library of curated genotypes and inferred phenotypes, developed in collaboration with community experts. Resistance phenotypes are determined based on both **resistance genes** and **mutations**, and the system accounts for interactions between multiple SNPs, genes, and suppressors. Predictions follow **S/I/R classification** (*Sensitive, Intermediate, Resistant*).

The AMR search runs when the *TheiaProk* workflows identify *Neisseria gonorrhoeae* as the taxon. The database is selected by NCBI taxon code; the default for *N. gonorrhoeae* is **485**.

**Outputs:**

- **JSON Output:** Contains the complete AMR profile, including detailed **resistance state**, detected **resistance genes/mutations**, and supporting **BLAST results**.

- **CSV & PNG Tables:** A downstream task, [`parse_amr_json.wdl`](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/data_handling/parse_amr_json.wdl), extracts and formats results into a **CSV file** and **PNG summary table** for easier visualization.

!!! techdetails "amr_search Technical Details"

| | Links |
| --- | --- |
| Task | [task_amr_search.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/drug_resistance/task_amr_search.wdl) |
| Software Source Code | [AMRsearch](https://github.com/pathogenwatch-oss/amr-search) |
| Software Documentation | [Pathogenwatch](https://cgps.gitbook.io/pathogenwatch) |
| Original Publication(s) | [PAARSNP: rapid genotypic resistance prediction for *Neisseria gonorrhoeae*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545138/) |

??? task "`ngmaster`: _Neisseria gonorrhoeae_ sequence typing"

NG-MAST is currently the most widely used method for epidemiological surveillance of *Neisseria gonorrhoeae*. This tool is targeted at clinical and research microbiology laboratories that have performed WGS of *N. gonorrhoeae* isolates and wish to understand the molecular context of their data in comparison to previously published epidemiological studies. As WGS becomes more routinely performed, *NGMASTER*
84 changes: 84 additions & 0 deletions docs/workflows/standalone/amr_search.md
@@ -0,0 +1,84 @@
# AMR Search

## Quick Facts

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | vX.X.X | Yes | Sample-level |

## AMR_Search_PHB

The AMR_Search workflow is a standalone version of Pathogenwatch's AMR profiling functionality, built on Pathogenwatch's `AMRsearch` tool.

Only the species listed below are currently supported. Use the NCBI code from this table to select the matching library.

| Species | NCBI Code |
|------------------------------|-----------|
| _Neisseria gonorrhoeae_ | 485 |
| _Staphylococcus aureus_ | 1280 |
| _Salmonella_ Typhi | 90370 |
| _Streptococcus pneumoniae_ | 1313 |
| _Klebsiella_ | 570 |
| _Escherichia_ | 561 |
| _Mycobacterium tuberculosis_ | 1773 |
| _Candida auris_ | 498019 |
| _Vibrio cholerae_ | 666 |
| _Campylobacter_ | 194 |
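
As a rough illustration of how the table above can be used to choose the `amr_search_database` value, here is a minimal Python sketch. The names `SUPPORTED_TAXA` and `taxon_code_for` are hypothetical helpers, not part of the workflow, and the genus-level fallback is an assumption based on the table listing some entries by genus only.

```python
from typing import Optional

# Species/genus to NCBI code, copied from the supported-species table.
# SUPPORTED_TAXA and taxon_code_for are illustrative names only.
SUPPORTED_TAXA = {
    "Neisseria gonorrhoeae": "485",
    "Staphylococcus aureus": "1280",
    "Salmonella Typhi": "90370",
    "Streptococcus pneumoniae": "1313",
    "Klebsiella": "570",
    "Escherichia": "561",
    "Mycobacterium tuberculosis": "1773",
    "Candida auris": "498019",
    "Vibrio cholerae": "666",
    "Campylobacter": "194",
}


def taxon_code_for(organism: str) -> Optional[str]:
    """Return the NCBI code for a supported organism, or None if unsupported."""
    # Collapse repeated whitespace so "Neisseria  gonorrhoeae" still matches.
    name = " ".join(organism.split())
    if name in SUPPORTED_TAXA:
        return SUPPORTED_TAXA[name]
    # Assumed fallback: entries listed by genus only match any species in it.
    genus = name.split(" ")[0] if name else ""
    return SUPPORTED_TAXA.get(genus)


print(taxon_code_for("Neisseria gonorrhoeae"))
print(taxon_code_for("Klebsiella pneumoniae"))
```

An organism outside the table, such as *Neisseria meningitidis*, returns `None`, which a caller could use to skip the workflow rather than pass an unsupported code.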

!!! caption "AMR_Search Workflow Diagram"
![AMR_Search Workflow Diagram](../../assets/figures/AMR_Search.png)

### Inputs

Inputs are listed in the order they appear on Terra.

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| amr_search_workflow | **amr_search_database** | String | NCBI taxon code for the sample's known taxonomy; see the supported species above || Required |
| amr_search_workflow | **input_fasta** | File | A microbial assembly file || Required |
| amr_search_workflow | **samplename** | String | Identifier to prefix to output file names || Required |
| amr_search | **cpu** | Integer | Number of CPUs to allocate to the task |2| Optional |
| amr_search | **disk_size** | Integer | Amount of storage (in GB) to allocate to the task |50| Optional |
| amr_search | **docker** | String | The docker container to use for the task |us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.0| Optional |
| amr_search | **memory** | Integer | Amount of memory/RAM (in GB) to allocate to the task |8| Optional |
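
For command-line runs, the inputs in the table above are typically supplied as a JSON file. The sketch below builds one in Python, assuming the `<workflow>.<input>` and `<workflow>.<task>.<input>` key convention used by WDL runners such as Cromwell and miniwdl. The function name `build_inputs`, the FASTA path, and the sample name are placeholders, not values from this workflow.

```python
import json


def build_inputs(fasta_path, samplename, taxon_code, cpu=None):
    """Assemble an inputs dictionary for amr_search_workflow (illustrative)."""
    inputs = {
        "amr_search_workflow.input_fasta": fasta_path,
        "amr_search_workflow.samplename": samplename,
        # The workflow declares this input as a String, so cast numeric codes.
        "amr_search_workflow.amr_search_database": str(taxon_code),
    }
    if cpu is not None:
        # Optional task-level override, addressed through the call name.
        inputs["amr_search_workflow.amr_search.cpu"] = int(cpu)
    return inputs


# Placeholder path and sample name; 485 is the N. gonorrhoeae code.
example = build_inputs("assemblies/sample01.fasta", "sample01", 485)
print(json.dumps(example, indent=2))
```

Writing the printed JSON to a file gives an inputs file that can be passed to the runner alongside `wf_amr_search.wdl`.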

### Workflow Tasks

Description of the workflow tasks

??? task "`amr_search`: Antimicrobial resistance profiling"

This task performs *in silico* antimicrobial resistance (AMR) profiling for supported species using **AMRsearch**, the primary tool used by [Pathogenwatch](https://pathogen.watch/) to genotype assembled microbial genomes and infer AMR phenotypes from them.

**AMRsearch** screens against Pathogenwatch's library of curated genotypes and inferred phenotypes, developed in collaboration with community experts. Resistance phenotypes are determined based on both **resistance genes** and **mutations**, and the system accounts for interactions between multiple SNPs, genes, and suppressors. Predictions follow **S/I/R classification** (*Sensitive, Intermediate, Resistant*).

**Outputs:**

- **JSON Output:** Contains the complete AMR profile, including detailed **resistance state**, detected **resistance genes/mutations**, and supporting **BLAST results**.

- **CSV & PNG Tables:** An incorporated Python script, `parse_amr_json.py`, extracts and formats results into a **CSV file** and **PNG summary table** for easier visualization.

!!! techdetails "amr_search Technical Details"

| | Links |
| --- | --- |
| Task | [task_amr_search.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/fc-amr-search-dev/tasks/gene_typing/drug_resistance/task_amr_search.wdl) |
| Software Source Code | [AMRsearch](https://github.com/pathogenwatch-oss/amr-search) |
| Software Documentation | [Pathogenwatch](https://cgps.gitbook.io/pathogenwatch) |
| Original Publication(s) | [PAARSNP: rapid genotypic resistance prediction for *Neisseria gonorrhoeae*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545138/) |

### Outputs

| **Variable** | **Type** | **Description** |
|---|---|---|
| amr_results_csv | File | CSV formatted AMR profile |
| amr_results_png | File | PNG formatted AMR profile |
| amr_search_results | File | JSON formatted AMR profile including BLAST results |
| amr_search_docker | String | Docker image used to run AMR_Search |
| amr_search_version | String | Version of AMR_Search libraries used |

## References

> [Pathogenwatch AMRsearch](https://github.com/pathogenwatch-oss/amr-search)
<!-- -->
> [PAARSNP](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545138/)
1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_alphabetically.md
@@ -10,6 +10,7 @@ title: Alphabetical Workflows

| **Name** | **Description** | **Applicable Kingdom** | **Workflow Level** | **Command-line Compatibility**[^1] | **Last Known Changes** | **Dockstore** |
|---|---|---|---|---|---|---|
| [**AMR_Search**](../workflows/standalone/amr_search.md) | Perform AMR profiling on microbial assemblies, mimicking Pathogenwatch | Any taxa | Sample-level | Yes | vX.X.X | [AMR_Search_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/AMR_Search_PHB:fc-amr-search-dev?tab=info) |
| [**Assembly_Fetch**](../workflows/data_import/assembly_fetch.md) | Download assemblies from NCBI, after optionally identifying the closest RefSeq reference genome to your own draft assembly | Any taxa | Sample-level | Yes | v1.3.0 | [Assembly_Fetch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Assembly_Fetch_PHB:main?tab=info) |
| [**Augur**](../workflows/phylogenetic_construction/augur.md) | Phylogenetic analysis for viral pathogens | Viral | Sample-level, Set-level | Yes | vX.X.X | [Augur_Prep_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Augur_Prep_PHB:main?tab=info), [Augur_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Augur_PHB:main?tab=info) |
| [**BaseSpace_Fetch**](../workflows/data_import/basespace_fetch.md)| Import data from BaseSpace into Terra | Any taxa | Sample-level | Yes | v2.0.0 | [BaseSpace_Fetch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/BaseSpace_Fetch_PHB:main?tab=info) |
1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_kingdom.md
@@ -12,6 +12,7 @@ title: Workflows by Kingdom

| **Name** | **Description** | **Taxa** | **Workflow Level** | **Command-line Compatible**[^1] | **Last known changes** | **Dockstore** |
|---|---|---|---|---|---|---|
| [**AMR_Search**](../workflows/standalone/amr_search.md) | Perform AMR profiling on microbial assemblies, mimicking Pathogenwatch | Any taxa | Sample-level | Yes | vX.X.X | [AMR_Search_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/AMR_Search_PHB:fc-amr-search-dev?tab=info) |
| [**Assembly_Fetch**](../workflows/data_import/assembly_fetch.md) | Download assemblies from NCBI, after optionally identifying the closest RefSeq reference genome to your own draft assembly | Any taxa | Sample-level | Yes | v1.3.0 | [Assembly_Fetch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Assembly_Fetch_PHB:main?tab=info) |
| [**BaseSpace_Fetch**](../workflows/data_import/basespace_fetch.md)| Import data from BaseSpace into Terra | Any taxa | Sample-level | Yes | v2.0.0 | [BaseSpace_Fetch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/BaseSpace_Fetch_PHB:main?tab=info) |
| [**Clair3_Variants**](../workflows/phylogenetic_construction/clair3_variants.md)| ONT Variant Caller | Any taxa | Sample-level | Yes | vX.X.X | [Clair3_Variants_ONT_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Clair3_Variants_ONT_PHB:mb-clair3-variant-dev?tab=info) |
1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_type.md
@@ -98,6 +98,7 @@ title: Workflows by Type

| **Name** | **Description** | **Applicable Kingdom** | **Workflow Level** | **Command-line Compatibility**[^1] | **Last Known Changes** | **Dockstore** |
|---|---|---|---|---|---|---|
| [**AMR_Search**](../workflows/standalone/amr_search.md) | Perform AMR profiling on microbial assemblies, mimicking Pathogenwatch | Any taxa | Sample-level | Yes | vX.X.X | [AMR_Search_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/AMR_Search_PHB:fc-amr-search-dev?tab=info) |
| [**Cauris_CladeTyper**](../workflows/standalone/cauris_cladetyper.md)| C. auris clade assignment | Mycotics | Sample-level | Yes | vX.X.X | [Cauris_CladeTyper_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Cauris_CladeTyper_PHB:main?tab=info) |
| [**Concatenate_Illumina_Lanes**](../workflows/standalone/concatenate_illumina_lanes.md)| Concatenate Illumina lanes for a single sample | Any taxa | Sample-level | Yes | v2.3.0 | [Concatenate_Illumina_Lanes_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Concatenate_Illumina_Lanes_PHB:main?tab=info) |
| [**GAMBIT_Query**](../workflows/standalone/gambit_query.md)| Taxon identification of genome assembly using GAMBIT | Bacteria, Mycotics | Sample-level | Yes | v2.0.0 | [Gambit_Query_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Gambit_Query_PHB:main?tab=info) |
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -54,6 +54,7 @@ nav:
- Transfer_Column_Content: workflows/data_export/transfer_column_content.md
- Zip_Column_Content: workflows/data_export/zip_column_content.md
- Standalone:
- AMR_Search: workflows/standalone/amr_search.md
- Cauris_CladeTyper: workflows/standalone/cauris_cladetyper.md
- Concatenate_Illumina_Lanes: workflows/standalone/concatenate_illumina_lanes.md
- GAMBIT_Query: workflows/standalone/gambit_query.md
@@ -67,6 +68,7 @@ nav:
- Workflows by Kingdom:
- Overview Table: workflows_overview/workflows_kingdom.md
- Any Taxa:
- AMR_Search: workflows/standalone/amr_search.md
- Assembly_Fetch: workflows/data_import/assembly_fetch.md
- BaseSpace_Fetch: workflows/data_import/basespace_fetch.md
- Clair3_Variants_ONT: workflows/phylogenetic_construction/clair3_variants.md
@@ -124,6 +126,7 @@ nav:
- VADR_Update: workflows/genomic_characterization/vadr_update.md
- Workflows Alphabetically:
- Overview Table: workflows_overview/workflows_alphabetically.md
- AMR_Search: workflows/standalone/amr_search.md
- Assembly_Fetch: workflows/data_import/assembly_fetch.md
- Augur: workflows/phylogenetic_construction/augur.md
- BaseSpace_Fetch: workflows/data_import/basespace_fetch.md
47 changes: 47 additions & 0 deletions tasks/gene_typing/drug_resistance/task_amr_search.wdl
@@ -0,0 +1,47 @@
version 1.0

task amr_search {
input {
File input_fasta
String samplename
String amr_search_database = "485"
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.0"
Int cpu = 2
Int disk_size = 50
Int memory = 8
}
command <<<
# Extract base name without path or extension
input_base=$(basename "~{input_fasta}")
input_base=${input_base%.*}
echo "DEBUG: input_base = $input_base"

# Run the tool
java -jar /paarsnp/paarsnp.jar \
-i ~{input_fasta} \
-s ~{amr_search_database}

# Move the output file from the input directory to the working directory
mv "$(dirname "~{input_fasta}")/${input_base}_paarsnp.jsn" "./~{samplename}_paarsnp_results.jsn"

# Script housed within the image; https://github.com/theiagen/theiagen_docker_builds/tree/awh-amrsearch-image/amrsearch/0.0.20
python3 /scripts/parse_amr_json.py \
Review comment (Contributor): maybe worth a comment here with a link to the location of this script, for posterity's sake

Reply (Contributor): Sounds good! I'll put the link to the current dev branch of the docker builds repo and will update it when it gets merged.


./~{samplename}_paarsnp_results.jsn \
~{samplename}
>>>
output {
File json_output = "~{samplename}_paarsnp_results.jsn"
File output_csv = "~{samplename}_amr_results.csv"
File output_png = "~{samplename}_amr_results.png"
File output_version = "output_amr_version.txt"
String amr_search_docker = docker
}

runtime {
memory: "~{memory} GB"
cpu: cpu
docker: docker
disks: "local-disk " + disk_size + " SSD"
maxRetries: 1
}
}
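
The file-name handling in the command block above can be traced with a short Python sketch. It mirrors the shell steps (`basename`, `${input_base%.*}`, then `mv`) and assumes, as the `mv` step implies, that the tool writes `<input_base>_paarsnp.jsn` next to the input FASTA. The function names and the example paths are illustrative only.

```python
from pathlib import PurePosixPath


def paarsnp_output_path(input_fasta: str) -> str:
    """Path where the task expects the tool's JSON output to appear."""
    path = PurePosixPath(input_fasta)
    name = path.name  # equivalent of: basename <input_fasta>
    # ${input_base%.*} strips only the last ".suffix"; names without a dot are unchanged.
    base = name.rsplit(".", 1)[0] if "." in name else name
    # Assumed from the mv step: output sits in the input file's directory.
    return str(path.parent / f"{base}_paarsnp.jsn")


def renamed_output(samplename: str) -> str:
    """Name the task gives the JSON after moving it to the working directory."""
    return f"{samplename}_paarsnp_results.jsn"


print(paarsnp_output_path("/inputs/run1/sample01.fasta"))
print(renamed_output("sample01"))
```

Because only the last extension is stripped, a doubly suffixed input such as `sample.fasta.gz` would yield `sample.fasta_paarsnp.jsn`, which is worth keeping in mind when naming assemblies.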
4 changes: 4 additions & 0 deletions workflows/theiaprok/wf_theiaprok_fasta.wdl
@@ -636,6 +636,10 @@ workflow theiaprok_fasta {
String? ngmaster_ngstar_gyrA_allele = merlin_magic.ngmaster_ngstar_gyrA_allele
String? ngmaster_ngstar_parC_allele = merlin_magic.ngmaster_ngstar_parC_allele
String? ngmaster_ngstar_23S_allele = merlin_magic.ngmaster_ngstar_23S_allele
# Neisseria gonorrhoeae AMR
# File? amr_search_results = merlin_magic.amr_search_results
# String? amr_search_version = merlin_magic.amr_search_version
# String? amr_search_docker = merlin_magic.amr_search_docker
# Neisseria meningitidis Typing
File? meningotype_tsv = merlin_magic.meningotype_tsv
String? meningotype_version = merlin_magic.meningotype_version
4 changes: 4 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
@@ -938,6 +938,10 @@ workflow theiaprok_illumina_pe {
String? ngmaster_ngstar_gyrA_allele = merlin_magic.ngmaster_ngstar_gyrA_allele
String? ngmaster_ngstar_parC_allele = merlin_magic.ngmaster_ngstar_parC_allele
String? ngmaster_ngstar_23S_allele = merlin_magic.ngmaster_ngstar_23S_allele
# Neisseria gonorrhoeae AMR
# File? amr_search_results = merlin_magic.amr_search_results
# String? amr_search_version = merlin_magic.amr_search_version
# String? amr_search_docker = merlin_magic.amr_search_docker
# Neisseria meningitidis Typing
File? meningotype_tsv = merlin_magic.meningotype_tsv
String? meningotype_version = merlin_magic.meningotype_version
4 changes: 4 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_se.wdl
@@ -866,6 +866,10 @@ workflow theiaprok_illumina_se {
String? ngmaster_ngstar_gyrA_allele = merlin_magic.ngmaster_ngstar_gyrA_allele
String? ngmaster_ngstar_parC_allele = merlin_magic.ngmaster_ngstar_parC_allele
String? ngmaster_ngstar_23S_allele = merlin_magic.ngmaster_ngstar_23S_allele
# Neisseria gonorrhoeae AMR
# File? amr_search_results = merlin_magic.amr_search_results
# String? amr_search_version = merlin_magic.amr_search_version
# String? amr_search_docker = merlin_magic.amr_search_docker
# Neisseria meningitidis Typing
File? meningotype_tsv = merlin_magic.meningotype_tsv
String? meningotype_version = merlin_magic.meningotype_version
4 changes: 4 additions & 0 deletions workflows/theiaprok/wf_theiaprok_ont.wdl
@@ -815,6 +815,10 @@ workflow theiaprok_ont {
String? ngmaster_ngstar_gyrA_allele = merlin_magic.ngmaster_ngstar_gyrA_allele
String? ngmaster_ngstar_parC_allele = merlin_magic.ngmaster_ngstar_parC_allele
String? ngmaster_ngstar_23S_allele = merlin_magic.ngmaster_ngstar_23S_allele
# Neisseria gonorrhoeae AMR
# File? amr_search_results = merlin_magic.amr_search_results
# String? amr_search_version = merlin_magic.amr_search_version
# String? amr_search_docker = merlin_magic.amr_search_docker
# Neisseria meningitidis Typing
File? meningotype_tsv = merlin_magic.meningotype_tsv
String? meningotype_version = merlin_magic.meningotype_version
27 changes: 27 additions & 0 deletions workflows/utilities/wf_amr_search.wdl
@@ -0,0 +1,27 @@
version 1.0

import "../../tasks/gene_typing/drug_resistance/task_amr_search.wdl" as run_amr_search

workflow amr_search_workflow {
input {
File input_fasta
String amr_search_database
Review comment (Contributor): Please consider renaming this input, since the user doesn't pass in a DB, only the taxon code that then references the correct DB to use. `taxon`, `taxon_of_interest`, or `taxon_code` might make more sense. Some of this will depend on how we do the mapping when we plug into TheiaProk, but I'm dropping this as a note for future work when we implement that integration upstream.

String samplename
}

# Call amr_search task to perform the analysis
call run_amr_search.amr_search {
input:
input_fasta = input_fasta,
samplename = samplename,
amr_search_database = amr_search_database
}

output {
File amr_search_results = amr_search.json_output
File amr_results_csv = amr_search.output_csv
File amr_results_png = amr_search.output_png
String amr_search_docker = amr_search.amr_search_docker
String amr_search_version = read_string(amr_search.output_version)
}
}