theiagen · Michal-Babins · Dec 31, 2024 · Dec 24, 2024 · Dec 24, 2024 · Dec 24, 2024
@@ -4,7 +4,7 @@
 
 | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
 |---|---|---|---|---|
-| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Mycotics](../../workflows_overview/workflows_kingdom.md/#mycotics) | PHB v2.3.0 | Yes | Sample-level |
+| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Mycotics](../../workflows_overview/workflows_kingdom.md/#mycotics) | PHB vX.X.X | Yes | Sample-level |
 
 ## TheiaEuk Workflows
 
@@ -598,64 +598,100 @@ All input reads are processed through "core tasks" in the TheiaEuk workflows. Th
 
 | **Variable** | **Type** | **Description** |
 |---|---|---|
+| assembly_fasta | File | _De novo_ genome assembly in FASTA format |
+| assembly_length | Int | Length of assembly (total number of nucleotides) as determined by QUAST |
+| bbduk_docker | String | BBDuk docker image used |
+| busco_database | String | BUSCO database used |
+| busco_docker | String | BUSCO docker image used |
+| busco_report | File | A plain text summary of the results in BUSCO notation |
+| busco_results | String | BUSCO results (see above for explanation of BUSCO notation) |
+| busco_version | String | BUSCO software version used |
 | cg_pipeline_docker | String | Docker file used for running CG-Pipeline on cleaned reads |
 | cg_pipeline_report | File | TSV file of read metrics from raw reads, including average read length, number of reads, and estimated genome coverage |
-| est_coverage_clean | Float | Estimated coverage calculated from   clean reads and genome length |
-| est_coverage_raw | Float | Estimated coverage calculated from  raw reads and genome length |
+| cladetyper_annotated_reference | String | The annotated reference file for the identified clade, "None" if no clade was identified |
+| cladetyper_clade | String | The clade assigned to the input assembly |
+| cladetyper_docker_image | String | The Docker container used for the task |
+| cladetyper_gambit_version | String | The version of GAMBIT used for the analysis |
+| combined_mean_q_clean | Float | Mean quality score for the combined clean reads |
+| combined_mean_q_raw | Float | Mean quality score for the combined raw reads |
+| combined_mean_readlength_clean | Float | Mean read length for the combined clean reads |
+| combined_mean_readlength_raw | Float | Mean read length for the combined raw reads |
+| contigs_fastg | File | Assembly graph if megahit used for genome assembly |
+| contigs_gfa | File | Assembly graph if spades used for genome assembly |
+| contigs_lastgraph | File | Assembly graph if velvet used for genome assembly |
+| est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
+| est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
+| fastp_html_report | File | The HTML report made with fastp |
+| fastp_version | String | Version of fastp software used |
 | fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length |
 | fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length |
+| fastq_scan_num_reads_clean_pairs | String | Number of read pairs after cleaning as calculated by fastq_scan |
+| fastq_scan_num_reads_clean1 | Int | Number of forward reads after cleaning as calculated by fastq_scan |
+| fastq_scan_num_reads_clean2 | Int | Number of reverse reads after cleaning as calculated by fastq_scan |
+| fastq_scan_num_reads_raw_pairs | String | Number of input read pairs calculated by fastq_scan |
+| fastq_scan_num_reads_raw1 | Int | Number of input forward reads calculated by fastq_scan |
+| fastq_scan_num_reads_raw2 | Int | Number of input reverse reads calculated by fastq_scan |
 | fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length |
 | fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length |
-| r1_mean_q_clean | Float | Mean quality score of clean forward reads |
-| r1_mean_q_raw | Float | Mean quality score of raw forward reads |
-| r2_mean_q_clean | Float | Mean quality score of clean reverse reads |
-| r2_mean_q_raw | Float | Mean quality score of raw reverse reads |
 | fastq_scan_version | String | Version of fastq-scan software used |
+| fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser |
+| fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser |
+| fastqc_docker | String | Docker container used with fastqc |
+| fastqc_num_reads_clean1 | Int | Number of forward reads after cleaning by fastqc |
+| fastqc_num_reads_clean2 | Int | Number of reverse reads after cleaning by fastqc |
+| fastqc_num_reads_clean_pairs | String | Number of read pairs after cleaning by fastqc |
+| fastqc_num_reads_raw1 | Int | Number of input forward reads by fastqc |
+| fastqc_num_reads_raw2 | Int | Number of input reverse reads by fastqc |
+| fastqc_num_reads_raw_pairs | String | Number of input read pairs by fastqc |
+| fastqc_raw1_html | File | Graphical visualization of raw forward read quality from fastqc to open in an internet browser |
+| fastqc_raw2_html | File | Graphical visualization of raw reverse read quality from fastqc to open in an internet browser |
+| fastqc_version | String | Version of fastqc software used |
 | gambit_closest_genomes | File | CSV file listing genomes in the GAMBIT database that are most similar to the query assembly |
 | gambit_db_version | String | Version of the GAMBIT database used |
 | gambit_docker | String | GAMBIT docker file used |
 | gambit_predicted_taxon | String | Taxon predicted by GAMBIT |
 | gambit_predicted_taxon_rank | String | Taxon rank of GAMBIT taxon prediction |
 | gambit_report | File | GAMBIT report in a machine-readable format |
 | gambit_version | String | Version of GAMBIT software used |
-| assembly_length | Int | Length of assembly (total contig length) as determined by QUAST |
 | n50_value | Int | N50 of assembly calculated by QUAST |
 | number_contigs | Int | Total number of contigs in assembly |
+| qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
+| qc_standard | File | The user-provided file that contains the QC thresholds used for the QC check |
+| quast_gc_percent | Float | The GC percent of your sample |
 | quast_report | File | TSV report from QUAST |
 | quast_version | String | Software version of QUAST used |
+| r1_mean_q_raw | Float | Mean quality score of raw forward reads |
+| r1_mean_readlength_raw | Float | Mean read length of raw forward reads |
+| r2_mean_q_raw | Float | Mean quality score of raw reverse reads |
+| r2_mean_readlength_clean | Float | Mean read length of clean reverse reads |
 | rasusa_version | String | Version of rasusa used |
-| read1_subsampled | File | Subsampled read1 file |
-| read2_subsampled | File | Subsampled read2 file |
-| bbduk_docker | String | BBDuk docker image used  |
-| fastp_version | String | Version of fastp software used |
 | read1_clean | File | Clean forward reads file |
+| read1_subsampled | File | Subsampled read1 file |
 | read2_clean | File | Clean reverse reads file |
-| num_reads_clean_pairs | String | Number of read pairs after cleaning |
-| num_reads_clean1 | Int | Number of forward reads after cleaning |
-| num_reads_clean2 | Int | Number of reverse reads after cleaning |
-| num_reads_raw_pairs | String | Number of input read pairs |
-| num_reads_raw1 | Int | Number of input forward reads |
-| num_reads_raw2 | Int | Number of input reverse reads |
-| trimmomatic_version | String | Version of trimmomatic used |
-| clean_read_screen | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason for failure |
-| raw_read_screen | String | PASS or FAIL result from raw read screening; FAIL accompanied by thereason for failure |
-| assembly_fasta | File | <https://github.com/tseemann/shovill#contigsfa> |
-| contigs_fastg | File | Assembly graph if megahit used for genome assembly |
-| contigs_gfa | File | Assembly graph if spades used for genome assembly |
-| contigs_lastgraph | File | Assembly graph if velvet used for genome assembly |
+| read2_subsampled | File | Subsampled read2 file |
+| read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason for failure |
+| read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason for failure |
+| seq_platform | String | Sequencing platform input by the user |
 | shovill_pe_version | String | Shovill version used |
-| theiaeuk_snippy_variants_bam | File | BAM file produced by the snippy module |
+| theiaeuk_illumina_pe_analysis_date | String | Date of TheiaEuk PE workflow execution |
+| theiaeuk_illumina_pe_version | String | TheiaEuk PE workflow version used |
+| theiaeuk_snippy_variants_bai | String | BAI file produced by the snippy module |
+| theiaeuk_snippy_variants_bam | String | BAM file produced by the snippy module |
+| theiaeuk_snippy_variants_coverage_tsv | String | TSV file containing coverage information for each base in the reference genome |
 | theiaeuk_snippy_variants_gene_query_results | File | File containing all lines from variants file matching gene query terms |
 | theiaeuk_snippy_variants_hits | String | String of all variant file entries matching gene query term |
+| theiaeuk_snippy_variants_num_reads_aligned | String | Number of reads aligned by snippy |
+| theiaeuk_snippy_variants_num_variants | Int | Number of variants detected by snippy |
 | theiaeuk_snippy_variants_outdir_tarball | File | Tar compressed file containing full snippy output directory |
+| theiaeuk_snippy_variants_percent_ref_coverage | String | Percent of reference genome covered by snippy |
 | theiaeuk_snippy_variants_query | String | The gene query term(s) used to search the variants file |
 | theiaeuk_snippy_variants_query_check | String | Whether the gene query terms were present in the annotated reference genome file |
 | theiaeuk_snippy_variants_reference_genome | File | The reference genome used in the alignment and variant calling |
 | theiaeuk_snippy_variants_results | File | The variants file produced by snippy |
 | theiaeuk_snippy_variants_summary | File | A file summarizing the variants detected by snippy |
 | theiaeuk_snippy_variants_version | String | The version of the snippy_variants module being used |
-| seq_platform | String | Sequencing platform inout by the user |
-| theiaeuk_illumina_pe_analysis_date | String | Date of TheiaProk workflow execution |
-| theiaeuk_illumina_pe_version | String | TheiaProk workflow version used |
+| trimmomatic_docker | String | Docker image used for trimmomatic |
+| trimmomatic_version | String | Version of trimmomatic used |
 
 </div>
@@ -1,22 +1,86 @@
 # Cauris_CladeTyper
 
-!!! warning "NEEDS WORK!!!!"
-    This page is under construction and will be updated soon.
-
 ## Quick Facts
 
 | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
 |---|---|---|---|---|
-| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics) | PHB v1.0.0 | Yes | Sample-level |
+| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics) | PHB vX.X.X | Yes | Sample-level |
 
 ## Cauris_CladeTyper_PHB
 
-The Cauris_CladeTyper_PHB Workflow is designed to assign clade to _Candida auris_ Whole Genome Sequencing assemblies based on their genomic sequence similarity to the five clade-specific reference files. Clade typing is essential for understanding the epidemiology and evolutionary dynamics of this emerging multidrug-resistant fungal pathogen.
+The Cauris_CladeTyper_PHB Workflow is designed to assign the clade to _Candida auris_ (also known as _Candidozyma auris_) WGS assemblies based on their genomic sequence similarity to the five clade-specific reference files. Clade typing is essential for understanding the epidemiology and evolutionary dynamics of this emerging multidrug-resistant fungal pathogen.
 
 ### Inputs
 
+<div class="searchable-table" markdown="1">
+
+| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
+|---|---|---|---|---|---|
+| cauris_cladetyper | **assembly_fasta** | File | The input assembly file in FASTA format | | Required |
+| cauris_cladetyper | **samplename** | String | The name of the sample being analyzed | | Required |
+| cladetyper | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional |
+| cladetyper | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
+| cladetyper | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/biocontainers/hesslab-gambit:0.5.1--py37h8902056_0" | Optional |
+| cladetyper | **kmer_size** | Int | The kmer size to use for generating the GAMBIT signatures file; see GAMBIT documentation for more details | 11 | Optional |
+| cladetyper | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
+| cladetyper | **ref_clade1** | File | The reference assembly for clade 1 | gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade1_GCA_002759435.2_Cand_auris_B8441_V2_genomic.fasta | Optional |
+| cladetyper | **ref_clade1_annotated** | String | The path to the annotated reference for clade 1 | "gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade1_GCA_002759435_Cauris_B8441_V2_genomic.gbff" | Optional |
+| cladetyper | **ref_clade2** | File | The reference assembly for clade 2 | gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade2_GCA_003013715.2_ASM301371v2_genomic.fasta | Optional |
+| cladetyper | **ref_clade2_annotated** | String | The path to the annotated reference for clade 2 | "gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade2_GCA_003013715.2_ASM301371v2_genomic.gbff" | Optional |
+| cladetyper | **ref_clade3** | File | The reference assembly for clade 3 | gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade3_reference.fasta | Optional |
+| cladetyper | **ref_clade3_annotated** | String | The path to the annotated reference for clade 3 | "gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade3_GCF_002775015.1_Cand_auris_B11221_V1_genomic.gbff" | Optional |
+| cladetyper | **ref_clade4** | File | The reference assembly for clade 4 | gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade4_reference.fasta | Optional |
+| cladetyper | **ref_clade4_annotated** | String | The path to the annotated reference for clade 4 | "gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade4_GCA_003014415.1_Cand_auris_B11243_genomic.gbff" | Optional |
+| cladetyper | **ref_clade5** | File | The reference assembly for clade 5 | gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade5_GCA_016809505.1_ASM1680950v1_genomic.fasta | Optional |
+| cladetyper | **ref_clade5_annotated** | String | The path to the annotated reference for clade 5 | "gs://theiagen-public-files/terra/candida_auris_refs/Cauris_Clade5_GCA_016809505.1_ASM1680950v1_genomic.gbff" | Optional |
+| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
+| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) |  | Optional |
+
+</div>
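
For command-line runs, the two required inputs from the table above are typically supplied in a JSON file. The sketch below builds such a file in Python. The key prefixes (`cauris_cladetyper.` for workflow-level inputs and `cauris_cladetyper.cladetyper.` for task-level inputs) are an assumption based on the task names in the table and the usual WDL inputs convention, and the sample name and file path are hypothetical placeholders. Check the workflow's WDL for the exact names before use.

```python
import json


def build_inputs(samplename, assembly_fasta, kmer_size=None):
    """Return a dict of workflow inputs keyed in the usual WDL inputs-JSON style.

    The key prefixes are assumed from the "Terra Task Name" column of the
    inputs table; they are not confirmed against the workflow's WDL.
    """
    inputs = {
        "cauris_cladetyper.samplename": samplename,
        "cauris_cladetyper.assembly_fasta": assembly_fasta,
    }
    # Optional task-level override; omitted entirely when not requested so
    # the workflow falls back to its own default.
    if kmer_size is not None:
        inputs["cauris_cladetyper.cladetyper.kmer_size"] = kmer_size
    return inputs


# Hypothetical sample name and assembly path, for illustration only.
example_inputs = build_inputs("sample01", "sample01_contigs.fasta", kmer_size=11)
print(json.dumps(example_inputs, indent=2))
```

Leaving optional inputs out of the JSON, rather than restating their defaults, keeps the file short and avoids drifting from the defaults listed in the table.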
+
 ### Workflow Tasks
 
-The Cauris_Cladetyper Workflow for _Candida auris_ employs GAMBIT for taxonomic identification, comparing whole genome sequencing data against reference databases to accurately classify _Candida auris_ isolates. A custom database featuring five clade-specific _Candida auris_ reference genomes facilitates clade typing. Sequences undergo genomic signature comparison against the custom database, enabling assignment to one of the five _Candida auris_ clades (Clade I to Clade V) based on sequence similarity and phylogenetic relationships. This integrated approach ensures precise clade assignments, crucial for understanding the genetic diversity and epidemiology of _Candida auris_.
+??? task "Cauris_Cladetyper"
+    The Cauris_Cladetyper Workflow for _Candida auris_ employs GAMBIT for taxonomic identification, comparing whole genome sequencing data against reference databases to accurately classify _Candida auris_ isolates.
+
+    A custom GAMBIT database is created using five clade-specific _Candida auris_ reference genomes. Sequences undergo genomic signature comparison against this database, which then enables assignment to one of the five _Candida auris_ clades (Clade I to Clade V) based on sequence similarity and phylogenetic relationships. This integrated approach ensures precise clade assignments, crucial for understanding the genetic diversity and epidemiology of _Candida auris_.
+
+    The reference genomes used for the five clades are listed below:
+
+    | Clade | Genome Accession | Assembly Name | Strain | BioSample Accession |
+    |---|---|---|---|---|
+    | Clade I | GCA_002759435.2 | Cand_auris_B8441_V2 | B8441 | SAMN05379624 |
+    | Clade II | GCA_003013715.2 | ASM301371v2 | B11220 | SAMN05379608 |
+    | Clade III | GCA_002775015.1 | Cand_auris_B11221_V1 | B11221 | SAMN05379609 |
+    | Clade IV | GCA_003014415.1 | Cand_auris_B11243 | B11243 | SAMN05379619 |
+    | Clade V | GCA_016809505.1 | ASM1680950v1 | IFRC2087 | SAMN11570381 |
 
+    !!! techdetails "Cauris_Cladetyper Technical Details"
+
+        |  | Links |
+        | --- | --- |
+        | Task | [task_cauris_cladetyper.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/candida/task_cauris_cladetyper.wdl) |
+        | Software Source Code | [GAMBIT on GitHub](https://github.com/jlumpe/gambit) |
+        | Software Documentation | [GAMBIT Overview](https://theiagen.notion.site/GAMBIT-7c1376b861d0486abfbc316480046bdc?pvs=4) |
+        | Original Publication(s) | [GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277575) <br> [TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization](https://doi.org/10.3389/fpubh.2023.1198213) |
+
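
The idea behind the clade assignment described above, comparing a query against five references and reporting the closest one, can be illustrated with a toy example. This is a minimal sketch, not GAMBIT's implementation: it uses a plain Jaccard distance over all k-mers of short made-up sequences, and the function names (`kmer_set`, `jaccard_distance`, `assign_clade`) are hypothetical. GAMBIT builds its signatures differently, and the workflow task may apply additional logic.

```python
def kmer_set(sequence, k=11):
    """Return the set of all k-mers in a sequence (toy signature)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(a, b):
    """Jaccard distance between two k-mer sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign_clade(query_seq, references, k=11):
    """Return the label of the closest reference and all computed distances."""
    query = kmer_set(query_seq, k)
    distances = {
        clade: jaccard_distance(query, kmer_set(ref_seq, k))
        for clade, ref_seq in references.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Made-up sequences for illustration only; real references are whole genomes.
toy_references = {
    "Clade I": "ACGTACGTGA",
    "Clade II": "TTGACCATGG",
}
best_clade, toy_distances = assign_clade("ACGTACGTGG", toy_references, k=3)
print(best_clade, toy_distances)
```

In this toy case the query shares most of its 3-mers with the first reference, so that label is returned. The real task reports its result in `cladetyper_clade`.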
 ### Outputs
+
+<div class="searchable-table" markdown="1">
+
+| **Variable** | **Type** | **Description** |
+|---|---|---|
+| cauris_cladetyper_wf_analysis_date | String | Date of analysis |
+| cauris_cladetyper_wf_version | String | Version of PHB used for the analysis |
+| cladetyper_annotated_reference | String | The annotated reference file for the identified clade, "None" if no clade was identified |
+| cladetyper_clade | String | The clade assigned to the input assembly |
+| cladetyper_docker_image | String | The Docker container used for the task |
+| cladetyper_gambit_version | String | The version of GAMBIT used for the analysis |
+
+</div>
+
+## References
+
+> Lumpe J, Gumbleton L, Gorzalski A, Libuit K, Varghese V, Lloyd T, et al. (2023) GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. PLoS ONE 18(2): e0277575. <https://doi.org/10.1371/journal.pone.0277575>
+<!-- -->
+> Ambrosio, Frank, Michelle Scribner, Sage Wright, James Otieno, Emma Doughty, Andrew Gorzalski, Danielle Siao, et al. 2023. "TheiaEuk: A Species-Agnostic Bioinformatics Workflow for Fungal Genomic Characterization." Frontiers in Public Health 11. <https://doi.org/10.3389/fpubh.2023.1198213>.