From b45dde689999d0b2e0f183858a399ebb035717b1 Mon Sep 17 00:00:00 2001
From: fraser-combe
Date: Fri, 25 Oct 2024 10:39:43 -0500
Subject: [PATCH] update documentation snippy qc description

---
 .../phylogenetic_construction/snippy_streamline.md       | 6 ++++++
 .../phylogenetic_construction/snippy_streamline_fasta.md | 8 ++++++++
 .../phylogenetic_construction/snippy_variants.md         | 8 +++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/docs/workflows/phylogenetic_construction/snippy_streamline.md b/docs/workflows/phylogenetic_construction/snippy_streamline.md
index a777641b5..65d955422 100644
--- a/docs/workflows/phylogenetic_construction/snippy_streamline.md
+++ b/docs/workflows/phylogenetic_construction/snippy_streamline.md
@@ -169,6 +169,12 @@ For all cases:

     `Snippy_Variants` aligns reads for each sample against the reference genome. As part of `Snippy_Streamline`, the only output from this workflow is the `snippy_variants_outdir_tarball` which is provided in the set-level data table. Please see the full documentation for [Snippy_Variants](./snippy_variants.md) for more information.

+??? task "snippy_variants" (qc_metrics output)
+
+    ##### snippy_variants {#snippy_variants}
+
+    This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow.
+
 ??? task "Snippy_Tree workflow"

     ##### Snippy_Tree {#snippy_tree}
diff --git a/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md b/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md
index 6b33499ff..c7667806c 100644
--- a/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md
+++ b/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md
@@ -107,6 +107,14 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a
 | version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
 | version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

+### Workflow Tasks
+
+??? task "snippy_variants" (qc_metrics output)
+
+    ##### snippy_variants {#snippy_variants}
+
+    This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow.
+
 ### Outputs

 | **Variable** | **Type** | **Description** |
diff --git a/docs/workflows/phylogenetic_construction/snippy_variants.md b/docs/workflows/phylogenetic_construction/snippy_variants.md
index 8b041ed1a..1137f4c9f 100644
--- a/docs/workflows/phylogenetic_construction/snippy_variants.md
+++ b/docs/workflows/phylogenetic_construction/snippy_variants.md
@@ -58,6 +58,13 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f

 `Snippy_Variants` uses the snippy tool to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the `grep` bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the `snippy_results` column.

+Additionally, `Snippy_Variants` extracts quality control (QC) metrics from the Snippy output for each sample. These per-sample QC metrics are saved in TSV files (`snippy_variants_qc_metrics`). The QC metrics include:
+
+- **Percentage of reads aligned to the reference genome** (`snippy_variants_percent_reads_aligned`).
+- **Percentage of the reference genome covered at or above the specified depth threshold** (`snippy_variants_percent_ref_coverage`).
+
+These per-sample QC metrics can be combined into a single file (`snippy_combined_qc_metrics`) in downstream workflows, such as `snippy_tree_wf`, providing an overview of QC metrics across all samples.
+
 ### Outputs

 !!! tip "Visualize your outputs in IGV"
@@ -68,7 +75,6 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f

 | **Variable** | **Type** | **Description** |
 |---|---|---|
-| snippy_combined_qc_metrics | File | Combined QC metrics file containing concatenated QC metrics from all samples. The file is a tab-separated values (TSV) file with the following columns:
- samplename
- reads_aligned_to_reference
- total_reads
- percent_reads_aligned
- variants_total
- percent_ref_coverage
- #rname
- startpos
- endpos
- numreads
- covbases
- coverage
- meandepth
- meanbaseq
- meanmapq

The last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. |
 | snippy_variants_bai | File | Indexed bam file of the reads aligned to the reference |
 | snippy_variants_bam | File | Bam file of reads aligned to the reference |
 | snippy_variants_coverage_tsv | File | Coverage statistics TSV file output by the `samtools coverage` command, providing genome-wide metrics such as the proportion of bases covered (depth ≥ 1), mean depth, and other related statistics. |