Commit

update documentation snippy qc description
fraser-combe committed Oct 25, 2024
1 parent b7ecc00 commit b45dde6
Showing 3 changed files with 21 additions and 1 deletion.
6 changes: 6 additions & 0 deletions docs/workflows/phylogenetic_construction/snippy_streamline.md
@@ -169,6 +169,12 @@ For all cases:

`Snippy_Variants` aligns reads for each sample against the reference genome. As part of `Snippy_Streamline`, the only output from this workflow is the `snippy_variants_outdir_tarball` which is provided in the set-level data table. Please see the full documentation for [Snippy_Variants](./snippy_variants.md) for more information.

??? task "snippy_variants" (qc_metrics output)

##### snippy_variants {#snippy_variants}

This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow.

??? task "Snippy_Tree workflow"

##### Snippy_Tree {#snippy_tree}
@@ -107,6 +107,14 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

### Workflow Tasks

??? task "snippy_variants" (qc_metrics output)

##### snippy_variants {#snippy_variants}

This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow.

### Outputs

| **Variable** | **Type** | **Description** |
8 changes: 7 additions & 1 deletion docs/workflows/phylogenetic_construction/snippy_variants.md
@@ -58,6 +58,13 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f

`Snippy_Variants` uses the snippy tool to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the `grep` bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the `snippy_results` column.

Additionally, `Snippy_Variants` extracts quality control (QC) metrics from the Snippy output for each sample. These per-sample QC metrics are saved in TSV files (`snippy_variants_qc_metrics`). The QC metrics include:

- **Percentage of reads aligned to the reference genome** (`snippy_variants_percent_reads_aligned`).
- **Percentage of the reference genome covered at or above the specified depth threshold** (`snippy_variants_percent_ref_coverage`).

These per-sample QC metrics can be combined into a single file (`snippy_combined_qc_metrics`) in downstream workflows, such as `snippy_tree_wf`, providing an overview of QC metrics across all samples.
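
As a rough illustration of the combining step described above, the following Python sketch concatenates per-sample QC TSV files into a single TSV with one header row. It is not the code used by `snippy_tree_wf`; the function names `combine_qc_metrics` and `percent` are hypothetical, and the sketch assumes every per-sample file has exactly one header line with identical columns in the same order.

```python
import csv


def percent(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage (0.0 if the denominator is 0)."""
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


def combine_qc_metrics(sample_tsvs, combined_tsv):
    """Concatenate per-sample QC TSVs into one TSV with a single header row.

    Assumption (not taken from the workflow source): each input file has one
    header line and all inputs share the same columns in the same order.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(combined_tsv, "w", newline="") as out_handle:
        writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
        for tsv_path in sample_tsvs:
            with open(tsv_path, newline="") as in_handle:
                reader = csv.reader(in_handle, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # The first non-empty file defines the combined header.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{tsv_path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `combine_qc_metrics(["sample1_qc.tsv", "sample2_qc.tsv"], "combined_qc.tsv")` (hypothetical file names) would write one header followed by one row per sample. The `percent` helper shows how a value such as the percentage of reads aligned could be derived from an aligned-read count and a total-read count, again as an assumption rather than the workflow's actual calculation.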

### Outputs

!!! tip "Visualize your outputs in IGV"
@@ -68,7 +75,6 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f

| **Variable** | **Type** | **Description** |
|---|---|---|
| snippy_combined_qc_metrics | File | Combined QC metrics file containing concatenated QC metrics from all samples. The file is a tab-separated values (TSV) file with the following columns:<br>- samplename<br>- reads_aligned_to_reference<br>- total_reads<br>- percent_reads_aligned<br>- variants_total<br>- percent_ref_coverage<br>- #rname<br>- startpos<br>- endpos<br>- numreads<br>- covbases<br>- coverage<br>- meandepth<br>- meanbaseq<br>- meanmapq<br><br>The last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. |
| snippy_variants_bai | File | Indexed bam file of the reads aligned to the reference |
| snippy_variants_bam | File | Bam file of reads aligned to the reference |
| snippy_variants_coverage_tsv | File | Coverage statistics TSV file output by the `samtools coverage` command, providing genome-wide metrics such as the proportion of bases covered (depth ≥ 1), mean depth, and other related statistics. |