Commit cc8770e (1 parent: 8f88276)
Showing 1 changed file with 14 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,39 +8,39 @@ | |
|
||
## Dorado Basecalling Overview | ||
|
||
The Dorado Basecalling workflow is used to convert Oxford Nanopore `POD5` sequencing files into `FASTQ` format by utilizing a GPU-accelerated environment. This workflow is ideal for high-throughput applications where fast and accurate basecalling is essential. | ||
The Dorado Basecalling workflow is used to convert Oxford Nanopore `POD5` sequencing files into `FASTQ` format by utilizing a GPU-accelerated environment. This workflow is ideal for high-throughput applications where fast and accurate basecalling is essential. The workflow then uploads the FASTQ files to a user-designated Terra table for downstream analysis. | ||
|
||
### Model Type Selection | ||
|
||
Users can choose between automatic or manual model selection using a configurable use_auto_model flag: | ||
|
||
- Automatic Model Selection: Automatically picks the best model ('sup', 'hac', or 'fast') based on the input file and the user-defined model accuracy parameter. | ||
Automatic Model Selection: Automatically picks the best model ('sup', 'hac', or 'fast') based on the input file and the user-defined model accuracy parameter. | ||
|
||
- Manual Model Input: If the user disables automatic selection, a specific model path or model version must be provided. | ||
Manual Model Input: If the user disables automatic selection, a specific model path or model version must be provided. | ||
|
||
- **Model Type (sup):** (Super Accuracy) The most accurate model, recommended for critical applications requiring the highest basecall accuracy. It is the slowest of the three model types. | ||
- **Model Type (hac):** (High Accuracy) A balance between speed and accuracy, recommended for most users. It is faster than `sup` but less accurate. | ||
- **Model Type (fast):** (Fast Model) The fastest model, recommended when speed is prioritized over accuracy, such as for initial analyses or non-critical applications. | ||
|
||
**Example Manual Models:** | ||
### Example Manual Models: | ||
- `[email protected]` | ||
- `[email protected]` | ||
- `[email protected]` | ||
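
The selection rule described in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the workflow's actual code: the function name `resolve_model` and the `manual_model` parameter are hypothetical, while `use_auto_model` and `model_accuracy` are the inputs named in this README.

```python
from typing import Optional

# The three accuracy tiers named in this README.
VALID_ACCURACIES = ("sup", "hac", "fast")


def resolve_model(use_auto_model: bool = True,
                  model_accuracy: str = "sup",
                  manual_model: Optional[str] = None) -> str:
    """Return the model argument to hand to the basecaller.

    `manual_model` is a hypothetical name for the model path or
    version that must be supplied when automatic selection is off.
    """
    if use_auto_model:
        if model_accuracy not in VALID_ACCURACIES:
            raise ValueError(
                f"model_accuracy must be one of {VALID_ACCURACIES}, "
                f"got {model_accuracy!r}"
            )
        # With automatic selection, only the accuracy tier is passed on.
        return model_accuracy
    if not manual_model:
        raise ValueError(
            "a model path or model version is required when "
            "use_auto_model is false"
        )
    return manual_model
```

With the defaults from the Inputs table (`use_auto_model = true`, `model_accuracy = sup`), the sketch returns `sup`.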
|
||
## **Workflow Structure** | ||
### Workflow Structure | ||
|
||
1. **Dorado Basecalling**: Converts `POD5` files to **SAM** files using the specified model. | ||
1. **Dorado Basecalling**: Converts `POD5` files to `SAM` files using the specified model. | ||
2. **Samtools Convert**: Converts the generated SAM files to BAM for efficient processing. | ||
3. **Dorado Demultiplexing**: Demultiplexes BAM files to produce barcode-specific FASTQ files. | ||
4. **FASTQ File Transfer**: Transfers files to Terra for downstream analysis. | ||
5. **Terra Table Creation**: Generates a Terra table with the uploaded FASTQ files for downstream analyses. | ||
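
The first three steps above could correspond to command lines along the following lines. This is a sketch only: the README does not show the workflow's actual invocations, so the helper name `build_commands`, the derived file names, and the choice of flags are assumptions. The flags used (`--emit-sam`, `--kit-name`, `--output-dir`, `--emit-fastq`, and `samtools view -b -o`) should be checked against the installed Dorado and samtools versions.

```python
from pathlib import Path
from typing import List


def build_commands(pod5: str, model: str, kit_name: str,
                   out_dir: str) -> List[List[str]]:
    """Build argv lists for steps 1-3; nothing is executed here."""
    stem = Path(pod5).stem
    sam = f"{stem}.sam"  # assumed intermediate file name
    bam = f"{stem}.bam"  # assumed intermediate file name
    return [
        # Step 1: basecall. Dorado writes to stdout, so the caller is
        # expected to redirect this command's output into `sam`.
        ["dorado", "basecaller", model, pod5, "--emit-sam"],
        # Step 2: convert SAM to BAM.
        ["samtools", "view", "-b", "-o", bam, sam],
        # Step 3: demultiplex the BAM into per-barcode FASTQ files.
        ["dorado", "demux", "--kit-name", kit_name,
         "--output-dir", out_dir, "--emit-fastq", bam],
    ]


if __name__ == "__main__":
    # The kit name below is only an example value.
    for argv in build_commands("run1.pod5", "sup", "SQK-NBD114-24", "demux"):
        print(" ".join(argv))
```

Building the commands as lists rather than shell strings keeps file names with spaces intact if they are later passed to `subprocess.run`.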
|
||
--- | ||
|
||
## **Inputs** | ||
## Inputs | ||
|
||
| **Task** | **Variable** | **Type** | **Description** | **Default Value** | **Required** | | ||
|---|---|---|---|---|---|---| | ||
|---|---|---|---|---|---| | ||
| Basecalling | **input_files** | Array[File] | Array of `POD5` files for basecalling | None | Yes | | ||
| Basecalling | **use_auto_model** | Boolean | Use automatic model selection (`sup`, `hac`, or `fast` based on model accuracy)| true | No | | ||
| Basecalling | **model_accuracy** | String | Desired model accuracy (`sup`, `hac`, `fast`) if using automatic selection | sup | No | | ||
|
@@ -58,18 +58,21 @@ Users can choose between automatic or manual model selection using a configurabl | |
|
||
--- | ||
|
||
### **Detailed Input Information** | ||
### Detailed Input Information | ||
- **fastq_file_name**: This will serve as a prefix for the output FASTQ files. For example, if you provide `project001`, the resulting files will be named `project001_barcodeXX.fastq.gz`. | ||
- **kit_name**: Ensure the correct kit name is provided, as it determines the barcoding and adapter trimming behavior. | ||
- **fastq_upload_path**: This is the folder path in Terra where the final FASTQ files will be transferred for further analysis. Ensure the path matches your Terra workspace configuration. | ||
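
As a small illustration of the naming rule above, the following Python sketch builds an output file name and its upload destination. It assumes that `XX` stands for a zero-padded two-digit barcode number; the helper names and the example bucket path are hypothetical.

```python
def demuxed_fastq_name(fastq_file_name: str, barcode: int) -> str:
    """Prefix + barcode, e.g. project001_barcode01.fastq.gz.

    Assumes `XX` in the README is a zero-padded two-digit number.
    """
    return f"{fastq_file_name}_barcode{barcode:02d}.fastq.gz"


def upload_destination(fastq_upload_path: str, file_name: str) -> str:
    """Join the upload folder and file name with exactly one slash."""
    return f"{fastq_upload_path.rstrip('/')}/{file_name}"


if __name__ == "__main__":
    name = demuxed_fastq_name("project001", 1)
    # The bucket path is a made-up example, not a real workspace.
    print(upload_destination("gs://example-bucket/fastqs/", name))
```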
|
||
--- | ||
|
||
## **Outputs** | ||
## Outputs | ||
|
||
| **Variable** | **Type** | **Description** | | ||
|---|---|---|---| | ||
|---|---|---| | ||
| **basecalled_fastqs** | Array[File] | Array of FASTQ files generated from basecalling | | ||
| **demuxed_fastqs** | Array[File] | FASTQ files produced from BAM demultiplexing | | ||
| **logs** | Array[File] | Log files from the demultiplexing process | | ||
| **terra_table_tsv** | File | TSV file for Terra table upload | | ||
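
To make the `terra_table_tsv` output more concrete, here is a hedged Python sketch of how such a file could be assembled. Terra data tables expect the first column header to have the form `entity:<table_name>_id`; everything else here (the function name, the `fastq` column name, and the example paths) is an assumption, since the README does not describe the real file's columns.

```python
import csv
import io
from typing import Dict


def terra_table_tsv(table_name: str, fastqs: Dict[str, str]) -> str:
    """Render a Terra-style TSV mapping row IDs to FASTQ paths.

    `fastqs` maps a row ID (for example a barcode-specific sample
    name) to the uploaded FASTQ location. The `fastq` column name
    is hypothetical.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Terra requires the ID column to be named entity:<table>_id.
    writer.writerow([f"entity:{table_name}_id", "fastq"])
    for row_id, path in sorted(fastqs.items()):
        writer.writerow([row_id, path])
    return buf.getvalue()


if __name__ == "__main__":
    print(terra_table_tsv(
        "sample",
        {"project001_barcode01":
         "gs://example-bucket/fastqs/project001_barcode01.fastq.gz"},
    ))
```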
|
||
<!-- --> | ||
><https://github.com/nanoporetech/dorado/> |