[New Workflow] Adding the TheiaMeta_Panel_Illumina_PE Workflow #656

Draft: wants to merge 56 commits into base: main
Commits
7b265b2
make theiameta_panel
sage-wright Sep 24, 2024
b3bd529
rename taxon id vars in org param
sage-wright Sep 24, 2024
593e5b5
language
sage-wright Sep 24, 2024
01c1223
progress
sage-wright Sep 26, 2024
2c63022
notes
sage-wright Oct 9, 2024
83e8add
finish
sage-wright Oct 10, 2024
e8e757d
does this work?
sage-wright Oct 10, 2024
8416a8e
set required for now
sage-wright Oct 10, 2024
8ddf11b
correct terrible spelling
sage-wright Oct 10, 2024
236230f
add runtime
cimendes Oct 11, 2024
3d23bce
start documentation
sage-wright Oct 11, 2024
75c7224
add information on workflow tasks to documentation
sage-wright Oct 15, 2024
c177267
Merge branch 'main' into smw-theiameta-panel-dev
sage-wright Oct 15, 2024
e8312a5
remove krona
sage-wright Oct 15, 2024
b5260e4
add + to everything????
sage-wright Oct 16, 2024
aad8ec4
remove from one array
sage-wright Oct 16, 2024
991d540
also remove from that one too
sage-wright Oct 16, 2024
f56ca68
trying something cRaZy
sage-wright Oct 16, 2024
a04af99
it doesn't work
sage-wright Oct 17, 2024
498d07c
more crazy ideas?
cimendes Oct 17, 2024
688f89b
maybe basename is a good idea
cimendes Oct 17, 2024
fce8abb
change to json
cimendes Oct 17, 2024
cc97d70
sort of works but is ugly
cimendes Oct 17, 2024
9a7086a
IT WORKS
cimendes Oct 17, 2024
bc96474
clean up
cimendes Oct 17, 2024
93bb88b
add dummy genome length & logic block consensus qc
sage-wright Oct 21, 2024
c4cf61b
remove null values from identified_organisms otuput
sage-wright Oct 21, 2024
4e1c373
add versioning
sage-wright Oct 21, 2024
148cb9d
up to 1000
sage-wright Oct 21, 2024
8c7de78
make theiameta_panel fault-resistant, has impacts on theiameta_illumi…
cimendes Oct 21, 2024
365837e
add catch if assembly file is empty
cimendes Oct 21, 2024
5bcb25b
remove exit 1 because it's causing task to fail
sage-wright Oct 21, 2024
166e9fb
update contributions
sage-wright Oct 21, 2024
05d55f7
add warnings to gathered output
sage-wright Oct 21, 2024
ff73187
bump up al
sage-wright Oct 21, 2024
2e25834
work on inputs
sage-wright Oct 22, 2024
c04fa48
hide some optional inputs
sage-wright Oct 23, 2024
43e0efd
add inputs and outputs to docs
sage-wright Oct 23, 2024
4c8d373
Merge branch 'main' into smw-theiameta-panel-dev
sage-wright Oct 23, 2024
547a920
enable searchable
sage-wright Oct 23, 2024
82695a7
set default, expand docs
sage-wright Oct 24, 2024
033cbc0
update contributors
sage-wright Oct 28, 2024
c8b658a
input explosion
sage-wright Oct 28, 2024
92acb24
make good
sage-wright Oct 28, 2024
a3c7c52
document the explosion
sage-wright Oct 28, 2024
b0498ae
optionalize extracted reads
sage-wright Oct 28, 2024
48a26a2
add flu outputs to gather scatter
sage-wright Oct 28, 2024
fcc17d9
finish documentation
sage-wright Oct 28, 2024
8bc64fb
clean up docs
sage-wright Oct 28, 2024
931e815
update md5sums
sage-wright Oct 28, 2024
42e5503
move taxon_id conversion to its own file; remove comment cruft
sage-wright Nov 4, 2024
d5bebaf
name things lol
sage-wright Nov 4, 2024
cf40557
typing issues, again
sage-wright Nov 4, 2024
c6ff0c1
remove output
sage-wright Nov 4, 2024
522a016
Merge branch 'main' into smw-theiameta-panel-dev
sage-wright Nov 19, 2024
53d00c9
Merge branch 'main' into smw-theiameta-panel-dev
sage-wright Dec 6, 2024
5 changes: 5 additions & 0 deletions .dockstore.yml
@@ -230,6 +230,11 @@ workflows:
primaryDescriptorPath: /workflows/theiameta/wf_theiameta_illumina_pe.wdl
testParameterFiles:
- /tests/inputs/empty.json
- name: TheiaMeta_Panel_Illumina_PE_PHB
subclass: WDL
primaryDescriptorPath: /workflows/theiameta/wf_theiameta_panel_illumina_pe.wdl
testParameterFiles:
- /tests/inputs/empty.json
- name: Snippy_Streamline_PHB
subclass: WDL
primaryDescriptorPath: /workflows/phylogenetics/wf_snippy_streamline.wdl
1 change: 0 additions & 1 deletion docs/workflows/genomic_characterization/theiameta.md
@@ -247,7 +247,6 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge
`metaspades` is a _de novo_ assembler that first constructs a de Bruijn graph of all the reads using the SPAdes algorithm. Through various graph simplification procedures, paths in the assembly graph are reconstructed that correspond to long genomic fragments within the metagenome. For more details, please see the original publication.

!!! techdetails "MetaSPAdes Technical Details"

| | Links |
| --- | --- |
| Task | [task_metaspades.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_metaspades.wdl) |
687 changes: 687 additions & 0 deletions docs/workflows/genomic_characterization/theiameta_panel.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_alphabetically.md
@@ -43,6 +43,7 @@ title: Alphabetical Workflows
| [**TheiaCov Workflow Series**](../workflows/genomic_characterization/theiacov.md) | Viral genome assembly, QC and characterization from amplicon sequencing | HIV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | Sample-level, Set-level | Some optional features incompatible, Yes | v2.2.0 | [TheiaCoV_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_Illumina_PE_PHB:main?tab=info), [TheiaCoV_Illumina_SE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_Illumina_SE_PHB:main?tab=info), [TheiaCoV_ONT_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_ONT_PHB:main?tab=info), [TheiaCoV_ClearLabs_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_ClearLabs_PHB:main?tab=info), [TheiaCoV_FASTA_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_FASTA_PHB:main?tab=info), [TheiaCoV_FASTA_Batch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_FASTA_Batch_PHB:main?tab=info) |
| [**TheiaEuk**](../workflows/genomic_characterization/theiaeuk.md) | Mycotic genome assembly, QC and characterization from WGS data | Mycotics | Sample-level | Some optional features incompatible, Yes | v2.0.1 | [TheiaEuk_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaEuk_Illumina_PE_PHB:main?tab=info) |
| [**TheiaMeta**](../workflows/genomic_characterization/theiameta.md) | Genome assembly and QC from metagenomic sequencing | Any taxa | Sample-level | Yes | v2.0.0 | [TheiaMeta_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaMeta_Illumina_PE_PHB:main?tab=info) |
| [**TheiaMeta Panel**](../workflows/genomic_characterization/theiameta_panel.md) | Genome assembly and QC from metagenomic sequencing using a panel | Viral | Sample-level | Yes | v2.X.X | [TheiaMeta_Panel_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaMeta_Panel_PHB:main?tab=info) |
| [**TheiaProk Workflow Series**](../workflows/genomic_characterization/theiaprok.md) | Bacterial genome assembly, QC and characterization from WGS data | Bacteria | Sample-level | Some optional features incompatible, Yes | v2.2.0 | [TheiaProk_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_Illumina_PE_PHB:main?tab=info), [TheiaProk_Illumina_SE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_Illumina_SE_PHB:main?tab=info), [TheiaProk_ONT_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_ONT_PHB:main?tab=info), [TheiaProk_FASTA_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_FASTA_PHB:main?tab=info) |
| [**TheiaValidate**](../workflows/standalone/theiavalidate.md)| This workflow performs basic comparisons between user-designated columns in two separate tables. | Any taxa | | No | v2.0.0 | [TheiaValidate_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaValidate_PHB:main?tab=info) |
| [**Transfer_Column_Content**](../workflows/data_export/transfer_column_content.md)| Transfer contents of a specified Terra data table column for many samples ("entities") to a GCP storage bucket location | Any taxa | Set-level | Yes | v1.3.0 | [Transfer_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Transfer_Column_Content_PHB:main?tab=info) |
1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_kingdom.md
@@ -88,6 +88,7 @@ title: Workflows by Kingdom
| [**Terra_2_GISAID**](../workflows/public_data_sharing/terra_2_gisaid.md)| Upload of assembly data to GISAID | SARS-CoV-2, Viral | Set-level | Yes | v1.2.1 | [Terra_2_GISAID_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Terra_2_GISAID_PHB:main?tab=info) |
| [**Terra_2_NCBI**](../workflows/public_data_sharing/terra_2_ncbi.md)| Upload of sequence data to NCBI | Bacteria, Mycotics, Viral | Set-level | No | v2.1.0 | [Terra_2_NCBI_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Terra_2_NCBI_PHB:main?tab=info) |
| [**TheiaCov Workflow Series**](../workflows/genomic_characterization/theiacov.md) | Viral genome assembly, QC and characterization from amplicon sequencing | HIV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | Sample-level, Set-level | Some optional features incompatible, Yes | v2.2.0 | [TheiaCoV_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_Illumina_PE_PHB:main?tab=info), [TheiaCoV_Illumina_SE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_Illumina_SE_PHB:main?tab=info), [TheiaCoV_ONT_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_ONT_PHB:main?tab=info), [TheiaCoV_ClearLabs_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_ClearLabs_PHB:main?tab=info), [TheiaCoV_FASTA_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_FASTA_PHB:main?tab=info), [TheiaCoV_FASTA_Batch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_FASTA_Batch_PHB:main?tab=info) |
| [**TheiaMeta Panel**](../workflows/genomic_characterization/theiameta_panel.md) | Genome assembly and QC from metagenomic sequencing using a panel | Viral | Sample-level | Yes | v2.X.X | [TheiaMeta_Panel_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaMeta_Panel_PHB:main?tab=info) |
| [**Usher_PHB**](../workflows/phylogenetic_placement/usher.md)| Use UShER to rapidly and accurately place your samples on any existing phylogenetic tree | Monkeypox virus, SARS-CoV-2, Viral | Sample-level, Set-level | Yes | v2.1.0 | [Usher_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Usher_PHB:main?tab=info) |
| [**VADR_Update**](../workflows/genomic_characterization/vadr_update.md)| Update VADR assignments | HAV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | Sample-level | Yes | v1.2.1 | [VADR_Update_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/VADR_Update_PHB:main?tab=info) |

1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_type.md
@@ -30,6 +30,7 @@ title: Workflows by Type
| [**TheiaCov Workflow Series**](../workflows/genomic_characterization/theiacov.md) | Viral genome assembly, QC and characterization from amplicon sequencing | HIV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | Sample-level, Set-level | Some optional features incompatible, Yes | v2.2.0 | [TheiaCoV_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_Illumina_PE_PHB:main?tab=info), [TheiaCoV_Illumina_SE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_Illumina_SE_PHB:main?tab=info), [TheiaCoV_ONT_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_ONT_PHB:main?tab=info), [TheiaCoV_ClearLabs_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_ClearLabs_PHB:main?tab=info), [TheiaCoV_FASTA_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_FASTA_PHB:main?tab=info), [TheiaCoV_FASTA_Batch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaCoV_FASTA_Batch_PHB:main?tab=info) |
| [**TheiaEuk**](../workflows/genomic_characterization/theiaeuk.md) | Mycotic genome assembly, QC and characterization from WGS data | Mycotics | Sample-level | Some optional features incompatible, Yes | v2.0.1 | [TheiaEuk_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaEuk_Illumina_PE_PHB:main?tab=info) |
| [**TheiaMeta**](../workflows/genomic_characterization/theiameta.md) | Genome assembly and QC from metagenomic sequencing | Any taxa | Sample-level | Yes | v2.0.0 | [TheiaMeta_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaMeta_Illumina_PE_PHB:main?tab=info) |
| [**TheiaMeta Panel**](../workflows/genomic_characterization/theiameta_panel.md) | Genome assembly and QC from metagenomic sequencing using a panel | Viral | Sample-level | Yes | v2.X.X | [TheiaMeta_Panel_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaMeta_Panel_PHB:main?tab=info) |
| [**TheiaProk Workflow Series**](../workflows/genomic_characterization/theiaprok.md) | Bacterial genome assembly, QC and characterization from WGS data | Bacteria | Sample-level | Some optional features incompatible, Yes | v2.2.0 | [TheiaProk_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_Illumina_PE_PHB:main?tab=info), [TheiaProk_Illumina_SE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_Illumina_SE_PHB:main?tab=info), [TheiaProk_ONT_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_ONT_PHB:main?tab=info), [TheiaProk_FASTA_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaProk_FASTA_PHB:main?tab=info) |
| [**VADR_Update**](../workflows/genomic_characterization/vadr_update.md)| Update VADR assignments | HAV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | Sample-level | Yes | v1.2.1 | [VADR_Update_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/VADR_Update_PHB:main?tab=info) |

3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -25,6 +25,7 @@ nav:
- TheiaCoV Workflow Series: workflows/genomic_characterization/theiacov.md
- TheiaEuk Workflow Series: workflows/genomic_characterization/theiaeuk.md
- TheiaMeta: workflows/genomic_characterization/theiameta.md
- TheiaMeta_Panel: workflows/genomic_characterization/theiameta_panel.md
- TheiaProk Workflow Series: workflows/genomic_characterization/theiaprok.md
- VADR_Update: workflows/genomic_characterization/vadr_update.md
- Phylogenetic Construction:
@@ -115,6 +116,7 @@ nav:
- Terra_2_GISAID: workflows/public_data_sharing/terra_2_gisaid.md
- Terra_2_NCBI: workflows/public_data_sharing/terra_2_ncbi.md
- TheiaCoV Workflow Series: workflows/genomic_characterization/theiacov.md
- TheiaMeta_Panel: workflows/genomic_characterization/theiameta_panel.md
- Usher_PHB: workflows/phylogenetic_placement/usher.md
- VADR_Update: workflows/genomic_characterization/vadr_update.md
- Workflows Alphabetically:
@@ -152,6 +154,7 @@ nav:
- TheiaCoV Workflow Series: workflows/genomic_characterization/theiacov.md
- TheiaEuk Workflow Series: workflows/genomic_characterization/theiaeuk.md
- TheiaMeta: workflows/genomic_characterization/theiameta.md
- TheiaMeta_Panel: workflows/genomic_characterization/theiameta_panel.md
- TheiaProk Workflow Series: workflows/genomic_characterization/theiaprok.md
- TheiaValidate: workflows/standalone/theiavalidate.md
- Transfer_Column_Content: workflows/data_export/transfer_column_content.md
20 changes: 16 additions & 4 deletions tasks/assembly/task_metaspades.wdl
@@ -15,23 +15,35 @@ task metaspades_pe {
}
command <<<
metaspades.py --version | head -1 | cut -d ' ' -f 2 | tee VERSION
metaspades.py \
touch WARNING

if metaspades.py \
-1 ~{read1_cleaned} \
-2 ~{read2_cleaned} \
~{'-k ' + kmers} \
-m ~{memory} \
-t ~{cpu} \
-o metaspades \
--phred-offset ~{phred_offset} \
~{metaspades_opts}
~{metaspades_opts}; then

mv metaspades/contigs.fasta ~{samplename}_contigs.fasta

mv metaspades/contigs.fasta ~{samplename}_contigs.fasta
if [ ! -s ~{samplename}_contigs.fasta ]; then
echo "Metaspades produced an empty assembly for ~{samplename}" > WARNING
rm -f ~{samplename}_contigs.fasta
fi

else
echo "Metaspades failed to assemble for ~{samplename}" > WARNING
fi

>>>
output {
File assembly_fasta = "~{samplename}_contigs.fasta"
File? assembly_fasta = "~{samplename}_contigs.fasta"
String metaspades_version = read_string("VERSION")
String metaspades_docker = '~{docker}'
String metaspades_warning = read_string("WARNING")
}
runtime {
docker: "~{docker}"
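Review note: the fault-tolerant pattern in this hunk (run the assembler inside `if`, then record a warning rather than failing the task) can be sketched in plain bash as below. This is a minimal sketch, not the PR's code: `run_with_warning` is a hypothetical helper name, the messages are generic, and it writes the message with `echo`, because `tee` treats its argument as a file name rather than as text.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): run a command and record a warning
# in ./WARNING instead of letting the failure abort the surrounding script.
# Usage: run_with_warning <sample> <expected_output_file> <command> [args...]
run_with_warning() {
  local sample="$1" out="$2"
  shift 2
  : > WARNING                      # start from an empty warning file
  if "$@"; then
    # The command succeeded; an empty output file is still treated as a soft failure.
    if [ ! -s "$out" ]; then
      echo "Assembler produced an empty assembly for ${sample}" > WARNING
      rm -f "$out"                 # drop the empty file so an optional output stays undefined
    fi
  else
    echo "Assembler failed to assemble for ${sample}" > WARNING
  fi
}
```

Because the warning file always exists, a `read_string("WARNING")` output can be evaluated whether or not the assembly succeeded, while the assembly itself is declared as an optional `File?`.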
12 changes: 7 additions & 5 deletions tasks/quality_control/basic_statistics/task_fastq_scan.wdl
@@ -4,13 +4,14 @@ task fastq_scan_pe {
input {
File read1
File read2
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
String read2_name = basename(basename(basename(read2, ".gz"), ".fastq"), ".fq")
Int disk_size = 50

Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3"
Int memory = 2
Int cpu = 1
}
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
String read2_name = basename(basename(basename(read2, ".gz"), ".fastq"), ".fq")
command <<<
# exit task in case anything fails in one-liners or variables are unset
set -euo pipefail
@@ -77,12 +78,13 @@ task fastq_scan_pe {
task fastq_scan_se {
input {
File read1
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
Int disk_size = 50

Int disk_size = 100
Int memory = 2
Int cpu = 1
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3"
}
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
command <<<
# exit task in case anything fails in one-liners or variables are unset
set -euo pipefail
16 changes: 11 additions & 5 deletions tasks/quality_control/read_filtering/task_pilon.wdl
@@ -16,20 +16,26 @@ task pilon {
pilon --version | cut -d' ' -f3 | tee VERSION

# run pilon
pilon \
if pilon \
--genome ~{assembly} \
--frags ~{bam} \
--output ~{samplename} \
--outdir pilon \
--changes --vcf
--changes --vcf; then
touch WARNING
else
echo "Pilon failed to run for ~{samplename}" > WARNING
exit 1
fi

>>>
output {
File assembly_fasta = "pilon/~{samplename}.fasta"
File changes = "pilon/~{samplename}.changes"
File vcf = "pilon/~{samplename}.vcf"
File? assembly_fasta = "pilon/~{samplename}.fasta"
File? changes = "pilon/~{samplename}.changes"
File? vcf = "pilon/~{samplename}.vcf"
String pilon_version = read_string("VERSION")
String pilon_docker = "~{docker}"
String pilon_warning = read_string("WARNING")
}
runtime {
docker: "~{docker}"
6 changes: 3 additions & 3 deletions tasks/species_typing/betacoronavirus/task_pangolin.wdl
@@ -8,9 +8,9 @@ task pangolin4 {
Float max_ambig = 0.5
String docker
String? analysis_mode
Boolean expanded_lineage=true
Boolean skip_scorpio=false
Boolean skip_designation_cache=false
Boolean expanded_lineage = true
Boolean skip_scorpio = false
Boolean skip_designation_cache = false
String? pangolin_arguments
Int disk_size = 100
Int memory = 8
2 changes: 1 addition & 1 deletion tasks/taxon_id/contamination/task_kraken2.wdl
@@ -168,8 +168,8 @@ task kraken2_standalone {
File kraken2_unclassified_read1 = "~{samplename}.unclassified_1.fastq.gz"
File? kraken2_unclassified_read2 = "~{samplename}.unclassified_2.fastq.gz"
File kraken2_classified_read1 = "~{samplename}.classified_1.fastq.gz"
Float kraken2_percent_human = read_float("PERCENT_HUMAN")
File? kraken2_classified_read2 = "~{samplename}.classified_2.fastq.gz"
Float kraken2_percent_human = read_float("PERCENT_HUMAN")
String kraken2_database = kraken2_db
}
runtime {
61 changes: 61 additions & 0 deletions tasks/taxon_id/task_krakentools.wdl
@@ -0,0 +1,61 @@
version 1.0

task extract_kraken_reads {
input {
File kraken2_output
File kraken2_report
File read1
File read2
Int taxon_id

Int cpu = 1
Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/krakentools:d4a2fbe"
Int memory = 4
}
command <<<
gunzip -c ~{kraken2_output} > kraken2_output_unzipped.txt

python3 /KrakenTools/extract_kraken_reads.py \
-k kraken2_output_unzipped.txt \
-s1 ~{read1} \
-s2 ~{read2} \
--taxid ~{taxon_id} \
--report ~{kraken2_report} \
--include-parents \
--include-children \
--fastq-output \
--output ~{taxon_id}_1.fastq \
--output2 ~{taxon_id}_2.fastq

if [ -s ~{taxon_id}_1.fastq ]; then
echo "DEBUG: Taxon ~{taxon_id} reads extracted"
echo "true" > CONTINUE

gzip ~{taxon_id}_1.fastq
gzip ~{taxon_id}_2.fastq
else
echo "DEBUG: No reads were extracted for taxon ~{taxon_id}, removing empty files"
echo "false" > CONTINUE
fi

grep ~{taxon_id} ~{kraken2_report} | awk '{for (i=6; i <= NF; ++i) print $i}' | tr '\n' ' ' | xargs > ORGANISM_NAME

>>>
output {
File? extracted_read1 = "~{taxon_id}_1.fastq.gz"
File? extracted_read2 = "~{taxon_id}_2.fastq.gz"
String organism_name = read_string("ORGANISM_NAME")
String krakentools_docker = docker
Boolean success = read_boolean("CONTINUE")
}
runtime {
cpu: cpu
disk: disk_size + " GB"
disks: "local-disk " + disk_size + " SSD"
docker: docker
memory: "~{memory} GB"
preemptible: 1
maxRetries: 3
}
}
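Review note: the `grep ~{taxon_id}` lookup in this task matches any report line that contains those digits, including a longer taxon ID or a read count. A possible alternative, sketched here under the assumption that the report uses the standard tab-delimited Kraken2 layout (column 5 is the NCBI taxon ID, column 6 is the indented scientific name), is an exact match on the ID column. `organism_name` is a hypothetical helper name, not something defined in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the scientific name for an
# exact taxon ID match in a Kraken2 report.
# Assumes the standard tab-delimited report: column 5 = taxon ID, column 6 = name.
organism_name() {
  local taxon_id="$1" report="$2"
  awk -F'\t' -v id="$taxon_id" '
    $5 == id {
      sub(/^ +/, "", $6)   # strip the indentation that encodes taxonomic depth
      print $6
      exit                 # report only the first exact match
    }' "$report"
}
```

Called as `organism_name 11320 report.txt > ORGANISM_NAME`, this would leave the file empty when the taxon ID is absent from the report.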