[New Workflow] Flye_denovo to replace DragonFlye #692

Draft
wants to merge 76 commits into
base: main
38b1caa
placeholder
sage-wright Oct 4, 2024
cdf0913
make flye task
sage-wright Oct 9, 2024
46629d8
rename fasta
sage-wright Oct 10, 2024
b45b41d
make workflow a workflow
sage-wright Oct 11, 2024
62b48e2
update output files for flye
fraser-combe Nov 19, 2024
787121c
v1 bandage plot flye assembly visual
fraser-combe Nov 19, 2024
e9f29b4
medaka initial commit
fraser-combe Nov 20, 2024
df7f79f
initial commit racon framework
fraser-combe Nov 20, 2024
5a3fcd6
framework for tasks dnaapler porechop and racon
fraser-combe Nov 21, 2024
d6fe1bc
update docker images
fraser-combe Nov 22, 2024
89f7d3b
update outdir medaka
fraser-combe Nov 22, 2024
35982db
update medaka and dnaapler
fraser-combe Nov 25, 2024
d738ab1
add polypolish and separate bwa mem -a tasks
sage-wright Nov 25, 2024
85a68dc
remove comment cruft
sage-wright Nov 25, 2024
c7db697
initial commit bash contig filtering
fraser-combe Nov 25, 2024
0371972
initial commit bash contig filtering
fraser-combe Nov 25, 2024
facd02f
update medaka docker image
fraser-combe Nov 27, 2024
d26fc95
refactor assembly tasks and workflows for clarity and consistency
fraser-combe Nov 27, 2024
d45481e
add dnaapler to wf
fraser-combe Nov 27, 2024
eef27a9
update racon
fraser-combe Nov 27, 2024
0fee522
add polisher options to flye_consensus wf
fraser-combe Nov 27, 2024
17289d1
update workflow and tasks and altered racon with minimap in docker to…
fraser-combe Nov 29, 2024
646b116
update dnaapler and tidy flye consensus wf
fraser-combe Dec 2, 2024
8e56dc0
update filter contigs task, initial attempt working
fraser-combe Dec 3, 2024
f37bf3b
updated flye consensus wf and filter contigs
fraser-combe Dec 3, 2024
2865a2d
update docker images porechop and dnaapler
fraser-combe Dec 9, 2024
f9cbde4
optional trim and polish tasks, update porechop and dnaapler mode
fraser-combe Dec 9, 2024
4b97b30
incorporate hybrid assemblies with polypolish
fraser-combe Dec 9, 2024
1993791
update meta wf description
fraser-combe Dec 9, 2024
fa10d0c
start updating docs, remove run polypolish logic and update polypoli…
fraser-combe Dec 9, 2024
63c7e2e
update racon with polishing round logic with updated minimap2 in dock…
fraser-combe Dec 9, 2024
c92358b
additional comments for filtering contigs logic
fraser-combe Dec 10, 2024
420ed91
add assembly stats output
fraser-combe Dec 10, 2024
74d62a9
update filter task metrics output
fraser-combe Dec 10, 2024
ac3667f
per dev meeting update wf name, remove metrics output, re arrange fol…
fraser-combe Dec 10, 2024
c6064cb
update wf to pass miniwdl checks
fraser-combe Dec 10, 2024
6878957
update medaka to use auto model selection or user-provided override
fraser-combe Dec 10, 2024
1d2de80
all local tests successful for each path now to add to theiaprok
fraser-combe Dec 11, 2024
afeceb1
update flye call
fraser-combe Dec 11, 2024
cc5f966
add all tasks inputs to subworkflow for terra, increase CPU allocatio…
fraser-combe Dec 12, 2024
475e543
rename some input specific variables for terra users to know which ta…
fraser-combe Dec 12, 2024
e86e193
debugging racon terra
fraser-combe Dec 12, 2024
97df2ec
debugging potential memory usage increase for racon
fraser-combe Dec 12, 2024
6309913
more debugging for racon failing terra only
fraser-combe Dec 12, 2024
fac2a23
trying new dockerfile with updated cmake command for cpu compatibility
fraser-combe Dec 12, 2024
bc8fafa
trying test dockerfile with updated cpu installs
fraser-combe Dec 13, 2024
a379f6b
trying new racon build with flags for cpu optimization on terra
fraser-combe Dec 13, 2024
3938c18
Increase maxRetries for dnaapler and contig_filter tasks; add bandage…
fraser-combe Dec 13, 2024
01ebb7b
docs update theiaprok
fraser-combe Dec 13, 2024
84df24f
Refactor workflows to standardize assembly output variable names; inc…
fraser-combe Dec 13, 2024
2ec4e8f
add versions output to theiaprok
fraser-combe Dec 16, 2024
9ffc554
update theiaprok wf
fraser-combe Dec 16, 2024
b088c74
versions output
fraser-combe Dec 16, 2024
abad81b
medaka model output
fraser-combe Dec 16, 2024
beaec7a
update medaka model docs information for users
fraser-combe Dec 16, 2024
fcd39b9
update medaka model selection order
fraser-combe Dec 16, 2024
47155c0
Merge branch 'main' into smw-flye-dev
fraser-combe Dec 17, 2024
78d3a5f
update md sums for theiaprok after merge main
fraser-combe Dec 17, 2024
123f7ee
remove versioning task from flye sub wf
fraser-combe Dec 20, 2024
730d146
Refactor WDL tasks for improved parameter handling and consistency; u…
fraser-combe Jan 16, 2025
65040e7
Update docs; filter_contigs updated to biopython, enhanced output
fraser-combe Jan 16, 2025
97d6584
Enhance documentation and workflows: update filtering task. Adjust d…
fraser-combe Jan 16, 2025
8ce6b13
update fasta name
fraser-combe Jan 16, 2025
0750790
update bandage plot output for theiaprok_ont wdl
fraser-combe Jan 16, 2025
797882a
update bandage plot output for theiaprok_ont wdl
fraser-combe Jan 16, 2025
8439292
update docs and minor updates
fraser-combe Jan 17, 2025
1403c6a
Merge branch 'main' into smw-flye-dev
fraser-combe Jan 17, 2025
ea4ecd0
correct merge
fraser-combe Jan 17, 2025
330fb60
update md5sums
fraser-combe Jan 17, 2025
4cb2f9b
update input defaults and medaka model resolving and default model s…
fraser-combe Jan 19, 2025
de4a766
update docker image for filter contigs biopython
fraser-combe Jan 21, 2025
b7b97cb
remove default values wf level
fraser-combe Jan 23, 2025
bf061ce
remove default values wf level
fraser-combe Jan 23, 2025
8a0daa3
add back in bandage plot options
fraser-combe Jan 23, 2025
0909205
update
fraser-combe Jan 29, 2025
5be645c
update docker
fraser-combe Jan 29, 2025
206 changes: 181 additions & 25 deletions docs/workflows/genomic_characterization/theiaprok.md

Large diffs are not rendered by default.

48 changes: 47 additions & 1 deletion tasks/alignment/task_bwa.wdl
@@ -12,6 +12,7 @@ task bwa {
String docker = "us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan"
}
command <<<
set -euo pipefail
# date and version control
date | tee DATE
echo "BWA $(bwa 2>&1 | grep Version )" | tee BWA_VERSION
@@ -153,4 +154,49 @@ task bwa {
preemptible: 0
maxRetries: 3
}
}

task bwa_all {
input {
File draft_assembly_fasta
File read1
File read2
String samplename

Int cpu = 6
Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/staphb/bwa:0.7.18"
Int memory = 16
}
command <<<
set -euo pipefail

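# bwa prints its usage (including the version) and exits non-zero when run without arguments;
# '|| true' keeps 'set -e' from aborting the task at this step
bwa &> BWA_HELP || true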
grep "Version" BWA_HELP | cut -d" " -f2 > BWA_VERSION

if [[ ! -f "~{draft_assembly_fasta}.bwt" ]]; then
echo "Indexing reference genome: ~{draft_assembly_fasta}"
bwa index ~{draft_assembly_fasta}
else
echo "Reference genome is already indexed: ~{draft_assembly_fasta}"
fi

bwa mem -t ~{cpu} -a ~{draft_assembly_fasta} ~{read1} > ~{samplename}_R1.sam
bwa mem -t ~{cpu} -a ~{draft_assembly_fasta} ~{read2} > ~{samplename}_R2.sam

>>>
output {
File read1_sam = "~{samplename}_R1.sam"
File read2_sam = "~{samplename}_R2.sam"
String bwa_version = read_string("BWA_VERSION")
}
runtime {
docker: "~{docker}"
memory: "~{memory} GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
maxRetries: 3
preemptible: 0
}
}
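
Note on the version capture in `bwa_all`: under `set -euo pipefail`, any command that exits non-zero stops the script, and some CLIs exit non-zero when they only print their usage text. Below is a minimal sketch of a tolerant capture, assuming a tool that behaves that way. `fake_tool` and `capture_version` are hypothetical stand-ins for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a CLI that prints usage (with a version line)
# to stderr and exits non-zero when called without arguments.
fake_tool() {
  echo "Program: fake_tool" >&2
  echo "Version: 0.7.18-r1243" >&2
  return 1
}

# Capture the usage text; '|| true' stops 'set -e' from aborting on the
# expected non-zero exit status.
capture_version() {
  local help_text
  help_text="$(fake_tool 2>&1 || true)"
  # Print the second space-delimited field of the "Version:" line.
  printf '%s\n' "$help_text" | grep "Version" | cut -d" " -f2
}

capture_version
```

The same `grep | cut` extraction as in the task is kept, so only the handling of the exit status differs.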
30 changes: 30 additions & 0 deletions tasks/assembly/task_bandage_plot.wdl
@@ -0,0 +1,30 @@
version 1.0

task bandage_plot {
input {
File assembly_graph_gfa
String samplename
Int cpu = 1
Int memory = 4
Int disk_size = 10
String docker = "us-docker.pkg.dev/general-theiagen/staphb/bandage:0.8.1"
}
command <<<
set -euo pipefail
Bandage --version | tee VERSION
Bandage image ~{assembly_graph_gfa} ~{samplename}_bandage_plot.png
>>>
output {
File plot = "~{samplename}_bandage_plot.png"
String bandage_version = read_string("VERSION")
}
runtime {
docker: "~{docker}"
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk " + disk_size + " HDD"
disk: disk_size + " GB"
maxRetries: 1
preemptible: 0
}
}
63 changes: 63 additions & 0 deletions tasks/assembly/task_dnaapler.wdl
@@ -0,0 +1,63 @@
version 1.0

task dnaapler {
input {
File input_fasta
String samplename
String dnaapler_mode = "all" # The mode of reorientation to execute (default: 'all')
Int cpu = 4
Int disk_size = 100
Int memory = 16
String docker = "us-docker.pkg.dev/general-theiagen/staphb/dnaapler:1.0.1"
}
command <<<
set -euo pipefail

# dnaapler version
dnaapler --version | tee VERSION

# Create a subdirectory for dnaapler outputs
output_dir="dnaapler_output"
mkdir -p "$output_dir"
echo "Output directory created: $output_dir"

# Run dnaapler subcommand
echo "Running dnaapler..."
dnaapler ~{dnaapler_mode} \
-i ~{input_fasta} \
-o "$output_dir" \
-p ~{samplename} \
-t ~{cpu} \
-f || {
echo "ERROR: dnaapler command failed. Check logs for details." >&2
exit 1
}

echo "dnaapler command completed successfully."

# Check if output FASTA file exists
if [[ ! -f "$output_dir"/~{samplename}_reoriented.fasta ]]; then
echo "ERROR: Expected output file not found: $output_dir/~{samplename}_reoriented.fasta" >&2
exit 1
fi

# Move the final reoriented FASTA file to the task's working directory
echo "Moving output FASTA file to working directory..."
mv "$output_dir"/~{samplename}_reoriented.fasta .

echo "dnaapler task completed successfully for sample: ~{samplename}"
>>>
output {
File reoriented_fasta = "~{samplename}_reoriented.fasta"
String dnaapler_version = read_string("VERSION")
}
runtime {
docker: "~{docker}"
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
maxRetries: 3
preemptible: 0
}
}
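
Note on the output check in `dnaapler`: the task verifies that the reoriented FASTA exists before moving it, which gives a clearer error than a failing `mv`. The pattern can be sketched in isolation as follows. `require_file`, the temporary directory, and the sample name `sample` are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fail with a clear message unless the expected file exists.
require_file() {
  local path="$1"
  if [[ ! -f "$path" ]]; then
    echo "ERROR: Expected output file not found: $path" >&2
    return 1
  fi
}

# Stand-in for the tool's output directory and result file.
workdir="$(mktemp -d)"
mkdir -p "$workdir/dnaapler_output"
printf '>contig_1\nACGT\n' > "$workdir/dnaapler_output/sample_reoriented.fasta"

# Verify first, then move the file next to the other task outputs.
require_file "$workdir/dnaapler_output/sample_reoriented.fasta"
mv "$workdir/dnaapler_output/sample_reoriented.fasta" "$workdir/"
```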
96 changes: 0 additions & 96 deletions tasks/assembly/task_dragonflye.wdl

This file was deleted.

68 changes: 68 additions & 0 deletions tasks/assembly/task_flye.wdl
@@ -0,0 +1,68 @@
version 1.0

task flye {
input {
File read1
String samplename
String read_type = "--nano-hq" # Default read type
Int? genome_length # requires `asm_coverage`
Int? asm_coverage # reduced coverage for initial disjointig assembly

Int flye_polishing_iterations = 1
Int? minimum_overlap

Float? read_error_rate
Boolean uneven_coverage_mode = false
Boolean keep_haplotypes = false
Boolean no_alt_contigs = false
Boolean scaffold = false

String? additional_parameters # Any extra Flye-specific parameters

Int cpu = 4
Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/staphb/flye:2.9.4"
Int memory = 32
}
command <<<
set -euo pipefail
flye --version | tee VERSION

# --genome-size is only passed when asm_coverage is set; Flye's --asm-coverage option requires a genome size estimate
flye \
"~{read_type}" "~{read1}" \
--iterations ~{flye_polishing_iterations} \
~{"--min-overlap " + minimum_overlap} \
~{if defined(asm_coverage) then "--genome-size " + genome_length else ""} \
~{"--asm-coverage " + asm_coverage} \
~{"--read-error " + read_error_rate} \
~{true="--meta" false="" uneven_coverage_mode} \
~{true="--keep-haplotypes" false="" keep_haplotypes} \
~{true="--no-alt-contigs" false="" no_alt_contigs} \
~{true="--scaffold" false="" scaffold} \
~{"--extra-params " + additional_parameters } \
--threads ~{cpu} \
--out-dir .

mv assembly.fasta ~{samplename}.assembly.fasta
mv assembly_info.txt ~{samplename}.assembly_info.txt
mv assembly_graph.gfa ~{samplename}.assembly_graph.gfa

>>>
output {
File assembly_fasta = "~{samplename}.assembly.fasta"
File assembly_graph_gfa = "~{samplename}.assembly_graph.gfa"
File assembly_info = "~{samplename}.assembly_info.txt"
String flye_version = read_string("VERSION")
String flye_docker = "~{docker}"
}
runtime {
docker: "~{docker}"
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
maxRetries: 3
preemptible: 0
}
}
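
Note on the optional flags in `flye`: each optional input contributes either a complete `flag value` pair or nothing at all, and the flag and its value must be separate words on the command line. Here is a rough sketch of that assembly in plain bash, where an empty string stands in for an undefined WDL optional. `build_flye_args` is a hypothetical helper and the numeric values are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the Flye arguments for the given settings.
# Empty values mean "not set", mirroring an undefined optional WDL input.
build_flye_args() {
  local min_overlap="$1" genome_size="$2" asm_coverage="$3"
  local args=(--iterations 1)
  if [[ -n "$min_overlap" ]]; then
    args+=(--min-overlap "$min_overlap")
  fi
  # Mirror the task: the genome size is only forwarded together with asm coverage.
  if [[ -n "$asm_coverage" ]]; then
    if [[ -n "$genome_size" ]]; then
      args+=(--genome-size "$genome_size")
    fi
    args+=(--asm-coverage "$asm_coverage")
  fi
  echo "${args[*]}"
}

build_flye_args "5000" "" ""
build_flye_args "" "5000000" "40"
```

Building the arguments as an array keeps each flag and its value as distinct words, which avoids fused tokens such as a flag glued to its number.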
55 changes: 55 additions & 0 deletions tasks/polishing/task_medaka.wdl
@@ -0,0 +1,55 @@
version 1.0

task medaka {
input {
File unpolished_fasta
String samplename
File read1
Boolean auto_model = true # Enable automatic Medaka model selection
String medaka_model = "r1041_e82_400bps_sup_v5.0.0" # Default model if auto_model is disabled or no model is resolved
Int cpu = 4
Int memory = 16
Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/staphb/medaka:2.0.1"
}
command <<<
set -euo pipefail

medaka --version | tee MEDAKA_VERSION

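# Start from the user-provided model so the variable is always set,
# including when auto_model is disabled (required under 'set -u')
medaka_model="~{medaka_model}"

# Attempt automatic model resolution if enabled
if [[ "~{auto_model}" == "true" ]]; then
echo "Attempting automatic model selection..."
medaka tools resolve_model --auto_model consensus ~{read1} > auto_model.txt || true
resolved_model=$(cat auto_model.txt || echo "")
medaka_model="${resolved_model:-$medaka_model}"
fi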

echo "Using Medaka model for polishing: $medaka_model"
echo "$medaka_model" > MEDAKA_MODEL

# Perform Medaka polishing
medaka_consensus \
-i "~{read1}" \
-d "~{unpolished_fasta}" \
-o . \
-m "$medaka_model" \
-t "~{cpu}"

mv consensus.fasta ~{samplename}.polished.fasta
>>>
output {
File medaka_fasta = "~{samplename}.polished.fasta"
String medaka_version = read_string("MEDAKA_VERSION")
String resolved_medaka_model = read_string("MEDAKA_MODEL")
}
runtime {
docker: "~{docker}"
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
maxRetries: 3
preemptible: 0
}
}
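
Note on model selection in `medaka`: the task prefers an automatically resolved model and falls back to the `medaka_model` input when nothing is resolved. The fallback relies on bash's `${var:-default}` expansion, sketched below. `choose_model` and the model name `some_resolved_model` are hypothetical and only illustrate the expansion; they are not Medaka output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: use the resolved model when it is non-empty,
# otherwise fall back to the supplied default.
choose_model() {
  local resolved="$1" default_model="$2"
  echo "${resolved:-$default_model}"
}

# Nothing resolved: the default is used.
choose_model "" "r1041_e82_400bps_sup_v5.0.0"
# A resolved model takes precedence over the default.
choose_model "some_resolved_model" "r1041_e82_400bps_sup_v5.0.0"
```

Because `:-` treats an empty string the same as an unset variable, an empty resolution result also triggers the fallback.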