[New Workflow] Adding the TheiaMeta_Panel_Illumina_PE Workflow #656
base: main
This is some really great work, @sage-wright & @cimendes. There are just a few points for us to discuss, but I think they're pretty minor overall. This workflow has the potential to be widely adopted by labs as multi-pathogen panels become more commonly used for "bug screening."
```wdl
    read1 = select_first([krakentools.extracted_read1]),
    read2 = select_first([krakentools.extracted_read2])
}
if (fastq_scan_binned.read1_seq > minimum_read_number) {
```
Does the Kraken read extraction retain only mate pairs, or is there a chance of singletons in the read1 and read2 datasets? In other words, should this check that both read1 and read2 are above the threshold?
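If the binned reads can contain singletons, one option is to gate each bin on both mate counts rather than on read1 alone. Below is a minimal, self-contained WDL 1.0 sketch of that check. It assumes `fastq_scan_binned` also reports a read2 count alongside `read1_seq` (not shown in the diff above). The workflow name, the input names, and the placeholder `assemble_bin` task are all hypothetical.

```wdl
version 1.0

# Hypothetical sketch: gate a Kraken bin on BOTH mate counts, so a bin whose
# read1 count passes but whose read2 count does not (e.g. singletons) is skipped.
workflow gate_bin_on_mate_counts {
  input {
    Int read1_count          # e.g. fastq_scan_binned.read1_seq
    Int read2_count          # assumed: the matching read2 count from the same task
    Int minimum_read_number
  }

  Boolean both_mates_pass = read1_count > minimum_read_number && read2_count > minimum_read_number

  if (both_mates_pass) {
    call assemble_bin { input: read_count = read1_count }
  }

  output {
    Boolean bin_passed = both_mates_pass
    # Call outputs inside an if block are optional outside it, hence String?
    String? assembly_status = assemble_bin.status
  }
}

# Placeholder standing in for the real per-bin assembly step
task assemble_bin {
  input {
    Int read_count
  }
  command <<<
    echo "would assemble ~{read_count} read pairs"
  >>>
  output {
    String status = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:22.04"
  }
}
```

If the extraction only ever keeps proper pairs, the two counts are always equal and the existing read1-only check is sufficient. The combined condition only matters when the two counts can diverge.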
This PR closes #553
🗑️ This dev branch should NOT be deleted after merging to main.
🧠 Summary
The Illumina VSP panel is one example of a panel-based sequencing assay; it contains over 200 target viruses. These data cannot be analyzed with the traditional TheiaMeta pathway because, with so many target organisms, its assemble-first, bin-later approach is undesirable. TheiaMeta_Panel instead performs taxonomic binning first with Kraken, then attempts to assemble each bin. If the organism a bin belongs to is supported in TheiaCoV, we perform additional characterization steps, though those results are not always of high quality. Users should be cautious before reporting the characterization results.
⚡ Impacted Workflows/Tasks
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
⚙️ Algorithm
TheiaMeta_Panel takes the following approach:
morgana_magic, which performs characterization depending on the organism.
Assembly is fault resistant: if one shard fails, the workflow continues.
This also affects TheiaMeta, which will no longer fail if assembly fails in either the metaspades or the pilon task.
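The fault-resistant assembly described above can be sketched in WDL 1.0 as a scatter over taxon bins, where a failed shard produces no output instead of failing the run. This is a minimal sketch under stated assumptions, not the PR's implementation. The workflow and task names, the `Pair`-based input, the `|| true` pattern, and the `staphb/spades:3.15.5` image tag are illustrative assumptions. It also assumes an engine such as Cromwell, where a missing optional output file evaluates to null.

```wdl
version 1.0

# Hypothetical sketch: scatter assembly over Kraken bins; a failed shard
# yields a null contigs file instead of failing the whole workflow.
workflow fault_tolerant_bin_assembly {
  input {
    # One (read1, read2) pair per taxon bin, e.g. from read extraction
    Array[Pair[File, File]] binned_reads
  }

  scatter (bin in binned_reads) {
    call assemble_bin {
      input:
        read1 = bin.left,
        read2 = bin.right
    }
  }

  output {
    # select_all drops the nulls left by shards whose assembly failed
    Array[File] assembled_contigs = select_all(assemble_bin.contigs)
  }
}

task assemble_bin {
  input {
    File read1
    File read2
  }
  command <<<
    # "|| true" keeps a metaSPAdes failure from failing the task;
    # the shard then simply has no contigs file to report.
    metaspades.py -1 ~{read1} -2 ~{read2} -o assembly || true
  >>>
  output {
    File? contigs = "assembly/contigs.fasta"
  }
  runtime {
    docker: "staphb/spades:3.15.5"  # assumed image tag
  }
}
```

The trade-off is that a failed bin becomes silent at the workflow level. A per-bin status output, or comparing the length of `assembled_contigs` with the number of input bins, would help users see which bins did not assemble.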
➡️ Inputs
Many; please see the documentation.
⬅️ Outputs
Many; please see the documentation.
🧪 Testing
Tested regular TheiaMeta on 5 HAV samples here
Tested 21 Illumina VSP samples with TheiaMeta Panel here
Suggested Scenarios for Reviewer to Test
🔬 Final Developer Checklist
🎯 Reviewer Checklist