Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TheiaCov] wfs add percentage_mapped_reads #641

Merged
merged 41 commits into from
Nov 4, 2024

Conversation

fraser-combe
Copy link
Contributor

@fraser-combe fraser-combe commented Oct 3, 2024

This PR closes #507

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

This PR adds outputs for percentage_mapped_reads to various workflows, specifically targeting reads for flu and non-flu organisms, ensuring consistency in outputs.

⚡ Impacted Workflows/Tasks

theiacov_ont.wdl
theiacov_illumina_pe.wdl
theiacov_illumina_se.wdl
theiacov_clearlabs.wdl

This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

Added unified output for percentage_mapped_reads across theiacov_ont, theiacov_illumina_pe, theiacov_illumina_se, and theiacov_clearlabs workflows.
Consolidated flu and non-flu percentage mapped reads using select_first to ensure a single output variable for mapped reads.
Refined logic for flu and non-flu workflows to ensure correct handling of percentage_mapped_reads.

⚙️ Algorithm

  • No major algorithmic changes were introduced, but the logic for flu and non-flu organisms in calculating percentage_mapped_reads was updated to call different tasks.

  • For iVar-based workflows (theiacov_illumina_pe, theiacov_illumina_se), the percentage is parsed from the samtools flagstat file.

  • For non-iVar workflows, the assembled_reads_percent task is used to pass in BAM files and calculate mapped reads.

➡️ Inputs

No new inputs were added.

⬅️ Outputs

The following outputs were updated or added:

percentage_mapped_reads:
For non-flu organisms, calculated using either ivar_consensus.percentage_mapped_reads (for iVar-based workflows) or from stats_n_coverage.percentage_mapped_reads for ONT workflows.

🧪 Testing

Tested both flu and non-flu cases across workflows, ensuring the correct mapping of percentage_mapped_reads.

TheiaCov_ONT (non flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/a158d7fc-aac7-4c9e-8782-e8d96afe059a
image

TheiaCov_ONT (flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/fd988d1f-5a75-4b7f-b1e5-ce293a345a13
image

ThieaCov_illumina_pe (flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/3ca0d049-31b4-4179-b2ce-20753946f40c/ec42f3a3-7b30-4756-aa72-39640adb92a9
image

TheiaCov_illumina_pe (non-flu)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/cd90a924-8fe9-48c3-814e-d4fce8453632
image

TheiaCov_illumina_se
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Combe_Sandbox/job_history/dcfb1141-a483-4eb3-b280-7377a1df4a4e

Suggested Scenarios for Reviewer to Test

Run workflows with flu and non-flu samples to ensure the correct assignment of percentage_mapped_reads.

Validate the unified logic correctly handles percentage_mapped_reads for both flu and non-flu workflows, particularly for iVar-based and non-iVar-based workflows.

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable (Theiagen developers only)

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

@fraser-combe fraser-combe requested a review from a team as a code owner October 3, 2024 17:16
@fraser-combe fraser-combe force-pushed the fc-theiacov-mapped-reads-dev branch from 0dc9640 to 33ab026 Compare October 4, 2024 12:47
@sage-wright sage-wright marked this pull request as draft October 16, 2024 17:53
@fraser-combe fraser-combe marked this pull request as ready for review October 21, 2024 21:30
@sage-wright
Copy link
Member

sage-wright commented Oct 24, 2024

Also I was wrong and you do need to provide a default value for the percentage_mapped_reads in case the read_screen fails (see here).

Could you coerce all of these non-optional outputs into Strings and provide "" as the default value at the end of the select_first for the workflows? That way, when the read_screen fails, there isn't a 0 populated to the column since that would be an inaccurate statement.

  • in ClearLabs, remove the ? quantifier off the float
  • in SE, change the Float? to a String?
  • in PE, change the Float to a String and add a , "" to the end of the select_first block
  • in ONT, change the Float to a String and add a , "" to the end of the select_first block
  • in ivarconsensus, change change it to be a String and add , "" to the end of the select_first (like we do for meanbaseq and meanmapq, assembly_mean_coverage, etc.

Thanks!

@sage-wright
Copy link
Member

just as a side note, do you have any examples where it's not 100%?

@fraser-combe
Copy link
Contributor Author

just as a side note, do you have any examples where it's not 100%?

No Id have to try and find data, unless you have any potential data that may provide this kind of result

@fraser-combe
Copy link
Contributor Author

Those updates should be passing tests now, let me know if you need me to find more testing data after your tests have completed

@kapsakcj
Copy link
Contributor

kapsakcj commented Nov 4, 2024

FYI I'm sharing this with 1 lab & user, so please keep the dev branch available after merging. I'll pass along any feedback I hear from them

Copy link
Member

@sage-wright sage-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌟

@sage-wright sage-wright merged commit 3dda02c into main Nov 4, 2024
17 checks passed
@sage-wright sage-wright deleted the fc-theiacov-mapped-reads-dev branch November 7, 2024 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add percentage_mapped_reads output to TheiaCoV wfs
3 participants