NEW: filter_kraken2_classifications action #226
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #226      +/-   ##
==========================================
+ Coverage   95.60%   95.62%   +0.01%
==========================================
  Files          34       34
  Lines        1956     2010      +54
  Branches      226      235       +9
==========================================
+ Hits         1870     1922      +52
- Misses         48       49       +1
- Partials       38       39       +1
Copilot reviewed 3 out of 13 changed files in this pull request and generated 1 comment.
Files not reviewed (10)
- q2_moshpit/kraken2/tests/data/abundance-filter/outputs-only-unclassified/sample1.output.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/outputs-only-unclassified/sample2.output.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/outputs/sample1.output.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/outputs/sample2.output.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/reports-only-unclassified/sample1.report.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/reports-only-unclassified/sample2.report.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/reports-w-unclassified/sample1.report.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/reports-w-unclassified/sample2.report.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/reports/sample1.report.txt: Language not supported
- q2_moshpit/kraken2/tests/data/abundance-filter/reports/sample2.report.txt: Language not supported
Comments suppressed due to low confidence (1)
q2_moshpit/kraken2/classification.py:278
- [nitpick] Consider including the sample_id in the error message for better debugging.
raise ValueError("All Taxonomic bins were filtered by the"
Hey @cherman2, this looks great - thanks! 🏅
Please see the very minor cosmetic comment below.
parameter_descriptions={},
output_descriptions={},
name='Filter Kraken2 Classifications by Abundance',
description='...',
Should we just put the text from the "name" field here, or did you want to keep the "..."?
input_descriptions={},
parameter_descriptions={},
output_descriptions={},
Please provide these missing descriptions 🙏
Hmmm, yeah, that's a good point actually... I guess the other action focuses on metadata-based filtering but perhaps we could merge them indeed...
Hi Bok Lab!
I was recently talking to a collaborator whose area of expertise is metagenomic contamination and spurious hits. They suggested that I should be performing low-abundance filtering on my data to remove spurious hits, and that the Kraken reports are the best place to filter, so that Bracken isn't estimating taxa that were spuriously assigned.
@colinvwood and I developed a method to allow for filtering the Kraken reports by abundance! This method also filters the output files so that the reports and outputs do not get out of sync.
Just as a side note: I used this code on a dataset of mine and found that filtering at 0.0001 (which means a taxon has to have roughly 15-60 hits or it gets discarded) retains my diversity signal but minimizes the low-abundance feature overlap that was happening in my samples!
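To make the threshold arithmetic concrete, here is a rough sketch of the idea, not the code in this PR. It assumes the threshold is a fraction of a sample's total reads, so 15-60 hits at 0.0001 would correspond to roughly 150,000-600,000 reads per sample. The row layout (dicts with "taxid" and "clade_reads") and the function names are made up for illustration and do not mirror the real Kraken2 file formats or the q2-moshpit implementation.

```python
from typing import Dict, List, Set


def min_reads(total_reads: int, threshold: float) -> float:
    """Read count a taxon needs to survive a relative-abundance threshold."""
    return total_reads * threshold


def filter_report(rows: List[Dict], total_reads: int,
                  threshold: float) -> List[Dict]:
    """Keep report rows whose share of total reads meets the threshold."""
    return [r for r in rows if r["clade_reads"] / total_reads >= threshold]


def filter_outputs(read_lines: List[Dict],
                   kept_taxids: Set[str]) -> List[Dict]:
    """Keep per-read lines whose taxon survived, so the two files stay in sync."""
    return [r for r in read_lines if r["taxid"] in kept_taxids]


# Hypothetical sample with 200,000 reads: the cutoff works out to about 20 reads.
total = 200_000
cutoff = min_reads(total, 0.0001)

report = [
    {"taxid": "562", "clade_reads": 5000},   # well above the cutoff
    {"taxid": "1280", "clade_reads": 50},    # above the cutoff
    {"taxid": "9999", "clade_reads": 3},     # below the cutoff, dropped
]
outputs = [
    {"read_id": "r1", "taxid": "562"},
    {"read_id": "r2", "taxid": "9999"},
    {"read_id": "r3", "taxid": "1280"},
]

kept_report = filter_report(report, total, 0.0001)
kept_taxids = {r["taxid"] for r in kept_report}
kept_outputs = filter_outputs(outputs, kept_taxids)
```

In this toy version, reads assigned to a discarded taxon are simply dropped from the outputs; whether the actual action drops them or handles them some other way is not something this sketch settles.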
Let me know if y'all have any questions!