filter-sample-max-read-depth options #36
Comments
Hi @ribeirots, thanks for the suggestion - let's see if I am getting this correctly:
I am not sure that I understand exactly what you mean by: "exclude reads with depth in the top 2% highest coverage" - top 2% of what exactly? As far as I can make sense of this, you would like to have a threshold per sample, across all its positions. This would require a complete first pass over the data to collect the read depth distribution per sample, then setting a 98% threshold based on that, and then reading the data again to apply the filter and compute the statistics we actually want. That double reading seems like a rather large downside, but I can see this generally being a useful way to specify filters.
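To make the two-pass idea above concrete, here is a minimal sketch in Python. It is not grenedalf code: the function names `percentile_thresholds` and `apply_max_depth` are hypothetical, the data is assumed to be an in-memory list of rows with one read depth per sample, and the nearest-rank percentile definition is an assumption as well. On real data, each pass would stream over the input file instead.

```python
import math


def percentile_thresholds(rows, fraction):
    """First pass: per-sample read depth at the given percentile.

    Uses the nearest-rank definition (an assumption; other percentile
    definitions interpolate between values).
    """
    n_samples = len(rows[0])
    thresholds = []
    for s in range(n_samples):
        # Collect and sort all depths of sample s across positions.
        depths = sorted(row[s] for row in rows)
        rank = max(1, math.ceil(fraction * len(depths)))
        thresholds.append(depths[rank - 1])
    return thresholds


def apply_max_depth(rows, thresholds):
    """Second pass: mask (None) every entry above its sample's threshold."""
    return [
        [d if d <= t else None for d, t in zip(row, thresholds)]
        for row in rows
    ]


# Hypothetical toy data: 4 positions (rows), 2 samples (columns).
rows = [[10, 5], [12, 6], [11, 50], [100, 7]]
thresholds = percentile_thresholds(rows, 0.75)
filtered = apply_max_depth(rows, thresholds)
```

Note that each sample ends up with its own threshold, so a position can be masked in one sample and kept in another.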
That is definitely more efficient, as it would not require two passes through the data. Well, it would once, initially, to calculate the percentile for each sample. But it would not require two passes every time you run grenedalf again, for instance to experiment with the effect of other settings. So that definitely sounds interesting, and might also be applicable to the other per-sample filter settings. I can see some potential for accidental issues with this though, as the filtering would not be consistent between samples. Maybe @jeff_spence has some thoughts on this?

Please let me know if this fits with what you had in mind. Unfortunately, I am a bit short on time for working on grenedalf at the moment, but we can keep this issue open for when I get back to working on it. So, I am not sure that this will be available any time soon.

In the meantime, to achieve the same, you can pre-filter your data manually. For instance, you can use the

Hope that helps, cheers
Thank you, Lucas. Yes, I think you understood my suggestion correctly. It is a feature that is implemented in PoPoolation, too - that is where I first saw this kind of filter. I am doing the manual pre-filter to achieve this filter for now.
Hi @ribeirots, all right, just checked how PoPoolation2 is doing this - they indeed read the whole input data just to get those thresholds to get the value for a percentage of

Also, PoPoolation2 has a percentage option for the

Let's see. Right now I am a bit short on time, but I'll hopefully be able to implement this eventually. It seems useful! Thanks again
Would it be possible to have the max read depth be a percentile? For example, set it to 0.98 to exclude reads with depth in the top 2% highest coverage. Perhaps, if the value passed to the parameter is between 0 and 1 it could take it as percentile by default. Alternatively, could I pass different integers for each sample, similar to how I can pass different pool sizes? This way I could calculate the percentile myself before running the program.
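As an illustration of the two variants suggested above, the following Python sketch interprets a setting either as a percentile (a value strictly between 0 and 1) or as an absolute integer depth, and accepts either one value for all samples or one value per sample. The function names `parse_max_depth` and `parse_per_sample` and the comma-separated format are hypothetical and do not reflect actual grenedalf options.

```python
def parse_max_depth(value):
    """Classify one setting as ("percentile", float) or ("absolute", int)."""
    number = float(value)
    if 0.0 < number < 1.0:
        # Values strictly between 0 and 1 are read as a percentile.
        return ("percentile", number)
    if number >= 1.0 and number == int(number):
        # Whole numbers from 1 upwards are read as an absolute max depth.
        return ("absolute", int(number))
    raise ValueError(f"invalid max read depth setting: {value!r}")


def parse_per_sample(spec, n_samples):
    """Parse e.g. '0.98' or '250,300,275' into one setting per sample."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) == 1:
        # A single value applies to every sample.
        parts = parts * n_samples
    if len(parts) != n_samples:
        raise ValueError(
            f"expected 1 or {n_samples} values, got {len(parts)}"
        )
    return [parse_max_depth(p) for p in parts]


shared = parse_per_sample("0.98", 3)
per_sample = parse_per_sample("250, 300", 2)
```

With per-sample integers, the percentile can be computed once outside the program and reused across runs, which avoids the second pass over the data.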