Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi, big fan of your work!
In this pull request, I rewrote the --average_qual method to accurately calculate the average quality of a read.
I was running .fastq files of DNA sequenced on our Nanopore through fastp (Nanopore says to use average read q-scores), and way more reads were passing the quality filter than I was used to. I looked into it, and fastp was averaging the q-scores, which are log values, and not taking the q-score out of log scale to p values before averaging. This results in way more reads passing the filter than there should be.
As an example:
To implement this, in the filter.cpp file, I changed the totalQual variable to a float. I then had the totalQual variable increment by the probability of error instead of the q-score. Then, in the 'else if' statement, I divided the final totalQual value of the read by the rlen, and calculated the resulting q-score to compare to the users input.
I complied the code and tested it on a simulated dataset, and the results were identical to the other nanopore quailty filtering packages I have on my machine.
Thanks again for fastp!!