fix average q-score calculation #495

Open: wants to merge 1 commit into base: master

DrinnanSante

Hi, big fan of your work!

In this pull request, I rewrote the --average_qual method to accurately calculate the average quality of a read.

I was running .fastq files of DNA sequenced on our Nanopore through fastp (Nanopore says to use average read q-scores), and far more reads were passing the quality filter than I was used to. I looked into it and found that fastp averages the q-scores directly. Q-scores are on a log scale, so they need to be converted to error probabilities before averaging. Averaging the raw q-scores overestimates read quality, which lets too many reads pass the filter.

As an example:

     A base with a q-score of 10 and a second base with a q-score of 20, if
     averaged directly, would have an average q-score of 15.

     However, if you average the probabilities of error instead:

     A q-score of 10 is a probability of error of 0.1
     A q-score of 20 is a probability of error of 0.01
     Averaging the probabilities of error: (0.1 + 0.01) / 2 = 0.055

     The q-score for a probability of error of 0.055 is ~12.6.
     This number accurately reflects the average amount of error present in the read.
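
The arithmetic in this example can be reproduced with a small sketch. The function names here are made up for illustration and are not taken from fastp, and the input is assumed to be a non-empty list of q-scores:

```cpp
#include <cmath>
#include <vector>

// Illustrative helpers only; these names are hypothetical, not fastp's.

// Convert a Phred q-score to its error probability: p = 10^(-Q/10).
double qualToErrorProb(double q) {
    return std::pow(10.0, -q / 10.0);
}

// Convert an error probability back to a Phred q-score: Q = -10 * log10(p).
double errorProbToQual(double p) {
    return -10.0 * std::log10(p);
}

// Arithmetic mean of the q-scores themselves (averaging on the log scale).
// Assumes quals is non-empty.
double meanOfQuals(const std::vector<int>& quals) {
    double sum = 0.0;
    for (int q : quals) sum += q;
    return sum / quals.size();
}

// Mean of the error probabilities, converted back to a q-score.
// Assumes quals is non-empty.
double qualOfMeanErrorProb(const std::vector<int>& quals) {
    double sum = 0.0;
    for (int q : quals) sum += qualToErrorProb(q);
    return errorProbToQual(sum / quals.size());
}
```

For the two bases above, `meanOfQuals({10, 20})` gives 15, while `qualOfMeanErrorProb({10, 20})` gives roughly 12.6.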

To implement this, in filter.cpp I changed the totalQual variable to a float. totalQual now accumulates each base's probability of error instead of its q-score. Then, in the 'else if' statement, I divide the read's final totalQual by rlen and convert the result back to a q-score, which is compared to the user's input.
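
As a rough sketch of that logic (this is not the actual diff; the function name, the Phred+33 offset, and the "pass when at or above the threshold" comparison are assumptions made for illustration):

```cpp
#include <cmath>
#include <string>

// Hypothetical stand-alone version of the check described above.
// Not fastp's real code: the name and signature are invented for illustration.
bool passesAverageQual(const std::string& quality, int minAverageQual) {
    if (quality.empty()) return false;  // nothing to average

    // Accumulate error probabilities rather than raw q-scores.
    double totalErrorProb = 0.0;
    for (char c : quality) {
        int q = c - 33;  // assumes Phred+33 encoded quality characters
        totalErrorProb += std::pow(10.0, -q / 10.0);
    }

    // Mean error probability of the read, converted back to a q-score.
    double meanErrorProb = totalErrorProb / quality.size();
    double averageQual = -10.0 * std::log10(meanErrorProb);

    // Assumed convention: the read passes if it meets the user's threshold.
    return averageQual >= minAverageQual;
}
```

With the quality string `"+5"` (q-scores 10 and 20 under Phred+33), the read's average works out to about 12.6, so it would pass a threshold of 12 but fail a threshold of 13, whereas averaging the raw q-scores would have passed it up to 15.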

I compiled the code and tested it on a simulated dataset, and the results were identical to those from the other Nanopore quality-filtering packages I have on my machine.

Thanks again for fastp!!
