fix average q-score calculation #495

DrinnanSante · 2023-05-31T18:04:41Z

Hi, big fan of your work!

In this pull request, I rewrote the --average_qual method to accurately calculate the average quality of a read.

I was running .fastq files of DNA sequenced on our Nanopore through fastp (Nanopore says to use average read q-scores), and far more reads were passing the quality filter than I was used to. I looked into it and found that fastp was averaging the q-scores directly. Q-scores are log-scaled values, so they need to be converted to error probabilities before averaging. Averaging the raw q-scores overestimates read quality, so more reads pass the filter than should.

As an example:

     A base with a q-score of 10 and a second base with a q-score of 20, if  
     averaged, would have an average q-score of 15.

     However, if you average the probability of errors: 

     A q-score of 10 is a probability of error of 0.1
     A q-score of 20 is a probability of error of 0.01
     Averaging the probability of error:   (0.1 + 0.01) / 2 = 0.055

     The q-score for a probability of error of 0.055 is ~12.6.
     This number accurately reflects the average amount of error present in the read.
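
Written out in general form, the example above amounts to the following, where $Q_i$ is the Phred q-score of base $i$, $p_i$ its error probability, and $n$ the read length (the symbols are introduced here for illustration and do not appear in the code):

$$
p_i = 10^{-Q_i/10}, \qquad
\bar{Q} = -10 \log_{10}\!\left(\frac{1}{n}\sum_{i=1}^{n} p_i\right)
$$

For the two-base example, $\bar{Q} = -10 \log_{10}(0.055) \approx 12.6$, compared with $15$ from averaging the q-scores directly.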

To implement this, in filter.cpp, I changed the totalQual variable to a float. totalQual now accumulates each base's probability of error instead of its q-score. Then, in the 'else if' statement, I divide the read's final totalQual by rlen and convert the resulting mean error probability back to a q-score, which is compared to the user's input.
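
As a rough illustration of the approach described above, here is a minimal standalone sketch. It is not the actual diff to filter.cpp: the function name `meanReadQuality`, the `phredOffset` parameter, and the assumption of Phred+33 encoded quality strings are all hypothetical and chosen only for this example.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (not fastp's actual code): mean quality of a read,
// averaged in error-probability space rather than in q-score space.
// Assumes a Phred+33 encoded quality string unless another offset is given.
double meanReadQuality(const std::string& qual, int phredOffset = 33) {
    if (qual.empty()) {
        return 0.0;  // avoid dividing by zero on an empty read
    }
    double totalErrorProb = 0.0;
    for (char c : qual) {
        int q = static_cast<unsigned char>(c) - phredOffset;
        // Convert the q-score to its probability of error: p = 10^(-q/10)
        totalErrorProb += std::pow(10.0, -q / 10.0);
    }
    double meanErrorProb = totalErrorProb / static_cast<double>(qual.size());
    // Convert the mean error probability back to a q-score
    return -10.0 * std::log10(meanErrorProb);
}
```

With Phred+33 encoding, `'+'` is Q10 and `'5'` is Q20, so `meanReadQuality("+5")` evaluates to roughly 12.6, matching the worked example. A read would then pass the filter when this value is at least the threshold the user supplied.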

I compiled the code and tested it on a simulated dataset, and the results were identical to those from the other Nanopore quality filtering packages I have on my machine.

Thanks again for fastp!!

fix average q-score calculation

25cd40c
