
Output results recommendation for DADA2 #40

Open
mentorwan opened this issue Jun 25, 2021 · 3 comments

@mentorwan

Thanks for this tool. I have downloaded and tested this tool. Here are a few questions or comments:

1. It seems the tool cannot support variable-length reads; I needed to trim all reads to a fixed length to make it work.
2. If primers are already removed, the code does not accept a forward parameter of 0 and a reverse parameter of 0, so I just put in a small number such as `-f 5 -r 5` to make it work. Is that right?
3. I don't understand the outputs.
For example:

```
python figaro/figaro.py -i /Volumes/Issue-33/RAW1/Trim/ -o ./output -a 300 -f 5 -r 5
Forward read length: 250
Reverse read length: 251
{"trimPosition": [134, 196], "maxExpectedError": [1, 2], "readRetentionPercent": 92.92, "score": 91.91915753123422}
```

Does it suggest a forward trim length of 134 and a reverse trim length of 196 in the DADA2 or QIIME command, because it has the best read retention percent?

Thanks,
Yunhu Wan
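For readers following along: the recommendation line above is plain JSON, so it can be parsed programmatically. A minimal Python sketch of reading the top entry and pulling out the values that would feed DADA2's truncation settings (the mapping onto `filterAndTrim` arguments is an assumption based on the discussion in this thread, not something the source states):

```python
import json

# The top-scoring line from figaro's output, copied verbatim from above.
raw = ('{"trimPosition": [134, 196], "maxExpectedError": [1, 2], '
       '"readRetentionPercent": 92.92, "score": 91.91915753123422}')
top_entry = json.loads(raw)

# Assumed mapping: trimPosition -> truncLen, maxExpectedError -> maxEE,
# i.e. dada2::filterAndTrim(truncLen = c(134, 196), maxEE = c(1, 2)).
forward_trunc, reverse_trunc = top_entry["trimPosition"]
max_ee_forward, max_ee_reverse = top_entry["maxExpectedError"]

print(forward_trunc, reverse_trunc, max_ee_forward, max_ee_reverse)
```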

@michael-weinstein
Collaborator

Putting in a small number for the primers (even 1) should work for this issue. I designed this with the idea that people would not be pre-trimming their reads; likewise, the requirement that all reads be one length stems from the same cause.

The output of the program should have a list of potential trimming locations and expected error values to use for forward and reverse reads. The first ones listed are the ones considered optimal by the program based on the score. The score starts with the percentage of reads retained (since retaining reads is generally a good thing) and then applies an exponentially increasing penalty for expected error allowances (since these are generally a bad thing). In this case, you are exactly right about the suggestion, and that looks like a very good score with little error.
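That scoring idea can be sketched roughly in code. Note that the penalty shape and coefficients below are illustrative assumptions, not FIGARO's actual formula (the real scores above are not round numbers, so the true penalty clearly differs in detail):

```python
def illustrative_score(read_retention_percent, expected_errors, penalty_base=2):
    """Toy version of the scoring described above: start from the percentage
    of reads retained, then subtract an exponentially growing penalty as the
    allowed expected errors increase. Coefficients are assumptions."""
    penalty = sum(penalty_base ** (ee - 1) - 1 for ee in expected_errors)
    return read_retention_percent - penalty

# With maxExpectedError [1, 2]: penalty = (2^0 - 1) + (2^1 - 1) = 1,
# so the score drops about one point below the retention percentage.
print(illustrative_score(92.92, [1, 2]))
```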

@marschmi

Thanks for the helpful discussion here and in #25 & #27!

I'm also trying to run figaro within a DADA2 analysis. I have 192 fastq files (the R1 and R2 for 96 samples), yet the JSON file and command-line output have 193 entries. I find this confusing because it makes me think that each row of output corresponds to one of the 192 files (which, given the steady decrease in readRetentionPercent, seems like a misunderstanding on my end?).

Though, with the discussion here, it now appears that each row of output is an overall score for the dataset, in order of decreasing read retention. For example:

```
{"trimPosition": [163, 147], "maxExpectedError": [1, 1], "readRetentionPercent": 93.93, "score": 93.93114158846818}
{"trimPosition": [160, 150], "maxExpectedError": [1, 1], "readRetentionPercent": 93.92, "score": 93.92239210796859}
{"trimPosition": [164, 146], "maxExpectedError": [1, 1], "readRetentionPercent": 93.92, "score": 93.91911105278125}
{"trimPosition": [162, 148], "maxExpectedError": [1, 1], "readRetentionPercent": 93.91, "score": 93.91145525734409}
{"trimPosition": [165, 145], "maxExpectedError": [1, 1], "readRetentionPercent": 93.91, "score": 93.91036157228164}
{"trimPosition": [166, 144], "maxExpectedError": [1, 1], "readRetentionPercent": 93.9, "score": 93.89833103659471}
```

This, combined with the slide in #33, implies that the top output should be used for dada2::filterAndTrim(), especially as it will retain most of the reads. Am I understanding this correctly?

@michael-weinstein
Collaborator

The output is sorted by score: the highest-scoring set is listed first, followed by the other sets in descending order. If you were to graph them, you would often see a "peaky" pattern over your optimal trimming sites. I include all the possible combinations because I am a big believer in rigorous QC measures, and tracking alterations in the optimal trimming parameters can provide a way to detect changes in sequencing quality over time. Most users will not need anything but the first value unless they are looking at read-quality trends between runs or charting how the optimization process actually worked.
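Since each line of the results file is an independent JSON object, re-sorting or inspecting the full list is straightforward. A hypothetical Python sketch (the two entries are copied from the output shown earlier in this thread):

```python
import json

# Two candidate lines from the output above, one JSON object per line.
raw_lines = [
    '{"trimPosition": [160, 150], "maxExpectedError": [1, 1], '
    '"readRetentionPercent": 93.92, "score": 93.92239210796859}',
    '{"trimPosition": [163, 147], "maxExpectedError": [1, 1], '
    '"readRetentionPercent": 93.93, "score": 93.93114158846818}',
]
entries = [json.loads(line) for line in raw_lines]

# Sort by score, descending, so the recommended parameters come first --
# the same ordering figaro itself emits.
entries.sort(key=lambda e: e["score"], reverse=True)
best = entries[0]
print(best["trimPosition"])
```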
