Output results recommendation for DADA2 #40
Comments
Putting in a small number for the primers (even 1) should work for this issue. I designed this with the idea that people would not be pre-trimming their reads. Likewise, the requirement that all reads be one length comes from the same cause. The output of the program should be a list of potential trimming locations and expected error values to use for forward and reverse reads. The first ones listed are the ones the program considers optimal based on the score. The score starts with the percentage of reads retained (since retaining reads is generally a good thing) and then applies an exponentially increasing penalty for expected error allowances (since these are generally a bad thing). In this case, you are exactly right on the suggestion, and that looks like a very good score with little error.
Thanks for the helpful discussion here and in #25 and #27! I'm also trying to run figaro within a DADA2 analysis. I have 192 fastq files (R1 and R2 for 96 samples). However, the json file and command line output have 193 entries. I find this confusing because it makes me think that each row of output corresponds to one of the 192 files (which, judging by the decrease in readRetentionPercent, seems to be a misunderstanding on my end?). With the discussion here, though, it appears that each row of output is an overall score for the data, listed with decreasing read retention. For example:
This combined with the slide in #33 implies that the top output should be used for the |
The output is sorted on scores. As you'll notice, the highest score is the first set listed, followed by the other sets in descending order. If you were to graph them out, you'd often see a "peaky" pattern over your optimal trimming sites. I include all the possible combinations because I am a big believer in rigorous QC measures, and tracking alterations in the optimal trimming parameters can provide a method to potentially detect changes in sequencing quality over time. Most users will not need to use anything but the first value unless they are attempting to look at read quality trends between runs or chart out how the optimization process actually worked.
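For anyone who only wants the top-ranked set, here is a minimal Python sketch of pulling it out of the results. It assumes the results are a JSON list of objects shaped like the entry quoted in this issue; the function name `best_trim_parameters` is made up for illustration, and the second entry in the sample data is a placeholder, not real output.

```python
import json


def best_trim_parameters(json_text):
    """Return the highest-scoring entry from a list of result entries.

    Assumes json_text is a JSON list of objects with a "score" key,
    shaped like the entry quoted in this issue.
    """
    entries = json.loads(json_text)
    # The list is reported as sorted by score already; using max()
    # simply avoids depending on that ordering.
    return max(entries, key=lambda entry: entry["score"])


# First entry is copied from this issue; the second is an illustrative
# placeholder (hypothetical values), included only to give max() a choice.
sample = """[
  {"trimPosition": [134, 196], "maxExpectedError": [1, 2],
   "readRetentionPercent": 92.92, "score": 91.91915753123422},
  {"trimPosition": [130, 200], "maxExpectedError": [2, 2],
   "readRetentionPercent": 93.5, "score": 90.0}
]"""

best = best_trim_parameters(sample)
forward_trim, reverse_trim = best["trimPosition"]
forward_ee, reverse_ee = best["maxExpectedError"]
print(forward_trim, reverse_trim, forward_ee, reverse_ee)
```

The remaining entries can be kept around for the run-to-run quality tracking described above.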
Thanks for this tool. I have downloaded and tested this tool. Here are a few questions or comments:
It seems the tool cannot support variable-length reads. I need to trim all reads to a fixed length to make it work.
If primers have already been removed, the code cannot accept a forward parameter of 0 and a reverse parameter of 0. So I just put in a small number, such as -f 5 -r 5, to make it work. Is that right?
I don’t understand the output.
For example:
python figaro/figaro.py -i /Volumes/Issue-33/RAW1/Trim/ -o ./output -a 300 -f 5 -r 5
Forward read length: 250
Reverse read length: 251
{"trimPosition": [134, 196], "maxExpectedError": [1, 2], "readRetentionPercent": 92.92, "score": 91.91915753123422}
Does this suggest using forward trim length 134 and reverse trim length 196 in the DADA2 or QIIME command because it has the best read retention percent?
Thanks,
Yunhu Wan