Errors using figaro with v3-v4 sequencing file MiSeq #37

Open
vehamel opened this issue May 13, 2021 · 15 comments

Comments

@vehamel

vehamel commented May 13, 2021

Hi!

This is the first time I am using figaro. I am not a regular Python user, which is why I am asking for help. I don't know what to do with this output or how to interpret it. I was using figaro to help me choose how to trim my sequences because I find the quality poor.

Thanks a lot for your help!

Here is what it printed when I ran it:

Forward read files appear to be of different lengths or of varied lengths. {(300, 0.7550505050505051), (299, 0.945050505050505), (300, 0.6905050505050505), (299, 0.8581818181818182), (300, 0.8383838383838383), (299, 0.9797979797979798), (299, 1.3761616161616161), (299, 0.854949494949495), (299, 0.9090909090909091), (299, 1.9526262626262625), (299, 1.0460606060606061), (299, 0.9923232323232323), (300, 0.3405050505050505), (299, 1.9267676767676767), (299, 0.7550505050505051), (299, 1.0233333333333334), (300, 1.5252525252525253), (299, 0.831919191919192), (300, 0.8145454545454546), (299, 0.7963636363636364), (299, 0.8504040404040404), (300, 0.6540404040404041), (300, 0.7146464646464646), (299, 0.9465656565656566), (300, 0.797979797979798), (299, 1.0925252525252525), (300, 0.6440404040404041), (300, 1.4343434343434343), (300, 0.7337373737373738), (299, 0.8819191919191919), (299, 0.753939393939394), (299, 0.9716161616161616), (299, 1.2044444444444444), (299, 1.0953535353535353), (299, 0.8723232323232324), (299, 0.8145454545454546), (299, 0.8686868686868686), (300, 0.6944444444444444), (299, 0.9166666666666666), (299, 1.8226262626262626), (300, 0.7167676767676767), (299, 0.9837373737373737), (299, 1.1268686868686868), (299, 1.0920202020202021), (300, 0.7135353535353536), (299, 0.7975757575757575), (299, 1.7006060606060607)}
Reverse read files appear to be of different lengths or of varied lengths. {(300, 0.6524242424242425), (300, 0.7228282828282828), (300, 0.34454545454545454), (300, 0.2771717171717172), (300, 0.5175757575757576), (300, 0.805959595959596), (300, 0.39555555555555555), (300, 0.5716161616161616), (300, 0.5268686868686868), (300, 0.19989898989898988), (300, 0.2832323232323232), (300, 0.5268686868686869), (300, 0.34383838383838383), (300, 0.21575757575757576), (300, 0.5425252525252525), (300, 0.8117171717171717), (300, 0.7348484848484849), (300, 0.4302020202020202), (300, 0.4011111111111111), (300, 0.49737373737373736), (299, 0.8988888888888888), (300, 0.4908080808080808), (300, 0.7632323232323233), (300, 0.9586868686868687), (300, 0.5066666666666667), (300, 0.6475757575757576), (300, 0.16353535353535353), (300, 0.45202020202020204), (300, 0.6666666666666667), (300, 0.612020202020202), (300, 0.3106060606060606), (300, 0.9995959595959596), (300, 0.5732323232323232), (300, 0.7272727272727273), (300, 0.6565656565656566), (300, 0.553030303030303), (300, 0.4670707070707071), (300, 0.38383838383838387), (300, 0.8771717171717172), (300, 0.547070707070707), (300, 0.8484848484848485), (299, 0.9389898989898989), (300, 0.5276767676767676), (300, 0.7070707070707071), (300, 0.7485858585858586)}
Forward reads appear to not be of consistent length. {(300, 0.7550505050505051), (299, 0.945050505050505), (300, 0.6905050505050505), (299, 0.8581818181818182), (300, 0.8383838383838383), (299, 0.9797979797979798), (299, 1.3761616161616161), (299, 0.854949494949495), (299, 0.9090909090909091), (299, 1.9526262626262625), (299, 1.0460606060606061), (299, 0.9923232323232323), (300, 0.3405050505050505), (299, 1.9267676767676767), (299, 0.7550505050505051), (299, 1.0233333333333334), (300, 1.5252525252525253), (299, 0.831919191919192), (300, 0.8145454545454546), (299, 0.7963636363636364), (299, 0.8504040404040404), (300, 0.6540404040404041), (300, 0.7146464646464646), (299, 0.9465656565656566), (300, 0.797979797979798), (299, 1.0925252525252525), (300, 0.6440404040404041), (300, 1.4343434343434343), (300, 0.7337373737373738), (299, 0.8819191919191919), (299, 0.753939393939394), (299, 0.9716161616161616), (299, 1.2044444444444444), (299, 1.0953535353535353), (299, 0.8723232323232324), (299, 0.8145454545454546), (299, 0.8686868686868686), (300, 0.6944444444444444), (299, 0.9166666666666666), (299, 1.8226262626262626), (300, 0.7167676767676767), (299, 0.9837373737373737), (299, 1.1268686868686868), (299, 1.0920202020202021), (300, 0.7135353535353536), (299, 0.7975757575757575), (299, 1.7006060606060607)}
Reverse reads appear to not be of consistent length. {(300, 0.6524242424242425), (300, 0.7228282828282828), (300, 0.34454545454545454), (300, 0.2771717171717172), (300, 0.5175757575757576), (300, 0.805959595959596), (300, 0.39555555555555555), (300, 0.5716161616161616), (300, 0.5268686868686868), (300, 0.19989898989898988), (300, 0.2832323232323232), (300, 0.5268686868686869), (300, 0.34383838383838383), (300, 0.21575757575757576), (300, 0.5425252525252525), (300, 0.8117171717171717), (300, 0.7348484848484849), (300, 0.4302020202020202), (300, 0.4011111111111111), (300, 0.49737373737373736), (299, 0.8988888888888888), (300, 0.4908080808080808), (300, 0.7632323232323233), (300, 0.9586868686868687), (300, 0.5066666666666667), (300, 0.6475757575757576), (300, 0.16353535353535353), (300, 0.45202020202020204), (300, 0.6666666666666667), (300, 0.612020202020202), (300, 0.3106060606060606), (300, 0.9995959595959596), (300, 0.5732323232323232), (300, 0.7272727272727273), (300, 0.6565656565656566), (300, 0.553030303030303), (300, 0.4670707070707071), (300, 0.38383838383838387), (300, 0.8771717171717172), (300, 0.547070707070707), (300, 0.8484848484848485), (299, 0.9389898989898989), (300, 0.5276767676767676), (300, 0.7070707070707071), (300, 0.7485858585858586)}
Traceback (most recent call last):
  File "C:\Users\veham18\figaro\figaro\figaro.py", line 218, in <module>
    main()
  File "C:\Users\veham18\figaro\figaro\figaro.py", line 210, in main
    resultTable, forwardCurve, reverseCurve = trimParameterPrediction.performAnalysisLite(parameters.inputDirectory.value, parameters.minimumCombinedReadLength.value, subsample = parameters.subsample.value, percentile = parameters.percentile.value, forwardPrimerLength=parameters.forwardPrimerLength.value, reversePrimerLength=parameters.reversePrimerLength.value, namingStandardAlias=fileNamingStandard)
  File "C:\Users\veham18\figaro\figaro\trimParameterPrediction.py", line 448, in performAnalysisLite
    forwardReadLength, reverseReadLength = checkReadLengths(fastqList)
  File "C:\Users\veham18\figaro\figaro\trimParameterPrediction.py", line 407, in checkReadLengths
    raise fastqHandler.FastqValidationError("Unable to validate fastq files enough to perform this operation. Please check log for specific error(s).")
fastqHandler.FastqValidationError: Unable to validate fastq files enough to perform this operation. Please check log for specific error(s).
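
For anyone hitting the same error: the messages above complain that reads within a direction are not all one length (here 299 and 300). A quick way to see this for yourself is to count read lengths per file. The sketch below is not FIGARO's own code; `read_length_histogram` and `open_fastq` are hypothetical helper names, and it assumes ordinary four-line FASTQ records (no wrapped sequences), optionally gzip-compressed.

```python
import gzip
from collections import Counter
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz compression transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_histogram(path):
    """Return a Counter mapping read length -> number of reads in the file."""
    lengths = Counter()
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            # In a four-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1:
                lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


# Example call (the file name is a placeholder):
# print(sorted(read_length_histogram("sample_R1.fastq.gz").items()))
```

If the histogram has more than one key for the forward files or for the reverse files, that is probably the condition the error is reporting.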

@michael-weinstein
Collaborator

michael-weinstein commented May 14, 2021 via email

@vehamel
Author

vehamel commented May 17, 2021

Hi!

No, I tried to trim them, but I gave figaro the original files, so no, they were not trimmed. But yes, it seems they are of various lengths (299 or 300), which I think is to be expected, no? A one-nucleotide difference is not a big difference... What can I do about that?

@janetw

janetw commented May 18, 2021

Hello, I too am trying to use figaro for the first time. I have now been able to get it to run, but I am getting a similar output. These reads were already trimmed of primers and barcodes. Since we used phasing in our primers, I am not surprised that I have varied lengths of forward and reverse reads. Does figaro require reads to be of the same length?

@vehamel
Author

vehamel commented May 18, 2021

Hello!

Me too! I will need to remove the first part of the sequences because they must be primers. I forgot to do it, and now I am thinking of adding that to my script!

@vehamel
Author

vehamel commented May 26, 2021

Hi!

I still cannot use the tool! Can you help me?

@janetw

janetw commented May 28, 2021

Hello, I was able to get FIGARO to work by first running FastQC and MultiQC to determine the length I wanted to trim to. I then used Trimmomatic to make all the reads the same length. Trimmomatic has options to crop at a certain length and drop reads that are shorter, or you can choose to crop at the shortest sequencing read length; that's what I did. I then ran FIGARO on the trimmed reads, and once the reads were all a consistent length, it ran fine. Hope this is helpful.

@vehamel
Author

vehamel commented May 28, 2021

Hello!

I understand! But isn't the goal of using FIGARO to optimize where we should trim our sequences? Maybe I don't understand correctly?!

@janetw

janetw commented May 28, 2021

Hello, FIGARO helps to choose parameters for the filterAndTrim function in DADA2. For FIGARO to work, however, the reads going into it must be one consistent length. So for example, I had reads that ranged from 269-281 bases. I cropped all reads to 269 and then used those trimmed reads in FIGARO. The output of FIGARO then provided what it determined to be optimal settings for the truncLen and maxEE settings in DADA2. I still am hoping that eventually FIGARO will be able to handle varying lengths.
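
To make the cropping step concrete (e.g. reads of 269-281 bases all cut to 269), here is a minimal pure-Python sketch of what a fixed-length crop does. It is only an illustration, not how Trimmomatic or FIGARO are implemented; `crop_fastq` is a hypothetical helper name, and it assumes four-line FASTQ records.

```python
import gzip


def _open(path, mode):
    """Open a plain or gzip-compressed FASTQ file in text mode ('r' or 'w')."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def crop_fastq(in_path, out_path, length):
    """Keep the first `length` bases of every read and drop shorter reads.

    Returns a tuple (reads_kept, reads_dropped).
    """
    kept = dropped = 0
    with _open(in_path, "r") as src, _open(out_path, "w") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\r\n")
            src.readline()  # the '+' separator line
            qual = src.readline().rstrip("\r\n")
            if len(seq) < length:
                dropped += 1
                continue
            # Sequence and quality string are cropped together so they stay aligned.
            dst.write(header.rstrip("\r\n") + "\n")
            dst.write(seq[:length] + "\n")
            dst.write("+\n")
            dst.write(qual[:length] + "\n")
            kept += 1
    return kept, dropped
```

When `length` is the shortest read length in the file, nothing is dropped, so forward and reverse files stay in the same read order. If reads do get dropped from only one file of a pair, the pairing would need to be checked before going on to DADA2.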

@vehamel
Author

vehamel commented May 28, 2021

Thanks a lot for the explanation! I will try that ;)

@michael-weinstein
Collaborator

michael-weinstein commented Jun 3, 2021

Thanks for the community support. Sorry for being away for a bit, new baby over the last few weeks has been keeping me occupied. I agree very much with the approach above: if your reads only differ by a slight bit of length (a few bases here and there), just pretrim them to the shortest length, since you don't want to be selecting trimming parameters that are in the area where trimming may have happened to some reads. If your reads differ in length by a lot due to quality trimming, I recommend not doing that quality trim, as the purpose of FIGARO is to optimize the DADA2 native quality trimming methods.
In the case shown above, where it looks like the reads vary between 299 and 300 in length, just trim them all down to 297 or even 295 to be safe. It's unlikely you'd want to retain those last few bases anyway.
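
As a small worked example of that arithmetic: take the shortest observed read length and subtract a couple of bases as a safety margin. The function name `pick_pretrim_length` and the idea of a fixed margin are assumptions made for illustration; FIGARO does not provide this helper.

```python
def pick_pretrim_length(observed_lengths, safety_margin=2):
    """Return a pre-trim length a few bases below the shortest observed read.

    observed_lengths: iterable of read lengths seen in the files.
    safety_margin: hypothetical number of extra bases to give up.
    """
    shortest = min(observed_lengths)
    target = shortest - safety_margin
    if target <= 0:
        raise ValueError("safety margin leaves no bases to keep")
    return target


print(pick_pretrim_length([299, 300]))                   # 297
print(pick_pretrim_length([299, 300], safety_margin=4))  # 295
```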

@janetw

janetw commented Jun 3, 2021

Well, now I am wondering, don't you have to trim to a consistent length in order to use FIGARO? Or is that now not the case? Thanks!

@michael-weinstein
Collaborator

michael-weinstein commented Jun 3, 2021 via email

@janetw

janetw commented Jun 3, 2021

Thanks!

@BrendaAmairanibp


Hi Janetw, reading your comments really helped me work through my Illumina v3-v4 data, but I have some trouble and doubts about trimming my sequences to the same length. Since there are no adapters in my fastq files, I suppose I only have to use the "CROP" command in Trimmomatic. Is this correct? Hoping you can help me.

@handibles

handibles commented Feb 10, 2022

Brenda, yup, passing CROP:220 to Trimmomatic will cut reads at the 3' end down to 220 bp. For trimming the 5' end, see HEADCROP in the Trimmomatic ref manual.

cutadapt will also trim 3' bases from reads to a fixed length, e.g. -l 220 or --length 220 for 220bp length.

@michael-weinstein congrats on the sprog! 😺

edit: the cutadapt option --minimum-length / -m will remove reads shorter than the value specified. I have resorted to passing both -l 220 and -m 220 to strictly enforce all reads being one length, as read lengths vary slightly at the best of times. Use another run of FastQC/MultiQC to check your lengths & outputs.

Remember, and as above, FIGARO doesn't need F and R reads to be the same length, so QC / trim them separately if it helps you retain more of your sequence (e.g. F is uniform 300 but R is 294-300 - do R only).
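
The cutadapt options mentioned above can be put together as in the sketch below, which only builds the single-end command line for one file (here the reverse reads from the 294-300 example). The function name and file names are placeholders; `-l`, `-m` and `-o` are the cutadapt options for length, minimum length and output file.

```python
import shlex


def cutadapt_fixed_length_cmd(in_fastq, out_fastq, length):
    """Build a single-end cutadapt command that enforces one read length.

    -l / --length shortens reads to `length` bases;
    -m / --minimum-length discards reads shorter than `length`.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    return [
        "cutadapt",
        "-l", str(length),
        "-m", str(length),
        "-o", str(out_fastq),
        str(in_fastq),
    ]


# Placeholder file names; only the reverse reads are cropped in this example.
cmd = cutadapt_fixed_length_cmd("sample_R2.fastq.gz", "sample_R2.crop.fastq.gz", 294)
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where cutadapt is installed. Note that `-m` can discard reads from one file only, so if anything is dropped it is worth checking that forward and reverse files still contain the same reads in the same order.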
