Running figaro fails on 16S V4 data #36

Open
dean0358 opened this issue May 10, 2021 · 10 comments

@dean0358

dean0358 commented May 10, 2021

I am running Figaro on 16S V4 libraries sequenced on an Illumina MiSeq instrument (MiSeq Reagent Kit v3; 2x300 bp). The sequenced amplicon is ~250 bp long. I chose the 2x300 kit because it was cheaper and produced more reads per run than the traditional MiSeq Reagent Kit v2 (2x250 bp).

When I run Figaro with the following parameters:

python3 figaro.py -i fastq -o figaro_test -a 250 -f 19 -r 20

The program throws the following error:

Traceback (most recent call last):
  File "/Users/starcue/Desktop/DADA2_Test/scripts/figaro/figaro/figaro.py", line 218, in <module>
    main()
  File "/Users/starcue/Desktop/DADA2_Test/scripts/figaro/figaro/figaro.py", line 210, in main
    resultTable, forwardCurve, reverseCurve = trimParameterPrediction.performAnalysisLite(parameters.inputDirectory.value, parameters.minimumCombinedReadLength.value, subsample =  parameters.subsample.value, percentile = parameters.percentile.value, forwardPrimerLength=parameters.forwardPrimerLength.value, reversePrimerLength=parameters.reversePrimerLength.value, namingStandardAlias=fileNamingStandard)
  File "/Users/starcue/Desktop/DADA2_Test/scripts/figaro/figaro/trimParameterPrediction.py", line 457, in performAnalysisLite
    resultTable = runTrimParameterTestLite(forwardExpectedErrorMatrix, reverseExpectedErrorMatrix, trimPositions, minimumTrimmingPositions, forwardCurve, reverseCurve, forwardPrimerLength, reversePrimerLength)
  File "/Users/starcue/Desktop/DADA2_Test/scripts/figaro/figaro/trimParameterPrediction.py", line 347, in runTrimParameterTestLite
    reverseExpectedErrors = reverseExpectedErrorMatrix[reverseTrimPosition - reverseMinimumTrimPosition]
IndexError: index 292 is out of bounds for axis 0 with size 12

However, when I change the amplicon length (e.g., to 300 bp), the program completes successfully. It seems that Figaro is having trouble with this particular dataset because the reads are longer than the sequenced amplicon. This could probably be resolved by trimming the trailing 30 or so base pairs from each read, as these are not biologically relevant, and then rerunning Figaro on the trimmed reads. Has this issue been discussed previously?

Best,
Chris
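
The trimming workaround proposed above could be sketched roughly as follows. This is a minimal illustration, not part of Figaro: the helper names `open_fastq` and `truncate_fastq`, the file paths, and the cutoff of 270 bases are all assumptions chosen for the example, and nothing in this thread confirms that trimming avoids the IndexError. It assumes plain 4-line FASTQ records.

```python
import gzip
from pathlib import Path


def open_fastq(path, mode="rt"):
    """Open a FASTQ file as text, transparently handling .gz compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def truncate_fastq(in_path, out_path, max_length):
    """Copy a FASTQ file, cutting each read and its quality string to max_length.

    Hypothetical helper: assumes 4 lines per record (no wrapped sequences).
    Reads already shorter than max_length are written unchanged.
    Returns the number of records written.
    """
    records = 0
    with open_fastq(in_path, "rt") as src, open_fastq(out_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:  # end of file
                break
            sequence = src.readline().rstrip("\n")
            separator = src.readline()
            quality = src.readline().rstrip("\n")
            dst.write(header)
            dst.write(sequence[:max_length] + "\n")
            dst.write(separator)
            dst.write(quality[:max_length] + "\n")
            records += 1
    return records
```

For example, `truncate_fastq("fastq/sample_R1.fastq.gz", "fastq_trimmed/sample_R1.fastq.gz", 270)` would drop the last 30 cycles of a 300-cycle read (hypothetical paths). Both the forward and reverse files would need the same treatment before pointing `-i` at the trimmed directory.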

@michael-weinstein
Collaborator

Interesting... I would be willing to bet that causes it to crash. I definitely built it around optimizing for much longer amplicons (like V1V2 or V3V4).

@dean0358
Author

Is it because of the length of the amplicon or because the reads are longer than the length of the sequenced amplicon?

In any case, would this be an easy fix? Do you accept pull requests?

@michael-weinstein
Collaborator

michael-weinstein commented May 13, 2021 via email

@damselflywingz

damselflywingz commented Aug 30, 2021

Hi, I would like to try Figaro on my V4 Illumina paired libraries but I am getting an error.

I am running the code:
figaro -i /fastaQ_files/ -o /Figaro_test/ -f 20 -r 19 -a 325 -F illumina

I am getting the following error. Any suggestions appreciated. Thanks, A

Forward read length: 285
Reverse read length: 288
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/expectedErrorCurve.py", line 97, in calculateAverageExpectedError
percentileExpectedError = makeExpectedErrorPercentileArrayForFastq(fastq.filePath, self.subsample, self.percentile, self.primerLength)
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/expectedErrorCurve.py", line 109, in makeExpectedErrorPercentileArrayForFastq
expectedErrorMatrix = fastqAnalysis.buildExpectedErrorMatrix(path, subsample=subsample, leftTrim=primerLength)
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/fastqAnalysis.py", line 37, in buildExpectedErrorMatrix
return numpy.array(expectedErrorMatrix, dataType, order='F')
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (95059,) + inhomogeneous part.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/global/home/hpc4605/.local/bin/figaro", line 8, in <module>
sys.exit(main())
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/figaro.py", line 210, in main
resultTable, forwardCurve, reverseCurve = trimParameterPrediction.performAnalysisLite(parameters.inputDirectory.value, parameters.minimumCombinedReadLength.value, subsample = parameters.subsample.value, percentile = parameters.percentile.value, forwardPrimerLength=parameters.forwardPrimerLength.value, reversePrimerLength=parameters.reversePrimerLength.value, namingStandardAlias=fileNamingStandard)
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/trimParameterPrediction.py", line 453, in performAnalysisLite
forwardCurve, reverseCurve = expectedErrorCurve.calculateExpectedErrorCurvesForFastqList(fastqList, subsample=subsample, percentile=percentile, makePNG=makeExpectedErrorPlots, forwardPrimerLength=forwardPrimerLength, reversePrimerLength=reversePrimerLength)
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/expectedErrorCurve.py", line 175, in calculateExpectedErrorCurvesForFastqList
forwardExpectedErrorArray = makeExpectedErrorPercentileArrayForFastqList(forwardFastqs, subsample, percentile, forwardPrimerLength)
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/expectedErrorCurve.py", line 122, in makeExpectedErrorPercentileArrayForFastqList
expectedErrorReturns = easyMultiprocessing.parallelProcessRunner(parallelAgent.calculateAverageExpectedError, fastqList)
File "/global/home/hpc4605/.local/lib/python3.9/site-packages/figaro/easyMultiprocessing.py", line 68, in parallelProcessRunner
return mapper(processor, itemsToProcess, chunkSize)
File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (95059,) + inhomogeneous part.
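
NumPy raises this ValueError when it is asked to build a rectangular array from rows of different lengths, and the traceback shows it happening while `buildExpectedErrorMatrix` processes a single FASTQ file. One plausible cause, not confirmed in this thread, is that the reads within a file are not all the same length (for example, because they were already trimmed). A minimal sketch for checking that, using only the standard library; the function names `read_length_counts` and `report_mixed_lengths` are hypothetical and not part of Figaro:

```python
import gzip
from collections import Counter
from pathlib import Path


def read_length_counts(fastq_path):
    """Return a Counter mapping read length -> number of reads in one FASTQ file.

    Assumes 4-line FASTQ records; files ending in .gz are opened with gzip.
    """
    fastq_path = Path(fastq_path)
    opener = gzip.open if fastq_path.suffix == ".gz" else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def report_mixed_lengths(fastq_dir):
    """Return {file name: Counter} for files whose reads differ in length."""
    mixed = {}
    for path in sorted(Path(fastq_dir).glob("*.fastq*")):
        counts = read_length_counts(path)
        if len(counts) > 1:  # more than one distinct read length in this file
            mixed[path.name] = counts
    return mixed
```

Calling `report_mixed_lengths("/fastaQ_files/")` on the input directory would list any files with more than one read length. An empty result would suggest the cause lies elsewhere.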

@michael-weinstein
Collaborator

This is a problem I haven't seen before. Can we move this to a new issue? I will need to check out your data quickly to see how this might be happening.

@damselflywingz

damselflywingz commented Sep 1, 2021

@michael-weinstein yes, please. I am also wondering whether Figaro requires the forward and reverse libraries to match perfectly, since I am running it on demultiplexed data in which they do not. If mismatched libraries are not supported, that may be why the error is being thrown.
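
Whether Figaro needs forward and reverse files to match read for read is not answered in this thread, but the mismatch itself is easy to quantify. The sketch below is an illustration only: `count_reads` and `find_pairing_problems` are hypothetical helpers, and the `_R1_` / `_R2_` tags are an assumption based on common Illumina file naming. It reports forward files without a mate, mates with different read counts, and orphaned reverse files.

```python
import gzip
from pathlib import Path


def count_reads(fastq_path):
    """Count records in a FASTQ file, assuming 4 lines per record."""
    fastq_path = Path(fastq_path)
    opener = gzip.open if fastq_path.suffix == ".gz" else open
    with opener(fastq_path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def find_pairing_problems(fastq_dir, forward_tag="_R1_", reverse_tag="_R2_"):
    """Return a list of human-readable forward/reverse pairing problems."""
    fastq_dir = Path(fastq_dir)
    problems = []
    forward_files = sorted(p for p in fastq_dir.iterdir() if forward_tag in p.name)
    reverse_names = {p.name for p in fastq_dir.iterdir() if reverse_tag in p.name}
    for forward in forward_files:
        # The expected mate has the same name with the read tag swapped.
        mate_name = forward.name.replace(forward_tag, reverse_tag, 1)
        if mate_name not in reverse_names:
            problems.append(f"{forward.name}: no reverse file named {mate_name}")
            continue
        reverse_names.discard(mate_name)
        n_forward = count_reads(forward)
        n_reverse = count_reads(fastq_dir / mate_name)
        if n_forward != n_reverse:
            problems.append(
                f"{forward.name}: {n_forward} forward reads vs {n_reverse} reverse reads"
            )
    # Anything left in reverse_names never matched a forward file.
    for orphan in sorted(reverse_names):
        problems.append(f"{orphan}: no matching forward file")
    return problems
```

An empty list from `find_pairing_problems("/fastaQ_files/")` would indicate that every forward file has a reverse mate with the same number of reads.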

@michael-weinstein
Collaborator

Can you reach out to Zymo tech support? They'll give you my contact info and I will set up a chat.

@damselflywingz

damselflywingz commented Sep 1, 2021 via email

@michael-weinstein
Collaborator

I wanted to check back on whether you were able to reach tech support. I haven't heard from them yet, but once I do, I will set up a quick Zoom call for us.

@damselflywingz

damselflywingz commented Sep 8, 2021 via email
