Question about circFilter.py #78
Comments
Dear @JunmingH, do you have the complete command line that triggered this error? Regarding the run time: two days seems pretty long, but the time required depends on the data and the command line used. Cheers,
Hi Tobias, Attached is the script. I ran it with around 300 samples, but it never runs to completion. /DCC-0.4.7/DCC/main.py @DCC_InputFiles/samplesheet_BM10 -T 2 -D -N -R /ref/GRCh38_Repeats_simpleRepeats_RepeatMasker.gtf -an
Hi @JunmingH, I don't think I ever ran DCC on such a large number of samples, so the merging might indeed take a lot of time. However, you might tune some of the parameters to reduce the time:
Let me know if the new parameters address your issue. Cheers,
Hi Tobias, One of my scripts ran successfully. It used 400 GB of memory for the filter function. But it still gets stuck at one point, and I am not sure which part it is. The error message is:
BTW, I really appreciate your help!
Dear @JunmingH, You may want to create a new index for your genome fasta file. Try Cheers, |
Do you mean the file passed with -A, /ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa?
Yes, exactly.
Hi Tobias, Is this warning normal? I got it during linear counting: 2020-03-24 09:52:35,105 WARNING: circRNA start position ('chr18', '45196357') does not have mapped read counts, treated as 0
Hi @tjakobi, I'm stuck at the same step. My error message is as follows:
I am not sure whether my GTF annotation, which was downloaded from NCBI, is correct. My GTF:
But CircCoordinates and CircCount were still generated. Do the two separate chunks not affect each other? Thanks for your time.
Try a UCSC annotation file. This one does not have chromosome numbers.
@JunmingH Thanks. It's the genome of hepatitis B virus, and UCSC has no information on it.
Hi, sorry for the delayed response. There might be an issue with the format of the 9th column, with DCC running into problems gathering data from it. DCC normally uses GTF files and you are providing a GFF3 file, which might cause problems. You might want to try a GTF-formatted file just to rule this possibility out. See https://www.biostars.org/p/99462/ for details on the differences.
Thanks, it was indeed a problem with the GTF.
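For readers hitting the same problem, here is a minimal sketch of the 9th-column difference described above. It is not DCC's own parser; the function names and the sample attribute strings are made up for illustration. GTF writes attributes as `key "value";` pairs, while GFF3 writes `key=value` pairs, so code that expects one form finds nothing useful in the other.

```python
def parse_gtf_attributes(field):
    """Parse a GTF column 9 string such as 'gene_id "g1"; transcript_id "t1";'."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # GTF separates key and value with a space and quotes the value.
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gff3_attributes(field):
    """Parse a GFF3 column 9 string such as 'ID=gene1;Name=X'."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # GFF3 separates key and value with '=' and does not quote the value.
        key, _, value = part.partition("=")
        attrs[key] = value
    return attrs


def guess_format(field):
    """Crude heuristic (an assumption, not a validator): look at the first attribute."""
    first = field.strip().split(";")[0].strip()
    if "=" in first and '"' not in first:
        return "gff3"
    if " " in first:
        return "gtf"
    return "unknown"


# Hypothetical attribute strings, not taken from the annotation in this thread.
gtf_field = 'gene_id "geneA"; transcript_id "geneA.1";'
gff3_field = "ID=geneA;Name=geneA"

print(guess_format(gtf_field), parse_gtf_attributes(gtf_field))
print(guess_format(gff3_field), parse_gff3_attributes(gff3_field))
# A GTF-style parser applied to GFF3 text finds no gene_id key at all:
print("gene_id" in parse_gtf_attributes(gff3_field))
```

Running `guess_format` on column 9 of the first few feature lines of an annotation file is a quick way to check which of the two formats it actually uses before passing it to a tool that expects GTF.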
Hi,
I am trying to run DCC, but it always gets stuck at circFilter.py.
The error message is below.
Traceback (most recent call last):
  File "/circu_RNA/DCC-0.4.8/DCC/main.py", line 842, in <module>
    main()
  File "/circu_RNA/DCC-0.4.8/DCC/main.py", line 375, in main
    filt.filter_nonrep(rep_file, indx0, count0)
  File "/circu_RNA/DCC-0.4.8/DCC/circFilter.py", line 110, in filter_nonrep
    nonrep = np.column_stack((indx0, count0))
  File "/share/pkg.7/python2/2.7.16/install/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 640, in column_stack
    return _nx.concatenate(arrays, 1)
MemoryError
Could you please give me some ideas about this?
BTW, is it possible for it to spend two days combining those results? Each time it takes a very long time to combine, and it still ends with a memory error. I only use 4 threads and request 252 GB.
Best
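A rough way to reason about the MemoryError in the traceback above: `np.column_stack` goes through `concatenate`, which builds a new array, so the two inputs and the stacked result all have to fit in memory at once. The sketch below is only a back-of-the-envelope estimate. The helper name, the row and column counts, and the 8-byte item size are all assumptions; if the arrays hold strings rather than numbers, the item size and therefore the total would be larger.

```python
def column_stack_peak_bytes(n_rows, left_cols, right_cols, itemsize=8):
    """Rough peak memory while stacking two 2-D arrays column-wise."""
    # Both input arrays are still alive while the result is being built.
    inputs = n_rows * (left_cols + right_cols) * itemsize
    # The stacked result is a fresh allocation of the same total size.
    output = inputs
    return inputs + output


# Hypothetical numbers, not taken from the run above:
# 1,000,000 candidate rows, 4 index columns, 300 count columns.
peak = column_stack_peak_bytes(1000000, 4, 300)
print("approx. peak: %.1f GB" % (peak / 1e9))
```

The estimate grows linearly with both the number of rows and the number of samples, which is why a run with several hundred samples can need far more memory than a small one.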