java.lang.OutOfMemoryError error prompted by running te-locate #93

Open
andreabours opened this issue May 8, 2022 · 4 comments
@andreabours

Hello,

I'm trying to run your pipeline, however, I'm struggling to run te-locate. I receive the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at genome_sv_population_genetics.PopTE_Insertion.<init>(PopTE_Insertion.java:93)
	at genome_sv_population_genetics.PopTE_Insertion.main(PopTE_Insertion.java:589)

In the log I can see that the te-locate tool is invoked with a java -Xmx4g -jar command, so I tried running the job with a maximum of just 4g, but this doesn't resolve the problem. At this point I am no longer sure whether the maximum memory setting is the problem at all.

I hope you can help me get this problem solved.
Best,
Andrea

@cbergman
Member

cbergman commented May 9, 2022

Hi @andreabours

Thanks for your feedback. Could you send some more information to help us figure out what is going on?

  • Were you able to install McClintock with all components and run on the test data successfully?
  • Can you post the command line statement you used to run McClintock that generated this error?
  • Can you describe the machine you are running this job on (max memory available, whether it is a cluster node or a local workstation)?
  • If you run McClintock on your data with all other methods besides te-locate (i.e. "-m trimgalore,coverage,ngs_te_mapper,ngs_te_mapper2,relocate,relocate2,temp,temp2,retroseq,popoolationte,popoolationte2,teflon,tebreak"), does your job complete successfully?
  • If you run McClintock on your data with just te-locate (i.e. "-m te-locate"), do you see the same error you posted above?

Thanks,
Casey
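The two isolation runs suggested above differ only in the value passed to -m. As a minimal sketch, the helper below builds that comma-separated value from one list so the two runs stay consistent. The function name methods_arg is hypothetical; the method names are copied from the comment above.

```python
# Method names as quoted in the comment above; "te-locate" is listed last.
ALL_METHODS = [
    "trimgalore", "coverage", "ngs_te_mapper", "ngs_te_mapper2",
    "relocate", "relocate2", "temp", "temp2", "retroseq",
    "popoolationte", "popoolationte2", "teflon", "tebreak", "te-locate",
]


def methods_arg(exclude=()):
    """Return the comma-separated value for -m, skipping excluded methods."""
    excluded = set(exclude)
    return ",".join(m for m in ALL_METHODS if m not in excluded)


# Run 1: everything except te-locate.
without_telocate = methods_arg(exclude=["te-locate"])
# Run 2: te-locate on its own.
only_telocate = methods_arg(exclude=[m for m in ALL_METHODS if m != "te-locate"])

print("-m", without_telocate)
print("-m", only_telocate)
```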

@andreabours
Author

andreabours commented May 16, 2022

Hi Casey,

Thanks for helping (apologies for the delay).

  • My IT department had to install the pipeline, and there was an initial problem (the pipeline wanted to install a TE library, and I don't have write permission in the folder where it is installed). They ran the test data successfully. I saw their output, and they had no problem running te-locate.

  • here is my command: python3 /data/biosoftware/mcclintock/mcclintock.py -r ~/reference/renamed_reorder_new_reference.fasta -c consensus_adj.fa -g reference_TE_annotation_adj.gff -t reference_TE_taxonomy_adj.tsv -1 "sample"_R1.fastq.gz -2 "sample"_R2.fastq.gz -p 6 -m te-locate -o combination_4/"sample"_telocate
    I submitted it through an sbatch script, which requests 4G of memory on 1 node. (I initially ran it with 64G of memory.)

  • My jobs complete successfully with ngs_te_mapper2 and PoPoolationTE2 (I haven't run the other programs yet).

  • So yes, the error occurs both when running multiple tools and when running only the te-locate tool on its own. Exact same error every time.

Thanks,
Andrea
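One detail worth checking from the exchange above: a -Xmx4g flag caps the Java heap at 4 GiB no matter how much memory the batch job requests, which would be consistent with the same error appearing under both the 4G and the 64G allocation. The sketch below parses an -Xmx value out of a command string and compares it with a job allocation. The helper names (parse_xmx, heap_report) and the example command string are hypothetical and not part of McClintock or TE-locate; only the -Xmx4g flag itself comes from the log described above.

```python
import re

# Multipliers for the size suffixes accepted after -Xmx (binary units).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_xmx(command):
    """Return the -Xmx heap cap in bytes, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)\b", command)
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) * _UNITS[unit.lower()]


def heap_report(command, job_memory_bytes):
    """Describe how the heap cap relates to the memory requested for the job."""
    heap = parse_xmx(command)
    if heap is None:
        return "no -Xmx flag found; the JVM chooses a default heap size"
    gib = heap / 1024**3
    if heap < job_memory_bytes:
        return (f"heap capped at {gib:.1f} GiB; "
                "requesting more job memory alone will not raise it")
    return f"heap cap of {gib:.1f} GiB meets or exceeds the job allocation"


# Hypothetical command string; only the -Xmx4g flag is taken from the log.
cmd = "java -Xmx4g -jar example.jar"
print(heap_report(cmd, 64 * 1024**3))  # the 64G run
print(heap_report(cmd, 4 * 1024**3))   # the 4G run
```

If the cap is the limiting factor, the heap setting in the java invocation would have to change; how McClintock exposes that setting is not stated in this thread.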

@cbergman
Member

Hi @andreabours

This looks like an interaction between one or more of the following: your system wide java, how the McClintock system was installed, your data and possibly how TE-locate is called by McClintock. From what you say, it does not appear to be a problem with McClintock or TE-locate per se, since other components run clean on your data and TE-locate runs clean on the test data.

Your statement about "the pipeline wanting to install TE library, while I don't have the rights to do that in the folder it's placed" is curious since the McClintock install process does not install a TE library. This says to me that the install process done by your sysadmins may have been done or communicated incorrectly. One possible thing for you to try next is to install McClintock using bioconda in your home directory so we know exactly how it was installed.

Alternatively, I can try to troubleshoot this offline if you are willing to share a copy of this data. Please contact me at cbergman [ at ] uga [ dot ] edu.

In the meantime, I would suggest moving forward using results from the other components while we sort out the TE-locate issue.

Thanks,
Casey

@andreabours
Author

Hi Casey,

To clarify, the installation problem was the following:

RepeatMasker version open-4.0.7
Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Rebuilding RepeatMaskerLib.embl library
  Reading Dfam_consensus database...
- Read in 216 sequences from /data/biosoftware/mcclintock/install/envs/conda/a7544eba/share/RepeatMasker/Libraries/DfamConsensus.embl
Saving RepeatMaskerLib.embl library...() Unable to open file /data/biosoftware/mcclintock/install/envs/conda/a7544eba/share/RepeatMasker/Libraries/RepeatMaskerLib.embl for writing: Permission denied

This error occurred the first time I ran RelocaTE2 after the pipeline was set up.
But that got solved when my IT department ran the test dataset (they hadn't done that initially).

As a side note, I actually tried to install the pipeline myself, but it was a bit of a pain. I didn't finish the installation because too many things were interfering with each other (from my own home directory to the set-up of the computer cluster in general).

Let me see if I have some sample data.
Also, I kinda made the decision already to not include te-locate. I just wanted to inform you about this issue/bug (for future users)

Cheers,
Andrea
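The permission error quoted above comes from a first-run step trying to write RepeatMaskerLib.embl into a shared install directory. A minimal sketch for checking this ahead of time is shown below: it reports whether the current user could write a given file, falling back to the parent directory when the file does not exist yet. The function name can_write_file is hypothetical; the path is the one from the log above.

```python
import os
import tempfile


def can_write_file(path):
    """True if path is writable: the file itself if present, else its parent dir."""
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(path) or "."
    # Creating a new file needs write and search permission on the directory.
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


# Path taken from the error message above.
lib = ("/data/biosoftware/mcclintock/install/envs/conda/a7544eba/"
       "share/RepeatMasker/Libraries/RepeatMaskerLib.embl")
print("writable:", can_write_file(lib))

# Self-contained demonstration in a temporary directory the user owns.
with tempfile.TemporaryDirectory() as tmp:
    demo_ok = can_write_file(os.path.join(tmp, "RepeatMaskerLib.embl"))
print("temp dir writable:", demo_ok)
```

A False result for the shared path would suggest that the first run needs to be done by an account with write access to the install, as happened here when the IT department ran the test dataset.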

@cbergman cbergman added the bug label May 23, 2022