Grenepipe v0.13.x problems #60

Open · ospfsg opened this issue Jan 14, 2025 · 5 comments

ospfsg commented Jan 14, 2025

Dear Lucas

Since I updated to micromamba and v0.13.x, I am no longer able to run grenepipe!

I installed micromamba using the command

"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

and ended up with micromamba version 2.0.5.

grenepipe-0.13.4 was downloaded and extracted to a directory with the same name.

Then I set up the environment:
cd /path/to/grenepipe

micromamba env create -f workflow/envs/grenepipe.yaml -n grenepipe

micromamba activate grenepipe

Then I ran one command on a small test dataset of 10 samples:
snakemake --use-conda --conda-frontend mamba --cores 102 --directory /mnt/data3/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run8/ --conda-prefix /home/dau1/software/conda-envs/

After that run ended, I ran another command, just for the mapping part:

snakemake --use-conda --conda-frontend mamba --cores 102 --directory /mnt/data3/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run9_mapping/ --conda-prefix /home/dau1/software/conda-envs/

In both cases the run did not end okay.

I attach the logs from both runs. I also include the Picard merge-genotyped.log.

The QC files are there, the trimming files also, and the mapping files (dedup, merged, and sorted) are there.

The BAM files seem to be okay, but the VCF files only have the headers, even when I run only the mapping step on the dedup BAM files.

Is there something I should have modified in the config files to avoid this problem?

2025-01-13T114035.042333.log
2025-01-14T105534.420266.log
merge-genotyped.log
2025-01-14T105534.997975.snakemake.log
2025-01-13T114035.617454.snakemake.log

Thank you very much for your help.

osp

lczech (Member) commented Jan 14, 2025

Hi @ospfsg,

hm, this looks like it could be an issue with the data, and needs some further digging to find where the error comes from. If the VCFs are empty, then something in the calling went wrong. Also, Picard QC (CollectMultipleMetrics) seems to fail.

Can you please zip the complete log directory, and send it here? Without further details, I cannot see what is wrong.

Cheers
Lucas

ospfsg (Author) commented Jan 15, 2025

Dear Lucas

Thank you for your quick reply. I attach the log directory:

logs_run8.tar.gz

cheers
osp

lczech closed this as completed in 167eef2 on Jan 24, 2025
lczech (Member) commented Jan 24, 2025

Hi Octávio,

thanks for sharing the log files. You can follow the trail of logs as described here, in order to figure out where the issues are. In this case, we can see the following:

The main snakemake log file contains several entries Error in rule picard_collectmultiplemetrics. Tracking this down by checking the log files in logs/qc/picard-collectmultiplemetrics, we find that Picard CollectMultipleMetrics fails with

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

This simply means that it does not have enough memory, which can be adjusted in the config file, see here. As described in the comment here, you need to change that line in the config.yaml to

CollectMultipleMetrics-java-opts: "-Xmx10g"

or some larger value if that is still not enough.
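
For reference, a minimal sketch of how this could look in config.yaml, assuming the option sits in a picard subsection of the params block (the exact nesting may differ between grenepipe versions, so match it to your existing file):

params:
  picard:
    # Raise the Java heap for CollectMultipleMetrics; increase further if it still fails
    CollectMultipleMetrics-java-opts: "-Xmx10g"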

Furthermore, the VCF files are empty due to an issue in Freebayes, which we can see by inspecting the log files in logs/calling/freebayes. The files repeat the error message

/opt/conda/conda-bld/freebayes_1711687983774/_build_env/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = Allele*; _Alloc = std::allocator<Allele*>; reference = Allele*&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

A quick Google search reveals that this is a known error in freebayes that has already been fixed upstream, see here. I have now updated the main branch with a more recent version of freebayes, which should fix this.

Can you please download the main branch via the green "Code" button here, and test this with your data?
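
As a rough sketch, that could look as follows on the command line (the repository URL is assumed here; downloading the zip via the green "Code" button works just as well):

# Clone the current main branch into a fresh directory (URL assumed)
git clone https://github.com/moiexpositoalonsolab/grenepipe.git grenepipe-main
cd grenepipe-main

# Recreate the base environment and rerun snakemake as before; the per-rule
# conda envs (including the updated freebayes one) are rebuilt automatically
micromamba env create -f workflow/envs/grenepipe.yaml -n grenepipe-main
micromamba activate grenepipe-main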

Cheers and so long
Lucas

lczech reopened this on Jan 24, 2025
ospfsg (Author) commented Feb 3, 2025

Thank you for your reply.

I managed to run v0.13.4 to the end, but only if I set CollectMultipleMetrics to false. If I set it to true, the pipeline gives an error almost at the end, although the .bam and .vcf files seem to be fine. I tried several Picard memory settings for CollectMultipleMetrics, from -Xmx10g up to -Xmx300g, and none worked.

So the freebayes problem is now solved, and the pipeline runs to the end if I disable CollectMultipleMetrics.

This is the usual error message:

Select jobs to execute...
Traceback (most recent call last):
File "/mnt/data3/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run14/.snakemake/scripts/tmpitn5_yqr.picard-collectmultiplemetrics.py", line 79, in
shell(
File "/home/dau1/micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/shell.py", line 357, in new
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; (picard CollectMultipleMetrics I=mapping/dedup/QCE-TUS-006.bam O=qc/picard/QCE-TUS-006 R=/mnt/data1/Project_KeePace/Operational/6_reference_genomes/Qvariabilis_h1/Qvariabilis_h1_chr1_12.fna -Xmx300g VALIDATION_STRINGENCY=LENIENT METRIC_ACCUMULATION_LEVEL=null METRIC_ACCUMULATION_LEVEL=SAMPLE PROGRAM=MeanQualityByCycle PROGRAM=CollectBaseDistributionByCycle PROGRAM=CollectQualityYieldMetrics PROGRAM=CollectGcBiasMetrics PROGRAM=CollectAlignmentSummaryMetrics PROGRAM=CollectInsertSizeMetrics PROGRAM=QualityScoreDistribution -Xmx164M -Djava.io.tmpdir=/tmp) > logs/qc/picard-collectmultiplemetrics/QCE-TUS-006.log 2>&1' returned non-zero exit status 1.
RuleException:
CalledProcessError in file /home/dau1/software/grenepipe-0.13.4/workflow/rules/qc-bam.smk, line 221:
Command 'source /home/dau1/micromamba/envs/grenepipe/bin/activate '/home/dau1/software/conda-envs/8569ffe46ca5f8c925da383ed32c9f26_'; set -euo pipefail; python /mnt/data3/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run14/.snakemake/scripts/tmpitn5_yqr.picard-collectmultiplemetrics.py' returned non-zero exit status 1.
[Thu Jan 30 11:40:16 2025]
Error in rule picard_collectmultiplemetrics:
jobid: 47
input: mapping/dedup/QCE-TUS-006.bam, /mnt/data1/Project_KeePace/Operational/6_reference_genomes/Qvariabilis_h1/Qvariabilis_h1_chr1_12.fna
output: qc/picard/QCE-TUS-006.alignment_summary_metrics, qc/picard/QCE-TUS-006.base_distribution_by_cycle_metrics, qc/picard/QCE-TUS-006.base_distribution_by_cycle.pdf, qc/picard/QCE-TUS-006.gc_bias.detail_metrics, qc/picard/QCE-TUS-006.gc_bias.summary_metrics, qc/picard/QCE-TUS-006.gc_bias.pdf, qc/picard/QCE-TUS-006.insert_size_metrics, qc/picard/QCE-TUS-006.insert_size_histogram.pdf, qc/picard/QCE-TUS-006.quality_by_cycle_metrics, qc/picard/QCE-TUS-006.quality_by_cycle.pdf, qc/picard/QCE-TUS-006.quality_distribution_metrics, qc/picard/QCE-TUS-006.quality_distribution.pdf, qc/picard/QCE-TUS-006.quality_yield_metrics
log: logs/qc/picard-collectmultiplemetrics/QCE-TUS-006.log (check log file(s) for error details)
conda-env: /home/dau1/software/conda-envs/8569ffe46ca5f8c925da383ed32c9f26_

lczech (Member) commented Feb 3, 2025

Hey @ospfsg,

thanks for the update, and happy to hear that it's generally working again, except for this one step.

Could you maybe check whether slurm reports an out-of-memory issue for all of them? It is weird that it fails for that reason even with 300G. You can find the slurm logs in the hidden .snakemake directory - it should be called slurm_logs or something similar. Alternatively, if you want, you can also zip the logs directory together with the .snakemake/logs and .snakemake/slurm_logs directories, and post them here, so that I can have a look.
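
For example, a rough sketch of how to bundle those from the run directory (the .snakemake subdirectory names are assumptions and may differ depending on your snakemake version and whether slurm was used):

cd /mnt/data3/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run14/
# Bundle the grenepipe logs plus snakemake's own logs; drop any path that does not exist
tar -czf logs_run14.tar.gz logs .snakemake/log* .snakemake/slurm_logs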

Cheers and so long
Lucas
