Can't run after update #12

Open
pgcudahy opened this issue May 5, 2022 · 2 comments

Comments

@pgcudahy

pgcudahy commented May 5, 2022

Thanks for letting me know about the update, Jody. Unfortunately, all of my runs fail with this error, which I cannot decode. (The report says version 0.2.0, but I confirmed I'm running 0.2.1; the version string just wasn't updated in the latest release.)

$ cat N200006_S289.errlog.txt

ntm-profiler error report

  • OS: linux
  • ntm-profiler version: 0.2.0
  • pathogen-profiler version: 2.0.0
  • Program call:
{'no_clean': False, 'read1': '/home/pgc29/scratch60/Taiwan_MKansasii/dataraw/N200006_S289_R1.fastq.gz', 'read2': '/home/pgc29/scratch60/Taiwan_MKansasii/dataraw/N200006_S289_R2.fastq.gz', 'bam': None, 'fasta': None, 'vcf': None, 'platform': 'illumina', 'resistance_db': None, 'external_resistance_db': None, 'species_db': 'ntmdb', 'external_species_db': None, 'prefix': 'N200006_S289', 'dir': '.', 'csv': False, 'txt': False, 'add_columns': None, 'add_mutation_metadata': False, 'call_whole_genome': False, 'mapper': 'bwa', 'caller': 'freebayes', 'calling_params': None, 'min_depth': 10, 'af': 0.1, 'reporting_af': 0.1, 'coverage_fraction_threshold': 0, 'missing_cov_threshold': None, 'species_only': False, 'no_trim': False, 'no_flagstat': False, 'no_clip': True, 'no_delly': False, 'no_species': False, 'no_mash': False, 'output_kmer_counts': False, 'add_variant_annotations': False, 'threads': 1, 'verbose': 0, 'no_cleanup': False, 'delly_vcf': None, 'func': <function main_profile at 0x2b25417c41f0>, 'software_name': 'ntm-profiler', 'tmp_prefix': 'abdb8f1d-40f4-4030-92c7-72565c0a47f0', 'files_prefix': './abdb8f1d-40f4-4030-92c7-72565c0a47f0'}

Traceback:

  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/bin/ntm-profiler", line 322, in <module>
    args.func(args)
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/bin/ntm-profiler", line 89, in main_profile
    species_prediction = pp.speciate(args)
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/lib/python3.8/site-packages/pathogenprofiler/cli.py", line 64, in speciate
    kmer_dump = fastq_class.get_kmer_counts(args.files_prefix,threads=args.threads)
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/lib/python3.8/site-packages/pathogenprofiler/fastq.py", line 127, in get_kmer_counts
    run_cmd(f"kmc {bins} -t{threads} -sf{threads} -sp{threads} -sr{threads} -k{klen} @{tmp_file_list} {tmp_prefix} {tmp_prefix}")
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/lib/python3.8/site-packages/pathogenprofiler/utils.py", line 391, in run_cmd
    raise ValueError("Command Failed:\n%s\nstderr:\n%s" % (cmd,stderr.decode()))

Value:

Command Failed:
set -u pipefail; kmc  -t1 -sf1 -sp1 -sr1 -k31 @219bda85-28f9-43c3-9f0c-461bc10d96e1.list 219bda85-28f9-43c3-9f0c-461bc10d96e1 219bda85-28f9-43c3-9f0c-461bc10d96e1
stderr:
*****************
Stage 1: 100%
Stage 2: 100%
/bin/sh: line 1: 150965 Bus error               kmc -t1 -sf1 -sp1 -sr1 -k31 @219bda85-28f9-43c3-9f0c-461bc10d96e1.list 219bda85-28f9-43c3-9f0c-461bc10d96e1 219bda85-28f9-43c3-9f0c-461bc10d96e1

I tried to run the kmc command by itself and got the same error:

$ kmc  -t1 -sf1 -sp1 -sr1 -k31 @219bda85-28f9-43c3-9f0c-461bc10d96e1.list 219bda85-28f9-43c3-9f0c-461bc10d96e1 219bda85-28f9-43c3-9f0c-461bc10d96e1
*****************
Stage 1: 100%
Stage 2: 100%
Bus error

This is on my university's cluster and not my own machine, so a bit harder to debug, but I'll keep looking into it.
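The bare "Bus error" means kmc was terminated by the SIGBUS signal rather than exiting with an error of its own. When a command run through /bin/sh is killed by a signal, the shell conventionally exits with 128 plus the signal number, while Python's subprocess reports a process it launched directly and that was killed by a signal as a negative return code. The helper below, describe_exit (a hypothetical name, not part of pathogen-profiler), is a minimal sketch of how a wrapper like run_cmd could turn either form into a readable reason. It assumes a POSIX system where signal.SIGBUS is defined.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Translate a process exit status into a human-readable reason.

    Handles both conventions:
    - subprocess reports a direct signal kill as a negative code (-N);
    - a POSIX shell whose child was killed exits with 128 + N.
    Note: a status above 128 can also be a program's own exit code,
    so this is a best-effort interpretation, not a guarantee.
    """
    if returncode == 0:
        return "success"
    if returncode < 0:
        signum = -returncode          # killed directly by signal N
    elif returncode > 128:
        signum = returncode - 128     # shell convention: 128 + N
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Number does not correspond to a signal known on this platform
        return f"killed by unknown signal {signum}"
    return f"killed by {name}"
```

Passing the returncode of subprocess.run(cmd, shell=True) for a command that the shell reports as "Bus error" should give "killed by SIGBUS", which points at the process itself crashing (here, from running out of memory) rather than at bad input.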

@pgcudahy
Author

pgcudahy commented May 5, 2022

Ah, it's an out-of-memory issue. My cluster instance had a cap of 8GB of memory, and kmc defaults to 12GB. Per kmc's documentation, "-m<size> - max amount of RAM in GB (from 1 to 1024); default: 12", so adding the argument -m8 made it run fine. kmc needs at least 2GB of memory, and when I compared runs, it took about 45 seconds per genome with either 2GB or 8GB, so maybe set the command to -m2. Alternatively, you could just warn users to allocate at least 12GB.
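The fix amounts to passing an explicit -m value to kmc. Below is a minimal sketch of building the command from the f-string in the traceback with a RAM cap added. It assumes a hypothetical helper, kmc_command, and a default of 2GB taken from the -m2 suggestion (not necessarily what the released fix uses). It checks the value against the 1 to 1024 range quoted from kmc's documentation and leaves out the {bins} placeholder, which was empty in this run.

```python
import shlex

# Range of kmc's -m option as quoted from its documentation
KMC_MIN_RAM_GB = 1
KMC_MAX_RAM_GB = 1024


def kmc_command(tmp_file_list: str, tmp_prefix: str, threads: int = 1,
                klen: int = 31, ram_gb: int = 2) -> str:
    """Build a kmc k-mer counting command with an explicit RAM cap.

    Hypothetical helper: mirrors the f-string in pathogenprofiler's
    get_kmer_counts, plus -m<ram_gb> so kmc does not assume 12 GB.
    """
    if not KMC_MIN_RAM_GB <= ram_gb <= KMC_MAX_RAM_GB:
        raise ValueError(
            f"ram_gb must be between {KMC_MIN_RAM_GB} and "
            f"{KMC_MAX_RAM_GB}, got {ram_gb}"
        )
    prefix = shlex.quote(tmp_prefix)  # quote in case of unusual characters
    return (
        f"kmc -t{threads} -sf{threads} -sp{threads} -sr{threads} "
        f"-k{klen} -m{ram_gb} @{shlex.quote(tmp_file_list)} "
        f"{prefix} {prefix}"  # output prefix and working location, as in the original
    )
```

On a node capped at 8GB like the one described, kmc_command("run.list", "run", ram_gb=8) would produce the same call that worked manually with -m8; keeping the default low trades nothing in runtime according to the timings reported here.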

@jodyphelan
Owner

Ah, thanks for looking into this. I'll add that parameter and update the release.

jodyphelan added a commit to jodyphelan/pathogen-profiler that referenced this issue May 6, 2022
jodyphelan added a commit that referenced this issue May 6, 2022