Creating local database is always interrupted at aligning step #37

Open
clavedec opened this issue Jul 8, 2024 · 1 comment

clavedec commented Jul 8, 2024

Hello,

I have been trying to use make-SIFT-db-all.pl to create a database for chiLan. It was all going well, and files were being created in the directories singleRecords, fasta and subst (the others are empty). However, I keep getting an email saying that the Slurm job has failed with 'Exit code 255', usually after 11-12 hours of running, at the step "Aligning queries with candidate sequences". The last time, it got as far as:

** Aligning queries with candidate sequences **
... processing database part 1 (size ~1.00 GB): 47.50/100.00%

Since all the files had been created, I decided to run:

~/sift4g/bin/sift4g -d /full_path/scripts_to_build_SIFT_db/GCF_009829145.1/protein.faa -q /full_path/scripts_to_build_SIFT_db/all_prot.fasta --subst /full_path/scripts_to_build_SIFT_db/subst --out /full_path/scripts_to_build_SIFT_db/SIFT_predictions --sub-results

But the alignment does not advance beyond 47.50% and ends with 'Segmentation fault (core dumped)'. Although it looks like a memory problem, the job is using less memory than I allocated for it. Any suggestions as to what could be happening?

Following a previous issue, I am sharing the all_prot.fasta and the config file I used for make-SIFT-db-all.pl at the following link.

Thank you very much for your help!

Best wishes,
Clarissa

ChandlerJun commented Aug 15, 2024

Hello,

I encountered the same problem when running the program on a Slurm system.
I had removed all the abnormal residue codes (e.g., X) beforehand.

I tried:

  1. Increase memory to 1 TB (same error)
  2. Remove proteins with sequence lengths over 35,000 from all_prot.fasta (same error)
  3. Remove proteins with sequence lengths over 15,000 from all_prot.fasta (no error)
  4. Test sequences longer than 35,000 residues individually (same error)
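
For anyone who wants to repeat steps 2 and 3, here is a minimal sketch of a length filter for a FASTA file. It is not part of SIFT4G or of the make-SIFT-db-all.pl scripts; the function names and the output file name in the usage note are hypothetical, and it assumes a plain-text FASTA file with `>` header lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, max_len):
    """Keep only records whose sequence has at most max_len residues."""
    return [(h, s) for h, s in records if len(s) <= max_len]


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def filter_fasta_file(src, dst, max_len):
    """Copy src to dst, dropping every sequence longer than max_len."""
    with open(src) as fin:
        kept = filter_by_length(read_fasta(fin), max_len)
    with open(dst, "w") as fout:
        write_fasta(kept, fout)
    return len(kept)
```

Calling `filter_fasta_file("all_prot.fasta", "all_prot.max15000.fasta", 15000)` would correspond to step 3; the second file name is only an example.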

My protein sequence length distribution was:
Length range: Number of proteins
0-8,999: 67,873
15,000-15,999: 1
26,000-26,999: 2
35,000-35,999: 1
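
A distribution like the one above can be produced with a short script. This is only a sketch under stated assumptions: the function names are hypothetical, it uses fixed bins of 1,000 residues (so it would not merge the 0-8,999 range into one row as the table above does), and it expects plain-text FASTA input.

```python
from collections import Counter


def fasta_lengths(lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the length of the previous one.
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def length_histogram(lengths, bin_size=1000):
    """Count sequences per bin; each key is the lower bound of its bin."""
    return Counter((n // bin_size) * bin_size for n in lengths)


def format_histogram(hist, bin_size=1000):
    """Return lines such as '15,000-15,999: 1', sorted by bin."""
    return [
        f"{lo:,}-{lo + bin_size - 1:,}: {hist[lo]:,}"
        for lo in sorted(hist)
    ]
```

To use it on a real file, open all_prot.fasta, pass the file handle to `fasta_lengths`, and print the lines returned by `format_histogram(length_histogram(...))`.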

My guess is that the chunk being processed exceeds its memory allocation.
I hope this helps the developers either suggest a workaround for proteins longer than 15,000 residues or fix the bug.

Thank you.

Best wishes,
Chandler
