Creating local database is always interrupted at aligning step #37

clavedec · 2024-07-08T13:00:14Z

Hello,

I have been using make-SIFT-db-all.pl to create a database for chiLan. It started well: files were created in the singleRecords, fasta and subst directories (the others are empty). However, I keep receiving an email saying the Slurm job has failed with 'Exit code 255', usually after 11-12 hours of runtime, at the step "Aligning queries with candidate sequences". The last run got as far as:

** Aligning queries with candidate sequences **
... processing database part 1 (size ~1.00 GB): 47.50/100.00%

Since all the files had been created, I decided to run:

~/sift4g/bin/sift4g -d /full_path/scripts_to_build_SIFT_db/GCF_009829145.1/protein.faa -q /full_path/scripts_to_build_SIFT_db/all_prot.fasta --subst /full_path/scripts_to_build_SIFT_db/subst --out /full_path/scripts_to_build_SIFT_db/SIFT_predictions --sub-results

But the alignment does not advance beyond 47.50% and stops with 'Segmentation fault (core dumped)'. Although it looks like a memory problem, the job is using less memory than I allocated for it. Any suggestions as to what might be happening?

Following a previous issue, I am sharing all_prot.fasta and the config file I used for make-SIFT-db-all.pl at the following link.

Thank you very much for your help!

Best wishes,
Clarissa

ChandlerJun · 2024-08-15T05:01:54Z

Hello,

I encountered the same problem when running the program under Slurm.
I had removed all abnormal protein codes (e.g., X) beforehand.

I tried:

Increasing memory to 1 TB (same error)
Removing proteins longer than 35,000 from all_prot.fasta (same error)
Removing proteins longer than 15,000 from all_prot.fasta (no error)
Testing sequences longer than 35,000 individually (same error)
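For anyone who wants to try the same workaround, here is a minimal sketch of the length filter, assuming a plain FASTA file. The function names (`read_fasta`, `filter_by_length`, `format_fasta`) are hypothetical helpers written for this post and are not part of SIFT4G; the 15,000 cutoff is simply the value that worked in the tests above.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], max_len: int):
    """Split records into those with len(seq) <= max_len and the rest."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for header, seq in records:
        (kept if len(seq) <= max_len else dropped).append((header, seq))
    return kept, dropped


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text, wrapping sequences at `width` columns."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

To use it, open all_prot.fasta, pass the file handle to `read_fasta`, call `filter_by_length(records, 15000)`, and write `format_fasta(kept)` to a new file. The dropped proteins are returned separately so they can be listed; they will not be part of the filtered query file, so this is a workaround rather than a fix.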

My protein sequence length distribution was:
Length range: number of proteins
0-8,999: 67,873
15,000-15,999: 1
26,000-26,999: 2
35,000-35,999: 1
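A distribution like the one above can be produced with a short script. This is only a sketch: `length_histogram` and `format_histogram` are hypothetical names, and it uses uniform 1,000-wide bins, so the short proteins would be spread over several bins instead of being merged into a single 0-8,999 row.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def length_histogram(lengths: Iterable[int], bin_width: int = 1000) -> List[Tuple[int, int]]:
    """Count lengths per bin; returns sorted (bin_start, count) pairs for non-empty bins."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return sorted(counts.items())


def format_histogram(hist: List[Tuple[int, int]], bin_width: int = 1000) -> List[str]:
    """Format bins as 'start-end: count' lines with thousands separators."""
    return [
        f"{start:,}-{start + bin_width - 1:,}: {count:,}"
        for start, count in hist
    ]
```

The `lengths` argument is just a list of sequence lengths, for example the `len()` of each protein sequence read from all_prot.fasta.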

My guess is that a chunk is running out of its memory allocation.
I hope this helps the developers either suggest a workaround for proteins longer than 15,000 or fix the bug.

Thank you.

Best wishes,
Chandler

clavedec mentioned this issue Jul 8, 2024

Creating local database is always interrupted at aligning step pauline-ng/SIFT4G_Create_Genomic_DB#96

Open
