"Error: Array size overflow" when using Diamond on large custom database #823

Open
CaroleBelliardo opened this issue Jul 18, 2024 · 2 comments

CaroleBelliardo commented Jul 18, 2024

Hi BBuchfink,

I regularly use this tool, and it usually works fine. I'm encountering an issue while using Diamond on a custom database that is 609 GB in size and includes integrated taxonomic information. Previously, my job ran smoothly on an earlier 607 GB database version that included only some of the taxonomic information.

I did not receive any error messages while formatting the database. However, when I run a blastp search with the same proteome as the query, I get the following error message after 3 minutes: “Error: Array size overflow”.

Could you please help me resolve this issue?

My DIAMOND log file contains:

diamond blastp --more-sensitive --max-target-seqs 500 --evalue 0.001 --threads 70 --db /kwak/hub/25_cbelliardo/DB_Metag_Soildb/SoilDB_nr_v4.dmnd --query XINDMERG.20240529.recipeA.20240529.prot.fasta --out XINDMERG.20240529.recipeA.20240529.prot_diamond_taxids_exclude_metag.tsv --taxon-exclude 46003 -b 10 -c 1 --tmpdir . --log --verbose --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids
#CPU threads: 70
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
CPU features detected: ssse3 popcnt sse4.1 avx2
L3 cache size: 47185920
MAX_SHAPE_LEN=19 SEQ_MASK STRICT_BAND
Temporary directory: .
#Target sequences to report alignments for: 500
DP fields: 510
Opening the database... Error: Array size overflow.

@bbuchfink
Owner

An internal array mapping the sequences to taxids is overflowing here. I will provide a fix in the next release.
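For readers unfamiliar with this class of error, the following is a minimal illustrative sketch in Python, not DIAMOND's actual code. It assumes, purely for illustration, that an array's element count must fit in a signed 32-bit integer; the real limit and the real check inside DIAMOND may differ. The names `INT32_MAX`, `checked_array_size`, and `try_size` are hypothetical.

```python
# Illustrative only: this is NOT DIAMOND's implementation.
# Assumption: the array size must fit in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def checked_array_size(n_entries: int, limit: int = INT32_MAX) -> int:
    """Return n_entries if it fits within the limit, else raise OverflowError."""
    if n_entries < 0 or n_entries > limit:
        raise OverflowError("Array size overflow")
    return n_entries


def try_size(n_entries: int) -> str:
    """Report whether an array of n_entries elements could be allocated."""
    try:
        checked_array_size(n_entries)
        return "ok"
    except OverflowError as exc:
        return f"Error: {exc}"


if __name__ == "__main__":
    print(try_size(INT32_MAX))      # largest size that still fits
    print(try_size(INT32_MAX + 1))  # one entry too many
```

Under this assumption, a database can cross the limit by adding more sequence-to-taxid entries even when its size on disk barely changes, which would be consistent with the 607 GB version working and the 609 GB version failing.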

@shenwei356

Same here, looking forward to the fix.
