Hit not found for certain queries with custom 1/-1 matrix #819

Open
tanhevg opened this issue Jul 1, 2024 · 2 comments

Comments

tanhevg commented Jul 1, 2024

Hello.

Thank you for the great package!

I am trying to run diamond blastp to detect exact protein sequence matches. I am using a custom matrix that has 1s on the diagonal and -1s off the diagonal, with gap open penalty = 12 and gap extend penalty = 2. I am running against a database built from UniRef100, excluding all clusters that have UniParc representatives. My test query is A0A8S9VRI7 with an N-terminal 6xHis-tag and TEV cleavage site (MHHHHHHENLYFQMDNNGVAKTL...). I get the following, somewhat inconsistent, results:

  • With the custom matrix, and with masking and comp-based stats disabled (--masking none --comp-based-stats 0), I don't get any matches.
  • The same Diamond settings with different queries return the correct matches.
  • With BLOSUM62 and default penalties, and with masking and comp-based stats disabled, the correct sequence is matched.
  • With the custom matrix and penalties, but without disabling masking and comp-based stats, the correct sequence is also matched.

I am not sure whether this has something to do with the query being quite long (1714 aa) or containing repetitive regions (runs of asparagines or lysines). Do I need to specify any extra settings to make Diamond's statistics work properly with my custom matrix? Is there anything else I might be doing wrong?

A different example, W7JKY7, also does not match the correct sequence with the custom matrix, even without disabling masking and comp-based stats. The correct hit is found with BLOSUM62.

For completeness, here is the diamond command line:

diamond blastp -q A0A8S9VRI7.fasta \
    -d ~/diamond_db/uniref100_no_uniparc.dmnd -o A0A8S9VRI7.tsv \
    --custom-matrix diamond_matrix.txt --gapopen 12 --gapextend 2 \
    --header --tmpdir /dev/shm --fast -b 40 --max-target-seqs 100 \
    --evalue 1e-6 --id 80 --max-hsps 10 --masking none --comp-based-stats 0 \
    --threads 48

Lambda and kappa printed by Diamond:

Scoring parameters: (Matrix=custom Lambda=2.77945 K=0.829449 Penalties=12/2)
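
For context on the Lambda value above, here is a minimal sketch of how an ungapped Karlin-Altschul lambda can be computed for a match/mismatch matrix: it is the positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1. The function name `ungapped_lambda` and the uniform background frequencies are assumptions made for illustration; the thread does not say which frequencies or which procedure Diamond uses, and the value Diamond printed also reflects the 12/2 gap penalties.

```python
import math


def ungapped_lambda(match, mismatch, freqs):
    """Positive root of q*exp(lam*match) + (1-q)*exp(lam*mismatch) = 1,
    where q is the chance that two random residues are identical."""
    q = sum(p * p for p in freqs)
    if q * match + (1 - q) * mismatch >= 0:
        raise ValueError("expected score must be negative")

    def f(lam):
        return q * math.exp(lam * match) + (1 - q) * math.exp(lam * mismatch) - 1.0

    lo, hi = 1e-6, 1.0
    while f(hi) < 0:          # grow the bracket until the root is enclosed
        hi *= 2
    for _ in range(200):      # plain bisection
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Assumption: 20 equally likely residues (not Diamond's actual frequencies).
uniform = [1 / 20] * 20
print(round(ungapped_lambda(1, -1, uniform), 4))  # ln(19)
```

For a 1/-1 matrix the root has the closed form ln((1 - q) / q), so uniform frequencies give ln 19. Non-uniform frequencies raise q and therefore lower lambda, which is the direction of the 2.77945 printed above.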

My custom matrix:

   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y  X  Z
A  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
C -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
D -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
E -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
F -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
G -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
H -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
I -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
K -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
L -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
M -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
N -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
P -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Q -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1
R -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1 -1
S -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1
T -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1
V -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1
W -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1
Y -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1
Z -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1
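
A matrix like this is easy to mistype by hand, so here is a small sketch that generates the same layout programmatically. The function name `identity_matrix_text` is hypothetical; the output file name is the one used in the command line above, and the layout simply mirrors the matrix as posted.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYXZ"  # same residue order as the matrix above


def identity_matrix_text(alphabet=ALPHABET, match=1, mismatch=-1):
    """Return a match/mismatch matrix as text, one row per residue."""
    # Header: one blank for the row-label column, then width-3 column labels.
    lines = [" " + "".join(f"{c:>3}" for c in alphabet)]
    for row in alphabet:
        scores = (match if row == col else mismatch for col in alphabet)
        lines.append(row + "".join(f"{s:>3}" for s in scores))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("diamond_matrix.txt", "w") as handle:
        handle.write(identity_matrix_text())
```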

Thanks in advance.

@tanhevg tanhevg changed the title Hit not found with custom matrix and disabled masking and comp-based-stats Hit not found for certain queries with custom matrix and disabled masking and comp-based-stats Jul 1, 2024
@tanhevg tanhevg changed the title Hit not found for certain queries with custom matrix and disabled masking and comp-based-stats Hit not found for certain queries with custom 1/-1 matrix Jul 2, 2024
Author

tanhevg commented Jul 2, 2024

Somehow, it appears that using --no-ranking solves this problem.
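
For anyone scripting this workaround, here is a sketch that rebuilds the command line from the first comment with --no-ranking appended. The helper name `diamond_argv` is hypothetical, all flags are copied from this thread, and the snippet only prints the command rather than running Diamond.

```python
import os
import shlex


def diamond_argv(query, db, out, matrix, no_ranking=True):
    """Build the argument list from the original report, optionally adding --no-ranking."""
    argv = [
        "diamond", "blastp", "-q", query, "-d", db, "-o", out,
        "--custom-matrix", matrix, "--gapopen", "12", "--gapextend", "2",
        "--header", "--tmpdir", "/dev/shm", "--fast", "-b", "40",
        "--max-target-seqs", "100", "--evalue", "1e-6", "--id", "80",
        "--max-hsps", "10", "--masking", "none", "--comp-based-stats", "0",
        "--threads", "48",
    ]
    if no_ranking:
        argv.append("--no-ranking")
    return argv


# Expand "~" here because it is not expanded when no shell is involved.
db_path = os.path.expanduser("~/diamond_db/uniref100_no_uniparc.dmnd")
print(shlex.join(diamond_argv("A0A8S9VRI7.fasta", db_path,
                              "A0A8S9VRI7.tsv", "diamond_matrix.txt")))
```

The returned list can be passed to `subprocess.run(argv, check=True)` once the printed command looks right.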

@bbuchfink
Owner

Ok, let me know if you still need any help.
