I am trying to run diamond blastp to detect exact protein sequence matches. I am using a custom matrix that has 1 on the diagonal and -1 off the diagonal, with gap open penalty = 12 and gap extend penalty = 2. I am running against a database built from UniRef100, excluding all clusters that have UniParc representatives. I am using A0A8S9VRI7 as a test query sequence, with an N-terminal 6xHis-tag and TEV cleavage site (MHHHHHHENLYFQMDNNGVAKTL...). I get the following, somewhat inconsistent, results:
- With the custom matrix, and with masking and comp-based-stats disabled (--masking none --comp-based-stats 0), I don't get any matches.
- The same diamond settings with different queries return correct matches.
- With masking and comp-based-stats disabled, but with BLOSUM62 and default penalties, the correct sequence is matched.
- With the custom matrix and penalties, but without disabling masking and comp-based-stats, the correct sequence is also matched.
I am not sure whether this is related to the query being quite long (1714 aa) or to its repetitive regions (runs of asparagines or lysines). Do I need to specify any extra settings to make the Diamond statistics work properly with my custom matrix? Is there anything else I might be doing wrong?
A different example, W7JKY7, also does not match the correct sequence with the custom matrix, even without disabling masking and comp-based stats. The correct hit is found with BLOSUM62.
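As a rough sanity check on the statistics question, the ungapped lambda for a 1/-1 matrix can be estimated by solving sum_ij p_i p_j exp(lambda * s_ij) = 1. The sketch below assumes uniform background frequencies over 20 amino acids, which is a simplification of my own; Diamond uses its own background frequencies and gapped statistics, so the values it prints may differ. The function name `ungapped_lambda` is hypothetical.

```python
import math


def ungapped_lambda(match=1, mismatch=-1, n_letters=20, tol=1e-12):
    """Solve p_match*exp(lam*match) + p_mismatch*exp(lam*mismatch) = 1 for lam > 0.

    Assumes uniform letter frequencies (an illustrative simplification),
    so a random pair of residues matches with probability 1/n_letters.
    """
    p_match = 1.0 / n_letters
    p_mismatch = 1.0 - p_match

    def f(lam):
        return (p_match * math.exp(lam * match)
                + p_mismatch * math.exp(lam * mismatch) - 1.0)

    # f is negative just above 0 (the expected score is negative) and grows
    # without bound, so expand the upper bracket until the sign changes.
    lo, hi = 1e-6, 1.0
    while f(hi) < 0:
        hi *= 2.0

    # Plain bisection on the bracketed positive root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"ungapped lambda under uniform frequencies: {ungapped_lambda():.4f}")
```

Under these assumptions the equation reduces to x^2 - 20x + 19 = 0 with x = exp(lambda), so the positive solution is lambda = ln 19. Comparing that with the lambda Diamond prints may help show whether the statistics for the custom matrix are in a sensible range.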
For completeness, here is the diamond command line:
tanhevg changed the title from "Hit not found with custom matrix and disabled masking and comp-based-stats" to "Hit not found for certain queries with custom matrix and disabled masking and comp-based-stats" on Jul 1, 2024
tanhevg changed the title from "Hit not found for certain queries with custom matrix and disabled masking and comp-based-stats" to "Hit not found for certain queries with custom 1/-1 matrix" on Jul 2, 2024
Hello.
Thank you for the great package!
Lambda and kappa printed by Diamond:
My custom matrix:
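The original matrix file is not reproduced here. As a minimal sketch, a matrix of the shape described (1 on the diagonal, -1 elsewhere) could be generated as follows. The layout assumes the NCBI-style matrix file format with the usual 24-symbol column order; scoring B, Z, X and * the same way as ordinary residues is my assumption and may not match the file actually used. The function name `identity_matrix_text` is hypothetical.

```python
# Column order used by the NCBI matrix files (assumed to apply here).
ALPHABET = "ARNDCQEGHILKMFPSTWYVBZX*"


def identity_matrix_text(alphabet=ALPHABET, match=1, mismatch=-1):
    """Return the text of a scoring matrix with `match` on the diagonal
    and `mismatch` everywhere else, in an NCBI-style layout."""
    lines = ["# Identity-style matrix: match on the diagonal, mismatch elsewhere"]
    # Header row: one right-aligned column per symbol.
    lines.append("  " + " ".join(f"{c:>2}" for c in alphabet))
    for row in alphabet:
        scores = (match if row == col else mismatch for col in alphabet)
        lines.append(f"{row} " + " ".join(f"{s:>2}" for s in scores))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the matrix; redirect to a file to use it as a custom matrix.
    print(identity_matrix_text(), end="")
```

The printed text can be redirected to a file and passed to diamond as the custom matrix, with the gap open and gap extend penalties set explicitly.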
Thanks in advance.