Failing for long sequences #4
Hi, sorry it has taken me so long to get back to you (I've been on holiday and have moved house). This is unlikely to be an issue with Perl. The Perl code only checks that the outputs of the various C programs are OK before it proceeds. If you look in the src/ directory, you'll find assorted .c and .cpp files, and the problem most likely arises in one or more of them. If you can find places where the maximum sequence length is set to a fixed number, you can probably fix this by patching your copy of DISOPRED and recompiling those files. For instance, I see that both disordcomb_pred.c and diso_neighb.c contain the line
And, fingers crossed, it should work.
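To make the suggestion above concrete, here is a minimal C sketch of the general pattern being described: a buffer whose size is fixed by a compile-time macro, so a sequence longer than that limit cannot be handled until the macro is raised and the program is recompiled. The macro name `MAXSEQLEN` and the function `load_sequence` are hypothetical. The real names in disordcomb_pred.c and diso_neighb.c may differ, and those programs may not check the length at all.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro name; the limit in the real DISOPRED sources may be
   named differently. The #ifndef guard lets it be overridden at build time
   with -DMAXSEQLEN=... */
#ifndef MAXSEQLEN
#define MAXSEQLEN 50000
#endif

/* Fixed-size buffer sized by the compile-time limit (+1 for the '\0'). */
static char seq[MAXSEQLEN + 1];

/* Hypothetical helper: copy a sequence into the fixed buffer, refusing
   instead of overflowing when it is longer than MAXSEQLEN.
   Returns 0 on success, -1 if the sequence is too long. */
int load_sequence(const char *src)
{
    size_t len = strlen(src);
    if (len > MAXSEQLEN) {
        fprintf(stderr, "sequence length %zu exceeds MAXSEQLEN (%d)\n",
                len, MAXSEQLEN);
        return -1;
    }
    memcpy(seq, src, len + 1); /* copies the terminating '\0' as well */
    return 0;
}
```

Because the limit is wrapped in `#ifndef`, it can be raised at build time without editing the file, e.g. `cc -DMAXSEQLEN=100000 ...`. If the actual sources use a plain `#define` with no guard, the value has to be edited in the file itself before recompiling.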
C I know! I'll try changing the macros to accommodate the titin sequences. Many thanks!
I'll report back once I've tried it. I just need to wait a week or so until my current calculations finish; I don't want to recompile mid-analysis.
Hi again, I realised that the buffers were probably long enough, since they are 50000 by default and the sequences in question are about 32500 aa each. Because I reran my entire analysis using a larger reference database (UniRef90), I also reran these long sequences under the same conditions. This time I get a different error:
It is not obvious to me what I can do to fix this, and I can live without these three proteins. But in case you are interested in digging deeper, the proteins in question have UniProt IDs A2ASS6, E9Q8K5, and E9Q8N1. I'd be happy to answer any questions about how I got this error, but I think it is pretty straightforward, since I have not done anything unorthodox or modified anything.
run_disopred.pl fails for the three titin variants A2ASS6, E9Q8K5, and E9Q8N1 (UniProt accession codes). These are very long sequences (>30,000 aa). None of them contain non-standard amino acids. See the output below.
My Perl skills are too weak to figure out what goes wrong, but the sequence length seems like a likely culprit. All of the other 55,000+ proteins in my dataset worked fine.