Genbank file issues with spliced proteins #59
Comments
Nick: I haven't seen this before in bacterial genomes; the flat formats used in ITEP are not really designed to handle splicing. That being said, can you tell me the error you're getting?
Matt
Hi Matt,
Traceback (most recent call last):
Using GenBank files from RAST, a subset of CDS looks like this:
CDS join(5090..8716,8720..8794)
Thanks much!
Best,
Line 406 is a blank line now (there have been some changes to that script since the initial release). Could you update your copy of ITEP with this command
$ git pull origin master
and tell me if there is still a problem and, if so, what error you get when trying to run it?
Thanks and best,
Matt
I looked into this a little (with an Arabidopsis chromosome). With Biopython 1.61 and the latest ITEP code, it succeeded in making a table with all the genes (it treats the location as if there were no splice site). However, I do need to fix a problem with multiply-spliced proteins in GenBank files: the ITEP IDs won't be added for multiply-spliced proteins because I didn't build the lookup table correctly, assuming that there would only be one protein in the same region of DNA. I'll fix that problem, but it is unlikely to affect you with a bacterial genome.
Matt
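The lookup-table problem described above can be illustrated with a small sketch. This is not ITEP's code: the function name, the tuple layout, and the example IDs are all hypothetical. It only shows why a table keyed by DNA region loses entries when two proteins share a region, and one way to keep them all by storing a list per region.

```python
from collections import defaultdict


def build_region_lookup(features):
    """Map each (contig, start, stop) region to every ID found there.

    `features` is an iterable of (contig, start, stop, gene_id) tuples.
    These names are illustrative assumptions, not taken from ITEP.
    """
    lookup = defaultdict(list)
    for contig, start, stop, gene_id in features:
        # A plain dict assignment here would overwrite earlier IDs that
        # share the same region; appending keeps all of them.
        lookup[(contig, start, stop)].append(gene_id)
    return lookup


# Hypothetical example: two spliced proteins annotated over the same region.
features = [
    ("chr1", 100, 900, "gene.1"),
    ("chr1", 100, 900, "gene.2"),
    ("chr1", 1200, 1500, "gene.3"),
]
lookup = build_region_lookup(features)
```

With a one-ID-per-region table, only one of gene.1 and gene.2 would survive; here both remain retrievable under the shared key.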
Thanks for looking into it.
Nick
Hi Matt,
A girl in my lab is trying to set up ITEP with some bacterial genomes. She ran into an issue: convertGenbank2table.py fails because some of her GenBank files (downloaded from RAST) have join() in the CDS location info. Example:
CDS join(544..589,688..>1032)
/product="T-cell receptor beta-chain"
She's just going to delete those CDS features, but this is definitely not optimal.
Thanks.
Nick
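As an alternative to deleting the affected CDS features, the join() location can be split into its segments and, if needed, collapsed to a single span. The following is a minimal standard-library sketch, not the approach used by convertGenbank2table.py or Biopython; the function names are made up for illustration. It assumes locations built only from start..end ranges, optionally wrapped in join(...) or complement(...), and treats any complement( as applying to the whole feature.

```python
import re

# One "start..end" segment; "<" and ">" mark partial (open-ended) boundaries.
_SEGMENT = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def parse_location(location):
    """Return (strand, segments) for a GenBank location string.

    Segments are (start, end) tuples, 1-based and inclusive as written
    in the flat file. Partial-boundary markers are dropped.
    """
    # Simplification: any complement( is taken to cover the whole feature.
    strand = -1 if "complement(" in location else 1
    segments = [(int(a), int(b)) for a, b in _SEGMENT.findall(location)]
    if not segments:
        raise ValueError("unrecognized location: %r" % location)
    return strand, segments


def overall_span(segments):
    """Collapse segments to one (start, end) pair, ignoring the splice."""
    return min(s for s, _ in segments), max(e for _, e in segments)


strand, segments = parse_location("join(544..589,688..>1032)")
span = overall_span(segments)
```

For the example feature above, this yields two segments on the forward strand and a collapsed span that covers both, which matches the "treat it as unspliced" behavior described elsewhere in this thread.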