This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Genbank file issues with spliced proteins #59

Open
nick-youngblut opened this issue Feb 27, 2014 · 5 comments

@nick-youngblut

Hi Matt,
A girl in my lab is trying to set up ITEP with some bacterial genomes. She ran into an issue: convertGenbank2table.py fails because some of her GenBank files (downloaded from RAST) have join() in the CDS location info. Example:

CDS join(544..589,688..>1032)
/product="T-cell receptor beta-chain"

She's just going to delete those CDS features, but this is definitely not optimal.
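For readers unfamiliar with the notation, a join() location lists several segments that together make up one feature. The following is a minimal sketch of how such a string could be split into segments and collapsed to a single overall span; it uses only the standard library, and the function names `parse_location` and `overall_span` are hypothetical, not part of ITEP or Biopython. It handles only simple `join(...)` and `complement(...)` forms and raises an error for anything else.

```python
import re

def parse_location(location):
    """Return (segments, strand) for a simple GenBank location string.

    segments is a list of (start, end) tuples, 1-based and inclusive as
    written in the file; strand is -1 for complement(...) and +1 otherwise.
    Partial-end markers (< and >) are ignored.
    """
    strand = 1
    loc = location.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = -1
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    segments = []
    for part in loc.split(","):
        match = re.fullmatch(r"[<>]?(\d+)\.\.[<>]?(\d+)", part.strip())
        if match is None:
            raise ValueError("unsupported location: %r" % location)
        segments.append((int(match.group(1)), int(match.group(2))))
    return segments, strand

def overall_span(location):
    """Collapse all segments into one (start, end, strand) span."""
    segments, strand = parse_location(location)
    start = min(seg_start for seg_start, _ in segments)
    end = max(seg_end for _, seg_end in segments)
    return start, end, strand

segments, strand = parse_location("join(544..589,688..>1032)")
span = overall_span("join(544..589,688..>1032)")
```

Collapsing to one span discards the gap between segments, so a table built this way records the feature as if it were unspliced.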

Thanks.
Nick

@mattb112885
Owner

Nick:

I haven't seen this before in bacterial genomes; the flat formats used in ITEP are not really designed to handle splicing.

That being said, can you tell me the error you're getting?

Matt

@nick-youngblut
Author

Hi Matt,
Here's the error I'm getting:

Traceback (most recent call last):
File "./convertGenbank2table.py", line 406, in
raise KeyError
KeyError

Using genbank files from RAST, a subset of CDS looks like this:

CDS join(5090..8716,8720..8794)

Thanks much!

Best,
Mallory

Begin forwarded message:

From: Nicholas David Youngblut [email protected]
Subject: FW: [clusterDbAnalysis] join() in genbank (#59)
Date: February 27, 2014 2:55:43 PM EST
To: "[email protected]" [email protected]

Hi Mallory,
Matt, the creator of ITEP, got back to me on the 'join()' bug in the convertGenbank2table.py script. Can you send him the error that you got?

Thanks.
Nick


@mattb112885
Owner

Line 406 is a blank line now (there have been some changes to that script since the initial release). Could you update your copy of ITEP with this command

$ git pull origin master

and tell me if there is still a problem / what error you get when trying to run it?

Thanks and best

Matt

@mattb112885
Owner

I looked into this a little (with an Arabidopsis chromosome). With Biopython 1.61 and the latest ITEP code, it succeeded in making a table with all the genes (it treats the location as if there were no splice site). However, I do need to fix a problem with multiply-spliced proteins in GenBank files: the ITEP IDs won't be added for them because I built the lookup table incorrectly, assuming that there would only be one protein in the same region of DNA. I'll fix that problem, but it is unlikely to affect you with a bacterial genome.

Matt
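The lookup-table problem described above can be illustrated with a small sketch. This is not ITEP's actual code: the record layout, the `build_lookup` function, and the gene IDs are all hypothetical. The idea is that a table keyed by genomic region should map to a list of IDs, since several spliced proteins can share the same overall span; a plain one-value-per-key dictionary would silently keep only the last one.

```python
from collections import defaultdict

def build_lookup(records):
    """Map (contig, start, stop, strand) to every gene ID found there.

    records is an iterable of (contig, start, stop, strand, gene_id)
    tuples; this layout is an assumption made for illustration.
    """
    lookup = defaultdict(list)
    for contig, start, stop, strand, gene_id in records:
        lookup[(contig, start, stop, strand)].append(gene_id)
    return lookup

# Two hypothetical splice variants share one region; a third gene does not.
records = [
    ("chr1", 544, 1032, 1, "gene_A.1"),
    ("chr1", 544, 1032, 1, "gene_A.2"),
    ("chr1", 5090, 8794, 1, "gene_B.1"),
]
lookup = build_lookup(records)

# For comparison: one value per key overwrites earlier entries.
single = {(c, s, e, st): gid for c, s, e, st, gid in records}
```

With the list-valued table both variants remain retrievable for the shared region, whereas `single` retains only `gene_A.2` for it.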

@nick-youngblut
Author

Thanks for looking into it.

Nick

