Genbank file issues with spliced proteins #59

nick-youngblut · 2014-02-27T17:50:19Z

Hi Matt,
A girl in my lab is trying to set up ITEP with some bacterial genomes. She ran into an issue: convertGenbank2table.py fails because some of her GenBank files (downloaded from RAST) have join() in the CDS location info. Example:

CDS join(544..589,688..>1032)
/product="T-cell receptor beta-chain"

She's just going to delete those CDS features, but this is definitely not optimal.
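
As an alternative to deleting the features, a join() location like the one above could be reduced to its outer span before conversion. The following is only a minimal sketch, not what convertGenbank2table.py does: `parse_location` is a hypothetical helper, and it assumes simple locations of the form `start..end`, optionally wrapped in `join(...)` and an outer `complement(...)`. Single-base positions and per-segment complements are not handled.

```python
import re

def parse_location(loc):
    """Return (segments, outer_span, strand) for a simple GenBank location string.

    Hypothetical helper for illustration only; not part of ITEP or Biopython.
    """
    # Assumption: a reverse-strand feature wraps the whole location in complement(...).
    strand = -1 if loc.startswith("complement(") else 1
    # Each segment looks like start..end; '<' and '>' mark partial ends and are ignored.
    segments = [(int(s), int(e)) for s, e in re.findall(r"<?(\d+)\.\.>?(\d+)", loc)]
    if not segments:
        raise ValueError("unsupported location: %r" % loc)
    # The outer span ignores the splice and covers the first start to the last end.
    outer = (min(s for s, _ in segments), max(e for _, e in segments))
    return segments, outer, strand

segments, outer, strand = parse_location("join(544..589,688..>1032)")
print(segments, outer, strand)
```

Note that collapsing to the outer span loses the gap between segments, so a sequence extracted from that span would include the spliced-out region.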

Thanks.
Nick

mattb112885 · 2014-02-27T19:32:27Z

Nick:

I haven't seen this before in bacterial genomes; the flat formats used in ITEP are not really designed to handle splicing.

That being said, can you tell me the error you're getting?

Matt

nick-youngblut · 2014-02-27T20:31:54Z

Hi Matt,
Here's the error I'm getting:

Traceback (most recent call last):
File "./convertGenbank2table.py", line 406, in
raise KeyError
KeyError

Using genbank files from RAST, a subset of CDS looks like this:

CDS join(5090..8716,8720..8794)

Thanks much!

Best,
Mallory

Begin forwarded message:

From: Nicholas David Youngblut [email protected]
Subject: FW: [clusterDbAnalysis] join() in genbank (#59)
Date: February 27, 2014 2:55:43 PM EST
To: "[email protected]" [email protected]

Hi Mallory,
Matt, the creator of ITEP, got back to me on the ‘join()’ bug in the convertGenbank2table.py script. Can you send him the error that you got?

Thanks.
Nick

mattb112885 · 2014-02-27T20:55:33Z

Line 406 is a blank line now (there have been some changes to that script since the initial release). Could you update your copy of ITEP with this command

$ git pull origin master

and tell me if there is still a problem / what error you get when trying to run it?

Thanks and best

Matt

mattb112885 · 2014-02-27T22:23:57Z

I looked into this a little (with an Arabidopsis chromosome). With Biopython 1.61 and the latest ITEP code, it succeeded in making a table with all the genes (it treats the location as if there were no splice site). However, I do need to fix a problem with multiply-spliced proteins in GenBank files: the ITEP IDs won't be added for multiply-spliced proteins because I didn't build the lookup table correctly, assuming that there would only be one protein in the same region of DNA. I'll fix that problem, but it is unlikely to affect you with a bacterial genome.
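
To illustrate the lookup-table problem: if the table maps a DNA region to a single ID, a second protein in the same region overwrites the first. A rough sketch of one way around that is to store a list of IDs per region. The real structure of ITEP's lookup table isn't shown in this thread, so the key layout, the `build_lookup` name, and the gene IDs below are all assumptions made up for the example.

```python
from collections import defaultdict

def build_lookup(features):
    """Map (contig, start, end, strand) to the list of gene IDs in that region.

    Hypothetical sketch; the real ITEP lookup table may be keyed differently.
    """
    lookup = defaultdict(list)
    for contig, start, end, strand, gene_id in features:
        # Appending keeps every protein that shares a region instead of
        # letting a later one overwrite an earlier one.
        lookup[(contig, start, end, strand)].append(gene_id)
    return lookup

# Made-up records: two splice variants share one region, a third gene does not.
features = [
    ("chr1", 544, 1032, 1, "gene_a.1"),
    ("chr1", 544, 1032, 1, "gene_a.2"),
    ("chr1", 2000, 2500, -1, "gene_b"),
]
lookup = build_lookup(features)
print(lookup[("chr1", 544, 1032, 1)])
```

With a plain dict assignment (`lookup[key] = gene_id`), only `gene_a.2` would survive for the shared region, which matches the symptom described above.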

Matt

nick-youngblut · 2014-02-28T00:40:26Z

Thanks for looking into it.

Nick

mattb112885 added the bug label Feb 27, 2014
