Sequin table format #349

Open
lskatz opened this issue Oct 31, 2020 · 4 comments
Comments

@lskatz
Contributor

lskatz commented Oct 31, 2020

Hi, I was wondering if there was any way to parse the NCBI Sequin tbl format? It is defined here: https://www.ncbi.nlm.nih.gov/projects/Sequin/table.html

I don't think I see any parser for it, but I wanted to be sure before writing my own. Thank you!

And the example starts like this.

>Feature Sc_16
1	7000	REFERENCE
			PubMed		8849441
<1	1050	gene
			gene		ATH1
<1	1009	CDS
			product		acid trehalase
			product		Ath1p
			codon_start	2
<1	1050	mRNA
			product		acid trehalase
[offset=2000]
1253	420	gene
			gene	YPR027C
1253	420	CDS
			product		Ypr027cp
			note		hypothetical protein
1253	420	mRNA
			product		Ypr027cp
2626	2535	gene
			gene	trnF
2626	2590	tRNA
2570	2535
			product		tRNA-Phe
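
For readers landing on this thread, here is a minimal Python sketch of how the layout shown above could be parsed. It assumes tab-delimited columns as in the example: feature lines carry start, stop, and feature key; continuation lines carry only start and stop; qualifier lines begin with empty columns. The names `Feature` and `parse_feature_table` are hypothetical and not part of BioPerl or any NCBI tool, and an `[offset=N]` line is only recorded on the features that follow it, not applied to their coordinates.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str
    # Each interval is (start, stop, start_is_partial, stop_is_partial).
    intervals: list = field(default_factory=list)
    # Qualifiers are kept as (name, value) pairs because names may repeat.
    qualifiers: list = field(default_factory=list)
    # Value of the most recent [offset=N] line (assumption: recorded only).
    offset: int = 0


def _parse_pos(token):
    """Return (position, is_partial) for tokens such as '1050' or '<1'."""
    partial = token[0] in "<>"
    return int(token.lstrip("<>")), partial


def parse_feature_table(text):
    """Parse feature-table text into {seq_id: [Feature, ...]}."""
    tables = {}
    features = None
    offset = 0
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            fields = line.split()
            if len(fields) < 2:
                raise ValueError("missing sequence ID: %r" % raw)
            features = tables.setdefault(fields[1], [])
            offset = 0
            continue
        match = re.fullmatch(r"\[offset=(-?\d+)\]", line.strip())
        if match:
            offset = int(match.group(1))
            continue
        if features is None:
            raise ValueError("data before '>Feature' header: %r" % raw)
        if line[0].isspace():
            # Qualifier line: leading empty columns, then name and value.
            parts = [p.strip() for p in line.split("\t") if p.strip()]
            if not features:
                raise ValueError("qualifier before any feature: %r" % raw)
            value = parts[1] if len(parts) > 1 else ""
            features[-1].qualifiers.append((parts[0], value))
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError("expected start and stop columns: %r" % raw)
        start, start_partial = _parse_pos(parts[0])
        stop, stop_partial = _parse_pos(parts[1])
        if len(parts) > 2 and parts[2]:
            # A feature key in column 3 starts a new feature.
            features.append(Feature(key=parts[2], offset=offset))
        elif not features:
            raise ValueError("interval before any feature: %r" % raw)
        features[-1].intervals.append((start, stop, start_partial, stop_partial))
    return tables
```

Calling `parse_feature_table(open("example.tbl").read())` (file name hypothetical) would return a dictionary keyed by the sequence ID from each `>Feature` line. Multi-interval features such as the tRNA at the end of the example end up as a single `Feature` with two intervals.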
@lskatz
Contributor Author

lskatz commented Oct 31, 2020

This format is used by Sequin for submitting sequences to GenBank, but it has also turned up more recently in the VADR package from NCBI.

@cjfields
Member

cjfields commented Feb 3, 2021

I think there is a Bio::FeatureIO::table but I'm not sure whether that was developed for this particular NCBI format.

@cjfields
Member

cjfields commented Feb 3, 2021

Sorry, I was mistaken. We do have a Bio::SeqIO::table, but it doesn't mention anything about NCBI's table format. That said, it's possible you could look at the structure of that one to build from.

@hyphaltip
Member

I would also look at the tools Jon Palmer (@nextgenusfs) has developed in funannotate, https://github.com/nextgenusfs/funannotate, which is Python-based but has some parsing of these tables to truncate and clean them up when we need to remove contigs or filter out regions that overlap contamination.
