In transcript g86.t1 two UTR/CDS features are overlapping. Not allowed by definition. at ~/software/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 759036. #38

Open
wangjie07070910 opened this issue Sep 5, 2023 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@wangjie07070910

Hello, I am trying this program. My commands are as follows:
galba.pl --genome=${genome_file} --prot_seq=${protein_file} --threads 40

My error is as follows:
ERROR in file ~/software/GALBA/scripts/galba.pl at line 5340
Failed to execute: cat augustus.hints.gff | perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | perl ~/software/Augustus/scripts/gtf2gff.pl --printExon --out=augustus.hints.tmp.gtf 2> errors/gtf2gff.augustus.hints.gtf.stderr

The file errors/gtf2gff.augustus.hints.gtf.stderr shows: In transcript g86.t1 two UTR/CDS features are overlapping. Not allowed by definition. at ~/software/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 759036.
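
To inspect the transcript named in the error, it can help to apply the same filter as the failing command (keep only lines containing a tab-delimited `AUGUSTUS` source field) and print the feature lines for that transcript. The following is a minimal sketch, not part of GALBA or AUGUSTUS; the function name `features_for_transcript` is hypothetical. Note that the `<STDIN> line` number in the Perl message counts lines of the filtered stream, not of augustus.hints.gff itself, and may not point at the offending feature.

```python
import re

def features_for_transcript(lines, transcript_id):
    """Return AUGUSTUS feature lines that belong to transcript_id.

    Hypothetical helper: mirrors the perl filter m/\tAUGUSTUS\t/ and then
    matches the transcript ID in the attribute column (column 9).
    """
    selected = []
    for line in lines:
        # Same condition as the perl one-liner in the failing command.
        if "\tAUGUSTUS\t" not in line:
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attr = cols[8]
        # Feature lines carry transcript_id "..."; gene and transcript
        # lines carry only a bare ID in the last column.
        m = re.search(r'transcript_id "([^"]+)"', attr)
        tid = m.group(1) if m else attr.strip()
        if tid == transcript_id:
            selected.append(line.rstrip("\n"))
    return selected
```

It could be called as `features_for_transcript(open("augustus.hints.gff"), "g86.t1")` and the returned lines printed for manual inspection.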

@KatharinaHoff
Member

Could you provide your augustus.hints.gff? (Send me a link or the file via email to katharina.hoff at uni-greifswald.de.) I will look into it then.

KatharinaHoff self-assigned this Sep 15, 2023
KatharinaHoff added the bug (Something isn't working) label Sep 15, 2023
@KatharinaHoff
Member

I hope this problem is solved by commit d8aaf4b.

@kullrich

Hi,
I have the same problem with the latest galba.sif file.

less errors/gtf2gff.augustus.hints.gtf.stderr
In transcript g706.t1 two UTR/CDS features are overlapping. Not allowed by definition. at /opt/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 128068.

Is there a workaround?

Thank you in anticipation

Best regards

Kristian

@kleinjoel

Hi all,

I also ran into the same issue for 2 of the genomes that I tried to annotate using GALBA:
In transcript g411.t1 two UTR/CDS features are overlapping. Not allowed by definition. at /opt/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 574857.

For 4 other genomes it runs just fine with exactly the same settings and protein input. I'm also wondering if there is a workaround, e.g. removing the offending transcript from the augustus.hints.gff file.

Best Regards,

Joel
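
The workaround asked about above (removing the offending gene from augustus.hints.gff) could be sketched roughly as follows. This is an untested assumption, not a fix endorsed by the maintainers: it assumes each gene is bracketed by `# start gene <id>` and `# end gene <id>` comment lines, as in the excerpt posted later in this thread, and the function name `drop_genes` is hypothetical. Whether later GALBA steps tolerate a hand-edited file is not confirmed here.

```python
import re

START = re.compile(r"^# start gene (\S+)")
END = re.compile(r"^# end gene (\S+)")

def drop_genes(lines, gene_ids):
    """Yield all lines except the '# start gene X' .. '# end gene X'
    blocks whose gene ID X is in gene_ids (hypothetical helper)."""
    skipping = None  # ID of the gene block currently being skipped
    for line in lines:
        if skipping is None:
            m = START.match(line)
            if m and m.group(1) in gene_ids:
                skipping = m.group(1)
                continue
            yield line
        else:
            # Inside a dropped block: discard lines up to and including
            # the matching end marker.
            m = END.match(line)
            if m and m.group(1) == skipping:
                skipping = None
```

For example, `drop_genes(open("augustus.hints.gff"), {"g411"})` would yield the file contents without the g411 block; the result should be written to a new file so the original stays available for debugging.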

KatharinaHoff reopened this May 24, 2024
@KatharinaHoff
Member

Did you pull the singularity image within the last 3 months?

@kleinjoel

Hi Katharina,
Thanks for your quick reply. I double-checked and got this information on the build:
$ singularity inspect --labels galba.sif
org.label-schema.build-arch: amd64
org.label-schema.build-date: Wednesday_15_May_2024_13:19:23_CEST
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: katharinahoff/galba-notebook:latest
org.label-schema.usage.singularity.version: 3.8.3
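
To answer the "within the last 3 months" question from these labels, the build date can be compared with a reference date. A small sketch, assuming the label always has the form `Weekday_DD_Month_YYYY_HH:MM:SS_TZ` with English names (only this one example is known from the thread); the function names are hypothetical and the time zone token is simply dropped.

```python
from datetime import date, datetime

def parse_build_date(label):
    """Parse a label such as 'Wednesday_15_May_2024_13:19:23_CEST'.

    The trailing time zone token is discarded (assumption: day-level
    precision is enough for an age check).
    """
    without_tz = label.rsplit("_", 1)[0]
    return datetime.strptime(without_tz, "%A_%d_%B_%Y_%H:%M:%S")

def build_age_days(label, today):
    """Number of days between the image build date and `today`."""
    return (today - parse_build_date(label).date()).days
```

With the label above and the comment date of May 24, 2024, `build_age_days("Wednesday_15_May_2024_13:19:23_CEST", date(2024, 5, 24))` gives 9, so this image was built well within the last 3 months.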

@KatharinaHoff
Member

KatharinaHoff commented May 24, 2024 via email

@kleinjoel

kleinjoel commented May 24, 2024

Dear Katharina,

Thanks for looking into it. In case it helps, I located the offending gene in the augustus.hints.gff file and copied the information for the 2 adjacent genes as well.

# start gene g410
CWNJ01000582	AUGUSTUS	gene	76	988	0.44	+	.	g410
CWNJ01000582	AUGUSTUS	transcript	76	988	0.44	+	.	g410.t1
CWNJ01000582	AUGUSTUS	start_codon	76	78	.	+	0	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	initial	76	389	0.48	+	0	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	internal	674	829	0.8	+	1	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	terminal	937	988	0.81	+	1	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	intron	390	673	0.85	+	.	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	intron	830	936	0.8	+	.	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	CDS	76	389	0.48	+	0	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	CDS	674	829	0.8	+	1	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	CDS	937	985	0.81	+	1	transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582	AUGUSTUS	stop_codon	986	988	.	+	0	transcript_id "g410.t1"; gene_id "g410";
# coding sequence = [atgggttgttcatttgcagatggaatatacatgatggaagttgaccgcattctaagacctggtggttattgggtgcttt
# cgggtcctcctattggttggaaggttcattacaaagcctggcagcgatctaaggaggaccttcaggaagaacagaataagattgaagagactgctaag
# ctcctttgctgggagaaggtctctgagaagaatgaaattgccatttggcaaaagagggtagactctgtttcatgtcgtcgtagacaaatagattccag
# tgtaaaattctgcaaatcaagggatgttgatgatgtctggtataagaaaatggaggcctgcattactcctggtcctaaaggttctggtcataatctga
# aaccttttccagagaggctatatgcaatccctcctagaattgctagtggctctgctcctggagtttctgtggagacataccaggatgacaacaagaac
# tattcaatctcccaagttatgggtcatgaatgttgtgccaactattgctga]
# protein sequence = [MGCSFADGIYMMEVDRILRPGGYWVLSGPPIGWKVHYKAWQRSKEDLQEEQNKIEETAKLLCWEKVSEKNEIAIWQKR
# VDSVSCRRRQIDSSVKFCKSRDVDDVWYKKMEACITPGPKGSGHNLKPFPERLYAIPPRIASGSAPGVSVETYQDDNKNYSISQVMGHECCANYC]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 0
# CDS exons: 0/3
# CDS introns: 0/2
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 1
#     RM:   1 
# end gene g410
# start gene g411
CWNJ01000583	AUGUSTUS	gene	1	504	0.56	+	.	g411
CWNJ01000583	AUGUSTUS	transcript	1	504	0.56	+	.	g411.t1
CWNJ01000583	AUGUSTUS	terminal	1	504	0.56	+	0	transcript_id "g411.t1"; gene_id "g411";
CWNJ01000583	AUGUSTUS	CDS	1	501	0.56	+	0	transcript_id "g411.t1"; gene_id "g411";
CWNJ01000583	AUGUSTUS	stop_codon	502	504	.	+	0	transcript_id "g411.t1"; gene_id "g411";
# coding sequence = [acaagtgaagctgtgaatgcatactattcagctgctttgatgggtatgtcatatggtgacagagaccttgttgcaattg
# gatcaacactgttagcattggaaatgaaagcagcacaaacatggtggcatgtgaaagatggggacagtaacatgtatggaaaagacttcacaaaggaa
# aacagaatagtgggaatcctgtgggctaacaagagagatagtgcactatggtgggcctcagctgagtgcagagagtgtaggcttagcattcagctatt
# gcctttgttgcctatttctgaagaactattttctaatgtggagtatgtgaagaagcttgtggaatggacagagcctgctactgaagaaggatggaagg
# gatttttgtatgcattggaagggatttatgataaagaggatgctttggagaagatcagaaagttgacagaatttgatgatggaaactcattcacaaat
# ctcttgtggtggattcatagcagagggggttga]
# protein sequence = [TSEAVNAYYSAALMGMSYGDRDLVAIGSTLLALEMKAAQTWWHVKDGDSNMYGKDFTKENRIVGILWANKRDSALWWA
# SAECRECRLSIQLLPLLPISEELFSNVEYVKKLVEWTEPATEEGWKGFLYALEGIYDKEDALEKIRKLTEFDDGNSFTNLLWWIHSRGG]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 0
# CDS exons: 0/1
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 1
#     RM:   1 
# end gene g411
# start gene g412
CWNJ01000584	AUGUSTUS	gene	665	1711	2.92	-	.	g412
CWNJ01000584	AUGUSTUS	transcript	665	1711	1	-	.	g412.t1
CWNJ01000584	AUGUSTUS	stop_codon	665	667	.	-	0	transcript_id "g412.t1"; gene_id "g412";
CWNJ01000584	AUGUSTUS	terminal	665	1711	1	-	0	transcript_id "g412.t1"; gene_id "g412";
CWNJ01000584	AUGUSTUS	CDS	668	1711	1	-	0	transcript_id "g412.t1"; gene_id "g412";
# coding sequence = [tgcagctatggcggccacataatgccacgcccacatgataagtgtctctgctatgtcggcggcgacacccgaatccttg
# tcgttgatcggcattcctctctcaaagacctttgttcacgtctgtcttgtaccctcctccatggaaggcccttcaacctcaagtaccagctacccaat
# gaagatctcgacaatctgatatcagtttccaccgatgaagaccttgacaacatgattgaggagcatgatcgcatcactgcagctcatcctttaaaacc
# tgcacgtttgaggctttttctattcttcgataagccagagactgcagtttcaatgggttctcttttggatgattcaaagtctgaaacttggttcgtgg
# atgctcttaacaactctgggattctcccaagggttgtttcagattctgccacagtgggttgtttggtgaaccttgatggagttcttgctagtgattct
# agcaacaatttggaggctcaggctgctgagtctctggctgataacactaaacaagataagaatttgcctgatgtgcattcaatgccaaactcacctat
# ggtggagaacagttcctcatacggatcatcttcttcaaatccttcgatggccaatctgcctccaatgcggggtcgcgtcgacgagaatggtagtaggc
# tgcagcaagagcagaggcctgggatggaagagcagtttgctcaaatgacctttggtgcgaatgtgatgaaacaagatgatgggtatggtactttgtct
# gctcctatgccatcaattcctactacagttgtgacaatggcatcaccagcaattgttgctggtgataacatgaatcgggttatctcggatgacgagag
# attagatcagggagcacctgctggatatagaatgccgcctttgccattgctgcctgtgcaaccaaggactattagtggtggttttggcggaggtggag
# gctttggagctggtggcggttttagtgctggcagtggcgccggatttggtggtggagctggatatggagctggcggtggccagtga]
# protein sequence = [CSYGGHIMPRPHDKCLCYVGGDTRILVVDRHSSLKDLCSRLSCTLLHGRPFNLKYQLPNEDLDNLISVSTDEDLDNMI
# EEHDRITAAHPLKPARLRLFLFFDKPETAVSMGSLLDDSKSETWFVDALNNSGILPRVVSDSATVGCLVNLDGVLASDSSNNLEAQAAESLADNTKQD
# KNLPDVHSMPNSPMVENSSSYGSSSSNPSMANLPPMRGRVDENGSRLQQEQRPGMEEQFAQMTFGANVMKQDDGYGTLSAPMPSIPTTVVTMASPAIV
# AGDNMNRVISDDERLDQGAPAGYRMPPLPLLPVQPRTISGGFGGGGGFGAGGGFSAGSGAGFGGGAGYGAGGGQ]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 100
# CDS exons: 1/1
#      C:   1 
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 1
#      C:   1 (250025_250025)
# incompatible hint groups: 5
#      C:   1 (637779_637779)
#      P:   3 
#     RM:   1 
CWNJ01000584	AUGUSTUS	transcript	665	1519	0.99	-	.	g412.t2
CWNJ01000584	AUGUSTUS	stop_codon	665	667	.	-	0	transcript_id "g412.t2"; gene_id "g412";
CWNJ01000584	AUGUSTUS	single	665	1519	0.99	-	0	transcript_id "g412.t2"; gene_id "g412";
CWNJ01000584	AUGUSTUS	CDS	668	1519	0.99	-	0	transcript_id "g412.t2"; gene_id "g412";
CWNJ01000584	AUGUSTUS	start_codon	1517	1519	.	-	0	transcript_id "g412.t2"; gene_id "g412";
# coding sequence = [ctgatatcagtttccaccgatgaagaccttgacaacatgattgaggagcatgatcgcatcactgcagctcatcctttaa
# aacctgcacgtttgaggctttttctattcttcgataagccagagactgcagtttcaatgggttctcttttggatgattcaaagtctgaaacttggttc
# gtggatgctcttaacaactctgggattctcccaagggttgtttcagattctgccacagtgggttgtttggtgaaccttgatggagttcttgctagtga
# ttctagcaacaatttggaggctcaggctgctgagtctctggctgataacactaaacaagataagaatttgcctgatgtgcattcaatgccaaactcac
# ctatggtggagaacagttcctcatacggatcatcttcttcaaatccttcgatggccaatctgcctccaatgcggggtcgcgtcgacgagaatggtagt
# aggctgcagcaagagcagaggcctgggatggaagagcagtttgctcaaatgacctttggtgcgaatgtgatgaaacaagatgatgggtatggtacttt
# gtctgctcctatgccatcaattcctactacagttgtgacaatggcatcaccagcaattgttgctggtgataacatgaatcgggttatctcggatgacg
# agagattagatcagggagcacctgctggatatagaatgccgcctttgccattgctgcctgtgcaaccaaggactattagtggtggttttggcggaggt
# ggaggctttggagctggtggcggttttagtgctggcagtggcgccggatttggtggtggagctggatatggagctggcggtggccagtga]
# protein sequence = [LISVSTDEDLDNMIEEHDRITAAHPLKPARLRLFLFFDKPETAVSMGSLLDDSKSETWFVDALNNSGILPRVVSDSAT
# VGCLVNLDGVLASDSSNNLEAQAAESLADNTKQDKNLPDVHSMPNSPMVENSSSYGSSSSNPSMANLPPMRGRVDENGSRLQQEQRPGMEEQFAQMTF
# GANVMKQDDGYGTLSAPMPSIPTTVVTMASPAIVAGDNMNRVISDDERLDQGAPAGYRMPPLPLLPVQPRTISGGFGGGGGFGAGGGFSAGSGAGFGG
# GAGYGAGGGQ]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 100
# CDS exons: 1/1
#      C:   1 
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 4
#      C:   2 (250025_250025,637779_637779)
#      P:   2 
CWNJ01000584	AUGUSTUS	transcript	665	1690	0.93	-	.	g412.t3
CWNJ01000584	AUGUSTUS	stop_codon	665	667	.	-	0	transcript_id "g412.t3"; gene_id "g412";
CWNJ01000584	AUGUSTUS	single	665	1690	0.93	-	0	transcript_id "g412.t3"; gene_id "g412";
CWNJ01000584	AUGUSTUS	CDS	668	1690	0.93	-	0	transcript_id "g412.t3"; gene_id "g412";
CWNJ01000584	AUGUSTUS	start_codon	1688	1690	.	-	0	transcript_id "g412.t3"; gene_id "g412";
# coding sequence = [atgccacgcccacatgataagtgtctctgctatgtcggcggcgacacccgaatccttgtcgttgatcggcattcctctc
# tcaaagacctttgttcacgtctgtcttgtaccctcctccatggaaggcccttcaacctcaagtaccagctacccaatgaagatctcgacaatctgata
# tcagtttccaccgatgaagaccttgacaacatgattgaggagcatgatcgcatcactgcagctcatcctttaaaacctgcacgtttgaggctttttct
# attcttcgataagccagagactgcagtttcaatgggttctcttttggatgattcaaagtctgaaacttggttcgtggatgctcttaacaactctggga
# ttctcccaagggttgtttcagattctgccacagtgggttgtttggtgaaccttgatggagttcttgctagtgattctagcaacaatttggaggctcag
# gctgctgagtctctggctgataacactaaacaagataagaatttgcctgatgtgcattcaatgccaaactcacctatggtggagaacagttcctcata
# cggatcatcttcttcaaatccttcgatggccaatctgcctccaatgcggggtcgcgtcgacgagaatggtagtaggctgcagcaagagcagaggcctg
# ggatggaagagcagtttgctcaaatgacctttggtgcgaatgtgatgaaacaagatgatgggtatggtactttgtctgctcctatgccatcaattcct
# actacagttgtgacaatggcatcaccagcaattgttgctggtgataacatgaatcgggttatctcggatgacgagagattagatcagggagcacctgc
# tggatatagaatgccgcctttgccattgctgcctgtgcaaccaaggactattagtggtggttttggcggaggtggaggctttggagctggtggcggtt
# ttagtgctggcagtggcgccggatttggtggtggagctggatatggagctggcggtggccagtga]
# protein sequence = [MPRPHDKCLCYVGGDTRILVVDRHSSLKDLCSRLSCTLLHGRPFNLKYQLPNEDLDNLISVSTDEDLDNMIEEHDRIT
# AAHPLKPARLRLFLFFDKPETAVSMGSLLDDSKSETWFVDALNNSGILPRVVSDSATVGCLVNLDGVLASDSSNNLEAQAAESLADNTKQDKNLPDVH
# SMPNSPMVENSSSYGSSSSNPSMANLPPMRGRVDENGSRLQQEQRPGMEEQFAQMTFGANVMKQDDGYGTLSAPMPSIPTTVVTMASPAIVAGDNMNR
# VISDDERLDQGAPAGYRMPPLPLLPVQPRTISGGFGGGGGFGAGGGFSAGSGAGFGGGAGYGAGGGQ]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 100
# CDS exons: 1/1
#      C:   1 
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 5
#      C:   2 (250025_250025,637779_637779)
#      P:   3 
# end gene g412
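
A rough way to scan a file like the excerpt above for transcripts with overlapping CDS/UTR intervals is sketched below. It is a simplified assumption about what "overlapping" means and does not reproduce the actual check in gtf2gff.pl; the function name `overlapping_features` and the UTR feature names `5'-UTR` and `3'-UTR` are assumptions. On the g411.t1 lines quoted above it finds nothing, since that transcript has a single CDS feature, so the condition gtf2gff.pl trips on may be something else.

```python
import re
from collections import defaultdict

def overlapping_features(lines, feature_types=("CDS", "5'-UTR", "3'-UTR")):
    """Map (seqname, transcript_id) to a list of overlapping feature pairs.

    Hypothetical diagnostic: only features whose type (column 3) is in
    feature_types are considered; coordinates are 1-based and inclusive.
    """
    per_tx = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in feature_types:
            continue
        m = re.search(r'transcript_id "([^"]+)"', cols[8])
        if not m:
            continue
        key = (cols[0], m.group(1))
        per_tx[key].append((int(cols[3]), int(cols[4]), cols[2]))

    hits = {}
    for key, feats in per_tx.items():
        feats.sort()
        # After sorting by start, any overlap in the transcript implies
        # that at least one pair of neighbouring features overlaps.
        for prev, cur in zip(feats, feats[1:]):
            if cur[0] <= prev[1]:
                hits.setdefault(key, []).append((prev, cur))
    return hits
```

Running it over the whole augustus.hints.gff and printing the keys of the returned dictionary would list candidate transcripts to inspect by hand.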


4 participants