Transcript ID is not unique #51

xo2003 · 2024-06-14T07:45:36Z

Hi,

I am planning to merge the GALBA results with those from BRAKER3 using TSEBRA.
However, while running the standalone version of GALBA v1.0.11, I encountered an issue with five duplicated transcript IDs in the same pair of scaffolds (scaffold26:18.6Mbp and scaffold30:17.5Mbp).

When blasting these two scaffolds, 254 hits were found. The longest hit fragment is about 6 Kbp with 99.4% identity; however, this region does not cover the positions of the duplicated transcript IDs. Other hit fragments are less than 1 Kbp.

# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 254 hits found
scaffold26	scaffold30	99.410	6099	36	0	706441	712539	1283513	1289611	0.0	11064
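For anyone who wants to repeat the "does the hit cover the transcripts?" check, here is a minimal Python sketch. It assumes the tabular BLAST output with the field order listed in the header above. The transcript coordinates passed to `overlaps` are hypothetical placeholders, since the real positions are not given in this thread.

```python
def parse_blast_hits(lines):
    """Parse tabular BLAST lines (12 standard fields), skipping '#' comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # Sort coordinates so that reverse-strand hits also yield start <= end.
        q_start, q_end = sorted((int(f[6]), int(f[7])))
        s_start, s_end = sorted((int(f[8]), int(f[9])))
        hits.append({
            "query": f[0],
            "subject": f[1],
            "identity": float(f[2]),
            "length": int(f[3]),
            "q": (q_start, q_end),
            "s": (s_start, s_end),
        })
    return hits


def overlaps(interval, start, end):
    """True if the closed interval (start, end) shares at least one base with `interval`."""
    return interval[0] <= end and start <= interval[1]


blast_lines = [
    "# 254 hits found",
    "scaffold26\tscaffold30\t99.410\t6099\t36\t0\t706441\t712539\t1283513\t1289611\t0.0\t11064",
]
hits = parse_blast_hits(blast_lines)

# Hypothetical transcript position on scaffold26 (NOT taken from the real annotation).
hypothetical_transcript = (9_000_000, 9_005_000)
for hit in hits:
    print(hit["query"], hit["q"], overlaps(hit["q"], *hypothetical_transcript))
```

Running this over all 254 hits with the real coordinates of the five transcripts would show whether any alignment actually spans them.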

Since the duplication will cause an error during the execution of TSEBRA, I am seeking advice on how to resolve this issue.

Thank you!
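To locate the affected IDs before running TSEBRA, a small script like the one below may help. It is only a sketch: it assumes a GTF file in which feature lines carry a `transcript_id "..."` attribute and in which AUGUSTUS-style `transcript` lines hold the bare ID in column 9. The function names are made up for this example and are not part of GALBA or TSEBRA.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_seqids(gtf_lines):
    """Map each transcript ID to the set of sequence names it appears on."""
    seen = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            tid = match.group(1)
        elif cols[2] == "transcript":
            # Assumption: AUGUSTUS-style transcript lines store only the ID in column 9.
            tid = cols[8].strip()
        else:
            continue
        seen[tid].add(cols[0])
    return seen


def duplicated_ids(gtf_lines):
    """Return the transcript IDs that occur on more than one sequence."""
    return {
        tid: sorted(seqids)
        for tid, seqids in transcript_seqids(gtf_lines).items()
        if len(seqids) > 1
    }


# Toy records for illustration only; the IDs and coordinates are invented.
example = [
    'scaffold26\tAUGUSTUS\tCDS\t100\t200\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'scaffold30\tAUGUSTUS\tCDS\t300\t400\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'scaffold30\tAUGUSTUS\tCDS\t900\t990\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";',
]
print(duplicated_ids(example))
```

With a real file, pass an open file handle instead of `example`. The script only reports the duplicates. Renaming them consistently (gene and transcript IDs together) would be a separate step.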


xo2003 · 2024-06-27T10:06:31Z

Besides the duplicated transcript IDs, the gene models predicted by GALBA look odd.
When I tried to fix the GFF/GTF file with AGAT, I got the following warnings:

Warning: g13506.t1 stop codon not adjacent to the CDS
Warning: g15748.t1 stop codon not adjacent to the CDS
Warning: g1760.t2 stop codon not adjacent to the CDS
Warning: g200.t1 has several stop_codon
Warning: g201.t1 has several stop_codon
Warning: g203.t1 has several stop_codon
Warning: g206.t1 has several stop_codon
Warning: g207.t1 has several stop_codon
Warning: g2165.t1 stop codon not adjacent to the CDS
Warning: g2546.t1 stop codon not adjacent to the CDS
Warning: g2616.t1 stop codon not adjacent to the CDS
Warning: g2616.t2 stop codon not adjacent to the CDS
Warning: g2616.t3 stop codon not adjacent to the CDS
Warning: g425.t3 stop codon not adjacent to the CDS
Warning: g5487.t2 stop codon not adjacent to the CDS
Warning: g5873.t1 stop codon not adjacent to the CDS
Warning: g7418.t1 stop codon not adjacent to the CDS
14706 CDS extended to include the stop_codon

I checked the 'stop codon not adjacent to the CDS' cases, and they all seem to show the same pattern.

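Here is one example from the list: image.png <https://github.com/Gaius-Augustus/GALBA/assets/136870182/6d5655d5-9d1f-454e-9d1c-cd544f77cf47>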

The stop codon predicted by the GALBA gene model is not in the same reading frame as the CDS. I am not sure how to describe it, but it seems like there is a conflict between the predicted gene model and the predicted CDS. As a result, the stop codon is not adjacent to the CDS.

Since the problem is complicated to fix and may be a bug in the prediction step, I decided not to merge the BRAKER3 and GALBA annotations. The BRAKER3 prediction seems more reliable. Do you have any suggestions? Thank you!
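For reference, the two kinds of AGAT warnings above can be reproduced roughly with the sketch below. This is not AGAT's actual logic, only a simplified check under these assumptions: GTF input with `transcript_id "..."` attributes, each stop codon is a single unsplit feature, and a stop codon counts as adjacent if it either directly follows the last CDS base or forms the last three bases of the CDS.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def check_stop_codons(gtf_lines):
    """Return {transcript_id: problem} for transcripts with a suspicious stop codon."""
    cds = defaultdict(list)
    stops = defaultdict(list)
    strand = {}
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        if cols[2] not in ("CDS", "stop_codon"):
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if not match:
            continue
        tid = match.group(1)
        strand[tid] = cols[6]
        target = cds if cols[2] == "CDS" else stops
        target[tid].append((int(cols[3]), int(cols[4])))

    problems = {}
    for tid, stop_list in stops.items():
        if len(stop_list) > 1:
            # Simplification: a stop codon split by an intron is also reported here.
            problems[tid] = "several stop_codon features"
            continue
        cds_list = cds.get(tid)
        if not cds_list:
            continue
        start, end = stop_list[0]
        if strand[tid] == "+":
            cds_end = max(e for _, e in cds_list)
            adjacent = start == cds_end + 1 or end == cds_end
        else:
            # Minus strand: the stop codon sits at the low-coordinate end of the CDS.
            cds_start = min(s for s, _ in cds_list)
            adjacent = end == cds_start - 1 or start == cds_start
        if not adjacent:
            problems[tid] = "stop codon not adjacent to the CDS"
    return problems


def rec(feature, start, end, strand, tid):
    """Build one toy GTF line (invented data, for illustration only)."""
    return f'chr\tAUGUSTUS\t{feature}\t{start}\t{end}\t.\t{strand}\t0\ttranscript_id "{tid}";'


example = [
    rec("CDS", 100, 200, "+", "tA"), rec("stop_codon", 201, 203, "+", "tA"),
    rec("CDS", 100, 200, "+", "tB"), rec("stop_codon", 205, 207, "+", "tB"),
    rec("CDS", 500, 600, "-", "tC"), rec("stop_codon", 497, 499, "-", "tC"),
    rec("CDS", 100, 200, "+", "tD"), rec("stop_codon", 201, 203, "+", "tD"),
    rec("stop_codon", 300, 302, "+", "tD"),
]
print(check_stop_codons(example))
```

Listing the flagged transcripts this way makes it easier to inspect them in a genome browser or to exclude them before merging.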

KatharinaHoff · 2024-06-27T10:11:14Z

It is caused by Pygustus. I currently have no time to fix it (neither in Pygustus nor in GALBA), but I will look into it eventually, most likely in fall.

