Transcript ID is not unique #51

Open
xo2003 opened this issue Jun 14, 2024 · 2 comments

Comments


xo2003 commented Jun 14, 2024

Hi,

I am planning to merge the GALBA result with the BRAKER3 result using TSEBRA.
However, when running the standalone version of GALBA v1.0.11, I found five duplicated transcript IDs shared by the same pair of scaffolds (scaffold26: 18.6 Mbp and scaffold30: 17.5 Mbp).


When I BLASTed these two scaffolds against each other, 254 hits were found. The longest hit is about 6 kbp with 99.4% identity, but this region does not cover the positions of the duplicated transcript IDs. All other hits are shorter than 1 kbp.

# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 254 hits found
scaffold26	scaffold30	99.410	6099	36	0	706441	712539	1283513	1289611	0.0	11064
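To check whether any BLAST hit overlaps the position of a given transcript, the tabular output can be parsed and compared against transcript coordinates. Below is a minimal Python sketch, assuming tab-separated BLAST output with the column order given in the `# Fields:` header above; `parse_hits`, `overlaps` and `hits_covering` are hypothetical helper names, and the real transcript coordinates would have to be taken from the GALBA GTF.

```python
def parse_hits(lines):
    """Parse BLAST tabular lines (column order as in the '# Fields:' header)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip comment lines such as '# 254 hits found'
        f = line.rstrip("\n").split("\t")
        hits.append({
            "query": f[0], "subject": f[1], "identity": float(f[2]),
            "length": int(f[3]),
            "q_start": int(f[6]), "q_end": int(f[7]),
            "s_start": int(f[8]), "s_end": int(f[9]),
        })
    return hits


def overlaps(a_start, a_end, b_start, b_end):
    """Closed-interval overlap; BLAST reports start > end for minus-strand hits."""
    lo1, hi1 = sorted((a_start, a_end))
    lo2, hi2 = sorted((b_start, b_end))
    return lo1 <= hi2 and lo2 <= hi1


def hits_covering(hits, seqid, start, end):
    """Return hits whose query or subject interval on `seqid` overlaps start..end."""
    found = []
    for h in hits:
        if h["query"] == seqid and overlaps(h["q_start"], h["q_end"], start, end):
            found.append(h)
        elif h["subject"] == seqid and overlaps(h["s_start"], h["s_end"], start, end):
            found.append(h)
    return found
```

An empty result from `hits_covering` for a duplicated transcript's coordinates would mean no BLAST hit spans that transcript.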

Since the duplicated IDs cause an error when running TSEBRA, I would like advice on how to resolve this issue.
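A minimal Python sketch for listing transcript IDs that occur on more than one scaffold, plus one possible workaround (an assumption, not a fix confirmed by the GALBA or TSEBRA authors): prefix every `gene_id` and `transcript_id` with its scaffold name, so that IDs become unique across scaffolds while each gene keeps its transcripts together. It assumes GTF lines that carry `transcript_id "..."` and `gene_id "..."` attributes; AUGUSTUS-style `gene`/`transcript` lines whose ninth column is a bare ID are left unchanged. The function names are hypothetical, and whether TSEBRA accepts the renamed file is not verified.

```python
import re
from collections import defaultdict

TX_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def find_duplicate_ids(gtf_lines):
    """Map each transcript_id to its set of seqids; keep IDs seen on >1 seqid."""
    seqids = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        m = TX_RE.search(cols[8])
        if m:
            seqids[m.group(1)].add(cols[0])
    return {tx: s for tx, s in seqids.items() if len(s) > 1}


def prefix_ids_with_seqid(gtf_lines):
    """Hypothetical workaround: rename g1 -> <seqid>_g1 and g1.t1 -> <seqid>_g1.t1."""
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if not line.startswith("#") and len(cols) >= 9:
            p = cols[0] + "_"
            cols[8] = TX_RE.sub(lambda m: f'transcript_id "{p}{m.group(1)}"', cols[8])
            cols[8] = GENE_RE.sub(lambda m: f'gene_id "{p}{m.group(1)}"', cols[8])
            line = "\t".join(cols)
        out.append(line)
    return out
```

Renaming every ID rather than only the duplicated ones is a deliberate simplification: it avoids splitting a gene whose transcripts are only partly duplicated.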

Thank you!


xo2003 commented Jun 27, 2024

Besides the duplicated transcript IDs, some gene models predicted by GALBA look odd.
When I tried to fix the gxf file with AGAT, I got the following warnings:

Warning: g13506.t1 stop codon not adjacent to the CDS
Warning: g15748.t1 stop codon not adjacent to the CDS
Warning: g1760.t2 stop codon not adjacent to the CDS
Warning: g200.t1 has several stop_codon
Warning: g201.t1 has several stop_codon
Warning: g203.t1 has several stop_codon
Warning: g206.t1 has several stop_codon
Warning: g207.t1 has several stop_codon
Warning: g2165.t1 stop codon not adjacent to the CDS
Warning: g2546.t1 stop codon not adjacent to the CDS
Warning: g2616.t1 stop codon not adjacent to the CDS
Warning: g2616.t2 stop codon not adjacent to the CDS
Warning: g2616.t3 stop codon not adjacent to the CDS
Warning: g425.t3 stop codon not adjacent to the CDS
Warning: g5487.t2 stop codon not adjacent to the CDS
Warning: g5873.t1 stop codon not adjacent to the CDS
Warning: g7418.t1 stop codon not adjacent to the CDS
14706 CDS extended to include the stop_codon

After inspecting the 'stop codon not adjacent to the CDS' cases, they all seem to show the same symptom.

Here is one example from the list.
The stop codon predicted in the GALBA gene model is not in the same reading frame as the CDS. I am not sure how best to describe it, but there seems to be a conflict between the predicted gene model and the predicted CDS, so the stop codon ends up not adjacent to the CDS.
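The symptom can be checked per transcript from its CDS and stop codon coordinates. A minimal Python sketch, assuming a single 3 bp `stop_codon` feature not split by an intron, CDS features that exclude the stop codon (the GTF2.2 convention, which AGAT's "CDS extended to include the stop_codon" message suggests applies here), and a complete CDS starting in phase 0; `check_stop_codon` is a hypothetical helper, not part of AGAT or GALBA.

```python
def check_stop_codon(strand, cds, stop):
    """cds: list of (start, end) 1-based closed intervals; stop: (start, end).

    Returns a list of problems (empty if the stop codon looks correct).
    Assumes no intron lies between the CDS and the stop codon.
    """
    problems = []
    cds_len = sum(end - start + 1 for start, end in cds)
    if cds_len % 3 != 0:
        problems.append(f"CDS length {cds_len} is not a multiple of 3")

    # Distance between where the stop codon should begin and where it is.
    if strand == "+":
        gap = stop[0] - (max(end for _, end in cds) + 1)
    else:
        gap = (min(start for start, _ in cds) - 1) - stop[1]

    if gap != 0:
        problems.append("stop codon not adjacent to the CDS")
        # A shift that is not a multiple of 3 puts the stop in another frame.
        if gap % 3 != 0:
            problems.append("stop codon not in the CDS reading frame")
    return problems
```

For the example above, a non-empty result containing both messages would match the description of a stop codon that is neither adjacent to nor in frame with the CDS.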

Since the problem is complicated to fix and might be a bug in the prediction, I decided not to merge the BRAKER3 and GALBA annotations; the BRAKER3 prediction seems more reliable. Do you have any suggestions? Thank you!

KatharinaHoff (Member) commented Jun 27, 2024 via email
