{genome}_cds.gff wrong coordinates #26

Open
JuanTrinidad opened this issue Oct 17, 2023 · 3 comments

@JuanTrinidad

Greetings everyone, and thank you for providing this fantastic pipeline.

I've encountered an issue while working with a eukaryote genome in multifasta format. Phame converts the genome to a single-line fasta format by concatenating the sequences, which is fine. However, when I analyzed the results, all genes were mapped to the first chromosome. I traced the error to the {genome}_cds.gff file and the CDScoords.txt file.

If my understanding is correct, the issue arises because Phame is not adjusting the coordinates in the original .gff file, which corresponds to a non-concatenated fasta genome. As a result, I only identified genes with SNPs on the first chromosome.

To address this, I'm concatenating the genome and generating a new .gff file that corresponds to the concatenated genome. I believe this will resolve the issue.
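
The workaround above can be sketched roughly as follows. This is a minimal illustration, not Phame's own code: it assumes the sequences are joined end to end in file order with no spacer between them, and that the .gff seqids match the first word of each fasta header. The function names `contig_offsets` and `shift_gff` and the merged seqid `"merged"` are made up for this example. If the concatenation inserts padding between sequences, the offsets would need to account for it.

```python
def contig_offsets(fasta_lines):
    """Map each sequence ID to the number of bases that precede it
    in a plain end-to-end concatenation (assumption: no spacer)."""
    offsets = {}
    total = 0
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence ID.
            current = line[1:].split()[0]
            offsets[current] = total
        elif current is not None:
            total += len(line)
    return offsets


def shift_gff(gff_lines, offsets, merged_id):
    """Yield GFF lines with start/end shifted by the contig offset
    and the seqid replaced by merged_id (a hypothetical name)."""
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass through comments, directives, and anything that is
        # not a 9-column feature line.
        if line.startswith("#") or len(cols) != 9:
            yield line
            continue
        # Raises KeyError if the seqid is missing from the fasta,
        # which is better than silently writing wrong coordinates.
        off = offsets[cols[0]]
        cols[0] = merged_id
        cols[3] = str(int(cols[3]) + off)  # start (1-based)
        cols[4] = str(int(cols[4]) + off)  # end (inclusive)
        yield "\t".join(cols)


# Toy input: chr1 is 10 bp long, so chr2 features shift by 10.
fasta = [">chr1 first", "ACGTA", "CGTAC", ">chr2 second", "GGGGCCCC"]
gff = [
    "##gff-version 3",
    "chr1\tsrc\tCDS\t2\t7\t.\t+\t0\tID=cds1",
    "chr2\tsrc\tCDS\t1\t6\t.\t-\t0\tID=cds2",
]

offsets = contig_offsets(fasta)
shifted = list(shift_gff(gff, offsets, "merged"))
for row in shifted:
    print(row)
```

Directives such as `##sequence-region` are passed through unchanged here, so they would still need to be edited by hand if the downstream tool reads them.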

@mshakya
Member

mshakya commented Nov 6, 2023

Thank you for finding this. I think we should resolve this by making sure that coordinates are correctly transferred when phame processes them. For the workaround, did you rerun the concatenated fasta through the annotation pipeline? Also, would you be able to post the CDScoords.txt file and the original gff file here so that we can document the error and fix it? Apologies for the tardy response.

@JuanTrinidad
Author

Hi! Yes, I reran the concatenated fasta with the new (homemade) gff, and Phame did the job.
I cleaned the folder, so I don't have the CDScoords.txt or any other output from the first run. Sorry about that.

If you want, I can give you the input files (genome, gff, and some fastq) so you can run it yourself.

My best,

@mshakya
Member

mshakya commented Nov 28, 2023

Hi, sorry for the tardy response. Yes, it would be great if you could post the input files (or accession IDs) so that we can reproduce the issue here. Thank you.
