Can I use or add a different translation table for the genetic code (working with parasites)? #200

Open
azmigueldario opened this issue Jun 27, 2024 · 6 comments
Labels
Status: In Progress Has been assigned and is being worked on. Type: Bug

Comments

@azmigueldario

azmigueldario commented Jun 27, 2024

I want to use the ideal table for the parasite I am working on but it is not supported. I could not find where the functions inherit the table to modify the code.

Would it be possible and simple to add custom translation tables?

Great tool by the way.

@rfm-targa rfm-targa self-assigned this Jun 27, 2024
@rfm-targa rfm-targa added the Status: In Progress Has been assigned and is being worked on. label Jun 27, 2024
@rfm-targa
Contributor

Greetings @azmigueldario,

Thank you for your interest. Most modules in the latest version, v3.3.6, allow users to select or autodetect the genetic code. The CreateSchema and AlleleCall modules allow you to specify the genetic code through the --t, --translation-table parameter.
chewBBACA should support all genetic codes listed here. If the code you need is not listed there, I suggest using The Standard Code (1) or whichever supported code is likely to give the closest results.

Kind regards,

Rafael

@azmigueldario
Author

Thank you for the quick reply.

I will use the standard code then (#1). I saw that the accepted tables are restricted to a few of the most commonly used ones, mostly related to bacterial pathogens; see the error output below.

I believe the table is used in a function imported from Bio.Seq to translate into protein space. It is not vital for me, but it may be worth removing the restriction just in case, or adding a warning if people end up using an unusual translation table.
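
To illustrate why the choice of table matters in protein space, here is a minimal sketch in plain Python. It is not chewBBACA's or Biopython's implementation; the names `build_table`, `TABLES`, and `translate` are made up for this example. It assumes only the NCBI definitions of genetic code 1 (Standard) and genetic code 6, in which TAA and TAG encode glutamine (Q) instead of a stop.

```python
from itertools import product

# NCBI ordering: bases T, C, A, G, with the third codon position varying fastest.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(amino_acids):
    """Map each of the 64 codons to its amino acid ('*' marks a stop)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


# Hypothetical lookup of translation tables keyed by NCBI genetic code number.
TABLES = {1: build_table(STANDARD_AAS)}
# Genetic code 6 differs from the standard code only at TAA and TAG.
TABLES[6] = {**TABLES[1], "TAA": "Q", "TAG": "Q"}


def translate(dna, table_id):
    """Translate complete codons of `dna`; trailing partial codons are ignored."""
    table = TABLES[table_id]
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


seq = "ATGTAAGGATAGTGA"
print(translate(seq, 1))  # TAA and TAG are read as stops
print(translate(seq, 6))  # TAA and TAG are read as Q; only TGA is a stop
```

With the wrong table, a coding sequence can look as if it contains internal stop codons, which is why the table affects which sequences count as valid CDSs.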

Thanks again,
Miguel

Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: [email protected]

==================================
  chewBBACA - PrepExternalSchema
==================================
Started at: 2024-06-27T17:06:30


Invalid genetic code value.
Value must correspond to one of the accepted genetic codes

Accepted genetic codes:

        1: Standard
        4: The mold, protozoan, and coelenterate mitochondrial code and the mycoplasma/spiroplasma code
        11: The Bacterial, Archaeal and Plant Plastid code
        25: Candidate division SR1 and gracilibacteria code

@rfm-targa
Contributor

Hello @azmigueldario,

I must be playing Jedi mind tricks on myself since I forgot about that step to validate the genetic code. It should accept more than those four genetic codes. I will add more genetic codes to the dictionary with the accepted values so that it still validates the value passed. This does not guarantee that it will work for any organism; it still depends on Pyrodigal/Prodigal, which was designed for Bacteria and Archaea. What is the genetic code that you would like to use?
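
The validation step described above could look roughly like the following sketch. This is not chewBBACA's actual code: `ACCEPTED_GENETIC_CODES` and `validate_genetic_code` are hypothetical names, and only the four entries shown in the error output earlier in the thread are taken from the tool. The entry for genetic code 6, which comes up later in the thread, uses the NCBI name for that table and is added purely for illustration.

```python
# Hypothetical dictionary of accepted values; the four entries mirror the
# error message printed by PrepExternalSchema earlier in this thread.
ACCEPTED_GENETIC_CODES = {
    1: "Standard",
    4: ("The mold, protozoan, and coelenterate mitochondrial code "
        "and the mycoplasma/spiroplasma code"),
    11: "The Bacterial, Archaeal and Plant Plastid code",
    25: "Candidate division SR1 and gracilibacteria code",
}


def validate_genetic_code(value, accepted=ACCEPTED_GENETIC_CODES):
    """Return the genetic code as an int, or raise ValueError if not accepted."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        code = None
    if code not in accepted:
        listing = "\n".join(f"\t{k}: {v}" for k, v in sorted(accepted.items()))
        raise ValueError(
            "Invalid genetic code value.\n"
            "Value must correspond to one of the accepted genetic codes\n\n"
            f"Accepted genetic codes:\n\n{listing}"
        )
    return code


# Extending the dictionary keeps the validation in place while accepting more codes.
extended = {
    **ACCEPTED_GENETIC_CODES,
    6: "The Ciliate, Dasycladacean and Hexamita Nuclear Code",
}
print(validate_genetic_code("11"))
print(validate_genetic_code(6, accepted=extended))
```

Keeping the check and widening the dictionary still rejects typos and unsupported values, while no longer blocking codes that the gene predictor can handle.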

Best regards,

Rafael

@azmigueldario
Author

azmigueldario commented Jul 2, 2024 via email

@rfm-targa
Contributor

Hello @azmigueldario,

We released chewBBACA v3.3.8, which adds support for the remaining genetic codes supported by Prodigal (complete list here), including genetic code 6. I tested the new options with Giardia genomes available on the NCBI. I downloaded the reference genome for Giardia intestinalis (GCF_000002435.2) and created Prodigal training files based on that genome and genetic codes 1 and 6. I used the following commands:

Genetic code 1:

prodigal -i GCF_000002435.2_UU_WB_2.1_genomic.fna -t giardia_gc1.trn -p single -g 1

Genetic code 6:

prodigal -i GCF_000002435.2_UU_WB_2.1_genomic.fna -t giardia_gc6.trn -p single -g 6
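
If you want to generate training files for several genetic codes in one go, the two commands above can be assembled programmatically. The sketch below only builds and prints the command lines, using the same Prodigal flags as above; the helper name `prodigal_training_command` and the `prefix` argument are made up for this example.

```python
import shlex


def prodigal_training_command(genome_fasta, genetic_code, prefix="giardia"):
    """Build the argument list for creating a Prodigal training file."""
    training_file = f"{prefix}_gc{genetic_code}.trn"
    return [
        "prodigal",
        "-i", genome_fasta,       # input genome in FASTA format
        "-t", training_file,      # training file to write
        "-p", "single",           # single-genome mode
        "-g", str(genetic_code),  # translation table
    ]


genome = "GCF_000002435.2_UU_WB_2.1_genomic.fna"
commands = [prodigal_training_command(genome, code) for code in (1, 6)]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually run them, each list can be passed to `subprocess.run(cmd, check=True)`, assuming Prodigal is installed and on the PATH.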

I then used the reference genome and the training files to create a schema for each genetic code with the CreateSchema module. After that, I downloaded all the Giardia genomes (n=38, 36 Giardia intestinalis, 1 Giardia muris, 1 Giardia lamblia) from the NCBI and performed allele calling with the AlleleCall module to identify new alleles to add to the schemas.
Here are the total numbers of loci and alleles after allele calling:

Schema          | #Loci | #Alleles
Genetic code 1  | 4,881 | 76,096
Genetic code 6  | 4,722 | 51,512

To get an idea of how many loci Prodigal might be predicting well, I used the UniprotFinder module to compare the schema loci against the Giardia reference proteomes available on UniProt (n=3, UP000001548, UP000315496, UP000000350). It found annotations for loci in both schemas. Still, the loci in the schema created with genetic code 1 were more similar to what's in the reference proteome for Giardia intestinalis: it found proteome annotations for 4,594 loci in the schema created with genetic code 1 and for 3,540 loci in the schema created with genetic code 6. The Giardia muris genome seems to differ considerably from the reference genome for Giardia intestinalis, so the schemas could not classify most CDSs predicted for Giardia muris. Several Giardia intestinalis genomes available on the NCBI seem to be of low quality (e.g. highly fragmented or scaffolded), which can lead to high numbers of missing/non-identified loci for those genomes and a small core genome if you determine the core loci from results including those genomes.
This was only a test of whether it ran without errors and whether it might work, which it might, at least to some extent. Let us know how it goes. I hope it works!
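
For a quick side-by-side view, the counts reported above can be turned into simple ratios. This is only arithmetic on the numbers in this comment; the field names and the `summarize` helper are made up for the example, and the ratios are not a formal measure of gene prediction quality.

```python
# Counts copied from the table and the UniprotFinder comparison above.
schemas = {
    "Genetic code 1": {"loci": 4881, "alleles": 76096, "annotated": 4594},
    "Genetic code 6": {"loci": 4722, "alleles": 51512, "annotated": 3540},
}


def summarize(counts):
    """Return the share of annotated loci (%) and the mean alleles per locus."""
    return {
        "annotated_pct": 100 * counts["annotated"] / counts["loci"],
        "alleles_per_locus": counts["alleles"] / counts["loci"],
    }


for name, counts in schemas.items():
    stats = summarize(counts)
    print(
        f"{name}: {stats['annotated_pct']:.1f}% of loci with proteome annotations, "
        f"{stats['alleles_per_locus']:.1f} alleles per locus"
    )
```

That works out to roughly 94% of loci annotated for genetic code 1 versus roughly 75% for genetic code 6.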

Best regards,

Rafael

@azmigueldario
Author

azmigueldario commented Jul 3, 2024

Thank you very much @rfm-targa for adding the table and taking the time to look into the functionality for Giardia.

I will likely stay with the standard table or run both to compare. Thank you very much for all the help.
