MeSH proteins/genes and species-specificity #39

Open
bgyori opened this issue Jan 1, 2021 · 0 comments

This is to start a discussion about gene/protein entries in MeSH.

  1. MeSH has supplementary concepts representing human, mouse, and rat-specific proteins. Examples (MAPK1):
    https://meshb.nlm.nih.gov/record/ui?ui=C535150
    https://meshb.nlm.nih.gov/record/ui?ui=C535148
    https://meshb.nlm.nih.gov/record/ui?ui=C535149

  2. Each supplementary concept can have a "mapped to" property that links it to one or more primary concepts. These mappings usually point to the closest match among the primary concepts, and they are inexact in two typical ways:
    a) Specific proteins are sometimes mapped to primary concepts representing families of proteins, e.g., NASPP1 protein, human (https://meshb.nlm.nih.gov/record/ui?ui=C489391) is mapped to Autoantigens and Nuclear Proteins.
    b) Species-specific supplementary concepts are linked to a non-species-specific primary concept. For instance, the three species-specific MAPK1 terms above are all mapped to https://meshb.nlm.nih.gov/record/ui?ui=D019950.

  3. Some complicated observations made by @steppi a few months ago: The supplementary concepts are explicitly called proteins, e.g., "MAPK1 protein, human". The primary concepts aren't explicit about this, but there are often clues that they are proteins rather than genes, e.g., "A serine/threonine-specific protein kinase which is encoded by the CHEK1 gene in humans." (https://meshb.nlm.nih.gov/record/ui?ui=D000071877). Then there are some complicated cases involving one-to-many gene/protein relationships, for instance due to splice variants. For example, for https://meshb.nlm.nih.gov/record/ui?ui=D064546 we have

PKC beta encodes two proteins (PKCB1 and PKCBII) generated by alternative splicing of C-terminal exons.

meaning that this primary concept represents two proteins from the same gene. In another example, we have "estrogen receptor alpha 36, human" (https://meshb.nlm.nih.gov/record/ui?ui=C000601334) and "estrogen receptor alpha, human" (https://meshb.nlm.nih.gov/record/ui?ui=C506487) as two separate entries. These would correspond to separate entries in the uniprot.isoform namespace, though it is questionable whether the second one can be mapped at all.
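
To make the primary/supplementary distinction concrete, here is a rough sketch of how one could label a MeSH term from its identifier and name. It relies on MeSH descriptor (primary) UIs starting with "D" and supplementary concept UIs starting with "C", and it assumes that species-specific names end in ", human", ", mouse", or ", rat" as in the examples above. The helper `describe_mesh_term` and the `MeshEntry` type are hypothetical names for illustration, not part of any existing package.

```python
import re
from typing import NamedTuple, Optional

# Assumption: species-specific supplementary concepts carry a trailing
# ", human" / ", mouse" / ", rat" suffix, as in the examples in this issue.
SPECIES_RE = re.compile(r",\s*(human|mouse|rat)\s*$", re.IGNORECASE)


class MeshEntry(NamedTuple):
    mesh_id: str
    kind: str               # "primary", "supplementary", or "unknown"
    species: Optional[str]  # None when the name has no species suffix


def describe_mesh_term(mesh_id: str, name: str) -> MeshEntry:
    """Label a MeSH term by concept type and (if present) species suffix."""
    if mesh_id.startswith("D"):
        kind = "primary"
    elif mesh_id.startswith("C"):
        kind = "supplementary"
    else:
        kind = "unknown"
    match = SPECIES_RE.search(name)
    species = match.group(1).lower() if match else None
    return MeshEntry(mesh_id, kind, species)


if __name__ == "__main__":
    print(describe_mesh_term("C489391", "NASPP1 protein, human"))
    print(describe_mesh_term("C000601334", "estrogen receptor alpha 36, human"))
    print(describe_mesh_term("D016906", "Interleukin-9"))
```

This only looks at naming conventions, so it cannot tell an isoform-level entry from a gene-level one; that part would still need manual review.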

Overall, I'm fairly convinced that both the primary and supplementary concepts should be interpreted as proteins, and that the primary concepts are non-species-specific whereas the supplementary concepts are (explicitly) species-specific. Consequently, mappings such as

mesh | D016906 | Interleukin-9 | skos:exactMatch | hgnc | 6029 | IL9

(https://github.com/biomappings/biomappings/blob/master/src/biomappings/resources/mappings.tsv#L142) ought to be changed.
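
As a starting point for finding other affected rows, a minimal sketch that flags exact matches from a MeSH primary concept to an HGNC gene could look like the following. It assumes rows laid out like the pipe-separated example quoted above (seven fields); the actual mappings.tsv is tab-separated and may have additional columns, so the parsing would need adjusting. The function names are hypothetical.

```python
from typing import Iterable, List


def parse_mapping(line: str) -> List[str]:
    """Split a pipe-separated mapping row into stripped fields."""
    return [field.strip() for field in line.split("|")]


def flag_questionable(lines: Iterable[str]) -> List[List[str]]:
    """Return rows that exact-match a MeSH primary concept to an HGNC gene.

    Assumption: each row has exactly seven fields in the order
    source prefix, source ID, source name, relation,
    target prefix, target ID, target name.
    """
    flagged = []
    for line in lines:
        fields = parse_mapping(line)
        if len(fields) != 7:
            continue  # skip rows that do not fit the assumed layout
        src_prefix, src_id, _, relation, tgt_prefix, _, _ = fields
        if (
            src_prefix == "mesh"
            and src_id.startswith("D")  # primary concept (descriptor)
            and relation == "skos:exactMatch"
            and tgt_prefix == "hgnc"
        ):
            flagged.append(fields)
    return flagged


if __name__ == "__main__":
    rows = ["mesh | D016906 | Interleukin-9 | skos:exactMatch | hgnc | 6029 | IL9"]
    for row in flag_questionable(rows):
        print(row)
```

This only checks rows where MeSH is the source side; mappings written in the other direction would need the mirrored check.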
