Protein annotates as Gene in UI, though symbols are correct #24

Open
sharatisrani opened this issue Feb 21, 2023 · 1 comment
Comments

@sharatisrani

When a path is meant to show a protein (as indicated by the icon/symbol), the node is annotated as "Gene." It would be clearer for the user if it said "Protein."

Image

@sierra-moxon
Member

from TAQA:

  • The icon on the gene is a protein-and-gene symbol.
  • From a user's point of view, we expect that a "protein" is being affected: the protein of the gene Hebp1. But in our graph building, disambiguating genes and proteins causes a lot of noise (both too many edges and too few). Some sources annotate only to "gene" IDs when, biologically, they mean "protein"; they "pre-conflate" the two concepts. Because of this, a query has to "know" whether to look for a protein or a gene depending on the source, and no one really has a good way of doing those queries effectively. So we decided that NN should do "conflation", and it's configurable: if you query by "gene", you get results for both "genes" and "proteins", which makes searching easier.
  • Polypeptide, protein, and protein isoform were what the users were looking for. This was translated into a protein category in the TRAPI, and the search went by protein ID or protein symbol (which is not necessarily the same as the gene ID).
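
To make the conflation behavior above concrete, here is a minimal sketch of how a configurable gene/protein conflation step might expand a query. It is an illustration only, not the actual NN implementation: the table `GENE_TO_PROTEINS`, the placeholder identifiers, and the function names are all hypothetical.

```python
# Hypothetical conflation table: one gene id mapped to its protein ids.
# The identifiers are made-up placeholders, not real database records.
GENE_TO_PROTEINS = {
    "GENE:Hebp1": ["PROT:Hebp1"],
}


def build_cliques(gene_to_proteins):
    """Map every member id to its full conflated clique, gene listed first."""
    cliques = {}
    for gene, proteins in gene_to_proteins.items():
        clique = [gene, *proteins]
        for member in clique:
            cliques[member] = clique
    return cliques


CLIQUES = build_cliques(GENE_TO_PROTEINS)


def expand_query_ids(ids, conflate=True):
    """Return the ids a query should match.

    With conflation on, a gene id also matches its proteins (and vice versa).
    With conflation off, the ids are returned unchanged.
    """
    if not conflate:
        return list(ids)
    expanded = []
    for curie in ids:
        # Ids with no known clique simply stand for themselves.
        for member in CLIQUES.get(curie, [curie]):
            if member not in expanded:
                expanded.append(member)
    return expanded
```

With conflation on, querying by either the gene id or the protein id yields both; with `conflate=False`, the caller gets back only what it asked for and has to know which kind of id each source uses.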

From the UI perspective, we need to explain this. Chemists might be more interested in the difference.
We like the way it works now.
E.g., if the person asks for a protein, maybe we de-conflate at the user level: return the protein instead of the gene, even if the underlying data sources use genes? If it is a conflated entity, it will always choose the gene first. Arbitrarily de-conflating would be dangerous.
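
One way the "de-conflate at the user level" idea could look is sketched below. It is only a rough illustration under stated assumptions: the clique layout (a list of `(id, category)` pairs with the gene first), the placeholder identifiers, and the function name `choose_display_node` are hypothetical and not taken from the UI code.

```python
# Hypothetical conflated entity: (id, category) pairs, gene listed first.
# The identifiers are made-up placeholders.
EXAMPLE_CLIQUE = [
    ("GENE:Hebp1", "Gene"),
    ("PROT:Hebp1", "Protein"),
]


def choose_display_node(clique, requested_category=None):
    """Pick which member of a conflated entity to show to the user.

    The default is the first member (the gene), matching the current
    behavior. A different member is returned only when the user explicitly
    asked for that category and the clique actually contains one, so nothing
    is de-conflated arbitrarily.
    """
    if requested_category is not None:
        for node in clique:
            if node[1] == requested_category:
                return node
    return clique[0]
```

A query that asked for "Protein" would then display `("PROT:Hebp1", "Protein")`, while any other query would keep showing the gene.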

This should be tagged as low priority to "fix"; if it is easy, change the icon.
We might be testing with a more critical eye.
