Review normalization of terms #157
Comments
gives
I double-checked that all of the "null" values for doid, efo, and mondo come from the ontology hierarchies (it would be good to require source annotations for all edges as well). The "gilda" edges are xrefs.
I see, so we don't have any issues with EFO and MONDO, but mesh:D007938 will almost certainly show up from other sources (the name there is capitalized as "Leukemia"), so we could run a query for that as well.
I just updated the chart above. We'll want to follow up by checking that sider, chembl, and disgenet are all standardized the same way.
Great, so this reveals that we have an issue in normalizing between DOID and MeSH. Since MeSH is higher in the default priority order, standardization should typically map to it (assuming we have all the right xrefs). It looks like here:
The standardization code seems to be working:
> from indra_cogex.representation import Node
> Node.standardized(db_ns='DOID', db_id='DOID:1240', labels=['BioEntity'])
(:BioEntity { id:'MESH:D007938', name:'Leukemia' })
so I suspect the issue is that the processor wasn't re-run. Another possibility (which would be fun) is that the processor calls
without the DOID: prefix for the ID, which doesn't standardize correctly.
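To illustrate the second hypothesis, here is a minimal, self-contained sketch of priority-based standardization. It does not use INDRA CoGEx's actual implementation: `PRIORITY`, `XREFS`, and `standardize` are hypothetical stand-ins, and the only facts taken from this thread are that MeSH ranks above DOID and that DOID:1240 maps to MESH:D007938. It shows how an ID passed without its `DOID:` prefix could miss the xref lookup and silently stay in the DOID namespace.

```python
# Hypothetical priority order; only "MeSH outranks DOID" comes from the thread.
PRIORITY = ["MESH", "DOID"]

# Hypothetical xref table, keyed by (namespace, identifier).
# Note that the DOID identifier is stored WITH its "DOID:" prefix.
XREFS = {
    ("DOID", "DOID:1240"): [("MESH", "D007938")],
}


def standardize(db_ns, db_id):
    """Return the highest-priority equivalent of (db_ns, db_id)."""
    candidates = [(db_ns, db_id)] + XREFS.get((db_ns, db_id), [])

    def rank(term):
        ns = term[0]
        # Namespaces missing from PRIORITY sort last.
        return PRIORITY.index(ns) if ns in PRIORITY else len(PRIORITY)

    return min(candidates, key=rank)


with_prefix = standardize("DOID", "DOID:1240")  # xref found, maps to MeSH
without_prefix = standardize("DOID", "1240")    # no xref found, stays DOID
```

If something like this is what happens in the processor, the fix would be to normalize the identifier format before standardizing rather than to change the priority order.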
I found that
yields "doid:1240", "efo:0000565", "mondo:0005059", and in addition, we have two nodes called Leukemia, "hp:0001909", and "mesh:D007938".
These nodes might appear just due to ontology imports (which would be fine), but I suspect that they are actually involved in distinct relations without being normalized, leading to fragmentation.
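A quick way to screen for this kind of fragmentation is to group nodes by case-folded name and flag any name that resolves to more than one identifier. The sketch below is only an illustration over the five identifiers listed above: the `nodes` list is hand-written rather than pulled from the graph, the lowercase names for the doid, efo, and mondo nodes are an assumption, and `find_fragmented_names` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written (id, name) pairs using the identifiers from this issue.
# Lowercase names for the first three nodes are assumed for illustration.
nodes = [
    ("doid:1240", "leukemia"),
    ("efo:0000565", "leukemia"),
    ("mondo:0005059", "leukemia"),
    ("hp:0001909", "Leukemia"),
    ("mesh:D007938", "Leukemia"),
]


def find_fragmented_names(nodes):
    """Map each case-folded name to its IDs, keeping names with more than one ID."""
    groups = defaultdict(list)
    for node_id, name in nodes:
        groups[name.casefold()].append(node_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


fragmented = find_fragmented_names(nodes)
```

A name-based grouping only surfaces candidates. Whether the nodes are truly fragmented still depends on checking which of them participate in non-ontology relations.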