Combining results for equivalent matches #14

Open
ianwdunlop opened this issue Sep 23, 2015 · 4 comments

@ianwdunlop
Member

Any search is likely to return equivalent hits. By equivalent I mean that the IMS says one particular URI represents the same thing as other URIs. We probably don't want the search to return a separate hit for each equivalent thing; instead we want an amalgamated response that collects the labels and URIs together. That leaves the problem of which URI to return as the primary one for a search result. The results are ranked by Elasticsearch and the IRS will just go through them one by one. The first URI it hits will therefore be taken as the primary URI, and any later URIs found to be equivalent to it will be added as secondary URIs. The labels of secondary URIs would not be used as the overall label for the item, so the label of the primary URI is what the explorer would show unless we have some rules about which label to use. As a result, items may not be labelled as people expect.
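
A rough sketch of the first-hit-wins merge described above, to make the behaviour concrete. Everything here is an assumption for illustration: the hit shape (`uri` and `label` keys), the function name `merge_equivalent_hits`, and the `equivalents_of` callback, which stands in for whatever IMS map call the IRS would actually make.

```python
from typing import Callable, Dict, Iterable, List


def merge_equivalent_hits(
    ranked_hits: List[Dict[str, str]],
    equivalents_of: Callable[[str], Iterable[str]],
) -> List[dict]:
    """Collapse ranked hits so that equivalent URIs share one result.

    ranked_hits: hits in search-rank order, each with "uri" and "label"
        (hypothetical shape).
    equivalents_of: hypothetical stand-in for the IMS map call; returns the
        URIs the IMS considers equivalent to the given URI.
    """
    merged: List[dict] = []
    # Maps every URI already claimed (primary or equivalent) to its result.
    claimed: Dict[str, int] = {}

    for hit in ranked_hits:
        uri = hit["uri"]
        if uri in claimed:
            entry = merged[claimed[uri]]
            # A later hit that is equivalent to an earlier primary becomes
            # a secondary URI; its label is collected but not used as the
            # overall label.
            if uri != entry["primary_uri"] and uri not in entry["secondary_uris"]:
                entry["secondary_uris"].append(uri)
                entry["secondary_labels"].append(hit["label"])
            continue

        # First time we see this equivalence group: this hit is the primary.
        merged.append({
            "primary_uri": uri,
            "label": hit["label"],
            "secondary_uris": [],
            "secondary_labels": [],
        })
        index = len(merged) - 1
        claimed[uri] = index
        for equivalent in equivalents_of(uri):
            claimed.setdefault(equivalent, index)

    return merged
```

Note that this only records a secondary URI when it actually appears in the search results, and it trusts the map call for the primary URI alone, so a one-directional mapping is treated as equivalence here.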

@ianwdunlop
Member Author

Perhaps the IMS needs a new API call which takes a list of URIs and returns the list with equivalent URIs removed. Say there are two URIs, X and Y. If the IMS map call for X contains Y and the map call for Y contains X, then the returned list only needs to contain X. However, if X maps to Y but Y doesn't map to X, then the list needs to contain both X and Y. Obviously the logic gets a bit more complex with more than two URIs. I'm reluctant to add functionality to the IRS that sits more happily on the Identity side of things.
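
As a minimal sketch of what such a call might do, assuming a hypothetical `map_uri` function that returns the set of URIs the IMS map call gives for one URI (the name and signature are invented here, not an existing IMS API):

```python
from typing import Callable, Dict, Iterable, List, Set


def remove_mutual_duplicates(
    uris: List[str],
    map_uri: Callable[[str], Iterable[str]],
) -> List[str]:
    """Drop URIs that are mutually mapped to an earlier URI in the list.

    map_uri is a hypothetical stand-in for the IMS map call.
    """
    cache: Dict[str, Set[str]] = {}

    def mapped(uri: str) -> Set[str]:
        # Call the (assumed) map function once per URI.
        if uri not in cache:
            cache[uri] = set(map_uri(uri))
        return cache[uri]

    kept: List[str] = []
    for candidate in uris:
        # A candidate is a duplicate only if the mapping holds both ways
        # against a URI we have already kept; the earlier one wins.
        duplicate = any(
            candidate == earlier
            or (candidate in mapped(earlier) and earlier in mapped(candidate))
            for earlier in kept
        )
        if not duplicate:
            kept.append(candidate)
    return kept
```

With more than two URIs this compares each candidate only against the URIs already kept and does not chase chains of mappings (X to Y to Z), which is one of the places where the rules would need to be agreed.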

@nicklynch

Agree with the above descriptions. For our core concepts, could we perhaps define the primary URI source?
For genes, we could use HGNC (HUGO descriptions) as the primary.
For substances, could we use the OCRS?

Should we ask others to fill in our definitive list if we agree this is a good approach?
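
To illustrate the idea of a per-concept primary source, here is a small sketch. The concept type names, the `example.org` URI prefixes, and the function `choose_primary` are all placeholders made up for the example; the real list of sources and their namespaces would come from the definitive list mentioned above.

```python
from typing import Dict, List

# Hypothetical mapping from concept type to the URI prefix of the source
# agreed as primary. The prefixes are placeholders, not real namespaces.
PREFERRED_PREFIXES: Dict[str, str] = {
    "gene": "http://example.org/hgnc/",
    "substance": "http://example.org/ocrs/",
}


def choose_primary(
    concept_type: str,
    equivalent_uris: List[str],
    preferred_prefixes: Dict[str, str] = PREFERRED_PREFIXES,
) -> str:
    """Pick the primary URI from a non-empty group of equivalent URIs.

    Prefers the URI from the agreed source for this concept type and
    falls back to the first (highest-ranked) URI otherwise.
    """
    prefix = preferred_prefixes.get(concept_type)
    if prefix is not None:
        for uri in equivalent_uris:
            if uri.startswith(prefix):
                return uri
    return equivalent_uris[0]
```

The fallback keeps the first-hit-wins behaviour for concept types that have no agreed primary source.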

@stain
Contributor

stain commented Oct 12, 2015

So let me see if I understand this correctly - you want an API call to filter a list of URIs to remove duplicates - but a URI X is only a duplicate of Y if X maps to Y and Y maps to X - in which case the first one in the list wins?

@ianwdunlop
Member Author

That's pretty much it. I can't think of any other way to do it - this seemed to be the consensus from talking to users at the Lilly meeting last week.
