Combining results for equivalent matches #14

ianwdunlop · 2015-09-23T12:33:06Z

Any search is likely to return equivalent hits. By equivalent I mean that the IMS says one particular URI represents the same thing as other URIs. We probably don't want the search to return a hit for each equivalent thing; instead we want an amalgamated response which collects the labels and URIs together.

That leaves the problem of which URI to return as the primary one for a search result. The results are ranked by Elasticsearch and the IRS will just go through them one by one. So the first URI it hits is taken as the primary, and any later URIs found to be equivalent to it are added as secondary URIs. The labels of secondary URIs would not be used as the overall label for the item, so the primary URI's label is what gets shown in the explorer unless we have some rules about which label to use. Items may therefore not be labelled as people expect.
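To make the proposed walk through the ranked results concrete, here is a minimal sketch in Python. It is not the IRS implementation: `MergedHit`, `merge_equivalent_hits` and the `equivalents` callback (standing in for an IMS lookup) are hypothetical names, and it assumes equivalence is judged from the point of view of whichever URI was seen first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class MergedHit:
    """One amalgamated search result (hypothetical structure)."""
    primary_uri: str
    label: str  # label of the primary URI, used as the overall label
    secondary_uris: List[str] = field(default_factory=list)
    secondary_labels: List[str] = field(default_factory=list)


def merge_equivalent_hits(
    ranked_hits: Iterable[Tuple[str, str]],
    equivalents: Callable[[str], Set[str]],
) -> List[MergedHit]:
    """Walk (uri, label) hits in rank order; the first URI seen wins.

    `equivalents(uri)` is an assumed stand-in for an IMS map call and
    returns the URIs the IMS considers equivalent to `uri`.
    """
    merged: List[MergedHit] = []
    owner: Dict[str, MergedHit] = {}  # URI -> merged hit that claimed it
    for uri, label in ranked_hits:
        hit = owner.get(uri)
        if hit is not None:
            # Already claimed by an earlier, higher-ranked primary.
            if uri != hit.primary_uri and uri not in hit.secondary_uris:
                hit.secondary_uris.append(uri)
                hit.secondary_labels.append(label)
            continue
        # First time this URI (or anything equivalent) is seen: new primary.
        hit = MergedHit(primary_uri=uri, label=label)
        merged.append(hit)
        owner[uri] = hit
        for other in equivalents(uri):
            owner.setdefault(other, hit)
    return merged
```

With hits ranked A, B, C, where the lookup says A and B are equivalent, this yields two results: A (carrying B as a secondary URI, labelled with A's label) and C. The secondary labels are kept alongside rather than discarded, so a later labelling rule could still choose among them.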

ianwdunlop · 2015-10-09T14:15:01Z

Perhaps the IMS needs a new API call which takes a list of URIs and returns that list with equivalent URIs removed. Say there are 2 URIs, X and Y. If the IMS map call for X contains Y and the map call for Y contains X, then the returned list only has to contain X. However, if X maps to Y but Y doesn't map to X, then the list needs to contain both X and Y. Obviously the logic gets a bit more complex with more than 2 URIs. I'm reluctant to add functionality to the IRS which sits more happily on the identity side of things.
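The filtering rule described here can be sketched as follows. This is only an illustration in Python, not the IMS API: `remove_duplicates` is a hypothetical function, and `mappings` is an assumed in-memory stand-in for the result of the IMS map call for each URI. For more than 2 URIs it takes one possible reading of the rule: a URI is dropped only if it is mutually mapped with a URI that was already kept.

```python
from typing import Dict, List, Set


def remove_duplicates(uris: List[str], mappings: Dict[str, Set[str]]) -> List[str]:
    """Return `uris` in order, dropping any URI that is mutually mapped
    with a URI kept earlier in the list.

    `mappings[u]` is assumed to hold what an IMS map call for `u` returns.
    """
    kept: List[str] = []
    for uri in uris:
        is_duplicate = any(
            uri == earlier
            or (
                uri in mappings.get(earlier, set())
                and earlier in mappings.get(uri, set())
            )
            for earlier in kept
        )
        if not is_duplicate:
            kept.append(uri)
    return kept


# X and Y map to each other: only the first one survives.
both_ways = {"X": {"Y"}, "Y": {"X"}}
print(remove_duplicates(["X", "Y"], both_ways))  # ['X']

# X maps to Y but Y does not map back: both are kept.
one_way = {"X": {"Y"}, "Y": set()}
print(remove_duplicates(["X", "Y"], one_way))  # ['X', 'Y']
```

Because each URI is compared only against the URIs already kept, the result depends on input order, which matches the "first one wins" behaviour of the ranked search results. Chains such as X↔Y and Y↔Z without X↔Z are a case where this reading keeps both X and Z; whether that is the desired outcome is an open design question.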

nicklynch · 2015-10-11T20:27:35Z

Agree with the above descriptions. For our core concepts, could we perhaps define the primary URI source?
For genes, we could use HGNC (HUGO descriptions) as the primary.
For substances, we could use the OCRS?

Should we ask others to fill in our definitive list if we agree this is a good approach?
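As a rough illustration of a definitive list of primary sources per concept type, the sketch below picks the primary URI from a group of equivalent URIs by preferred source, falling back to search rank. Everything in it is an assumption for illustration: the `PREFERRED_SOURCES` table, the `example.org` URI prefixes, and the `choose_primary` function are hypothetical, not agreed Open PHACTS values.

```python
from typing import Dict, List

# Hypothetical preference table: concept type -> URI prefixes, best first.
# The prefixes are placeholders, not the real HGNC or OCRS namespaces.
PREFERRED_SOURCES: Dict[str, List[str]] = {
    "gene": ["https://example.org/hgnc/"],
    "substance": ["https://example.org/ocrs/"],
}


def choose_primary(concept_type: str, ranked_uris: List[str]) -> str:
    """Pick the primary URI for a non-empty group of equivalent URIs.

    A URI from a preferred source for `concept_type` wins; otherwise the
    highest-ranked URI (the first in `ranked_uris`) is used.
    """
    if not ranked_uris:
        raise ValueError("ranked_uris must not be empty")
    for prefix in PREFERRED_SOURCES.get(concept_type, []):
        for uri in ranked_uris:
            if uri.startswith(prefix):
                return uri
    return ranked_uris[0]
```

With a table like this, a gene whose top-ranked hit comes from another source would still be presented under its HGNC URI and label, while concept types with no entry keep the "first hit wins" behaviour.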

stain · 2015-10-12T09:55:08Z

So let me see if I understand this correctly: you want an API call to filter a list of URIs to remove duplicates, but a URI X is only a duplicate of Y if X==Y both ways, in which case the first one in the list wins?

ianwdunlop · 2015-10-12T09:56:53Z

That's pretty much it. I can't think of any other way to do it - this seemed to be the consensus from talking to users at the Lilly meeting last week.

nicklynch added the question label Oct 11, 2015

stain mentioned this issue Oct 12, 2015

Add /removeDuplicates API call openphacts/IdentityMappingService#8

Open
