GroundingInsightExporter shows strange results #1082

Open
kwalcock opened this issue Sep 28, 2021 · 3 comments

Comments

@kwalcock (Member)

I'm digging into compositional groundings using the GroundingInsightExporter, and I noticed that the returned "score" for a given slot grounding does not equal the "avg match" score produced by averaging all the positive examples. In some cases, the second best grounding by "score" has a higher "avg match" score than the top grounding, and in fact is sometimes the preferred grounding.

As an example, in a sentence like "X caused population growth", the top theme grounding is "wm/concept/population_demographics/" with a score of 0.88844055 but an avg match score of 0.60294354. The second best theme grounding is "wm/concept/population_demographics/population_density/population_growth" with a score of 0.86057734 (lower than the top grounding) but an avg match of 0.7405923 (higher than the top grounding).

Any idea why these scores are different, and where they are computed? I think I tracked down where "avg match" is getting computed, but the regular "score" is buried within several layers of nested grounding classes. Any help is greatly appreciated!

@kwalcock (Member, Author) commented Sep 29, 2021

I'm looking at

def groundingInfo(grounding: IndividualGrounding, canonical: String): String = {
and
def examplesDetail(text: String, examples: Seq[String]): Seq[String] = {

I think you are describing output from near the second of those, examplesDetail, namely the values for "max match" and "avg match".

From what I understand, a mention (some cause or effect) has been grounded as to theme and has returned several results, sorted from best to worst, in advance of all this code. The top two of those groundings go through examplesDetail, and because of their order it is expected that the values for the first will be higher than those of the second. However, they come out with average scores of 0.60294354 and 0.7405923, which are reversed.

Isn't the original grounding done in advance based on the bag of words of all examples of each node, along with definitions, descriptions, the node name, etc.? It doesn't seem like the resulting vector would be strongly related to any of the vectors of the specific examples. Adding up vectors, normalizing the sum, and then taking the dot product will produce a different answer than taking the dot product with each of the examples and then averaging the results, won't it?
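
To make that concrete, here is a minimal, self-contained Scala sketch (not the Eidos code; all of the names here are hypothetical) showing that scoring against one normalized combined vector generally differs from averaging the per-example dot products:

object ScoreComparisonSketch {
  // Dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Scale a vector to unit length.
  def normalize(v: Array[Double]): Array[Double] = {
    val norm = math.sqrt(dot(v, v))
    v.map(_ / norm)
  }

  def main(args: Array[String]): Unit = {
    val mention = normalize(Array(1.0, 0.0))
    // Two deliberately disparate example vectors for one node.
    val examples = Seq(Array(1.0, 0.0), Array(0.0, 1.0)).map(normalize)

    // Combined-vector score: add up the examples, normalize the sum, then take one dot product.
    val combined = normalize(examples.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }))
    val combinedScore = dot(mention, combined)                        // ~0.71

    // Per-example score: dot product with each example, then average.
    val avgScore = examples.map(dot(mention, _)).sum / examples.size  // 0.5

    println(f"combined: $combinedScore%.4f vs averaged: $avgScore%.4f")
  }
}
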

It seems like you are finding that the single vector is not sufficient and that there should be a vector (or matching text) for each example so that a couple of really good example matches could decide the winner rather than some combined vector that summarizes too many disparate examples.
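
For what it's worth, a hypothetical sketch of that alternative (reusing the dot helper from the sketch above; none of these names come from the actual grounder) would keep a vector per example and let the best few matches decide:

// Score a candidate node by its best-matching examples rather than one combined vector.
// Assumes at least one example vector is available.
def bestExamplesScore(mention: Array[Double], exampleVectors: Seq[Array[Double]], topK: Int = 2): Double = {
  val similarities = exampleVectors.map(e => dot(mention, e))
  val top = similarities.sorted(Ordering[Double].reverse).take(topK)
  top.sum / top.size
}
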

@kwalcock (Member, Author)
FYI @zupon

@MihaiSurdeanu (Contributor)

Thanks for looking into this @kwalcock!
