Thanks! I just came back to post this as well. What's important to note is that the unique filter should be used with `only_on_same_position: true`. Otherwise it removes every repeated term from the whole token stream, not just the duplicates at one position, and the term frequency will be heavily distorted.
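For anyone landing here later, a minimal sketch of what that workaround could look like. The filter and analyzer names (`decomp`, `unique_same_pos`, `german_decomp`) are made up for illustration, and `"type": "decompound"` is assumed from the wording of this issue; only the `unique` filter and its `only_on_same_position` option are standard Elasticsearch. The small helper just models the effect of that option on `(term, position)` pairs, it is not the filter's actual implementation.

```python
import json

# Hypothetical index settings: names are placeholders, and the
# "decompound" filter type is an assumption based on this issue.
settings = {
    "analysis": {
        "filter": {
            "decomp": {"type": "decompound"},
            # Standard Elasticsearch unique filter, restricted to
            # duplicates that share the same position.
            "unique_same_pos": {
                "type": "unique",
                "only_on_same_position": True,
            },
        },
        "analyzer": {
            "german_decomp": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["decomp", "unique_same_pos"],
            }
        },
    }
}


def unique_on_same_position(tokens):
    """Model of the option: drop a token only if the same term was
    already seen at the same position."""
    seen = set()
    kept = []
    for term, position in tokens:
        key = (term, position)
        if key not in seen:
            seen.add(key)
            kept.append((term, position))
    return kept


# The word occurs twice at position 0 (the duplicate) and once more
# at position 5 (a genuine repetition that should still count).
tokens = [("Anwältin", 0), ("Anwältin", 0), ("Anwältin", 5)]
deduplicated = unique_on_same_position(tokens)

print(deduplicated)
print(json.dumps(settings, indent=2))
```

The occurrence at position 5 survives, so a word that really appears twice in a document still counts twice.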
First of all: Thanks for creating this enormously helpful bundle! While fine-tuning it for our application, I've stumbled upon the following problem: The decompound filter correctly returns the subwords of compound words, but it returns every word that's not a compound word twice (i.e. it treats such a word as a single subword of itself).
This is the simplified version of my index settings to reproduce the problem:
Querying `/_analyze` with the text `Grundbuchamt Anwältin` returns:

As you can see, the token `Anwältin` is returned twice with the same offset and position. (Setting `subwords_only` to true eliminates the duplicates, by the way.)

Do you have an idea how we might fix this behaviour?
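In case it helps with reproducing this, here is a rough sketch of how the duplicates could be spotted in an `/_analyze` response. It relies only on the standard response shape (a `tokens` list with `token`, `start_offset`, `end_offset` and `position`). The sample response below is hand-written for illustration and is not the actual output from this issue; the subword tokens for `Grundbuchamt` are left out because the original output is not available.

```python
from collections import Counter


def find_duplicate_tokens(analyze_response):
    """Return (token, position, start_offset, end_offset) tuples that
    occur more than once in an _analyze response."""
    counts = Counter(
        (t["token"], t["position"], t["start_offset"], t["end_offset"])
        for t in analyze_response["tokens"]
    )
    return [key for key, n in counts.items() if n > 1]


# Hand-written illustrative response, NOT the real output of the plugin.
sample_response = {
    "tokens": [
        {"token": "Grundbuchamt", "start_offset": 0, "end_offset": 12,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "Anwältin", "start_offset": 13, "end_offset": 21,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "Anwältin", "start_offset": 13, "end_offset": 21,
         "type": "<ALPHANUM>", "position": 1},
    ]
}

duplicates = find_duplicate_tokens(sample_response)
print(duplicates)
```

Running a check like this before and after a change to the filter chain makes it easy to see whether the duplicate at the same position is gone.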