Query side analysis chain support---Stopwords for Query.And #33540

Open
sliu-e opened this issue Mar 10, 2025 · 0 comments

Is your feature request related to a problem? Please describe.

I recently found out that stopwords are applied when tokenizing documents but not when tokenizing queries. This makes me wonder whether other analysis chain steps are also skipped on the query side. With AND, a query term that was removed from the documents as a stopword can never match, so the whole query fails. Ideally my team would use WAND instead of AND, but we can't guarantee that, so we would like the analysis chain to apply every step on the query side and not just the document side.
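
To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use Vespa or Lucene; `AndMismatchDemo`, `analyze`, `matchesAll`, and the stopword list are all hypothetical names chosen for illustration. It assumes a whitespace tokenizer with lowercasing, and AND semantics where every query token must be present in the indexed tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class AndMismatchDemo {
    // Hypothetical stopword list, for illustration only.
    static final Set<String> STOPWORDS = Set.of("the", "of", "a");

    // Whitespace tokenizer plus lowercasing; optionally drops stopwords.
    static List<String> analyze(String text, boolean removeStopwords) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(String::toLowerCase)
                .filter(t -> !removeStopwords || !STOPWORDS.contains(t))
                .collect(Collectors.toList());
    }

    // AND semantics: every query token must be present in the indexed tokens.
    static boolean matchesAll(List<String> queryTokens, List<String> indexedTokens) {
        return indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // Document side: stopwords removed, so only "history" and "rome" are indexed.
        List<String> indexed = analyze("The history of Rome", true);

        // Query side without stopword removal: "the" and "of" can never match.
        System.out.println(matchesAll(analyze("the history of rome", false), indexed)); // false

        // Query side with the same stopword step applied: the query matches.
        System.out.println(matchesAll(analyze("the history of rome", true), indexed)); // true
    }
}
```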

Describe the solution you'd like

We would like every step in our analysis chain to be applied to query tokenizing, not just document tokenizing:

            <config name="com.yahoo.language.lucene.lucene-analysis" >
                <configDir>analysis-config</configDir>
                <analysis>
                    <item key="en">
                        <tokenizer>
                            <name>whitespace</name>
                        </tokenizer>
                        <tokenFilters>
                            <item>
                                <name>asciiFolding</name>
                            </item>
                            <item>
                                <name>synonymGraph</name>
                                <conf>
                                    <item key="synonyms">synonyms.txt</item>
                                    <item key="ignoreCase">true</item>
                                    <item key="expand">true</item>
                                </conf>
                            </item>
                            <item>
                                <name>stop</name>
                                <conf>
                                    <item key="words">stopwords.txt</item>
                                    <item key="ignoreCase">true</item>
                                </conf>
                            </item>
                            <item>
                                <name>wordDelimiterGraph</name>
                                <conf>
                                    <item key="generateNumberParts">1</item>
                                    <item key="catenateWords">1</item>
                                    <item key="catenateNumbers">1</item>
                                    <item key="catenateAll">1</item>
                                    <item key="splitOnCaseChange">1</item>
                                    <item key="splitOnNumerics">1</item>
                                    <item key="stemEnglishPossessive">1</item>
                                    <item key="preserveOriginal">1</item>
                                    <item key="protected">wordDelimiterGraphFilterFactoryProtected.txt</item>
                                </conf>
                            </item>
                            <item>
                                <name>lowercase</name>
                            </item>
                            <item>
                                <name>kStem</name>
                            </item>
                            <item>
                                <name>removeDuplicates</name>
                            </item>
                            <item>
                                <name>synonymGraph</name>
                                <conf>
                                    <item key="synonyms">british_synonyms.txt</item>
                                    <item key="ignoreCase">true</item>
                                    <item key="expand">true</item>
                                </conf>
                            </item>
                            <item>
                                <name>flattenGraph</name>
                            </item>
                        </tokenFilters>
                    </item>
                </analysis>
            </config>

Describe alternatives you've considered

We have unblocked ourselves by writing custom parsing logic in our Java searcher that removes stopword tokens from the query tree.
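
For readers who want the general shape of that workaround, below is a rough sketch of pruning stopwords from an AND tree. It is not our actual searcher code and it deliberately avoids Vespa's query item classes: `Node`, `Word`, `And`, and `StopwordPruner` are hypothetical stand-ins, and the stopword set is assumed to be lowercase and loaded elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a query tree; these are NOT Vespa's query item classes.
abstract class Node {}

final class Word extends Node {
    final String text;
    Word(String text) { this.text = text; }
}

final class And extends Node {
    final List<Node> children;
    And(List<Node> children) { this.children = children; }
}

final class StopwordPruner {
    private final Set<String> stopwords; // assumed to be lowercase

    StopwordPruner(Set<String> stopwords) { this.stopwords = stopwords; }

    // Returns the pruned tree, or null when every term was a stopword.
    Node prune(Node node) {
        if (node instanceof Word) {
            String text = ((Word) node).text.toLowerCase(Locale.ROOT);
            return stopwords.contains(text) ? null : node;
        }
        And and = (And) node;
        List<Node> kept = new ArrayList<>();
        for (Node child : and.children) {
            Node pruned = prune(child);
            if (pruned != null) kept.add(pruned);
        }
        if (kept.isEmpty()) return null;
        if (kept.size() == 1) return kept.get(0); // collapse a single-child AND
        return new And(kept);
    }
}
```

In this sketch a query made only of stopwords prunes to `null`, so the caller has to decide what to do in that case, for example fall back to the original, unpruned query.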
