Query side analysis chain support---Stopwords for Query.And #33540

sliu-e · 2025-03-10T18:18:57Z

Is your feature request related to a problem? Please describe.
I recently found out that stopwords are applied during document tokenizing but not during query-side tokenizing. This makes me wonder whether other analysis chain steps are also skipped on the query side. Ideally my team would use WAND instead of AND, but we cannot guarantee that, so we would like the analysis chain to apply every step on the query side and not just on the document side.
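To make the concern concrete, here is a minimal, self-contained sketch of how such a mismatch could surface under AND semantics. It does not use Vespa or Lucene; the `analyze` and `matchesAnd` helpers, the stopword list, and the sample text are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordMismatchDemo {
    // Hypothetical stopword list, standing in for stopwords.txt.
    static final Set<String> STOPWORDS = Set.of("the", "of", "a");

    // Whitespace tokenization plus lowercasing; optionally drops stopwords.
    static List<String> analyze(String text, boolean removeStopwords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty()) continue;
            if (removeStopwords && STOPWORDS.contains(token)) continue;
            tokens.add(token);
        }
        return tokens;
    }

    // AND semantics: every query token must be present among the indexed tokens.
    static boolean matchesAnd(List<String> queryTokens, Collection<String> indexedTokens) {
        return new HashSet<>(indexedTokens).containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // Document side: stopwords are removed before indexing.
        List<String> indexed = analyze("The history of Rome", true);

        // Query side without stopword removal: "the" and "of" are not in the index.
        System.out.println(matchesAnd(analyze("the history of rome", false), indexed)); // false

        // Query side with the same chain as the document side.
        System.out.println(matchesAnd(analyze("the history of rome", true), indexed)); // true
    }
}
```

With a weaker operator such as WAND the missing terms would only lower the score, which is presumably why the problem is most visible with AND.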

Describe the solution you'd like
We would like every step in our analysis chain to be applied during query tokenizing, not just document tokenizing:

            <config name="com.yahoo.language.lucene.lucene-analysis" >
                <configDir>analysis-config</configDir>
                <analysis>
                    <item key="en">
                        <tokenizer>
                            <name>whitespace</name>
                        </tokenizer>
                        <tokenFilters>
                            <item>
                                <name>asciiFolding</name>
                            </item>
                            <item>
                                <name>synonymGraph</name>
                                <conf>
                                    <item key="synonyms">synonyms.txt</item>
                                    <item key="ignoreCase">true</item>
                                    <item key="expand">true</item>
                                </conf>
                            </item>
                            <item>
                                <name>stop</name>
                                <conf>
                                    <item key="words">stopwords.txt</item>
                                    <item key="ignoreCase">true</item>
                                </conf>
                            </item>
                            <item>
                                <name>wordDelimiterGraph</name>
                                <conf>
                                    <item key="generateNumberParts">1</item>
                                    <item key="catenateWords">1</item>
                                    <item key="catenateNumbers">1</item>
                                    <item key="catenateAll">1</item>
                                    <item key="splitOnCaseChange">1</item>
                                    <item key="splitOnNumerics">1</item>
                                    <item key="stemEnglishPossessive">1</item>
                                    <item key="preserveOriginal">1</item>
                                    <item key="protected">wordDelimiterGraphFilterFactoryProtected.txt</item>
                                </conf>
                            </item>
                            <item>
                                <name>lowercase</name>
                            </item>
                            <item>
                                <name>kStem</name>
                            </item>
                            <item>
                                <name>removeDuplicates</name>
                            </item>
                            <item>
                                <name>synonymGraph</name>
                                <conf>
                                    <item key="synonyms">british_synonyms.txt</item>
                                    <item key="ignoreCase">true</item>
                                    <item key="expand">true</item>
                                </conf>
                            </item>
                            <item>
                                <name>flattenGraph</name>
                            </item>
                        </tokenFilters>
                    </item>
                </analysis>
            </config>

Describe alternatives you've considered
We have unblocked ourselves by writing custom parsing logic in our Java searcher that removes stopword tokens from the query tree.
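For readers wanting a similar workaround, the general shape of such pruning can be sketched as below. This is not our actual searcher and does not use Vespa's query item classes: `Node`, `Term`, `And`, `prune`, and `render` are hypothetical stand-ins, and a real implementation would walk the engine's own query tree instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class QueryStopwordPruner {
    // Hypothetical minimal query tree, used only for this illustration.
    interface Node {}

    static final class Term implements Node {
        final String word;
        Term(String word) { this.word = word; }
    }

    static final class And implements Node {
        final List<Node> children;
        And(List<Node> children) { this.children = children; }
    }

    // Returns a tree without stopword terms, or null if nothing is left.
    static Node prune(Node node, Set<String> stopwords) {
        if (node instanceof Term) {
            Term term = (Term) node;
            return stopwords.contains(term.word.toLowerCase(Locale.ROOT)) ? null : term;
        }
        And and = (And) node;
        List<Node> kept = new ArrayList<>();
        for (Node child : and.children) {
            Node pruned = prune(child, stopwords);
            if (pruned != null) kept.add(pruned);
        }
        if (kept.isEmpty()) return null;            // every child was a stopword
        if (kept.size() == 1) return kept.get(0);   // collapse single-child AND
        return new And(kept);
    }

    // Renders a non-null tree as text, for inspection.
    static String render(Node node) {
        if (node instanceof Term) return ((Term) node).word;
        List<String> parts = new ArrayList<>();
        for (Node child : ((And) node).children) parts.add(render(child));
        return "AND(" + String.join(", ", parts) + ")";
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("the", "of"); // hypothetical list
        Node query = new And(List.of(
                new Term("The"), new Term("history"),
                new And(List.of(new Term("of"), new Term("Rome")))));
        System.out.println(render(prune(query, stopwords))); // AND(history, Rome)
    }
}
```

Note that `prune` returns null when the whole query consists of stopwords; how to handle that case (return no hits, or fall back to the original query) is a choice the caller has to make.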

