Issue with Full-Text Search Configuration and Analyzer in Smile Elasticsearch #3507

Closed
mahesh-makwana-web-vision opened this issue Jan 31, 2025 · 2 comments

mahesh-makwana-web-vision commented Jan 31, 2025

Hi @romainruaud

I am working on configuring Elasticsearch within the Smile Elasticsearch suite, and I have encountered an issue with the content field mapping. Currently, the content field is defined as type: "text", but it uses the keyword analyzer, which does not tokenize the data as expected for full-text search.

Current Mapping (for the content field):

"content" : {
  "type" : "text",
  "fields" : {
    "standard" : {
      "type" : "text",
      "analyzer" : "standard"
    },
    "untouched" : {
      "type" : "keyword",
      "ignore_above" : 256,
      "normalizer" : "untouched"
    }
  },
  "copy_to" : [ "search" ],
  "norms" : false,
  "analyzer" : "keyword"
}

Issue

  1. The content field is using the keyword analyzer, which is not ideal for full-text search.
  2. I need to search across the full content data.
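
One way to check what each field actually emits is the Elasticsearch `_analyze` API, which accepts a `field` and a `text` and returns the tokens produced by that field's analyzer. The sketch below only builds the two request bodies for comparison. The index name is an assumption (the real ElasticSuite index or alias name will likely carry a prefix and suffix), and the sample text is made up for illustration.

```python
import json

# Hypothetical index name: replace with the real index or alias in your cluster.
INDEX_NAME = "customdata_search"


def analyze_request(field: str, text: str) -> dict:
    """Describe an `_analyze` call that runs `text` through the analyzer of `field`."""
    return {
        "method": "GET",
        "path": f"/{INDEX_NAME}/_analyze",
        "body": {"field": field, "text": text},
    }


# Illustrative sample text, not taken from the real data.
sample = "Full-Text Search in Elasticsuite"

# One call for the top-level field, one for its "standard" subfield.
calls = [analyze_request(name, sample) for name in ("content", "content.standard")]

for call in calls:
    print(call["method"], call["path"])
    print(json.dumps(call["body"], indent=2))
```

Sent to a cluster with the mapping above, the `content` call should return the whole input as one token (the `keyword` analyzer does not split text), while the `content.standard` call should return separate lowercased word tokens.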

Questions

  1. How can I properly configure the content field so that its value is fully searchable (i.e., tokenized correctly for full-text search)?
  2. Is there a specific recommendation for enabling or disabling norms for full-text search, and how does it affect the scoring of search results?
  3. Do I need to update or modify the search field that is being copied to, and what is the best analyzer to use for it?
  4. How can I perform an update to this mapping without causing issues with existing data in the index?

File : elasticsuite_indices.xml

<index identifier="customdata_search" defaultSearchType="customdata">
    <type name="customdata" idFieldName="id">
        <mapping>
            <field name="id" type="integer" />
            <field name="sorting" type="integer" />
            <field name="title" type="keyword" />
            <field name="content" type="text">
                <isSearchable>1</isSearchable>
                <defaultSearchAnalyzer>standard</defaultSearchAnalyzer>
            </field>
            <field name="url" type="keyword" />
            <field name="is_displayed_in_autocomplete" type="boolean" />
        </mapping>
    </type>
</index>

NOTE: I have a large amount of data in the content field.

I would appreciate any guidance on how to apply these changes and ensure minimal disruption to our existing setup.

Thanks in advance for your assistance!

@romainruaud
Collaborator

Hi @mahesh-makwana-web-vision

Your field is configured properly.

It has a subfield named content.standard, which uses the "standard" analyzer; that is correct.

The content.standard subfield is then used when processing full-text searches.
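
To verify this by hand, a `match` query can be pointed directly at the subfield. The following is a minimal sketch that only builds the query body; the helper name `build_fulltext_query` and the search terms are hypothetical, and ElasticSuite generates its own queries internally, so this is for manual checking rather than a picture of what the module sends.

```python
import json


def build_fulltext_query(terms: str, field: str = "content.standard") -> dict:
    """Build a `match` query body against the analyzed subfield.

    The query text is analyzed with the field's analyzer ("standard" here),
    so it is split into lowercased word tokens before matching.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": terms,
                    "operator": "and",  # require every term to be present
                }
            }
        }
    }


# Illustrative search terms, not taken from the real data.
body = build_fulltext_query("shipping policy")
print(json.dumps(body, indent=2))
```

The printed body can be posted to the index's `_search` endpoint. If documents whose content contains both words come back, the subfield is tokenized as described.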

Best regards

@mahesh-makwana-web-vision
Author

It's working.

Thanks @romainruaud
