Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full-text search errors for prefix match #3014

Open
okimonhgb opened this issue May 6, 2024 · 12 comments
Open

Full-text search errors for prefix match #3014

okimonhgb opened this issue May 6, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

@okimonhgb
Copy link

Describe the bug
Fulltext search gives error if I want to search for words that match a given prefix:
FT.SEARCH idxe:content "@text:football*"
"ERR Query syntax error"

To Reproduce
Steps to reproduce the behavior:

  1. Insert records : A json type key with a text field called "text".
  2. Create a full-text index: FT.CREATE indxe:content ON JSON PREFIX 1 contents: SCHEMA
    $.text AS text TEXT
  3. Query records using : FT.SEARCH idxe:content "@text:football*"
  4. It gives error: "ERR Query syntax error"

Expected behavior
for exact search I get no error:
FT.SEARCH idxe:content "@text:football"
My existing app. running on Redis doesn't get any error for prefix matching as expected:

https://redis.io/docs/latest/develop/interact/search-and-query/query/full-text/

...
Word prefix
You can also search for words that match a given prefix.

FT.SEARCH index "prefix*"
...
<<<

Screenshots
Error with dragonfly:
image

No error with Redis:
image

Environment (please complete the following information):

  • OS: Rocky Linux 9.3 (Blue Onyx)
  • Kernel: Linux cache-maker 5.14.0-362.24.1.el9_3.0.1.x86_64 finalize blpop algorithm #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: No
  • Dragonfly Version: v1.17.1-9d0253ca8937175d2551d9b39c26f99921f46a92

Reproducible Code Snippet
FT.CREATE indxe:content ON JSON PREFIX 1 contents: SCHEMA
$.text AS text TEXT

FT.SEARCH idxe:content "@text:football*"

Minimal code snippet to reproduce this bug

FT.SEARCH idxe:content "@text:football*"

Additional context
Add any other context about the problem here.

@okimonhgb okimonhgb added the bug Something isn't working label May 6, 2024
@chakaz
Copy link
Collaborator

chakaz commented May 6, 2024

@dranikpg can you kindly take a look? 🙏

@romange
Copy link
Collaborator

romange commented May 6, 2024

@okimonhgb is it full text search? it's not supported. indeed I do not see that it is clearly specified in the documentation.

@okimonhgb
Copy link
Author

@romange yes it is full text search. It works for exact word searches but seems doesn't support word prefixes.
Is it planned to be supported in near future? Is there any workaround? We want to use replace our redis instance with dragonfly but this problem breaks existing codes.

@romange
Copy link
Collaborator

romange commented May 6, 2024

Unfortunately it won't be implemented in near future.

@okimonhgb
Copy link
Author

Ok. I understand there is also no workaround?
So we should plan according to this.
Can I keep this issue open?
Best regards,

@romange
Copy link
Collaborator

romange commented May 6, 2024

Sure :)

@romange romange added enhancement New feature or request and removed bug Something isn't working labels Aug 12, 2024
@Lakshyadevelops
Copy link
Contributor

Lakshyadevelops commented Sep 25, 2024

@romange
Do we only wish to support prefix ab* queries or do we additionally aim to support infix *ab* and suffix queries *ab efficently. For the former we would need a patricia tree and for the latter a suffix trie. Patricia tree implementation will be very simple as compared to suffix trie which would involve Ukkonens algorithm and multiple edge cases.
https://redis.io/docs/latest/develop/interact/search-and-query/query/full-text/
https://redis.io/docs/latest/commands/ft.create/#:~:text=converted%20to%20lowercase.-,WITHSUFFIXTRIE,-for%20TEXT%20and

@romange
Copy link
Collaborator

romange commented Sep 26, 2024

We just discussed this internally within the team - we need to learn the domain and devise a plan.

I personally have never developed patricia tree nor the suffix trie but based on what you write it seems that the Patricia tree will will hit the limit very soon.

Are there any 3rd party libraries that provide suffix trie implementation out of the box?

@romange
Copy link
Collaborator

romange commented Sep 26, 2024

Also there are suffix arrays. Do they help with this?

@Lakshyadevelops
Copy link
Contributor

All of my conclusions will be based on comparison with Redis.

As per my understanding, We don't need to choose between Patricia Tree or Suffix Trie but rather both implementations need to be present. (If the --WITHSUFFIXTRIE flag is present then make suffix trie only otherwise patricia tree only). Suffix trie consumes a lot more memory so comparatively though both have same complexity. We can support suffix and infix queries on patricia tree also but they would be brute searches only.

I need to do some more research and comparison between suffix tree and suffix array, and find a suitable implementation. I previously extended a Patricia Tree implementation and I am familiar with full text indexing domain so I would like to be included if possible in your discussions.

@romange
Copy link
Collaborator

romange commented Oct 1, 2024

Currently @dranikpg is leading the effort but due to time constraints he is working on it on and off.
AFAIK, @dranikpg is planning to use rax.h for that but we have not started any implementation yet.
@dranikpg - lets collaborate here before starting the implementation.

@dranikpg
Copy link
Contributor

dranikpg commented Oct 5, 2024

  1. Let's not juggle around with terms. Patricia trees are radix tries. "Suffix tree" is just a way to build a trie 🙂
  2. I suggest to use rax from redis as we already rely on it and it's a sufficiently good implementation of what we need
  3. First step would be to implement just prefix matching. It requires some refactoring as we can replace our hashmap index lookup with a trie query to not store all the words twice.
  4. Suffix matches are just queries to the suffix trie. Suffix arrays would not be applicable due to linear update times. Infix matches are an intersection between prefix and suffix matches

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants