[BUG] How to improve search accuracy? #415
Replies: 8 comments
-
@KevinZhang19870314 good point! One idea that comes to mind is to summarize the document before uploading it. By the way, sorry, but I don't understand the problem with setting a chunk_overlap and a chunk_size. You could set the two specifically for that document and use the default values for the others, couldn't you? Am I missing something?
-
@nicola-corbellini Thank you for your response! What I mean is this: if I change the chunk size and overlap to suit this doc, so that the whole trending list fits in one chunk, it will give the correct answer, but the same settings may not work for other docs. So changing the chunk size and overlap is not a good choice in this situation.
-
Sure, I'll try the summarization plugins.
-
@KevinZhang19870314 OK, let me know if this solves your needs. Another solution that comes to mind would be to play around with the documents' metadata and filter results when recalling. By the way, a hook to filter semantic search with custom metadata has yet to be implemented.
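To make the metadata idea concrete, here is a minimal sketch in plain Python. It is not the Cheshire Cat API (the hook mentioned above does not exist yet); the `Chunk` class, the `filter_by_metadata` helper, and the `topic` and `source` keys are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored chunk: its text plus free-form metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(chunks, required):
    """Keep only the chunks whose metadata contains every key/value pair in `required`."""
    return [
        chunk for chunk in chunks
        if all(chunk.metadata.get(key) == value for key, value in required.items())
    ]


# Hypothetical recalled chunks; the "topic" and "source" keys are assumptions.
chunks = [
    Chunk("1. aaa\n2. bbb", {"source": "trending.txt", "topic": "trending"}),
    Chunk("3. ccc\n4. ddd", {"source": "trending.txt", "topic": "trending"}),
    Chunk("unrelated text", {"source": "other.txt", "topic": "misc"}),
]

# Every chunk tagged with the topic is returned, regardless of similarity score.
trending_chunks = filter_by_metadata(chunks, {"topic": "trending"})
```

The point of filtering on a tag rather than on similarity alone is that all chunks of the same list come back together, even when only one of them contains the search term.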
-
@nicola-corbellini Hi, I tried the summarization plugins, but unfortunately they did not work. The chunk size is 400 and the overlap is 100; the text is split into 21 documents and gets 4 summaries. Here is the related log, actually
Here is my text document:
-
@KevinZhang19870314 I'm sorry to hear that. This depends on your use case, but another solution could be splitting the document in advance.
You could split the list into two or three lists, all starting with What do you think? Alternatively, as anticipated in the previous comment, we could think about a hook to filter the retrieval with custom metadata. E.g. you could tag all the chunks with
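The pre-splitting suggestion could look roughly like the sketch below: the long list is cut into smaller lists and the same header line is repeated on top of each one, so every resulting chunk still matches a search for the header. The header text `"Trending:"`, the item texts, and the `split_list_with_header` helper are assumptions made for illustration, not taken from the original document.

```python
def split_list_with_header(header, items, group_size):
    """Split `items` into groups of `group_size`, repeating `header` above each group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    sections = []
    for start in range(0, len(items), group_size):
        group = items[start:start + group_size]
        sections.append("\n".join([header, *group]))
    return sections


header = "Trending:"  # hypothetical header line
items = [f"{i}. item {i}" for i in range(1, 9)]  # placeholder list entries

# Two sections of four items each, both starting with the header.
sections = split_list_with_header(header, items, group_size=4)
```

Each section can then be uploaded as its own small document, sized to fit within a single chunk.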
-
@KevinZhang19870314 welcome! Another option is to write a tool that retrieves exactly the info you need in the way you need it. There are at least four or five ways to do this, let us know
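One of those ways, sketched in plain Python: parse the numbered list deterministically instead of relying on semantic search. This deliberately leaves out the framework's tool registration; `parse_numbered_items`, `last_item`, and the sample `document` are hypothetical, and the sketch assumes the entries look like `1. xxx`.

```python
import re

# Matches lines shaped like "3. some text" (assumed list format).
ITEM_PATTERN = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")


def parse_numbered_items(text):
    """Return (number, text) pairs for every line that looks like a numbered entry."""
    items = []
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            items.append((int(match.group(1)), match.group(2)))
    return items


def last_item(text):
    """Return the entry with the highest number, or None if there are no entries."""
    items = parse_numbered_items(text)
    return max(items, key=lambda item: item[0]) if items else None


# Placeholder document; the real content is not shown in this thread.
document = "Trending\n1. aaa\n2. bbb\n3. ccc\n4. ddd\n5. eee\n6. fff\n7. ggg\n8. hhh"

all_items = parse_numbered_items(document)
final_item = last_item(document)
```

Because the whole document is parsed, the answer to "the last trending" does not depend on where the splitter happened to cut the text.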
-
@pieroit Thank you for your response. Yes, I can do it in hooks or tools, but it's not a general method: I have several docs, and some need splitting one way while others need splitting another way. So maybe we could categorize the docs per knowledge base? My thoughts are:
Then all the docs in A will be handled well. For other docs that need to be split differently, we can create another knowledge base B, and so on. This way we can categorize the files/docs, rather than uploading all files together and using the same plugins for all docs.
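A rough sketch of what per-knowledge-base settings could look like, in plain Python. Nothing here is an existing feature: `SplitSettings`, `KNOWLEDGE_BASES`, and `settings_for` are hypothetical names, and the values for A and B are made up. Only the default of 400/100 comes from the settings mentioned earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSettings:
    """Splitter parameters applied to every document in one knowledge base."""
    chunk_size: int
    chunk_overlap: int


# Fallback used for documents that are not assigned to a known knowledge base.
DEFAULT_SETTINGS = SplitSettings(chunk_size=400, chunk_overlap=100)

# Hypothetical knowledge bases with illustrative values.
KNOWLEDGE_BASES = {
    "A": SplitSettings(chunk_size=1000, chunk_overlap=0),  # e.g. long lists kept whole
    "B": SplitSettings(chunk_size=200, chunk_overlap=50),  # e.g. short FAQ-style entries
}


def settings_for(knowledge_base):
    """Look up the splitter settings for a knowledge base, falling back to the default."""
    return KNOWLEDGE_BASES.get(knowledge_base, DEFAULT_SETTINGS)


settings_a = settings_for("A")
settings_other = settings_for("not-configured")
```

With a lookup like this, tuning the chunk size for one group of documents would not affect the others.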
-
Describe the bug
For example, I have a text document, and here is the content:
I uploaded this doc to memory, and the rabbithole split it into 2 chunks,
chunk 1:
chunk 2:
Here is the problem: when I search "Trending", it gives items 1 ~ 4, but I need 1 ~ 8;
when I search "The last trending", it gives "4. xxx", but I need "8. xxx".
I know this is caused by the chunk size and chunk overlap. If I change them to fit this doc, other docs may have issues, so how can I resolve this and improve the search accuracy?
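The effect can be reproduced with a minimal sketch. It assumes a naive fixed-size character window, which is simpler than the splitter the rabbithole actually uses; `split_text`, the placeholder items, and the sizes 50/10 and 200/10 are chosen only to illustrate the behaviour.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Placeholder document: a heading followed by eight numbered items.
lines = ["Trending"] + [f"{i}. item {i}" for i in range(1, 9)]
text = "\n".join(lines)

small = split_text(text, chunk_size=50, chunk_overlap=10)   # list is cut in two
large = split_text(text, chunk_size=200, chunk_overlap=10)  # list fits in one chunk
```

With the small window, only the first chunk contains the word "Trending", so a search for it never reaches the chunk holding the last items; with the large window, the heading and all eight items stay together.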