[Do not merge] When chunk_size=0, skip vector db #99

gaya3-zipstack · 2024-09-03T16:35:14Z

What

The current implementation always uses the indexed nodes when fetching context for prompts. However, when chunk_size=0 the entire context has to be sent anyway, so we can send the extracted text directly instead of fetching the chunk from the vector DB.

Why

This improves response time for prompts when chunk_size=0, since the vector DB does not need to be accessed.

How

When chunk_size=0, the context is read from the extracted text present in the container file system.
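
A minimal sketch of the branching described above, for reviewers who want the idea in code. The function name `fetch_context`, the `extracted_text_path` parameter, and the `retrieve_from_vector_db` callable are hypothetical and are not taken from this PR's diff; the actual change may be structured differently.

```python
from pathlib import Path
from typing import Callable


def fetch_context(
    chunk_size: int,
    extracted_text_path: Path,
    retrieve_from_vector_db: Callable[[], str],
) -> str:
    """Return the prompt context, skipping the vector DB when chunk_size is 0.

    All names here are illustrative assumptions, not the PR's real API.
    """
    if chunk_size == 0:
        # No chunking: the whole document is the context, so read the
        # extracted text straight from the file system.
        if not extracted_text_path.is_file():
            # Mirrors the tested scenario where the extracted file was
            # removed after indexing and the prompt run reports an error.
            raise FileNotFoundError(
                f"Extracted file is missing: {extracted_text_path}"
            )
        return extracted_text_path.read_text(encoding="utf-8")

    # Chunked profile: fall back to the usual vector DB retrieval.
    return retrieve_from_vector_db()
```

With chunk_size=0 the retrieval callable is never invoked, which is where the response-time saving would come from. With any other chunk size the behavior is unchanged.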

Relevant Docs

Related Issues or PRs

https://zipstack.atlassian.net/browse/UN-1418

Unstract PR

[Do not merge] When chunk_size=0, skip vector db unstract#649

Dependencies Versions

Notes on Testing

Screenshots

Profile with chunk_size=0

Manual indexing on a document. After indexing completes, no nodes are added to the vector DB, as shown.

Prompt run on top of manual indexing. After the prompt run, there are still no records in the vector DB, yet the prompt answers are correct because the context is picked up from the extracted text.

Running a prompt before manual indexing (dynamic indexing would kick in).

Manually remove the extracted file after indexing, then run the prompt. This gives an error saying the extracted file is missing.

Now do a manual re-index. The extracted file is re-created. Then run the prompt.

Profile with chunk_size=1024

Manual indexing on a document. After indexing completes, nodes are added to the vector DB, as shown.

Prompt run on top of manual indexing. The prompt run works fine, picking context from the vector DB.

Running a prompt before manual indexing (dynamic indexing would kick in), as there are no records in the vector DB.

Dynamic indexing kicked in and the prompt run worked fine.

Manually remove the records from the vector DB.

On running the prompt, we see an error.

Manually re-index, then run the prompt again. The prompt should work fine, with nodes added to the vector DB.

Checklist

I have read and understood the Contribution Guidelines.

Signed-off-by: Gayathri <[email protected]>

gaya3-zipstack added 2 commits September 3, 2024 21:54

add a new extract function

8ab0485

Changes for version and added a TODO for deprecation

9b4f293

gaya3-zipstack requested review from Deepak-Kesavan, harini-venkataraman, chandrasekharan-zipstack and hari-kuriakose September 3, 2024 16:35

gaya3-zipstack mentioned this pull request Sep 3, 2024

[Do not merge] When chunk_size=0, skip vector db Zipstack/unstract#649

Draft

gaya3-zipstack changed the title ~~Fix/chunk size 0~~ When chunk_size=0, skip vector db Sep 3, 2024

Merge branch 'main' into fix/chunk-size-0

5813967

Signed-off-by: Gayathri <[email protected]>

gaya3-zipstack marked this pull request as draft September 4, 2024 09:33

gaya3-zipstack changed the title ~~When chunk_size=0, skip vector db~~ [Do not merge] When chunk_size=0, skip vector db Sep 4, 2024
