Fix document counts #3671
@@ -263,21 +283,6 @@ def index_doc_batch_prepare(
 def filter_documents(document_batch: list[Document]) -> list[Document]:
     documents: list[Document] = []
     for document in document_batch:
         # Remove any NUL characters from title/semantic_id
Removing this logic from this func and pushing it to `strip_null_characters` in `run_indexing.py`.
This isn't really a "filter" step, and the other function has more or less this exact logic, so it feels like a better fit there.
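For reviewers, a rough sketch of why NUL stripping is a sanitizing step rather than a filter: every document that goes in comes back out, only with cleaned fields. The `Doc` dataclass and the helper `strip_nul` below are hypothetical stand-ins, not the project's actual `Document` model or the real body of `strip_null_characters`.

```python
from dataclasses import dataclass, replace
from typing import Optional

NUL = "\x00"


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for the project's Document model."""

    id: str
    semantic_identifier: str
    title: Optional[str] = None


def strip_nul(text: Optional[str]) -> Optional[str]:
    """Return text with NUL characters removed; pass None through unchanged."""
    return text.replace(NUL, "") if text is not None else None


def strip_null_characters(batch: list[Doc]) -> list[Doc]:
    """Sanitize title and semantic_identifier without dropping any document."""
    return [
        replace(
            doc,
            title=strip_nul(doc.title),
            semantic_identifier=strip_nul(doc.semantic_identifier) or "",
        )
        for doc in batch
    ]
```

Because the output list always has the same length as the input, running this before the filter step should not change document counts by itself.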
Description
Addresses two edge cases:
How Has This Been Tested?
Ran indexing locally, verified counts matched as expected.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.