fix: Every time a vectorized document is generated, the entire vectorized data of the document is deleted #2721

shaohuzhang1 · 2025-03-28T06:14:43Z

fix: Every time a vectorized document is generated, the entire vectorized data of the document is deleted


f2c-ci-robot · 2025-03-28T06:14:47Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

f2c-ci-robot · 2025-03-28T06:14:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-03-28T06:15:05Z

apps/common/event/listener_manage.py

```diff
@@ -272,8 +272,6 @@ def is_the_task_interrupted():
             ListenerManagement.update_status(QuerySet(Document).filter(id=document_id), TaskType.EMBEDDING,
                                              State.STARTED)

-            # Delete document vector data
-            VectorStore.get_embedding_vector().delete_by_document_id(document_id)

             # Perform vectorization by paragraph
             page_desc(QuerySet(Paragraph)
```
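The effect of the removed `delete_by_document_id` call can be illustrated with a toy in-memory store. This is a minimal sketch under an assumption the diff does not confirm: that a single embedding pass may re-process only some of a document's paragraphs. `InMemoryVectorStore`, `embed_paragraphs`, `fake_embed` and `run` are hypothetical stand-ins, not the project's `VectorStore` API; the stand-in's `delete_by_document_id` only mirrors the method name in the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (document_id, paragraph_id)


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store keyed by (document_id, paragraph_id)."""
    rows: Dict[Key, List[float]] = field(default_factory=dict)

    def delete_by_document_id(self, document_id: str) -> None:
        # Removes every vector of the document, not just the paragraphs being re-embedded
        self.rows = {k: v for k, v in self.rows.items() if k[0] != document_id}

    def save(self, document_id: str, paragraph_id: str, vector: List[float]) -> None:
        # Simplification: saving a paragraph overwrites its previous vector
        self.rows[(document_id, paragraph_id)] = vector


def fake_embed(text: str) -> List[float]:
    # Toy "embedding" (text length); only which rows exist matters here
    return [float(len(text))]


def embed_paragraphs(store: InMemoryVectorStore, document_id: str,
                     paragraphs: Dict[str, str], wipe_document_first: bool) -> None:
    """Embed the given {paragraph_id: text} subset of a document."""
    if wipe_document_first:
        # Mirrors the line removed by this PR
        store.delete_by_document_id(document_id)
    for paragraph_id, text in paragraphs.items():
        store.save(document_id, paragraph_id, fake_embed(text))


def run(wipe_document_first: bool) -> List[Key]:
    store = InMemoryVectorStore()
    # First pass embeds two paragraphs; the second pass re-embeds only p2
    embed_paragraphs(store, "doc-1", {"p1": "alpha", "p2": "beta"}, wipe_document_first)
    embed_paragraphs(store, "doc-1", {"p2": "beta v2"}, wipe_document_first)
    return sorted(store.rows)


print(run(True))   # [('doc-1', 'p2')]
print(run(False))  # [('doc-1', 'p1'), ('doc-1', 'p2')]
```

With the document-wide wipe, the second pass leaves only `p2`, so `p1` loses its vector even though it was never re-embedded; without the wipe, both paragraphs keep their vectors.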


The provided code snippet does not contain any obvious irregularities or potential issues. However, there are some general optimizations that can be made:

Potential Improvements

Use of Generators for Data Handling: If these are Django QuerySets, `QuerySet.iterator()` streams rows instead of caching the whole result set, which can reduce memory use when a document has many paragraphs.

Exception Handling: Consider adding try-except blocks around database operations and document management tasks to ensure robustness against exceptions during execution.

Code Clarity: Ensure that variable names and function structures are clear and concise. This makes the code easier to understand and maintain.

Performance Optimization: If VectorStore's delete_by_document_id method is slow (for example, because of index maintenance on a large vector table), consider batching deletions where the store supports it.
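The generator and batching suggestions can be sketched in plain Python. This is an illustration under assumptions: `iter_batches` and `delete_in_batches` are hypothetical helpers, the generator of ids stands in for a database query, and `delete_batch` stands in for whatever bulk-delete call a vector store actually offers. With Django, `QuerySet.iterator(chunk_size=...)` provides similar streaming over rows.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items without materializing the whole input."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def delete_in_batches(ids: Iterable[str],
                      delete_batch: Callable[[List[str]], None],
                      batch_size: int = 500) -> int:
    """Call a (hypothetical) bulk-delete function once per batch; return the number of ids sent."""
    total = 0
    for batch in iter_batches(ids, batch_size):
        delete_batch(batch)
        total += len(batch)
    return total


# Usage: a generator as input, and a list recording each bulk-delete call
deleted = []
paragraph_ids = (f"p{i}" for i in range(7))
count = delete_in_batches(paragraph_ids, deleted.append, batch_size=3)
print(count, deleted)  # 7 [['p0', 'p1', 'p2'], ['p3', 'p4', 'p5'], ['p6']]
```

Because `iter_batches` pulls items lazily with `islice`, only one batch is held in memory at a time, regardless of how many ids the source yields.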

Here is the relevant part of the code with error handling added (assuming document_id is defined in the enclosing scope):

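```python
import logging

logger = logging.getLogger(__name__)

# Fragment of the embedding routine in listener_manage.py. The hunk header's
# is_the_task_interrupted() is only the nearest preceding def, not the enclosing
# function, and document_id comes from the enclosing scope.
try:
    # Mark the document's embedding task as started
    ListenerManagement.update_status(
        QuerySet(Document).filter(id=document_id),
        TaskType.EMBEDDING,
        State.STARTED,
    )
    # The document-wide VectorStore delete is intentionally absent: this PR removes it.
    # Vectorize paragraph by paragraph (the full argument list is truncated in the diff)
    page_desc(QuerySet(Paragraph))
except Exception:
    # Log the full traceback instead of printing only the message
    logger.exception("Embedding failed for document %s", document_id)
```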
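Catching an exception only to report it can leave the document's embedding status stuck at STARTED. A minimal sketch of a stricter pattern: record STARTED, then SUCCESS or FAILURE, log the traceback, and re-raise. `State`, `run_tracked` and the `update_status` callback here are hypothetical stand-ins for the project's `State` enum and `ListenerManagement.update_status`, whose real signatures may differ.

```python
import enum
import logging

logger = logging.getLogger(__name__)


class State(enum.Enum):
    # Hypothetical stand-in for the project's task-state enum
    STARTED = "started"
    SUCCESS = "success"
    FAILURE = "failure"


def run_tracked(task, document_id, update_status):
    """Run task(document_id), recording STARTED, then SUCCESS or FAILURE.

    update_status is a hypothetical callback (document_id, state) standing in
    for ListenerManagement.update_status.
    """
    update_status(document_id, State.STARTED)
    try:
        task(document_id)
    except Exception:
        logger.exception("Embedding failed for document %s", document_id)
        update_status(document_id, State.FAILURE)
        raise  # let the caller or task runner see the failure
    update_status(document_id, State.SUCCESS)


# Usage: collect status transitions for a task that always fails
history = []


def broken_task(document_id):
    raise RuntimeError("embedding model unavailable")


try:
    run_tracked(broken_task, "doc-1", lambda d, s: history.append(s))
except RuntimeError:
    pass
print([s.name for s in history])  # ['STARTED', 'FAILURE']
```

Re-raising keeps the failure visible to whatever scheduled the task, while the FAILURE state lets the UI stop showing the document as in progress.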

These suggestions should make the code more robust and easier to read; adjust them to your project's specific requirements and constraints.

fix: Every time a vectorized document is generated, the entire vectorized data of the document is deleted

6246b55

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label Mar 28, 2025

shaohuzhang1 commented Mar 28, 2025


shaohuzhang1 merged commit dcc80a4 into main Mar 28, 2025
4 checks passed

shaohuzhang1 deleted the pr@main@fix_document_embedding branch March 28, 2025 06:15
