feat(document-search): LLM-based query rephraser #115

Merged
merged 8 commits into from
Oct 23, 2024

Conversation

akotyla
Collaborator

@akotyla akotyla commented Oct 17, 2024

No description provided.

@akotyla akotyla linked an issue Oct 17, 2024 that may be closed by this pull request
@akotyla akotyla added the feature (New feature or request) and document search (Changes to the document search package) labels Oct 17, 2024
Contributor

github-actions bot commented Oct 17, 2024

Code Coverage Summary

Filename                                                                                                     Stmts    Miss  Cover    Missing
---------------------------------------------------------------------------------------------------------  -------  ------  -------  ----------------------------------------
packages/ragbits-core/src/ragbits/core/__init__.py                                                               0       0  100.00%
packages/ragbits-core/src/ragbits/core/config.py                                                                 6       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/__init__.py                                                   11       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/base.py                                                        4       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/exceptions.py                                                 14       6  57.14%   7-8, 17, 26-27, 36
packages/ragbits-core/src/ragbits/core/embeddings/litellm.py                                                    28      19  32.14%   7-8, 43-51, 69-85
packages/ragbits-core/src/ragbits/core/embeddings/local.py                                                      39      25  35.90%   9-10, 35-45, 58-70, 74-76, 80-81
packages/ragbits-core/src/ragbits/core/embeddings/noop.py                                                        4       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                         15       8  46.67%   28-41
packages/ragbits-core/src/ragbits/core/llms/base.py                                                             33       4  87.88%   33, 52, 88, 97
packages/ragbits-core/src/ragbits/core/llms/factory.py                                                          18       2  88.89%   47, 60
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                          25       4  84.00%   8-9, 54, 85
packages/ragbits-core/src/ragbits/core/llms/local.py                                                            24      10  58.33%   8-9, 42-47, 57, 70-71
packages/ragbits-core/src/ragbits/core/llms/types.py                                                             8       2  75.00%   24, 28
packages/ragbits-core/src/ragbits/core/llms/clients/__init__.py                                                  4       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/clients/base.py                                                     23       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/clients/exceptions.py                                               14       6  57.14%   7-8, 17, 26-27, 36
packages/ragbits-core/src/ragbits/core/llms/clients/litellm.py                                                  50      10  80.00%   10-11, 70, 108, 122-127
packages/ragbits-core/src/ragbits/core/llms/clients/local.py                                                    37      12  67.57%   11-12, 62-70, 92-103
packages/ragbits-core/src/ragbits/core/prompt/__init__.py                                                        2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/base.py                                                           18       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/parsers.py                                                        34       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/prompt.py                                                        110       2  98.18%   107, 111
packages/ragbits-core/src/ragbits/core/prompt/discovery/__init__.py                                              2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/prompt_discovery.py                                     33       2  93.94%   55-56
packages/ragbits-core/src/ragbits/core/utils/_pyproject.py                                                      25       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/config_handling.py                                                  9       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/decorators.py                                                      28       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_store/__init__.py                                                 13       1  92.31%   29
packages/ragbits-core/src/ragbits/core/vector_store/base.py                                                     12       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_store/chromadb_store.py                                           57       7  87.72%   9-10, 60-67, 84, 126
packages/ragbits-core/src/ragbits/core/vector_store/in_memory.py                                                18       0  100.00%
packages/ragbits-core/tests/unit/__init__.py                                                                     0       0  100.00%
packages/ragbits-core/tests/unit/llms/__init__.py                                                                0       0  100.00%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                           63       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/__init__.py                                                        3       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_default_llm.py                                            8       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_llm_from_factory.py                                       8       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_has_default_llm.py                                            8       0  100.00%
packages/ragbits-core/tests/unit/prompts/__init__.py                                                             0       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_parsers.py                                                        65       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_prompt.py                                                        143       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/__init__.py                                                   0       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/prompt_classes_for_tests.py                                  30       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/test_prompt_discovery.py                                     18       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/__init__.py                    2       1  50.00%   3
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/__init__.py            3       2  33.33%   2-4
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt1.py       14       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt2.py       14       0  100.00%
packages/ragbits-core/tests/unit/utils/test_decorators.py                                                       26       2  92.31%   17, 39
packages/ragbits-core/tests/unit/utils/pyproject/test_find.py                                                   13       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_config.py                                              9       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_instace.py                                            27       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_chromadb_store.py                                           69       4  94.20%   31, 34, 39, 44
packages/ragbits-core/tests/unit/vector_stores/test_simple_vector_store.py                                      16       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/__init__.py                                         2       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                           67       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/__init__.py                               0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                              60       3  95.00%   53, 96, 143
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                               37       2  94.59%   34, 107
packages/ragbits-document-search/src/ragbits/document_search/documents/exceptions.py                            11       5  54.55%   7-8, 17, 26-27
packages/ragbits-document-search/src/ragbits/document_search/documents/sources.py                               93      16  82.80%   13-14, 63, 76, 160-165, 203-206, 210-211
packages/ragbits-document-search/src/ragbits/document_search/ingestion/__init__.py                               0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/document_processor.py                    29       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/__init__.py                    13       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/base.py                        14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/dummy.py                       11       1  90.91%   27
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/__init__.py        0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/default.py        45       4  91.11%   97, 102-103, 134
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/images.py         39      15  61.54%   61-68, 75-83, 94, 107
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/pdf.py            20       6  70.00%   24, 36-44
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/utils.py          33      10  69.70%   50, 61-62, 77-80, 104-119
packages/ragbits-document-search/src/ragbits/document_search/retrieval/__init__.py                               0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py                   15       4  73.33%   39-44
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py                        7       1  85.71%   32
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py                        20       9  55.00%   27-28, 45-48, 65-67
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py                        4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/prompts.py                    16       4  75.00%   49-54
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/__init__.py                    13       1  92.31%   27
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/base.py                         6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/noop.py                         7       0  100.00%
packages/ragbits-document-search/tests/__init__.py                                                               0       0  100.00%
packages/ragbits-document-search/tests/helpers.py                                                                3       0  100.00%
packages/ragbits-document-search/tests/integration/__init__.py                                                   0       0  100.00%
packages/ragbits-document-search/tests/integration/test_sources.py                                              23      10  56.52%   22-32, 40-45
packages/ragbits-document-search/tests/integration/test_unstructured.py                                         47      10  78.72%   46-52, 65-71
packages/ragbits-document-search/tests/unit/__init__.py                                                          0       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_processor.py                                          17       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_search.py                                             75       0  100.00%
packages/ragbits-document-search/tests/unit/test_documents.py                                                   13       0  100.00%
packages/ragbits-document-search/tests/unit/test_elements.py                                                    15       0  100.00%
packages/ragbits-document-search/tests/unit/test_local_file_source.py                                           13       0  100.00%
packages/ragbits-document-search/tests/unit/test_providers.py                                                   31       0  100.00%
packages/ragbits-document-search/tests/unit/test_sources.py                                                     25       0  100.00%
TOTAL                                                                                                         2081     230  88.95%

Diff against main

Filename                                                                                         Stmts    Miss  Cover
---------------------------------------------------------------------------------------------  -------  ------  --------
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                            +11      +8  -53.33%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py       +2      +1  -3.59%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py           +2      +1  -14.29%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py           +20      +9  +55.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py           -1       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/prompts.py       +16      +4  +75.00%
TOTAL                                                                                              +50     +23  -0.86%

Results for commit: fd4a48d

Minimum allowed coverage is 60%

♻️ This comment has been updated with latest results

Contributor

github-actions bot commented Oct 17, 2024

Trivy scanning results.

.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA (secrets)

Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)

MEDIUM: JWT (jwt-token)
════════════════════════════════════════
JWT token
────────────────────────────────────────
.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA:80
────────────────────────────────────────
78 >>> encoded = jwt.encode({"some": "payload"}, "secret", algorithm="HS256")
79 >>> print(encoded)
80 [ *********************************************************************************************************
81 >>> jwt.decode(encoded, "secret", algorithms=["HS256"])
────────────────────────────────────────

.venv/lib/python3.10/site-packages/litellm/llms/huggingface_llms_metadata/hf_text_generation_models.txt (secrets)

Total: 1 (MEDIUM: 0, HIGH: 0, CRITICAL: 1)

CRITICAL: HuggingFace (hugging-face-access-token)
════════════════════════════════════════
Hugging Face Access Token
────────────────────────────────────────
.venv/lib/python3.10/site-packages/litellm/llms/huggingface_llms_metadata/hf_text_generation_models.txt:36162
────────────────────────────────────────
36160 mncai/Llama2-7B-Active_3rd-floor-LoRA-dim64_epoch4
36161 ajcdp/CM
36162 [ Nagharjun17/*************************************
36163 BigSalmon/InformalToFormalLincoln114Paraphrase
────────────────────────────────────────

.venv/lib/python3.10/site-packages/litellm/proxy/_types.py (secrets)

Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)

MEDIUM: Slack (slack-web-hook)
════════════════════════════════════════
Slack Webhook
────────────────────────────────────────
.venv/lib/python3.10/site-packages/litellm/proxy/_types.py:1288
────────────────────────────────────────
1286 alert_to_webhook_url: Optional[Dict] = Field(
1287 None,
1288 [ bhook_url: {'budget_alerts': '*****************************************************************************'}`",
1289 )
────────────────────────────────────────

"""
Rephrase a query.

Args:
query: The query to rephrase.
options: OptionaL options to fine-tune the rephraser behavior.
Collaborator

@ludwiktrammer ludwiktrammer Oct 17, 2024

Suggested change
options: OptionaL options to fine-tune the rephraser behavior.
options: Optional configuration of the rephraser behavior.

(it's very nit-picky but fixes the "L" in "OptionaL", the awkward sounding "Optional options", and gets rid of the word "fine-tuning" which got me confused at first since usually it has a different meaning when used with LLMs)

Collaborator Author

thank you, fixed

"""
Mock implementation which outputs the same query as in input.

Args:
query: The query to rephrase.
options: OptionaL options to fine-tune the rephraser behavior.
Collaborator

Suggested change
options: OptionaL options to fine-tune the rephraser behavior.
options: Optional configuration of the rephraser behavior.

Collaborator Author

fixed :)
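
For context on the two docstring threads above, here is a minimal sketch of a pass-through rephraser whose docstring uses the suggested wording. The class names `BaseRephraser` and `EchoRephraser`, the async signature, the `dict` type of `options`, and the `str` return type are all assumptions made for illustration; they are not taken from the code in this PR.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseRephraser(ABC):
    """Hypothetical base interface; the real one in the PR may differ."""

    @abstractmethod
    async def rephrase(self, query: str, options: Optional[dict[str, Any]] = None) -> str:
        """
        Rephrase a query.

        Args:
            query: The query to rephrase.
            options: Optional configuration of the rephraser behavior.
        """


class EchoRephraser(BaseRephraser):
    """Hypothetical no-op rephraser: returns the query unchanged."""

    async def rephrase(self, query: str, options: Optional[dict[str, Any]] = None) -> str:
        # options is accepted for interface compatibility and ignored here.
        return query


print(asyncio.run(EchoRephraser().rephrase("what is ragbits?")))
```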

If not provided, the default `QueryRephraserPrompt` is used.
"""

self._prompt = prompt or QueryRephraserPrompt
Collaborator

Wouldn't this mean that _prompt sometimes is an instance of QueryRephraserPrompt and sometimes it's the class itself? I also don't see self._prompt used anywhere. In the rephrase method QueryRephraserPrompt is used directly.

Collaborator Author

Right, I cleaned it up now and prompt is no longer an attribute
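
To illustrate the concern raised in this thread, here is a small self-contained sketch. `QueryRephraserPrompt` below is a stand-in class written for this example, not the ragbits class, and `AmbiguousRephraser` and `ClassOnlyRephraser` are hypothetical names. The first class reproduces the `prompt or QueryRephraserPrompt` pattern; the second shows one possible way to keep the attribute consistent. Note that the PR author resolved this differently, by dropping the attribute altogether.

```python
from __future__ import annotations


class QueryRephraserPrompt:
    """Stand-in prompt class for this example (not the ragbits implementation)."""

    def __init__(self, query: str) -> None:
        self.query = query

    def render(self) -> str:
        return f"Rephrase the query: {self.query}"


class AmbiguousRephraser:
    """Reproduces the pattern questioned in the review."""

    def __init__(self, prompt: QueryRephraserPrompt | None = None) -> None:
        # With an argument, _prompt is an instance; without one,
        # it falls back to the class object itself.
        self._prompt = prompt or QueryRephraserPrompt


class ClassOnlyRephraser:
    """One possible alternative: always store a class, instantiate per query."""

    def __init__(self, prompt_cls: type[QueryRephraserPrompt] | None = None) -> None:
        self._prompt_cls = prompt_cls or QueryRephraserPrompt

    def build_prompt(self, query: str) -> QueryRephraserPrompt:
        return self._prompt_cls(query)


default = AmbiguousRephraser()
custom = AmbiguousRephraser(QueryRephraserPrompt("hello"))
print(isinstance(default._prompt, type))  # True: the class itself
print(isinstance(custom._prompt, type))   # False: an instance
```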

@akotyla akotyla merged commit 17f9ac2 into main Oct 23, 2024
3 checks passed
@micpst micpst deleted the 95-featdocument-search-llm-based-query-rephraser branch November 6, 2024 20:56
Labels
document search Changes to the document search package feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(document-search): LLM-based query rephraser
3 participants