I know you've got LLM-based filtering of chunks implemented, but I'd love to be able to use (and customize prompts for) reranking with a "small" (really just cheap) LLM like Gemini Flash, GPT-4o mini, DeepSeek, etc.
My understanding (from an LLM) of how DISABLE_LLM_CHUNK_FILTER works and its relation to reranking:
DISABLE_LLM_CHUNK_FILTER primarily controls a separate feature, which I'll call "LLM-based Relevance Filtering", not traditional reranking. This flag determines whether an LLM (like GPT-3.5/4) is used to filter out entire chunks/sections before they are even considered for the final answer generation. This filtering happens after retrieval but before a traditional reranking step.
Reranking, as implemented in Onyx, traditionally uses cross-encoder models (or potentially a cloud API like Cohere Rerank) to reorder the retrieved chunks based on their relevance scores. This is a separate step from the LLM-based chunk filtering.
Here's one implementation for reference: https://github.com/castorini/rank_llm/
And here are Gemini's thoughts on implementing the feature, given your entire repo as input...
Okay, here's an outline of the plan to implement LLM-based reranking in Onyx using a smaller, more cost-effective LLM like "GPT-4o-mini", followed by a draft implementation based on your provided code structure.
Outline of Changes and Feature Plan
1. Goals
Enable reranking with smaller, more cost-effective LLMs than the primary QA LLM.
Utilize the provided prompt template for the reranking task.
Handle potential parsing failures from the LLM response with retry logic.
Integrate the new reranking capability into the existing architecture with minimal disruption.
Provide configuration options to enable/disable this feature and customize the LLM used.
2. Design Considerations
Modularity: Design the implementation to be modular, allowing easy swapping of LLMs and prompt templates.
Error Handling: Implement robust error handling for cases where the LLM fails to provide a correctly formatted response. This includes retries and potentially falling back to a default behavior (e.g., no reranking); see the sketch after this list.
Performance: Consider the performance impact of using an LLM for reranking. This might involve batching requests or optimizing the prompt to minimize token usage.
Configuration: Allow users to configure the LLM model, API key, and other relevant parameters via environment variables or configuration files.
Integration: Ensure seamless integration with the existing search pipeline, specifically the rerank_settings in the SearchQuery model.
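To make the error-handling consideration concrete, here's a minimal sketch of a retry-and-fallback wrapper. `rerank_with_retries` is a hypothetical helper, not existing Onyx code:

```python
from collections.abc import Callable


def rerank_with_retries(
    rerank_fn: Callable[[], list[int]],
    num_docs: int,
    max_retries: int = 2,
) -> list[int]:
    """Retry the LLM rerank call on exceptions or empty parses, then
    fall back to the original retrieval order (i.e., no reranking)."""
    for _ in range(max_retries + 1):
        try:
            ranks = rerank_fn()
        except Exception:
            continue
        if ranks:  # an empty parse counts as a failure worth retrying
            return ranks
    return list(range(num_docs))  # fallback: keep the retrieval order
```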
3. Implementation Plan
Model Interface:
Create a new class, LLMReranker, that implements the interface of a reranking model.
This class will handle interactions with the LLM, including prompt construction and response parsing.
The constructor will accept parameters for model configuration (similar to EmbeddingModel).
Prompt Builder:
Create a helper function or class to construct the prompt for the LLM based on the provided template and the number of documents.
Response Parser:
Implement a function to parse the LLM's response and extract the ranked order of document IDs.
Include error handling and retry logic for parsing failures.
Integration with Search Pipeline:
Modify the search_pipeline to use the LLMReranker when configured.
Add a new field to the RerankingDetails model to specify the LLM reranker model.
Configuration:
Add new configuration options to model_configs.py for the LLM reranker model, API key, etc.
Testing:
Add unit tests to verify the LLMReranker functionality, including prompt construction, response parsing, and error handling.
Add integration tests to verify the entire flow, including the interaction with the SearchPipeline.
4. Draft Implementation
Here's a draft implementation based on the provided code structure:
File: onyx/llm/utils.py
```python
import re


def parse_rank_results(model_output: str) -> list[int]:
    """Parse the output string from the ranking prompt to extract the ranked
    list of doc indices. Handles a variety of formats, including:
    - Simple ordered lists (e.g., "1 2 3")
    - Lists with brackets (e.g., "[1] [2] [3]" or "1 > 2 > 3")
    - Lists with or without punctuation
    """
    if not model_output:
        return []

    # Try to extract numbers within brackets first
    extracted_numbers = re.findall(r"\[(\d+)\]", model_output)
    if not extracted_numbers:
        # If no brackets, try numbers separated by spaces or other delimiters
        extracted_numbers = re.findall(r"\b\d+\b", model_output)
    try:
        return [int(num) for num in extracted_numbers]
    except ValueError:
        return []
```
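A quick sanity check of how the parser behaves on a few plausible LLM outputs (the example outputs are made up):

```python
parse_rank_results("[2] [1] [3]")  # -> [2, 1, 3]
parse_rank_results("2 > 1 > 3")    # -> [2, 1, 3] (falls back to bare numbers)
parse_rank_results("")             # -> []
```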
File: onyx/llm/reranking.py
```python
import re

from langchain.schema.messages import HumanMessage
from langchain.schema.messages import SystemMessage

from onyx.llm.interfaces import LLM
from onyx.utils.logger import setup_logger

logger = setup_logger()

MAX_RERANKING_TOKENS = 4096


class LLMReranker:
    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    def rerank(self, query: str, contexts: list[str]) -> list[int]:
        """Reranks the given contexts based on the query using the specified
        LLM. Returns a list of indices corresponding to the documents in
        descending order of relevance."""
        if not contexts:
            return []

        system_prompt = (
            "You are an expert at evaluating the relevance of a document to a search query.\n"
            "Now, you will be provided a query from the user. The user is trying to answer this "
            "query by using your organization's internal knowledge base. You are given a list of "
            "text chunks, each extracted from a different document, and you must order them by "
            "how relevant they are to the user's query."
        )
        system_msg = SystemMessage(content=system_prompt)

        user_message = (
            f"Here are {len(contexts)} chunks from different documents "
            "with their document numbers:\n\n"
            "```\n"
        )
        for i, context in enumerate(contexts):
            user_message += f"[{i + 1}] {context}\n"
        user_message += "```\n"
        user_message += f"Query: {query}\n"
        user_message += (
            "Provide the ordered list of document numbers, from most relevant to least relevant."
            "\nOrder: "
        )
        user_msg = HumanMessage(content=user_message)

        messages = [system_msg, user_msg]
        result = self.llm.invoke(messages).content
        try:
            # Extract the list of document numbers using a regular expression
            rank_list = re.findall(r"\[(\d+)\]", result)
            # Document numbers are 1-based in the prompt; convert to 0-based indices
            return [int(x) - 1 for x in rank_list]
        except Exception as e:
            logger.warning(
                f"Failed to parse LLM response for reranking. Returning default order. Error: {e}"
            )
            return list(range(len(contexts)))
```
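Usage would look roughly like this; the `llm` object is assumed to be an already-constructed Onyx `LLM` instance pointed at a cheap model (how you obtain it depends on Onyx's LLM factory, which isn't shown here):

```python
from onyx.llm.reranking import LLMReranker

# `llm` is assumed to be an existing Onyx LLM instance, e.g. gpt-4o-mini
reranker = LLMReranker(llm=llm)

contexts = [
    "Chunk about vacation policy...",
    "Chunk about expense reporting...",
    "Chunk about PTO accrual rates...",
]
order = reranker.rerank("How fast does PTO accrue?", contexts)
# e.g. order == [2, 0, 1]; apply it to reorder the retrieved sections
reranked = [contexts[i] for i in order]
```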
5. Integration with Search Pipeline
You will need to update the SearchPipeline class to include the LLMReranker and call it when reranking is enabled. This will likely involve:
Adding an optional llm_reranker: LLMReranker | None field to the SearchPipeline class.
In the rerank_sections method, check whether llm_reranker is set; if so, use it to rerank the sections (see the sketch below).
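Here's a minimal sketch of what that hook could look like; SearchPipeline's real constructor and rerank_sections signature differ, so treat the field names and the `combined_content` attribute as assumptions:

```python
from onyx.llm.reranking import LLMReranker


class SearchPipeline:
    # Hypothetical excerpt -- only the pieces relevant to LLM reranking
    def __init__(self, llm_reranker: LLMReranker | None = None) -> None:
        self.llm_reranker = llm_reranker

    def rerank_sections(self, query: str, sections: list) -> list:
        if self.llm_reranker is None:
            # Fall through to the existing cross-encoder path (elided here)
            return sections
        contents = [section.combined_content for section in sections]
        order = self.llm_reranker.rerank(query, contents)
        return [sections[i] for i in order]
```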
6. Configuration
You'll need to add configuration options to specify the LLM reranker model. This could be done in model_configs.py or a new configuration file specifically for reranking. You'll need to define the following (a sketch follows the list):
LLM_RERANKER_MODEL: The name of the LLM model to use for reranking (e.g., "gpt-4o-mini").
LLM_RERANKER_API_KEY: The API key for the LLM provider (if needed).
LLM_RERANKER_PROVIDER_TYPE: The LLM provider type (e.g., "openai", "anthropic", etc.).
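For example, reading them from the environment in model_configs.py (variable names follow the list above; the defaults are just suggestions):

```python
import os

# Hypothetical additions to model_configs.py
LLM_RERANKER_MODEL = os.environ.get("LLM_RERANKER_MODEL", "gpt-4o-mini")
LLM_RERANKER_API_KEY = os.environ.get("LLM_RERANKER_API_KEY")
LLM_RERANKER_PROVIDER_TYPE = os.environ.get("LLM_RERANKER_PROVIDER_TYPE", "openai")
```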
This draft implementation should provide a solid starting point for implementing LLM-based reranking in Onyx. Remember to thoroughly test the changes and handle any edge cases or potential errors.