I know you've got LLM-based filtering of chunks implemented, but I'd love to be able to use (and customize prompts for) reranking with a "small" (really just cheap) LLM like Gemini Flash, GPT-4o mini, DeepSeek, etc.
My understanding (from an LLM) of how DISABLE_LLM_CHUNK_FILTER works and its relation to reranking:
DISABLE_LLM_CHUNK_FILTER primarily controls a separate feature, which I'll call "LLM-based Relevance Filtering", not traditional reranking. This flag determines whether an LLM (like GPT-3.5/4) is used to filter out entire chunks/sections before they are even considered for the final answer generation. This filtering happens after retrieval but before a traditional reranking step.
Reranking, as implemented in Onyx, traditionally uses cross-encoder models (or potentially a cloud API like Cohere Rerank) to reorder the retrieved chunks based on their relevance scores. This is a separate step from the LLM-based chunk filtering.
Here's one implementation for reference: https://github.com/castorini/rank_llm/
And here are Gemini's thoughts on implementing the feature, given your entire repo as input...
Okay, here's an outline of the plan to implement LLM-based reranking in Onyx using a smaller, more cost-effective LLM like "GPT-4o-mini", followed by a draft implementation based on your provided code structure.
Outline of Changes and Feature Plan
1. Goals
Enable reranking with smaller, more cost-effective LLMs than the primary QA LLM.
Utilize the provided prompt template for the reranking task.
Handle potential parsing failures from the LLM response with retry logic.
Integrate the new reranking capability into the existing architecture with minimal disruption.
Provide configuration options to enable/disable this feature and customize the LLM used.
2. Design Considerations
Modularity: Design the implementation to be modular, allowing easy swapping of LLMs and prompt templates.
Error Handling: Implement robust error handling for cases where the LLM fails to provide a correctly formatted response. This includes retries and potentially falling back to a default behavior (e.g., no reranking); see the sketch after this list.
Performance: Consider the performance impact of using an LLM for reranking. This might involve batching requests or optimizing the prompt to minimize token usage.
Configuration: Allow users to configure the LLM model, API key, and other relevant parameters via environment variables or configuration files.
Integration: Ensure seamless integration with the existing search pipeline, specifically the rerank_settings in the SearchQuery model.
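To make the error-handling consideration concrete, here's a minimal sketch of a retry-and-fallback wrapper. `rerank_with_retries` is a hypothetical helper, not existing Onyx code:

```python
from collections.abc import Callable


def rerank_with_retries(
    rerank_fn: Callable[[], list[int]],
    num_docs: int,
    max_retries: int = 2,
) -> list[int]:
    """Retry the LLM rerank call on exceptions or empty parses, then
    fall back to the original retrieval order (i.e., no reranking)."""
    for _ in range(max_retries + 1):
        try:
            ranks = rerank_fn()
        except Exception:
            continue
        if ranks:  # an empty parse counts as a failure worth retrying
            return ranks
    return list(range(num_docs))  # fallback: keep the retrieval order
```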
3. Implementation Plan
Model Interface:
Create a new class, LLMReranker, that implements the interface of a reranking model.
This class will handle interactions with the LLM, including prompt construction and response parsing.
The constructor will accept parameters for model configuration (similar to EmbeddingModel).
Prompt Builder:
Create a helper function or class to construct the prompt for the LLM based on the provided template and the number of documents.
Response Parser:
Implement a function to parse the LLM's response and extract the ranked order of document IDs.
Include error handling and retry logic for parsing failures.
Integration with Search Pipeline:
Modify the search_pipeline to use the LLMReranker when configured.
Add a new field to the RerankingDetails model to specify the LLM reranker model.
Configuration:
Add new configuration options to model_configs.py for the LLM reranker model, API key, etc.
Testing:
Add unit tests to verify the LLMReranker functionality, including prompt construction, response parsing, and error handling.
Add integration tests to verify the entire flow, including the interaction with the SearchPipeline.
4. Draft Implementation
Here's a draft implementation based on the provided code structure:
File: onyx/llm/utils.py
```python
import re


def parse_rank_results(model_output: str) -> list[int]:
    """Parse the output string from the ranking prompt to extract the ranked
    list of doc indices. Handles a variety of formats, including:
    - Simple ordered lists (e.g., "1 2 3")
    - Lists with brackets (e.g., "[1] [2] [3]" or "1 > 2 > 3")
    - Lists with or without punctuation
    """
    if not model_output:
        return []

    # Try to extract numbers within brackets first
    extracted_numbers = re.findall(r"\[(\d+)\]", model_output)
    if not extracted_numbers:
        # If no brackets, try numbers separated by spaces or other delimiters
        extracted_numbers = re.findall(r"\b\d+\b", model_output)
    try:
        return [int(num) for num in extracted_numbers]
    except ValueError:
        return []
```
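A quick sanity check of how the parser behaves on a few plausible LLM outputs (the example outputs are made up):

```python
parse_rank_results("[2] [1] [3]")  # -> [2, 1, 3]
parse_rank_results("2 > 1 > 3")    # -> [2, 1, 3] (falls back to bare numbers)
parse_rank_results("")             # -> []
```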
File: onyx/llm/reranking.py
```python
import re

from langchain.schema.messages import HumanMessage
from langchain.schema.messages import SystemMessage

from onyx.llm.interfaces import LLM
from onyx.utils.logger import setup_logger

logger = setup_logger()

MAX_RERANKING_TOKENS = 4096


class LLMReranker:
    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    def rerank(self, query: str, contexts: list[str]) -> list[int]:
        """Reranks the given contexts based on the query using the specified
        LLM. Returns a list of indices corresponding to the documents in
        descending order of relevance."""
        if not contexts:
            return []

        system_prompt = (
            "You are an expert at evaluating the relevance of a document to a search query.\n"
            "Now, you will be provided a query from the user. The user is trying to answer this "
            "query by using your organization's internal knowledge base. You are given a list of "
            "text chunks, each extracted from a different document, and you must order them by "
            "how relevant they are to the user's query."
        )
        system_msg = SystemMessage(content=system_prompt)

        user_message = (
            f"Here are {len(contexts)} chunks from different documents "
            "with their document numbers:\n\n"
            "```\n"
        )
        for i, context in enumerate(contexts):
            user_message += f"[{i + 1}] {context}\n"
        user_message += "```\n"
        user_message += f"Query: {query}\n"
        user_message += (
            "Provide the ordered list of document numbers, from most relevant to least relevant."
            "\nOrder: "
        )
        user_msg = HumanMessage(content=user_message)

        messages = [system_msg, user_msg]
        result = self.llm.invoke(messages).content
        try:
            # Extract the list of document numbers using a regular expression
            rank_list = re.findall(r"\[(\d+)\]", result)
            # Document numbers are 1-based in the prompt; convert to 0-based indices
            return [int(x) - 1 for x in rank_list]
        except Exception as e:
            logger.warning(
                f"Failed to parse LLM response for reranking. Returning default order. Error: {e}"
            )
            return list(range(len(contexts)))
```
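Usage would look roughly like this; the `llm` object is assumed to be an already-constructed Onyx `LLM` instance pointed at a cheap model (how you obtain it depends on Onyx's LLM factory, which isn't shown here):

```python
from onyx.llm.reranking import LLMReranker

# `llm` is assumed to be an existing Onyx LLM instance, e.g. gpt-4o-mini
reranker = LLMReranker(llm=llm)

contexts = [
    "Chunk about vacation policy...",
    "Chunk about expense reporting...",
    "Chunk about PTO accrual rates...",
]
order = reranker.rerank("How fast does PTO accrue?", contexts)
# e.g. order == [2, 0, 1]; apply it to reorder the retrieved sections
reranked = [contexts[i] for i in order]
```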
5. Integration with Search Pipeline
You will need to update the SearchPipeline class to include the LLMReranker and call it when reranking is enabled. This will likely involve:
Adding an optional llm_reranker: LLMReranker | None field to the SearchPipeline class.
In the rerank_sections method, check whether llm_reranker is set; if so, use it to rerank the sections (see the sketch below).
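Here's a minimal sketch of what that hook could look like; SearchPipeline's real constructor and rerank_sections signature differ, so treat the field names and the `combined_content` attribute as assumptions:

```python
from onyx.llm.reranking import LLMReranker


class SearchPipeline:
    # Hypothetical excerpt -- only the pieces relevant to LLM reranking
    def __init__(self, llm_reranker: LLMReranker | None = None) -> None:
        self.llm_reranker = llm_reranker

    def rerank_sections(self, query: str, sections: list) -> list:
        if self.llm_reranker is None:
            # Fall through to the existing cross-encoder path (elided here)
            return sections
        contents = [section.combined_content for section in sections]
        order = self.llm_reranker.rerank(query, contents)
        return [sections[i] for i in order]
```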
6. Configuration
You'll need to add configuration options to specify the LLM reranker model. This could be done in model_configs.py or a new configuration file specifically for reranking. You'll need to define the following (a sketch follows the list):
LLM_RERANKER_MODEL: The name of the LLM model to use for reranking (e.g., "gpt-4o-mini").
LLM_RERANKER_API_KEY: The API key for the LLM provider (if needed).
LLM_RERANKER_PROVIDER_TYPE: The LLM provider type (e.g., "openai", "anthropic", etc.).
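For example, reading them from the environment in model_configs.py (variable names follow the list above; the defaults are just suggestions):

```python
import os

# Hypothetical additions to model_configs.py
LLM_RERANKER_MODEL = os.environ.get("LLM_RERANKER_MODEL", "gpt-4o-mini")
LLM_RERANKER_API_KEY = os.environ.get("LLM_RERANKER_API_KEY")
LLM_RERANKER_PROVIDER_TYPE = os.environ.get("LLM_RERANKER_PROVIDER_TYPE", "openai")
```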
This draft implementation should provide a solid starting point for implementing LLM-based reranking in Onyx. Remember to thoroughly test the changes and handle any edge cases or potential errors.