```python
@component
class LLMBlenderRanker()
```
Implements an LLM output ranking method with a pairwise reward model using the LLM Blender framework.
Usage Example:

```python
llm_ranker = LLMBlenderRanker(model="llm-blender/PairRM")

answers = [
    GeneratedAnswer(data="Paris is the capital of France.", query="What makes Paris unique?", documents=[]),
    GeneratedAnswer(
        data="The Eiffel Tower is an iconic landmark in Paris.", query="What makes Paris unique?", documents=[]
    ),
    GeneratedAnswer(data="Berlin is a beautiful city.", query="What makes Paris unique?", documents=[]),
]

output = llm_ranker.run(answers=answers)
ranked_answers = output["answers"]
print(ranked_answers)

# [
#     GeneratedAnswer(
#         data="The Eiffel Tower is an iconic landmark in Paris.",
#         query="What makes Paris unique?",
#         documents=[],
#         meta={},
#     ),
#     GeneratedAnswer(
#         data="Paris is the capital of France.", query="What makes Paris unique?", documents=[], meta={}
#     ),
#     GeneratedAnswer(data="Berlin is a beautiful city.", query="What makes Paris unique?", documents=[], meta={}),
# ]
```
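Because the `answers` input is variadic, the ranker can merge candidate answers produced by several generators in a Haystack pipeline. The sketch below is illustrative rather than part of this package: the generator models, component names, and the use of `AnswerBuilder` to wrap generator replies into `GeneratedAnswer` objects are assumptions about a typical Haystack 2.x setup.

```python
from haystack import Pipeline
from haystack.components.builders import AnswerBuilder
from haystack.components.generators import HuggingFaceLocalGenerator
# LLMBlenderRanker is imported from this integration package (import path assumed).

pipeline = Pipeline()
pipeline.add_component("generator_a", HuggingFaceLocalGenerator(model="google/flan-t5-large"))
pipeline.add_component("generator_b", HuggingFaceLocalGenerator(model="HuggingFaceH4/zephyr-7b-beta"))
pipeline.add_component("builder_a", AnswerBuilder())
pipeline.add_component("builder_b", AnswerBuilder())
pipeline.add_component("ranker", LLMBlenderRanker(model="llm-blender/PairRM"))

# Replies from each generator are wrapped into GeneratedAnswer objects,
# then both answer lists feed the ranker's variadic "answers" input.
pipeline.connect("generator_a.replies", "builder_a.replies")
pipeline.connect("generator_b.replies", "builder_b.replies")
pipeline.connect("builder_a.answers", "ranker.answers")
pipeline.connect("builder_b.answers", "ranker.answers")

query = "What makes Paris unique?"
result = pipeline.run(
    {
        "generator_a": {"prompt": query},
        "generator_b": {"prompt": query},
        "builder_a": {"query": query},
        "builder_b": {"query": query},
    }
)
print(result["ranker"]["answers"])
```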
```python
def __init__(model: str = "llm-blender/PairRM",
             device: str = "cpu",
             model_kwargs: Optional[Dict[str, Any]] = None)
```

Initialize an LLMBlenderRanker.
Arguments:

- `model`: Local path or name of the model in Hugging Face's model hub, such as `llm-blender/PairRM`.
- `device`: The device on which the model is loaded. If `None`, the default device is automatically selected.
- `model_kwargs`: Keyword arguments to be passed to the LLM Blender model.
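When the ranker is used standalone rather than inside a pipeline, it is typically instantiated and warmed up explicitly before ranking. A minimal sketch, assuming a GPU is available (the `"cuda"` device string below is illustrative, not a default of this component):

```python
# Hedged sketch: device choice is an assumption for illustration only.
llm_ranker = LLMBlenderRanker(model="llm-blender/PairRM", device="cuda")
llm_ranker.warm_up()  # loads the pair ranking model before the first run() call
```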
```python
def warm_up()
```
Warm up the pair ranking model used for scoring the answers.
```python
@component.output_types(answers=List[GeneratedAnswer])
def run(answers: Variadic[List[GeneratedAnswer]])
```
Rank the output answers using the LLM Blender model.
Arguments:

- `answers`: A list of answers to be ranked.

Returns:

A dictionary with the following key:

- `answers`: The input answers ranked by the pairwise reward model.
```python
class LLMBlenderEvaluator()
```
Implements an evaluator for assessing the performance of predictions against labels using BLEURT, BARTScore, and BERTScore.
```python
def __init__(preds, labels)
```

Evaluates the performance of predictions against labels using BLEURT, BARTScore, and BERTScore.
Arguments:

- `preds`: A list of predicted outputs.
- `labels`: A list of reference or target outputs.
```python
def prepare_inputs()
```
Ensures that predictions and labels are formatted correctly before computing scores.
```python
def compute_mean_scores(scores) -> float
```
Computes the mean of a list of scores.
Arguments:

- `scores`: A list of scores.
Returns:
The mean score.
```python
def compute_bleurt() -> float
```
Computes the BLEURT score for the provided predictions and labels.
Returns:
The BLEURT score.
```python
def compute_bartscore() -> float
```
Computes the BARTScore for the provided predictions and labels.
Returns:
The BARTScore.
```python
def compute_bertscore() -> float
```
Computes the BERTScore for the provided predictions and labels.
Returns:
The BERTScore.
```python
def compute_metrics() -> Dict[str, float]
```
Computes BLEURT, BARTScore, and BERTScore for the provided predictions and labels.
Returns:
A dictionary containing the computed metrics.
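A minimal sketch of using the evaluator end to end. The prediction and label strings are illustrative, and the metric keys shown in the comment are assumptions about the returned dictionary rather than a guaranteed schema:

```python
evaluator = LLMBlenderEvaluator(
    preds=["The Eiffel Tower is an iconic landmark in Paris.", "Berlin is a beautiful city."],
    labels=["The Eiffel Tower is a famous landmark in Paris.", "Berlin is the capital of Germany."],
)
metrics = evaluator.compute_metrics()
print(metrics)  # e.g. {"bleurt": 0.45, "bartscore": -2.1, "bertscore": 0.89} (illustrative keys and values)
```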