Why is the return value of Score empty when using Langsmith for RAG evaluation? #28448

RXZAN · 2024-12-02T09:15:03Z


Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

import json
import os
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_text_splitters import RecursiveJsonSplitter
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langsmith.evaluation import LangChainStringEvaluator
from langsmith.schemas import Run, Example
from langsmith import evaluate, Client
from langsmith import traceable
from langchain import hub

os.environ["USER_AGENT"] = "MyCustomUserAgent/1.0"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "RAG_test"
os.environ["LANGCHAIN_API_KEY"] = "my_API"  # LangChain API Key
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

client = Client()
dataset_name = "RAG_text"
dataset = client.clone_public_dataset(
    "https://smith.langchain.com/public/a63525f9-bdf2-4512-83e3-077dc9417f96/d",
    dataset_name=dataset_name
)

embeddings_model = OllamaEmbeddings(model="bge-large")
llm = ChatOllama(model="llama3.1:8b")

file_path = r"my_file"  # placeholder path to the JSON file
with open(file_path, 'r', encoding='utf-8') as file:
    json_data = json.load(file)
splitter = RecursiveJsonSplitter(max_chunk_size=1000)
docs = splitter.create_documents(texts=[json_data])
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
all_splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings_model)
retriever = vectorstore.as_retriever(search_type="similarity")

system_prompt = """
You need to answer the questions from the retrieved documents from a vector database.
Context: {context}
Question: {question}
"""
custom_rag_prompt = PromptTemplate.from_template(system_prompt)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

grade_prompt_answer_accuracy = hub.pull("langchain-ai/rag-answer-vs-reference")

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

def predict_rag_answer(example: dict):
    """Use this for answer evaluation"""
    response = rag_chain.invoke(example["input_question"])
    # print("response-->", response)
    return {"answer": response}


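def predict_rag_answer_with_context(example: dict):
    """Use this for evaluation of retrieved documents and hallucinations"""
    question = example["input_question"]
    # Retrieve once so the same documents feed the prompt and are returned for grading.
    retrieved_docs = retriever.invoke(question)
    # rag_chain expects a plain string and returns a plain string, so build the
    # answer from the prompt, model, and parser directly instead.
    answer = (custom_rag_prompt | llm | StrOutputParser()).invoke(
        {"context": format_docs(retrieved_docs), "question": question}
    )
    return {"answer": answer, "contexts": retrieved_docs}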


def answer_evaluator(run, example) -> dict:
    """
    A simple evaluator for RAG answer accuracy
    """

    # Get question, ground truth answer, RAG chain answer
    input_question = example.inputs["input_question"]
    print("input_question-->", input_question)
    reference = example.outputs["output_answer"]
    print("refer_output-->", reference)
    prediction = run.outputs["answer"]
    print("predict-->", prediction)

    # Structured prompt
    answer_grader = grade_prompt_answer_accuracy | llm

    # Run evaluator
    score = answer_grader.invoke({"question": input_question,
                                  "correct_answer": reference,
                                  "student_answer": prediction})
    print("score-->", score)
    score_value = score["Score"]
    score_float = float(score_value)

    if score_value is None:
        print("Error: score is None.")
        return {"key": "answer_v_reference_score", "score": 0}  

    return {"key": "answer_v_reference_score", "score": score_float}

experiment_results = evaluate(
    predict_rag_answer,
    data=dataset_name,
    evaluators=[answer_evaluator],
    experiment_prefix="rag-answer-v-reference",
)
# print("experiment_results-->", experiment_results)


Error code:
TypeError("'NoneType' object is not subscriptable")

Traceback (most recent call last):
  File "C:\Users\dell\AppData\Roaming\Python\Python312\site-packages\langsmith\run_helpers.py", line 609, in wrapper
    function_result = run_container["context"].run(func, *args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python_projects\Whole\RAG_evaluate.py", line 110, in answer_evaluator
    score_value = score["Score"]
                  ~~~~~^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
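
The traceback shows that `score` was already None when it was subscripted, so the `if score_value is None` check further down never gets a chance to run. A minimal sketch of one way to reorder this, assuming the grader is passed in as a plain callable (for example `answer_grader.invoke`): test for None before indexing, and retry a few times because the failure is intermittent. The helper name `grade_with_retry` and the `max_attempts` parameter are hypothetical, not part of LangSmith, and the fallback score of 0 simply mirrors the original code.

```python
from typing import Any, Callable, Mapping, Optional


def grade_with_retry(
    invoke_grader: Callable[[Mapping[str, str]], Optional[Mapping[str, Any]]],
    payload: Mapping[str, str],
    max_attempts: int = 3,  # hypothetical knob, tune as needed
) -> dict:
    """Call the grader until it returns a usable score or attempts run out."""
    for _ in range(max_attempts):
        result = invoke_grader(payload)
        # Check for None BEFORE subscripting; this is what the traceback trips on.
        if result is None:
            continue
        try:
            score = float(result["Score"])
        except (KeyError, TypeError, ValueError):
            # Reply was parsed but has no numeric "Score"; try again.
            continue
        return {"key": "answer_v_reference_score", "score": score}
    # Same fallback as the original evaluator: report 0 when grading failed.
    return {"key": "answer_v_reference_score", "score": 0}
```

Inside `answer_evaluator`, this would replace the direct `score["Score"]` lookup, e.g. `return grade_with_retry(answer_grader.invoke, {"question": input_question, "correct_answer": reference, "student_answer": prediction})`. Note that a fallback of 0 is indistinguishable from a genuinely wrong answer in the experiment results, so logging the failed attempts may be worthwhile.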

Description

When I run the RAG evaluation, the evaluator sometimes returns an empty score and sometimes a normal one. Looking at the failing runs in LangSmith, the grader's output appears to be in the wrong format. How can I make the grader reply in the correct format every time so that I always get a score?

Correct output format:
{
"Explanation": "The student answer does not contain any conflicting statements. Although it mentions a plausible explanation for why the script address might be related to the second step, it is factually accurate relative to the ground truth answer.",
"Score": "1"
}

Output format when the error occurs:
To grade this student answer, I need to follow the given criteria.

Step 1: The first criterion is to grade the student answers based ONLY on their factual accuracy relative to the ground truth answer.
The student answer does not mention any specific parameters, but instead asks for clarification or more information. This means that the student's answer does not contain any conflicting statements with the ground truth.

Step 2: According to the second criterion, I need to ensure that the student answer does not contain any conflicting statements.
Since the student's answer does not claim anything about the parameters, it is safe from conflicting statements.

Step 3: The third criterion states that it is OK if the student answer contains more information than the ground truth answer, as long as it is factually accurate relative to the ground truth answer.
In this case, the student's answer contains less information than the ground truth, but what it does contain (asking for clarification) is indeed factually accurate.

Based on these steps and the criteria provided, I can now assign a score.

The student's answer does not provide any specific parameters and instead asks for more information. This means that their answer does meet all of the criteria mentioned in the prompt.

However, it seems like there was an expectation of providing actual parameters, which the student failed to do. But since they didn't claim anything conflicting or extra that wasn't accurate, I can give them a score based on this understanding.

Score: 0
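
The two outputs above differ only in shape: the first is a JSON object with a "Score" field, while the second is free-form reasoning that ends in a "Score: 0" line. A plausible reading, not confirmed in this thread, is that the model sometimes ignores the structured format and the parser then yields None. As a rough sketch of a tolerant fallback, assuming the raw reply text is available (for instance by piping the grader through `StrOutputParser()` instead of relying on structured output), the score can be pulled from either shape. The function name `extract_score` is hypothetical.

```python
import json
import re
from typing import Any, Optional

# Matches e.g. 'Score: 0', '"Score": "1"', or 'score = 0.5' (case-insensitive).
_SCORE_PATTERN = re.compile(r'"?score"?\s*[:=]\s*"?(-?\d+(?:\.\d+)?)', re.IGNORECASE)


def extract_score(raw: Any) -> Optional[float]:
    """Return the numeric score from a grader reply, or None if none is found."""
    if raw is None:
        return None
    if isinstance(raw, dict):
        # Already-parsed structured output.
        try:
            return float(raw.get("Score"))
        except (TypeError, ValueError):
            return None
    text = str(raw)
    # First try the well-formed case: the whole reply is a JSON object.
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict) and "Score" in parsed:
        try:
            return float(parsed["Score"])
        except (TypeError, ValueError):
            return None
    # Otherwise fall back to the last 'Score: <number>' mention in the text.
    matches = _SCORE_PATTERN.findall(text)
    return float(matches[-1]) if matches else None
```

Taking the last match is a deliberate choice: in the free-form reply the reasoning comes first and the verdict last. This only works around the formatting problem; a model that follows the structured format more reliably would avoid it at the source.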

System Info

feijoes · 2024-12-02T13:22:59Z


@RXZAN what does 'print("score-->", score)' print?

2 replies

RXZAN Dec 3, 2024
Author

It's None

RXZAN Dec 3, 2024
Author

Sorry, I don't understand how he solved his problem. I'm new to this and only recently started learning Python and LangChain.
