RAGAs/Langsmith Experiments #1869

Open
JoshShailes opened this issue Jan 22, 2025 · 2 comments
Labels
bug (Something isn't working) · module-metrics (this is part of metrics module) · question (Further information is requested) · waiting 🤖 (waiting for response; if none, will close this automatically)

Comments

JoshShailes commented Jan 22, 2025

Your Question
I am attempting to use Ragas metrics with LangSmith but keep getting an error. The Ragas metric works fine with the same data when tested in isolation, but once I try to integrate it with LangSmith evaluations I get the following error.

Error

Error running evaluator <DynamicRunEvaluator ragas_faithfulness> on run ********: IndexError('list index out of range')
Traceback (most recent call last):
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\evaluation\_runner.py", line 1573, in _run_evaluators
    evaluator_response = evaluator.evaluate_run(
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\evaluation\evaluator.py", line 331, in evaluate_run
    result = self.func(
             ^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\run_helpers.py", line 617, in wrapper
    raise e
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\run_helpers.py", line 614, in wrapper
    function_result = run_container["context"].run(func, *args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\AppData\Local\Temp\ipykernel_4248\2056749116.py", line 88, in ragas_faithfulness
    result = evaluate(
             ^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\_analytics.py", line 227, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\evaluation.py", line 323, in evaluate
    result = EvaluationResult(
             ^^^^^^^^^^^^^^^^^
  File "<string>", line 10, in __init__
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\dataset_schema.py", line 420, in __post_init__
    self.traces = parse_run_traces(self.ragas_traces, run_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\callbacks.py", line 149, in parse_run_traces
    root_trace = root_traces[0]
                 ~~~~~~~~~~~^^^
IndexError: list index out of range
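
Reading the last frames, the failure happens in ragas' trace parsing: parse_run_traces filters the collected traces down to a root evaluation trace and indexes the first match. The sketch below is only a rough approximation of that filtering (ChainRun and its fields are assumptions, not the real ragas internals), but it shows why an empty list produces exactly this IndexError:

# Rough, illustrative approximation of the filtering that fails in ragas/callbacks.py.
# ChainRun and its fields are assumptions for this sketch, not the actual ragas types.
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChainRun:
    run_id: uuid.UUID
    parent_run_id: Optional[uuid.UUID]  # set when the chain ran under another trace

def parse_run_traces_sketch(traces: dict, expected_parent: Optional[uuid.UUID] = None):
    # Keep only chains whose parent matches what a root evaluation run should have.
    root_traces = [t for t in traces.values() if t.parent_run_id == expected_parent]
    root_trace = root_traces[0]  # IndexError: list index out of range when empty
    return root_trace

# If evaluate() runs inside a LangSmith @traceable evaluator, the ragas evaluation
# chain can pick up a parent_run_id from the surrounding trace context, the filter
# then matches nothing, and root_traces[0] raises -- consistent with the traceback.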

Code Examples

# Imports assumed by the snippets below; the OpenAI/Qdrant client setup and
# collection_name are defined elsewhere in the notebook, as in the original.
import openai
from qdrant_client import QdrantClient  # qdrant_client instance created elsewhere
from datasets import Dataset
from langsmith import traceable
from langsmith.schemas import Run, Example
from langsmith.evaluation import evaluate as ls_evaluate
from ragas import evaluate
from ragas.metrics import faithfulness

@traceable(run_type="llm")
def get_text_embedding(text):
    '''
    Embed the question to query the Qdrant DB.
    Input: text (question)
    Output: OpenAI embedding response (the caller extracts the vector)
    '''
    embedding = openai.embeddings.create(input=text, model="text-embedding-ada-002")
    # Do not seem able to track token usage for embeddings.
    return embedding  # caller does .to_dict()['data'][0]['embedding']

@traceable
def query_qdrant(query, collection_name, vector_name="page_vector", top_k=5):
    '''
    Calls get_text_embedding, then queries the Qdrant DB.
    Input: text (question), Qdrant collection name (optional: vector_name, top_k)
    Output: top_k matching context chunks (default 5)
    '''
    # Create embedding vector for the Query
    embedded_query = get_text_embedding(query)
    embedded_query_v = embedded_query.to_dict()['data'][0]['embedding']

    query_results = qdrant_client.search(
        collection_name=collection_name,
        query_vector=(
            vector_name, embedded_query_v
        ),
        limit=top_k,
        query_filter=None
    )

    return query_results

@traceable(run_type="llm")
def generate_response_from_context(query, query_results):
    retrieved_text = '\n'.join([x.payload['page_text'] for x in query_results])

    response = openai.chat.completions.create(
        model="gpt-35-turbo-16k",
        messages=[
            {"role": "system", "content": "You must answer the question asked by the User using only the included context from the documents."},
            {"role": "user", "content": f"""
            DOCUMENT:
            {retrieved_text}

            QUESTION:
            {query}

            INSTRUCTIONS:
            Answer the user's QUESTION using the DOCUMENT text above.
            Keep your answer grounded in the facts of the DOCUMENT.
            If the DOCUMENT doesn't contain the facts to answer the QUESTION, return NONE.
            """
            }
        ]
    )
    # by returning the full response we can also track the tokens usage.
    return response
#######################################
# For Running with experiment/evaluator
@traceable
def predict(inputs: dict) -> dict:
    #print(inputs)
    query = inputs['question']

    query_results = query_qdrant(query, collection_name)
    query_results_text = [x.payload['page_text'] for x in query_results]
    #print(query_results_text[0][:20])
    
    response = generate_response_from_context(query, query_results)
    #print(response.choices[0].message.content.strip()[:50])

    return {"output":response.choices[0].message.content.strip(), "source_documents":query_results_text}


#@traceable
def ragas_faithfulness(run: Run, example: Example) -> dict:
    data = [
        {
            "user_input": run.inputs["inputs"]["question"],
            "response": run.outputs["output"],
            "retrieved_contexts": [str(d) for d in run.outputs["source_documents"]],
        }
    ]
    # Debug prints
    print("Dataset Context entry type:", type([str(d) for d in run.outputs["source_documents"]]))
    # global ragas_dataset ## added so I could test the same data with ragas evaluate outside of the function
    ragas_dataset = Dataset.from_list(data)
    #print(ragas_dataset[0])
    result = evaluate(
        ragas_dataset,
        metrics=[faithfulness],
        raise_exceptions=True
    )
      
    return {"key":"faithfulness", "score": result["faithfulness"]}

dataset_name = "Demo-dataset" # Langsmith Dataset

exp_results = ls_evaluate(
    predict,
    data=dataset_name,
    evaluators=[ragas_faithfulness],
    experiment_prefix="Test_Ragas_Faithfulness"
)
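
For reference, the same fields can also be scored directly with the metric class, outside evaluate() and its trace parsing. This is only a minimal sketch for comparison, not my exact setup: evaluator_llm, the model name, and the sample values below are placeholders.

from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

# Placeholder judge LLM; swap in whatever client/deployment you actually use.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

sample = SingleTurnSample(
    user_input="What does the document say about X?",     # placeholder
    response="The document says ...",                      # placeholder
    retrieved_contexts=["chunk 1 text", "chunk 2 text"],   # placeholder
)

scorer = Faithfulness(llm=evaluator_llm)
print(scorer.single_turn_score(sample))  # float between 0 and 1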

Additional context
I'm a bit lost on what else to try, so I'm hoping someone has run into this before and can point me in the right direction. The fact that the Ragas pieces work in isolation has stumped me, since I'm feeding the evaluator data in the same format the functions expect.

@JoshShailes JoshShailes added the question Further information is requested label Jan 22, 2025
@dosubot dosubot bot added bug Something isn't working module-metrics this is part of metrics module labels Jan 22, 2025
JoshShailes (Author) commented

Additional Information
I have been looking at the traces and can see that the 'ragas evaluation' run in my approach above has a parent_run_id (i.e. it is not None) and empty outputs.

jjmachan (Member) commented

@JoshShailes we used to have this issue in an older version, but it was fixed. I'm assuming you are using v0.2.12, right?
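
A quick way to confirm the installed version:

import ragas
print(ragas.__version__)  # should be 0.2.12 or later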

@jjmachan jjmachan added the waiting 🤖 (waiting for response; if none, will close this automatically) label Jan 29, 2025