RAGAs/Langsmith Experiments #1869

Open
JoshShailes opened this issue Jan 22, 2025 · 2 comments
Labels
bug (Something isn't working) · module-metrics (this is part of metrics module) · question (Further information is requested) · waiting 🤖 (waiting for response; if none, will close this automatically)

Comments

JoshShailes commented Jan 22, 2025

Your Question
I am attempting to use Ragas metrics with LangSmith but keep getting an error. The Ragas metric works fine with the same data when tested in isolation, but once I try to integrate it with LangSmith evaluations I get the following error.

Error

Error running evaluator <DynamicRunEvaluator ragas_faithfulness> on run ********: IndexError('list index out of range')
Traceback (most recent call last):
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\evaluation\_runner.py", line 1573, in _run_evaluators
    evaluator_response = evaluator.evaluate_run(
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\evaluation\evaluator.py", line 331, in evaluate_run
    result = self.func(
             ^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\run_helpers.py", line 617, in wrapper
    raise e
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\langsmith\run_helpers.py", line 614, in wrapper
    function_result = run_container["context"].run(func, *args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\AppData\Local\Temp\ipykernel_4248\2056749116.py", line 88, in ragas_faithfulness
    result = evaluate(
             ^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\_analytics.py", line 227, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\evaluation.py", line 323, in evaluate
    result = EvaluationResult(
             ^^^^^^^^^^^^^^^^^
  File "<string>", line 10, in __init__
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\dataset_schema.py", line 420, in __post_init__
    self.traces = parse_run_traces(self.ragas_traces, run_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\Documents\Qdrant-RAG\QdrantRAG\Lib\site-packages\ragas\callbacks.py", line 149, in parse_run_traces
    root_trace = root_traces[0]
                 ~~~~~~~~~~~^^^
IndexError: list index out of range
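
Reading the last frames, the failure happens in ragas' trace parsing: parse_run_traces filters the collected traces down to a root evaluation trace and indexes the first match. The sketch below is only a rough approximation of that filtering (ChainRun and its fields are assumptions, not the real ragas internals), but it shows why an empty list produces exactly this IndexError:

# Rough, illustrative approximation of the filtering that fails in ragas/callbacks.py.
# ChainRun and its fields are assumptions for this sketch, not the actual ragas types.
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChainRun:
    run_id: uuid.UUID
    parent_run_id: Optional[uuid.UUID]  # set when the chain ran under another trace

def parse_run_traces_sketch(traces: dict, expected_parent: Optional[uuid.UUID] = None):
    # Keep only chains whose parent matches what a root evaluation run should have.
    root_traces = [t for t in traces.values() if t.parent_run_id == expected_parent]
    root_trace = root_traces[0]  # IndexError: list index out of range when empty
    return root_trace

# If evaluate() runs inside a LangSmith @traceable evaluator, the ragas evaluation
# chain can pick up a parent_run_id from the surrounding trace context, the filter
# then matches nothing, and root_traces[0] raises -- consistent with the traceback.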

Code Examples

# Imports assumed by the snippets below; the OpenAI/Qdrant client setup and
# collection_name are defined elsewhere in the notebook, as in the original.
import openai
from qdrant_client import QdrantClient  # qdrant_client instance created elsewhere
from datasets import Dataset
from langsmith import traceable
from langsmith.schemas import Run, Example
from langsmith.evaluation import evaluate as ls_evaluate
from ragas import evaluate
from ragas.metrics import faithfulness

@traceable(run_type="llm")
def get_text_embedding(text):
    '''
    Embed the question to query the Qdrant DB.
    Input: text (question)
    Output: OpenAI embedding response (the caller extracts the vector)
    '''
    embedding = openai.embeddings.create(input=text, model="text-embedding-ada-002")
    # Do not seem able to track token usage for embeddings.
    return embedding  # caller does .to_dict()['data'][0]['embedding']

@traceable
def query_qdrant(query, collection_name, vector_name="page_vector", top_k=5):
    '''
    Calls get_text_embedding, then queries the Qdrant DB.
    Input: text (question), Qdrant collection name (optional: vector_name, top_k)
    Output: top_k matching context chunks (default 5)
    '''
    # Create embedding vector for the Query
    embedded_query = get_text_embedding(query)
    embedded_query_v = embedded_query.to_dict()['data'][0]['embedding']

    query_results = qdrant_client.search(
        collection_name=collection_name,
        query_vector=(
            vector_name, embedded_query_v
        ),
        limit=top_k,
        query_filter=None
    )

    return query_results

@traceable(run_type="llm")
def generate_response_from_context(query, query_results):
    retrieved_text = '\n'.join([x.payload['page_text'] for x in query_results])

    response = openai.chat.completions.create(
        model="gpt-35-turbo-16k",
        messages=[
            {"role": "system", "content": "You must answer the question asked by the User using only the included context from the documents."},
            {"role": "user", "content": f"""
            DOCUMENT:
            {retrieved_text}

            QUESTION:
            {query}

            INSTRUCTIONS:
            Answer the user's QUESTION using the DOCUMENT text above.
            Keep your answer grounded in the facts of the DOCUMENT.
            If the DOCUMENT doesn't contain the facts to answer the QUESTION, return NONE.
            """
            }
        ]
    )
    # by returning the full response we can also track the tokens usage.
    return response
#######################################
# For Running with experiment/evaluator
@traceable
def predict(inputs: dict) -> dict:
    #print(inputs)
    query = inputs['question']

    query_results = query_qdrant(query, collection_name)
    query_results_text = [x.payload['page_text'] for x in query_results]
    #print(query_results_text[0][:20])
    
    response = generate_response_from_context(query, query_results)
    #print(response.choices[0].message.content.strip()[:50])

    return {"output":response.choices[0].message.content.strip(), "source_documents":query_results_text}


#@traceable
def ragas_faithfulness(run: Run, example: Example) -> dict:
    data = [
        {
            "user_input": run.inputs["inputs"]["question"],
            "response": run.outputs["output"],
            "retrieved_contexts": [str(d) for d in run.outputs["source_documents"]],
        }
    ]
    # Debug prints
    print("Dataset Context entry type:", type([str(d) for d in run.outputs["source_documents"]]))
    # global ragas_dataset ## added so I could test the same data with ragas evaluate outside of the function
    ragas_dataset = Dataset.from_list(data)
    #print(ragas_dataset[0])
    result = evaluate(
        ragas_dataset,
        metrics=[faithfulness],
        raise_exceptions=True
    )
      
    return {"key":"faithfulness", "score": result["faithfulness"]}

dataset_name = "Demo-dataset" # Langsmith Dataset

exp_results = ls_evaluate(
    predict,
    data=dataset_name,
    evaluators=[ragas_faithfulness],
    experiment_prefix="Test_Ragas_Faithfulness"
)
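
For reference, the same fields can also be scored directly with the metric class, outside evaluate() and its trace parsing. This is only a minimal sketch for comparison, not my exact setup: evaluator_llm, the model name, and the sample values below are placeholders.

from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

# Placeholder judge LLM; swap in whatever client/deployment you actually use.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

sample = SingleTurnSample(
    user_input="What does the document say about X?",     # placeholder
    response="The document says ...",                      # placeholder
    retrieved_contexts=["chunk 1 text", "chunk 2 text"],   # placeholder
)

scorer = Faithfulness(llm=evaluator_llm)
print(scorer.single_turn_score(sample))  # float between 0 and 1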

Additional context
I'm a bit lost on what else to try, so I'm hoping someone has run into this before and can point me in the right direction. The fact that the Ragas pieces work in isolation has stumped me, since I'm feeding the evaluator data in the same format the functions expect.

@JoshShailes JoshShailes added the question Further information is requested label Jan 22, 2025
@dosubot dosubot bot added bug Something isn't working module-metrics this is part of metrics module labels Jan 22, 2025
JoshShailes (Author) commented

Additional Information
I have been looking at the traces and can see that the 'ragas evaluation' run in my approach above has a parent_run_id (i.e. it is not None) and empty outputs.

jjmachan (Member) commented

@JoshShailes we used to have this issue in an older version, but it was fixed. I'm assuming you are using v0.2.12, right?
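
A quick way to confirm the installed version:

import ragas
print(ragas.__version__)  # should be 0.2.12 or later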

@jjmachan jjmachan added the waiting 🤖 (waiting for response; if none, will close this automatically) label Jan 29, 2025