
[Bug]: Race condition: Wrong trace_id sent to Langfuse when Redis caching is enabled #6783

Open · yuriykuzin opened this issue Nov 17, 2024 · 2 comments · May be fixed by #8625


yuriykuzin commented Nov 17, 2024

What happened

When using LiteLLM with Redis caching enabled and making parallel calls, incorrect trace_ids are being sent to Langfuse, despite langfuse_context.get_current_trace_id() returning the correct value. The issue appears to be a race condition that only occurs when Redis caching is enabled - the problem disappears when using in-memory cache only.

LiteLLM version: 1.52.9

Steps to Reproduce

  • Set up LiteLLM with Redis caching and Langfuse integration
  • Run multiple parallel calls using the code below
  • Observe the trace IDs in the Langfuse dashboard, or print the trace_id right before this return in litellm/integrations/langfuse/langfuse.py (a sketch follows this list):
    return {"trace_id": trace_id, "generation_id": generation_id}

Reproduction Code

import asyncio
from litellm import Router
import litellm
from langfuse.decorators import observe
import os
from langfuse.decorators import langfuse_context

# Configuration
MODEL_NAME = "your-model-name"  # Change to your deployment name
API_BASE = "https://your-endpoint.openai.azure.com"  # Insert your api base
API_VERSION = "2023-12-01-preview"
API_KEY = os.getenv("AZURE_API_KEY")
REDIS_URL = "redis://localhost:6379"

# Langfuse configuration
os.environ["LANGFUSE_HOST"] = "your-langfuse-host"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-public-key"
os.environ["LANGFUSE_SECRET_KEY"] = "your-secret-key"

# Configure LiteLLM callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Initialize router
router = Router(
    model_list=[
        {
            "model_name": MODEL_NAME,
            "litellm_params": {
                "model": f"azure/{MODEL_NAME}",
                "api_base": API_BASE,
                "api_key": API_KEY,
                "api_version": API_VERSION,
            },
        }
    ],
    default_litellm_params={"acompletion": True},
    # Once REDIS is enabled here, langfuse integration sends the wrong
    # trace_id in parallel calls:
    redis_url=REDIS_URL,
)


async def call_llm(prompt: str):
    # Correct trace_id is printed here:
    print(
        "get_current_trace_id:",
        langfuse_context.get_current_trace_id(),
    )

    # Surprisingly, acompletion() works correctly, but we need
    # completions.create() to be fixed, since that is what our Instructor
    # integration relies on.
    # response = await router.acompletion(

    response = await router.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
            "debug_langfuse": True,
        },
    )
    return response


@observe()
async def process():
    # First call with Request1
    await call_llm("Tell me the result of 2+2")

    # Second call with Request2
    await call_llm("Do you like Math, yes or no?")


async def main():
    # Run two process functions in parallel
    await asyncio.gather(process(), process())


if __name__ == "__main__":
    asyncio.run(main())

Current Behavior

When Redis caching is enabled and parallel calls are made:

  • langfuse_context.get_current_trace_id() returns the correct trace_id.
  • However, the wrong trace_id is sent to Langfuse.
  • This can be verified by adding a print statement before line 296 in litellm/integrations/langfuse/langfuse.py (as sketched under Steps to Reproduce), which produces output like the following:

get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193

Here c45394a2-4fa0-4599-aa3c-88a101b35868 should have been sent twice but was in fact sent only once, while fcb74aee-2de0-465e-b1f3-afd4730fe193 should have been sent twice but was sent three times.
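
To cross-check independently which trace_id actually reaches LiteLLM's logging layer, a custom callback can print what it receives, to be compared against the value placed into metadata. This is a minimal sketch, not part of the original report, and it assumes the request metadata is exposed to callbacks under kwargs["litellm_params"]["metadata"]:

import litellm
from litellm.integrations.custom_logger import CustomLogger


class TraceIdChecker(CustomLogger):
    # Called by LiteLLM after each successful async request.
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # Assumption: the request metadata travels in kwargs["litellm_params"]["metadata"].
        metadata = (kwargs.get("litellm_params") or {}).get("metadata") or {}
        print("callback saw trace_id:", metadata.get("trace_id"))


# Register alongside the existing "langfuse" success callback.
litellm.callbacks = [TraceIdChecker()]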

Expected Behavior

The trace_id sent to Langfuse should match the one returned by langfuse_context.get_current_trace_id().
Trace IDs should remain consistent regardless of whether Redis caching is enabled.

In this example here's what is being sent when Redis is disabled:

get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3

Each trace_id was sent exactly twice, as expected.

Additional Notes

  • The issue only occurs when Redis caching is enabled.
  • The problem disappears when using in-memory cache only.
  • Interestingly, router.acompletion() works correctly, while router.chat.completions.create() exhibits the issue. This affects integrations that specifically need to use completions.create(), such as Instructor (a minimal workaround sketch follows this list).
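
Based on that observation, a possible interim workaround for callers that can use acompletion() directly (which does not help the Instructor path) is to bypass chat.completions.create(). This is only a sketch, reusing the names (router, MODEL_NAME, langfuse_context) from the reproduction code above, and it is only as reliable as the behavior observed there:

async def call_llm_workaround(prompt: str):
    # Same call as in the reproduction code, but via acompletion(),
    # which did not show the wrong-trace_id behavior in these tests.
    return await router.acompletion(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
        },
    )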

Possible Investigation Points

  • Race condition in how trace IDs are handled when Redis caching is enabled (a generic illustration of this failure mode follows below).
  • Difference in trace ID handling between acompletion() and completions.create().
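
To make the first point concrete, here is a generic, self-contained illustration of the suspected failure mode. It is not LiteLLM's actual code path; it only shows how a metadata object shared between concurrently running requests lets the last writer's trace_id win:

import asyncio

shared_metadata = {}  # stands in for state shared across parallel requests


async def fake_request(trace_id: str):
    shared_metadata["trace_id"] = trace_id   # each request writes its own id
    await asyncio.sleep(0)                   # yield, letting the other request overwrite it
    print("logged trace_id:", shared_metadata["trace_id"])  # may print the other request's id


async def main():
    await asyncio.gather(fake_request("trace-A"), fake_request("trace-B"))


asyncio.run(main())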

Files to Look At

  • litellm/integrations/langfuse/langfuse.py (specifically around line 296)

Let me know if you need any additional information or clarification.

Relevant log output

No response

Twitter / LinkedIn details

No response

yuriykuzin added the bug (Something isn't working) label on Nov 17, 2024
@yuriykuzin (Author) commented:

Actually, it goes further: the whole Langfuse report for parallel calls is sometimes wrong when Redis caching is enabled.

@ishaan-jaff (Contributor) commented:

Is this still happening on latest? Can we get help with more details / how to repro, @yuriykuzin?
