Twilio Audio Output Causing Broken Audio Chunks #826

saitharunsai opened this issue Dec 11, 2024 · 0 comments

saitharunsai commented Dec 11, 2024

Description

Bug Report: Malformed Twilio Audio Output Causing Broken Audio Chunks

Environment

  • pipecat-ai version: 0.0.50
  • python version: 3.11.10
  • OS: macOS (Apple M2)

Issue description

The audio output sent back to Twilio is malformed, resulting in broken audio chunks during pipeline processing. This appears to happen in a WebSocket-based voice pipeline that integrates Twilio, Deepgram (for STT/TTS), and OpenAI's GPT-4o for conversation processing.

Repro steps

  1. Initialize a WebSocket connection with Twilio stream_sid
  2. Set up the voice pipeline with the following components:
    • FastAPIWebsocketTransport with TwilioFrameSerializer
    • SileroVADAnalyzer for voice activity detection
    • DeepgramSTTService for speech-to-text
    • OpenAILLMService for conversation processing
    • DeepgramTTSService for text-to-speech
  3. Start the pipeline with an initial system message
  4. Begin audio streaming

Expected behavior

  • Audio chunks should be properly formed and maintained throughout the pipeline
  • Smooth audio streaming without breaks or malformation
  • Proper serialization of audio frames by TwilioFrameSerializer

Actual behavior

Audio chunks are breaking during processing, suggesting potential issues with one or more of the following:

  • Frame serialization in TwilioFrameSerializer
  • Audio buffer handling in the WebSocket transport
  • VAD passthrough configuration
  • Sample rate or encoding mismatches (currently set to 8000 Hz mulaw); a quick check for this is sketched below
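To rule out malformed outbound frames, here is a minimal sketch (not part of the original report) that inspects outgoing Twilio media messages, assuming the standard Twilio Media Streams format (JSON messages whose `media.payload` is base64-encoded 8 kHz mulaw). The `check_twilio_media_message` helper is hypothetical:

```python
import base64
import json

# Inbound Twilio media arrives as 20 ms chunks of 8 kHz mulaw (160 bytes);
# empty or oddly sized outbound payloads are a cheap thing to flag, though
# Twilio itself does not require a fixed outbound chunk size.
MULAW_20MS_BYTES = 160


def check_twilio_media_message(raw_message: str) -> None:
    """Hypothetical debugging helper: flag media payloads that look malformed."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return
    payload = base64.b64decode(message["media"]["payload"])
    if len(payload) == 0 or len(payload) % MULAW_20MS_BYTES != 0:
        print(f"suspicious media payload: {len(payload)} bytes")
```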

Code

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.network.fastapi_websocket import (
    FastAPIWebsocketParams,
    FastAPIWebsocketTransport,
)

from app.core.config import settings


async def run_twilio_bot(websocket_client, stream_sid):
    transport = FastAPIWebsocketTransport(
        websocket=websocket_client,
        params=FastAPIWebsocketParams(
            audio_out_enabled=True,
            add_wav_header=False,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
            vad_audio_passthrough=True,
            serializer=TwilioFrameSerializer(stream_sid),
        ),
    )

    llm = OpenAILLMService(api_key=settings.OPENAI_API_KEY, model="gpt-4o")

    stt = DeepgramSTTService(api_key=settings.DEEPGRAM_API_KEY)

    tts = DeepgramTTSService(
        api_key=settings.DEEPGRAM_API_KEY,
    )

    messages = [
        {
            "role": "system",
            "content": "You are a helpful LLM in an audio call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
        },
    ]

    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),  # Websocket input from client
            stt,  # Speech-To-Text
            context_aggregator.user(),
            llm,  # LLM
            tts,  # Text-To-Speech
            transport.output(),  # Websocket output to client
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline, params=PipelineParams(enable_metrics=True))

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        messages.append(
            {"role": "system", "content": "Please introduce yourself to the user."}
        )
        await task.queue_frames([LLMMessagesFrame(messages)])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        await task.queue_frames([EndFrame()])

    runner = PipelineRunner(handle_sigint=True)

    await runner.run(task)
```
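The WebSocket endpoint that obtains stream_sid (repro step 1) is not shown above. A minimal sketch of how it might be wired up with FastAPI, assuming the standard Twilio Media Streams handshake (a "connected" event followed by a "start" event carrying the streamSid); the route path and endpoint name are hypothetical:

```python
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Twilio sends a "connected" event first, then a "start" event
    # that carries the streamSid used by TwilioFrameSerializer.
    await websocket.receive_text()  # "connected" event
    start_message = json.loads(await websocket.receive_text())  # "start" event
    stream_sid = start_message["start"]["streamSid"]
    await run_twilio_bot(websocket, stream_sid)
```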

To help diagnose this issue, could you provide:

Logs
