Bug Report: Malformed Twilio Audio Output Causing Broken Audio Chunks

Issue description
The Twilio output context is being malformed, resulting in broken audio chunks during pipeline processing. This appears to be happening in a WebSocket-based voice pipeline that integrates Twilio, Deepgram (for STT and TTS), and OpenAI's GPT-4o for conversation processing.
Repro steps
1. Initialize a WebSocket connection with the Twilio stream_sid (see the endpoint sketch after this list).
2. Set up the voice pipeline with the following components:
   - FastAPIWebsocketTransport with TwilioFrameSerializer
   - SileroVADAnalyzer for voice activity detection
   - DeepgramSTTService for speech-to-text
   - OpenAILLMService for conversation processing
   - DeepgramTTSService for text-to-speech
3. Start the pipeline with an initial system message.
4. Begin audio streaming.
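For step 1, here is a minimal sketch of the FastAPI endpoint that produces the websocket_client and stream_sid consumed by run_twilio_bot below. The handshake parsing assumes Twilio's Media Streams protocol, which sends a "connected" event followed by a "start" event carrying the streamSid; the route path and import path are placeholders:

```python
import json

from fastapi import FastAPI, WebSocket

from app.bot import run_twilio_bot  # assumed module path for the code below

app = FastAPI()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Twilio sends a "connected" message first, then a "start" message
    # whose start.streamSid is what TwilioFrameSerializer needs.
    messages = websocket.iter_text()
    await anext(messages)  # "connected" event, discarded
    start_msg = json.loads(await anext(messages))
    stream_sid = start_msg["start"]["streamSid"]
    await run_twilio_bot(websocket, stream_sid)
```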
Expected behavior
- Audio chunks should be properly formed and maintained throughout the pipeline
- Smooth audio streaming without breaks or malformation
- Proper serialization of audio frames by TwilioFrameSerializer (see the message sketch below)
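For reference, outbound audio on a Twilio Media Stream travels as a JSON envelope with a base64-encoded mu-law payload, which is roughly the shape TwilioFrameSerializer should be emitting. A minimal sketch of building such a message by hand (the function name is mine, not part of pipecat):

```python
import base64
import json


def build_media_message(stream_sid: str, mulaw_bytes: bytes) -> str:
    """Build the JSON envelope Twilio expects for outbound audio."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_bytes).decode("ascii")},
    })
```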
Actual behavior
Audio chunks are breaking during processing, suggesting potential issues with one or more of the following:
- Frame serialization in TwilioFrameSerializer
- Audio buffer handling in the WebSocket transport
- VAD passthrough configuration
- Sample rate or encoding mismatches (currently set to 8000 Hz mu-law; a chunk-size check follows below)
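One way to narrow this down is to log the size of each outbound chunk before it hits the socket. At 8000 Hz mu-law, one byte is one sample, so chunk sizes map directly to milliseconds; wildly varying or truncated sizes would point at serialization or buffering rather than encoding. A diagnostic sketch (again, the function name is mine):

```python
import base64
import json


def inspect_media_message(raw: str) -> None:
    """Log the decoded payload size of an outbound Twilio media message."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return
    payload = base64.b64decode(msg["media"]["payload"])
    # 8000 Hz mu-law is 8 bytes per millisecond; 160 bytes == 20 ms,
    # a common telephony frame size.
    print(f"chunk: {len(payload)} bytes (~{len(payload) / 8:.1f} ms)")
```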
Code
```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.network.fastapi_websocket import (
    FastAPIWebsocketParams,
    FastAPIWebsocketTransport,
)

from app.core.config import settings


async def run_twilio_bot(websocket_client, stream_sid):
    transport = FastAPIWebsocketTransport(
        websocket=websocket_client,
        params=FastAPIWebsocketParams(
            audio_out_enabled=True,
            add_wav_header=False,  # Twilio expects raw mu-law, no WAV header
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
            vad_audio_passthrough=True,
            serializer=TwilioFrameSerializer(stream_sid),
        ),
    )

    llm = OpenAILLMService(api_key=settings.OPENAI_API_KEY, model="gpt-4o")
    stt = DeepgramSTTService(api_key=settings.DEEPGRAM_API_KEY)
    tts = DeepgramTTSService(api_key=settings.DEEPGRAM_API_KEY)

    messages = [
        {
            "role": "system",
            "content": "You are a helpful LLM in an audio call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
        },
    ]

    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),  # WebSocket input from client
            stt,  # Speech-to-text
            context_aggregator.user(),
            llm,  # LLM
            tts,  # Text-to-speech
            transport.output(),  # WebSocket output to client
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline, params=PipelineParams(enable_metrics=True))

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        messages.append(
            {"role": "system", "content": "Please introduce yourself to the user."}
        )
        await task.queue_frames([LLMMessagesFrame(messages)])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        await task.queue_frames([EndFrame()])

    runner = PipelineRunner(handle_sigint=True)
    await runner.run(task)
```
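For completeness, the call reaches the WebSocket endpoint via a TwiML Connect/Stream verb. A sketch using the twilio helper library (the URL is a placeholder; any TwiML source works):

```python
from twilio.twiml.voice_response import Connect, VoiceResponse

response = VoiceResponse()
connect = Connect()
# Point the bidirectional media stream at the endpoint from the repro steps.
connect.stream(url="wss://your-server.example.com/ws")
response.append(connect)
print(str(response))
```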
To help diagnose this issue, could you provide:
Logs