Merge pull request #70 from pipecat-ai/mb/transcript-processor
Add TranscriptProcessor docs
markbackman authored Dec 19, 2024
2 parents f11a219 + 65740d1 commit 6d13e43
Showing 2 changed files with 138 additions and 1 deletion.
3 changes: 2 additions & 1 deletion mint.json
@@ -238,7 +238,8 @@
       {
         "group": "Filters",
         "pages": ["server/utilities/filters/stt-mute"]
-      }
+      },
+      "server/utilities/transcript-processor"
     ]
   },
   {
136 changes: 136 additions & 0 deletions server/utilities/transcript-processor.mdx
@@ -0,0 +1,136 @@
---
title: "TranscriptProcessor"
description: "Factory for creating and managing user and assistant transcript processors with shared event handling"
---

## Overview

`TranscriptProcessor` is a factory that creates and manages processors for handling conversation transcripts from both users and assistants. It provides unified access to transcript processors with shared event handling, making it easy to track and respond to conversation updates in real time.

The processor normalizes messages from various LLM services (OpenAI, Anthropic, Google) into a consistent format and emits events when new messages are added to the conversation.

<Tip>
Check out the transcript processor examples for
[OpenAI](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/28a-transcription-processor-openai.py),
[Anthropic](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/28b-transcript-processor-anthropic.py),
and [Google
Gemini](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/28c-transcription-processor-gemini.py)
to see it in action.
</Tip>

## Factory Methods

<ParamField path="user" type="method">
Creates/returns a UserTranscriptProcessor instance that handles user messages
</ParamField>

<ParamField path="assistant" type="method">
Creates/returns an AssistantTranscriptProcessor instance that handles
assistant messages
</ParamField>

<ParamField path="event_handler" type="decorator">
Registers event handlers that will be applied to both processors
</ParamField>

## Events

<ParamField path="on_transcript_update" type="event">
Emitted when new messages are added to the conversation transcript. Handler
receives: - processor: The TranscriptProcessor instance - frame:
TranscriptionUpdateFrame containing new messages
</ParamField>
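
Taken together, the factory methods and event registration are typically used like this (a condensed sketch of the Basic Usage example below; the surrounding pipeline is omitted):

```python
transcript = TranscriptProcessor()

# Both processors come from the same factory and share its event handlers
user_processor = transcript.user()
assistant_processor = transcript.assistant()

@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
    # frame.messages is a list of TranscriptionMessage objects (see below)
    for msg in frame.messages:
        print(f"{msg.role}: {msg.content}")
```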

## Transcription Messages

Transcription messages are normalized to this format:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class TranscriptionMessage:
    role: Literal["user", "assistant"]  # Type of message sender
    content: str                        # Transcribed message content
    timestamp: str | None = None        # Message timestamp
```
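
For example, a user utterance arriving in `frame.messages` might look like this (the timestamp value and format shown here are illustrative, not prescribed by the API):

```python
TranscriptionMessage(
    role="user",
    content="What's the weather like today?",
    timestamp="2024-12-19T18:30:12+00:00",  # illustrative ISO 8601 value
)
```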

## Pipeline Integration

The `TranscriptProcessor` is designed to be used in a pipeline to process conversation transcripts in real time. In the pipeline:

- The `UserTranscriptProcessor` (`transcript.user()`) handles `TranscriptionFrame`s from the STT service
- The `AssistantTranscriptProcessor` (`transcript.assistant()`) handles `TextFrame`s from the LLM service

Place each processor after its corresponding service in the pipeline. See the Basic Usage example below for more details.

## Usage Examples

### Basic Usage

This example shows basic usage of the `TranscriptProcessor` factory.

```python
transcript = TranscriptProcessor()

pipeline = Pipeline([
    transport.input(),
    stt,
    transcript.user(),              # Process user messages
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
    transcript.assistant(),         # Process assistant messages
])

# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def handle_update(processor, frame):
    for msg in frame.messages:
        print(f"{msg.role}: {msg.content}")
```

### Maintaining Conversation History

This example extends the basic usage example by showing how to create a custom handler to maintain conversation history and log new messages with timestamps.

```python
class TranscriptHandler:
    def __init__(self):
        self.messages = []

    async def on_transcript_update(self, processor, frame):
        self.messages.extend(frame.messages)

        # Log new messages with timestamps
        for msg in frame.messages:
            timestamp = f"[{msg.timestamp}] " if msg.timestamp else ""
            print(f"{timestamp}{msg.role}: {msg.content}")

transcript = TranscriptProcessor()
handler = TranscriptHandler()

@transcript.event_handler("on_transcript_update")
async def on_update(processor, frame):
    await handler.on_transcript_update(processor, frame)
```
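
If you also want to persist the accumulated history, a small extension of the handler above could serialize it to JSON (a sketch; the `save` method, file path, and serialization format are illustrative choices, not part of the Pipecat API):

```python
import json
from dataclasses import asdict

class PersistingTranscriptHandler(TranscriptHandler):
    def save(self, path: str = "transcript.json"):
        # TranscriptionMessage is a dataclass, so asdict() converts each
        # accumulated message into a plain dict for JSON serialization
        with open(path, "w") as f:
            json.dump([asdict(m) for m in self.messages], f, indent=2)
```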

## Frame Flow

```mermaid
graph TD
A[User Input] --> B[UserTranscriptProcessor]
B --> C[TranscriptionUpdateFrame]
D[LLM Output] --> E[AssistantTranscriptProcessor]
E --> F[TranscriptionUpdateFrame]
C --> G[Event Handlers]
F --> G
```

## Notes

- Supports multiple LLM services (OpenAI, Anthropic, Google)
- Normalizes message formats from different services
- Maintains conversation history with timestamps
- Emits events for real-time transcript updates
- Thread-safe for concurrent processing
