diff --git a/docs/architecture.md b/docs/architecture.md index a44cd2add..00eaeb722 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,17 +1,65 @@ -# Pipecat architecture guide +# Pipecat Architecture Guide -## Frames +This guide provides an overview of the key components that make up the Pipecat framework. Understanding these components will help you build efficient and flexible data processing pipelines. -Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used to as control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion. +## Core Components -## FrameProcessors +### 1. Frames -Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images. +Frames are the fundamental units of data in Pipecat. They can represent: -## Pipelines +- Discrete data chunks (e.g., text, audio, images) +- Control flow signals (e.g., end of data, start/stop of user input) +- Complex data structures (e.g., message arrays for LLM completions) -Pipelines are lists of frame processors linked together. Frame processors can push frames upstream or downstream to their peers. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport as an output. +Frames serve as a versatile abstraction, allowing Pipecat to handle various types of data uniformly. -## Transports +### 2. 
Frame Processors

-Transports provide input and output frame processors to receive or send frames respectively. For example, the `DailyTransport` does this with a WebRTC session joined to a Daily.co room.
+Frame processors are the workhorses of Pipecat. They:
+
+- Implement a `process_frame` method
+- Consume one frame and produce zero or more frames
+- Perform operations ranging from simple transformations to complex AI service interactions
+
+Examples of frame processor operations:
+- Concatenating text fragments into sentences
+- Generating chat completions based on message arrays
+- Converting text to audio or images
+
+### 3. Pipelines
+
+Pipelines orchestrate the flow of frames through a series of frame processors. They:
+
+- Consist of lists of frame processors linked together
+- Allow frame processors to push frames upstream or downstream
+- Enable the creation of complex data processing workflows
+
+A simple pipeline example:
+LLM Frame Processor → Text-to-Speech Frame Processor → Transport (output)
+
+### 4. Transports
+
+Transports act as the interface between Pipecat pipelines and the outside world. They:
+
+- Provide input and output frame processors
+- Handle receiving and sending frames
+- Can integrate with various communication protocols and services
+
+Example: The `DailyTransport` interfaces with a WebRTC session in a Daily.co room.
+
+## How It All Fits Together
+
+1. Frames enter the system through a Transport.
+2. The Pipeline routes Frames through a series of Frame Processors.
+3. Each Frame Processor performs its specific operation on the Frame.
+4. Processed Frames are either passed to the next Frame Processor or sent out through a Transport.
+
+This architecture allows for flexible, modular, and scalable data processing pipelines that can handle a wide variety of tasks and data types.
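The four steps above can be sketched in plain Python. This is a toy model, not Pipecat's API: `TextFrame`, `EndFrame`, `SentenceAggregator`, `Uppercase`, and `run_pipeline` are hypothetical names introduced here for illustration. The runner also pushes all frames through one stage before moving to the next, which is simpler than a real streaming pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TextFrame:
    """Hypothetical data frame carrying a chunk of text."""
    text: str


@dataclass
class EndFrame:
    """Hypothetical control frame: no more data will arrive."""


class SentenceAggregator:
    """Buffers text fragments and emits one TextFrame per complete sentence."""

    def __init__(self) -> None:
        self._buffer = ""

    def process_frame(self, frame: object) -> Iterator[object]:
        if isinstance(frame, TextFrame):
            self._buffer += frame.text
            if self._buffer.rstrip().endswith((".", "!", "?")):
                sentence, self._buffer = self._buffer.strip(), ""
                yield TextFrame(sentence)
        else:
            # Frames this processor does not handle pass through unchanged.
            yield frame


class Uppercase:
    """Simple transform: upper-cases text frames, passes everything else on."""

    def process_frame(self, frame: object) -> Iterator[object]:
        if isinstance(frame, TextFrame):
            yield TextFrame(frame.text.upper())
        else:
            yield frame


def run_pipeline(processors: list, frames: Iterable[object]) -> List[object]:
    """Feed the frames through each processor in order and collect the output."""
    pending = list(frames)
    for processor in processors:
        produced: List[object] = []
        for frame in pending:
            # One input frame may yield zero or more output frames.
            produced.extend(processor.process_frame(frame))
        pending = produced
    return pending


output = run_pipeline(
    [SentenceAggregator(), Uppercase()],
    [TextFrame("Hi, "), TextFrame("there."), EndFrame()],
)
```

Here the first text fragment produces no output, the second completes the sentence, and the control frame passes through both processors untouched.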
+ +## Best Practices + +- Design Frame Processors to be modular and reusable +- Use appropriate Transports for your input/output requirements +- Structure your Pipeline to optimize data flow and processing efficiency +- Leverage the flexibility of Frames to represent diverse data types and control signals diff --git a/docs/frame-progress.md b/docs/frame-progress.md index f4348bf88..d1520d977 100644 --- a/docs/frame-progress.md +++ b/docs/frame-progress.md @@ -1,46 +1,90 @@ # A Frame's Progress -1. A user says “Hello, LLM” and the cloud transcription service delivers a transcription to the Transport. +This guide walks you through the journey of a frame as it moves through the Pipecat system, from user input to final output. Understanding this process will help you grasp how Pipecat processes data and how different components interact. + +## Overview of the Process + +1. User Input and Transcription +2. Frame Creation and Pipeline Entry +3. Frame Processing +4. Audio Generation +5. Output and System Reset + +## Detailed Frame Journey + +### 1. User Input and Transcription + +The process begins when a user speaks to the system. + +- User says: "Hello, LLM" +- A cloud transcription service converts this to text +- The transcription is delivered to the Transport + ![A transcript frame arrives](images/frame-progress-01.png) -2. The Transport places a Transcription frame in the Pipeline’s source queue. +### 2. Frame Creation and Pipeline Entry + +The Transport creates a frame from the transcription and introduces it to the pipeline. + +- A Transcription frame is created +- The frame is placed in the Pipeline's source queue + ![Frame in source queue](images/frame-progress-02.png) -3. The Pipeline passes the Transcription frame to the first Frame Processor in its list, the LLM User Message Aggregator. -![To UMA](images/frame-progress-03.png) +### 3. Frame Processing -4. The LLM User Message Aggregator updates the LLM Context with a `{“user”: “Hello LLM”}` message. 
-![Update context](images/frame-progress-04.png) +The frame moves through various processors in the pipeline. + +#### LLM User Message Aggregator +- Receives the Transcription frame +- Updates the LLM Context with the user's message +- Yields an LLM Message Frame with the updated context -5. The LLM User Message Aggregator yields an LLM Message Frame, containing the updated LLM Context. The Pipeline passes this frame to the LLM Frame Processor. +![Update context](images/frame-progress-04.png) ![Update context](images/frame-progress-05.png) -6. The LLM Frame Processor creates a streaming chat completion based on the LLM context and yields the first chunk of a response, Text Frame with the value “Hi, “. The Pipeline passes this frame to the TTS Frame Processor. The TTS Frame Processor aggregates this response but doesn’t yield anything, yet, because it’s waiting for a full sentence. -![LLM yields Text](images/frame-progress-06.png) +#### LLM Frame Processor +- Creates a streaming chat completion based on the LLM context +- Yields Text Frames with chunks of the response -7. The LLM Frame Processor yields another Text Frame with the value “there.”. The Pipeline passes this frame to the TTS Frame Processor. +![LLM yields Text](images/frame-progress-06.png) ![LLM yields more Text](images/frame-progress-07.png) -8. The TTS Frame Processor now has a full sentence, so it starts streaming audio based on “Hi, there.” It yields the first chunk of streaming audio as an Audio frame, which the Pipeline passes to the LLM Assistant Message Aggregator. +### 4. Audio Generation + +The text response is converted to audio. + +#### TTS Frame Processor +- Aggregates Text Frames until a full sentence is formed +- Generates streaming audio based on the complete sentence +- Yields Audio frames + ![TTS yields Audio](images/frame-progress-08.png) -9. The LLM Assistant Message Aggregator doesn’t do anything with Audio frames, so it immediately yields the frame, unchanged. 
This is the convention for all Frame Processors: frames that the processor doesn’t process should be immediately yielded. -![pass-through](images/frame-progress-09.png) +### 5. Output and System Reset -10. The Pipeline places the first Audio frame in its sink queue, which is being watched by the Transport. Since the frame is now in a queue, the Pipeline can continue processing other frames. Note that the source and sink queues form a sort of “boundary of concurrent processing” between a Pipeline and the outside world. In a Pipeline, Frames are processed sequentially; once a Frame is on a queue it can be processed in parallel with the frames being processed by the Pipeline. TODO: link to a more in-depth section about this. -![sink queue](images/frame-progress-10.png) +The audio is prepared for output, and the system readies itself for the next interaction. -11. The TTS Frame Processor yields another Audio frame as the Transport transmits the first Audio frame. -![parallel audio](images/frame-progress-11.png) +#### LLM Assistant Message Aggregator +- Passes Audio frames through unchanged +- Updates the LLM Context with the full LLM response when processing is complete -12. As before, the LLM Assistant Message Aggregator immediately yields the Audio frame and the Pipeline places the Audio frame in the sink queue. -![sink queue 2](images/frame-progress-12.png) +#### Pipeline Output +- Places Audio frames in the sink queue for the Transport to handle +- Continues processing frames in parallel -13. The TTS Frame Processor has no more frames to yield. The LLM Frame Processor emits an LLM Response End Frame, which the Pipeline passes to the TTS Frame Processor. -![response end](images/frame-progress-13.png) +![sink queue](images/frame-progress-10.png) -14. The TTS Frame Processor immediately yields the LLM Response End Frame, so the Pipeline passes it along to the LLM Assistant Message Aggregator. 
The LLM Assistant Message Aggregator updates the LLM Context with the full response from the LLM. TODO TODO: I realized I forgot that the TSS Frame Processor also yields the Text frames that the LLM emitted so that the LLM Assistant Message Aggregator could accumulate them, arrggh. -![response end](images/frame-progress-14.png) +#### System Reset +- After processing all frames, the system returns to a quiet state +- Waits for the next user input to restart the process -15. The system is quiet, and waiting for the next message from the Transport. ![response end](images/frame-progress-15.png) + +## Key Concepts + +- **Concurrent Processing**: The source and sink queues allow for parallel processing between the Pipeline and external components. +- **Frame Processor Convention**: Processors should immediately yield frames they don't process. +- **Context Updates**: Both user input and system responses update the LLM Context, maintaining the conversation state. + +By understanding this flow, you can better conceptualize how Pipecat handles data processing and how to design efficient pipelines for your specific use cases. diff --git a/docs/transports-guide.md b/docs/transports-guide.md new file mode 100644 index 000000000..4cdd1ce6e --- /dev/null +++ b/docs/transports-guide.md @@ -0,0 +1,127 @@ +# Transports Guide + +Transports are a crucial component of the Pipecat framework, serving as the interface between your pipeline and the outside world. They handle the input and output of frames, allowing your pipeline to receive and send data. + +## What are Transports? + +Transports provide input and output frame processors to receive or send frames respectively. They act as the entry and exit points for data in your Pipecat pipeline. + + +## Built-in Transports + +Pipecat comes with several built-in transports to cover common use cases: + +1. **DailyTransport**: Integrates with WebRTC sessions using Daily.co rooms. +2. 
**WebSocketTransport**: Allows communication over WebSocket connections.
+3. **HTTPTransport**: Enables HTTP-based communication for RESTful APIs.
+4. **FileTransport**: Reads from and writes to files on the local filesystem.
+
+### DailyTransport
+
+The DailyTransport is designed for real-time communication using WebRTC through Daily.co rooms. It's ideal for applications requiring audio/video streaming and real-time data exchange.
+
+Usage example:
+```python
+# Assumed import path and arguments; both vary across Pipecat versions.
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+# Placeholder room URL, no meeting token, bot display name, transport parameters.
+transport = DailyTransport("https://example.daily.co/room", None, "Pipecat Bot", DailyParams())
+```
+
+
+### WebSocketTransport
+
+WebSocketTransport is suitable for bi-directional, full-duplex communication over a single TCP connection. It's great for real-time applications that require low-latency data exchange.
+
+Usage example:
+
+```python
+from pipecat.transports import WebSocketTransport
+transport = WebSocketTransport(url="ws://example.com/websocket")
+```
+
+
+### HTTPTransport
+
+HTTPTransport is useful for RESTful API interactions or when you need to communicate with HTTP-based services.
+
+Usage example:
+```python
+from pipecat.transports import HTTPTransport
+transport = HTTPTransport(base_url="https://api.example.com")
+```
+
+
+
+### FileTransport
+
+FileTransport is used for reading from and writing to files on the local filesystem. It's helpful for processing local data or storing output locally.
+
+Usage example:
+```python
+from pipecat.transports import FileTransport
+transport = FileTransport(file_path="path/to/file")
+```
+
+
+## Creating Custom Transports
+
+To create a custom transport, you need to implement the `Transport` interface.
Here's a basic template for creating a custom transport:
+
+```python
+from pipecat.transports import Transport
+from pipecat.frames import Frame
+
+class CustomTransport(Transport):
+    def __init__(self, *args, **kwargs):
+        super().__init__()
+        # Initialize your transport-specific attributes here
+
+    async def connect(self):
+        # Implement connection logic here
+        pass
+
+    async def disconnect(self):
+        # Implement disconnection logic here
+        pass
+
+    async def receive_frame(self) -> Frame:
+        # Implement logic to receive a frame and return a Frame object
+        pass
+
+    async def send_frame(self, frame: Frame):
+        # Implement logic to send a frame here
+        pass
+
+    async def is_connected(self) -> bool:
+        # Implement logic to check the connection status here
+        pass
+
+```
+
+When implementing a custom transport, ensure that you handle errors gracefully and maintain a consistent state.
+
+## Use Cases for Different Types of Transports
+
+1. **Real-time Communication**: Use DailyTransport or WebSocketTransport for applications requiring real-time, bi-directional communication, such as chat applications or live collaboration tools.
+
+2. **API Integration**: Use HTTPTransport when integrating with RESTful APIs or services that communicate over HTTP.
+
+3. **Local Data Processing**: Use FileTransport for batch processing of local files or when you need to store output locally.
+
+4. **IoT and Sensor Data**: Create a custom transport for specific protocols used in IoT devices or sensor networks.
+
+5. **Database Integration**: Implement a custom transport to directly interface with databases for real-time data processing.
+
+6. **Message Queues**: Create a custom transport to integrate with message queue systems like RabbitMQ or Apache Kafka for distributed systems.
+
+## Best Practices
+
+1. **Error Handling**: Implement robust error handling in your transports to manage connection issues, timeouts, and other potential failures.
+
+2. 
**Asynchronous Design**: Design your transports to work asynchronously to prevent blocking operations and improve performance.
+
+3. **Configuration**: Allow for easy configuration of your transports through constructor parameters or configuration files.
+
+4. **Logging**: Implement comprehensive logging in your transports to aid in debugging and monitoring.
+
+5. **Testing**: Create unit tests for your custom transports to ensure they behave correctly under various conditions.
+
+6. **Documentation**: Provide clear documentation for your custom transports, including usage examples and any specific requirements or limitations.
+
+By following these guidelines and understanding the various transport options available, you can effectively integrate Pipecat into your data processing workflows and create robust, efficient pipelines.
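As a rough illustration of the error-handling, asynchronous-design, and logging practices above, here is a minimal loopback transport sketch. It does not import Pipecat: `Frame` and `LoopbackTransport` are hypothetical stand-ins that only mirror the method names of the template in this guide, and the `timeout` and `max_pending` parameters are assumptions made for the example.

```python
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger("loopback_transport")


@dataclass
class Frame:
    """Hypothetical stand-in for a Pipecat frame."""
    payload: str


class LoopbackTransport:
    """Toy transport: every frame sent is queued and can be received back."""

    def __init__(self, max_pending: int = 16) -> None:
        # A bounded queue applies back-pressure instead of growing without limit.
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
        self._connected = False

    async def connect(self) -> None:
        self._connected = True
        logger.info("connected")

    async def disconnect(self) -> None:
        self._connected = False
        logger.info("disconnected")

    async def is_connected(self) -> bool:
        return self._connected

    async def send_frame(self, frame: Frame) -> None:
        if not self._connected:
            # Fail loudly rather than silently dropping the frame.
            raise ConnectionError("send_frame called while disconnected")
        await self._queue.put(frame)

    async def receive_frame(self, timeout: float = 1.0) -> Frame:
        if not self._connected:
            raise ConnectionError("receive_frame called while disconnected")
        # wait_for raises a timeout error instead of waiting forever.
        return await asyncio.wait_for(self._queue.get(), timeout)


async def demo():
    transport = LoopbackTransport()
    await transport.connect()
    await transport.send_frame(Frame("hello"))
    received = await transport.receive_frame()
    await transport.disconnect()
    try:
        await transport.send_frame(Frame("late"))
    except ConnectionError:
        rejected = True
    else:
        rejected = False
    return received, rejected


result = asyncio.run(demo())
```

Because the transport has no external dependencies, the same `demo` coroutine can double as a starting point for a unit test of connection-state handling.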