diff --git a/README.md b/README.md index 331d27b2d..168666aac 100644 --- a/README.md +++ b/README.md @@ -1,79 +1,123 @@ -[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) +
![pipecat](pipecat.png)
+
-# Pipecat — an open source framework for voice (and multimodal) assistants +# Pipecat + +[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) + +`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions. Build things like this: [![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0) -[ [pipecat starter kits repository](https://github.com/daily-co/pipecat-examples) ] +## Getting started with voice agents + +You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a telephone number, image output, and video input, swap in different LLMs, and more. + +```shell +# install the module +pip install pipecat-ai + +# set up an .env file with API keys +cp dot-env.template .env +``` -**`Pipecat` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions. +By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with: -In 2023 a _lot_ of us got excited about the possibility of having open-ended conversations with LLMs. 
It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/): +```shell +pip install "pipecat-ai[option,...]" +``` -- low-latency, reliable audio transport -- echo cancellation -- phrase endpointing (knowing when the bot should respond to human speech) -- interruptibility -- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models +Your project may or may not need these, so they're made available as optional requirements. Here is a list: -As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like. +- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper` +- **Transports**: `daily`, `local`, `websocket` -Today, `pipecat` is: +## A simple voice agent running locally -1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services -2. transport services that moves audio, video, and events across the Internet -3. implementations of specific generative AI services +If you’re doing AI-related stuff, you probably have an OpenAI API key. -Currently implemented services: +To generate voice output, one service that’s easy to get started with is ElevenLabs. If you don’t already have an ElevenLabs developer account, you can sign up for one [here]. -- Speech-to-text - - Deepgram - - Whisper -- LLMs - - Azure - - Fireworks - - OpenAI -- Image generation - - Azure - - Fal - - OpenAI -- Text-to-speech - - Azure - - Deepgram - - ElevenLabs -- Transport - - Daily - - Local -- Vision - - Moondream +So let’s run a really simple agent that’s just a GPT-4 prompt, wired up to voice input and speaker output. 
-If you'd like to [implement a service](<(https://github.com/daily-co/pipecat/tree/main/src/pipecat/services)>), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge. +You can change the prompt in the code. The current prompt is “Tell me something interesting about the Roman Empire.” -## Getting started +`cd examples/getting-started` to run the following examples … -Today, the easiest way to get started with `pipecat` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations. +```shell +# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM. +export OPENAI_API_KEY=... +export ELEVENLABS_API_KEY=... +python ./local-mic.py | ./pipecat-pipes-gpt-4.py | ./local-speaker.py ``` -# install the module -pip install pipecat -# set up an .env file with API keys -cp dot-env.template .env -``` +## WebSockets instead of pipes + +To run your agent in the cloud, you can switch the Pipecat transport layer to use a WebSocket instead of Unix pipes. -By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional -dependencies that you can install with: +```shell +# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM. +export OPENAI_API_KEY=... +export ELEVENLABS_API_KEY=... +python ./local-mic-and-speaker-wss.py wss://localhost:8088 ``` -pip install "pipecat[option,...]" + +## WebRTC for production use + +WebSockets are fine for server-to-server communication or for initial development. But for production use, you’ll want your client-server audio to travel over a protocol designed for real-time media transport. 
(For an explanation of the difference between WebSockets and WebRTC, see [this post].) + +One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month. + +Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard. Then run the examples, this time connecting via WebRTC instead of a WebSocket. + +```shell +# 1. Run the pipecat process. Provide your Daily API key and a Daily room URL +export DAILY_API_KEY=... +export OPENAI_API_KEY=... +export ELEVENLABS_API_KEY=... +python pipecat-daily-gpt-4.py --daily-room https://example.daily.co/pipecat + +# 2. Visit the Daily room link in any web browser to talk to the pipecat process. +# You'll want to use a Daily SDK to embed the client-side code into your own +# app. But visiting the room URL in a browser is a quick way to start building +# agents because you can focus on just the agent code at first. +open -a "Google Chrome" https://example.daily.co/pipecat ``` -Your project may or may not need these, so they're made available as optional requirements. Here is a list: +## Deploy your agent to the cloud + +Now that you’ve decoupled client and server, and have a Pipecat process that can run anywhere you can run Python, you can deploy this example agent to the cloud. + +`TBC` + +## Taking it further + +### Add a telephone number + +Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone. + +You’ll need to add a credit card to your Daily account to enable telephone numbers. + +`TBC` + +### Add image output + +`TBC` + +### Add video output + +`TBC` -- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper` -- **Transports**: `daily`, `local`, `websocket` ## Code examples diff --git a/pipecat.png b/pipecat.png new file mode 100644 index 000000000..912360f2c Binary files /dev/null and b/pipecat.png differ diff --git a/pyproject.toml b/pyproject.toml index 49ac82492..33d61424e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -33,12 +33,12 @@ Website = "https://pipecat.ai" [project.optional-dependencies] anthropic = [ "anthropic~=0.25.7" ] -audio = [ "pyaudio~=0.2.0" ] azure = [ "azure-cognitiveservices-speech~=1.37.0" ] daily = [ "daily-python~=0.7.4" ] examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ] fal = [ "fal-client~=0.4.0" ] fireworks = [ "openai~=1.26.0" ] +local = [ "pyaudio~=0.2.0" ] moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ] openai = [ "openai~=1.26.0" ] playht = [ "pyht~=0.0.28" ] diff --git a/src/pipecat/services/anthropic.py b/src/pipecat/services/anthropic.py index 8632fdaf1..25620f783 100644 --- a/src/pipecat/services/anthropic.py +++ b/src/pipecat/services/anthropic.py @@ -15,7 +15,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use Anthropic, you need to `pip install pipecat[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.") + "In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`. 
Also, set `ANTHROPIC_API_KEY` environment variable.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/services/azure.py b/src/pipecat/services/azure.py index d56058821..596e6726d 100644 --- a/src/pipecat/services/azure.py +++ b/src/pipecat/services/azure.py @@ -21,7 +21,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use Azure TTS, you need to `pip install pipecat[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.") + "In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.") raise Exception(f"Missing module: {e}") from pipecat.services.openai_api_llm_service import BaseOpenAILLMService diff --git a/src/pipecat/services/fal.py b/src/pipecat/services/fal.py index 1049b4428..ca58f0337 100644 --- a/src/pipecat/services/fal.py +++ b/src/pipecat/services/fal.py @@ -23,7 +23,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use Fal, you need to `pip install pipecat[fal]`. Also, set `FAL_KEY` environment variable.") + "In order to use Fal, you need to `pip install pipecat-ai[fal]`. Also, set `FAL_KEY` environment variable.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/services/fireworks.py b/src/pipecat/services/fireworks.py index 402384d0d..6d2d44e6c 100644 --- a/src/pipecat/services/fireworks.py +++ b/src/pipecat/services/fireworks.py @@ -13,7 +13,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use Fireworks, you need to `pip install pipecat[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.") + "In order to use Fireworks, you need to `pip install pipecat-ai[fireworks]`. 
Also, set the `FIREWORKS_API_KEY` environment variable.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/services/moondream.py b/src/pipecat/services/moondream.py index f74ba828b..e069c98ed 100644 --- a/src/pipecat/services/moondream.py +++ b/src/pipecat/services/moondream.py @@ -19,7 +19,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer except ModuleNotFoundError as e: logger.error(f"Exception: {e}") - logger.error("In order to use Moondream, you need to `pip install pipecat[moondream]`.") + logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.") raise Exception(f"Missing module(s): {e}") diff --git a/src/pipecat/services/openai.py b/src/pipecat/services/openai.py index b15d7950b..50b8b1478 100644 --- a/src/pipecat/services/openai.py +++ b/src/pipecat/services/openai.py @@ -32,7 +32,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use OpenAI, you need to `pip install pipecat[openai]`. Also, set `OPENAI_API_KEY` environment variable.") + "In order to use OpenAI, you need to `pip install pipecat-ai[openai]`. Also, set `OPENAI_API_KEY` environment variable.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/services/playht.py b/src/pipecat/services/playht.py index 69c7bac9d..b2aa4e198 100644 --- a/src/pipecat/services/playht.py +++ b/src/pipecat/services/playht.py @@ -19,7 +19,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use PlayHT, you need to `pip install pipecat[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.") + "In order to use PlayHT, you need to `pip install pipecat-ai[playht]`. 
Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/services/whisper.py b/src/pipecat/services/whisper.py index 768e689c8..e0d14e903 100644 --- a/src/pipecat/services/whisper.py +++ b/src/pipecat/services/whisper.py @@ -22,7 +22,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use Whisper, you need to `pip install pipecat[whisper]`.") + "In order to use Whisper, you need to `pip install pipecat-ai[whisper]`.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/transports/local/audio.py b/src/pipecat/transports/local/audio.py index 32266444f..c0038d250 100644 --- a/src/pipecat/transports/local/audio.py +++ b/src/pipecat/transports/local/audio.py @@ -18,7 +18,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.") + "In order to use local audio, you need to `pip install pipecat-ai[local]`. On MacOS, you also need to `brew install portaudio`.") raise Exception(f"Missing module: {e}") diff --git a/src/pipecat/transports/local/tk.py b/src/pipecat/transports/local/tk.py index 6a05c9a63..3d5ea9650 100644 --- a/src/pipecat/transports/local/tk.py +++ b/src/pipecat/transports/local/tk.py @@ -22,7 +22,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") logger.error( - "In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.") + "In order to use local audio, you need to `pip install pipecat-ai[local]`. 
On MacOS, you also need to `brew install portaudio`.") raise Exception(f"Missing module: {e}") try: diff --git a/src/pipecat/transports/services/daily.py b/src/pipecat/transports/services/daily.py index 84d690569..0343c53d8 100644 --- a/src/pipecat/transports/services/daily.py +++ b/src/pipecat/transports/services/daily.py @@ -44,7 +44,8 @@ from daily import (EventHandler, CallClient, Daily) except ModuleNotFoundError as e: logger.error(f"Exception: {e}") - logger.error("In order to use the Daily transport, you need to `pip install pipecat[daily]`.") + logger.error( + "In order to use the Daily transport, you need to `pip install pipecat-ai[daily]`.") raise Exception(f"Missing module: {e}") VAD_RESET_PERIOD_MS = 2000 diff --git a/src/pipecat/vad/silero.py b/src/pipecat/vad/silero.py index a9e5aa0ed..f2438b085 100644 --- a/src/pipecat/vad/silero.py +++ b/src/pipecat/vad/silero.py @@ -22,7 +22,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") - logger.error("In order to use Silero VAD, you need to `pip install pipecat[silero]`.") + logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai[silero]`.") raise Exception(f"Missing module(s): {e}")
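
Every service and transport hunk in this diff updates the same optional-dependency guard: try the import, and on `ModuleNotFoundError` log a `pip install pipecat-ai[extra]` hint before re-raising. A minimal standalone sketch of that pattern, for reviewers (the `require_optional` helper and its exact message wording are illustrative, not part of the Pipecat API):

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or fail with an install hint.

    Illustrative helper: mirrors the try/except guard used in the
    pipecat service modules, not an actual Pipecat function.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        raise Exception(
            f"Missing module: {e}. In order to use this service, "
            f"you need to `pip install pipecat-ai[{extra}]`."
        ) from e


# A dependency that is installed imports normally:
json = require_optional("json", "example")
print(json.dumps({"ok": True}))  # prints {"ok": true}

# A missing dependency raises, naming the matching extra in the hint:
try:
    require_optional("not_a_real_module", "daily")
except Exception as err:
    print(err)
```

Centralizing the guard this way is one reason the `pipecat` → `pipecat-ai` rename touched so many near-identical strings; keeping the extras (`[local]`, `[daily]`, …) in sync with `pyproject.toml` is the part worth double-checking in review.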