Merge pull request #133 from pipecat-ai/aleix/readme
rebased jpt/readme branch
aconchillo authored May 13, 2024
2 parents 7856d20 + bfbcb9d commit 721cd11
Showing 15 changed files with 108 additions and 63 deletions.
144 changes: 94 additions & 50 deletions README.md
@@ -1,79 +1,123 @@
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="pipecat.png">
</div>

# Pipecat

[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)

`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.

Build things like this:

[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)

See the [pipecat starter kits repository](https://github.com/daily-co/pipecat-examples) for more examples.

## Getting started with voice agents

You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a telephone number, handle image output and video input, swap in different LLMs, and more.

```shell
# install the module
pip install pipecat-ai

# set up an .env file with API keys
cp dot-env.template .env
```
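
The `.env` file holds your API keys as `KEY=VALUE` lines. Here is a minimal, stdlib-only sketch of what loading it does (the examples themselves typically rely on the `python-dotenv` package, so treat this as illustrative only):

```python
import os

def load_env(path=".env"):
    """Toy stand-in for python-dotenv: export each KEY=VALUE line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; keep only KEY=VALUE entries.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```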

By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:

```shell
pip install "pipecat-ai[option,...]"
```

Your project may or may not need these, so they're made available as optional requirements. Here is a list:

- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `daily`, `local`, `websocket`
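
If you script your environment setup, the extras string can be assembled from the option names above. A small illustrative sketch (the service names chosen here are examples, not requirements):

```python
# Compose a `pip install` command for a chosen set of optional extras.
services = ["daily", "openai", "silero"]  # picked from the lists above
command = f'pip install "pipecat-ai[{",".join(sorted(services))}]"'
print(command)  # pip install "pipecat-ai[daily,openai,silero]"
```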

## A simple voice agent running locally

If you’re doing AI-related stuff, you probably have an OpenAI API key.

To generate voice output, one service that’s easy to get started with is ElevenLabs. If you don’t already have an ElevenLabs developer account, you can sign up for one [here].

So let’s run a really simple agent that’s just a GPT-4 prompt, wired up to voice input and speaker output.

You can change the prompt in the code. The current prompt is “Tell me something interesting about the Roman Empire.”

`cd examples/getting-started` to run the following examples.

```shell
# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM.

export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python ./local-mic.py | ./pipecat-pipes-gpt-4.py | ./local-speaker.py
```
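
The three scripts above communicate over plain Unix pipes: each stage reads from stdin, writes to stdout, and the shell’s `|` wires them together. A stdlib-only sketch of that mechanism (the stages here are toy stand-ins, not the real pipecat processes):

```python
import subprocess
import sys

# Stage 1 plays the role of local-mic.py (a producer); stage 2 stands in
# for the LLM process, reading stage 1's stdout as its stdin.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the mic')"],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # so the producer sees a broken pipe if the consumer exits
output = consumer.communicate()[0].strip()
print(output)  # HELLO FROM THE MIC
```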
## WebSockets instead of pipes

To run your agent in the cloud, you can switch the Pipecat transport layer to use a WebSocket instead of Unix pipes.

```shell
# Talk to a pipecat process over a WebSocket, using your local mic and speaker.

export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python ./local-mic-and-speaker-wss.py wss://localhost:8088
```
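
Moving from pipes to a socket is what lets the client and the agent run on different machines. Pipecat uses WebSocket framing for this; the sketch below uses a bare TCP socket just to stay stdlib-only, so treat it as an illustration of the transport swap, not the actual protocol:

```python
import socket
import threading

def agent(server):
    # Accept one client and echo its "audio" back transformed, the way the
    # agent would respond with synthesized speech.
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

server = socket.socket()
server.bind(("127.0.0.1", 0))  # pick any free port
server.listen(1)
threading.Thread(target=agent, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.sendall(b"hello agent")
reply = client.recv(1024)
client.close()
print(reply)  # b'HELLO AGENT'
```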

## WebRTC for production use

WebSockets are fine for server-to-server communication or for initial development. But for production use, client-server audio needs a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post].)

One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.

Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard. Then run the examples, this time connecting via WebRTC instead of a WebSocket.

```shell
# 1. Run the pipecat process. Provide your Daily API key and a Daily room
export DAILY_API_KEY=...
export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python pipecat-daily-gpt-4.py --daily-room https://example.daily.co/pipecat

# 2. Visit the Daily room link in any web browser to talk to the pipecat process.
# You'll want to use a Daily SDK to embed the client-side code into your own
# app. But visiting the room URL in a browser is a quick way to start building
# agents because you can focus on just the agent code at first.
open -a "Google Chrome" https://example.daily.co/pipecat
```
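
The room-creation step can also be scripted. A hedged, stdlib-only sketch of Daily’s documented `POST /v1/rooms` call (the `properties` shown are illustrative; the request is built but not sent):

```python
import json
import os
import urllib.request

api_key = os.environ.get("DAILY_API_KEY", "YOUR_DAILY_API_KEY")
req = urllib.request.Request(
    "https://api.daily.co/v1/rooms",
    data=json.dumps({"properties": {"enable_chat": False}}).encode(),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Uncomment to actually create the room:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["url"])  # e.g. https://example.daily.co/pipecat
print(req.get_method(), req.full_url)
```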

## Deploy your agent to the cloud

Now that you’ve decoupled client and server, and have a Pipecat process that can run anywhere you can run Python, you can deploy this example agent to the cloud.

`TBC`

## Taking it further

### Add a telephone number

Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone.

You’ll need to add a credit card to your Daily account to enable telephone numbers.

`TBC`


### Add image output

`TBC`

### Add video output

`TBC`


## Code examples

Binary file added pipecat.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -33,12 +33,12 @@ Website = "https://pipecat.ai"

[project.optional-dependencies]
anthropic = [ "anthropic~=0.25.7" ]
audio = [ "pyaudio~=0.2.0" ]
azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
daily = [ "daily-python~=0.7.4" ]
examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
fal = [ "fal-client~=0.4.0" ]
fireworks = [ "openai~=1.26.0" ]
local = [ "pyaudio~=0.2.0" ]
moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
openai = [ "openai~=1.26.0" ]
playht = [ "pyht~=0.0.28" ]
2 changes: 1 addition & 1 deletion src/pipecat/services/anthropic.py
@@ -15,7 +15,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Anthropic, you need to `pip install pipecat[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
"In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/azure.py
@@ -21,7 +21,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Azure TTS, you need to `pip install pipecat[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
"In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
raise Exception(f"Missing module: {e}")

from pipecat.services.openai_api_llm_service import BaseOpenAILLMService
2 changes: 1 addition & 1 deletion src/pipecat/services/fal.py
@@ -23,7 +23,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Fal, you need to `pip install pipecat[fal]`. Also, set `FAL_KEY` environment variable.")
"In order to use Fal, you need to `pip install pipecat-ai[fal]`. Also, set `FAL_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/fireworks.py
@@ -13,7 +13,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Fireworks, you need to `pip install pipecat[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
"In order to use Fireworks, you need to `pip install pipecat-ai[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/moondream.py
@@ -19,7 +19,7 @@
from transformers import AutoModelForCausalLM, AutoTokenizer
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Moondream, you need to `pip install pipecat[moondream]`.")
logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
raise Exception(f"Missing module(s): {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/openai.py
@@ -32,7 +32,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use OpenAI, you need to `pip install pipecat[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
"In order to use OpenAI, you need to `pip install pipecat-ai[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/playht.py
@@ -19,7 +19,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use PlayHT, you need to `pip install pipecat[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
"In order to use PlayHT, you need to `pip install pipecat-ai[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/whisper.py
@@ -22,7 +22,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Whisper, you need to `pip install pipecat[whisper]`.")
"In order to use Whisper, you need to `pip install pipecat-ai[whisper]`.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/transports/local/audio.py
@@ -18,7 +18,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
"In order to use local audio, you need to `pip install pipecat-ai[local]`. On MacOS, you also need to `brew install portaudio`.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/transports/local/tk.py
@@ -22,7 +22,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
"In order to use local audio, you need to `pip install pipecat-ai[audio]`. On MacOS, you also need to `brew install portaudio`.")
raise Exception(f"Missing module: {e}")

try:
3 changes: 2 additions & 1 deletion src/pipecat/transports/services/daily.py
@@ -44,7 +44,8 @@
from daily import (EventHandler, CallClient, Daily)
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use the Daily transport, you need to `pip install pipecat[daily]`.")
logger.error(
"In order to use the Daily transport, you need to `pip install pipecat-ai[daily]`.")
raise Exception(f"Missing module: {e}")

VAD_RESET_PERIOD_MS = 2000
2 changes: 1 addition & 1 deletion src/pipecat/vad/silero.py
@@ -22,7 +22,7 @@

except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Silero VAD, you need to `pip install pipecat[silero]`.")
logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai[silero]`.")
raise Exception(f"Missing module(s): {e}")


