Merge pull request #133 from pipecat-ai/aleix/readme
rebased jpt/readme branch
aconchillo authored May 13, 2024
2 parents 7856d20 + bfbcb9d commit 721cd11
Showing 15 changed files with 108 additions and 63 deletions.
144 changes: 94 additions & 50 deletions README.md
@@ -1,79 +1,123 @@
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="pipecat.png">
</div>

# Pipecat

[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)

`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.

Build things like this:

[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)

See the [pipecat starter kits repository](https://github.com/daily-co/pipecat-examples) for more examples.

## Getting started with voice agents

You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a telephone number, handle image output and video input, swap in different LLMs, and more.

```shell
# install the module
pip install pipecat-ai

# set up an .env file with API keys
cp dot-env.template .env
```
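
The `.env` file holds your API keys as `KEY=VALUE` lines. Here is a minimal, stdlib-only sketch of what loading it does (the examples themselves typically rely on the `python-dotenv` package, so treat this as illustrative only):

```python
import os

def load_env(path=".env"):
    """Toy stand-in for python-dotenv: export each KEY=VALUE line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; keep only KEY=VALUE entries.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```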

By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:

```shell
pip install "pipecat-ai[option,...]"
```

Your project may or may not need these, so they're made available as optional requirements. Here is a list:

- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `daily`, `local`, `websocket`
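
If you script your environment setup, the extras string can be assembled from the option names above. A small illustrative sketch (the service names chosen here are examples, not requirements):

```python
# Compose a `pip install` command for a chosen set of optional extras.
services = ["daily", "openai", "silero"]  # picked from the lists above
command = f'pip install "pipecat-ai[{",".join(sorted(services))}]"'
print(command)  # pip install "pipecat-ai[daily,openai,silero]"
```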

## A simple voice agent running locally

If you’re doing AI-related stuff, you probably have an OpenAI API key.

To generate voice output, one service that’s easy to get started with is ElevenLabs. If you don’t already have an ElevenLabs developer account, you can sign up for one [here].

So let’s run a really simple agent that’s just a GPT-4 prompt, wired up to voice input and speaker output.

You can change the prompt in the code. The current prompt is “Tell me something interesting about the Roman Empire.”

`cd examples/getting-started` to run the following examples.

```shell
# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM.

export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python ./local-mic.py | ./pipecat-pipes-gpt-4.py | ./local-speaker.py
```
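
The three scripts above communicate over plain Unix pipes: each stage reads from stdin, writes to stdout, and the shell’s `|` wires them together. A stdlib-only sketch of that mechanism (the stages here are toy stand-ins, not the real pipecat processes):

```python
import subprocess
import sys

# Stage 1 plays the role of local-mic.py (a producer); stage 2 stands in
# for the LLM process, reading stage 1's stdout as its stdin.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the mic')"],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # so the producer sees a broken pipe if the consumer exits
output = consumer.communicate()[0].strip()
print(output)  # HELLO FROM THE MIC
```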
## WebSockets instead of pipes

To run your agent in the cloud, you can switch the Pipecat transport layer to use a WebSocket instead of Unix pipes.

```shell
# Talk to a pipecat process over a WebSocket, using your local mic and speaker.

export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python ./local-mic-and-speaker-wss.py wss://localhost:8088
```
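
Moving from pipes to a socket is what lets the client and the agent run on different machines. Pipecat uses WebSocket framing for this; the sketch below uses a bare TCP socket just to stay stdlib-only, so treat it as an illustration of the transport swap, not the actual protocol:

```python
import socket
import threading

def agent(server):
    # Accept one client and echo its "audio" back transformed, the way the
    # agent would respond with synthesized speech.
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

server = socket.socket()
server.bind(("127.0.0.1", 0))  # pick any free port
server.listen(1)
threading.Thread(target=agent, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.sendall(b"hello agent")
reply = client.recv(1024)
client.close()
print(reply)  # b'HELLO AGENT'
```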

## WebRTC for production use

WebSockets are fine for server-to-server communication or for initial development. But for production use, client-server audio needs a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post].)

One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.

Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard. Then run the examples, this time connecting via WebRTC instead of a WebSocket.

```shell
# 1. Run the pipecat process. Provide your Daily API key and a Daily room
export DAILY_API_KEY=...
export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python pipecat-daily-gpt-4.py --daily-room https://example.daily.co/pipecat

# 2. Visit the Daily room link in any web browser to talk to the pipecat process.
# You'll want to use a Daily SDK to embed the client-side code into your own
# app. But visiting the room URL in a browser is a quick way to start building
# agents because you can focus on just the agent code at first.
open -a "Google Chrome" https://example.daily.co/pipecat
```
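
The room-creation step can also be scripted. A hedged, stdlib-only sketch of Daily’s documented `POST /v1/rooms` call (the `properties` shown are illustrative; the request is built but not sent):

```python
import json
import os
import urllib.request

api_key = os.environ.get("DAILY_API_KEY", "YOUR_DAILY_API_KEY")
req = urllib.request.Request(
    "https://api.daily.co/v1/rooms",
    data=json.dumps({"properties": {"enable_chat": False}}).encode(),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Uncomment to actually create the room:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["url"])  # e.g. https://example.daily.co/pipecat
print(req.get_method(), req.full_url)
```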

## Deploy your agent to the cloud

Now that you’ve decoupled client and server, and have a Pipecat process that can run anywhere you can run Python, you can deploy this example agent to the cloud.

`TBC`

## Taking it further

### Add a telephone number

Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone.

You’ll need to add a credit card to your Daily account to enable telephone numbers.

`TBC`


### Add image output

`TBC`

### Add video output

`TBC`


## Code examples

Binary file added pipecat.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -33,12 +33,12 @@ Website = "https://pipecat.ai"

[project.optional-dependencies]
anthropic = [ "anthropic~=0.25.7" ]
audio = [ "pyaudio~=0.2.0" ]
azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
daily = [ "daily-python~=0.7.4" ]
examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
fal = [ "fal-client~=0.4.0" ]
fireworks = [ "openai~=1.26.0" ]
local = [ "pyaudio~=0.2.0" ]
moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
openai = [ "openai~=1.26.0" ]
playht = [ "pyht~=0.0.28" ]
2 changes: 1 addition & 1 deletion src/pipecat/services/anthropic.py
@@ -15,7 +15,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Anthropic, you need to `pip install pipecat[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
"In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/azure.py
@@ -21,7 +21,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Azure TTS, you need to `pip install pipecat[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
"In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
raise Exception(f"Missing module: {e}")

from pipecat.services.openai_api_llm_service import BaseOpenAILLMService
2 changes: 1 addition & 1 deletion src/pipecat/services/fal.py
@@ -23,7 +23,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Fal, you need to `pip install pipecat[fal]`. Also, set `FAL_KEY` environment variable.")
"In order to use Fal, you need to `pip install pipecat-ai[fal]`. Also, set `FAL_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/fireworks.py
@@ -13,7 +13,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Fireworks, you need to `pip install pipecat[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
"In order to use Fireworks, you need to `pip install pipecat-ai[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/moondream.py
@@ -19,7 +19,7 @@
from transformers import AutoModelForCausalLM, AutoTokenizer
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Moondream, you need to `pip install pipecat[moondream]`.")
logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
raise Exception(f"Missing module(s): {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/openai.py
@@ -32,7 +32,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use OpenAI, you need to `pip install pipecat[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
"In order to use OpenAI, you need to `pip install pipecat-ai[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/playht.py
@@ -19,7 +19,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use PlayHT, you need to `pip install pipecat[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
"In order to use PlayHT, you need to `pip install pipecat-ai[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/services/whisper.py
@@ -22,7 +22,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Whisper, you need to `pip install pipecat[whisper]`.")
"In order to use Whisper, you need to `pip install pipecat-ai[whisper]`.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/transports/local/audio.py
@@ -18,7 +18,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
"In order to use local audio, you need to `pip install pipecat-ai[local]`. On MacOS, you also need to `brew install portaudio`.")
raise Exception(f"Missing module: {e}")


2 changes: 1 addition & 1 deletion src/pipecat/transports/local/tk.py
@@ -22,7 +22,7 @@
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
"In order to use local audio, you need to `pip install pipecat-ai[audio]`. On MacOS, you also need to `brew install portaudio`.")
raise Exception(f"Missing module: {e}")

try:
3 changes: 2 additions & 1 deletion src/pipecat/transports/services/daily.py
@@ -44,7 +44,8 @@
from daily import (EventHandler, CallClient, Daily)
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use the Daily transport, you need to `pip install pipecat[daily]`.")
logger.error(
"In order to use the Daily transport, you need to `pip install pipecat-ai[daily]`.")
raise Exception(f"Missing module: {e}")

VAD_RESET_PERIOD_MS = 2000
2 changes: 1 addition & 1 deletion src/pipecat/vad/silero.py
@@ -22,7 +22,7 @@

except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Silero VAD, you need to `pip install pipecat[silero]`.")
logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai[silero]`.")
raise Exception(f"Missing module(s): {e}")


