wip proposal: initial commit
aconchillo committed May 12, 2024
1 parent 712a889 commit 588137d
Showing 130 changed files with 5,058 additions and 3,804 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish.yaml
Original file line number Diff line number Diff line change
@@ -46,7 +46,7 @@ jobs:
needs: [ build ]
environment:
name: pypi
url: https://pypi.org/p/dailyai
url: https://pypi.org/p/pipecat
permissions:
id-token: write
steps:
@@ -67,7 +67,7 @@ jobs:
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/dailyai
url: https://pypi.org/p/pipecat
permissions:
id-token: write
steps:
2 changes: 1 addition & 1 deletion .github/workflows/publish_test.yaml
@@ -46,7 +46,7 @@ jobs:
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/dailyai
url: https://pypi.org/p/pipecat
permissions:
id-token: write
steps:
28 changes: 12 additions & 16 deletions README.md
@@ -1,18 +1,14 @@
[![PyPI](https://img.shields.io/pypi/v/dailyai)](https://pypi.org/project/dailyai)
[![PyPI](https://img.shields.io/pypi/v/pipecat)](https://pypi.org/project/pipecat)

> [!IMPORTANT]
> Hackathon attendees - getting started doc can be found [here](https://dailyco.notion.site/Daily-AI-ff356d3a799649e583fa91c1ccfe0d87)

# dailyai — an open source framework for real-time, multi-modal, conversational AI applications
# Pipecat — an open source framework for voice (and multimodal) assistants

Build things like this:

[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)

[ [dailyai starter kits repository](https://github.com/daily-co/dailyai-examples) ]
[ [pipecat starter kits repository](https://github.com/daily-co/pipecat-examples) ]

**`dailyai` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
**`Pipecat` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.

In 2023 a _lot_ of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):

@@ -24,7 +20,7 @@ In 2023 a _lot_ of us got excited about the possibility of having open-ended con

As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.

Today, `dailyai` is:
Today, `pipecat` is:

1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
2. transport services that move audio, video, and events across the Internet
@@ -49,19 +45,19 @@ Currently implemented services:
- ElevenLabs
- Transport
- Daily
- Local (in progress, intended as a quick start example service)
- Local
- Vision
- Moondream

If you'd like to [implement a service](https://github.com/daily-co/daily-ai-sdk/tree/main/src/dailyai/services), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.
If you'd like to [implement a service](https://github.com/daily-co/pipecat/tree/main/src/pipecat/services), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.

## Getting started

Today, the easiest way to get started with `dailyai` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations. (The [transport base class](https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/transports/abstract_transport.py) is easy to extend, though, so feel free to submit PRs if you'd like to implement another transport service.)
Today, the easiest way to get started with `pipecat` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations.

```
# install the module
pip install dailyai
pip install pipecat
# set up an .env file with API keys
cp dot-env.template .env
```
@@ -71,7 +67,7 @@ By default, in order to minimize dependencies, only the basic framework function
dependencies that you can install with:

```
pip install "dailyai[option,...]"
pip install "pipecat[option,...]"
```

Your project may or may not need these, so they're made available as optional requirements. Here is a list:
@@ -83,8 +79,8 @@ Your project may or may not need these, so they're made available as optional re

There are two directories of examples:

- [foundational](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/foundational) — demos that build on each other, introducing one or two concepts at a time
- [starter apps](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/starter-apps) — complete applications that you can use as starting points for development
- [foundational](https://github.com/daily-co/pipecat/tree/main/examples/foundational) — examples that build on each other, introducing one or two concepts at a time
- [starter apps](https://github.com/daily-co/pipecat/tree/main/examples/starter-apps) — complete applications that you can use as starting points for development

Before running the examples you need to install the dependencies (which will install all the dependencies to run all of the examples):

12 changes: 6 additions & 6 deletions dev-requirements.txt
@@ -1,6 +1,6 @@
autopep8==2.0.4
build==1.0.3
pip-tools==7.4.1
pytest==8.1.1
setuptools==69.2.0
setuptools_scm==8.0.4
autopep8~=2.1.0
build~=1.2.1
pip-tools~=7.4.1
pytest~=8.2.0
setuptools~=69.5.1
setuptools_scm~=8.1.0
6 changes: 3 additions & 3 deletions docs/README.md
@@ -1,16 +1,16 @@
# Daily AI SDK Docs
# Pipecat Docs

## [Architecture Overview](architecture.md)

Learn about the thinking behind the SDK's design.
Learn about the thinking behind the framework's design.

## [A Frame's Progress](frame-progress.md)

See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.

## [Example Code](examples/)

The repo includes several example apps in the `examples` directory. The docs explain how they work.
The repository includes several example apps in the `examples` directory. The docs explain how they work.

## [API Reference](api/)

6 changes: 3 additions & 3 deletions docs/architecture.md
@@ -1,4 +1,4 @@
# Daily AI SDK Architecture Guide
# Pipecat architecture guide

## Frames

@@ -10,8 +10,8 @@ Frame processors operate on frames. Every frame processor implements a `process_

## Pipelines

Pipelines are lists of frame processors that read from a source queue and send the processed frames to a sink queue. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport's send queue as its sink. Placing LLM message frames on the pipeline's source queue will cause the LLM's response to be spoken. See example #2 for an implementation of this.
Pipelines are lists of frame processors linked together. Frame processors can push frames upstream or downstream to their peers. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport as an output.

## Transports

Transports provide a receive queue, which is input from "the outside world", and a sink queue, which is data that will be sent "to the outside world". The `LocalTransportService` does this with the local camera, mic, display and speaker. The `DailyTransportService` does this with a WebRTC session joined to a Daily.co room.
Transports provide input and output frame processors to receive or send frames respectively. For example, the `DailyTransport` does this with a WebRTC session joined to a Daily.co room.
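The frame/processor/pipeline model the new architecture doc describes (processors linked together, each pushing frames downstream to its peer, with a transport output at the end) can be sketched as a toy implementation. All class and method names below are illustrative stand-ins for the concepts, not the actual pipecat API:

```python
# Toy sketch of the frame/processor/pipeline architecture described above.
# Names are illustrative only; they are not the real pipecat classes.
from dataclasses import dataclass


@dataclass
class Frame:
    data: str


class FrameProcessor:
    """Processors are linked in a chain; each pushes frames downstream."""

    def __init__(self):
        self._next = None

    def link(self, nxt):
        self._next = nxt

    def process_frame(self, frame):
        # Default behavior: pass the frame through unchanged.
        self.push_frame(frame)

    def push_frame(self, frame):
        if self._next is not None:
            self._next.process_frame(frame)


class UppercaseProcessor(FrameProcessor):
    """Stand-in for a service processor (e.g. TTS) that transforms frames."""

    def process_frame(self, frame):
        self.push_frame(Frame(frame.data.upper()))


class CollectOutput(FrameProcessor):
    """Stand-in for a transport output: collects the frames it receives."""

    def __init__(self):
        super().__init__()
        self.received = []

    def process_frame(self, frame):
        self.received.append(frame)


class Pipeline:
    """Links a list of processors together, first to last."""

    def __init__(self, processors):
        self.processors = processors
        for upstream, downstream in zip(processors, processors[1:]):
            upstream.link(downstream)

    def queue_frame(self, frame):
        self.processors[0].process_frame(frame)


out = CollectOutput()
pipeline = Pipeline([UppercaseProcessor(), out])
pipeline.queue_frame(Frame("hello there"))
print(out.received[0].data)  # HELLO THERE
```

This mirrors the shape of the real examples in this commit, where `Pipeline([tts, transport.output()])` chains a service processor into a transport's output processor.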
46 changes: 24 additions & 22 deletions examples/foundational/01-say-one-thing.py
@@ -1,53 +1,55 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frames import EndFrame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
import sys

from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

from runner import configure

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Say One Thing",
mic_enabled=True,
)
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True))

tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)

pipeline = Pipeline([tts])
runner = PipelineRunner()

task = PipelineTask(Pipeline([tts, transport.output()]))

# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
if participant["info"]["isLocal"]:
return

async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
await pipeline.queue_frames([TextFrame("Hello there, " + participant_name + "!"), EndFrame()])

await transport.run(pipeline)
del tts
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])

await runner.run(task)

if __name__ == "__main__":
(url, token) = configure()
53 changes: 53 additions & 0 deletions examples/foundational/01a-local-audio.py
@@ -0,0 +1,53 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import AudioLocalTransport

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
async with aiohttp.ClientSession() as session:
transport = AudioLocalTransport(TransportParams(audio_out_enabled=True))

tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)

pipeline = Pipeline([tts, transport.output()])

task = PipelineTask(pipeline)

async def say_something():
await asyncio.sleep(1)
await task.queue_frames([TextFrame("Hello there!"), EndFrame()])

runner = PipelineRunner()

await asyncio.gather(runner.run(task), say_something())


if __name__ == "__main__":
asyncio.run(main())
38 changes: 0 additions & 38 deletions examples/foundational/01a-local-transport.py

This file was deleted.

45 changes: 27 additions & 18 deletions examples/foundational/02-llm-say-one-thing.py
@@ -1,23 +1,31 @@
import asyncio
import os
import logging
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys

from dailyai.pipeline.frames import EndFrame, LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport

from runner import configure

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(room_url):
@@ -26,8 +34,7 @@ async def main(room_url):
room_url,
None,
"Say One Thing From an LLM",
mic_enabled=True,
)
DailyParams(audio_out_enabled=True))

tts = ElevenLabsTTSService(
aiohttp_session=session,
@@ -45,13 +52,15 @@ async def main(room_url):
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}]

pipeline = Pipeline([llm, tts])
runner = PipelineRunner()

task = PipelineTask(Pipeline([llm, tts, transport.output()]))

@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await pipeline.queue_frames([LLMMessagesFrame(messages), EndFrame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])

await transport.run(pipeline)
await runner.run(task)


if __name__ == "__main__":