feat: add e2e gen ai app starter pack multimodal live api pattern
eliasecchig committed Jan 8, 2025
1 parent ec19ba9 commit 9333fe2
Showing 5 changed files with 76 additions and 74 deletions.
1 change: 1 addition & 0 deletions .github/actions/spelling/allow.txt
Original file line number Diff line number Diff line change
@@ -1158,6 +1158,7 @@ timechart
tion
titlebar
tobytes
toolcall
toself
toset
tqdm
@@ -1,12 +1,13 @@
# Multimodal Live Agent

This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.

![live_api_diagram](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_diagram.png)

**Key components:**

- **Python Backend** (in `app/` folder): A production-ready server built with [FastAPI](https://fastapi.tiangolo.com/) and [google-genai](https://googleapis.github.io/python-genai/) that features:

- **Real-time bidirectional communication** via WebSockets between the frontend and Gemini model
- **Integrated tool calling** with vector database support for contextual document retrieval
- **Production-grade reliability** with retry logic and automatic reconnection capabilities
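The reliability layer described above (retry logic with reconnection) can be sketched as a small backoff helper. This is an illustrative stand-in, not the pattern's actual code; the names `with_backoff` and `flaky_connect` are hypothetical:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Call `fn`, retrying with exponential backoff plus jitter on ConnectionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random() * 0.1)
            time.sleep(delay)
    raise RuntimeError("unreachable")

# Demo: a flaky "connection" that succeeds on the third try.
attempts = {"n": 0}

def flaky_connect() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "connected"

print(with_backoff(flaky_connect, base_delay=0.01))  # connected
```

In the real backend the retried call would be the WebSocket (re)connection to the model endpoint rather than a local function.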
@@ -21,8 +22,8 @@ This pattern showcases a real-time conversational RAG agent powered by Gemini. T

You can use this pattern in two ways:

1. As a standalone template for rapid prototyping (⚡ 1 minute setup!)
2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests.

### Standalone Usage

@@ -40,61 +41,61 @@ gsutil cp gs://e2e-gen-ai-app-starter-pack/multimodal_live_agent.zip . && unzip

#### Backend Setup

1. **Set your default Google Cloud project and region:**

   ```bash
   export PROJECT_ID="your-gcp-project"

   gcloud auth login --update-adc
   gcloud config set project $PROJECT_ID
   gcloud auth application-default set-quota-project $PROJECT_ID
   ```

   <details>
   <summary><b>For AI Studio setup:</b></summary>

   ```bash
   export VERTEXAI=false
   export GOOGLE_API_KEY=your-google-api-key
   ```

   </details>

2. **Install Dependencies:**

   Install the required Python packages using Poetry:

   ```bash
   poetry install
   ```

3. **Run the Backend Server:**

   Start the FastAPI server:

   ```bash
   poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload
   ```

#### Frontend Setup

1. **Install Dependencies:**

   In a separate terminal, install the required Node.js packages for the frontend:

   ```bash
   npm --prefix frontend install
   ```

2. **Start the Frontend:**

   Launch the React development server:

   ```bash
   npm --prefix frontend start
   ```

   This command starts the frontend application, accessible at `http://localhost:3000`.

#### Interact with the Agent

@@ -104,53 +105,53 @@ Once both the backend and frontend are running, click the play button in the fro

You can quickly test the application in [Cloud Run](https://cloud.google.com/run). Ensure your service account has the `roles/aiplatform.user` role to access Gemini.

1. **Deploy:**

   ```bash
   export REGION="your-gcp-region"

   gcloud run deploy genai-app-sample \
     --source . \
     --project $PROJECT_ID \
     --memory "4Gi" \
     --region $REGION
   ```

2. **Access:** Use [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`:

   ```bash
   gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION
   ```

   You can then use the same frontend setup described above to interact with your Cloud Run deployment.

### Integrating with the Starter Pack

This pattern is designed for seamless integration with the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack). The starter pack offers a streamlined approach to setting up and deploying multimodal live agents, complete with robust infrastructure and CI/CD pipelines.

### Getting Started

1. **Download the Starter Pack:**

   Obtain the starter pack using the following command:

   ```bash
   gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack
   ```

2. **Prepare the Pattern:**

   Run the provided script to prepare the multimodal live agent pattern:

   ```bash
   python app/patterns/multimodal_live_agent/utils/prepare_pattern.py
   ```

   The script will organize the project structure for you. This README will then be available in the root folder as `PATTERN_README.md`.

3. **Set up CI/CD:**

   Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines.

#### Current Limitations and Future Enhancements

@@ -167,7 +168,7 @@ We highly value your feedback and encourage you to share your thoughts and sugge

Explore these resources to learn more about the Multimodal Live API and see examples of its usage:

- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): a comprehensive developer guide for the Gemini Multimodal Live API.
- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): Collection of code samples and demo applications leveraging the Multimodal Live API in Vertex AI
- [Gemini 2 Cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2): Practical examples and tutorials for working with Gemini 2
- [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console): Interactive React-based web interface for testing and experimenting with Gemini Multimodal Live API.
@@ -12,10 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Dict

import google
import vertexai
from google import genai
from google.genai.types import LiveConnectConfig, Content, FunctionDeclaration, Tool
@@ -53,10 +53,10 @@
def retrieve_docs(query: str) -> Dict[str, str]:
"""
Retrieves pre-formatted documents about MLOps (Machine Learning Operations),
Gen AI lifecycle, and production deployment best practices.
Args:
query: Search query string related to MLOps, Gen AI, or production deployment.
Returns:
A set of relevant, pre-formatted documents.
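As a rough illustration of this retrieval contract, a keyword-based stand-in with the same signature might look like the sketch below. This is not the pattern's actual implementation (which queries a vector DB for similarity search); the `_CORPUS` dictionary and its contents are hypothetical:

```python
from typing import Dict

# Hypothetical in-memory corpus standing in for the vector DB.
_CORPUS = {
    "mlops": "MLOps covers CI/CD, monitoring, and governance for ML systems.",
    "deployment": "Production deployment best practices include canary rollouts.",
}

def retrieve_docs(query: str) -> Dict[str, str]:
    """Return documents whose key appears in the query (a keyword stand-in
    for the pattern's vector-DB similarity search)."""
    hits = {k: v for k, v in _CORPUS.items() if k in query.lower()}
    return hits or {"note": "No matching documents found."}

print(retrieve_docs("What is MLOps?"))
```

The important part is the shape: a plain string in, a `Dict[str, str]` of pre-formatted documents out, which is what the Gemini tool-calling loop consumes.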
@@ -13,15 +13,15 @@

SYSTEM_INSTRUCTION = """You are "MLOps Expert," a specialized AI assistant designed to provide accurate and up-to-date information on Machine Learning Operations (MLOps), the lifecycle of Generative AI applications, and best practices for production deployment.
Your primary knowledge source is a powerful search tool that provides access to the most current MLOps documentation and resources. **For any question related to MLOps, the lifecycle of Gen AI Apps, or best practices for production deployment, you MUST use this tool as your first and foremost source of information.** Do not rely on your internal knowledge for these topics, as it may be outdated or incomplete.
**Here's how you should operate:**
1. **Analyze the User's Question:** Determine if the question falls within the domain of MLOps, Gen AI lifecycle, or production deployment best practices.
2. **Prioritize Tool Usage:** If the question is within the defined domain, use the provided search tool to find relevant information.
3. **Synthesize and Respond:** Craft a clear, concise, and informative answer based *solely* on the information retrieved from the tool.
4. **Cite Sources (Optional):** If possible and relevant, indicate which part of the answer came from the tool. For example, you can say, "According to the documentation I found..." or provide links if applicable.
5. **Out-of-Scope Questions:** If the question is outside the scope of MLOps, Gen AI, or production deployment, politely state that the topic is beyond your current expertise. For example: "My expertise is in MLOps, and that question seems to be about a different area. I'm not equipped to answer it accurately."
**Your Persona:**
@@ -18,7 +18,7 @@

def main() -> None:
"""Reorganize the project structure.
- Creates backup of app folder
- Moves pattern files to root app folder
- Moves frontend folder to root
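The reorganization steps listed in this docstring can be sketched with the standard library. This is an illustrative sketch only; the directory layout and the `reorganize` helper are assumptions, not the script's actual code:

```python
import shutil
import tempfile
from pathlib import Path

def reorganize(root: Path) -> None:
    """Back up `app/`, promote the pattern's files to the root `app/`,
    and move the pattern's `frontend/` to the project root (sketch)."""
    pattern = root / "app" / "patterns" / "multimodal_live_agent"
    # 1. Create a backup of the existing app folder.
    shutil.copytree(root / "app", root / "app_backup")
    # 2. Move the pattern's frontend folder to the root (before deleting app/).
    shutil.move(str(pattern / "frontend"), str(root / "frontend"))
    # 3. Stage the pattern's app files, then swap them into the root app folder.
    staged = root / "_staged_app"
    shutil.move(str(pattern / "app"), str(staged))
    shutil.rmtree(root / "app")
    staged.rename(root / "app")

# Demo on a throwaway directory tree.
tmp = Path(tempfile.mkdtemp())
(tmp / "app" / "patterns" / "multimodal_live_agent" / "app").mkdir(parents=True)
(tmp / "app" / "patterns" / "multimodal_live_agent" / "frontend").mkdir()
reorganize(tmp)
print(sorted(p.name for p in tmp.iterdir()))  # ['app', 'app_backup', 'frontend']
```

Note the ordering: the frontend folder must be moved out before the old `app/` tree is deleted, since the pattern lives inside it.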
