From 9333fe26ff2e9e5779926b47346d1eb3787de832 Mon Sep 17 00:00:00 2001
From: eliasecchig
Date: Wed, 8 Jan 2025 16:23:41 +0100
Subject: [PATCH] feat: add e2e gen ai app starter pack multimodal live api pattern

---
 .github/actions/spelling/allow.txt | 1 +
 .../patterns/multimodal_live_agent/README.md | 135 +++++++++---------
 .../multimodal_live_agent/app/agent.py | 6 +-
 .../multimodal_live_agent/app/templates.py | 6 +-
 .../utils/prepare_pattern.py | 2 +-
 5 files changed, 76 insertions(+), 74 deletions(-)

diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt
index 052c2fbefef..eac9d169171 100644
--- a/.github/actions/spelling/allow.txt
+++ b/.github/actions/spelling/allow.txt
@@ -1158,6 +1158,7 @@ timechart
 tion
 titlebar
 tobytes
+toolcall
 toself
 toset
 tqdm
diff --git a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md
index e521bf89842..dd170170ace 100644
--- a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md
+++ b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md
@@ -1,12 +1,13 @@
 # Multimodal Live Agent
 
-This pattern showcases a real-time conversational RAG agent powered by Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.
+This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.
![live_api_diagram](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_diagram.png) **Key components:** - **Python Backend** (in `app/` folder): A production-ready server built with [FastAPI](https://fastapi.tiangolo.com/) and [google-genai](https://googleapis.github.io/python-genai/) that features: + - **Real-time bidirectional communication** via WebSockets between the frontend and Gemini model - **Integrated tool calling** with vector database support for contextual document retrieval - **Production-grade reliability** with retry logic and automatic reconnection capabilities @@ -21,8 +22,8 @@ This pattern showcases a real-time conversational RAG agent powered by Gemini. T You can use this pattern in two ways: -1. As a standalone template for rapid prototyping (⚡ 1 minute setup!) -2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests. +1. As a standalone template for rapid prototyping (⚡ 1 minute setup!) +2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests. ### Standalone Usage @@ -40,61 +41,61 @@ gsutil cp gs://e2e-gen-ai-app-starter-pack/multimodal_live_agent.zip . && unzip #### Backend Setup -1. **Set your default Google Cloud project and region:** +1. **Set your default Google Cloud project and region:** - ```bash - export PROJECT_ID="your-gcp-project" + ```bash + export PROJECT_ID="your-gcp-project" - gcloud auth login --update-adc - gcloud config set project $PROJECT_ID - gcloud auth application-default set-quota-project $PROJECT_ID - ``` + gcloud auth login --update-adc + gcloud config set project $PROJECT_ID + gcloud auth application-default set-quota-project $PROJECT_ID + ``` -
- For AI Studio setup: +
+ For AI Studio setup: - ```bash - export VERTEXAI=false - export GOOGLE_API_KEY=your-google-api-key - ``` + ```bash + export VERTEXAI=false + export GOOGLE_API_KEY=your-google-api-key + ``` -
+
-2. **Install Dependencies:** +2. **Install Dependencies:** - Install the required Python packages using Poetry: + Install the required Python packages using Poetry: - ```bash - poetry install - ``` + ```bash + poetry install + ``` -3. **Run the Backend Server:** +3. **Run the Backend Server:** - Start the FastAPI server: + Start the FastAPI server: - ```bash - poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload - ``` + ```bash + poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload + ``` #### Frontend Setup -1. **Install Dependencies:** +1. **Install Dependencies:** - In a separate terminal, install the required Node.js packages for the frontend: + In a separate terminal, install the required Node.js packages for the frontend: - ```bash - npm --prefix frontend install - ``` + ```bash + npm --prefix frontend install + ``` -2. **Start the Frontend:** +2. **Start the Frontend:** - Launch the React development server: + Launch the React development server: - ```bash - npm --prefix frontend start - ``` + ```bash + npm --prefix frontend start + ``` - This command starts the frontend application, accessible at `http://localhost:3000`. + This command starts the frontend application, accessible at `http://localhost:3000`. #### Interact with the Agent @@ -104,25 +105,25 @@ Once both the backend and frontend are running, click the play button in the fro You can quickly test the application in [Cloud Run](https://cloud.google.com/run). Ensure your service account has the `roles/aiplatform.user` role to access Gemini. -1. **Deploy:** +1. **Deploy:** - ```bash - export REGION="your-gcp-region" + ```bash + export REGION="your-gcp-region" - gcloud run deploy genai-app-sample \ - --source . \ - --project $PROJECT_ID \ - --memory "4Gi" \ - --region $REGION - ``` + gcloud run deploy genai-app-sample \ + --source . \ + --project $PROJECT_ID \ + --memory "4Gi" \ + --region $REGION + ``` -2. 
**Access:** Use [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`: +2. **Access:** Use [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`: - ```bash - gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION - ``` + ```bash + gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION + ``` - You can then use the same frontend setup described above to interact with your Cloud Run deployment. + You can then use the same frontend setup described above to interact with your Cloud Run deployment. ### Integrating with the Starter Pack @@ -130,27 +131,27 @@ This pattern is designed for seamless integration with the [starter pack](https: ### Getting Started -1. **Download the Starter Pack:** +1. **Download the Starter Pack:** - Obtain the starter pack using the following command: + Obtain the starter pack using the following command: - ```bash - gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack - ``` + ```bash + gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack + ``` -2. **Prepare the Pattern:** +2. **Prepare the Pattern:** - Run the provided script to prepare the multimodal live agent pattern: + Run the provided script to prepare the multimodal live agent pattern: - ```bash - python app/patterns/multimodal_live_agent/utils/prepare_pattern.py - ``` + ```bash + python app/patterns/multimodal_live_agent/utils/prepare_pattern.py + ``` - The script will organize the project structure for you. The current readme will be available in the root folder with the name `PATTERN_README.md`. + The script will organize the project structure for you. 
The current readme will be available in the root folder with the name `PATTERN_README.md`. -3. **Set up CI/CD:** +3. **Set up CI/CD:** - Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines. + Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines. #### Current Limitations and Future Enhancements @@ -167,7 +168,7 @@ We highly value your feedback and encourage you to share your thoughts and sugge Explore these resources to learn more about the Multimodal Live API and see examples of its usage: -- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): a comprehensive developer guide for Google's Gemini Multimodal Live API. -- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): Collection of code samples and demo applications leveraging multimodal live API in Vertex +- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): a comprehensive developer guide for the Gemini Multimodal Live API. +- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): Collection of code samples and demo applications leveraging multimodal live API in Vertex AI - [Gemini 2 Cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2): Practical examples and tutorials for working with Gemini 2 - [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console): Interactive React-based web interface for testing and experimenting with Gemini Multimodal Live API. 
diff --git a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/agent.py b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/agent.py index d4df1bdadf5..b3fa4a0a049 100644 --- a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/agent.py +++ b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/agent.py @@ -12,10 +12,10 @@ # See the License for the specific language governing permissions and # limitations under the License. +import os from typing import Dict import google -import os import vertexai from google import genai from google.genai.types import LiveConnectConfig, Content, FunctionDeclaration, Tool @@ -53,10 +53,10 @@ def retrieve_docs(query: str) -> Dict[str, str]: """ Retrieves pre-formatted documents about MLOps (Machine Learning Operations), - GenAI lifecycle, and production deployment best practices. + Gen AI lifecycle, and production deployment best practices. Args: - query: Search query string related to MLOps, GenAI, or production deployment. + query: Search query string related to MLOps, Gen AI, or production deployment. Returns: A set of relevant, pre-formatted documents. diff --git a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/templates.py b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/templates.py index c5f5d9026f5..5b4638cc166 100644 --- a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/templates.py +++ b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/templates.py @@ -13,15 +13,15 @@ SYSTEM_INSTRUCTION = """You are "MLOps Expert," a specialized AI assistant designed to provide accurate and up-to-date information on Machine Learning Operations (MLOps), the lifecycle of Generative AI applications, and best practices for production deployment. 
-Your primary knowledge source is a powerful search tool that provides access to the most current MLOps documentation and resources. **For any question related to MLOps, the lifecycle of GenAI Apps, or best practices for production deployment, you MUST use this tool as your first and foremost source of information.** Do not rely on your internal knowledge for these topics, as it may be outdated or incomplete. +Your primary knowledge source is a powerful search tool that provides access to the most current MLOps documentation and resources. **For any question related to MLOps, the lifecycle of Gen AI Apps, or best practices for production deployment, you MUST use this tool as your first and foremost source of information.** Do not rely on your internal knowledge for these topics, as it may be outdated or incomplete. **Here's how you should operate:** -1. **Analyze the User's Question:** Determine if the question falls within the domain of MLOps, GenAI lifecycle, or production deployment best practices. +1. **Analyze the User's Question:** Determine if the question falls within the domain of MLOps, Gen AI lifecycle, or production deployment best practices. 2. **Prioritize Tool Usage:** If the question is within the defined domain, use the provided search tool to find relevant information. 3. **Synthesize and Respond:** Craft a clear, concise, and informative answer based *solely* on the information retrieved from the tool. 4. **Cite Sources (Optional):** If possible and relevant, indicate which part of the answer came from the tool. For example, you can say, "According to the documentation I found..." or provide links if applicable. -5. **Out-of-Scope Questions:** If the question is outside the scope of MLOps, GenAI, or production deployment, politely state that the topic is beyond your current expertise. For example: "My expertise is in MLOps, and that question seems to be about a different area. I'm not equipped to answer it accurately." +5. 
**Out-of-Scope Questions:** If the question is outside the scope of MLOps, Gen AI, or production deployment, politely state that the topic is beyond your current expertise. For example: "My expertise is in MLOps, and that question seems to be about a different area. I'm not equipped to answer it accurately." **Your Persona:** diff --git a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/utils/prepare_pattern.py b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/utils/prepare_pattern.py index 72c5e8ba213..6fac842919e 100644 --- a/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/utils/prepare_pattern.py +++ b/gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/utils/prepare_pattern.py @@ -18,7 +18,7 @@ def main() -> None: """Reorganize the project structure. - + - Creates backup of app folder - Moves pattern files to root app folder - Moves frontend folder to root
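
Reviewer note: the `main` docstring in `prepare_pattern.py` lists three steps (back up the app folder, move the pattern files into the root app folder, move the frontend folder to the root). The sketch below shows one way those steps could be written with the standard library. It is not the script's actual implementation: the function name `reorganize`, the `app_backup` folder name, and the overwrite behavior are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def reorganize(root: Path, pattern_dir: Path) -> Path:
    """Back up root/app, then promote the pattern's app/ and frontend/ folders.

    Hypothetical helper: the names and layout are assumptions, not taken
    from prepare_pattern.py. Returns the path of the backup folder.
    """
    app_dir = root / "app"
    backup_dir = root / "app_backup"  # assumed backup location

    # 1. Create a backup of the existing app folder (never overwrite one).
    if backup_dir.exists():
        raise FileExistsError(f"backup already exists: {backup_dir}")
    shutil.copytree(app_dir, backup_dir)

    # 2. Move the pattern's app files into the root app folder,
    #    replacing any file or folder of the same name.
    for item in (pattern_dir / "app").iterdir():
        target = app_dir / item.name
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        shutil.move(str(item), str(target))

    # 3. Move the frontend folder to the project root, if it is not there yet.
    frontend_src = pattern_dir / "frontend"
    frontend_dst = root / "frontend"
    if frontend_src.exists() and not frontend_dst.exists():
        shutil.move(str(frontend_src), str(frontend_dst))

    return backup_dir
```

Copying the backup before any move keeps the original `app/` contents recoverable if a later step fails, which is presumably why the docstring lists the backup first.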