feat: add e2e gen ai app starter pack multimodal live api pattern
eliasecchig committed Jan 8, 2025
1 parent ec19ba9 commit 9333fe2
Showing 5 changed files with 76 additions and 74 deletions.
1 change: 1 addition & 0 deletions .github/actions/spelling/allow.txt
Original file line number Diff line number Diff line change
@@ -1158,6 +1158,7 @@ timechart
tion
titlebar
tobytes
toolcall
toself
toset
tqdm
@@ -1,12 +1,13 @@
# Multimodal Live Agent

This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.

![live_api_diagram](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_diagram.png)

**Key components:**

- **Python Backend** (in `app/` folder): A production-ready server built with [FastAPI](https://fastapi.tiangolo.com/) and [google-genai](https://googleapis.github.io/python-genai/) that features:

- **Real-time bidirectional communication** via WebSockets between the frontend and Gemini model
- **Integrated tool calling** with vector database support for contextual document retrieval
- **Production-grade reliability** with retry logic and automatic reconnection capabilities
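The reliability layer described above (retry logic with reconnection) can be sketched as a small backoff helper. This is an illustrative stand-in, not the pattern's actual code; the names `with_backoff` and `flaky_connect` are hypothetical:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Call `fn`, retrying with exponential backoff plus jitter on ConnectionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random() * 0.1)
            time.sleep(delay)
    raise RuntimeError("unreachable")

# Demo: a flaky "connection" that succeeds on the third try.
attempts = {"n": 0}

def flaky_connect() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "connected"

print(with_backoff(flaky_connect, base_delay=0.01))  # connected
```

In the real backend the retried call would be the WebSocket (re)connection to the model endpoint rather than a local function.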
@@ -21,8 +22,8 @@ This pattern showcases a real-time conversational RAG agent powered by Gemini. T

You can use this pattern in two ways:

1. As a standalone template for rapid prototyping (⚡ 1 minute setup!)
2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests.

### Standalone Usage

@@ -40,61 +41,61 @@ gsutil cp gs://e2e-gen-ai-app-starter-pack/multimodal_live_agent.zip . && unzip

#### Backend Setup

1. **Set your default Google Cloud project and region:**

   ```bash
   export PROJECT_ID="your-gcp-project"

   gcloud auth login --update-adc
   gcloud config set project $PROJECT_ID
   gcloud auth application-default set-quota-project $PROJECT_ID
   ```

   <details>
   <summary><b>For AI Studio setup:</b></summary>

   ```bash
   export VERTEXAI=false
   export GOOGLE_API_KEY=your-google-api-key
   ```

   </details>

2. **Install Dependencies:**

   Install the required Python packages using Poetry:

   ```bash
   poetry install
   ```

3. **Run the Backend Server:**

   Start the FastAPI server:

   ```bash
   poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload
   ```

#### Frontend Setup

1. **Install Dependencies:**

   In a separate terminal, install the required Node.js packages for the frontend:

   ```bash
   npm --prefix frontend install
   ```

2. **Start the Frontend:**

   Launch the React development server:

   ```bash
   npm --prefix frontend start
   ```

   This command starts the frontend application, accessible at `http://localhost:3000`.

#### Interact with the Agent

@@ -104,53 +105,53 @@ Once both the backend and frontend are running, click the play button in the fro

You can quickly test the application in [Cloud Run](https://cloud.google.com/run). Ensure your service account has the `roles/aiplatform.user` role to access Gemini.

1. **Deploy:**

   ```bash
   export REGION="your-gcp-region"

   gcloud run deploy genai-app-sample \
     --source . \
     --project $PROJECT_ID \
     --memory "4Gi" \
     --region $REGION
   ```

2. **Access:** Use [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`:

   ```bash
   gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION
   ```

   You can then use the same frontend setup described above to interact with your Cloud Run deployment.

### Integrating with the Starter Pack

This pattern is designed for seamless integration with the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack). The starter pack offers a streamlined approach to setting up and deploying multimodal live agents, complete with robust infrastructure and CI/CD pipelines.

### Getting Started

1. **Download the Starter Pack:**

   Obtain the starter pack using the following command:

   ```bash
   gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack
   ```

2. **Prepare the Pattern:**

   Run the provided script to prepare the multimodal live agent pattern:

   ```bash
   python app/patterns/multimodal_live_agent/utils/prepare_pattern.py
   ```

   The script will organize the project structure for you. This README will then be available in the root folder as `PATTERN_README.md`.

3. **Set up CI/CD:**

   Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines.

#### Current Limitations and Future Enhancements

@@ -167,7 +168,7 @@ We highly value your feedback and encourage you to share your thoughts and sugge

Explore these resources to learn more about the Multimodal Live API and see examples of its usage:

- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): a comprehensive developer guide for the Gemini Multimodal Live API.
- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): Collection of code samples and demo applications leveraging the Multimodal Live API in Vertex AI
- [Gemini 2 Cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2): Practical examples and tutorials for working with Gemini 2
- [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console): Interactive React-based web interface for testing and experimenting with Gemini Multimodal Live API.
@@ -12,10 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Dict

import google
import vertexai
from google import genai
from google.genai.types import LiveConnectConfig, Content, FunctionDeclaration, Tool
@@ -53,10 +53,10 @@
def retrieve_docs(query: str) -> Dict[str, str]:
"""
Retrieves pre-formatted documents about MLOps (Machine Learning Operations),
Gen AI lifecycle, and production deployment best practices.
Args:
query: Search query string related to MLOps, Gen AI, or production deployment.
Returns:
A set of relevant, pre-formatted documents.
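As a rough illustration of this retrieval contract, a keyword-based stand-in with the same signature might look like the sketch below. This is not the pattern's actual implementation (which queries a vector DB for similarity search); the `_CORPUS` dictionary and its contents are hypothetical:

```python
from typing import Dict

# Hypothetical in-memory corpus standing in for the vector DB.
_CORPUS = {
    "mlops": "MLOps covers CI/CD, monitoring, and governance for ML systems.",
    "deployment": "Production deployment best practices include canary rollouts.",
}

def retrieve_docs(query: str) -> Dict[str, str]:
    """Return documents whose key appears in the query (a keyword stand-in
    for the pattern's vector-DB similarity search)."""
    hits = {k: v for k, v in _CORPUS.items() if k in query.lower()}
    return hits or {"note": "No matching documents found."}

print(retrieve_docs("What is MLOps?"))
```

The important part is the shape: a plain string in, a `Dict[str, str]` of pre-formatted documents out, which is what the Gemini tool-calling loop consumes.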
@@ -13,15 +13,15 @@

SYSTEM_INSTRUCTION = """You are "MLOps Expert," a specialized AI assistant designed to provide accurate and up-to-date information on Machine Learning Operations (MLOps), the lifecycle of Generative AI applications, and best practices for production deployment.
Your primary knowledge source is a powerful search tool that provides access to the most current MLOps documentation and resources. **For any question related to MLOps, the lifecycle of Gen AI Apps, or best practices for production deployment, you MUST use this tool as your first and foremost source of information.** Do not rely on your internal knowledge for these topics, as it may be outdated or incomplete.
**Here's how you should operate:**
1. **Analyze the User's Question:** Determine if the question falls within the domain of MLOps, Gen AI lifecycle, or production deployment best practices.
2. **Prioritize Tool Usage:** If the question is within the defined domain, use the provided search tool to find relevant information.
3. **Synthesize and Respond:** Craft a clear, concise, and informative answer based *solely* on the information retrieved from the tool.
4. **Cite Sources (Optional):** If possible and relevant, indicate which part of the answer came from the tool. For example, you can say, "According to the documentation I found..." or provide links if applicable.
5. **Out-of-Scope Questions:** If the question is outside the scope of MLOps, Gen AI, or production deployment, politely state that the topic is beyond your current expertise. For example: "My expertise is in MLOps, and that question seems to be about a different area. I'm not equipped to answer it accurately."
**Your Persona:**
@@ -18,7 +18,7 @@

def main() -> None:
"""Reorganize the project structure.
- Creates backup of app folder
- Moves pattern files to root app folder
- Moves frontend folder to root
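The reorganization steps listed in this docstring can be sketched with the standard library. This is an illustrative sketch only; the directory layout and the `reorganize` helper are assumptions, not the script's actual code:

```python
import shutil
import tempfile
from pathlib import Path

def reorganize(root: Path) -> None:
    """Back up `app/`, promote the pattern's files to the root `app/`,
    and move the pattern's `frontend/` to the project root (sketch)."""
    pattern = root / "app" / "patterns" / "multimodal_live_agent"
    # 1. Create a backup of the existing app folder.
    shutil.copytree(root / "app", root / "app_backup")
    # 2. Move the pattern's frontend folder to the root (before deleting app/).
    shutil.move(str(pattern / "frontend"), str(root / "frontend"))
    # 3. Stage the pattern's app files, then swap them into the root app folder.
    staged = root / "_staged_app"
    shutil.move(str(pattern / "app"), str(staged))
    shutil.rmtree(root / "app")
    staged.rename(root / "app")

# Demo on a throwaway directory tree.
tmp = Path(tempfile.mkdtemp())
(tmp / "app" / "patterns" / "multimodal_live_agent" / "app").mkdir(parents=True)
(tmp / "app" / "patterns" / "multimodal_live_agent" / "frontend").mkdir()
reorganize(tmp)
print(sorted(p.name for p in tmp.iterdir()))  # ['app', 'app_backup', 'frontend']
```

Note the ordering: the frontend folder must be moved out before the old `app/` tree is deleted, since the pattern lives inside it.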
