From a86e76a1dbf7eedd7a729f4c68064095707b645d Mon Sep 17 00:00:00 2001 From: Anirudh Sriram Date: Fri, 10 May 2024 15:08:22 -0700 Subject: [PATCH 1/2] updating docs for video and parallel function calling --- .../docs/function-calling/python.ipynb | 267 ++++++++++++++---- .../docs/prompting_with_media.ipynb | 222 ++++++++++++--- 2 files changed, 397 insertions(+), 92 deletions(-) diff --git a/site/en/gemini-api/docs/function-calling/python.ipynb b/site/en/gemini-api/docs/function-calling/python.ipynb index 46614db26..974800208 100644 --- a/site/en/gemini-api/docs/function-calling/python.ipynb +++ b/site/en/gemini-api/docs/function-calling/python.ipynb @@ -11,7 +11,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "cellView": "form", "id": "906e07f6e562" @@ -65,7 +65,8 @@ "id": "df1767a3d1cc" }, "source": [ - "You can provide Gemini models with descriptions of functions. The model may ask you to call a function and send back the result to help the model handle your query." + "Use function calling to define custom functions and pass them to Gemini. The model does not directly invoke these functions, but instead generates structured data output that specifies the function name and suggested arguments. This output enables the calling of external APIs, and the resulting API output can then be incorporated back into the model, allowing for more comprehensive query responses. Function calling empowers LLMs to interact with real-time information and various services, such as databases, customer\n", + "relationship management systems, and document repositories, enhancing their ability to provide relevant and contextual answers. You can provide Gemini models with descriptions of functions. The model may ask you to call a function and send back the result to help the model handle your query." ] }, { @@ -119,7 +120,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "id": "TS9l5igubpHO" }, @@ -130,7 +131,7 @@ "import time\n", "\n", "import google.generativeai as genai\n", - "\n", + "import google.ai.generativelanguage as glm\n", "\n", "from IPython import display\n", "from IPython.display import Markdown\n", @@ -176,7 +177,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": { "id": "ab9ASynfcIZn" }, @@ -201,25 +202,19 @@ "id": "3f383614ec30" }, "source": [ - "## Function Basics" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "b82c1aecb657" - }, - "source": [ - "You can pass a list of functions to the `tools` argument when creating a `genai.GenerativeModel`.\n", + "## Basics of function calling\n", + "\n", + "To use function calling, pass a list of functions to the `tools` parameter when creating a [`GenerativeModel`](https://ai.google.dev/api/python/google/generativeai/GenerativeModel). The model uses the function name, docstring, parameters, and parameter type annotations to decide if it needs the function to best answer a prompt.\n", "\n", - "> Important: The SDK converts the function's argument's type annotations to a format the API understands. The API only supports a limited selection of argument types, and this automatic conversion only supports a subset of that: `int | float | bool | str | list | dict`" + "> Important: The SDK converts function parameter type annotations to a format the API understands (`glm.FunctionDeclaration`). 
The API only supports a limited selection of parameter types, and the Python SDK's automatic conversion only supports a subset of that: `AllowedTypes = int | float | bool | str | list['AllowedTypes'] | dict`"
      ]
    },
    {
      "cell_type": "code",
-      "execution_count": 5,
+      "execution_count": null,
      "metadata": {
-        "id": "42b27b02d2f5"
+        "id": "42b27b02d2f5",
+        "outputId": "8440de71-9ec3-45c5-c9e8-5193f5ff7919"
      },
      "outputs": [
        {
        "id": "d5fd91032a1e"
      },
      "source": [
-        "The recomended way to use function calling is through the chat interface. The main reason is that `FunctionCalls` fit nicely into chat's multi-turn structure."
+        "It is recommended to use function calls through the chat interface. This is because function calls naturally fit into [multi-turn chats](https://ai.google.dev/api/python/google/generativeai/GenerativeModel#multi-turn) as they capture the back-and-forth interaction between the user and model. The Python SDK's [`ChatSession`](https://ai.google.dev/api/python/google/generativeai/ChatSession) is a great interface for chats because it handles the conversation history for you, and using the parameter `enable_automatic_function_calling` simplifies function calling even further:"
      ]
    },
    {
      "cell_type": "code",
-      "execution_count": 6,
+      "execution_count": null,
      "metadata": {
        "id": "d3b91c855257"
      },
    {
      "cell_type": "code",
-      "execution_count": 7,
+      "execution_count": null,
      "metadata": {
-        "id": "81d8def3d865"
+        "id": "81d8def3d865",
+        "outputId": "cb88151d-e5e5-43f4-9c4e-e6341839a217"
      },
      "outputs": [
        {
    {
      "cell_type": "code",
-      "execution_count": 8,
+      "execution_count": null,
      "metadata": {
-        "id": "951c0f83f72e"
+        "id": "951c0f83f72e",
+        "outputId": "2cfdb1eb-2bcc-4b5f-a184-6d7254a3b837"
      },
      "outputs": [
        {
    {
      "cell_type": "markdown",
      "source": [
        "Examine the chat history to see the flow of the conversation and how function calls are integrated within it.\n",
        "\n",
        "The `ChatSession.history` property stores a chronological record of the conversation between the user and the Gemini model. Each turn in the conversation is represented by a [`glm.Content`](https://ai.google.dev/api/python/google/ai/generativelanguage/Content) object, which contains the following information:\n",
        "\n",
        "* **Role**: Identifies whether the content originated from the \"user\" or the \"model\".\n",
        "* **Parts**: A list of [`glm.Part`](https://ai.google.dev/api/python/google/ai/generativelanguage/Part) objects that represent individual components of the message. With a text-only model, these parts can be:\n",
        "    * **Text**: Plain text messages.\n",
        "    * **Function Call** ([`glm.FunctionCall`](https://ai.google.dev/api/python/google/ai/generativelanguage/FunctionCall)): A request from the model to execute a specific function with provided arguments.\n",
        "    * **Function Response** ([`glm.FunctionResponse`](https://ai.google.dev/api/python/google/ai/generativelanguage/FunctionResponse)): The result returned by the user after executing the requested function.\n",
        "\n",
        "In the previous example with the mittens calculation, the history shows the following sequence:\n",
        "\n",
        "1. **User**: Asks the question about the total number of mittens.\n",
        "1. **Model**: Determines that the `multiply` function is helpful and sends a `FunctionCall` request to the user.\n",
        "1. **User**: The `ChatSession` automatically executes the function (because `enable_automatic_function_calling` is set) and sends back a `FunctionResponse` with the calculated result.\n",
        "1. **Model**: Uses the function's output to formulate the final answer and presents it as a text response."
      ],
      "metadata": {
        "id": "J0bgvvIs3I9J"
      }
    },
    {
      "cell_type": "code",
-      "execution_count": 9,
+      "execution_count": null,
      "metadata": {
-        "id": "9f7eff1e8e60"
+        "id": "9f7eff1e8e60",
+        "outputId": "139b4b1a-4886-43eb-cd84-9023da7f653d"
      },
      "outputs": [
        {
        "While this was all handled automatically, if you need more control, you can:\n",
        "\n",
        "- Leave the default `enable_automatic_function_calling=False` and process the `glm.FunctionCall` responses yourself.\n",
-        "- Or use `GenerativeModel.generate_content`, where you also need to manage the chat history. "
+        "- Or use `GenerativeModel.generate_content`, where you also need to manage the chat history."
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Parallel function calling\n",
        "\n",
        "In addition to basic function calling described above, you can also call multiple functions in a single turn. This section shows an example of how you can use parallel function calling."
      ],
      "metadata": {
        "id": "qiOShqKn1Bh_"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Define the tools."
      ],
      "metadata": {
        "id": "PvHIHmFdTg_c"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "def power_disco_ball(power: bool) -> bool:\n",
        "    \"\"\"Powers the spinning disco ball.\"\"\"\n",
        "    print(f\"Disco ball is {'spinning!' if power else 'stopped.'}\")\n",
        "    return True\n",
        "\n",
        "\n",
        "def start_music(energetic: bool, loud: bool, bpm: int) -> str:\n",
        "    \"\"\"Play some music matching the specified parameters.\n",
        "\n",
        "    Args:\n",
        "      energetic: Whether the music is energetic or not.\n",
        "      loud: Whether the music is loud or not.\n",
        "      bpm: The beats per minute of the music.\n",
        "\n",
        "    Returns: The name of the song being played.\n",
        "    \"\"\"\n",
        "    print(f\"Starting music! {energetic=} {loud=}, {bpm=}\")\n",
        "    return \"Never gonna give you up.\"\n",
        "\n",
        "\n",
        "def dim_lights(brightness: float) -> bool:\n",
        "    \"\"\"Dim the lights.\n",
        "\n",
        "    Args:\n",
        "      brightness: The brightness of the lights, 0.0 is off, 1.0 is full.\n",
        "    \"\"\"\n",
        "    print(f\"Lights are now set to {brightness:.0%}\")\n",
        "    return True"
      ],
      "metadata": {
        "id": "89QPizVHTeJa"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zlrmXN7fxQi0"
      },
      "source": [
        "Now call the model with an instruction that could use all of the specified tools."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "21ecYHLgIsCl",
        "outputId": "eb1a6a96-e021-4851-af28-38a35133b593"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "power_disco_ball(power=True)\n",
            "start_music(energetic=True, loud=True, bpm=120.0)\n",
            "dim_lights(brightness=0.3)\n"
          ]
        }
      ],
      "source": [
        "# Set the model up with tools.\n",
        "house_fns = [power_disco_ball, start_music, dim_lights]\n",
        "\n",
        "model = genai.GenerativeModel(model_name=\"gemini-1.5-pro-latest\", tools=house_fns)\n",
        "\n",
        "# Call the API.\n",
        "chat = model.start_chat()\n",
        "response = chat.send_message(\"Turn this place into a party!\")\n",
        "\n",
        "# Print out each of the function calls requested from this single call.\n",
        "for part in response.parts:\n",
        "    if fn := part.function_call:\n",
        "        args = \", \".join(f\"{key}={val}\" for key, val in fn.args.items())\n",
        "        print(f\"{fn.name}({args})\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "t6iYpty7yZct"
      },
      "source": [
        "Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "L7RxoiR3foBR",
        "outputId": "1f33f596-c29a-4014-d14f-9cc8d8ee1262"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Let's get this party started! I've turned on the disco ball, started playing some upbeat music, and dimmed the lights. 🎢✨ Get ready to dance! πŸ•ΊπŸ’ƒ \n",
            "\n"
          ]
        }
      ],
      "source": [
        "# Simulate the responses from the specified tools.\n",
        "responses = {\n",
        "    \"power_disco_ball\": True,\n",
        "    \"start_music\": \"Never gonna give you up.\",\n",
        "    \"dim_lights\": True,\n",
        "}\n",
        "\n",
        "# Build the response parts.\n",
        "response_parts = [\n",
        "    glm.Part(function_response=glm.FunctionResponse(name=fn, response={\"result\": val}))\n",
        "    for fn, val in responses.items()\n",
        "]\n",
        "\n",
        "response = chat.send_message(response_parts)\n",
        "print(response.text)"
      ]
    },
    {
      "cell_type": "code",
-      "execution_count": 10,
+      "execution_count": null,
      "metadata": {
        "id": "S53E0EE8TBUF"
      },
    {
      "cell_type": "code",
-      "execution_count": 11,
+      "execution_count": null,
      "metadata": {
-        "id": "e36166b2c1b6"
+        "id": "e36166b2c1b6",
+        "outputId": "af6f27d2-3358-409b-ef8c-7ee9ef9f58ca"
      },
      "outputs": [
        {
    {
      "cell_type": "code",
-      "execution_count": 12,
+      "execution_count": null,
      "metadata": {
        "id": "qCwHM4WbC4wb"
      },
    {
      "cell_type": "code",
-      "execution_count": 13,
+      "execution_count": null,
      "metadata": {
        "id": "5f2804046c94"
      },
    {
      "cell_type": "code",
-      "execution_count": 14,
+      "execution_count": null,
      "metadata": {
-        "id": "4cefe2c3c808"
+        "id": "4cefe2c3c808",
+        "outputId": "9bcf4b22-cfbd-42ae-e42b-092dbc07a780"
      },
      "outputs": [
        {
        "id": "jS6ruiTp6VBf"
      },
      "source": [
-        "Either way, you pass a representation of a `glm.Tool` or list of tools to "
+        "Either way, you pass a representation of a `glm.Tool` or a list of tools to the model:"
      ]
    },
    {
      "cell_type": "code",
-      "execution_count": 15,
+      "execution_count": null,
      "metadata": {
        "id": "xwhWG22cIIDU"
      },
        "id": "517ca06297bb"
      },
      "source": [
-        "Like before the model returns a `glm.FunctionCall` invoking the calculator's `multiply` function: "
+        "Like before, the model returns a `glm.FunctionCall` invoking the calculator's `multiply` function:"
      ]
    },
    {
      "cell_type": "code",
-      "execution_count": 16,
+      "execution_count": null,
      "metadata": {
-        "id": "xhey4QA0DTJf"
+        "id": "xhey4QA0DTJf",
+        "outputId": "282ef2ec-4f3b-408e-baaa-8c96a8262220"
      },
      "outputs": [
        {
    {
      "cell_type": "code",
-      "execution_count": 17,
+      "execution_count": null,
      "metadata": {
-        "id": "88758eebfd5c"
+        "id": "88758eebfd5c",
+        "outputId": "3a54fd79-1151-4ab2-9d97-571ca67f9147"
      },
      "outputs": [
        {
    {
      "cell_type": "code",
-      "execution_count": 18,
+      "execution_count": null,
      "metadata": {
        "id": "f3c67066411e"
      },
      "source": [
        "## Summary\n",
        "\n",
-        "Basic function calling is supported in the SDK. Remember that it is easier to manage using chat-mode, because of the natural back and forth structure. You're in charge of actually calling the functions and sending results back to the model so it can produce a text-response. "
+        "Basic function calling is supported in the SDK. Remember that it is easier to manage using chat mode, because of its natural back-and-forth structure. You're in charge of actually calling the functions and sending results back to the model so it can produce a text response.\n",
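+        "\n",
+        "If you leave `enable_automatic_function_calling` off, a minimal manual turn might look like the following sketch. It reuses the `multiply` function and a tools-enabled `model` from earlier in this tutorial, and it assumes the suggested arguments can be unpacked directly with `**`; treat it as an illustration rather than the only supported pattern:\n",
+        "\n",
+        "```python\n",
+        "chat = model.start_chat()\n",
+        "response = chat.send_message(\n",
+        "    \"I have 57 cats, each owns 44 mittens, how many mittens is that in total?\"\n",
+        ")\n",
+        "\n",
+        "# The model may reply with a function call instead of text.\n",
+        "if fn := response.parts[0].function_call:\n",
+        "    # Execute the requested function with the arguments the model suggested.\n",
+        "    result = multiply(**fn.args)\n",
+        "\n",
+        "    # Return the result so the model can produce its final text answer.\n",
+        "    response = chat.send_message(\n",
+        "        [glm.Part(function_response=glm.FunctionResponse(name=fn.name, response={\"result\": result}))]\n",
+        "    )\n",
+        "\n",
+        "print(response.text)\n",
+        "```"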
] } ], "metadata": { "colab": { - "name": "python.ipynb", + "provenance": [], "toc_visible": true }, "google": { @@ -782,4 +941,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file diff --git a/site/en/gemini-api/docs/prompting_with_media.ipynb b/site/en/gemini-api/docs/prompting_with_media.ipynb index a1c598970..b3d8085f4 100644 --- a/site/en/gemini-api/docs/prompting_with_media.ipynb +++ b/site/en/gemini-api/docs/prompting_with_media.ipynb @@ -64,23 +64,19 @@ "id": "3c5e92a74e64" }, "source": [ - "The Gemini API supports prompting with text, image, and audio data, also known as *multimodal* prompting. You can include text, image,\n", - "and audio in your prompts. For small files, you can point the Gemini model\n", + "The Gemini API supports *multimodal* prompting with text, image, and audio data. For small files, you can point the Gemini model\n", "directly to a local file when providing a prompt. Upload larger files with the\n", "[File API](https://ai.google.dev/api/rest/v1beta/files) before including them in\n", "prompts.\n", "\n", "The File API lets you store up to 20GB of files per project, with each file not\n", "exceeding 2GB in size. Files are stored for 48 hours and can be accessed with\n", - "your API key for generation within that time period. It is available at no cost in all regions where the [Gemini API is\n", + "your API key for generation within that time period and cannot be downloaded from the API. It is available at no cost in all regions where the [Gemini API is\n", "available](https://ai.google.dev/available_regions).\n", "\n", - "For information on valid file formats (MIME types) and supported models, see [Supported file formats](#supported_file_formats).\n", + "The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent). For information on valid file formats (MIME types) and supported models, see [Supported file formats](#supported_file_formats).\n", "\n", - "Note: Videos must be converted into image frames before uploading to the File\n", - "API.\n", - "\n", - "This guide shows how to use the File API to upload a media file and include it in a `GenerateContent` call to the Gemini API. For more information, see the [code\n", + "This guide shows how to use the File API to upload media files and include them in a `GenerateContent` call to the Gemini API. For more information, see the [code\n", "samples](https://github.com/google-gemini/gemini-api-cookbook/tree/main/quickstarts/file-api).\n" ] }, @@ -173,22 +169,27 @@ "id": "c-z4zsCUlaru" }, "source": [ - "## Upload a file to the File API\n", - "\n", - "The File API lets you upload a variety of multimodal MIME types, including plain text, images, and audio formats. The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent).\n", - "\n", - "The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API." 
+        "## Prompting with images\n"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Upload an image file to the File API"
      ],
      "metadata": {
        "id": "rsdNkDszLBmQ"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "o1K81yn9mFBo"
      },
      "source": [
-        "First, you will prepare a sample image to upload to the API.\n",
+        "In this tutorial, you upload a sample image to the API and use it to generate content.\n",
        "\n",
-        "To upload your own file, see the [Appendix section](#uploading_files_to_colab)."
+        "Refer to the [Appendix section](#uploading_files_to_colab) to learn how to upload your own file."
      ]
    },
      "source": [
        "!curl -o image.jpg https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg"
      ]
    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "rI84z01ZmSyF"
-      },
-      "source": [
-        "Next, you'll upload that file to the File API."
-      ]
-    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
-        "id": "N9NxXGZKKusG"
+        "id": "N9NxXGZKKusG",
+        "outputId": "e0faa4d3-dc5e-4d78-8694-ddc92dee4379"
      },
      "outputs": [
        {
      ],
      "source": [
+        "# Upload the file\n",
        "sample_file = genai.upload_file(path=\"image.jpg\",\n",
        "                            display_name=\"Sample drawing\")\n",
        "\n",
      "source": [
        "The `response` shows that the File API stored the specified `display_name` for the uploaded file and a `uri` to reference the file in Gemini API calls. Use `response` to track how uploaded files are mapped to URIs.\n",
        "\n",
-        "Depending on your use cases, you could store the URIs in structures such as a `dict` or a database."
+        "Depending on your use case, you can also store the URIs in structures such as a `dict` or a database."
      ]
    },
      "source": [
-        "## Get file\n",
+        "### Get file\n",
        "\n",
        "After uploading the file, you can verify the API has successfully received the files by calling `files.get`.\n",
        "\n",
      "source": [
-        "## Generate content\n",
+        "### Generate content\n",
        "\n",
-        "After uploading the file, you can make `GenerateContent` requests that reference the File API URI. In this example, you create prompt that starts with a text followed by the uploaded image."
+        "After uploading the file, you can make `GenerateContent` requests that reference the File API URI. In this example, you create a prompt that starts with text followed by the uploaded image."
      ]
    },
      "source": [
-        "## Delete files\n",
+        "### Delete files\n",
        "\n",
        "Files are automatically deleted after 2 days. You can also manually delete them using `files.delete()`."
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Prompting with videos"
      ],
      "metadata": {
        "id": "TaUZc1mvLkHY"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MNvhBdoDFnTC"
      },
      "source": [
        "### Upload video to the File API\n",
        "\n",
        "The Gemini API accepts video file formats directly. This example uses the short film \"Big Buck Bunny\".\n",
        "\n",
        "> \"Big Buck Bunny\" is (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org and [licensed](https://peach.blender.org/about/) under the [Creative Commons Attribution 3.0](http://creativecommons.org/licenses/by/3.0/) License.\n",
        "\n",
        "Refer to the [Appendix section](#uploading_files_to_colab) to learn how to upload your own file."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!wget https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4"
      ],
      "metadata": {
        "id": "V4XeFdX1rxaE"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "_HzrDdp2Q1Cu"
      },
      "outputs": [],
      "source": [
        "video_file_name = \"BigBuckBunny_320x180.mp4\"\n",
        "\n",
        "print(f\"Uploading file...\")\n",
        "file_response = genai.upload_file(path=video_file_name)\n",
        "print(f\"Completed upload: {file_response.uri}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "NOTE: The File API service currently samples the video at 1 FPS; this rate may change to provide the best inference quality."
      ],
      "metadata": {
        "id": "06GCLdmwNin5"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oOZmTUb4FWOa"
      },
      "source": [
        "### Get file\n",
        "\n",
        "After uploading the file, you can verify the API has successfully received it by calling the `files.get` method.\n",
        "\n",
        "The `files.get` method lets you see the files uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` (and by extension, the `uri`) is unique."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SHMVCWHkFhJW"
      },
      "outputs": [],
      "source": [
        "import requests\n",
        "import googleapiclient.discovery\n",
        "import time\n",
        "\n",
        "# Use the Discovery API until the SDK is updated\n",
        "GENAI_URL = f\"https://generativelanguage.googleapis.com/$discovery/rest?version=v1beta&key={GOOGLE_API_KEY}\"\n",
        "discovery = requests.get(GENAI_URL)\n",
        "service = googleapiclient.discovery.build_from_document(discovery.content, developerKey=GOOGLE_API_KEY)\n",
        "resp = service.files().get(name=file_response.name).execute()\n",
        "\n",
        "# Poll until the video file has finished processing.\n",
        "while resp['state'] == \"PROCESSING\":\n",
        "    print(f\"File is in state {resp['state']}... Checking again in 5 seconds.\")\n",
        "    time.sleep(5)\n",
        "    resp = service.files().get(name=file_response.name).execute()\n",
        "\n",
        "print(f\"File is {resp['state']}. {file_response.uri}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zS5NmQeXLqeS"
      },
      "source": [
        "### Generate content\n",
        "\n",
        "After the video has been uploaded, you can make `GenerateContent` requests that reference the File API URI."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ypZuGQ-2LqeS"
      },
      "outputs": [],
      "source": [
        "# Create the prompt.\n",
        "prompt = \"Describe this video.\"\n",
        "\n",
        "# Set the model to Gemini 1.5 Pro.\n",
        "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-pro-latest\")\n",
        "\n",
        "# Make the LLM request.\n",
        "print(\"Making LLM inference request...\")\n",
        "request = [{'role':'user', 'parts': [file_response]},\n",
        "           {'role':'user', 'parts': [prompt]}]\n",
        "response = model.generate_content(request,\n",
        "                                  request_options={\"timeout\": 600})\n",
        "print(response.text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "diCy9BgjLqeS"
      },
      "source": [
        "### Delete file\n",
        "\n",
        "Files are automatically deleted after 2 days, or you can manually delete them using `files.delete()`.\n",
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YYyi5PrKLqeb" + }, + "outputs": [], + "source": [ + "genai.delete_file(file_response.name)\n", + "print(f'Deleted file {file_response.uri}')" + ] + }, { "cell_type": "markdown", "metadata": { @@ -387,15 +536,13 @@ "\n", "### Video formats\n", "\n", - "You can use video data for prompting with the `gemini-1.5-pro` model. However, video file formats are not supported as direct inputs by the Gemini API. You can use video data as prompt input by breaking down the video into a series of still frame images and a separate audio file. This approach lets you manage the amount of data, and the level of detail provided by the video, by choosing how many frames per second are included in your prompt from the video file.\n", - "\n", - "Note: Video files added to a prompt as constituent parts, audio file and image frames, are considered as separate prompt data inputs by the model. For this reason, requests or questions that specify the time when *both* an audio snippet and video frames appear in the source video may not produce useful results.\n", + "You can use video data for prompting with the `gemini-1.5-pro` model.\n", "\n", "### Plain text formats\n", "\n", "The File API supports uploading plain text files with the following MIME types:\n", "- text/plain\n", - "- text/html \n", + "- text/html\n", "- text/css\n", "- text/javascript\n", "- application/x-javascript\n", @@ -422,7 +569,7 @@ "source": [ "## Appendix: Uploading files to Colab\n", "\n", - "This notebook uses the File API with files that were downloaded from the internet. If you're running this in Colab and want to use your own files, you first need to upload them to the colab instance.\n", + "This notebook uses the File API with files that were downloaded from the internet. 
If you're running this in Colab and want to use your own files, you first need to upload them to the Colab instance.\n", "\n", "First, click **Files** on the left sidebar, then click the **Upload** button:\n", "\n", @@ -450,8 +597,7 @@ ], "metadata": { "colab": { - "name": "prompting_with_media.ipynb", - "toc_visible": true + "provenance": [] }, "kernelspec": { "display_name": "Python 3", @@ -460,4 +606,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file From 71065ca062f8c9599369761fb38cc13f682adc80 Mon Sep 17 00:00:00 2001 From: Anirudh Sriram Date: Fri, 10 May 2024 15:19:58 -0700 Subject: [PATCH 2/2] updating docs for video and parallel function calling with formatted notebooks --- .../docs/function-calling/python.ipynb | 70 ++++++++----------- .../docs/prompting_with_media.ipynb | 42 +++++------ 2 files changed, 51 insertions(+), 61 deletions(-) diff --git a/site/en/gemini-api/docs/function-calling/python.ipynb b/site/en/gemini-api/docs/function-calling/python.ipynb index 974800208..ac332de0e 100644 --- a/site/en/gemini-api/docs/function-calling/python.ipynb +++ b/site/en/gemini-api/docs/function-calling/python.ipynb @@ -213,8 +213,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "42b27b02d2f5", - "outputId": "8440de71-9ec3-45c5-c9e8-5193f5ff7919" + "id": "42b27b02d2f5" }, "outputs": [ { @@ -279,8 +278,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "81d8def3d865", - "outputId": "cb88151d-e5e5-43f4-9c4e-e6341839a217" + "id": "81d8def3d865" }, "outputs": [ { @@ -303,8 +301,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "951c0f83f72e", - "outputId": "2cfdb1eb-2bcc-4b5f-a184-6d7254a3b837" + "id": "951c0f83f72e" }, "outputs": [ { @@ -324,6 +321,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "J0bgvvIs3I9J" + }, "source": [ "Examine the chat history to see the flow of the conversation and how function calls are integrated within it.\n", "\n", @@ -341,17 +341,13 @@ "1. **Model**: Determines that the multiply function is helpful and sends a FunctionCall request to the user.\n", "1. **User**: The `ChatSession` automatically executes the function (due to `enable_automatic_function_calling` being set) and sends back a `FunctionResponse` with the calculated result.\n", "1. **Model**: Uses the function's output to formulate the final answer and presents it as a text response." - ], - "metadata": { - "id": "J0bgvvIs3I9J" - } + ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "id": "9f7eff1e8e60", - "outputId": "139b4b1a-4886-43eb-cd84-9023da7f653d" + "id": "9f7eff1e8e60" }, "outputs": [ { @@ -410,26 +406,31 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "qiOShqKn1Bh_" + }, "source": [ "## Parallel function calling\n", "\n", "In addition to basic function calling described above, you can also call multiple functions in a single turn. This section shows an example for how you can use parallel function calling." - ], - "metadata": { - "id": "qiOShqKn1Bh_" - } + ] }, { "cell_type": "markdown", - "source": [ - "Define the tools." - ], "metadata": { "id": "PvHIHmFdTg_c" - } + }, + "source": [ + "Define the tools." 
+ ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "89QPizVHTeJa" + }, + "outputs": [], "source": [ "def power_disco_ball(power: bool) -> bool:\n", " \"\"\"Powers the spinning disco ball.\"\"\"\n", @@ -459,12 +460,7 @@ " \"\"\"\n", " print(f\"Lights are now set to {brightness:.0%}\")\n", " return True" - ], - "metadata": { - "id": "89QPizVHTeJa" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -479,8 +475,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "21ecYHLgIsCl", - "outputId": "eb1a6a96-e021-4851-af28-38a35133b593" + "id": "21ecYHLgIsCl" }, "outputs": [ { @@ -523,8 +518,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "L7RxoiR3foBR", - "outputId": "1f33f596-c29a-4014-d14f-9cc8d8ee1262" + "id": "L7RxoiR3foBR" }, "outputs": [ { @@ -603,8 +597,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "e36166b2c1b6", - "outputId": "af6f27d2-3358-409b-ef8c-7ee9ef9f58ca" + "id": "e36166b2c1b6" }, "outputs": [ { @@ -725,8 +718,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "4cefe2c3c808", - "outputId": "9bcf4b22-cfbd-42ae-e42b-092dbc07a780" + "id": "4cefe2c3c808" }, "outputs": [ { @@ -802,8 +794,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "xhey4QA0DTJf", - "outputId": "282ef2ec-4f3b-408e-baaa-8c96a8262220" + "id": "xhey4QA0DTJf" }, "outputs": [ { @@ -858,8 +849,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "88758eebfd5c", - "outputId": "3a54fd79-1151-4ab2-9d97-571ca67f9147" + "id": "88758eebfd5c" }, "outputs": [ { @@ -920,7 +910,7 @@ ], "metadata": { "colab": { - "provenance": [], + "name": "python.ipynb", "toc_visible": true }, "google": { @@ -941,4 +931,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} diff --git a/site/en/gemini-api/docs/prompting_with_media.ipynb b/site/en/gemini-api/docs/prompting_with_media.ipynb index b3d8085f4..cbe8786fe 100644 --- a/site/en/gemini-api/docs/prompting_with_media.ipynb +++ b/site/en/gemini-api/docs/prompting_with_media.ipynb @@ -174,12 +174,12 @@ }, { "cell_type": "markdown", - "source": [ - "### Upload an image file to the File API" - ], "metadata": { "id": "rsdNkDszLBmQ" - } + }, + "source": [ + "### Upload an image file to the File API" + ] }, { "cell_type": "markdown", @@ -207,8 +207,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "N9NxXGZKKusG", - "outputId": "e0faa4d3-dc5e-4d78-8694-ddc92dee4379" + "id": "N9NxXGZKKusG" }, "outputs": [ { @@ -315,12 +314,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Prompting with videos" - ], "metadata": { "id": "TaUZc1mvLkHY" - } + }, + "source": [ + "## Prompting with videos" + ] }, { "cell_type": "markdown", @@ -339,14 +338,14 @@ }, { "cell_type": "code", - "source": [ - "!wget https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4" - ], + "execution_count": null, "metadata": { "id": "V4XeFdX1rxaE" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!wget https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4" + ] }, { "cell_type": "code", @@ -365,12 +364,12 @@ }, { "cell_type": "markdown", - "source": [ - "NOTE: The File API service currently samples the video at 1 FPS and may be subject to change to provide the best inference quality." 
- ], "metadata": { "id": "06GCLdmwNin5" - } + }, + "source": [ + "NOTE: The File API service currently samples the video at 1 FPS and may be subject to change to provide the best inference quality." + ] }, { "cell_type": "markdown", @@ -597,7 +596,8 @@ ], "metadata": { "colab": { - "provenance": [] + "name": "prompting_with_media.ipynb", + "toc_visible": true }, "kernelspec": { "display_name": "Python 3", @@ -606,4 +606,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +}