diff --git a/site/en/tutorials/function_calling_python_quickstart.ipynb b/site/en/tutorials/function_calling_python_quickstart.ipynb index 6195c72f5..fe8779a4e 100644 --- a/site/en/tutorials/function_calling_python_quickstart.ipynb +++ b/site/en/tutorials/function_calling_python_quickstart.ipynb @@ -3,7 +3,7 @@ { "cell_type": "markdown", "metadata": { - "id": "Tce3stUlHN0L" + "id": "2edc81e382cf" }, "source": [ "##### Copyright 2024 Google LLC." @@ -11,10 +11,10 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "cellView": "form", - "id": "tuOe1ymfHZPu" + "id": "906e07f6e562" }, "outputs": [], "source": [ @@ -37,7 +37,7 @@ "id": "yeadDkMiISin" }, "source": [ - "# Gemini API: Basic function calling with Python" + "# Gemini API: Function calling with Python" ] }, { @@ -59,13 +59,22 @@ "" ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "df1767a3d1cc" + }, + "source": [ + "You can provide Gemini models with descriptions of functions. The model may ask you to call a function and send back the result to help the model handle your query." + ] + }, { "cell_type": "markdown", "metadata": { "id": "FFPBKLapSCkM" }, "source": [ - "## Setup" + "## Setup\n" ] }, { @@ -76,18 +85,18 @@ "source": [ "### Install the Python SDK\n", "\n", - "The Python SDK for the Gemini API, is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:" + "The Python SDK for the Gemini API is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:\n" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "id": "9OEoeosRTv-5" }, "outputs": [], "source": [ - "!pip install -U google-generativeai" + "!pip install -U -q google-generativeai" ] }, { @@ -110,7 +119,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "metadata": { "id": "TS9l5igubpHO" }, @@ -118,16 +127,14 @@ "source": [ "import pathlib\n", "import textwrap\n", + "import time\n", "\n", "import google.generativeai as genai\n", "\n", - "# Used to securely store your API key\n", - "from google.colab import userdata\n", "\n", - "from IPython.display import display\n", + "from IPython import display\n", "from IPython.display import Markdown\n", "\n", - "\n", "def to_markdown(text):\n", " text = text.replace('•', ' *')\n", " return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))" @@ -143,7 +150,7 @@ "\n", "Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.\n", "\n", - "Get an API key" + "Get an API key\n" ] }, { @@ -164,318 +171,335 @@ "Once you have the API key, pass it to the SDK. 
You can do this in two ways:\n", "\n", "* Put the key in the `GOOGLE_API_KEY` environment variable (the SDK will automatically pick it up from there).\n", - "* Pass the key to `genai.configure(api_key=...)`" + "* Pass the key to `genai.configure(api_key=...)`\n" ] },
{ "cell_type": "code", - "execution_count": 6, + "execution_count": 22, "metadata": { "id": "ab9ASynfcIZn" }, "outputs": [], "source": [ - "# Or use `os.getenv('API_KEY')` to fetch an environment variable.\n", - "API_KEY=userdata.get('API_KEY')\n", - "\n", - "genai.configure(api_key=API_KEY)" + "try:\n", + " # Used to securely store your API key\n", + " from google.colab import userdata\n", + " \n", + " # Or use `os.getenv('API_KEY')` to fetch an environment variable.\n", + " GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')\n", + "except ImportError:\n", + " import os\n", + " GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']\n", + " \n", + "genai.configure(api_key=GOOGLE_API_KEY)" ] },
{ "cell_type": "markdown", "metadata": { - "id": "JFz04WEgOwWp" + "id": "3f383614ec30" }, "source": [ - "## Function calls" + "## Function Basics" ] },
{ "cell_type": "markdown", "metadata": { - "id": "Js4Y4mO20txL" + "id": "b82c1aecb657" }, "source": [ - "The google.ai.generativelanguage client library provides access to the low level types required for function calling." + "You can pass a list of functions to the `tools` argument when creating a `genai.GenerativeModel`.\n", + "\n", + "> Important: The SDK converts the function's argument type annotations to a format the API understands. The API only supports a limited selection of argument types, and this automatic conversion only supports a subset of that: `int | float | bool | str | list | dict`" ] },
{ "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": { - "id": "S53E0EE8TBUF" + "id": "42b27b02d2f5" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "genai.GenerativeModel(\n", + " model_name='models/gemini-1.0-pro',\n", + " generation_config={},\n", + " safety_settings={},\n", + " tools=<google.generativeai.types.content_types.FunctionLibrary object at 0x...>,\n", + ")" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "import google.ai.generativelanguage as glm" + "def multiply(a:float, b:float):\n", + " \"\"\"returns a * b.\"\"\"\n", + " return a*b\n", + "\n", + "model = genai.GenerativeModel(model_name='gemini-1.0-pro',\n", + " tools=[multiply])\n", + "\n", + "model" ] },
{ "cell_type": "markdown", "metadata": { - "id": "qFD4U7ym04F5" + "id": "d5fd91032a1e" }, "source": [ - "A `glm.Tool` contains a list of `glm.FunctionDeclarations`. These just describe the function, they don't implement it." + "The recommended way to use function calling is through the chat interface. The main reason is that `FunctionCalls` fit nicely into chat's multi-turn structure."
] }, { "cell_type": "code", - "execution_count": 99, + "execution_count": 6, "metadata": { - "id": "mNfJ8Hjj1BMd" + "id": "d3b91c855257" }, "outputs": [], "source": [ - "datetime = glm.Tool(\n", - " function_declarations=[\n", - " glm.FunctionDeclaration(\n", - " name='now',\n", - " description=\"Returns the current UTC date and time.\"\n", - " )\n", - " ]\n", - ")" + "chat = model.start_chat(enable_automatic_function_calling=True)" ] },
{ "cell_type": "markdown", "metadata": { - "id": "a11LZTKT1CRp" + "id": "1481a6159399" }, "source": [ - "Pass a list of tools to the `genai.GenerativeModel` constructor to give the model access:" + "With automatic function calling enabled, `chat.send_message` automatically calls your function if the model asks it to.\n", + "\n", + "It appears to simply return a text response containing the correct answer:" ] },
{ "cell_type": "code", - "execution_count": 100, - "metadata": { - "id": "aGEcm_lSOv_T" - }, - "outputs": [], - "source": [ - "model = genai.GenerativeModel(\n", - " 'gemini-pro',\n", - " tools=[datetime])" - ] - }, - { - "cell_type": "markdown", + "execution_count": 7, "metadata": { - "id": "FANMyp-V1can" + "id": "81d8def3d865" }, + "outputs": [ + { + "data": { + "text/plain": [ + "'The total number of mittens is 2508.'" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "For this basic tools support use chat-mode since tools require multiple rounds of back and forth." + "response = chat.send_message('I have 57 cats, each owns 44 mittens, how many mittens is that in total?')\n", + "response.text" ] },
{ "cell_type": "code", - "execution_count": 117, + "execution_count": 8, "metadata": { - "id": "LRx-I8i41cxT" + "id": "951c0f83f72e" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "2508" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "chat = model.start_chat()\n", - "\n", - "response = chat.send_message(\n", - " 'How many days until Christmas',\n", - ")" + "57*44" ] },
{ "cell_type": "markdown", "metadata": { - "id": "S7GgtVSJ17kW" + "id": "7731e35f2383" }, "source": [ - "When the model needs to call a tool to answer a question it returns a `glm.Part` containing a `function_call` instead of a text attribute:" + "If you look in the `ChatSession.history`, you can see the sequence of events:\n", + "\n", + "1. You sent the question.\n", + "2. The model replied with a `glm.FunctionCall`.\n", + "3. The `genai.ChatSession` executed the function locally and sent the model back a `glm.FunctionResponse`.\n", + "4. The model used the function output in its answer."
] }, { "cell_type": "code", - "execution_count": 118, + "execution_count": 9, "metadata": { - "id": "zQmmIhneQV_J" + "id": "9f7eff1e8e60" }, "outputs": [ { - "data": { - "text/plain": [ - "[index: 0\n", - "content {\n", - " parts {\n", - " function_call {\n", - " name: \"now\"\n", - " args {\n", - " }\n", - " }\n", - " }\n", - " role: \"model\"\n", - "}\n", - "finish_reason: STOP\n", - "]" - ] - }, - "execution_count": 118, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "user -> {'text': 'I have 57 cats, each owns 44 mittens, how many mittens is that in total?'}\n", + "--------------------------------------------------------------------------------\n", + "model -> {'function_call': {'name': 'multiply', 'args': {'a': 57.0, 'b': 44.0}}}\n", + "--------------------------------------------------------------------------------\n", + "user -> {'function_response': {'name': 'multiply', 'response': {'result': 2508.0}}}\n", + "--------------------------------------------------------------------------------\n", + "model -> {'text': 'The total number of mittens is 2508.'}\n", + "--------------------------------------------------------------------------------\n" + ] } ], "source": [ - "response.candidates" + "for content in chat.history:\n", + " part = content.parts[0]\n", + " print(content.role, \"->\", type(part).to_dict(part))\n", + " print('-'*80)" ] },
{ "cell_type": "markdown", "metadata": { - "id": "woioLEWo4b5N" + "id": "2471fd72f05e" }, "source": [ - "Reply with a `glm.Part` containing a `glm.FunctionResponse` to allow the model to finish the answer:" + "In general, the state diagram is:\n", + "\n", + "\"The" ] },
{ - "cell_type": "code", - "execution_count": 119, + "cell_type": "markdown", "metadata": { - "id": "TbmidjuxSaH6" + "id": "f42d69800cff" }, - "outputs": [], "source": [ - "response = chat.send_message(\n", - " glm.Content(\n", - " parts=[glm.Part(\n", - " function_response = glm.FunctionResponse(\n", - " name='now',\n", - " response={'datetime': 'Sun Dec 5 03:33:56 PM UTC 2023'}\n", - " )\n", - " )]\n", - " )\n", - ")" + "The model can respond with multiple function calls before returning a text response, and function calls come before the text response." ] },
{ "cell_type": "markdown", "metadata": { - "id": "6DmxLrJZ4sYl" + "id": "9610f3465a69" }, "source": [ - "The model may respond with either a text response or another `glm.FunctionCall`:" + "While this was all handled automatically, if you need more control, you can:\n", + "\n", + "- Leave the default `enable_automatic_function_calling=False` and process the `glm.FunctionCall` responses yourself.\n", + "- Or use `GenerativeModel.generate_content`, where you also need to manage the chat history. " ] },
{ - "cell_type": "code", - "execution_count": 120, + "cell_type": "markdown", "metadata": { - "id": "1EWmHodLVCLK" + "id": "JFz04WEgOwWp" }, - "outputs": [ - { - "data": { - "text/plain": [ - "' Okay, Christmas this year, 2023, is on Monday, December 25th. That makes it 20 days from now.'" - ] - }, - "execution_count": 120, - "metadata": {}, - "output_type": "execute_result" - } - ], "source": [ - "response.text" + "## [Optional] Low-level access" ] },
{ "cell_type": "markdown", "metadata": { "id": "Js4Y4mO20txL" }, "source": [ - "That `datetime` tool only contained a single function, which takes no arguments. Next try something more complex.\n", + "The automatic extraction of the schema from Python functions doesn't work in all cases. 
For example, it doesn't handle cases where you describe the fields of a nested dictionary-object, but the API does support this. The API is able to describe any of the following types:\n", + "\n", + "```\n", + "AllowedType = int | float | bool | str | list['AllowedType'] | dict[str, 'AllowedType']\n", + "```\n", "\n", - "LLMs are, generally, not 100% accurate at arithmetic:" + "The `google.ai.generativelanguage` client library provides access to the low-level types, giving you full control."
] }, { "cell_type": "code", - "execution_count": 133, + "execution_count": 10, "metadata": { - "id": "YCBm3EFDH0Kr" + "id": "S53E0EE8TBUF" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "549899573314\n" - ] - } - ], + "outputs": [], "source": [ - "model = genai.GenerativeModel('gemini-pro')\n", - "chat = model.start_chat()\n", - "\n", - "a = 2312371\n", - "b = 234234\n", - "\n", - "response = chat.send_message(\n", - " f\"What's {a} X {b} ?\",\n", - "\n", - ")\n", - "print(response.text)" + "import google.ai.generativelanguage as glm" ] },
{ "cell_type": "markdown", "metadata": { "id": "b4f73eef235e" }, "source": [ "First peek inside the model's `_tools` attribute; you can see how it describes the function(s) you passed to the model:" ] },
{ "cell_type": "code", - "execution_count": 134, + "execution_count": 11, "metadata": { - "id": "u2cTnhrzIEQe" + "id": "e36166b2c1b6" }, "outputs": [ { "data": { "text/plain": [ - "541635908814" + "[function_declarations {\n", + " name: \"multiply\"\n", + " description: \"returns a * b.\"\n", + " parameters {\n", + " type_: OBJECT\n", + " properties {\n", + " key: \"b\"\n", + " value {\n", + " type_: NUMBER\n", + " }\n", + " }\n", + " properties {\n", + " key: \"a\"\n", + " value {\n", + " type_: NUMBER\n", + " }\n", + " }\n", + " required: \"a\"\n", + " required: \"b\"\n", + " }\n", + " }]" ] }, - "execution_count": 134, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "a*b" + "def multiply(a:float, b:float):\n", + " \"\"\"returns a * b.\"\"\"\n", + " return a*b\n", + "\n", + "model = genai.GenerativeModel(model_name='gemini-1.0-pro',\n", + " tools=[multiply])\n", + "\n", + "model._tools.to_proto()" ] },
{ "cell_type": "markdown", "metadata": { - "id": "xmoe63Cd5-AF" + "id": "qFD4U7ym04F5" }, "source": [ - "Sometimes it's off by ~1%, sometimes it's off by 10X." - ] - }, - { - "cell_type": "code", - "execution_count": 135, - "metadata": { - "id": "dt0NhB5NJAOz" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Error: -1.53%\n" - ] - } - ], - "source": [ - "error_percent = (a*b - int(response.text.replace(',', '')))/(a*b) * 100\n", - "\n", - "print(f\"Error: {error_percent:.2f}%\")" + "This returns the list of `glm.Tool` objects that would be sent to the API. If the printed format is not familiar, it's because these are Google protobuf classes. Each `glm.Tool` (1 in this case) contains a list of `glm.FunctionDeclarations`, which describe a function and its arguments." ] },
{ @@ -484,12 +508,14 @@ "id": "eY6RmFQ76FVu" }, "source": [ - "So, describe a calculator as a `glm.Tool`:" + "Here is a declaration for the same `multiply` function written using the `glm` classes.\n", + "\n", + "Note that these classes just describe the function for the API; they don't include an implementation of it. So using this doesn't work with automatic function calling, but functions don't always need an implementation.\n",
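 + "\n", + "These classes can also express the nested object parameters mentioned above, which the automatic conversion can't produce. As a hedged sketch only (the `add_contact` function and its fields are hypothetical, purely illustrative, not part of this tutorial's running example):\n", + "\n", + "```python\n", + "# Hypothetical declaration: the nested 'address' object is the kind of\n", + "# schema that plain type annotations can't describe.\n", + "add_contact = glm.Tool(\n", + "    function_declarations=[\n", + "        glm.FunctionDeclaration(\n", + "            name='add_contact',\n", + "            description=\"Saves a contact.\",\n", + "            parameters=glm.Schema(\n", + "                type=glm.Type.OBJECT,\n", + "                properties={\n", + "                    'name': glm.Schema(type=glm.Type.STRING),\n", + "                    'address': glm.Schema(\n", + "                        type=glm.Type.OBJECT,\n", + "                        properties={\n", + "                            'street': glm.Schema(type=glm.Type.STRING),\n", + "                            'city': glm.Schema(type=glm.Type.STRING)\n", + "                        }\n", + "                    )\n", + "                },\n", + "                required=['name']\n", + "            )\n", + "        )\n", + "    ]\n", + ")\n", + "```\n", + "\n", + "The flat `multiply` declaration, built the same way:"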
] }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 12, "metadata": { "id": "qCwHM4WbC4wb" }, @@ -498,18 +524,6 @@ "calculator = glm.Tool(\n", " function_declarations=[\n", " glm.FunctionDeclaration(\n", - " name='add',\n", - " description=\"Returns the sum of two numbers.\",\n", - " parameters=glm.Schema(\n", - " type=glm.Type.OBJECT,\n", - " properties={\n", - " 'a': glm.Schema(type=glm.Type.NUMBER),\n", - " 'b': glm.Schema(type=glm.Type.NUMBER)\n", - " },\n", - " required=['a','b']\n", - " )\n", - " ),\n", - " glm.FunctionDeclaration(\n", " name='multiply',\n", " description=\"Returns the product of two numbers.\",\n", " parameters=glm.Schema(\n", @@ -524,28 +538,97 @@ " ])" ] },
+ { + "cell_type": "markdown", + "metadata": { + "id": "19ad564235a6" + }, + "source": [ + "Equivalently, you can describe this as a JSON-compatible object:" + ] + },
+ { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "5f2804046c94" + }, + "outputs": [], + "source": [ + "calculator = {'function_declarations': [\n", + " {'name': 'multiply',\n", + " 'description': 'Returns the product of two numbers.',\n", + " 'parameters': {'type_': 'OBJECT',\n", + " 'properties': {\n", + " 'a': {'type_': 'NUMBER'},\n", + " 'b': {'type_': 'NUMBER'}},\n", + " 'required': ['a', 'b']}}]}" + ] + },
+ { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "4cefe2c3c808" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "function_declarations {\n", + " name: \"multiply\"\n", + " description: \"Returns the product of two numbers.\"\n", + " parameters {\n", + " type_: OBJECT\n", + " properties {\n", + " key: \"b\"\n", + " value {\n", + " type_: NUMBER\n", + " }\n", + " }\n", + " properties {\n", + " key: \"a\"\n", + " value {\n", + " type_: NUMBER\n", + " }\n", + " }\n", + " required: \"a\"\n", + " required: \"b\"\n", + " }\n", + "}" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "glm.Tool(calculator)" + ] + },
{ "cell_type": "markdown", "metadata": { "id": "jS6ruiTp6VBf" }, "source": [ - "Give the model the calculator and ask again:" + "Either way, you pass a representation of a `glm.Tool` or list of tools to the `tools` argument when creating the model:" ] },
{ "cell_type": "code", - "execution_count": 93, + "execution_count": 15, "metadata": { "id": "xwhWG22cIIDU" }, "outputs": [], "source": [ - "model = genai.GenerativeModel('gemini-pro', tools=[calculator])\n", + "model = genai.GenerativeModel('gemini-pro', tools=calculator)\n", "chat = model.start_chat()\n", "\n", "response = chat.send_message(\n", - " f\"What's {a} X {b} ?\",\n", + " \"What's 234551 X 325552 ?\",\n", ")" ] },
@@ -555,12 +638,12 @@ "id": "517ca06297bb" }, "source": [ - "Now instead of guessing at the answer the model returns a `glm.FunctionCall` invoking the calculator's `multiply` function: " + "Like before, the model returns a `glm.FunctionCall` invoking the calculator's `multiply` function: " ] },
{ "cell_type": "code", - "execution_count": 95, + "execution_count": 16, "metadata": { "id": "xhey4QA0DTJf" }, @@ -577,13 +660,13 @@ " fields {\n", " key: \"b\"\n", " value {\n", - " number_value: 234234\n", + " number_value: 325552\n", " }\n", " }\n", " fields {\n", " key: \"a\"\n", " value {\n", - " number_value: 2312371\n", + " number_value: 234551\n", " }\n", " }\n", " }\n", @@ -595,7 +678,7 @@ "]" ] }, - "execution_count": 95, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ @@ -615,7 +698,7 @@ }, { "cell_type": "code", - "execution_count": 96, +
"execution_count": 17, "metadata": { "id": "88758eebfd5c" }, @@ -623,10 +706,10 @@ { "data": { "text/plain": [ - "541635908814.0" + "76358547152.0" ] }, - "execution_count": 96, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -650,7 +733,7 @@ }, { "cell_type": "code", - "execution_count": 97, + "execution_count": 18, "metadata": { "id": "f3c67066411e" }, @@ -661,33 +744,7 @@ " parts=[glm.Part(\n", " function_response = glm.FunctionResponse(\n", " name='multiply',\n", - " response={'result': result}\n", - " )\n", - " )]\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 98, - "metadata": { - "id": "9f7a9662d816" - }, - "outputs": [ - { - "data": { - "text/plain": [ - "' 541636000000'" - ] - }, - "execution_count": 98, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "response.text" + " response={'result': result}))]))" ] }, {