diff --git a/site/en/gemini-api/docs/vision.ipynb b/site/en/gemini-api/docs/vision.ipynb
index 802d2e1b5..48074c34a 100644
--- a/site/en/gemini-api/docs/vision.ipynb
+++ b/site/en/gemini-api/docs/vision.ipynb
@@ -394,7 +394,9 @@
    "source": [
     "### Get bounding boxes\n",
     "\n",
-    "You can ask the model for the coordinates of bounding boxes for objects in images."
+    "You can ask the model for the coordinates of bounding boxes for objects in images. For object detection, the Gemini model has been trained to provide\n",
+    "these coordinates as relative widths or heights in range `[0,1]`, scaled by 1000 and converted to an integer. Effectively, the coordinates given are for a\n",
+    "1000x1000 version of the original image, and need to be converted back to the dimensions of the original image."
    ]
   },
   {
@@ -414,6 +416,19 @@
     "print(response.text)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "b8e422c55df2"
+   },
+   "source": [
+    "To convert these coordinates to the dimensions of the original image:\n",
+    "\n",
+    "1. Divide each output coordinate by 1000.\n",
+    "1. Multiply the x-coordinates by the original image width.\n",
+    "1. Multiply the y-coordinates by the original image height."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
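
The three conversion steps added in this change can be sketched in Python as follows. This is a minimal illustration, not part of the notebook: the helper name `to_pixel_box` is hypothetical, and the `[ymin, xmin, ymax, xmax]` ordering is an assumption that the diff itself does not state, so check the order your prompt actually requests before relying on it.

```python
def to_pixel_box(box, image_width, image_height):
    """Rescale a box from the 0-1000 grid to pixel coordinates.

    Assumption: `box` is ordered [ymin, xmin, ymax, xmax]. The diff does
    not specify the ordering, so adjust the unpacking if yours differs.
    """
    ymin, xmin, ymax, xmax = box
    return [
        round(ymin * image_height / 1000),  # y-coordinates scale by height
        round(xmin * image_width / 1000),   # x-coordinates scale by width
        round(ymax * image_height / 1000),
        round(xmax * image_width / 1000),
    ]


# Example: a box on the 1000x1000 grid mapped onto a 640x480 image.
print(to_pixel_box([250, 100, 750, 900], image_width=640, image_height=480))
# prints [120, 64, 360, 576]
```

Multiplying before dividing keeps the intermediate values exact for integer inputs; rounding to whole pixels is a choice made here for drawing purposes and can be dropped if fractional coordinates are acceptable.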