Commit

TextGrad Vision (#41)
* textgrad vision integration init

* max tokens for multimodal calls

* an example nb

* support for png and jpgs without having to swap encodings

* check if bytes

* support image from

* check for variable values

* remove unfinished notebook

* remove unused fns

* gitignore logs

* better error handling in the optimizer

* remove redundant logging

* better function names, a bit more typing

* move the utilities from __init__ to a new file

* additional tests and solving small issue

---------
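The bullets "support for png and jpgs without having to swap encodings" and "check if bytes" suggest dispatching on raw image bytes rather than on a caller-supplied encoding. A minimal, hypothetical sketch of such magic-byte detection (illustrative only, not the actual TextGrad code) could look like:

```python
def guess_image_type(data: bytes) -> str:
    """Guess an image format from its magic bytes (hypothetical helper)."""
    if not isinstance(data, bytes):
        raise TypeError("expected raw image bytes")
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG file signature
        return "png"
    if data.startswith(b"\xff\xd8\xff"):       # JPEG SOI marker
        return "jpeg"
    raise ValueError("unsupported image format")
```

With a check like this, PNG and JPEG inputs can share a single code path instead of requiring callers to swap encodings by hand.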

Co-authored-by: vinid <[email protected]>
mertyg and vinid authored Jul 7, 2024
1 parent 60198d3 commit d56d1c1
Showing 25 changed files with 1,102 additions and 133 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -160,3 +160,4 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
logs/
18 changes: 10 additions & 8 deletions README.md
@@ -103,17 +103,19 @@ We have many more examples around how TextGrad can optimize all kinds of variabl

### Tutorials

We have prepared a couple of tutorials to get you started with TextGrad.
You can run them directly in Google Colab by clicking on the links below.
We have prepared a couple of tutorials to get you started with TextGrad. We recommend beginners
work through them in the order listed below. You can run them directly in Google Colab by clicking the badges below (but
you need an OpenAI/Anthropic API key to run the LLMs).

<div align="center">

| Example | Colab Link |
|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Introduction to TextGrad Primitives | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Primitives.ipynb) |
| Optimizing a Code Snippet and Define a New Loss | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/textgrad/blob/main/examples/notebooks/Tutorial-Test-Time-Loss-for-Code.ipynb) |
| Prompt Optimization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Prompt-Optimization.ipynb) |
| Solution Optimization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Solution-Optimization.ipynb) |
| Tutorial | Difficulty | Colab Link |
|----------------------------------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. Introduction to TextGrad Primitives | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Primitives.ipynb) |
| 2. Solution Optimization | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Solution-Optimization.ipynb) |
| 3. Optimizing a Code Snippet and Define a New Loss | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/textgrad/blob/main/examples/notebooks/Tutorial-Test-Time-Loss-for-Code.ipynb) |
| 4. Prompt Optimization | ![](https://img.shields.io/badge/Level-Intermediate-yellow.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Prompt-Optimization.ipynb) |
| 5. MultiModal Optimization | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-MultiModal.ipynb) |

</div>

6 changes: 3 additions & 3 deletions examples/notebooks/Local-Model-With-LMStudio.ipynb
@@ -182,7 +182,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "textgrad",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -196,9 +196,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
265 changes: 265 additions & 0 deletions examples/notebooks/Tutorial-MultiModal.ipynb

Large diffs are not rendered by default.

@@ -64,7 +64,10 @@
"cell_type": "markdown",
"id": "8887fbed36c7daf2",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Variable\n",
@@ -89,7 +92,10 @@
"end_time": "2024-06-11T15:43:17.669096228Z",
"start_time": "2024-06-11T15:43:17.665325560Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -105,7 +111,10 @@
"end_time": "2024-06-11T15:43:18.184004948Z",
"start_time": "2024-06-11T15:43:18.178187640Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -127,7 +136,10 @@
"cell_type": "markdown",
"id": "63f6a6921a1cce6a",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Engine\n",
@@ -144,7 +156,10 @@
"end_time": "2024-06-11T15:44:32.606319032Z",
"start_time": "2024-06-11T15:44:32.561460448Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -155,7 +170,10 @@
"cell_type": "markdown",
"id": "33c7d6eaa115cd6a",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"This object behaves like you would expect an LLM to behave: You can sample generation from the engine using the `generate` method. "
@@ -170,7 +188,10 @@
"end_time": "2024-06-11T17:29:41.108552705Z",
"start_time": "2024-06-11T17:29:40.294256814Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -192,7 +213,10 @@
"cell_type": "markdown",
"id": "b627edc07c0d3737",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Loss\n",
@@ -209,7 +233,10 @@
"end_time": "2024-06-11T15:44:32.894722136Z",
"start_time": "2024-06-11T15:44:32.890708561Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -221,15 +248,21 @@
"cell_type": "markdown",
"id": "ff137c99e0659dcc",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": []
},
{
"cell_type": "markdown",
"id": "6f05ec2bf907b3ba",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Optimizer\n",
@@ -248,7 +281,10 @@
"end_time": "2024-06-11T15:44:33.741130951Z",
"start_time": "2024-06-11T15:44:33.734977769Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -259,7 +295,10 @@
"cell_type": "markdown",
"id": "d26883eb74ce0d01",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Putting it all together\n",
@@ -276,7 +315,10 @@
"end_time": "2024-06-11T15:44:41.730132530Z",
"start_time": "2024-06-11T15:44:34.997777872Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -294,7 +336,10 @@
"end_time": "2024-06-11T15:44:41.738985151Z",
"start_time": "2024-06-11T15:44:41.731989729Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -316,7 +361,10 @@
"cell_type": "markdown",
"id": "6a8aab93b80fb82c",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"While here it is not going to be useful, we can also do multiple optimization steps in a loop! Do not forget to reset the gradients after each step!"
@@ -330,7 +378,10 @@
"ExecuteTime": {
"start_time": "2024-06-11T15:44:30.989940227Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -342,7 +393,10 @@
"execution_count": null,
"id": "a3a84aad4cd58737",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": []
4 changes: 3 additions & 1 deletion requirements.txt
@@ -6,4 +6,6 @@ platformdirs>=3.11.0
datasets>=2.14.6
diskcache>=5.6.3
graphviz>=0.20.3
gdown>=5.2.0
gdown>=5.2.0
pillow
httpx
4 changes: 2 additions & 2 deletions setup.py
@@ -8,9 +8,9 @@

setup(
name="textgrad",
version="0.1.3",
version="0.1.4",
description="",
python_requires=">=3.8",
python_requires=">=3.9",
classifiers=[
"Development Status :: 2 - Pre-Alpha",
"Intended Audience :: Developers",
71 changes: 71 additions & 0 deletions tests/test_basics.py
@@ -1,5 +1,6 @@
import os
import pytest
from typing import Union, List
import logging


@@ -18,6 +19,26 @@ def generate(self, prompt, system_prompt=None, **kwargs):
def __call__(self, prompt, system_prompt=None):
return self.generate(prompt)

class DummyMultimodalEngine(EngineLM):

def __init__(self, is_multimodal=False):
self.is_multimodal = is_multimodal
self.model_string = "gpt-4o" # fake

def generate(self, content: Union[str, List[Union[str, bytes]]], system_prompt: str = None, **kwargs):
if isinstance(content, str):
return "Hello Text"

elif isinstance(content, list):
has_multimodal_input = any(isinstance(item, bytes) for item in content)
if (has_multimodal_input) and (not self.is_multimodal):
raise NotImplementedError("Multimodal generation is only supported for Claude-3 and beyond.")

return "Hello Text from Image"

def __call__(self, prompt, system_prompt=None):
return self.generate(prompt)
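The content-type dispatch in `DummyMultimodalEngine.generate` can be exercised on its own. The sketch below stubs out the `EngineLM` base class; the class name and error message are illustrative, but the branching mirrors the test code above:

```python
from typing import List, Union

class FakeMultimodalEngine:
    """Standalone replica of the DummyMultimodalEngine dispatch above."""
    def __init__(self, is_multimodal: bool = False):
        self.is_multimodal = is_multimodal

    def generate(self, content: Union[str, List[Union[str, bytes]]], **kwargs) -> str:
        # Plain strings always take the text-only path.
        if isinstance(content, str):
            return "Hello Text"
        # A list is treated as multimodal only when it carries raw bytes.
        has_multimodal_input = any(isinstance(item, bytes) for item in content)
        if has_multimodal_input and not self.is_multimodal:
            raise NotImplementedError("multimodal input sent to a text-only engine")
        return "Hello Text from Image"

engine = FakeMultimodalEngine(is_multimodal=True)
print(engine.generate("hi"))                     # Hello Text
print(engine.generate(["caption", b"\x89PNG"]))  # Hello Text from Image
```

Keying the multimodal path on the presence of `bytes` in the list is what lets the same engine serve both text and image calls without a separate API.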

# Idempotent engine that returns the prompt as is
class IdempotentEngine(EngineLM):
def generate(self, prompt, system_prompt=None, **kwargs):
@@ -124,3 +145,53 @@ def test_formattedllmcall():
assert inputs["question"] in output.predecessors
assert inputs["prediction"] in output.predecessors
assert output.get_role_description() == "test response"


def test_multimodal():
from textgrad.autograd import MultimodalLLMCall, LLMCall
from textgrad import Variable
import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = httpx.get(image_url).content

os.environ['OPENAI_API_KEY'] = "fake_key"
engine = DummyMultimodalEngine(is_multimodal=True)

image_variable = Variable(image_data,
role_description="image to answer a question about", requires_grad=False)

text = Variable("Hello", role_description="A variable")
question_variable = Variable("What do you see in this image?", role_description="question", requires_grad=False)
response = MultimodalLLMCall(engine=engine)([image_variable, question_variable])

assert response.value == "Hello Text from Image"

response = LLMCall(engine=engine)(text)

assert response.value == "Hello Text"

## llm call cannot handle images
with pytest.raises(AttributeError):
response = LLMCall(engine=engine)([text, image_variable])

# this is just to check the content, we can't really have int variables but
# it's just for testing purposes
with pytest.raises(AssertionError):
response = MultimodalLLMCall(engine=engine)([Variable(4, role_description="tst"),
Variable(5, role_description="tst")])
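The expected `AssertionError` above implies that the multimodal call validates every input is either text or raw bytes before dispatching. A hypothetical standalone version of that check (names are illustrative, not TextGrad's API):

```python
def validate_multimodal_content(items) -> None:
    """Reject inputs that are neither text nor raw image bytes (illustrative only)."""
    for item in items:
        assert isinstance(item, (str, bytes)), \
            f"unsupported content type: {type(item).__name__}"

validate_multimodal_content(["What do you see in this image?", b"\x89PNG"])  # passes silently
```

A check like this is what turns `Variable(4, ...)` in the test above into an `AssertionError` rather than a confusing failure deeper in the engine.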

def test_multimodal_from_url():
from textgrad import Variable
import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = httpx.get(image_url).content

image_variable = Variable(image_path=image_url,
role_description="image to answer a question about", requires_grad=False)

image_variable_2 = Variable(image_data,
role_description="image to answer a question about", requires_grad=False)

assert image_variable_2.value == image_variable.value
1 change: 1 addition & 0 deletions textgrad/autograd/__init__.py
@@ -1,4 +1,5 @@
from .functional import sum, aggregate
from .llm_ops import LLMCall, FormattedLLMCall, LLMCall_with_in_context_examples
from .multimodal_ops import MultimodalLLMCall, OrderedFieldsMultimodalLLMCall
from .function import Module
from .string_based_ops import StringBasedFunction
