Commit

TextGrad Vision (#41)
* textgrad vision integration init

* max tokens for multimodal calls

* an example nb

* support for png and jpgs without having to swap encodings

* check if bytes

* support image from

* check for variable values

* remove unfinished notebook

* remove unused fns

* gitignore logs

* better error handling in the optimizer

* remove redundant logging

* better function names, a bit more typing

* move the utilities from __init__ to a new file

* additional tests and solving small issue

---------
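The bullets "support for png and jpgs without having to swap encodings" and "check if bytes" suggest dispatching on raw image bytes rather than on a caller-supplied encoding. A minimal, hypothetical sketch of such magic-byte detection (illustrative only, not the actual TextGrad code) could look like:

```python
def guess_image_type(data: bytes) -> str:
    """Guess an image format from its magic bytes (hypothetical helper)."""
    if not isinstance(data, bytes):
        raise TypeError("expected raw image bytes")
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG file signature
        return "png"
    if data.startswith(b"\xff\xd8\xff"):       # JPEG SOI marker
        return "jpeg"
    raise ValueError("unsupported image format")
```

With a check like this, PNG and JPEG inputs can share a single code path instead of requiring callers to swap encodings by hand.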

Co-authored-by: vinid <[email protected]>
mertyg and vinid authored Jul 7, 2024
1 parent 60198d3 commit d56d1c1
Showing 25 changed files with 1,102 additions and 133 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -160,3 +160,4 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
logs/
18 changes: 10 additions & 8 deletions README.md
@@ -103,17 +103,19 @@ We have many more examples around how TextGrad can optimize all kinds of variabl

### Tutorials

We have prepared a couple of tutorials to get you started with TextGrad.
You can run them directly in Google Colab by clicking on the links below.
We have prepared a couple of tutorials to get you started with TextGrad. We recommend beginners
work through them in the order listed below. You can run them directly in Google Colab by clicking the badges below (but
you need an OpenAI/Anthropic API key to run the LLMs).

<div align="center">

| Example | Colab Link |
|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Introduction to TextGrad Primitives | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Primitives.ipynb) |
| Optimizing a Code Snippet and Define a New Loss | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/textgrad/blob/main/examples/notebooks/Tutorial-Test-Time-Loss-for-Code.ipynb) |
| Prompt Optimization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Prompt-Optimization.ipynb) |
| Solution Optimization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Solution-Optimization.ipynb) |
| Tutorial | Difficulty | Colab Link |
|----------------------------------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. Introduction to TextGrad Primitives | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Primitives.ipynb) |
| 2. Solution Optimization | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Solution-Optimization.ipynb) |
| 3. Optimizing a Code Snippet and Define a New Loss | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/textgrad/blob/main/examples/notebooks/Tutorial-Test-Time-Loss-for-Code.ipynb) |
| 4. Prompt Optimization | ![](https://img.shields.io/badge/Level-Intermediate-yellow.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Prompt-Optimization.ipynb) |
| 5. MultiModal Optimization | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-MultiModal.ipynb) |

</div>

6 changes: 3 additions & 3 deletions examples/notebooks/Local-Model-With-LMStudio.ipynb
@@ -182,7 +182,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "textgrad",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -196,9 +196,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
265 changes: 265 additions & 0 deletions examples/notebooks/Tutorial-MultiModal.ipynb

Large diffs are not rendered by default.

@@ -64,7 +64,10 @@
"cell_type": "markdown",
"id": "8887fbed36c7daf2",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Variable\n",
@@ -89,7 +92,10 @@
"end_time": "2024-06-11T15:43:17.669096228Z",
"start_time": "2024-06-11T15:43:17.665325560Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -105,7 +111,10 @@
"end_time": "2024-06-11T15:43:18.184004948Z",
"start_time": "2024-06-11T15:43:18.178187640Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -127,7 +136,10 @@
"cell_type": "markdown",
"id": "63f6a6921a1cce6a",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Engine\n",
@@ -144,7 +156,10 @@
"end_time": "2024-06-11T15:44:32.606319032Z",
"start_time": "2024-06-11T15:44:32.561460448Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -155,7 +170,10 @@
"cell_type": "markdown",
"id": "33c7d6eaa115cd6a",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"This object behaves like you would expect an LLM to behave: You can sample generation from the engine using the `generate` method. "
@@ -170,7 +188,10 @@
"end_time": "2024-06-11T17:29:41.108552705Z",
"start_time": "2024-06-11T17:29:40.294256814Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -192,7 +213,10 @@
"cell_type": "markdown",
"id": "b627edc07c0d3737",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Loss\n",
@@ -209,7 +233,10 @@
"end_time": "2024-06-11T15:44:32.894722136Z",
"start_time": "2024-06-11T15:44:32.890708561Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -221,15 +248,21 @@
"cell_type": "markdown",
"id": "ff137c99e0659dcc",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": []
},
{
"cell_type": "markdown",
"id": "6f05ec2bf907b3ba",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Introduction: Optimizer\n",
@@ -248,7 +281,10 @@
"end_time": "2024-06-11T15:44:33.741130951Z",
"start_time": "2024-06-11T15:44:33.734977769Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -259,7 +295,10 @@
"cell_type": "markdown",
"id": "d26883eb74ce0d01",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Putting it all together\n",
@@ -276,7 +315,10 @@
"end_time": "2024-06-11T15:44:41.730132530Z",
"start_time": "2024-06-11T15:44:34.997777872Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -294,7 +336,10 @@
"end_time": "2024-06-11T15:44:41.738985151Z",
"start_time": "2024-06-11T15:44:41.731989729Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -316,7 +361,10 @@
"cell_type": "markdown",
"id": "6a8aab93b80fb82c",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"While here it is not going to be useful, we can also do multiple optimization steps in a loop! Do not forget to reset the gradients after each step!"
@@ -330,7 +378,10 @@
"ExecuteTime": {
"start_time": "2024-06-11T15:44:30.989940227Z"
},
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -342,7 +393,10 @@
"execution_count": null,
"id": "a3a84aad4cd58737",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": []
4 changes: 3 additions & 1 deletion requirements.txt
@@ -6,4 +6,6 @@ platformdirs>=3.11.0
datasets>=2.14.6
diskcache>=5.6.3
graphviz>=0.20.3
gdown>=5.2.0
gdown>=5.2.0
pillow
httpx
4 changes: 2 additions & 2 deletions setup.py
@@ -8,9 +8,9 @@

setup(
name="textgrad",
version="0.1.3",
version="0.1.4",
description="",
python_requires=">=3.8",
python_requires=">=3.9",
classifiers=[
"Development Status :: 2 - Pre-Alpha",
"Intended Audience :: Developers",
71 changes: 71 additions & 0 deletions tests/test_basics.py
@@ -1,5 +1,6 @@
import os
import pytest
from typing import Union, List
import logging


@@ -18,6 +19,26 @@ def generate(self, prompt, system_prompt=None, **kwargs):
def __call__(self, prompt, system_prompt=None):
return self.generate(prompt)

class DummyMultimodalEngine(EngineLM):

def __init__(self, is_multimodal=False):
self.is_multimodal = is_multimodal
self.model_string = "gpt-4o" # fake

def generate(self, content: Union[str, List[Union[str, bytes]]], system_prompt: str = None, **kwargs):
if isinstance(content, str):
return "Hello Text"

elif isinstance(content, list):
has_multimodal_input = any(isinstance(item, bytes) for item in content)
if (has_multimodal_input) and (not self.is_multimodal):
raise NotImplementedError("Multimodal generation is only supported for Claude-3 and beyond.")

return "Hello Text from Image"

def __call__(self, prompt, system_prompt=None):
return self.generate(prompt)
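The content-type dispatch in `DummyMultimodalEngine.generate` can be exercised on its own. The sketch below stubs out the `EngineLM` base class; the class name and error message are illustrative, but the branching mirrors the test code above:

```python
from typing import List, Union

class FakeMultimodalEngine:
    """Standalone replica of the DummyMultimodalEngine dispatch above."""
    def __init__(self, is_multimodal: bool = False):
        self.is_multimodal = is_multimodal

    def generate(self, content: Union[str, List[Union[str, bytes]]], **kwargs) -> str:
        # Plain strings always take the text-only path.
        if isinstance(content, str):
            return "Hello Text"
        # A list is treated as multimodal only when it carries raw bytes.
        has_multimodal_input = any(isinstance(item, bytes) for item in content)
        if has_multimodal_input and not self.is_multimodal:
            raise NotImplementedError("multimodal input sent to a text-only engine")
        return "Hello Text from Image"

engine = FakeMultimodalEngine(is_multimodal=True)
print(engine.generate("hi"))                     # Hello Text
print(engine.generate(["caption", b"\x89PNG"]))  # Hello Text from Image
```

Keying the multimodal path on the presence of `bytes` in the list is what lets the same engine serve both text and image calls without a separate API.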

# Idempotent engine that returns the prompt as is
class IdempotentEngine(EngineLM):
def generate(self, prompt, system_prompt=None, **kwargs):
@@ -124,3 +145,53 @@ def test_formattedllmcall():
assert inputs["question"] in output.predecessors
assert inputs["prediction"] in output.predecessors
assert output.get_role_description() == "test response"


def test_multimodal():
from textgrad.autograd import MultimodalLLMCall, LLMCall
from textgrad import Variable
import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = httpx.get(image_url).content

os.environ['OPENAI_API_KEY'] = "fake_key"
engine = DummyMultimodalEngine(is_multimodal=True)

image_variable = Variable(image_data,
role_description="image to answer a question about", requires_grad=False)

text = Variable("Hello", role_description="A variable")
question_variable = Variable("What do you see in this image?", role_description="question", requires_grad=False)
response = MultimodalLLMCall(engine=engine)([image_variable, question_variable])

assert response.value == "Hello Text from Image"

response = LLMCall(engine=engine)(text)

assert response.value == "Hello Text"

## llm call cannot handle images
with pytest.raises(AttributeError):
response = LLMCall(engine=engine)([text, image_variable])

# this is just to check the content, we can't really have int variables but
# it's just for testing purposes
with pytest.raises(AssertionError):
response = MultimodalLLMCall(engine=engine)([Variable(4, role_description="tst"),
Variable(5, role_description="tst")])
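The expected `AssertionError` above implies that the multimodal call validates every input is either text or raw bytes before dispatching. A hypothetical standalone version of that check (names are illustrative, not TextGrad's API):

```python
def validate_multimodal_content(items) -> None:
    """Reject inputs that are neither text nor raw image bytes (illustrative only)."""
    for item in items:
        assert isinstance(item, (str, bytes)), \
            f"unsupported content type: {type(item).__name__}"

validate_multimodal_content(["What do you see in this image?", b"\x89PNG"])  # passes silently
```

A check like this is what turns `Variable(4, ...)` in the test above into an `AssertionError` rather than a confusing failure deeper in the engine.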

def test_multimodal_from_url():
from textgrad import Variable
import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = httpx.get(image_url).content

image_variable = Variable(image_path=image_url,
role_description="image to answer a question about", requires_grad=False)

image_variable_2 = Variable(image_data,
role_description="image to answer a question about", requires_grad=False)

assert image_variable_2.value == image_variable.value
1 change: 1 addition & 0 deletions textgrad/autograd/__init__.py
@@ -1,4 +1,5 @@
from .functional import sum, aggregate
from .llm_ops import LLMCall, FormattedLLMCall, LLMCall_with_in_context_examples
from .multimodal_ops import MultimodalLLMCall, OrderedFieldsMultimodalLLMCall
from .function import Module
from .string_based_ops import StringBasedFunction
