From 77bb1b2c5ecdab02e19f4ecf36f3feefe0214405 Mon Sep 17 00:00:00 2001
From: Josh Bickett <42594239+joshbickett@users.noreply.github.com>
Date: Wed, 22 Jan 2025 17:26:12 -0800
Subject: [PATCH 1/7] Update README.mdttps;//ww

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 40c151f0..fb952c99 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,4 @@
+ome
@@ -19,7 +20,7 @@
 
 ## Key Features
 - **Compatibility**: Designed for various multimodal models.
-- **Integration**: Currently integrated with **GPT-4o, Gemini Pro Vision, Claude 3 and LLaVa.**
+- **Integration**: Currently integrated with **GPT-4o, o1,th Gemini Pro Vision, Claude 3 and LLaVa.**
 - **Future Plans**: Support for additional models.
 
 ## Ongoing Development

From f48e1a61584d37ec03d77e9f6140555ae8556f92 Mon Sep 17 00:00:00 2001
From: Josh Bickett <42594239+joshbickett@users.noreply.github.com>
Date: Wed, 22 Jan 2025 17:29:06 -0800
Subject: [PATCH 2/7] Update README.md

---
 README.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index fb952c99..2730893c 100644
--- a/README.md
+++ b/README.md
@@ -20,19 +20,10 @@ ome
 
 ## Key Features
 - **Compatibility**: Designed for various multimodal models.
-- **Integration**: Currently integrated with **GPT-4o, o1,th Gemini Pro Vision, Claude 3 and LLaVa.**
+- **Integration**: Currently integrated with **GPT-4o, o1, Gemini Pro Vision, Claude 3 and LLaVa.**
 - **Future Plans**: Support for additional models.
 
-## Ongoing Development
-At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision a multimodal model with more accurate click location predictions.
-
-## Agent-1-Vision Model API Access
-We will soon be offering API access to our Agent-1-Vision model.
-
-If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com).
-
 ## Demo
-
 https://github.com/OthersideAI/self-operating-computer/assets/42594239/9e8abc96-c76a-46fb-9b13-03678b3c67e0
 
 
@@ -61,6 +52,15 @@ operate
 
 ## Using `operate` Modes
 
+#### Try OpenAI models
+
+The default model for the project is gpt-4o. Which is run by simply typing `operate`. To try running OpenAI's new `o1` model, use the command below. 
+
+```
+operate -m o1-with-ocr
+```
+
+
 ### Multimodal Models `-m`
 An additional model is now compatible with the Self Operating Computer Framework. Try Google's `gemini-pro-vision` by following the instructions below.
 

From 4672acc87ed4ccf7eeec78b9d49ae03b19867a1b Mon Sep 17 00:00:00 2001
From: Josh Bickett <42594239+joshbickett@users.noreply.github.com>
Date: Wed, 22 Jan 2025 17:29:36 -0800
Subject: [PATCH 3/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2730893c..b3fc34e2 100644
--- a/README.md
+++ b/README.md
@@ -52,7 +52,7 @@ operate
 
 ## Using `operate` Modes
 
-#### Try OpenAI models
+#### OpenAI models
 
 The default model for the project is gpt-4o. Which is run by simply typing `operate`. To try running OpenAI's new `o1` model, use the command below. 
 

From 516d26473fdfcf7c7d201e7a9a5c621b5e79fa25 Mon Sep 17 00:00:00 2001
From: Josh Bickett <42594239+joshbickett@users.noreply.github.com>
Date: Wed, 22 Jan 2025 17:30:19 -0800
Subject: [PATCH 4/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b3fc34e2..48c9d2d9 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ operate
 
 #### OpenAI models
 
-The default model for the project is gpt-4o. Which is run by simply typing `operate`. To try running OpenAI's new `o1` model, use the command below. 
+The default model for the project is gpt-4o which you can use by simply typing `operate`. To try running OpenAI's new `o1` model, use the command below. 
 
 ```
 operate -m o1-with-ocr

From 7b90a44e061685e6ab744da0b7f09c54cc7ba6cd Mon Sep 17 00:00:00 2001
From: Josh Bickett <42594239+joshbickett@users.noreply.github.com>
Date: Wed, 22 Jan 2025 17:31:40 -0800
Subject: [PATCH 5/7] Update README.md

---
 README.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 48c9d2d9..00ee7f7e 100644
--- a/README.md
+++ b/README.md
@@ -62,9 +62,7 @@ operate -m o1-with-ocr
 
 
 ### Multimodal Models `-m`
-An additional model is now compatible with the Self Operating Computer Framework. Try Google's `gemini-pro-vision` by following the instructions below.
-
-Start `operate` with the Gemini model
+Try Google's `gemini-pro-vision` by following the instructions below. Start `operate` with the Gemini model
 ```
 operate -m gemini-pro-vision
 ```

From 3b53614826f867d6364d434255d8fe722ff48884 Mon Sep 17 00:00:00 2001
From: Josh Bickett <42594239+joshbickett@users.noreply.github.com>
Date: Wed, 22 Jan 2025 17:49:47 -0800
Subject: [PATCH 6/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 00ee7f7e..e8f445cb 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ ome
 A framework to enable multimodal models to operate a computer.
- Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
+ Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Self-Operating Computer was the first example of using a VLM to operate a computer.
- Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Self-Operating Computer was the first example of using a VLM to operate a computer.
+ Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Self-Operating Computer was the first project to use a VLM to operate a computer.