I am trying to run the demo, but it fails with the following error:
Model inited on device: mps.
Screen BBox: (-249, -1080, 1671, 0)
Model Inited: gpt-4o + ShowUI, Provider: openai
Start the message loop. User messages: [{'role': 'user', 'content': [TextBlock(text='1.Click the "PDF 下载" button on the right side of the page. 2.Then click "下载PDF" button on the left side of the page.', type='text')]}]
filtered_messages: ['1.Click the "PDF 下载" button on the right side of the page. 2.Then click "下载PDF" button on the left side of the page.']
_render_message: Screenshot for VLMPlanner:
<img src="
Sending messages to VLMPlanner: ['1.Click the "PDF 下载" button on the right side of the page. 2.Then click "下载PDF" button on the left side of the page.']
[oai] sending messages: [{'role': 'system', 'content': '\nYou are using an Darwin device.\nYou are able to use a mouse and keyboard to interact with the computer based on the given task and screenshot.\nYou can only interact with the desktop GUI (no terminal or application menu access).\n\nYou may be given some history plan and actions, this is the response from the previous loop.\nYou should carefully consider your plan base on the task, screenshot, and history actions.\n\nYour available "Next Action" only include:\n- ENTER: Press an enter key.\n- ESCAPE: Press an ESCAPE key.\n- INPUT: Input a string of text.\n- CLICK: Describe the ui element to be clicked.\n- HOVER: Describe the ui element to be hovered.\n- SCROLL: Scroll the screen, you must specify up or down.\n- PRESS: Describe the ui element to be pressed.\n\n\nOutput format:\njson\n{\n "Thinking": str, # describe your thoughts on how to achieve the task, choose one action from available actions at a time.\n "Next Action": "action_type, action description" | "None" # one action at a time, describe it in short and precisely. \n}\n\n\nOne Example:\njson\n{ \n "Thinking": "I need to search and navigate to amazon.com.",\n "Next Action": "CLICK \'Search Google or type a URL\'."\n}\n\n\nIMPORTANT NOTES:\n1. Carefully observe the screenshot to understand the current state and read history actions.\n2. You should only give a single action at a time. for example, INPUT text, and ENTER can't be in one Next Action.\n3. Attach the text to Next Action, if there is text or any description for the button. \n4. You should not include other actions, such as keyboard shortcuts.\n5. When the task is completed, you should say "Next Action": "None" in the json field.\n\n\nNOTE: you are operating a Mac machine'}, {'role': 'user', 'content': [{'type': 'text', 'text': '1.Click the "PDF 下载" button on the right side of the page. 2.Then click "下载PDF" button on the left side of the page.'}]}]
oai token usage: 486
VLMPlanner response:
{
"Thinking": "I need to locate and click the 'PDF 下载' button on the right side of the page.",
"Next Action": "CLICK 'PDF 下载' button on the right side of the page."
}
VLMPlanner total token usage so far: 486. Total cost so far: $USD0.00007
_render_message: VLMPlanner:
I need to locate and click the 'PDF 下载' button on the right side of the page.
Next A
_render_message: VLMPlanner sending action to **S<span style="color
_render_message: Screenshot for **S<span style="color:rgb(111, 163, 82)
Output Text: [{'action': 'CLICK', 'value': None, 'position': [0.78, 0.45]}]
Parsed Output: [{'action': 'CLICK', 'value': None, 'position': [0.78, 0.45]}]
Action Item: {'action': 'CLICK', 'value': None, 'position': [0.78, 0.45]}
Parsed Action List: [{'action': 'mouse_move', 'text': None, 'coordinate': (1497, 486)}, {'action': 'left_click', 'text': None, 'coordinate': None}]
_render_message: **Sh<span
Converted Action: {'action': 'mouse_move', 'text': None, 'coordinate': (1497, 486)}
sync_call: computer {'action': 'mouse_move', 'text': None, 'coordinate': (1497, 486)}
action: mouse_move, text: None, coordinate: (1497, 486)
mouse move to 1248, -594
_render_message: **Sh<span
Converted Action: {'action': 'left_click', 'text': None, 'coordinate': None}
sync_call: computer {'action': 'left_click', 'text': None, 'coordinate': None}
action: left_click, text: None, coordinate: None
Traceback (most recent call last):
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/queueing.py", line 715, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/blocks.py", line 2042, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/blocks.py", line 1601, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/utils.py", line 728, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/utils.py", line 722, in anext
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 962, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/utils.py", line 705, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ootb/lib/python3.11/site-packages/gradio/utils.py", line 866, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/Users/blink_bbk/VSCodeProjects/computer_use_ootb/app.py", line 273, in process_input
for loop_msg in sampling_loop_sync(
File "/Users/blink_bbk/VSCodeProjects/computer_use_ootb/computer_use_demo/loop.py", line 210, in sampling_loop_sync
print(f"End of loop {showui_loop_count+1}. Messages: {str(messages)[:100000]}. Total cost: $USD{planner.total_cost:.5f}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'showui_loop_count' where it is not associated with a value
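For reference, here is a minimal sketch of how this kind of `UnboundLocalError` can arise. It is not the actual code in `loop.py`, which I have not traced. The function names, the `"other"` model string, and the `max_steps` parameter are all hypothetical; the assumption is only that the counter is assigned on one code path and then read on a path where that assignment never ran.

```python
def run_loop_buggy(model: str, max_steps: int = 0) -> str:
    """Hypothetical stand-in for a sampling loop, NOT the project's code."""
    if model == "other":
        # The counter is only assigned on this branch (assumed for illustration).
        loop_count = 0
    for _ in range(max_steps):
        loop_count += 1
    # For any other model, loop_count was never bound, so this read raises
    # UnboundLocalError, the same error type as in the traceback above.
    return f"End of loop {loop_count + 1}"


def run_loop_fixed(model: str, max_steps: int = 0) -> str:
    """Same sketch with the counter initialised before any branching."""
    loop_count = 0  # always bound, whichever branch runs afterwards
    for _ in range(max_steps):
        loop_count += 1
    return f"End of loop {loop_count + 1}"


if __name__ == "__main__":
    try:
        run_loop_buggy("gpt-4o + ShowUI")
    except UnboundLocalError as exc:
        print(f"buggy version failed: {exc}")
    print(run_loop_fixed("gpt-4o + ShowUI"))
```

If the real cause is similar, initialising `showui_loop_count` once near the top of `sampling_loop_sync`, before any model-specific branch, would probably avoid the crash, but that is a guess until the branch taken for `gpt-4o + ShowUI` is confirmed.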