Update README
jardinetsouffleton committed Jun 12, 2024
1 parent 91c9247 commit 6325d64
Showing 1 changed file with 0 additions and 70 deletions.
70 changes: 0 additions & 70 deletions README.md
@@ -1,60 +1,5 @@
# WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

[[Paper]](https://arxiv.org/abs/2403.07718)[[Benchmark Contents]](#benchmark-contents)[[Getting Started]](#getting-started)[[Live Demo]](#live-demo)[[BrowserGym]](https://github.com/ServiceNow/BrowserGym)[[Citing This Work]](#citing-this-work)

`WorkArena` is a suite of browser-based tasks designed to measure how effectively web agents support the routine tasks of knowledge workers.
By building on the widely used [ServiceNow](https://www.servicenow.com/what-is-servicenow.html) platform, this benchmark helps assess the current state of such automation in modern knowledge work environments.

WorkArena is included in [BrowserGym](https://github.com/ServiceNow/BrowserGym), a conversational gym environment for the evaluation of web agents.


https://github.com/ServiceNow/WorkArena/assets/2374980/68640f09-7d6f-4eb1-b556-c294a6afef70


## Benchmark Contents

WorkArena currently includes `19,951` task instances drawn from `33` tasks that cover the main components of the ServiceNow user interface. The following videos show an agent built on `GPT-4-vision` interacting with each of these components. As our results show, this benchmark is not solved, so the agent does not always succeed.

### Knowledge Bases

**Goal:** The agent must search for specific information in the company knowledge base.

_The agent interacts with the user via BrowserGym's conversational interface._

https://github.com/ServiceNow/WorkArena/assets/1726818/352341ba-b501-46ac-bfa6-a6c9be1ac2b7

### Forms

**Goal:** The agent must fill a complex form with specific values for each field.

https://github.com/ServiceNow/WorkArena/assets/1726818/e2c2b5cb-3386-4f3c-b073-c8c619e0e81b

### Service Catalogs

**Goal:** The agent must order items with specific configurations from the company's service catalog.

https://github.com/ServiceNow/WorkArena/assets/1726818/ac64db3b-9abf-4b5f-84a7-e2d9c9cee863

### Lists

**Goal:** The agent must filter a list according to some specifications.

_In this example, the agent struggles to manipulate the UI and fails to create the filter._

https://github.com/ServiceNow/WorkArena/assets/1726818/7538b3ef-d39b-4978-b9ea-8b9e106df28e

### Menus

**Goal:** The agent must navigate to a specific application using the main menu.

https://github.com/ServiceNow/WorkArena/assets/1726818/ca26dfaf-2358-4418-855f-80e482435e6e

### Dashboards

**Goal:** The agent must extract information from a dashboard.



## Getting Started

To set up WorkArena, you will need to get your own ServiceNow instance, install our Python package, and upload some data to your instance. Follow the steps below.
@@ -135,18 +80,3 @@ for (task, seed) in zip(AGENT_L2_SAMPLED_TASKS, AGENT_L2_SEEDS):
sleep(3)
env.close()
```
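
The loop above pairs each sampled task with a seed, pauses, and closes the environment at the end of every iteration. A minimal, library-free sketch of that pattern is shown here. `StubEnv`, `SAMPLED_TASKS`, `SEEDS`, and `run_all` are hypothetical stand-ins for illustration, not BrowserGym or WorkArena names. The `try`/`finally` and the length check are suggested additions rather than part of the original example.

```python
from time import sleep

# Hypothetical stand-ins for the sampled task list and its matching seeds.
SAMPLED_TASKS = ["task_a", "task_b", "task_c"]
SEEDS = [0, 1, 2]


class StubEnv:
    """Hypothetical placeholder for a browser environment (not a real API)."""

    def __init__(self, task, seed):
        self.task = task
        self.seed = seed
        self.closed = False

    def reset(self):
        # A real environment would load the task in a browser here.
        return {"task": self.task, "seed": self.seed}

    def close(self):
        self.closed = True


def run_all(tasks, seeds, pause_s=0.0):
    """Run one episode per (task, seed) pair and always close the environment."""
    if len(tasks) != len(seeds):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("tasks and seeds must have the same length")
    finished = []
    for task, seed in zip(tasks, seeds):
        env = StubEnv(task, seed)
        try:
            env.reset()
            sleep(pause_s)  # the original loop pauses 3 seconds per task
        finally:
            # Runs even if the episode raises, so no environment is leaked.
            env.close()
        finished.append(env)
    return finished


if __name__ == "__main__":
    envs = run_all(SAMPLED_TASKS, SEEDS)
    print(all(env.closed for env in envs))
```

Closing in a `finally` block matters most for long runs, where a single failing task would otherwise leave a browser session open for the rest of the loop.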


## Citing This Work

Please use the following BibTeX to cite our work:
```
@misc{workarena2024,
title={WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?},
author={Alexandre Drouin and Maxime Gasse and Massimo Caccia and Issam H. Laradji and Manuel Del Verme and Tom Marty and Léo Boisvert and Megh Thakkar and Quentin Cappart and David Vazquez and Nicolas Chapados and Alexandre Lacoste},
year={2024},
eprint={2403.07718},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
