
Multimodal Resolution AR

This repository contains the code and resources for a demonstration of Large Language Model (LLM) driven task planning for Embodied AI (EAI) systems, using an Augmented Reality (AR) headset (HoloLens 2) for human-robot teaming. The multimodal grounding technique combines voice commands with interactive interfaces via Virtual Markers.

Requirements:

  • Unreal Engine 4.26
  • Robofleet Server on your local network.
    • System that enables ROS-based robot-to-robot communication.
  • Azure Spatial Anchors account
    • Built-in Microsoft feature that allows the HoloLens and the robots to co-localize with each other.
  • GPT to UMRF parser node
    • Node that receives NL commands, sends a request to a GPT model and provides the output as UMRF Graphs.
  • TeMoto
    • Framework used to control the execution flow of robotic tasks.
  • TeMoto Config Package
    • Repository that contains the TeMoto config package with a set of TeMoto Actions that the robot can execute.


Usage

  1. Clone the repo
git clone --recursive https://github.com/UTNuclearRoboticsPublic/multimodal_resolution_ar.git
  2. Generate the solution file
  • Right-click on multimodal_ar.uproject and select "Generate Visual Studio Project Files" from the menu to create the .sln file
  3. Package the project and deploy it to the HoloLens device

Find more documentation on how to run and deploy apps using Unreal Engine here

Run the Demo

Robot Side:

Terminal 1

# Start TeMoto Framework
roslaunch vaultbot_temoto_config temoto.launch
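
Once TeMoto is running, a quick sanity check can confirm that its nodes came up (a hedged suggestion; the exact node names depend on your TeMoto configuration, so the grep pattern below is an assumption):

# List running ROS nodes and look for TeMoto-related entries
rosnode list | grep -i temoto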

Terminal 2

# Bring up the robot
roslaunch ta_initialize_robot invoke_action.launch wake_word:=vaultbot_temoto_config
# Run the node that provides the execution state of current UMRF Graphs
roslaunch ta_get_action_state invoke_action.launch wake_word:=vaultbot_temoto_config

Note: Make sure you have found an anchor frame and that it is part of the TF tree. Drive the robot around and verify that it is correctly localized in the map.
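
One way to verify this is from the command line (a hedged sketch; the map and anchor frame names below are assumptions, so replace them with the frames used in your setup):

# Visualize the full TF tree
rosrun rqt_tf_tree rqt_tf_tree
# Confirm the transform between the map and the anchor frame is being published
rosrun tf tf_echo map <anchor_frame>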

Terminal 3

# Export the key as an environment variable
export GPT_API_KEY=$(cat <path/to/openai_key>)

# Invoke the parser node
rosrun gpt_umrf_parser gpt_umrf_parser_node.py -ue umrf_examples/

At this point the robot is up and ready to receive commands.
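
Before switching to the HoloLens, it can help to confirm that the key was exported and that the command topic is visible on the ROS graph (a hedged sketch; which topics appear depends on your configuration):

# Confirm the API key was read, without printing it
[ -n "$GPT_API_KEY" ] && echo "GPT_API_KEY is set" || echo "GPT_API_KEY is empty"
# The parser subscribes to /command (see below), so the topic should be listed
rostopic list | grep command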

HoloLens:

  • Start the 'multimodal_resolution_ar' app on the HoloLens

  • Walk around and make sure to find an anchor before sending commands. This is the common reference frame between the HoloLens and the robots.

  • Open the TeMoto interface

    • You can use the hand menu and tap on the TeMoto icon, or use the "TeMoto" voice command, to open the chat-like interface. A coordinate system and a chat window will appear. Place the coordinate frame at the desired location; you can use near or far interactions with hand gestures for this.
  • Use the chat-style interface to send a command. You can type the request or send a voice command using the microphone button on the keyboard, e.g. "robot inspect that area". (a)

The HoloLens uses visual marker information (b) and voice commands to generate a prompt as a string, which is then sent to the gpt_umrf_parser node through the /command topic. The node appends a few examples to the prompt and sends the request to the GPT model. It takes some time for the model to generate an output; once it is ready, a sequence of blocks is spawned on the HoloLens, representing the UMRF Graph as feedback, with a combination of navigation, manipulation, and take_photo actions. The robot should then start executing the graph. Each block is highlighted with a different color based on its execution state. (c)
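
For debugging without the headset, the same entry point can be exercised from a terminal (a hedged sketch: the /command topic name comes from this README, but the std_msgs/String message type is an assumption):

# Watch the prompts arriving from the HoloLens
rostopic echo /command
# Or publish a test command directly, bypassing the HoloLens
rostopic pub -1 /command std_msgs/String "data: 'robot inspect that area'"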


For more information about the Prompt Experiments and dataset construction, please refer to the GitHub repo.
