In this project, we address the challenge of enabling a physical robot to locate objects within its pre-mapped environment given natural language instructions from a human user. Using a Triton robot equipped with a 2D LiDAR and a color-depth (RGB-D) camera, we develop an autonomous control system capable of perception, planning, and navigation. First, we utilize a state-of-the-art hybrid computer vision pipeline that combines the Recognize Anything Model (RAM) with the Grounded Segment Anything Model (Grounded-SAM) to perform automatic dense image annotation and object localization. Second, we leverage Large Language Models (LLMs) for object reference resolution and high-level exploration planning. Finally, we use the move_base package from ROS to navigate to the locations specified by the LLM.
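For the navigation step, move_base exposes a standard ROS action interface. The sketch below illustrates how a target pose in the map frame (for example, a coordinate produced by the perception and LLM stages) could be sent as a goal; the node name, function name, and example coordinates are illustrative assumptions rather than exact details from our codebase.

```python
#!/usr/bin/env python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def navigate_to(x, y, orientation_w=1.0):
    """Send a 2D navigation goal in the map frame to move_base and wait for the result."""
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = "map"          # goals are expressed in the pre-built map frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = orientation_w  # identity orientation by default

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == "__main__":
    rospy.init_node("llm_nav_client")
    # Hypothetical target: a map coordinate returned by the object-localization / LLM stage.
    navigate_to(2.0, 1.5)
```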
In this video demo, the robot is given the prompt "I'm hungry." Based on this prompt, the LLM infers that the apple satisfies the request, and the robot navigates to it.
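As a minimal sketch of this object-selection step, the snippet below shows how an LLM could be prompted to pick a target object from the list of detected objects, assuming the OpenAI chat completions API; the model name, prompt wording, and object list are illustrative assumptions, not the exact prompt used in the demo.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def choose_target(user_prompt, detected_objects):
    """Ask the LLM which detected object best satisfies the user's request."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You are a robot assistant. Reply with exactly one object "
                        "name from the provided list that best satisfies the user."},
            {"role": "user",
             "content": f"User request: {user_prompt}\n"
                        f"Detected objects: {', '.join(detected_objects)}"},
        ],
    )
    return response.choices[0].message.content.strip()

# Example corresponding to the demo: "I'm hungry" resolves to the apple.
print(choose_target("I'm hungry.", ["apple", "chair", "backpack", "monitor"]))
```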