December 2023

tl;dr: A multi-LLM pipeline for active exploration of new environments.

Overall impression

LLM-Brain is a multi-LLM pipeline that acts as a robotic brain to unify egocentric memory and control. The multiple agents communicate in natural language, providing excellent explainability.

Active exploration: extensively explore an unknown environment within a limited number of actions.

Reinforcement learning dominated the field of embodied AI for many years, until LLMs emerged from the field of Internet AI.

The results of LLM-Brain are not that impressive, but it provides a reasonable starting framework for active exploration.

Key ideas

  • Three agents (a rough sketch of this loop follows the list)
    • Eye: a VQA model (BLIP-2) that takes in images and questions.
    • Nerve: asks questions to the Eye and provides a summary to the Brain.
      • Why do we need the Nerve? We need to summarize the action and the environment into natural language, e.g. "I move forward, and I see a sofa". This is inspired by ChatCaptioner.
    • Brain: takes in high-level human instructions, memorizes previous actions and (potentially partial) environment observations, and infers the next action based on its world knowledge.
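
My own illustration of how the three agents could pass natural-language messages around (not the paper's code: the helper names are made up, and the BLIP-2 / LLM calls are stubbed with canned strings so the control flow runs as-is):

```python
# Minimal sketch of the Eye / Nerve / Brain loop. All model calls are stubs;
# a real version would call BLIP-2 for the Eye and an LLM for Nerve and Brain.

def eye_vqa(image, question: str) -> str:
    """Eye: a VQA model. Stubbed with a canned answer."""
    return "I see a sofa and an open doorway."

def nerve_summarize(last_action: str, answers: list[str]) -> str:
    """Nerve: condenses the Eye's answers into one sentence of episodic memory."""
    return f"{last_action} " + " ".join(answers)

def brain_next_action(instruction: str, memory: list[str]) -> str:
    """Brain: an LLM that reads the instruction plus memory and picks an action.
    Stubbed to always move forward."""
    return "I move forward."

def explore(instruction: str, image_stream, max_steps: int = 5) -> list[str]:
    memory, action = [], "I start exploring."
    for step, image in enumerate(image_stream):
        if step >= max_steps:
            break
        answer = eye_vqa(image, "What do you see?")
        memory.append(nerve_summarize(action, [answer]))   # egocentric memory
        action = brain_next_action(instruction, memory)
    return memory

if __name__ == "__main__":
    print(explore("Explore the house as completely as possible.",
                  image_stream=[None] * 3))
```
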

Technical details

  • Embodied AI tasks
    • Embodied QA (VQA)
    • Active exploration: "explore the house as completely as possible", evaluated by explored area vs. total area (see the toy metric sketch after this list).
    • LM-Nav: vision-language navigation
  • PointNav task
  • Dataset: Matterport3D.
  • On the task of active exploration, LLM-Brain performs only slightly better than the random baseline.
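
A toy sketch (my assumption, not how the paper measures Matterport3D areas) of the exploration metric, i.e. the fraction of navigable area covered within the action budget, using boolean occupancy grids:

```python
import numpy as np

def coverage(navigable: np.ndarray, explored: np.ndarray) -> float:
    """Fraction of navigable cells that have been visited."""
    total = navigable.sum()
    return float((explored & navigable).sum() / total) if total else 0.0

# Example: a 4x4 fully navigable map where half of the cells were visited.
navigable = np.ones((4, 4), dtype=bool)
explored = np.zeros((4, 4), dtype=bool)
explored[:2, :] = True
print(coverage(navigable, explored))   # 0.5
```
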

Notes

  • Questions and notes on how to improve/revise the current work