Commit
update
TX-Leo committed Aug 20, 2024
1 parent 0d21069 commit d1b85f5
Showing 2 changed files with 28 additions and 4 deletions.
8 changes: 4 additions & 4 deletions index.html
@@ -44,9 +44,9 @@ <h1 class="title is-1 publication-title"> KOSMOS-E : Learning to Follow Instruct
<a href="https://yushuiwx.github.io/">Xun Wu</a>*&nbsp;&nbsp;
<a href="https://buaahsh.github.io/">Shaohan Huang</a>&nbsp;&nbsp;
<br>
Li Dong&nbsp;&nbsp;
Wenhui Wang&nbsp;&nbsp;
Shuming Ma&nbsp;&nbsp;
<a href="https://dong.li/">Li Dong</a>&nbsp;&nbsp;
<a href="https://www.linkedin.com/in/wenhui-wang-064411121/?originalSubdomain=cn/">Wenhui Wang</a>&nbsp;&nbsp;
<a href="https://www.microsoft.com/en-us/research/people/shumma/">Shuming Ma</a>&nbsp;&nbsp;
<a href="https://thegenerality.com/">Furu Wei</a>&nbsp;&nbsp;
<!-- <br>*equal contributions -->
</span>
@@ -63,7 +63,7 @@ <h1 class="title is-1 publication-title"> KOSMOS-E : Learning to Follow Instruct
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="./resources/Robot_Parkour_Learning.pdf"
<a href="https://tx-leo.github.io/data/IROS2024_KOSMOS-E.pdf"
class="external-link button is-normal is-rounded is-dark" target="_blank">
<span class="icon">
<i class="fas fa-file-pdf"></i>
24 changes: 24 additions & 0 deletions twitter.txt
@@ -0,0 +1,24 @@
Want an LLM for robotic manipulation tasks?

We present KOSMOS-E (accepted to IROS 2024), a Multimodal Large Language Model (MLLM) trained on instruction-following robotic grasping data to enable precise and intricate grasping maneuvers.

Project page: https://tx-leo.github.io/KOSMOS-E/
Code: https://github.com/TX-Leo/KOSMOS-E
arXiv:

Method

Dataset:
We create the INSTRUCT-GRASP dataset based on the Cornell Grasping Dataset. It includes three components, Non (non-instruction), Single (single-object), and Multi (multi-object), covering 8 kinds of instructions. In total it contains 1.8 million grasping samples: 250k unique language-image non-instruction samples and 1.56 million instruction-following samples. Among the instruction-following samples, 654k pertain to the single-object scene, while the remaining 906k relate to the multi-object scene.
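For illustration, here is a minimal sketch of what a single INSTRUCT-GRASP record might look like. The field names (image, component, instruction, grasp) and the 5-D rectangle parameterization are assumptions chosen to match Cornell-style grasp annotations, not the released schema.

```python
# Hypothetical INSTRUCT-GRASP record; field names and layout are assumptions,
# not the released schema. The grasp is sketched as the common Cornell-style
# 5-D rectangle: center (x, y), width/height (w, h), and rotation angle.
sample = {
    "image": "cornell/pcd0100r.png",                 # source image (Cornell Grasping Dataset)
    "component": "Single",                           # one of: Non, Single, Multi
    "instruction": "Grasp the mug by its handle.",   # empty for Non (non-instruction) samples
    "grasp": {"x": 251.0, "y": 143.5, "w": 98.0, "h": 32.0, "theta_deg": -27.0},
}
```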

Evaluation
1. Non-Instruction Grasping
We follow the cross-validation setup of previous works and partition the dataset into 5 folds.
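As a sketch of this protocol (assuming an image-wise split; prior Cornell work reports both image-wise and object-wise splits), the 5-fold partition could look like:

```python
import random

def five_fold_splits(num_samples: int, seed: int = 0):
    """Partition sample indices into 5 folds for cross-validation.

    Each fold serves once as the test set while the other four form
    the training set; the reported metric is averaged over the folds.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Usage: the Cornell Grasping Dataset has ~885 images.
for train_idx, test_idx in five_fold_splits(885):
    pass  # train on train_idx, evaluate on test_idx
```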

2. Instruction-following Grasping
Our model was trained on a combination of the non-instruction and instruction-following datasets. In contrast, four baselines were each trained on a single dataset: non-instruction, single-object, multi-object, or the combination of single-object and multi-object. We adopt image-wise grasp accuracy as the primary evaluation metric.
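For reference, here is a sketch of image-wise grasp accuracy under the standard Cornell rectangle criterion: a predicted grasp counts as correct if its angle is within 30 degrees of some ground-truth rectangle and their Jaccard index (IoU) exceeds 0.25. Whether KOSMOS-E uses exactly these thresholds is an assumption here, and the helper names are illustrative.

```python
import math
from shapely.geometry import Polygon  # pip install shapely

def rect_polygon(x, y, w, h, theta_deg):
    """Convert a (center, size, angle) grasp rectangle to a polygon."""
    t = math.radians(theta_deg)
    dx, dy = w / 2.0, h / 2.0
    corners = [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]
    return Polygon([
        (x + cx * math.cos(t) - cy * math.sin(t),
         y + cx * math.sin(t) + cy * math.cos(t))
        for cx, cy in corners
    ])

def grasp_correct(pred, gt, iou_thresh=0.25, angle_thresh_deg=30.0):
    """Rectangle criterion: angle within 30 deg AND IoU above 0.25."""
    # Grasp rectangles are symmetric under 180-degree rotation.
    angle_diff = abs(pred["theta_deg"] - gt["theta_deg"]) % 180.0
    angle_diff = min(angle_diff, 180.0 - angle_diff)
    if angle_diff > angle_thresh_deg:
        return False
    p, g = rect_polygon(**pred), rect_polygon(**gt)
    inter = p.intersection(g).area
    union = p.union(g).area
    return union > 0 and inter / union > iou_thresh

def image_wise_accuracy(predictions, ground_truths):
    """A prediction scores if it matches ANY ground-truth grasp for its image."""
    hits = sum(
        any(grasp_correct(pred, gt) for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return hits / len(predictions)
```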


Instruction-following Grasping Examples

This is work I produced during my internship at Microsoft Research, and it is also my first academic work in robotics. Thanks a lot to my mentor and co-author for their help.
