Added I Want This task. #528
Conversation
\item \textbf{Picking hors d'oeuvres and utensils:} The procedure repeats for the hors d'oeuvres and utensils, with 100 points awarded for each.
\item \textbf{What am I looking at?} The person will now move to the mark in front of the three pictures and look at one of them. They will say, \textit{What is this?} The robot will then give a plain English description of the picture. If correct, the robot gets 100 points.
Are the three pictures announced on setup day?
Well, the important factor here is the generation of the gaze cue, not identifying the content of the picture. So, it doesn't seem like it should matter much if they are there during setup.
Maybe we should move them slightly to the left/right on the day of the test to ensure that the teams must autonomously generate the gaze, rather than simply script it. So, sure, we should change that so that the pictures are mounted a couple of hours before, in order to ensure that teams solve the problem rather than just hack something that looks good.
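For concreteness, resolving which picture the person is looking at can be reduced to a nearest-ray test. Below is a minimal sketch, assuming the perception stack already supplies the person's eye position and an (approximately unit) gaze direction in the arena frame; the function name, coordinates, and thresholds are all illustrative, not from the rulebook:

```python
import math

def pick_gazed_picture(gaze_origin, gaze_dir, picture_centers):
    """Return the index of the picture whose center lies closest to the
    estimated gaze ray, i.e. with the smallest angular deviation.

    gaze_origin: (x, y, z) of the person's eyes (from a hypothetical tracker).
    gaze_dir:    unit vector of the estimated gaze direction.
    picture_centers: list of (x, y, z) picture centers on the wall.
    """
    def angle_to(target):
        to_target = [t - o for t, o in zip(target, gaze_origin)]
        norm = math.sqrt(sum(c * c for c in to_target))
        dot = sum(d * c for d, c in zip(gaze_dir, to_target))
        # Clamp for numerical safety before acos.
        return math.acos(max(-1.0, min(1.0, dot / norm)))

    return min(range(len(picture_centers)),
               key=lambda i: angle_to(picture_centers[i]))

# Example: three pictures on a wall 2 m away; person looks toward the middle one.
pictures = [(-0.6, 2.0, 1.5), (0.0, 2.0, 1.5), (0.6, 2.0, 1.5)]
print(pick_gazed_picture((0.0, 0.0, 1.6), (0.0, 1.0, -0.05), pictures))  # → 1
```

Moving the pictures on test day, as suggested above, would defeat a scripted mapping but leave this kind of geometric approach intact, since it only depends on where the pictures are at run time.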
In general, I feel the test is old-fashioned and overly complex. It features 2 different tasks, hence it requires splitting (see below). In addition, the test seems sequential, there are no bonus objectives, and the main goal is quite hard to achieve (refactoring required). The robot should initially notice what's going on and react accordingly, without any need for a scripted order as in 2018. Furthermore, tests must be as short as possible, preferably fitting in two pages (one paper sheet each). Finally, please do use SI units.
I like the ideas, but as is, this cannot be incorporated. The time limit cannot exceed 8 min in Stage 1 (the average is 5), and preferably no more than 10 min in Stage 2. Also, the point reward is too small (Stage 1 is 1000; Stage 2, 2500).
Task 1: Hand me that
As described, the user points to an object on the table and the robot must grasp the right one. The main goal can be a drink, with the food and utensils as bonuses. I would leave open the number of objects in each group, as well as the possible descriptions, to add more NLP to the challenge.
Task 2: What am I looking at
The game of guessing which painting the person is looking at goes here. I don't mind teams having the paintings/posters/pictures/whatever one month in advance, since building a guess-who on a painting is quite time consuming, although I would rather use some sort of AI automation. This is certainly useful for museums and art galleries.
The main goal would be having the robot guess which painting the person is looking at and provide a description. The rest can be split into bonus objectives.
Edit:
If we are using special objects (paintings/pictures/posters) you need to add the pertinent info to the Objects Section under General Rules.
The maximum time for this test is 15 minutes.
\begin{scorelist}
\scoreheading{Mutual Gaze}
We need to change this, splitting it into Main Goal and Bonuses.
@@ -0,0 +1,91 @@
\section{I Want This [Party Host]}
A guest at the party speaks English, but with only a limited vocabulary. They want a drink, an hors d'oeuvre, and a set of utensils, but do not know the words to describe them. As such, they will ask the robot for one through gesturing, and will ask for recommendations made by the robot through gestures. They will also discuss a picture on the wall, which the robot will determine by analyzing their gaze.
To Do: Shrink description.
Whether this is approved or not, I think we have two tasks here, one for retrieving objects (gesture-detection-based HRI) and another for discussing the painting (dialogue-based HRI).
\subsection{Focus}
Joint attention is a well-studied and important task in Human-Robot Interaction. The goal of this task is to really challenge the teams to perform a hard HRI task.
To Do: Shrink
Something like: high-level gesture-recognition-based HRI should suffice.
\subsection{Main Goal}
The robot must interpret 3 point gestures of varying difficulty, produce the same 3 point gestures, interpret the human's gaze gesture as well as generate one, and interpret the establishment of mutual gaze.
This looks too complex to me. I would split it as explained below and request at most one, with the other two as bonus objectives.
What exactly is meant by 3 point gestures? Are the gestures different, or is it just pointing at different things?
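Since the main goal also requires detecting the establishment of mutual gaze, that criterion could be operationalized as a simple symmetric angular test. The sketch below assumes (hypothetically) that gaze estimates for both agents are available as an origin point plus a unit direction; the function name and 10-degree threshold are illustrative, not from the rulebook:

```python
import math

def is_mutual_gaze(origin_a, dir_a, origin_b, dir_b, threshold_deg=10.0):
    """Mutual gaze holds when each agent's gaze ray points at the other
    agent's head within an angular threshold (assumed value here)."""
    def looks_at(origin, direction, target):
        to_target = [t - o for t, o in zip(target, origin)]
        norm = math.sqrt(sum(c * c for c in to_target))
        cos = sum(d * c for d, c in zip(direction, to_target)) / norm
        return math.degrees(math.acos(max(-1.0, min(1.0, cos)))) <= threshold_deg

    return (looks_at(origin_a, dir_a, origin_b)
            and looks_at(origin_b, dir_b, origin_a))

# Robot head at the origin looking along +y; person 1.5 m away looking back.
print(is_mutual_gaze((0, 0, 1.5), (0, 1, 0), (0, 1.5, 1.5), (0, -1, 0)))  # → True
```

A judge-facing variant would additionally require the condition to hold for some minimum duration, so a single glance does not count as established mutual gaze.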
\begin{enumerate}
\item \textbf{Surfaces:} The test area must have 3 surfaces with items arranged an equal distance apart on them.
\item \textbf{Items:} We need three different drinks, three different food items, and three different utensils arranged on each surface. The first surface should have items about 2 feet apart. The second should have them about 1.5 feet apart. The last should have them about 9 inches apart.
ToDo: Shorten. Change to S.I. UNITS
A table featuring three groups of objects of the categories: food, drinks, and utensils.
I wouldn't say anything else.
I think we should just get rid of the distance indications here. Seems unnecessary.
\item \textbf{Pictures:} Three pictures should be hung along the broadest wall of the arena. They should be 2 feet apart.
I'd send this to another task
\item \textbf{Floor Markings:} Markings should be made about 2--3 feet in front of each surface, where the robot or person is intended to stand while pointing. A marking should be made about 5 feet from the middle picture.
S.I. UNITS
No markings imo. Positions can be announced beforehand by the referees.
% Setup
%
% %% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Procedure}
This looks old-fashioned. It might need rewriting. We don't want a sequential test.
\item \textbf{The person picks:} We start by testing the robot's gesture understanding, in which the person picks things to drink, eat, and use for eating.
\item \textbf{Picking drinks:} The person that is being helped will stand on the mark, look and point at one of the drinks, and say \textit{That one!} The robot must then say, \textit{Do you mean the one to the [left, middle, right]?} If correct, the robot gets 100 points.
Remove [left, middle, right] and leave the interaction open to the robot. Description by size, color, or shape is valid. Only restriction is that the name of the object cannot be used.
I don't like the script-like aspect of this. I think we wanted to get rid of this in this rulebook iteration. I think there is way too much defining of where things are, what happens, and in what order it will happen. I would prefer this to be more open.
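Whatever the final level of scripting, interpreting the pointing gesture itself typically comes down to extending a ray through two arm joints and picking the candidate object nearest to that ray. A rough sketch, assuming a skeleton tracker supplies 3-D shoulder and wrist positions (all names and coordinates below are made up for illustration):

```python
def pointed_object(shoulder, wrist, objects):
    """Hypothetical pointing resolver: extend the shoulder-to-wrist ray
    and return the label of the object closest to it (perpendicular
    distance), so the robot can ask back about that object.

    shoulder, wrist: (x, y, z) joints from an assumed skeleton tracker.
    objects: list of (label, (x, y, z)) candidates on the surface.
    """
    ray = [w - s for w, s in zip(wrist, shoulder)]
    ray_len2 = sum(c * c for c in ray)

    def dist2_to_ray(point):
        rel = [p - s for p, s in zip(point, shoulder)]
        # Project onto the ray; clamp so we only search in front of the arm.
        t = max(0.0, sum(r * c for r, c in zip(rel, ray)) / ray_len2)
        closest = [s + t * c for s, c in zip(shoulder, ray)]
        return sum((p, q) and (p - q) ** 2 for p, q in zip(point, closest))

    label, _ = min(objects, key=lambda o: dist2_to_ray(o[1]))
    return label

# Example: person points toward the middle of three drinks 0.6 m apart.
drinks = [("cola", (1.0, 0.0, 0.9)),
          ("juice", (1.0, 0.6, 0.9)),
          ("water", (1.0, 1.2, 0.9))]
print(pointed_object((0.0, 0.55, 1.4), (0.3, 0.57, 1.25), drinks))  # → juice
```

Note how the spacing between objects directly sets the difficulty: with items 9 inches apart, small joint-tracking errors flip the answer, which is presumably why the draft graded the surfaces by spacing.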
\item \textbf{The robot suggests:} Now the other person must interpret the robot's gestures correctly.
\item \textbf{Picking drinks:} The robot will now stand on the marking and the person will say, \textit{Which should I choose?} The robot must then point to the \textit{[left, middle, or right]} item. The judge will determine which item is being indicated. If correct, the robot gets 100 points.
The same, no [left, middle, right] but open.
Three point gestures means pointing at different things. The point of the distances is to mark difficulty and normalize the task so that it looks more like a laboratory experiment.

Here's what I view as scriptable: The person says, "I would like a [drink|food|item]." The robot navigates to a landmark near the [drink|food|item]. The robot says, "Please press the button on my forehead to continue with the task. I have been instructed to manipulate an object. Please press the button when you have manipulated the object on my behalf for points." The generation and interpretation of gaze cues and pointing gestures, on the other hand, is an open HRI problem.

We can get rid of some of the structure and make the task harder, but I want the task to sit in a combination of "doable with some serious work," "produces real research," and "not bypassable by the operator stating the name of the thing they want, or saying 'The middle fruit!'" The point of the way that I formulated this task is to inject some real HRI research into the league. We seriously have a problem with treating the term "HRI" as, "I can't get my robot to manipulate the thing."
All kudos up!!!! I like the test, I want to see it implemented, and I don't want to be read as the stubborn, nasty dude who finds objections to everything. So I'm only saying this: I can see some fatal flaws in here considering the competition structure and the Deus Ex Machina rule. Please take off your NLP/HRI hat and think as a CV bachelor student who has the test as an assignment and desperately needs to score.
@kyordhel I totally get it, and I want to make this work for both parties. I really am on the same page with you here and respect all of the commentary made in this and other threads. I just really want to help push this league towards what I view as sort of the next step, where PhD students (in groups like mine, honestly) are able to participate because it's helping them to get their doctoral degrees. I know that I sound stubborn here too, and I definitely don't want "stubborn" to be interchangeable with "divisive." I just really want to get this to where we can expect a couple of papers out every year. I think that we can get the league to reward all of these various sorts of contributions, but, in my ideal vision of this, we all have an excuse to host an IROS workshop or something in 2019 or 2020. That's why I'm being so forceful with all of this stuff. I'd love to be able to open my job talk in a couple of years with, "This is RoboCup@Home," lean on that, and have us all sit at a table together at AAAI in a few years. I think that we're right at the precipice of getting there, and that the only things between us and that goal are a tweak here and there.
Ok @justinhart, this is getting off-topic, and this kind of conversation would require a beer and time for me to explain why I am careful with this "let's make tests more scientific" approach. I'm not elaborating any further because it is already stated in #536. However, I'll share my experience as competitor and referee regarding this PR only:
Finally, don't be that systematic; leave the door open for other teams to try different approaches. You can request that the TC assist you in data gathering during the competition if you need consistent data for publishing, but be careful. The more specs you provide in a test, the more hardcoded it will be: all teams score high, but change a parameter and nobody solves the task. In conclusion, I'm OK as long as your experiment represents an [optimal|suboptimal] solution of a task. Design it, tune it for your publication, loosen some constraints, and we all try it.
Regarding @johaq's review: the point is to make sure that people don't bypass point and gaze gestures by driving up and touching the object. If you don't have to make a gesture, the test is pointless.
Alright, I'll find a way to make it work for everyone, but can we agree that if you're able to just say the name of the object or touch it, it's kind of a pointless task?
Leaving this open until I add the tasks based on determining the robot's point gesture, but please see #546.
This can be closed now, right?
Nope. It remains open. @justinhart might still come up with a solution.
I thought #546 is the solution.
I'll get these done. We just had the NRI deadline.
This is a new task that tests gaze gestures, and is an attempt to insert a hard HRI problem into RoboCup@Home. I'd also like to discuss the possibility that teams who would like to participate could participate in a group publication for either HRI or RO-MAN, which is based on their algorithms and the results from the competition.