Object view message definitions #11
I'm not actually convinced that objects and people can be the same thing, since the tasks involved with both can differ significantly (anything speech related for example, though I realize it's not relevant here in |
In what sense are they not the same? As I mentioned in the PR description, the messages are added specifically for a visual recognition task; they are not meant to be used for other purposes. To put it in other terms, they are data structures that will make it possible to write a generic visual recognition algorithm.
I'm not sure what you mean by this message; could you please elaborate?
The tasks are different, with possibly more interactions than just picking and placing. The question is whether this will be affected by how the message is defined. Potentially, something like age or gender recognition will apply to people only. This could perhaps be dealt with via a list of attributes, but we have to be careful about mixing types.
It may just be a naming issue ('subject' may be better than 'object'?). So is this meant to replace the |
You are thinking in terms of tasks here, but visual recognition doesn't necessarily have anything to do with a task. Nor with attributes (though attributes can be used to update the confidence in the recognition). But in any case, we try to answer questions of the following type: is the face I see Minh, or Alex, or someone else; or is the cup I see Alex's cup, or Minh's cup, or an unknown cup? We try to answer this given a list of "views" of each face, or each cup.
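The question above (is this face Minh, Alex, or someone else?) can be sketched as nearest-neighbour matching against the stored views of each identity. This is a minimal illustration, not the algorithm from the linked PRs: it assumes every view has already been reduced to a fixed-length embedding, uses Euclidean distance, and the `recognise` function, the `gallery` dictionary, and the `max_distance` threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(query: Sequence[float],
              gallery: Dict[str, List[Sequence[float]]],
              max_distance: float) -> Tuple[Optional[str], float]:
    """Return the identity whose closest stored view is nearest to the query.

    If even the best match is farther than max_distance, return (None, distance),
    i.e. "someone else" or "an unknown cup".
    """
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, views in gallery.items():
        for view in views:  # each identity may have several prototype views
            d = euclidean(query, view)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real ones might come from e.g. a Siamese network.
    gallery = {
        "Minh": [[0.0, 0.0], [0.1, 0.0]],
        "Alex": [[1.0, 1.0], [0.9, 1.1]],
    }
    print(recognise([0.05, 0.02], gallery, max_distance=0.5))
    print(recognise([5.0, 5.0], gallery, max_distance=0.5))
```

Keeping several views per identity, rather than a single averaged one, is what lets a query match whichever stored viewpoint it most resembles.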
No, replacement is not what I had in mind. The new messages are supposed to supplement the existing messages (particularly since not all applications need a recognition functionality).
This doesn't fully work because of the multiple view assumption (i.e. we may have a collection of images/clouds of the same object/person/face from multiple views). Unless I add the |
I meant adding … If it's to supplement those messages, then I don't really have any issue with the change.
OK, that makes sense. I added an … @sthoduka @deebuls, will these changes break anything in the @work code?
This makes the ObjectViews message obsolete, so I also removed it.
Yes, this will break most of our perception code. We use |
@mhwasil I was afraid that would be the case. But you will use |
Yes, we will use |
OK, that's great!
Actually, that's exactly what the list of |
I thought it was about different representations too, but I didn't read carefully enough. ObjectView sounds fine for different viewpoints.
Thanks everyone for your suggestions. I'm going to merge this now so that I can also proceed with b-it-bots/mas_perception_libs#18 and b-it-bots/mas_domestic_robotics#234.
Summary
Related to b-it-bots/mas_domestic_robotics#26 and b-it-bots/mas_domestic_robotics#233
The PR adds messages that are useful for permanently memorising object views. This is useful, for example, for object and face recognition, where we need to store a (small) set of prototype views of objects/faces.
In particular, I added three messages to support the representation of views:
ObjectEmbedding: A (low-dimensional) embedding of an object, for instance found by a Siamese network
ObjectView: An object view represented by (i) an image, (ii) a point cloud, and (iii) an embedding
ObjectViews: A list of individual object views (based on the discussion below, I removed this message and instead added an ObjectView array to the Person, Face, and Object messages)
I decided to represent an object view with three different types of information (a cloud, an image, and an embedding), since different modalities may be useful in different contexts: in particular, the cloud could be used for registration, while the image/embedding could be used for image-based recognition.
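Expressed as plain Python dataclasses, the final layout (an ObjectView array inside Person, Face, and Object instead of a separate ObjectViews message) looks roughly like the sketch below. It is only an analogue of the ROS message definitions, which are not reproduced here: the field names and the placeholder types for the image (`bytes`) and the point cloud (a list of xyz tuples) are assumptions; an actual ROS message would more plausibly use `sensor_msgs/Image` and `sensor_msgs/PointCloud2`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectEmbedding:
    # Low-dimensional embedding, e.g. the output of a Siamese network.
    embedding: List[float] = field(default_factory=list)


@dataclass
class ObjectView:
    # Placeholder types; field names and types are hypothetical.
    image: bytes = b""                                                  # used for transparency
    cloud: List[Tuple[float, float, float]] = field(default_factory=list)  # used for registration
    embedding: ObjectEmbedding = field(default_factory=ObjectEmbedding)  # used for fast recognition


@dataclass
class Object:
    name: str = ""
    # Replaces the removed ObjectViews message: a plain array of views.
    views: List[ObjectView] = field(default_factory=list)


@dataclass
class Face:
    views: List[ObjectView] = field(default_factory=list)


@dataclass
class Person:
    name: str = ""
    views: List[ObjectView] = field(default_factory=list)


if __name__ == "__main__":
    cup = Object(name="cup", views=[ObjectView(embedding=ObjectEmbedding([0.1, 0.2]))])
    print(cup.name, len(cup.views), cup.views[0].embedding.embedding)
```

Using `default_factory` gives each message instance its own empty list, mirroring how each ROS message starts with an empty array.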
The image and the embedding are redundant (the message encodes an implicit assumption that the embedding is found from the associated image), but they could serve different purposes: the embedding is useful to have for fast(er) recognition (so that we don't have to recompute it every time we want to recognise an object); the image is primarily useful for transparency (for example, we might want to identify the images that were responsible for recognising an object).
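The trade-off described above can be illustrated with a small, hypothetical `ViewMemory` class. The embedding of each stored view is computed once, when the view is memorised, so recognising a new image only requires embedding the query. The stored image is returned alongside the match, so one can inspect which view was responsible. `ViewMemory`, `StoredView`, and the toy `CountingEmbedder` (a stand-in for a real network such as a Siamese model) are illustrative assumptions, not part of the PR.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical embedder type: maps raw image bytes to an embedding vector.
Embedder = Callable[[bytes], List[float]]


@dataclass
class StoredView:
    label: str
    image: bytes            # kept for transparency: which image produced a match
    embedding: List[float]  # cached so it is not recomputed at recognition time


class ViewMemory:
    """Hypothetical view memory illustrating the design choice; not an API from the PR."""

    def __init__(self, embed: Embedder) -> None:
        self._embed = embed
        self.views: List[StoredView] = []

    def memorise(self, label: str, image: bytes) -> None:
        # The embedding is computed exactly once, when the view is stored.
        self.views.append(StoredView(label, image, self._embed(image)))

    def recognise(self, image: bytes) -> Optional[Tuple[str, StoredView]]:
        # Only the query is embedded; the stored embeddings are reused as-is.
        query = self._embed(image)
        best = min(self.views,
                   key=lambda v: math.dist(query, v.embedding),
                   default=None)
        if best is None:
            return None
        # Returning the stored view lets the caller inspect the responsible image.
        return best.label, best


class CountingEmbedder:
    """Toy stand-in for a real network: [image length, mean byte value]."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: bytes) -> List[float]:
        self.calls += 1
        mean = sum(image) / len(image) if image else 0.0
        return [float(len(image)), mean]


if __name__ == "__main__":
    embed = CountingEmbedder()
    memory = ViewMemory(embed)
    memory.memorise("Alex's cup", b"\x10\x10\x10")
    memory.memorise("Minh's cup", b"\xf0\xf0\xf0\xf0\xf0")
    match = memory.recognise(b"\x11\x10\x10")
    print(match[0] if match else "unknown", "| embedder calls:", embed.calls)
```

The embedder call counter makes the saving visible: after memorising two views, each recognition adds exactly one call, however many views are stored.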
Need for the PR
Currently, the repository contains separate messages for people and objects. The added messages unify objects and people into a single representation that is specifically designed for a visual recognition task. The expected use is that the view messages will be filled from the data in the object/person messages, which are used during online detection.