Home
In the previous version of Underworlds, the perception pipeline was separated from higher-level reasoning. After experimenting with that original version, it became clear that higher-level reasoning needs a strong interaction with the perception layer in order not to lose information in the abstraction.
Uwds3 is a framework for designing perception pipelines with high-level reasoning capabilities such as physical reasoning, perspective taking, estimation of human beliefs, social signal processing in context, or language models for human-robot interaction.
Uwds3 represents the environment as a set of entities called scene nodes. A scene node can be a camera, a mesh (or primitive shape), or an abstract entity without physical substance, such as the environment frame.
Similarly, the timeline is a set of events or relations that adds an extra layer of knowledge on top of this scene graph. Each event can be spatially located in the scene, which allows the robot, for example, to look at that point.
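The sketch below illustrates a timeline of spatially located events. It is a minimal sketch under assumptions, not the Uwds3 API: the classes `Event` and `Timeline`, the `horizon` parameter, and the subject/predicate/object layout of an event are all hypothetical. Ongoing relations stay in the timeline, while finished ones are forgotten after a time horizon, matching the idea that the timeline only holds recent events.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical event: a relation between two scene nodes over a time interval."""
    subject: str
    predicate: str
    obj: str
    start: float
    end: Optional[float] = None                         # None while the relation still holds
    point: Optional[Tuple[float, float, float]] = None  # optional location in the world frame


class Timeline:
    """Short-term memory of events: finished events are dropped after `horizon` seconds."""

    def __init__(self, horizon: float = 10.0):
        self.horizon = horizon
        self.events: List[Event] = []

    def start(self, event: Event) -> None:
        self.events.append(event)

    def end(self, subject: str, predicate: str, obj: str, time: float) -> None:
        # Close every ongoing event that matches the given triplet
        for e in self.events:
            if e.end is None and (e.subject, e.predicate, e.obj) == (subject, predicate, obj):
                e.end = time

    def update(self, now: float) -> None:
        # Keep ongoing events and events that ended less than `horizon` seconds ago
        self.events = [e for e in self.events if e.end is None or now - e.end <= self.horizon]

    def ongoing(self) -> List[Event]:
        return [e for e in self.events if e.end is None]
```

An event carrying a `point`, such as `Event("cup_1", "on", "table_1", 0.0, point=(0.5, 0.1, 0.75))`, gives the robot a 3D target it could turn its head towards.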
One of the particularities of Uwds3 is that a real-time physics simulation runs alongside the robot's perception. It gives the robot a physical sense of its own body, and it also corrects physical inconsistencies of the perception during tabletop scenarios by applying gravity and reasoning about collisions, which allows the robot to infer the behavior of objects that are not visible.
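The following is a deliberately simplified sketch of that correction, not Uwds3's simulation: instead of a full physics engine, each perceived object is a box that can only move vertically, and the only obstacle is a horizontal table plane. It shows how applying gravity and resolving the collision fixes an estimate that floats above the table or penetrates it. The function `settle_on_table` and its parameters are hypothetical.

```python
import numpy as np

GRAVITY = -9.81  # m/s^2 along the world z axis


def settle_on_table(z_centers, half_heights, table_height, dt=0.01, steps=200):
    """Drop boxes under gravity and resolve their contact with a table plane.

    Hypothetical simplification: objects only move along z, and contact is
    handled by snapping the box onto the surface and zeroing its velocity.
    """
    z = np.asarray(z_centers, dtype=float).copy()
    h = np.asarray(half_heights, dtype=float)
    v = np.zeros_like(z)
    rest = table_height + h  # center height at which each box touches the table
    for _ in range(steps):
        v += GRAVITY * dt  # semi-implicit Euler: update the velocity first
        z += v * dt
        hit = z <= rest  # contact with, or penetration of, the table
        z[hit] = rest[hit]  # push the object back onto the surface
        v[hit] = 0.0
    return z
```

Here a cup perceived 5 cm above the table falls onto it, and a cup perceived partly inside the table is pushed back up, so both end resting on the surface.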
The other particularity is a tensor-based triplet store that stores beliefs about agents for long-term reasoning, while the timeline stores only recent events. The formulation of this belief base makes it possible to compute a divergence metric during a collaboration between the robot and a person, which is used to trigger repair actions.
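As an illustration only, a triplet store can be encoded as a 3D tensor indexed by (subject, predicate, object), where each entry is the probability that the triplet is believed. The class `BeliefTensor` and the metric below are assumptions: the mean absolute difference between two belief tensors is used as a simple stand-in, since the divergence Uwds3 actually computes is not described here.

```python
import numpy as np


class BeliefTensor:
    """Hypothetical triplet store: entry [s, p, o] is the probability that
    the triplet (subject, predicate, object) is believed by an agent."""

    def __init__(self, entities, predicates):
        self.e = {name: i for i, name in enumerate(entities)}
        self.p = {name: i for i, name in enumerate(predicates)}
        self.t = np.zeros((len(entities), len(predicates), len(entities)))

    def assert_triplet(self, s, p, o, prob=1.0):
        self.t[self.e[s], self.p[p], self.e[o]] = prob

    def query(self, s, p, o):
        return float(self.t[self.e[s], self.p[p], self.e[o]])


def divergence(a, b):
    """Mean absolute difference between two belief tensors (an assumed metric)."""
    return float(np.mean(np.abs(a.t - b.t)))
```

The robot would keep one tensor for its own beliefs and one for its estimate of the person's beliefs; a repair action (for example, pointing at the cup) could then be triggered when `divergence` exceeds some threshold chosen for the task.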
More information is available in the dedicated section.
The pinhole camera model is a key component, as it is used to project the 3D scene onto a plane. If you are not familiar with this concept, read this OpenCV lesson. A scene node represents at the same time the 3D entity in the scene and the 2D bounding box seen by the camera that observes it. For example, one scene node can have multiple distinct representations, depending on how many cameras are looking at it.
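A minimal sketch of how one 3D entity yields a 2D bounding box in a given camera, assuming points already expressed in the camera frame (z pointing forward) and no lens distortion. The helpers `project_points` and `box_to_bbox` and the intrinsics `fx, fy, cx, cy` used in the example are illustrative, not Uwds3 functions; OpenCV's `cv2.projectPoints` performs the same projection with distortion support.

```python
import numpy as np


def project_points(points_cam, fx, fy, cx, cy):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixel coordinates."""
    p = np.asarray(points_cam, dtype=float)
    if np.any(p[:, 2] <= 0):
        raise ValueError("points must be in front of the camera (z > 0)")
    u = fx * p[:, 0] / p[:, 2] + cx
    v = fy * p[:, 1] / p[:, 2] + cy
    return np.stack([u, v], axis=1)


def box_to_bbox(center, size, fx, fy, cx, cy):
    """2D box (xmin, ymin, xmax, ymax) enclosing the projection of a 3D box."""
    c = np.asarray(center, dtype=float)
    half = np.asarray(size, dtype=float) / 2.0
    # The 8 corners of the axis-aligned 3D box
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    uv = project_points(c + signs * half, fx, fy, cx, cy)
    return uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()
```

Calling `box_to_bbox` once per observing camera, with that camera's intrinsics and the object expressed in its frame, gives the distinct per-camera 2D representations of the same scene node.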
Uwds3 has its own vector library based on numpy and cv2 that additionally implements, for each vector type, a stable version using a linear Kalman model in order to trade off accuracy and stability, two important properties when dealing with physical simulation.
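To make the accuracy/stability trade-off concrete, here is a sketch of a "stable" n-dimensional vector as a linear Kalman filter with a constant-velocity model, written in plain numpy rather than with `cv2.KalmanFilter`. The class name `StableVector` and the noise parameters `p_cov` and `m_cov` are hypothetical: raising the process noise makes the estimate follow measurements more closely (accuracy), while raising the measurement noise smooths it more (stability).

```python
import numpy as np


class StableVector:
    """Hypothetical stable vector: Kalman filter with state [position, velocity]."""

    def __init__(self, x0, p_cov=0.01, m_cov=0.1, dt=1.0):
        x0 = np.asarray(x0, dtype=float)
        n = x0.size
        self.n = n
        self.x = np.concatenate([x0, np.zeros(n)])       # state estimate
        self.P = np.eye(2 * n)                             # state covariance
        self.F = np.eye(2 * n)
        self.F[:n, n:] = dt * np.eye(n)                    # position += velocity * dt
        self.H = np.hstack([np.eye(n), np.zeros((n, n))])  # only the position is measured
        self.Q = p_cov * np.eye(2 * n)                     # process noise: higher = more reactive
        self.R = m_cov * np.eye(n)                         # measurement noise: higher = smoother

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.position()

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2 * self.n) - K @ self.H) @ self.P
        return self.position()

    def position(self):
        return self.x[: self.n].copy()
```

Calling `predict()` then `update(measurement)` at each frame converges to a steady measurement, while a single outlier only moves the estimate part of the way towards it.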
The scene node is the atomic structure of the scene graph. It contains a stable 2D bounding box, a 6D vector that represents the pose in the global frame, a dictionary of features, and optionally a camera and/or shapes.
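The fields listed above can be pictured as the following dataclass. This is a sketch, not the actual Uwds3 class: the name `SceneNode`, the field names, and the (x, y, z, roll, pitch, yaw) ordering of the 6D pose are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

import numpy as np


@dataclass
class SceneNode:
    """Hypothetical scene node; field names are illustrative."""
    node_id: str
    label: str = "thing"
    # 2D box (xmin, ymin, xmax, ymax) in the image of the observing camera
    bbox: Optional[Tuple[float, float, float, float]] = None
    # 6D pose in the global frame, assumed ordered as x, y, z, roll, pitch, yaw
    pose: np.ndarray = field(default_factory=lambda: np.zeros(6))
    # Named features, e.g. an appearance embedding or facial landmarks
    features: Dict[str, Any] = field(default_factory=dict)
    # Zero or more meshes or primitive shapes
    shapes: List[Any] = field(default_factory=list)
    # At most one camera per node
    camera: Optional[Any] = None

    def has_shape(self) -> bool:
        return len(self.shapes) > 0

    def has_camera(self) -> bool:
        return self.camera is not None
```

Using `default_factory` gives every node its own pose array, feature dictionary and shape list, so modifying one node never leaks into another.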
Each node of the scene graph can have multiple shapes associated with it.
Each node can have one camera associated with it; note that face nodes automatically get a human camera.
The bounding box represents an axis-aligned bounding box in a 2D image plane, with an optional depth.
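A minimal sketch of such a box, assuming pixel coordinates `(xmin, ymin, xmax, ymax)` and a depth in meters; the class and its methods are hypothetical. The intersection-over-union (IoU) method is included because it is the usual way to compare two boxes, for instance when matching a new detection to an existing node, not because the source documents it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BoundingBox:
    """Axis-aligned 2D box in pixel coordinates with an optional depth (meters)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    depth: Optional[float] = None

    def width(self) -> float:
        return self.xmax - self.xmin

    def height(self) -> float:
        return self.ymax - self.ymin

    def area(self) -> float:
        return self.width() * self.height()

    def center(self) -> Tuple[float, float]:
        return ((self.xmin + self.xmax) / 2.0, (self.ymin + self.ymax) / 2.0)

    def iou(self, other: "BoundingBox") -> float:
        # Overlap along each axis, clamped at zero when the boxes are disjoint
        ix = max(0.0, min(self.xmax, other.xmax) - max(self.xmin, other.xmin))
        iy = max(0.0, min(self.ymax, other.ymax) - max(self.ymin, other.ymin))
        inter = ix * iy
        union = self.area() + other.area() - inter
        return inter / union if union > 0 else 0.0
```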
The stable bounding box represents a bounding box with a linear Kalman model that can be used as a filter or a tracker.
The detection represents the output of a detector; it is composed of a bounding box and a mask.
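As an illustration of how the two parts of a detection relate, the sketch below builds the bounding box as the tightest box around the pixels of a binary mask. The `label` and `confidence` fields and the function `detection_from_mask` are hypothetical additions; the box uses an exclusive `xmax`/`ymax`, which is an assumed convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Detection:
    """Hypothetical detector output: a box plus a binary mask."""
    label: str
    confidence: float
    bbox: Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), max values exclusive
    mask: np.ndarray                 # boolean array with the size of the image


def detection_from_mask(label: str, confidence: float, mask) -> Optional[Detection]:
    """Build a detection whose box tightly encloses the non-zero mask pixels."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # empty mask: nothing was detected
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
    return Detection(label, confidence, bbox, mask)
```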
Features are abstract representations used by machine learning models and neural networks. More information is available in the FAQ features section.
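Feature vectors are typically compared rather than read directly. The sketch below assumes a feature is a named numeric vector (for example an appearance embedding) and compares two of them with cosine similarity, which could serve to re-identify a person across frames. The class `Features` and its `similarity` method are hypothetical, not the Uwds3 interface.

```python
import numpy as np


class Features:
    """Hypothetical feature: a named vector produced by some model."""

    def __init__(self, name, data):
        self.name = name
        self.data = np.asarray(data, dtype=float)

    def similarity(self, other: "Features") -> float:
        """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
        na = np.linalg.norm(self.data)
        nb = np.linalg.norm(other.data)
        if na == 0 or nb == 0:
            return 0.0
        return float(self.data @ other.data / (na * nb))
```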
The facial landmarks, extracted with dlib, are represented by a feature that has additional methods to extract regions of interest (ROIs) of the face (eyes, mouth, etc.).
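A sketch of how such ROIs can be derived, assuming the 68-point layout produced by dlib's `shape_predictor_68_face_landmarks` model (iBUG 300-W annotation) stored as a 68x2 array. The function `part_roi`, the `FACE_PARTS` table and the `margin` parameter are hypothetical; "right" refers to the subject's right, which appears on the left of the image.

```python
import numpy as np

# 0-based [start, end) index ranges of the 68-point iBUG annotation
FACE_PARTS = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}


def part_roi(landmarks, part, margin=0.0):
    """Box (xmin, ymin, xmax, ymax) around one face part, enlarged by `margin`
    times its width and height on each side."""
    pts = np.asarray(landmarks, dtype=float)
    if pts.shape != (68, 2):
        raise ValueError("expected 68 (x, y) landmarks")
    start, end = FACE_PARTS[part]
    sel = pts[start:end]
    xmin, ymin = sel.min(axis=0)
    xmax, ymax = sel.max(axis=0)
    w, h = xmax - xmin, ymax - ymin
    return (xmin - margin * w, ymin - margin * h, xmax + margin * w, ymax + margin * h)
```

With dlib, the array can be built from a predictor result `shape` as `np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)])`.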