Yoan Sallami edited this page Apr 21, 2020 · 8 revisions

Underworlds3

I. Introduction

In the previous version of Underworlds, the perception pipeline was separated from higher-level reasoning. After experimenting with that original version, it became clear that higher-level reasoning needs strong interaction with the perception layer so as not to lose information in the abstraction.

Uwds3 is a framework for designing perception pipelines with high-level reasoning capabilities such as physical reasoning, perspective taking, human belief estimation, in-context social signal processing, or language models for human-robot interaction.

The scene graph & the timeline

Uwds3 represents the environment as a set of entities called scene nodes. A scene node can be a camera, a mesh (or primitive shape), or an abstract entity with no physical substance, such as the environment frame.

In the same way, the timeline is a set of events or relations that adds an extra layer of knowledge on top of the scene graph. Each event can be spatially located in the scene, which allows the robot to look at it.
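As an illustration of the timeline idea, here is a minimal sketch, assuming each event is a subject/predicate/object relation with a start time, an optional end time and an optional 3D location the robot could look at. The `Event` and `Timeline` classes, their field names and the pruning horizon are hypothetical and are not the Uwds3 API; the pruning reflects the statement that the timeline only keeps recent events.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical event/relation record (field names are illustrative)."""
    subject: str
    predicate: str
    obj: str
    start: float
    end: Optional[float] = None  # None while the event is still ongoing
    location: Optional[Tuple[float, float, float]] = None  # point the robot can look at


@dataclass
class Timeline:
    horizon: float = 10.0  # seconds of finished events to keep (assumed value)
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def prune(self, now: float) -> None:
        # Keep ongoing events and events that ended within the horizon.
        self.events = [e for e in self.events
                       if e.end is None or now - e.end <= self.horizon]

    def ongoing(self) -> List[Event]:
        return [e for e in self.events if e.end is None]


tl = Timeline(horizon=5.0)
tl.add(Event("cup_1", "on", "table", start=0.0, location=(0.5, 0.0, 0.75)))
tl.add(Event("human_1", "picks", "cup_2", start=1.0, end=2.0))
tl.prune(now=10.0)
print([(e.subject, e.predicate, e.obj) for e in tl.events])  # [('cup_1', 'on', 'table')]
```

The finished "picks" event ended 8 s before `now`, which is beyond the 5 s horizon, so only the ongoing relation survives.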

The simulator and the belief base

One of the particularities of Uwds3 is that a real-time physics simulation runs at run time, alongside the robot's perception. This gives the robot a physical sense of its own body, and it also lets the robot correct physical inconsistencies in its perception during tabletop scenarios by applying gravity and reasoning about collisions, which in turn allows it to infer the behavior of objects that are not visible.
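To make the gravity correction concrete, the toy sketch below replaces a real physics engine with a single "settle" step on axis-aligned boxes: an object perceived slightly above the table, or slightly inside it, is moved so that it rests on the table top, and an object that is not above any support falls to the floor. The `Box` class, `settle` function and `penetration_tol` parameter are hypothetical; Uwds3 relies on an actual simulator, not on this code.

```python
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: center (x, y, z) and half-extents (hx, hy, hz) in meters."""
    name: str
    x: float
    y: float
    z: float
    hx: float
    hy: float
    hz: float

    def bottom(self) -> float:
        return self.z - self.hz

    def top(self) -> float:
        return self.z + self.hz

    def overlaps_xy(self, other: "Box") -> bool:
        return (abs(self.x - other.x) < self.hx + other.hx
                and abs(self.y - other.y) < self.hy + other.hy)


def settle(obj: Box, supports: Iterable[Box], floor_z: float = 0.0,
           penetration_tol: float = 0.05) -> Box:
    """Toy gravity step: return a copy of `obj` resting on the highest support
    under it, or on the floor. A support whose top is above the object's bottom
    by at most `penetration_tol` counts as penetrated and pushes the object up."""
    rest_z = floor_z
    for s in supports:
        if s is obj or not s.overlaps_xy(obj):
            continue
        if s.top() <= obj.bottom() + penetration_tol:
            rest_z = max(rest_z, s.top())
    return replace(obj, z=rest_z + obj.hz)


table = Box("table", 0.0, 0.0, 0.375, 0.5, 0.5, 0.375)        # top at 0.75 m
floating_cup = Box("cup", 0.1, 0.0, 0.82, 0.04, 0.04, 0.05)   # perceived 2 cm too high
sunk_cup = Box("cup", 0.1, 0.0, 0.78, 0.04, 0.04, 0.05)       # perceived 2 cm inside the table
fallen_cup = Box("cup", 2.0, 0.0, 0.82, 0.04, 0.04, 0.05)     # not above the table

for cup in (floating_cup, sunk_cup, fallen_cup):
    print(round(settle(cup, [table]).z, 3))  # prints 0.8, then 0.8, then 0.05
```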

The other particularity is a tensor-based triplet store that stores beliefs about agents for long-term reasoning, while the timeline stores only recent events. The formulation of this belief base makes it possible to compute a divergence metric during a collaboration between the robot and a person, in order to trigger repair actions.
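The sketch below shows one way a tensor-based triplet store and a divergence metric could fit together, assuming beliefs are stored as a tensor `T[s, p, o]` whose entry is 1.0 when the agent believes the triplet (s, p, o). The `TripletStore` class and the metric (mean absolute difference over triplets that at least one agent believes) are hypothetical choices made for illustration; the actual Uwds3 formulation may differ.

```python
import numpy as np


class TripletStore:
    """Hypothetical tensor-based triplet store: tensor[s, p, o] = 1.0 when the
    agent believes that (subject, predicate, object) holds."""

    def __init__(self, entities, predicates):
        self.e = {name: i for i, name in enumerate(entities)}
        self.p = {name: i for i, name in enumerate(predicates)}
        self.tensor = np.zeros((len(entities), len(predicates), len(entities)))

    def assert_triplet(self, s, p, o, value=1.0):
        self.tensor[self.e[s], self.p[p], self.e[o]] = value

    def holds(self, s, p, o):
        return bool(self.tensor[self.e[s], self.p[p], self.e[o]] > 0.5)


def divergence(a, b):
    """Assumed metric: mean absolute difference over the triplets believed by at
    least one agent. 0.0 means identical beliefs, 1.0 means no shared belief."""
    support = (a.tensor > 0) | (b.tensor > 0)
    if not support.any():
        return 0.0
    return float(np.abs(a.tensor - b.tensor)[support].mean())


entities = ["cup", "table", "box"]
predicates = ["on", "in"]
robot = TripletStore(entities, predicates)
human = TripletStore(entities, predicates)  # the robot's estimate of the human's beliefs

robot.assert_triplet("box", "on", "table")
human.assert_triplet("box", "on", "table")
robot.assert_triplet("cup", "in", "box")    # the robot saw the cup being put in the box
human.assert_triplet("cup", "on", "table")  # the human did not see it happen

d = divergence(robot, human)
print(round(d, 3))  # 0.667
if d > 0.5:  # hypothetical threshold
    print("trigger a repair action, e.g. tell the human where the cup is")
```

Storing beliefs as a dense tensor makes the comparison between two agents a single vectorized operation, at the cost of memory that grows with the square of the number of entities.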

More information is available in the dedicated section.

Camera pinhole model

The pinhole camera model is a key component, as it is used to project the 3D scene onto the 2D image plane. If you are not familiar with this concept, read this OpenCV lesson. A scene node represents both the 3D entity in the scene and its 2D bounding box as seen by each camera observing it. For example, a single scene node can have several distinct 2D representations, one per camera looking at it.
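The projection itself can be sketched in a few lines of numpy. This minimal example assumes points already expressed in the camera frame with the OpenCV convention (x right, y down, z forward) and intrinsics `fx, fy, cx, cy`; it projects the corners of an object and takes their extent as the 2D bounding box that camera would see. The function names are hypothetical, not part of Uwds3.

```python
import numpy as np


def project_points(points_cam, fx, fy, cx, cy):
    """Project Nx3 camera-frame points onto the image plane (pinhole model)."""
    pts = np.asarray(points_cam, dtype=float)
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    uvw = pts @ K.T                   # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective division by the depth z


def bbox_from_points(points_cam, fx, fy, cx, cy):
    """Return ((xmin, ymin, xmax, ymax), mean_depth) of the points in front of
    the camera, or None if no point is in front of it."""
    pts = np.asarray(points_cam, dtype=float)
    pts = pts[pts[:, 2] > 0]
    if len(pts) == 0:
        return None
    uv = project_points(pts, fx, fy, cx, cy)
    xmin, ymin = uv.min(axis=0)
    xmax, ymax = uv.max(axis=0)
    return (xmin, ymin, xmax, ymax), float(pts[:, 2].mean())


# Corners of a 20 cm cube centered 2 m in front of the camera.
corners = [(x, y, 2.0 + z) for x in (-0.1, 0.1) for y in (-0.1, 0.1) for z in (-0.1, 0.1)]
box, depth = bbox_from_points(corners, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
box = tuple(round(float(v), 1) for v in box)
print(box)  # (293.7, 213.7, 346.3, 266.3)
```

Calling `bbox_from_points` again with another camera's intrinsics (and the points expressed in that camera's frame) gives that camera's own 2D view of the same node.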

II. Base types

Vector

Uwds3 has its own vector library based on numpy and cv2. For each vector type, it additionally implements a stable version that uses a linear Kalman model to trade off accuracy and stability, two important properties when dealing with physical simulation.
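A minimal numpy-only sketch of such a stable vector is shown below, assuming a constant-velocity linear Kalman model (the state is position plus velocity, and only the position is measured). The class name and noise parameters are hypothetical and do not reproduce the Uwds3 classes. Raising `measurement_noise` relative to `process_noise` gives smoother but laggier estimates; lowering it follows the measurements more closely.

```python
import numpy as np


class StableVector:
    """Hypothetical n-D vector smoothed by a constant-velocity linear Kalman filter."""

    def __init__(self, initial, process_noise=1e-3, measurement_noise=1e-1):
        x0 = np.asarray(initial, dtype=float)
        n = x0.size
        self.n = n
        self.x = np.concatenate([x0, np.zeros(n)])          # state: [position, velocity]
        self.P = np.eye(2 * n)                               # state covariance
        self.H = np.hstack([np.eye(n), np.zeros((n, n))])    # only the position is observed
        self.Q = process_noise * np.eye(2 * n)
        self.R = measurement_noise * np.eye(n)

    def predict(self, dt):
        F = np.eye(2 * self.n)
        F[:self.n, self.n:] = dt * np.eye(self.n)            # position += velocity * dt
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z):
        z = np.asarray(z, dtype=float)
        y = z - self.H @ self.x                              # innovation
        S = self.H @ self.P @ self.H.T + self.R              # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2 * self.n) - K @ self.H) @ self.P

    @property
    def position(self):
        return self.x[:self.n]


# A static point observed with alternating +/- 5 cm noise.
sv = StableVector([1.05, 2.05, 3.05])
for k in range(100):
    noise = 0.05 if k % 2 else -0.05
    sv.predict(dt=0.1)
    sv.update([1.0 + noise, 2.0 + noise, 3.0 + noise])
print(np.round(sv.position, 2))
```

Calling `predict` without a following `update` extrapolates the vector when no measurement is available, which is what makes the same model usable as a tracker.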

SceneNode

The scene node is the atomic structure of the scene graph. It contains a stable 2D bounding box, a 6D vector that represents its pose in the global frame, a dictionary of features and, optionally, a camera and/or shapes.
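The following sketch mirrors that description with a plain dataclass, assuming the 6D pose is (x, y, z, rx, ry, rz) with Euler angles composed as R = Rz · Ry · Rx. Both the angle convention and the `SceneNode` fields shown here are assumptions for illustration, not the Uwds3 class. The `transform` method turns the pose into a 4x4 homogeneous matrix that maps points from the node frame to the global frame.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

import numpy as np


def euler_to_matrix(rx, ry, rz):
    """Rotation matrix R = Rz @ Ry @ Rx (assumed roll-pitch-yaw convention)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


@dataclass
class SceneNode:
    """Hypothetical scene node: pose, 2D box, features, optional camera and shapes."""
    label: str = "thing"
    pose: np.ndarray = field(default_factory=lambda: np.zeros(6))  # x, y, z, rx, ry, rz
    bbox: Optional[Any] = None            # stable 2D bounding box
    features: Dict[str, Any] = field(default_factory=dict)
    camera: Optional[Any] = None
    shapes: List[Any] = field(default_factory=list)
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def transform(self) -> np.ndarray:
        """4x4 homogeneous transform from the node frame to the global frame."""
        T = np.eye(4)
        T[:3, :3] = euler_to_matrix(*self.pose[3:])
        T[:3, 3] = self.pose[:3]
        return T

    def has_camera(self) -> bool:
        return self.camera is not None


node = SceneNode(label="cup", pose=np.array([0.5, 0.0, 0.75, 0.0, 0.0, np.pi / 2]))
world = node.transform() @ np.array([0.1, 0.0, 0.0, 1.0])  # a point 10 cm along the node's x axis
print(np.round(world[:3], 3))  # about (0.5, 0.1, 0.75): the 90 degree yaw maps x onto y
```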

Shape

Each node of the scene graph can have multiple shapes associated with it.

Camera

Each node can have one camera associated with it. Note that face nodes automatically get a human camera.

BoundingBox

The bounding box represents an axis-aligned bounding box in the 2D image plane, with an optional depth.
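A minimal sketch of such a bounding box follows, assuming corner coordinates (xmin, ymin, xmax, ymax) in pixels. The intersection-over-union method is a common way to compare a detection with a tracked box and is included here as an assumption about typical use, not as the Uwds3 implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BoundingBox:
    """Hypothetical axis-aligned 2D box in pixel coordinates, with optional depth (m)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    depth: Optional[float] = None

    def width(self) -> float:
        return self.xmax - self.xmin

    def height(self) -> float:
        return self.ymax - self.ymin

    def area(self) -> float:
        return max(0.0, self.width()) * max(0.0, self.height())

    def center(self) -> Tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)

    def iou(self, other: "BoundingBox") -> float:
        """Intersection over union of the two boxes, in [0, 1]."""
        ix = max(0.0, min(self.xmax, other.xmax) - max(self.xmin, other.xmin))
        iy = max(0.0, min(self.ymax, other.ymax) - max(self.ymin, other.ymin))
        inter = ix * iy
        union = self.area() + other.area() - inter
        return inter / union if union > 0 else 0.0


a = BoundingBox(0, 0, 10, 10, depth=1.2)
b = BoundingBox(5, 0, 15, 10)
print(round(a.iou(b), 3))  # 0.333
```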

BoundingBoxStable

The stable bounding box represents a bounding box with a linear Kalman model, so that it can be used as a filter or as a tracker.

Detection

The detection represents the output of a detector; it is composed of a bounding box and a mask.

Features

Features are abstract representations used by machine learning models and neural networks. More information is available in the FAQ features section.
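As an example of how such features can be used, assumed here for illustration only, the sketch below compares an observed feature vector with a gallery of known ones (for instance face embeddings) using cosine similarity and returns the best match above a threshold. The function names and the 0.8 threshold are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def best_match(query, gallery, threshold=0.8):
    """Key of the most similar stored feature, or None if none reaches the threshold."""
    best_key, best_sim = None, threshold
    for key, feature in gallery.items():
        sim = cosine_similarity(query, feature)
        if sim >= best_sim:
            best_key, best_sim = key, sim
    return best_key


gallery = {"alice": [1.0, 0.0, 0.1], "bob": [0.0, 1.0, 0.0]}
print(best_match([0.9, 0.1, 0.1], gallery))  # alice
print(best_match([0.0, 0.0, 1.0], gallery))  # None
```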

FacialLandmarks

The facial landmarks, extracted with dlib, are represented by a feature that has additional methods to extract regions of interest (ROIs) of the face (eyes, mouth, etc.).
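The ROI extraction can be sketched as follows, assuming the 68-point iBUG 300-W layout produced by dlib's standard shape predictor (for example, points 36 to 41 and 42 to 47 for the two eyes, 48 to 67 for the mouth). The region names follow the common imutils naming, and `region_roi` is a hypothetical helper rather than the Uwds3 method.

```python
from typing import Dict, Sequence, Tuple

# Half-open index ranges of the 68-point iBUG 300-W landmark layout.
LANDMARK_RANGES: Dict[str, Tuple[int, int]] = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}


def region_roi(landmarks: Sequence[Tuple[float, float]], region: str,
               margin: float = 0) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) enclosing the landmarks of a face region,
    enlarged by `margin` pixels on each side."""
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    start, end = LANDMARK_RANGES[region]
    xs = [p[0] for p in landmarks[start:end]]
    ys = [p[1] for p in landmarks[start:end]]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


# Synthetic landmarks, only to exercise the function.
pts = [(i, 2 * i) for i in range(68)]
print(region_roi(pts, "left_eye", margin=2))  # (40, 82, 49, 96)
```

With a dlib `full_object_detection` returned by the shape predictor, the list of points can be built with `[(shape.part(i).x, shape.part(i).y) for i in range(68)]`.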

TemporalSituation

III. Reasoning

Assignment

Detection

Estimation

Grounding

Knowledge

Monitoring

Recognition

Sampling

Simulation

Tracking
