Perception
ArUco markers (a type of fiducial marker) are special patterns that encode numbers. The course will often have these placed around so we can orient our rover and execute a certain task.
OpenCV is used to run detection on the camera stream. This gives us information about where the tag is in pixel space, specifically its four corners. We can then fuse this with point cloud data, which gives us the xyz position of any given pixel relative to the camera. Specifically, we query the point cloud at the center of the marker and thus find its transform relative to the rover.
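The corner-averaging and point cloud lookup can be sketched as follows. This is a minimal Python sketch using a toy organized point cloud (one xyz point per pixel, in the same row-major layout as the image); all names are illustrative, not from the actual codebase.

```python
# Sketch: given the four corner pixels of a detected ArUco tag, find its
# center pixel and look up the corresponding 3D point in an organized
# point cloud. Names are illustrative, not from the actual codebase.

def tag_center(corners):
    """Average the four (u, v) corner pixels to get the center pixel."""
    u = sum(c[0] for c in corners) / 4.0
    v = sum(c[1] for c in corners) / 4.0
    return round(u), round(v)

def query_cloud(cloud, width, u, v):
    """Index an organized cloud (flat list of xyz tuples) at pixel (u, v).

    Returns None where the stereo matcher produced no depth, which the
    update loop must tolerate.
    """
    return cloud[v * width + u]

# Toy 4x2 organized cloud: one xyz tuple per pixel, row-major.
width = 4
cloud = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.1), None,
    (0.0, 0.1, 1.0), (0.1, 0.1, 1.0), (0.2, 0.1, 1.1), None,
]

center = tag_center([(0, 0), (2, 0), (2, 1), (0, 1)])  # -> (1, 0)
xyz = query_cloud(cloud, width, *center)               # -> (0.1, 0.0, 1.0)
```

In the real pipeline the cloud comes from the stereo camera driver and is queried in a callback; the key point is that an organized cloud lets a pixel coordinate index a 3D point directly.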
We then publish the tags to the tf tree.
Update Loop:
- Detect the IDs and vertices in pixel space of ArUco tags from the current camera frame.
- Add any new tags to the "immediate" map or update existing ones. We calculate the center here by averaging the four vertices. If we also have a point cloud reading for this tag, publish it to the TF tree as an immediate tag relative to the rover. These readings are filled in by another callback.
- Decrement the hit counter of any tags that were not seen this frame. If a counter reaches zero, remove that tag entirely from the immediate map.
- Publish all tags to the TF tree that have been seen enough times. Importantly, this time they will be relative to the map frame, not the rover.
- Draw the detected markers onto an image and then publish it.
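The hit-counter bookkeeping in the steps above can be sketched as a small Python function. The cap and publish threshold are illustrative assumptions, not values from the actual node.

```python
# Sketch of the hit-counter bookkeeping in the update loop (illustrative,
# not the actual node's code). A tag gains a hit each frame it is seen and
# loses one each frame it is not; it is only trusted (published relative to
# the map frame) once its count crosses a threshold, and it is removed from
# the immediate map once its count reaches zero.

MAX_HITS = 5           # assumed cap so counters do not grow without bound
PUBLISH_THRESHOLD = 3  # assumed minimum sightings before trusting a tag

def update(immediate, seen_ids):
    """Update the immediate map given the tag IDs detected this frame.

    immediate: dict mapping tag id -> hit count (mutated in place)
    seen_ids:  set of tag ids detected in the current camera frame
    Returns the ids confident enough to publish in the map frame.
    """
    for tag_id in seen_ids:
        immediate[tag_id] = min(immediate.get(tag_id, 0) + 1, MAX_HITS)
    for tag_id in list(immediate):
        if tag_id not in seen_ids:
            immediate[tag_id] -= 1
            if immediate[tag_id] <= 0:
                del immediate[tag_id]  # remove entirely from the map
    return [t for t, hits in immediate.items() if hits >= PUBLISH_THRESHOLD]
```

For example, a tag seen in three consecutive frames starts being published, and a tag that disappears for as many frames as it was seen gets dropped. This filtering keeps one-frame false detections from ever reaching the TF tree.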
We have the option of using the ZED's built-in tracking or RTAB-Map stereo odometry. We have found that both are high quality, but the ZED's built-in tracking runs at a higher refresh rate at the cost of being more of a black box.
Communication between ROS nodes uses sockets by default, since each node runs in a separate process. We instead use nodelets, which all run inside the same process. In this way they share a virtual address space and can pass messages via pointers (zero-copy). This vastly increases the update rate at which perception is able to run.
At least 720p is recommended; anything lower will not work at long range. We also try to hit at least 10 Hz so information propagates to navigation fast enough.
- ArUco: Special pattern of black and white blocks that encodes a number. Often called markers/tags/targets
- Stereo Camera: A camera that uses stereo rectification to produce point clouds
- ZED 2i: The stereo camera that we use
- OpenCV: A computer vision library
- Point Cloud: A collection of 3D points that roughly describe a scene
- Odometry: The "pose" of an object, in other words a description of where it is in the world (usually position and rotation)
- Pixel Space (or Camera Space): x and y coordinates of where a pixel is in an image