The BS3D dataset and the reconstruction framework presented in:
BS3D: Building-scale 3D Reconstruction from RGB-D Images [arXiv]
The BS3D dataset can be downloaded from [here]. The following sections describe the contents of the dataset.
The main reconstruction is under the campus subdirectory. Images are provided in two coordinate frames: color camera and depth (infrared) camera. Laser scans were not captured for this part. There are 19981 rectified images. Each filename is the image timestamp in seconds.
Type | Resolution | Format | Description | Identifier |
---|---|---|---|---|
Color images | 720x1280 | 24-bit JPG | Rectified color images. | color |
Depth maps | 720x1280 | 16-bit PNG | Sensor depth in millimeters. Invalid depth equals 0. | depth |
Depth maps (rendered) | 720x1280 | 16-bit PNG | Depth rendered from the mesh in millimeters. Invalid depth equals 0. | depth_render |
Normal maps (rendered) | 720x1280 | 24-bit PNG | Surface normals rendered from the mesh. Invalid normal equals (0,0,0). | normal_render |
Camera poses | - | TXT | Color camera poses (camera-to-world) in the RGBD SLAM format: timestamp, tx, ty, tz, qx, qy, qz, qw | poses |
Camera calibration | - | YAML | Color camera intrinsics and extrinsics between color and infrared camera. | calibration |
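The pose format above can be turned into a 4x4 camera-to-world matrix as in the following minimal sketch. It assumes each line holds the eight values separated by whitespace (commas are tolerated as well); the function name `pose_line_to_matrix` is illustrative and not part of this repository.

```python
import math


def pose_line_to_matrix(line):
    """Parse 'timestamp tx ty tz qx qy qz qw' into (timestamp, 4x4 matrix).

    The matrix maps points from the camera frame to the world frame.
    """
    values = [float(v) for v in line.replace(",", " ").split()]
    stamp, tx, ty, tz, qx, qy, qz, qw = values

    # Normalise the quaternion to guard against rounding in the text file.
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n

    # Standard unit-quaternion to rotation-matrix conversion.
    rot = [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]
    matrix = [rot[0] + [tx], rot[1] + [ty], rot[2] + [tz], [0.0, 0.0, 0.0, 1.0]]
    return stamp, matrix
```

The last column of the returned matrix is the camera position in world coordinates.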
Type | Resolution | Format | Description | Identifier |
---|---|---|---|---|
Color images | 1024x1024 | 24-bit JPG | Color images transformed to the depth camera frame. | color |
Infrared images | 512x512 | 16-bit PNG | Active infrared images. | infrared |
Depth maps | 512x512 | 16-bit PNG | Raw sensor depth in millimeters. Invalid depth equals 0. | depth |
Depth maps (rendered) | 512x512 | 16-bit PNG | Depth rendered from a mesh in millimeters. Invalid depth equals 0. | depth_render |
Normal maps (rendered) | 512x512 | 24-bit PNG | Surface normals rendered from a mesh. Invalid normal equals (0,0,0). | normal_render |
Point clouds | - | PLY | Point cloud data (X,Y,Z) including infrared intensity. | clouds |
Camera poses | - | TXT | Depth camera poses (camera-to-world) in the RGBD SLAM format: timestamp, tx, ty, tz, qx, qy, qz, qw | poses |
Camera calibration | - | YAML | Depth camera intrinsics and extrinsics between color and infrared camera. | calibration |
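Because depth is stored in millimeters and 0 marks an invalid pixel, back-projecting a depth pixel to a 3D point in the camera frame could look like the sketch below. It assumes an undistorted pinhole model; the parameter names `fx`, `fy`, `cx`, `cy` are generic placeholders and do not necessarily match the keys in the calibration YAML.

```python
def backproject(u, v, depth_mm, fx, fy, cx, cy):
    """Return the camera-frame point (x, y, z) in meters, or None if invalid.

    u, v     : pixel column and row
    depth_mm : depth value read from the 16-bit PNG (0 means invalid)
    fx, fy   : focal lengths in pixels (assumed pinhole intrinsics)
    cx, cy   : principal point in pixels
    """
    if depth_mm == 0:
        return None
    z = depth_mm / 1000.0  # millimeters to meters
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)
```

Multiplying the resulting point by the camera-to-world pose of the same timestamp places it in the world frame.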
Type | Rate | Format | Description | Identifier |
---|---|---|---|---|
IMU data and calibration | 1.6 kHz | CSV, YAML | Accelerometer (m/s^2) and gyroscope readings (rad/s) sampled at 1.6 kHz. Format: stamp, wx, wy, wz, ax, ay, az. Calibration includes IMU-camera extrinsics (e.g. between gyroscope and color camera). | imu |
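A minimal sketch for reading the IMU samples in the column order given above (stamp, wx, wy, wz, ax, ay, az). Whether the CSV file starts with a header row is not stated here, so the reader simply skips any row that does not parse as seven numbers; the function name `read_imu` is illustrative.

```python
import csv


def read_imu(stream):
    """Read IMU rows from an open text stream.

    Each returned sample holds the timestamp, the gyroscope reading
    (rad/s) and the accelerometer reading (m/s^2).
    """
    samples = []
    for row in csv.reader(stream):
        try:
            stamp, wx, wy, wz, ax, ay, az = (float(v) for v in row[:7])
        except ValueError:
            continue  # header, empty, or malformed row
        samples.append({"stamp": stamp, "gyro": (wx, wy, wz), "accel": (ax, ay, az)})
    return samples
```

Usage: `with open(path, newline="") as f: samples = read_imu(f)`, where `path` points to the CSV file in the imu directory.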
Type | Format | Description | Identifier |
---|---|---|---|
Mesh | PLY | Mesh created from raw depth maps using scalable TSDF fusion (Open3D library). No color information. | mesh |
Raw recordings are in the mkv directory. There are 47 recordings (6.7 GB to 11.6 GB each), which you can extract using preprocess-mkv.exe (Section 2). You may want to discard a few seconds at the beginning and end of each recording, because the device is stationary during that time.
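Since each extracted filename is a timestamp in seconds, the stationary frames can be dropped by filtering on the filename. The sketch below assumes names such as `12.345.jpg` whose stem parses as a float; the helper name `trim_stationary` and the default margin of 3 seconds are arbitrary choices for illustration.

```python
from pathlib import Path


def trim_stationary(filenames, margin_s=3.0):
    """Keep frames at least margin_s seconds away from both ends.

    filenames : iterable of image names whose stem is the timestamp
    margin_s  : seconds to discard at the beginning and at the end
    """
    stamped = sorted((float(Path(name).stem), name) for name in filenames)
    if not stamped:
        return []
    start = stamped[0][0] + margin_s
    end = stamped[-1][0] - margin_s
    return [name for stamp, name in stamped if start <= stamp <= end]
```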
A reconstruction of a lobby and corridors is under the lobby subdirectory. Data is organized as described above. There are 6618 images in total. In addition, the data includes laser scans that were obtained using FARO 3D X 130.
Type | Format | Description | Identifier |
---|---|---|---|
Original scans | PLY | Original laser scans (point clouds) which have been registered. Clouds have not been cleaned or downsampled. | laserscans_original |
Cleaned scan | PLY | A single point cloud that has been cleaned and downsampled. | laserscan |
Sequences used in the visual-inertial odometry experiments. See the tables above for the description of the data. Note that laser scans were not captured.
Sequence | Duration (s) | Length (m) | Dimensions (m) |
---|---|---|---|
cafeteria | 200 | 90.0 | 12.4 x 15.7 x 0.8 |
central | 242 | 155.0 | 25.5 x 42.1 x 5.3 |
dining | 192 | 109.2 | 33.8 x 25.0 x 5.5 |
corridor | 174 | 77.6 | 31.1 x 4.7 x 2.4 |
foobar | 75 | 37.1 | 5.4 x 14.4 x 0.6 |
hub | 124 | 52.3 | 11.4 x 5.9 x 0.7 |
juice | 103 | 42.7 | 6.3 x 8.6 x 0.5 |
lounge | 222 | 94.2 | 14.4 x 10.3 x 1.1 |
study | 87 | 40.0 | 5.6 x 9.8 x 0.6 |
waiting | 139 | 60.1 | 9.8 x 6.7 x 0.9 |
Follow these instructions to reconstruct your environment. This repository includes a template dataset, datasets/mydataset, with the necessary configuration files and folder structure.
This software has been tested on Windows 10, but it should be compatible with Ubuntu 18.04.
- Install Azure Kinect SDK from here (version 1.4.1, latest)
- Install RTAB-Map from here (version 0.20.16, latest)
- Install Preprocess-MKV (instructions below)
Clone the repository:
git clone https://github.com/jannemus/BS3D.git
cd BS3D
Preprocess-MKV is needed for extracting and processing the MKV files captured using Azure Kinect. Make sure you have installed the Azure Kinect SDK (see prerequisites). You also need OpenCV 4.3.0 (or later) and CMake 3.18.2 (or later).
In the following example, Visual Studio 2017 is used to compile Preprocess-MKV. Open the Visual Studio command prompt (Start -> x64 Native Tools Command Prompt for VS 2017). To compile:
mkdir preprocess\build
cd preprocess\build
cmake -G"Visual Studio 15 2017 Win64" ..
cmake --build . --config Release --target install
Azure Kinect SDK includes a recorder application (k4arecorder.exe) that is called from record.py. Record one or more sequences by running:
python record.py output.mkv
Put your recordings (e.g. A1.mkv, A2.mkv, ...) in the mydataset/mkv folder.
Capturing tips
- To encourage loop closure detection, start and end the recording from a view that has plenty of visual features (corners etc.).
- During recording, it is good to revisit locations, especially those that are rich in visual features.
- Although Azure Kinect has a fairly good depth range and FoV, avoid pointing the camera towards a view that has insufficient geometry (e.g. a large, completely empty lobby or corridor).
Extract images (color, depth, infrared), inertial measurements, point clouds, and calibration information from the MKV files using preprocess.py. The code will also undistort the images and perform color-to-depth alignment (C2D). The command:
python preprocess.py datasets/mydataset
will process all MKV files and write data to mydataset/preprocessed/*/, where * is the name of the MKV file. RTAB-Map configuration files will also be written to mydataset/rtabmap/.
Note: If you just want to extract data, you can provide the arguments --undistort false and --c2d false.
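When preprocessing is driven from another Python script, the command line can be assembled as in this sketch. It uses only the script name and the two flags mentioned above; the helper name `preprocess_command` is hypothetical and not part of this repository.

```python
def preprocess_command(dataset_dir, undistort=True, c2d=True):
    """Build the argument list for preprocess.py.

    The flags are appended only when a step is disabled, mirroring the
    '--undistort false' and '--c2d false' arguments described above.
    """
    cmd = ["python", "preprocess.py", dataset_dir]
    if not undistort:
        cmd += ["--undistort", "false"]
    if not c2d:
        cmd += ["--c2d", "false"]
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root.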
Launch RTAB-Map and load configuration from mydataset/rtabmap/*/single-session-config.ini, where * is the session name.
Preferences -> Load settings (*.ini)
This will automatically set paths to calibration, color images and depth maps.
Initialize the database (File -> New database) and press start. After the reconstruction has finished, check that the map looks good. If it does, close the database (.db) to save it to mydataset/rtabmap/*/map.db. If you have multiple sessions, process and save each of them. Make sure you name each database map.db.
If you only have a single session, export camera poses to mydataset/rtabmap/poses.txt
File -> Export poses -> RGBD-SLAM format (*.txt) -> Frame: Camera
After that, continue to Sec. 1.7 Surface reconstruction.
In RTAB-Map, load configuration mydataset/rtabmap/multi-session-config.ini
Select all single-session databases:
Preferences -> Source -> Database -> [...] button
Note that the order in which the databases are processed matters (A1.db, A2.db, ..., C3.db). For example, the sequence C3.db should overlap at least one of the earlier sequences (A1.db, A2.db, ...).
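If the session names encode the intended processing order (A1, A2, ..., C3), a natural sort keeps A10 after A2 when preparing the list of databases to select. This is only a sketch under that naming assumption; it does not check that the sequences actually overlap, and `order_sessions` is an illustrative name.

```python
import re


def natural_key(name):
    """Split a name into text and integer parts, e.g. 'A10' -> ['A', 10, '']."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def order_sessions(names):
    """Return session names sorted so that numeric suffixes compare as numbers."""
    return sorted(names, key=natural_key)
```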
Initialize the database (File -> New database) and press start. After the reconstruction has finished, check that the map looks good. If it does, export camera poses to mydataset/rtabmap/poses.txt
File -> Export poses -> RGBD-SLAM format (*.txt) -> Frame: Camera
Optionally you can perform post-processing to detect more loop closures:
Tools -> Post-processing -> OK (default settings)
after which you need to export poses again.
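Before moving on, it can be worth checking that the exported file really follows the eight-column RGBD-SLAM layout. The sketch below flags lines with the wrong number of fields or a quaternion that is not unit length; it assumes whitespace-separated values and treats lines starting with `#` as comments, which may not apply to every export. The name `check_poses` is illustrative.

```python
import math


def check_poses(lines, tol=1e-3):
    """Return a list of (line_number, message) for suspicious pose lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            problems.append((number, "expected 8 fields"))
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            problems.append((number, "non-numeric value"))
            continue
        # Fields 5 to 8 are the quaternion qx, qy, qz, qw.
        norm = math.sqrt(sum(c * c for c in values[4:]))
        if abs(norm - 1.0) > tol:
            problems.append((number, "quaternion is not unit length"))
    return problems
```

Usage: `check_poses(open("datasets/mydataset/rtabmap/poses.txt"))` should return an empty list for a clean export.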
Perform surface reconstruction using TSDF fusion:
python meshing.py datasets/mydataset
The output mesh (.ply) will be written to mydataset/mesh by default.
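The exported poses are camera-to-world, whereas Open3D's TSDF integration takes a world-to-camera extrinsic, so a custom fusion script would need to invert each pose first. Whether meshing.py does exactly this is not shown here; the sketch only illustrates the inversion of a rigid transform, and `invert_rigid` is an illustrative name.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform given as nested lists.

    For T = [R | t], the inverse is [R^T | -R^T t], which avoids a
    general matrix inversion.
    """
    rot = [row[:3] for row in T[:3]]
    trans = [row[3] for row in T[:3]]
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    trans_inv = [-sum(rot_t[i][k] * trans[k] for k in range(3)) for i in range(3)]
    return [
        rot_t[0] + [trans_inv[0]],
        rot_t[1] + [trans_inv[1]],
        rot_t[2] + [trans_inv[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
```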
Render depth maps and surface normals from the mesh:
python render.py datasets/mydataset
The output data will be written to mydataset/render by default.
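One way to sanity-check the result is to compare a sensor depth map with the depth rendered from the mesh for the same frame. The sketch below takes two depth maps as nested lists of millimeter values (for example after decoding the 16-bit PNGs with an image library of your choice) and skips pixels where either map is 0, the invalid value in both. The name `mean_abs_depth_error_mm` is illustrative.

```python
def mean_abs_depth_error_mm(sensor, rendered):
    """Mean absolute difference in millimeters over jointly valid pixels.

    sensor, rendered : 2D sequences of equal shape with depth in mm,
                       where 0 marks an invalid pixel.
    Returns None when no pixel is valid in both maps.
    """
    total = 0
    count = 0
    for sensor_row, rendered_row in zip(sensor, rendered):
        for s, r in zip(sensor_row, rendered_row):
            if s != 0 and r != 0:
                total += abs(s - r)
                count += 1
    return total / count if count else None
```

A large error on many frames may point to a drifted pose or a misaligned session rather than to sensor noise.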
If you use this repository in your research, please consider citing:
@article{mustaniemi2023bs3d,
title={BS3D: Building-scale 3D Reconstruction from RGB-D Images},
author={Mustaniemi, Janne and Kannala, Juho and Rahtu, Esa and Liu, Li and Heikkil{\"a}, Janne},
journal={arXiv preprint arXiv:2301.01057},
year={2023}
}