Implementation of "BS3D: Building-scale 3D Reconstruction from RGB-D Images"


BS3D: Building-scale 3D Reconstruction from RGB-D Images

The BS3D dataset and the reconstruction framework presented in:
BS3D: Building-scale 3D Reconstruction from RGB-D Images [arXiv]

1. BS3D dataset

The BS3D dataset can be downloaded from [here]. The following sections describe the contents of the dataset.

1.1 Campus reconstruction (2 Hz)

The main reconstruction is under the campus subdirectory. Images are provided in two coordinate frames: color camera and depth (infrared) camera. Laser scans were not captured for this part. There are 19981 rectified images. Each filename is the image timestamp in seconds.

Color camera frame

| Type | Resolution | Format | Description | Identifier |
| --- | --- | --- | --- | --- |
| Color images | 720x1280 | 24-bit JPG | Rectified color images. | color |
| Depth maps | 720x1280 | 16-bit PNG | Sensor depth in millimeters. Invalid depth equals 0. | depth |
| Depth maps (rendered) | 720x1280 | 16-bit PNG | Depth rendered from the mesh in millimeters. Invalid depth equals 0. | depth_render |
| Normal maps (rendered) | 720x1280 | 24-bit PNG | Surface normals rendered from the mesh. Invalid normal equals (0,0,0). | normal_render |
| Camera poses | | TXT | Color camera poses (camera-to-world) in the RGBD SLAM format: timestamp, tx, ty, tz, qx, qy, qz, qw | poses |
| Camera calibration | | YAML | Color camera intrinsics and extrinsics between color and infrared camera. | calibration |
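
The depth encoding described above (16-bit values in millimeters, 0 meaning invalid) can be handled as in the following minimal sketch. The function name `depth_to_meters` is hypothetical, and the synthetic array stands in for a real depth map loaded from disk.

```python
import numpy as np


def depth_to_meters(depth_mm: np.ndarray) -> np.ndarray:
    """Convert a 16-bit depth map in millimeters to float32 meters.

    Pixels equal to 0 are invalid and are returned as NaN.
    """
    depth_m = depth_mm.astype(np.float32) / 1000.0
    depth_m[depth_mm == 0] = np.nan
    return depth_m


# Synthetic stand-in for a depth map. A real file could be read with, e.g.,
# cv2.imread(path, cv2.IMREAD_UNCHANGED), which preserves the 16-bit values.
raw = np.array([[0, 1500], [2000, 65535]], dtype=np.uint16)
meters = depth_to_meters(raw)
```

Marking invalid pixels as NaN rather than leaving them at 0 keeps them from being mistaken for very close surfaces in later computations.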

Depth camera frame

| Type | Resolution | Format | Description | Identifier |
| --- | --- | --- | --- | --- |
| Color images | 512x512 | 24-bit JPG | Color images transformed to the depth camera frame. | color |
| Infrared images | 512x512 | 16-bit PNG | Active infrared images. | infrared |
| Depth maps | 512x512 | 16-bit PNG | Raw sensor depth in millimeters. Invalid depth equals 0. | depth |
| Depth maps (rendered) | 512x512 | 16-bit PNG | Depth rendered from the mesh in millimeters. Invalid depth equals 0. | depth_render |
| Normal maps (rendered) | 512x512 | 24-bit PNG | Surface normals rendered from the mesh. Invalid normal equals (0,0,0). | normal_render |
| Point clouds | | PLY | Point cloud data (X,Y,Z) including infrared intensity. | clouds |
| Camera poses | | TXT | Depth camera poses (camera-to-world) in the RGBD SLAM format: timestamp, tx, ty, tz, qx, qy, qz, qw | poses |
| Camera calibration | | YAML | Depth camera intrinsics and extrinsics between color and infrared camera. | calibration |
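
As an illustration of the pose format (timestamp, tx, ty, tz, qx, qy, qz, qw), the sketch below turns each line into a 4x4 camera-to-world matrix using the standard unit-quaternion to rotation-matrix formula. The helper names are hypothetical, and the parser assumes fields are separated by whitespace or commas and that comment lines start with `#`; check this against the actual files.

```python
import math


def pose_to_matrix(tx, ty, tz, qx, qy, qz, qw):
    """Build a 4x4 camera-to-world matrix from a translation and quaternion."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n  # normalize the quaternion
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), tx],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), ty],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def parse_poses(lines):
    """Return {timestamp: 4x4 matrix} from lines in the RGBD SLAM format."""
    poses = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        fields = [float(v) for v in line.replace(",", " ").split()]
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        poses[fields[0]] = pose_to_matrix(*fields[1:])
    return poses


# Two made-up poses: identity rotation, then a 90 degree turn about the z axis.
s = math.sqrt(0.5)
example = ["0.5 1.0 2.0 3.0 0 0 0 1", f"1.0 0 0 0 0 0 {s} {s}"]
poses = parse_poses(example)
```

With real data, `parse_poses(open(path))` would work the same way, since a file object iterates over its lines.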

Inertial measurements

| Type | Rate | Format | Description | Identifier |
| --- | --- | --- | --- | --- |
| IMU data and calibration | 1.6 kHz | CSV, YAML | Accelerometer (m/s^2) and gyroscope (rad/s) readings sampled at 1.6 kHz. Format: stamp, wx, wy, wz, ax, ay, az. Calibration includes IMU-camera extrinsics (e.g. between gyroscope and color camera). | imu |
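
The IMU column order (stamp, wx, wy, wz, ax, ay, az) could be read as in the sketch below. The names `ImuSample` and `read_imu` are hypothetical. Whether the CSV has a header row, and what unit the stamp uses, is not stated here, so the reader simply skips any row that is not seven numbers.

```python
import csv
import io
from typing import NamedTuple, Tuple


class ImuSample(NamedTuple):
    stamp: float
    gyro: Tuple[float, float, float]   # (wx, wy, wz) in rad/s
    accel: Tuple[float, float, float]  # (ax, ay, az) in m/s^2


def read_imu(stream):
    """Parse rows of stamp, wx, wy, wz, ax, ay, az from a text stream."""
    samples = []
    for row in csv.reader(stream):
        try:
            values = [float(v) for v in row]
        except ValueError:
            continue  # non-numeric row, e.g. a header (assumed possible)
        if len(values) != 7:
            continue  # blank or malformed row
        samples.append(ImuSample(values[0], tuple(values[1:4]), tuple(values[4:7])))
    return samples


# Made-up rows standing in for a real IMU file.
text = "stamp,wx,wy,wz,ax,ay,az\n0.000,0.01,0.02,0.03,0.1,9.8,0.2\n0.001,0.0,0.0,0.0,0.0,9.81,0.0\n"
samples = read_imu(io.StringIO(text))
```

For a file on disk, `read_imu(open(path, newline=""))` follows the usage recommended for Python's `csv` module.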

Surface reconstruction

| Type | Format | Description | Identifier |
| --- | --- | --- | --- |
| Mesh | PLY | Mesh created from raw depth maps using scalable TSDF fusion (Open3D library). No color information. | mesh |

Raw recordings

Raw recordings are in the mkv directory. There are 47 recordings (6.7 GB to 11.6 GB each), which you can extract using preprocess-mkv.exe (Section 2). You may want to discard a few seconds at the beginning and end of each recording, because the device is stationary during that time.
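
Discarding the stationary start and end can be sketched as a filter on frame timestamps. This assumes that the extracted frames are named by their timestamp in seconds plus a file extension, as described for the dataset images; the function name `trim_stationary` and the 3-second margin are hypothetical.

```python
from pathlib import Path


def trim_stationary(filenames, margin_s=3.0):
    """Drop frames within margin_s seconds of the first and last timestamp."""
    # The file stem (name without extension) is assumed to be the timestamp.
    stamped = sorted((float(Path(name).stem), name) for name in filenames)
    if not stamped:
        return []
    start, end = stamped[0][0], stamped[-1][0]
    return [name for t, name in stamped if start + margin_s <= t <= end - margin_s]


# Made-up frame names covering 10 seconds at 2 Hz.
frames = [f"{i * 0.5:.1f}.jpg" for i in range(21)]
kept = trim_stationary(frames, margin_s=3.0)
```

A suitable margin depends on how long the device was actually held still, so it is worth checking a few frames by eye.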

1.2 Lobby reconstruction with laser scans (2 Hz)

A reconstruction of a lobby and corridors is under the lobby subdirectory. The data is organized as described above. There are 6618 images in total. In addition, the data includes laser scans that were obtained using a FARO Focus 3D X 130 scanner.

Laser scans

| Type | Format | Description | Identifier |
| --- | --- | --- | --- |
| Original scans | PLY | Original laser scans (point clouds) which have been registered. Clouds have not been cleaned or downsampled. | laserscans_original |
| Cleaned scan | PLY | A single point cloud that has been cleaned and downsampled. | laserscan |

1.3 Odometry sequences (30 Hz)

Sequences used in the visual-inertial odometry experiments. See the tables above for the description of the data. Note that laser scans were not captured.

| Sequence | Duration (s) | Length (m) | Dimensions (m) |
| --- | --- | --- | --- |
| cafeteria | 200 | 90.0 | 12.4 x 15.7 x 0.8 |
| central | 242 | 155.0 | 25.5 x 42.1 x 5.3 |
| dining | 192 | 109.2 | 33.8 x 25.0 x 5.5 |
| corridor | 174 | 77.6 | 31.1 x 4.7 x 2.4 |
| foobar | 75 | 37.1 | 5.4 x 14.4 x 0.6 |
| hub | 124 | 52.3 | 11.4 x 5.9 x 0.7 |
| juice | 103 | 42.7 | 6.3 x 8.6 x 0.5 |
| lounge | 222 | 94.2 | 14.4 x 10.3 x 1.1 |
| study | 87 | 40.0 | 5.6 x 9.8 x 0.6 |
| waiting | 139 | 60.1 | 9.8 x 6.7 x 0.9 |
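
As a small worked example, the table values give the average walking speed of each sequence (length divided by duration) and a rough frame count at the nominal 30 Hz rate. The frame counts are estimates derived from the durations, not numbers reported for the dataset.

```python
# (duration in s, trajectory length in m), copied from the table above.
sequences = {
    "cafeteria": (200, 90.0),
    "central": (242, 155.0),
    "dining": (192, 109.2),
    "corridor": (174, 77.6),
    "foobar": (75, 37.1),
    "hub": (124, 52.3),
    "juice": (103, 42.7),
    "lounge": (222, 94.2),
    "study": (87, 40.0),
    "waiting": (139, 60.1),
}

FRAME_RATE_HZ = 30  # nominal rate of the odometry sequences

# Average speed in m/s and estimated number of frames per sequence.
average_speed = {name: length / duration for name, (duration, length) in sequences.items()}
approx_frames = {name: duration * FRAME_RATE_HZ for name, (duration, _) in sequences.items()}

for name in sequences:
    print(f"{name:10s} {average_speed[name]:.2f} m/s  ~{approx_frames[name]} frames")
```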

2. Reconstruction framework

Follow these instructions to reconstruct your environment. This repository includes a template dataset datasets/mydataset with necessary configuration files and folder structure.

2.1 Prerequisites

This software has been tested on Windows 10, but it should be compatible with Ubuntu 18.04.

  • Install Azure Kinect SDK from here (version 1.4.1, latest)
  • Install RTAB-Map from here (version 0.20.16, latest)
  • Install Preprocess-MKV (instructions below)

Clone the repository:

git clone https://github.com/jannemus/BS3D.git
cd BS3D

2.2 Install Preprocess-MKV

Preprocess-MKV is needed for extracting and processing the MKV files captured using Azure Kinect. Make sure you have installed the Azure Kinect SDK (see prerequisites). You also need OpenCV 4.3.0 (or later) and CMake 3.18.2 (or later).

In the following example, Visual Studio 2017 is used to compile Preprocess-MKV. Open the Visual Studio command prompt (Start -> x64 Native Tools Command Prompt for VS 2017). To compile:

mkdir preprocess\build
cd preprocess\build
cmake -G"Visual Studio 15 2017 Win64" ..
cmake --build . --config Release --target install

2.3 Data capture

Azure Kinect SDK includes a recorder application (k4arecorder.exe) that is called from record.py. Record one or more sequences by running:

python record.py output.mkv

Put your recordings (e.g. A1.mkv, A2.mkv, ...) in the mydataset/mkv folder.

Capturing tips

  • To encourage loop closure detection, start and end the recording from a view that has plenty of visual features (corners etc.).
  • During recording, it is good to revisit locations, especially those that are rich in visual features.
  • Although Azure Kinect has a fairly good depth range and FoV, avoid pointing the camera towards a view that has insufficient geometry (e.g. large and completely empty lobby or corridor).

2.4 Preprocess MKVs

Extract images (color, depth, infrared), inertial measurements, point clouds, and calibration information from the MKV files using preprocess.py. The code will also undistort the images and perform color-to-depth alignment (C2D). The command:

python preprocess.py datasets/mydataset

will process all MKV files and write data to mydataset/preprocessed/*/, where * is the name of the MKV file. RTAB-Map configuration files will also be written to mydataset/rtabmap/.

Note: If you only want to extract the data, you can pass the arguments --undistort false and --c2d false.
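
The naming rule above (each MKV file gets an output folder named after it under preprocessed) makes it easy to check which recordings still need processing. The sketch below is not part of the repository; the helper names are hypothetical, and the demo runs on a throwaway directory tree rather than a real dataset.

```python
import tempfile
from pathlib import Path


def expected_outputs(dataset_dir):
    """Map each MKV file name to its expected preprocessed/<name>/ folder."""
    dataset = Path(dataset_dir)
    return {
        mkv.name: dataset / "preprocessed" / mkv.stem
        for mkv in sorted((dataset / "mkv").glob("*.mkv"))
    }


def missing_outputs(dataset_dir):
    """List MKV files whose output folder does not exist yet."""
    return [name for name, out in expected_outputs(dataset_dir).items() if not out.is_dir()]


# Demo: two recordings, only one of which has been preprocessed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mydataset"
    (root / "mkv").mkdir(parents=True)
    for name in ("A1.mkv", "A2.mkv"):
        (root / "mkv" / name).touch()
    (root / "preprocessed" / "A1").mkdir(parents=True)
    missing = missing_outputs(root)
```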

2.5 Single-session mapping

Launch RTAB-Map and load the configuration from mydataset/rtabmap/*/single-session-config.ini, where * is the session name:
Preferences -> Load settings (*.ini)
This will automatically set the paths to the calibration, color images, and depth maps.

Initialize the database with File -> New database and press start. After the reconstruction has finished, check that the map looks good. If it does, close the database (.db) to save it to mydataset/rtabmap/*/map.db. If you have multiple sessions, process and save each of them. Make sure you name each database map.db.

If you only have a single session, export camera poses to mydataset/rtabmap/poses.txt
File -> Export poses -> RGBD-SLAM format (*.txt) -> Frame: Camera
After that, continue to Sec. 2.7 Surface reconstruction.

2.6 Multi-session mapping

In RTAB-Map, load configuration mydataset/rtabmap/multi-session-config.ini

Select all single-session databases:
Preferences -> Source -> Database -> [...] button

Note that the order in which the databases are processed matters (A1.db, A2.db, ..., C3.db). For example, the sequence C3.db should overlap with at least one of the earlier sequences (A1.db, A2.db, ...).

Initialize the database with File -> New database and press start. After the reconstruction has finished, check that the map looks good. If it does, export camera poses to mydataset/rtabmap/poses.txt:
File -> Export poses -> RGBD-SLAM format (*.txt) -> Frame: Camera

Optionally you can perform post-processing to detect more loop closures:
Tools -> Post-processing -> OK (default settings)
after which you need to export poses again.
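
Before moving on to meshing, a quick sanity check on the exported poses.txt can be useful. The sketch below sums the distances between consecutive camera positions and reports the largest single step. It assumes the file uses the RGBD SLAM format (timestamp, tx, ty, tz, qx, qy, qz, qw) with whitespace or comma separators; the helper names are hypothetical.

```python
import math


def read_positions(lines):
    """Return (timestamp, (tx, ty, tz)) tuples sorted by timestamp."""
    positions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        fields = [float(v) for v in line.replace(",", " ").split()]
        positions.append((fields[0], tuple(fields[1:4])))
    positions.sort(key=lambda item: item[0])
    return positions


def step_lengths(positions):
    """Euclidean distance between each pair of consecutive positions."""
    return [math.dist(a[1], b[1]) for a, b in zip(positions, positions[1:])]


# Made-up poses standing in for an exported file.
example = ["0.0 0 0 0 0 0 0 1", "0.5 3 4 0 0 0 0 1", "1.0 3 4 12 0 0 0 1"]
steps = step_lengths(read_positions(example))
total_length = sum(steps)
largest_step = max(steps)
```

In a multi-session map, a large step between the last pose of one session and the first pose of the next is expected; an unusually large step inside a session may be worth inspecting in RTAB-Map.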

2.7 Surface reconstruction

Perform surface reconstruction using TSDF fusion:

python meshing.py datasets/mydataset

The output mesh (.ply) will be written to mydataset/mesh by default.

2.8 Render images

Render depth maps and surface normals from the mesh:

python render.py datasets/mydataset

The output data will be written to mydataset/render by default.
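
One possible use of the rendered depth is to compare it with the sensor depth. The sketch below computes the mean absolute difference in millimeters over pixels that are valid (non-zero) in both maps. It assumes both maps are 16-bit millimeter images of the same resolution, as in the BS3D dataset tables; the function name `depth_error_mm` is hypothetical, and the arrays are synthetic.

```python
import numpy as np


def depth_error_mm(sensor_mm: np.ndarray, render_mm: np.ndarray):
    """Return (mean absolute error in mm, number of pixels compared)."""
    valid = (sensor_mm > 0) & (render_mm > 0)  # 0 marks invalid depth
    # Cast to a signed type first so the uint16 subtraction cannot wrap around.
    diff = sensor_mm.astype(np.int32) - render_mm.astype(np.int32)
    abs_err = np.abs(diff[valid])
    mean_err = float(abs_err.mean()) if abs_err.size else float("nan")
    return mean_err, int(valid.sum())


# Synthetic 2x2 depth maps standing in for real images.
sensor = np.array([[1000, 0], [2000, 3000]], dtype=np.uint16)
render = np.array([[1010, 500], [0, 2980]], dtype=np.uint16)
mean_err, count = depth_error_mm(sensor, render)
```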

Citation

If you use this repository in your research, please consider citing:

@article{mustaniemi2023bs3d,
  title={BS3D: Building-scale 3D Reconstruction from RGB-D Images},
  author={Mustaniemi, Janne and Kannala, Juho and Rahtu, Esa and Liu, Li and Heikkil{\"a}, Janne},
  journal={arXiv preprint arXiv:2301.01057},
  year={2023}
}
