This repository contains the implementation code for the paper "Digital-Twin Tracking Dataset (DTTD): A Time-of-Flight 3D Object Tracking Dataset for High-Quality AR Applications".
In this work we create a novel RGB-D dataset, the Digital-Twin Tracking Dataset (DTTD), to enable further research on the digital-twin tracking problem in pursuit of a Digital Twin solution. In our dataset, we select two time-of-flight (ToF) depth sensors, the Microsoft Azure Kinect and the Apple iPhone 12 Pro, to record 100 scenes each of 16 common purchasable objects, with each frame annotated with a per-pixel semantic segmentation and ground-truth object poses. We also provide the source code in this repository as a reference for the data generation and annotation pipeline described in our paper.
DTTD_Dataset
├── train_data_list.txt
├── test_data_list.txt
├── classes.txt
├── cameras
│ └── iphone14pro_camera1 (to be released...)
├── data
│ ├── scene_1
│ │ ├── data
│ │ │ ├── 00001_color.jpg
│ │ │ ├── 00001_depth.png
│ │ │ ├── 00001_label_debug.png
│ │ │ ├── 00001_label.png
│ │ │ ├── 00001_meta.json
│ │ │ └── ...
│ │ └── scene_meta.yaml
│ ├── scene_2
│ │ ├── data
│ │ └── scene_meta.yaml
│ └── ...
│
└── objects
├── apple
│ ├── apple.mtl
│ ├── apple.obj
│ ├── front.xyz
│ ├── points.xyz
│ ├── textured_0_etZloZLC.jpg
│ ├── textured_0_norm_etZloZLC.jpg
│ ├── textured_0_occl_etZloZLC.jpg
│ ├── textured_0_roughness_etZloZLC.jpg
│ └── textured.obj.mtl
├── black_expo_marker
├── blue_expo_marker
├── cereal_box_modified
├── cheezit_box_modified
├── chicken_can_modified
├── clam_can_modified
├── hammer_modified
├── itoen_green_tea
├── mac_cheese_modified
├── mustard_modified
├── pear
├── pink_expo_marker
├── pocky_pink_modified
├── pocky_red_modified
├── pocky_white_modified
├── pop_tarts_modified
├── spam_modified
├── tomato_can_modified
└── tuna_can_modified
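Each frame in a scene consists of a color image, a depth map, a per-pixel label image, and a per-frame metadata file. A minimal Python sketch for inspecting one frame is shown below; the 16-bit depth encoding and the per-pixel-object-id label format are assumptions to verify against the `toolbox` package and `scene_meta.yaml`:

```python
import json
import cv2

# Hypothetical example frame; substitute a real scene and frame id.
prefix = "DTTD_Dataset/data/scene_1/data/00001"

color = cv2.imread(prefix + "_color.jpg", cv2.IMREAD_COLOR)      # HxWx3 BGR image
depth = cv2.imread(prefix + "_depth.png", cv2.IMREAD_UNCHANGED)  # assumed 16-bit depth map
label = cv2.imread(prefix + "_label.png", cv2.IMREAD_UNCHANGED)  # assumed per-pixel object ids

with open(prefix + "_meta.json") as f:
    meta = json.load(f)  # per-frame metadata; inspect the keys to confirm its contents

print(color.shape, depth.dtype, label.max(), list(meta.keys()))
```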
Before running our data generation and annotation pipeline, create and activate a conda environment with Python >= 3.7:
conda create --name [YOUR ENV NAME] python=[PYTHON VERSION]
conda activate [YOUR ENV NAME]
then install all necessary packages:
pip install -r requirements.txt
- calculate_extrinsic: extrinsic information
- cameras: camera information
- data_capturing: helper package for data capturing
- data_processing: helper package for data processing
- demos: demo videos
- doc: demo images
- extrinsics_scenes: folder to save all extrinsic scenes
- iphone_app: iPhone app development for capturing RGBD images for iPhone 12 Pro camera
- manual_pose_annotation: helper package for pose annotation
- models: baseline deep learning 6D pose estimation algorithms
- objects: object models that we use in DTTD (with corresponding scale and texture)
- pose_refinement: helper package for pose refinement
- quality_control: helper package for reviewing manual annotations
- scene_labeling_generation: helper package for generating labels
- scenes: folder to save all recorded RGBD data
- synthetic_data_generation: helper package for generating synthetic data
- testing: package to test ARUCO marker appearance, extrinsic validity, etc.
- toolbox: package to generate data for model training
- tools: commands for running the pipelines. Details in tools/README.md.
- utils: utils package
Final dataset output:
- `objects` folder
- `scenes` folder
  - per-scene data: `scenes/<scene name>/data/`
  - per-scene metadata: `scenes/<scene name>/scene_meta.yaml`
- `toolbox` folder
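To iterate over the released scenes, something like the following sketch can be used; it assumes that `train_data_list.txt` contains one scene identifier per line and that `scene_meta.yaml` parses as a plain YAML mapping, both of which should be verified against the actual files:

```python
import os
import yaml  # PyYAML

root = "DTTD_Dataset"  # hypothetical dataset root location

# Assumption: one scene identifier per line in the split files.
with open(os.path.join(root, "train_data_list.txt")) as f:
    train_scenes = [line.strip() for line in f if line.strip()]

for scene in train_scenes:
    meta_path = os.path.join(root, "data", scene, "scene_meta.yaml")
    with open(meta_path) as f:
        scene_meta = yaml.safe_load(f)  # scene-level metadata
    print(scene, scene_meta)
```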
- OptiTrack Motion Capture system with Motive tracking software
- This doesn't have to be running on the same computer as the other sensors. We will export the tracked poses to a CSV file.
- Create a rigid body to track a camera's OptiTrack markers, and give the rigid body the same name that is passed into `tools/capture_data.py`.
- Microsoft Azure Kinect
- We interface with the camera using Microsoft's K4A SDK: https://github.com/microsoft/Azure-Kinect-Sensor-SDK
- iPhone 14 Pro
- Please build the project in `iphone_app/` in Xcode and install it on the mobile device.
- Place ARUCO marker somewhere visible.
- Put 5 markers on the body of the iPhone, and create a rigid body named iPhone14Pro_camera in the OptiTrack software.
- Place markers on the corners of the ARUCO marker, in order: bottom-left, bottom-right, top-right, top-left. We use these to compute the (ARUCO -> OptiTrack) transform.
- Place the marker positions into `calculate_extrinsic/aruco_corners.yaml`, labeled under the keys `quad1`, `quad2`, `quad3`, and `quad4` (see the sketch after this list).
- Start the OptiTrack recording.
- Synchronization Phase
  - Press `start calibration` on the iPhone to begin recording data.
  - Observe the ARUCO marker in the scene and move the camera in different trajectories to build synchronization data (back and forth 2 to 3 times, slowly).
  - Press `stop calibration` when finished.
- Data Capturing Phase
  - Press `start collection` to begin recording data.
  - Observe the ARUCO marker while moving around it. (Perform a 90-180 degree revolution around the marker, one way.)
  - Press `stop collection` when finished.
- Stop OptiTrack recording.
- Export the OptiTrack recording to a CSV file with a 60 Hz report rate.
- Move the tracking CSV file to `extrinsics_scenes/<scene name>/camera_poses/camera_poses.csv`.
- Export the app_data to `extrinsics_scenes/<scene name>/iphone_data`.
- Move the timestamps.csv to `extrinsics_scenes/<scene name>`.
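A hedged sketch of writing `calculate_extrinsic/aruco_corners.yaml` is shown below. The key names come from this README; the value format (one [x, y, z] OptiTrack-frame position per corner) is an assumption to check against the extrinsic calculation code:

```python
import yaml  # PyYAML

# Assumed layout: one [x, y, z] OptiTrack-frame position per ARUCO corner,
# ordered bottom-left, bottom-right, top-right, top-left as described above.
aruco_corners = {
    "quad1": [0.000, 0.000, 0.000],   # bottom-left  (placeholder values)
    "quad2": [0.150, 0.000, 0.000],   # bottom-right (placeholder values)
    "quad3": [0.150, 0.150, 0.000],   # top-right    (placeholder values)
    "quad4": [0.000, 0.150, 0.000],   # top-left     (placeholder values)
}

with open("calculate_extrinsic/aruco_corners.yaml", "w") as f:
    yaml.safe_dump(aruco_corners, f)
```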
- Convert iPhone data formats to Kinect data formats (`tools/process_iphone_data.py`)
  - This tool converts everything to common image names and formats, and does distortion parameter fitting.
  - Code: `python tools/process_iphone_data.py <camera_name> --depth_type <depth_type> --scene_name <scene_name> --extrinsic`
- Clean the raw OptiTrack poses and sync them with the frames (`tools/process_data.py --extrinsic`)
  - Code: `python tools/process_data.py --scene_name <scene_name> --extrinsic`
- Calculate the camera extrinsic (`tools/calculate_camera_extrinsic.py`)
  - Code: `python tools/calculate_camera_extrinsic.py --scene_name <scene_name>`
- Output will be placed in `cameras/<camera name>/extrinsic.txt`.
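The extrinsic file is consumed by the later data-processing steps. As a hedged sketch of reading it back, assuming it is stored as plain whitespace-separated numbers (the exact matrix shape and convention should be confirmed in `calculate_extrinsic/` and `utils/`):

```python
import numpy as np

# Assumption: extrinsic.txt stores the transform as whitespace-separated numbers,
# e.g. a 3x4 or 4x4 matrix; np.loadtxt handles either layout.
extrinsic = np.loadtxt("cameras/az_camera1/extrinsic.txt")  # hypothetical camera folder name
print(extrinsic.shape)

# If it is a 3x4 [R|t], promote it to a 4x4 homogeneous transform for composing poses.
if extrinsic.shape == (3, 4):
    extrinsic = np.vstack([extrinsic, [0.0, 0.0, 0.0, 1.0]])
```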
- Set up the LiDARDepth app (ARKit version) using Xcode (needs to be reinstalled before each scene).
- Start the OptiTrack recording.
- Synchronization Phase.
  - Press `start calibration` to begin recording data.
  - Observe the ARUCO marker in the scene and move the camera in different trajectories to build synchronization data (back and forth 2 to 3 times, slowly).
  - Press `end calibration` when finished.
- Data Capturing Phase
  - Cover the ARUCO marker.
  - Press `Start collection` to begin recording data.
  - Observe the objects while moving around. (Perform a 90-180 degree revolution around the objects, one way.)
  - Press `End collection` when finished.
- Stop OptiTrack recording.
- Export the OptiTrack recording to a CSV file with a 60 Hz report rate.
- Move the tracking CSV file to `scenes/<scene name>/camera_poses/camera_poses.csv`.
- Export the app_data to `scenes/<scene name>/iphone_data`.
- Move the timestamps.csv to `scenes/<scene name>`.
- Convert iPhone data formats to Kinect data formats (`tools/process_iphone_data.py`)
  - This tool converts everything to common image names and formats, and does distortion parameter fitting.
  - Code: `python tools/process_iphone_data.py <camera_name> --depth_type <depth_type> --scene_name <scene_name>`
- Clean the raw OptiTrack poses and sync them with the frames (`tools/process_data.py`)
  - Code: `python tools/process_data.py --scene_name [SCENE_NAME]`
- Manually annotate the object poses in the first few frames (`tools/manual_annotate_poses.py`).
  - Modify `[SCENE_NAME]/scene_meta.yml` by adding an `objects` field to the file according to the objects in the scene and their corresponding ids (see the sketch after this list).
  - Code: `python tools/manual_annotate_poses.py [SCENE_NAME]`
  - Check the control instructions in `pose_refinement/README.md`.
- Recover the object poses for all frames and verify correctness (`tools/generate_scene_labeling.py`)
  - Generate semantic labeling and adjust per-frame object poses (`tools/generate_scene_labeling.py`)
  - Code: `python tools/generate_scene_labeling.py [SCENE_NAME]`
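The exact schema of the `objects` field in `scene_meta.yml` is defined by `tools/manual_annotate_poses.py`; the sketch below only illustrates one plausible way to add such a field programmatically, assuming it maps object names to the ids listed in `classes.txt`:

```python
import yaml  # PyYAML

scene_meta_path = "scenes/scene_1/scene_meta.yaml"  # hypothetical scene

with open(scene_meta_path) as f:
    scene_meta = yaml.safe_load(f) or {}

# Assumed schema: object name -> object id; check classes.txt and
# tools/manual_annotate_poses.py for the authoritative format and ids.
scene_meta["objects"] = {
    "apple": 1,
    "cereal_box_modified": 4,
}

with open(scene_meta_path, "w") as f:
    yaml.safe_dump(scene_meta, f)
```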
- Extrinsic scenes have their color images inside `data` stored as `png`. This is to maximize performance. Data scenes have their color images inside `data` stored as `jpg`. This is necessary so the dataset remains usable.
- The iPhone outputs `jpg` raw color images, while the Azure Kinect outputs `png` raw color images.
- Ensure a good synchronization phase by observing the ARUCO marker; for the Azure Kinect, keep in mind interference from the OptiTrack system.
- Don't have objects from our dataset in the background. Make sure they are out of view!
- Minimize the number of extraneous ARUCO markers / AprilTags that appear in the scene.
- Stay in the yellow area for best OptiTrack tracking.
- Move other cameras out of area when collecting data to avoid OptiTrack confusion.
- Run `manual_annotate_poses.py` on all scenes after collection in order to archive the extrinsic.
- We want to keep the data anonymized. Avoid school logos and members of the lab appearing in frame.
- Perform a 90-180 degree revolution around the objects, one way. Try to minimize stand-still time.
- When doing manual annotation, try to annotate the first few frames (e.g., the 5th or 6th frame), and press 5 and 6 to move around.