Most of the keypoint detection model and repositories are trained on COCO or MPII human pose dataset or facial keypoints. There were no tangible guide to train a keypoint detection model on custom dataset other than human pose or facial keypoints.
And hence this repository will primarily focus on keypoint detection training on custom dataset using Tensorflow object detection API. Here we have used a combination of Centernet-hourglass network therefore the model can provide both bounding boxes and keypoint data as an output during inference.
We will be using the transfer learning technique on centernet-hourglass104 pre-trained model trained on coco dataset to speed-up the training process.
Create a folder structure similar to this order
Custom-keypoint-detection
|_ dataset
|_ images (folder to place all the images)
|_ annotations (folder to place the annotation file)
|_ tfrecord (folder to place tfrecord)
Our intention in this project is to detect cutting plier and it's 5 keypoints. Basically you can replace it with any object you need to detect.
Collect all your images and place it into your dataset/images
folder. Make sure all the images are in same format, preferably .jpg/jpeg.
The TF2 object detection pipeline requires the dataset for centernet-hourglass network to be annotated on coco data format as it's pretrained model is initially trained on COCO dataset.
I have used coco-annotator, a web-based annotation tool that let's you annotate bounding boxes, keypoints, etc which also allows us to automatically download the annotations in coco data format. The setup and installtion using docker is super easy, where you can follow these steps to do so.
Run docker-compose up
on terminal from the coco_annotator project directory. Once it's fired up, open http://localhost:5000
on your web browser to go to COCO annotator web interface.
Go to Datasets tab and create a new dataset. Give a dataset name and click Create Dataset.
It will automatically create a folder inside coco-annotator/datasets/(dataset_name)
. Now copy all the images from Custom-keypoint-detection/dataset/images
to coco-annotator/datasets/(dataset_name)
. This will automatically import all the images into the coco-annotator tool.
Next step is to create the categories (labels) for our dataset to annotate. We create categories only for the objects that needs to be detected using bounding box. We won't create separate categories for keypoints, it will be a subset of the object itself.
Link categories to the dataset by the Edit option.
Move to Datasets
tab and click on the images to start annotating. Draw the bounding box first and press right arrow
on the keyboard to annotate keypoints. Follow the same keypoints order while annotating which shows on the right side panel.
After annotating the required images, download the annotation data through Datasets -> Download COCO
. All the annotation data will be saved and downloaded as a (dataset_name).json
file.
You can find the annotation file for our dataset in dataset/annotations/
folder. By looking at the data structure in annotation file you will get any idea how to prepare your own dataset in coco format.
As you can see we have all the images annotated and saved in a file. But for training, we need two type of dataset namely, training and validation data. So, in order to prepare your dataset for training, we need to split the dataset into set.
$ python cocosplit.py -h
usage: cocosplit.py [-h] -s SPLIT [--having-annotations]
coco_annotations train test
Splits COCO annotations file into training and test sets.
positional arguments:
coco_annotations Path to COCO annotations file.
train Where to store COCO training annotations
test Where to store COCO test annotations
optional arguments:
-h, --help show this help message and exit
-s SPLIT A percentage of a split; a number in (0, 1)
--having-annotations Ignore all images without annotations. Keep only these
with at least one annotation
python split_coco_dataset.py -s 0.7 dataset/annotations/Plier\ keypoint.json
dataset/annotations/train_data.json dataset/annotations/validation_data.json
This script will split the data with 70% into training data and 30% into validation data.
Thanks to cocosplit repo from akarazniewicz, most part of the script is inspired from his work and just made few adjustments to it.
Tensorflow object detection api itself provides an example python script to generate TFRecord for coco based annotations. But the script is primarily written for coco dataset which contains human pose keypoints. So with few changes to it, we can use it for any custom dataset.
To do that, edit the _COCO_KEYPOINTS_NAMES
list in line no 87 with our keypoints data with the same order it appears on the annotation file. This is an important step which needs to be verified.
Change it from
_COCO_KEYPOINT_NAMES = [
b'nose', b'left_eye', b'right_eye', b'left_ear', b'right_ear',
b'left_shoulder', b'right_shoulder', b'left_elbow', b'right_elbow',
b'left_wrist', b'right_wrist', b'left_hip', b'right_hip',
b'left_knee', b'right_knee', b'left_ankle', b'right_ankle'
]
to
_COCO_KEYPOINT_NAMES = [
b'plier_right_handle', b'plier_left_handle', b'plier_middle',
b'plier_right_head', b'plier_left_head'
]
For reference, I have added the generate_tfrecord_from_coco.py
script to the repository. Once this change is done, you can run the script to generate tfrecord for train and validataion dataset.
Run
python dataset_tools/generate_tfrecord_from_coco.py --train_image_dir "dataset/images" \
--test_image_dir "dataset/images" \
--val_image_dir "dataset/images" \
--train_annotations_file "dataset/annotations/train_data.json" \
--testdev_annotations_file "dataset/annotations/validation_data.json" \
--val_annotations_file "dataset/annotations/validation_data.json" \
--train_keypoint_annotations_file "dataset/annotations/train_data.json" \
--val_keypoint_annotations_file "dataset/annotations/validation_data.json" \
--output_dir "dataset/tfrecord"
The next step is to create label map, which contains the label and ID correlation data for each category we have annotated along with it's keypoint. There is a specific format we have to adhere for TF2 object detection api to rightly recognise this label map.
item {
id: 1
name: "category_1"
display_name: "category_1 display name"
keypoints {
id: 0
label: "keypoint 0 name"
}
keypoints {
id: 1
label: "keypoint 1 name"
}
...
...
keypoints {
id: 7
label: "keypoint 7 name"
}
}
item{
id: 2
name: "category_2"
display_name: "category_2 display name"
keypoints {
id: 8
label: "keypoint 8 name"
}
...
...
keypoints {
id: 11
label: "keypoint 11 name"
}
}
item{
id: 3
name: "category_3"
display_name: "category_3 display name"
}
A few things to note:
- The name of the keypoints is arbitrary,so some of them share the same label text. What matters is the ID.
- Keypoint IDs start from 0, unlike item IDs that start from 1
- Keypoint IDs represent the position of each keypoint in the vector we'll write in the TFRecord file, so here I am implicitly defining the fact that the first 7 keypoints in the vector will be related to category_1 and the last 4 to category_2. Only one of the two groups will be defined for each sample, the other will contain zeroes (and it won't be used during training).
It is not mandatory for all the classes to have a keypoint data annotated with it. There can be other classes too without any keypoint annotaion. Model will only train to detect bounding boxes for those objects. For reference, I have added the label map for our dataset at dataset/label_map.pbtxt
.
Download the centernet-hourglass104 keypoints 512x512 pre-trained model from TF2 model zoo. You can also use centernet-hourglass104 keypoints 1024x1024 pretrained model.
Extract and place the pre-trained model inside the /pretrained_models
folder. Your folder structure should look like this
Custom-keypoint-detection
|_ dataset
|_ images (folder to place all the images)
|_ annotations (folder to place the annotation file)
|_ tfrecord (folder to place tfrecord)
|_ pretrained_model
|_ centernet_hg104_512x512_kpts_coco17_tpu-32
|_checkpoint
|_saved_model
|_pipeline.config
Configuration in the pipeline.config
file has to be edited based on the dataset annotation and training parameters. Firstly, for each category with keypoints defined, we have to add a keypoint_estimation_task
block in the center_net
block.
keypoint_estimation_task {
task_name: "category_1_keypoint_detection"
task_loss_weight: 1.0
loss {
localization_loss {
l1_localization_loss {
}
}
classification_loss {
penalty_reduced_logistic_focal_loss {
alpha: 2.0
beta: 4.0
}
}
}
keypoint_class_name: "category_1"
keypoint_regression_loss_weight: 0.1
keypoint_heatmap_loss_weight: 1.0
keypoint_offset_loss_weight: 1.0
offset_peak_radius: 3
per_keypoint_offset: true
}
Point keypoint_label_map_path
parameter to the label map path.
Change fine_tune_checkpoint
parameter to the pre-trained model checkpoint file.
Then edit the train and eval input readers to load keypoints adding num_keypoints: 5
(cumulative number of keypoints in all category) to the input_reader
block.
train_input_reader: {
label_map_path: "path_to_label_map.pbtxt"
tf_record_input_reader {
input_path: "path_to_train.record"
}
num_keypoints: 5
}
eval_input_reader: {
label_map_path: "path_to_label_map.pbtxt"
tf_record_input_reader {
input_path: "path_validation.record"
}
num_keypoints: 5
}
Finally, for proper evaluation metrics, you need to add keypoints information to the eval_config
block. keypoint_edge
is for visualization purpose.
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_visualizations: 10
max_num_boxes_to_visualize: 20
min_score_threshold: 0.2
parameterized_metric {
coco_keypoint_metrics {
class_label: "category_1"
keypoint_label_to_sigmas {
key: "keypoint_0" # add exact keypoint name of keypoint id 0 from labelmap
value: 5 # defaults value is 5
}
keypoint_label_to_sigmas {
key: "keypoint_1" # add exact keypoint name of keypoint id 1 from labelmap
value: 5
}
keypoint_label_to_sigmas {
key: "keypoint_2" # add exact keypoint name of keypoint id 2 from labelmap
value: 5
}
keypoint_label_to_sigmas {
key: "keypoint_3" # add exact keypoint name of keypoint id 3 from labelmap
value: 5
}
keypoint_label_to_sigmas {
key: "keypoint_4" # add exact keypoint name of keypoint id 4 from labelmap
value: 5
}
}
}
parameterized_metric { # keep adding another block for more category.
coco_keypoint_metrics {
class_label: "category_2"
keypoint_label_to_sigmas {
key: "another_keypint"
value: 5
}
...
...
keypoint_label_to_sigmas {
key: "another_keypoint"
value: 5
}
}
}
keypoint_edge { # add exact keypoint mapping
start: 0
end: 1
}
keypoint_edge {
start: 1
end: 2
}
keypoint_edge {
start: 0
end: 2
}
keypoint_edge {
start: 2
end: 3
}
keypoint_edge {
start: 2
end: 4
}
keypoint_edge {
start: 3
end: 4
}
}
For reference, I have added the sample pipeline.config
Now we are ready to train our keypoint detection model. We can use the model_main_tf2.py from the TensorFlow Object detection API to train the model.
python TensorFlow/models/research/object_detection/model_main_tf2.py \
--model_dir "output_models" \
--pipeline_config_path "dataset/pipeline.config"
Or, you can import the custom_keypoint_detection.ipynb directly into Google colab and do the TF Record generation, model training, model export and inference.
Use tensorboard to monitor the training performance.
Use exporter_main_v2.py from Tensorflow Object detection API to export checkpoint to a saved model.
python models/research/object_detection/exporter_main_v2.py \
--input_type image_tensor \
--pipeline_config_path "dataset/pipeline.config" \
--trained_checkpoint_dir "output_models" \
--output_directory "saved_model"
Use inference.py
script to run inference on sample images.
Some sample outputs