This is a minimal implementation that simply contains these files:
- train.py, predict.py: main entry scripts
- modeling/generalized_rcnn.py: implement variants of generalized R-CNN architecture
- modeling/backbone.py: implement backbones
- modeling/model_{fpn,rpn,frcnn,mrcnn,cascade}.py: implement FPN, RPN, Fast/Mask/Cascade R-CNN models
- modeling/model_box.py: implement box-related symbolic functions
- dataset/dataset.py: the dataset interface
- dataset/coco.py: load COCO data to the dataset interface
- data.py: prepare data for training & inference
- common.py: common data preparation utilities
- utils/: third-party helper functions
- eval.py: evaluation utilities
- viz.py: visualization utilities
Data:
- It's easy to train on your own data: call `DatasetRegistry.register(name, lambda: YourDatasetSplit())` and modify `cfg.DATA.*` accordingly. `YourDatasetSplit` can be:
  - `COCODetection`, if your data is already in COCO format. In this case, you need to modify `COCODetection` to change the class names and the id mapping.
  - Your own class, if your data is not in COCO format. You need to write a subclass of `DatasetSplit`, similar to `COCODetection`. In this class you'll implement the logic to load your dataset and evaluate predictions. The documentation is in the docstring of `DatasetSplit`.
- If you load a COCO-trained model on a different dataset, you may see error messages complaining about a mismatched number of categories for certain weights in the checkpoint. You can either remove those weights from the checkpoint, or rename them in the model. See the tensorpack tutorial for more details.
- You can easily add more augmentations such as rotation, but be careful about how a box should be augmented. The code currently always uses the minimal axis-aligned bounding box of the 4 transformed corners, which is probably not optimal. A TODO is to generate the bounding box from the segmentation, so that more augmentations can be supported naturally.
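The dataset registration described in the first item above can be sketched roughly as follows. So that the snippet runs on its own, it defines simplified stand-ins for `DatasetSplit` and `DatasetRegistry`; in the repository you would use the real ones from dataset/dataset.py instead. The method names, the roidb dictionary keys and the `ToyDetection` class are assumptions made for illustration, so check the docstring of `DatasetSplit` for the actual contract.

```python
class DatasetSplit:
    """Simplified stand-in for the repository's DatasetSplit (assumed interface)."""

    def training_roidbs(self):
        raise NotImplementedError

    def inference_roidbs(self):
        raise NotImplementedError

    def eval_inference_results(self, results, output=None):
        raise NotImplementedError


class DatasetRegistry:
    """Simplified stand-in for the repository's DatasetRegistry."""
    _registry = {}

    @staticmethod
    def register(name, func):
        # `func` is a zero-argument callable that builds the dataset split.
        assert name not in DatasetRegistry._registry, "already registered: " + name
        DatasetRegistry._registry[name] = func

    @staticmethod
    def get(name):
        return DatasetRegistry._registry[name]()


class ToyDetection(DatasetSplit):
    """Hypothetical dataset with a single hard-coded image per split."""

    def __init__(self, split):
        self.split = split

    def training_roidbs(self):
        # One dict per image; boxes are [x1, y1, x2, y2] in pixels (assumed layout).
        return [{
            "file_name": "images/{}/0001.jpg".format(self.split),
            "boxes": [[10.0, 20.0, 110.0, 220.0]],
            "class": [1],
            "is_crowd": [0],
        }]

    def inference_roidbs(self):
        return [{"file_name": r["file_name"], "image_id": i}
                for i, r in enumerate(self.training_roidbs())]

    def eval_inference_results(self, results, output=None):
        # Toy "metric": just count the predictions.
        return {"num_predictions": len(results)}


for split in ["train", "val"]:
    # `split=split` binds the current value; a bare closure would see only "val".
    DatasetRegistry.register("toy_" + split, lambda split=split: ToyDetection(split))
```

After registering, the entries under `cfg.DATA.*` would be changed to refer to the registered names (`toy_train` and `toy_val` here).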
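To illustrate the note on rotation above, here is a minimal pure-Python sketch of taking the minimal axis-aligned bounding box of the 4 rotated corners. The function name `rotate_box_aabb` and its arguments are hypothetical and not part of this code base, which applies the idea through its own augmentation utilities.

```python
import math


def rotate_box_aabb(box, angle_deg, center):
    """Rotate an axis-aligned box about `center` and return the minimal
    axis-aligned box [x1, y1, x2, y2] enclosing its 4 rotated corners."""
    x1, y1, x2, y2 = box
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        # Standard 2D rotation matrix; the visual direction depends on
        # whether the y axis points up or down.
        rotated.append((cx + cos_t * dx - sin_t * dy,
                        cy + sin_t * dx + cos_t * dy))

    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return [min(xs), min(ys), max(xs), max(ys)]
```

Rotating a 2x2 box by 45 degrees about its own center yields a box about 2.83 wide, even though the object inside may not have grown at all. This looseness is why the corner-based box is probably not optimal and why deriving the box from the segmentation would be tighter.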
Model:
- Floating-point boxes are defined like this:
- We use ROIAlign, and `tf.image.crop_and_resize` is NOT ROIAlign.
- We currently only support a single image per GPU in this example.
- Because only a single image per GPU is supported, BatchNorm statistics are supposed to be frozen during fine-tuning.
- An alternative to freezing BatchNorm is to sync BatchNorm statistics across GPUs (the `BACKBONE.NORM=SyncBN` option). This requires my bugfix, which is available since TF 1.10. You can manually apply the patch to use it. For now the total batch size is at most 8, so this option does not improve the model by much.
- Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`), which has better performance.
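As a rough illustration of what freezing BatchNorm statistics means: once the moving mean and variance are fixed, BatchNorm reduces to a per-channel affine transform that no longer depends on the batch size. The sketch below is plain Python on lists, not the TensorFlow code used in this example; the function names and the `eps` default are assumptions.

```python
import math


def frozen_bn_params(gamma, beta, moving_mean, moving_var, eps=1e-5):
    """Fold fixed BatchNorm statistics into a per-channel (scale, shift)."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, moving_var)]
    shift = [b - m * s for b, m, s in zip(beta, moving_mean, scale)]
    return scale, shift


def apply_frozen_bn(x, scale, shift):
    """Apply y = scale * x + shift, with one value of x per channel."""
    return [xi * s + t for xi, s, t in zip(x, scale, shift)]
```

Because the output for one image does not depend on the other images in the batch, this form works with a single image per GPU. SyncBN and GroupNorm are the alternatives listed above for when the statistics should not be frozen.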
Speed:
- If CuDNN warmup is on, training will start very slowly and take about 10k steps (or more if scale augmentation is used) to reach maximum speed. As a result, the ETA is also inaccurate at the beginning. CuDNN warmup is on by default when no scale augmentation is used.
- After warmup, the training speed will slowly decrease due to more accurate proposals.
- The code should have around 70% GPU utilization on V100s, and 85%~90% scaling efficiency from 1 V100 to 8 V100s.
- This implementation does not use specialized CUDA ops (e.g. AffineChannel, ROIAlign). Therefore it might be slower than other highly-optimized implementations.
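For reference, the scaling efficiency quoted above is assumed here to mean the measured multi-GPU throughput divided by the ideal linear speedup. A small sketch, with a hypothetical function name:

```python
def scaling_efficiency(single_gpu_throughput, multi_gpu_throughput, num_gpus):
    """Fraction of ideal linear speedup achieved when going from 1 to `num_gpus` GPUs."""
    return multi_gpu_throughput / (num_gpus * single_gpu_throughput)
```

With made-up numbers, a run at 10 images/s on 1 GPU and 70 images/s on 8 GPUs gives 70 / (8 * 10) = 0.875, i.e. within the 85%~90% range.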
Possible Future Enhancements:
- Define a better interface to load different datasets.
- Support batch>1 per GPU. Batching with inconsistent shapes is non-trivial to implement in TensorFlow.
- Use dedicated ops to improve speed (e.g. a TF implementation of the ROIAlign op can be found in light-head RCNN).
TensorFlow ≥ 1.6 supports most common features in this R-CNN implementation. However, each version of TensorFlow has bugs that I either reported or fixed, and this implementation touches many of those bugs. Therefore, not every version of TF ≥ 1.6 supports every feature in this implementation.
- TF < 1.6: Nothing works due to lack of support for empty tensors (PR) and `FrozenBN` training (PR).
- TF < 1.10: `SyncBN` with NCCL will fail (PR).
- TF 1.11 & 1.12: multithread inference will fail (issue). Latest tensorpack will apply a workaround.
- TF 1.13: MKL inference will fail (issue).
- TF > 1.12: Horovod training will fail (issue). Latest tensorpack will apply a workaround.
This implementation contains workarounds for some of these TF bugs.
However, note that the workarounds need to check your TF version via `tf.VERSION`, and may not detect bugs properly if your TF version is not an official release (e.g., if you use a nightly build).
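The version-dependent behavior listed above can be summarized in a small lookup. This is only an illustrative sketch and not the check that tensorpack performs: the function names are hypothetical, it takes the version string as an argument instead of importing TensorFlow, and it assumes that any suffix after `major.minor.patch` (as nightly builds typically carry) marks an unofficial build.

```python
import re


def parse_tf_version(version_str):
    """Return ((major, minor), is_official) for a string such as '1.12.0'."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)(.*)$", version_str)
    if m is None:
        raise ValueError("unrecognized version string: " + version_str)
    major, minor = int(m.group(1)), int(m.group(2))
    # Any trailing suffix (e.g. '-dev20190101') is treated as unofficial.
    is_official = m.group(4) == ""
    return (major, minor), is_official


def known_issues(version_str):
    """List the issues from the notes above that apply to a TF version string."""
    ver, is_official = parse_tf_version(version_str)
    issues = []
    if ver < (1, 6):
        issues.append("nothing works: no empty-tensor or FrozenBN training support")
    if ver < (1, 10):
        issues.append("SyncBN with NCCL fails")
    if ver in [(1, 11), (1, 12)]:
        issues.append("multithread inference fails")
    if ver == (1, 13):
        issues.append("MKL inference fails")
    if ver > (1, 12):
        issues.append("Horovod training fails")
    if not is_official:
        issues.append("unofficial build: version-based checks may be wrong")
    return issues
```

In a real environment you would pass in `tf.VERSION`. The last check reflects the caveat above: for a nightly build the version number alone does not say which fixes are included.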