## File Structure

This is a minimal implementation that simply contains these files:

  - train.py, predict.py: the main entry scripts
  - modeling/generalized_rcnn.py: implements variants of the generalized R-CNN architecture
  - modeling/backbone.py: implements backbones
  - modeling/model_{fpn,rpn,frcnn,mrcnn,cascade}.py: implement FPN, RPN, Fast/Mask/Cascade R-CNN models
  - modeling/model_box.py: implements box-related symbolic functions
  - dataset/dataset.py: the dataset interface
  - dataset/coco.py: loads COCO data into the dataset interface
  - data.py: prepares data for training & inference
  - common.py: common data preparation utilities
  - utils/: third-party helper functions
  - eval.py: evaluation utilities
  - viz.py: visualization utilities

## Implementation Notes

Data:

  1. It's easy to train on your own data: call `DatasetRegistry.register(name, lambda: YourDatasetSplit())` and modify `cfg.DATA.*` accordingly (see the sketch after this list). `YourDatasetSplit` can be:

     - `COCODetection`, if your data is already in COCO format. In this case, you need to modify `COCODetection` to change the class names and the id mapping.

     - Your own class, if your data is not in COCO format. You need to write a subclass of `DatasetSplit`, similar to `COCODetection`, in which you implement the logic to load your dataset and evaluate predictions. The documentation is in the docstring of `DatasetSplit`.

  2. If you load a COCO-trained model on a different dataset, you may see error messages complaining about an unmatched number of categories for certain weights in the checkpoint. You can either remove those weights from the checkpoint, or rename them in the model (a sketch of such checkpoint surgery is given after this list). See the tensorpack tutorial for more details.

  3. You can easily add more augmentations such as rotation, but be careful about how a box should be augmented. The code currently always uses the minimal axis-aligned bounding box of the 4 corners, which is probably not optimal. A TODO is to generate the bounding box from the segmentation, so that more augmentations can be naturally supported.
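
A minimal sketch of what such a registration might look like, assuming the `DatasetSplit` interface described in `dataset/dataset.py` (the class name, file paths, and exact roidb keys below are illustrative; check the `DatasetSplit` docstring for the authoritative list):

```python
# Hypothetical sketch of a custom dataset -- method names and roidb keys
# should be verified against the DatasetSplit docstring in dataset/dataset.py.
import numpy as np

from dataset import DatasetRegistry, DatasetSplit


class MyDatasetSplit(DatasetSplit):
    def __init__(self, basedir, split):
        self.basedir = basedir
        self.split = split  # e.g. "train" or "val"

    def training_roidbs(self):
        # One dict per training image. Typical keys (assumed):
        #   file_name: full path to the image
        #   boxes: float32 array of shape Nx4, in XYXY format
        #   class: int32 array of shape N, 1-based category ids
        #   is_crowd: int8 array of shape N
        return [{
            "file_name": "/path/to/image.jpg",
            "boxes": np.array([[10, 10, 100, 100]], dtype=np.float32),
            "class": np.array([1], dtype=np.int32),
            "is_crowd": np.array([0], dtype=np.int8),
        }]

    def inference_roidbs(self):
        # Like training_roidbs(), but each dict also carries an "image_id"
        # so predictions can be matched back to images during evaluation.
        dbs = self.training_roidbs()
        for idx, db in enumerate(dbs):
            db["image_id"] = str(idx)
        return dbs

    def eval_inference_results(self, results, output=None):
        # "results" is a list of per-image predictions; compute whatever
        # metrics make sense for your dataset and return them as a dict.
        return {}


DatasetRegistry.register("my_data_train", lambda: MyDatasetSplit("/data/my_data", "train"))
# cfg.DATA.* (e.g. the list of class names) must then be updated to match
# the registered dataset.
```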
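
For point 2, here is a sketch of removing category-dependent weights from a pretrained model, assuming it is an `.npz` file of name → array pairs as distributed in the tensorpack model zoo (the file name and variable-name substrings are placeholders; for native TF checkpoints, `tensorpack.tfutils.varmanip` provides similar load/save helpers):

```python
# Hypothetical sketch: drop weights whose shape depends on the number of
# categories, so a COCO-pretrained .npz model can be loaded on another dataset.
# The file name and the variable-name substrings below are placeholders.
import numpy as np

variables = dict(np.load("COCO-pretrained-model.npz"))

# Substrings identifying category-dependent weights -- adjust them to the
# actual variable names reported in the "unmatched number of categories" error.
category_dependent = ("fastrcnn/class", "fastrcnn/box", "maskrcnn")

kept = {name: value for name, value in variables.items()
        if not any(sub in name for sub in category_dependent)}

np.savez_compressed("pretrained-without-heads.npz", **kept)
```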

Model:

  1. Floating-point boxes are defined like this: *(figure omitted: definition of floating-point box coordinates)*

  2. We use ROIAlign, and `tf.image.crop_and_resize` is NOT ROIAlign (see the sketch after this list).

  3. We currently only support one image per GPU in this example.

  4. Because of (3), BatchNorm statistics are supposed to be frozen during fine-tuning.

  5. An alternative to freezing BatchNorm is to sync BatchNorm statistics across GPUs (the `BACKBONE.NORM=SyncBN` option). This requires my bugfix, which is available since TF 1.10; on older versions you can apply the patch manually to use it. For now the total batch size is at most 8, so this option does not improve the model by much.

  6. Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`), which has better performance.
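
To illustrate point (2): `tf.image.crop_and_resize` expects boxes normalized by `size - 1` and aligned to pixel centers, so a coordinate transform is needed before it behaves like ROIAlign under the floating-point box convention above. The sketch below (NHWC layout for brevity, TF 1.x API) is an illustrative approximation; the repo's actual version lives in `modeling/model_box.py`:

```python
# Illustrative sketch (not the repo's exact code): crop_and_resize can only
# mimic ROIAlign after the boxes are transformed into its coordinate system.
import tensorflow as tf


def transform_fpcoor_for_tf(boxes, image_shape, crop_shape):
    """Map float-coordinate boxes [x0, y0, x1, y1] (pixel (0, 0) occupies the
    square [0, 1] x [0, 1]) to the normalized [y0, x0, y1, x1] boxes that
    crop_and_resize expects, so each output value is sampled at the center
    of its bin, as in ROIAlign."""
    x0, y0, x1, y1 = tf.split(boxes, 4, axis=1)

    spacing_w = (x1 - x0) / tf.cast(crop_shape[1], tf.float32)
    spacing_h = (y1 - y0) / tf.cast(crop_shape[0], tf.float32)

    # Shift to the center of the first bin, convert from "corner" coordinates
    # to "pixel center" coordinates (the -0.5), then normalize by (size - 1).
    imshape = [tf.cast(image_shape[0] - 1, tf.float32),
               tf.cast(image_shape[1] - 1, tf.float32)]
    nx0 = (x0 + spacing_w / 2 - 0.5) / imshape[1]
    ny0 = (y0 + spacing_h / 2 - 0.5) / imshape[0]

    nw = spacing_w * tf.cast(crop_shape[1] - 1, tf.float32) / imshape[1]
    nh = spacing_h * tf.cast(crop_shape[0] - 1, tf.float32) / imshape[0]
    return tf.concat([ny0, nx0, ny0 + nh, nx0 + nw], axis=1)


def roi_align(featuremap, boxes, resolution):
    """featuremap: 1xHxWxC (NHWC), boxes: Nx4 float. Sample a 2x2 grid per
    output bin with crop_and_resize, then average -- a rough ROIAlign."""
    image_shape = tf.shape(featuremap)[1:3]
    tf_boxes = transform_fpcoor_for_tf(
        boxes, image_shape, [resolution * 2, resolution * 2])
    crops = tf.image.crop_and_resize(
        featuremap, tf_boxes,
        box_ind=tf.zeros([tf.shape(boxes)[0]], dtype=tf.int32),
        crop_size=[resolution * 2, resolution * 2])
    return tf.nn.avg_pool(crops, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
```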

Speed:

  1. If CuDNN warmup is on, training starts very slowly and takes about 10k steps (or more if scale augmentation is used) to reach maximum speed. As a result, the ETA is also inaccurate at the beginning. CuDNN warmup is on by default when no scale augmentation is used.

  2. After warmup, the training speed will slowly decrease due to more accurate proposals.

  3. The code should have around 70% GPU utilization on V100s, and 85%~90% scaling efficiency from 1 V100 to 8 V100s.

  4. This implementation does not use specialized CUDA ops (e.g. AffineChannel, ROIAlign). Therefore it might be slower than other highly-optimized implementations.

## Possible Future Enhancements

  1. Define a better interface to load different datasets.

  2. Support batch size > 1 per GPU. Batching with inconsistent shapes is non-trivial to implement in TensorFlow.

  3. Use dedicated ops to improve speed (e.g., a TF implementation of the ROIAlign op can be found in Light-Head R-CNN).

## TensorFlow version notes

TensorFlow ≥ 1.6 supports most common features in this R-CNN implementation. However, each version of TensorFlow has bugs that I either reported or fixed, and this implementation touches many of those bugs. Therefore, not every version of TF ≥ 1.6 supports every feature in this implementation.

  1. TF < 1.6: Nothing works due to lack of support for empty tensors (PR) and FrozenBN training (PR).
  2. TF < 1.10: SyncBN with NCCL will fail (PR).
  3. TF 1.11 & 1.12: multithread inference will fail (issue). Latest tensorpack will apply a workaround.
  4. TF 1.13: MKL inference will fail (issue).
  5. TF > 1.12: Horovod training will fail (issue). Latest tensorpack will apply a workaround.

This implementation contains workarounds for some of these TF bugs. However, note that the workarounds need to check your TF version via `tf.VERSION`, and may not detect the bugs properly if your TF version is not an official release (e.g., if you use a nightly build).