Merge pull request #76 from sony/feature/20180131-deeplabv3plus
Add Deeplab v3+ training and inference
TakuyaNarihira authored Feb 4, 2019
2 parents 2bbf7f9 + 4a9023f commit b102097
Showing 25 changed files with 1,778 additions and 0 deletions.
213 changes: 213 additions & 0 deletions semantic-segmentation/deeplabv3plus/README.md
@@ -0,0 +1,213 @@
# Neural Network Libraries - Examples

The installation guide is given at: https://github.com/sony/nnabla-examples/

# Setup Requirements

In a Linux environment:
1. OpenCV
```
conda install opencv
```
2. ImageIO
```
pip install imageio
```
3. TensorFlow
```
pip install tensorflow
```
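
Optionally, a quick sanity check that all three packages import correctly (it only prints their versions):
```bash
python -c "import cv2, imageio, tensorflow; print(cv2.__version__, imageio.__version__, tensorflow.__version__)"
```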


# Some segmentation results on VOC validation images:
<p align="center">
<img src="results/test3.jpg" width=200 height=250>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test3.png" width=200 height=250><br/>
<img src="results/test4.jpg" width=200 height=250>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test4.png" width=200 height=250><br/>
<img src="results/test5.jpg" width=200 height=250>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test5.png" width=200 height=250><br/>
<img src="results/test7.jpg" width=300 height=200>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test7.png" width=300 height=200><br/>
</p>



# Quick Start Inference

To run inference on a test image, perform the following steps:


## Download pretrained model

To download a pretrained model, trained on the [COCO+VOC trainaug dataset](https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md), from TensorFlow's repository:
```bash
python download_pretrained_tf_deeplabv3plus_coco_voc_trainaug.py
```

This will download and uncompress the pretrained TensorFlow model trained on the COCO + Pascal VOC 2012 trainaug dataset with Xception as the backbone.

## Weight Conversion

Convert the TensorFlow weights/checkpoints to an NNabla parameter file.
```bash
python convert_tf_nnabla.py --input-ckpt-file=<path to ckpt file> --output-nnabla-file=<output .h5 file>
```

##### NOTE: input-ckpt-file is the path to the ckpt file downloaded and uncompressed in the previous step. Please give the path to the model-*.ckpt.
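
As an illustration, assuming the archive extracted to a directory named deeplabv3_pascal_train_aug/ (both names here are assumptions; substitute the actual model-*.ckpt path from the extracted archive):
```bash
# hypothetical paths; point --input-ckpt-file at the real model-*.ckpt prefix
python convert_tf_nnabla.py \
--input-ckpt-file=deeplabv3_pascal_train_aug/model.ckpt \
--output-nnabla-file=deeplab_nnabla.h5
```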


## Inference

Perform inference on a test image using the converted TensorFlow model.
```bash
python model_inference.py --model-load-path=<path to parameter file> --image-width=<target width for input image> --test-image-file=<image file for inference> --num-class=<no. of categories> --label-file-path=<txt file listing categories> --output-stride=16
```

##### NOTE: model-load-path is the path to the converted parameter file (.h5) obtained in the previous step,
##### and num-class=21 in the case of using the default TensorFlow pretrained model downloaded in the "Download pretrained model" step (as it is trained on 21 categories).
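
As a concrete sketch (the image and label-file names are hypothetical; 513 is a common DeepLab input width, and num-class=21 matches the VOC pretrained model):
```bash
python model_inference.py \
--model-load-path=deeplab_nnabla.h5 \
--image-width=513 \
--test-image-file=test.jpg \
--num-class=21 \
--label-file-path=voc_labels.txt \
--output-stride=16
```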




# Training

To train a DeepLab v3+ model in NNabla, perform the following steps:

## Download Dataset

Support for the following dataset is provided:

##### VOC 2012 Semantic Segmentation dataset

Download the Pascal VOC 2012 dataset and uncompress it:
```bash
wget host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_11-May-2012.tar
```

## Run the data preparation script

- VOC 2012 Semantic Segmentation Dataset

To prepare the VOC data for training:
```bash
python dataset_utils.py --train-file=<train split txt file> --val-file=<val split txt file> --data-dir=<path to VOC2012 directory>
```
##### NOTE: train-file and val-file are the train and val split text files provided by VOC under VOC2012/ImageSets/Segmentation/; data-dir is the path that contains the JPEGImages directory (e.g. --data-dir=../../VOCdevkit/VOC2012/).
##### This will result in the files train_image.txt, train_label.txt, val_image.txt, and val_label.txt.
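
For example, with the directory layout shown below (a sketch; whether dataset_utils.py expects bare filenames or full paths for the split files should be confirmed against the script):
```bash
python dataset_utils.py \
--train-file=train.txt \
--val-file=val.txt \
--data-dir=../../VOCdevkit/VOC2012/
```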

After data preparation, the data directory structure should look like:
```
+ VOCdevkit
+ VOC2012
+ JPEGImages
+ SegmentationClass
+ encoded
```
and the current working directory should contain the following 4 files generated from running the above script:
```
+ train_image.txt
+ train_label.txt
+ val_image.txt
+ val_label.txt
```


## Download the backbone pre-trained model from TensorFlow's Model Zoo

```bash
wget download.tensorflow.org/models/deeplabv3_xception_2018_01_04.tar.gz
tar -xzf deeplabv3_xception_2018_01_04.tar.gz
```


## Convert the backbone pre-trained model to NNabla

```bash
python convert_tf_nnabla.py --input-ckpt-file=<path to ckpt file> --output-nnabla-file=<output .h5 file>
```
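
A sketch for the backbone checkpoint (the extracted directory and checkpoint names are assumptions; inspect the unpacked archive for the actual ones):
```bash
# hypothetical paths; check the contents of deeplabv3_xception_2018_01_04.tar.gz
python convert_tf_nnabla.py \
--input-ckpt-file=deeplabv3_xception/model.ckpt \
--output-nnabla-file=xception_backbone.h5
```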


## Run the training script

To run the training:

##### Single Process Training

```bash
python train.py \
--train-dir=train_image.txt \
--train-label-dir=train_label.txt \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--accum-grad=1 \
--warmup-epoch=5 \
--max-iter=40000 \
--model-save-interval=1000 \
--model-save-path=<path to save model> \
--val-interval=1000 \
--batch-size=1 \
--num-class=<no. of categories in dataset> \
--pretrained-model-path=<path to the pretrained model (.h5)> \
--train-samples=<no. of train samples in dataset> \
--val-samples=<no. of val samples in dataset>
```
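
A filled-in example for the VOC split prepared above (the standard VOC 2012 segmentation split has 1,464 train and 1,449 val images; the model-save and pretrained-model paths are illustrative):
```bash
python train.py \
--train-dir=train_image.txt \
--train-label-dir=train_label.txt \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--accum-grad=1 \
--warmup-epoch=5 \
--max-iter=40000 \
--model-save-interval=1000 \
--model-save-path=./model_output \
--val-interval=1000 \
--batch-size=1 \
--num-class=21 \
--pretrained-model-path=xception_backbone.h5 \
--train-samples=1464 \
--val-samples=1449
```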


##### Distributed Training
For the distributed binary installation, refer to: https://nnabla.readthedocs.io/en/latest/python/pip_installation_cuda.html#installation-with-multi-gpu-supported


```bash
mpirun -n <no. of devices> python train.py \
--train-dir=train_image.txt \
--train-label-dir=train_label.txt \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--accum-grad=1 \
--warmup-epoch=5 \
--max-iter=40000 \
--model-save-interval=1000 \
--model-save-path=<path to save model> \
--val-interval=1000 \
--batch-size=1 \
--num-class=<no. of categories in dataset> \
--pretrained-model-path=<path to the pretrained model (.h5)> \
--train-samples=<no. of train samples in dataset> \
--val-samples=<no. of val samples in dataset> \
--distributed
```

##### Fine Tuning
For fine-tuning on any dataset, prepare the dataset in the same way the VOC dataset is prepared (writing a data preparation script may be required; refer to dataset_utils.py) and add the --fine-tune argument, as in the sketch below.
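
A sketch of such a run (all file names and sample counts are hypothetical; only the --fine-tune flag differs from a normal training command):
```bash
python train.py \
--train-dir=my_train_image.txt \
--train-label-dir=my_train_label.txt \
--val-dir=my_val_image.txt \
--val-label-dir=my_val_label.txt \
--batch-size=1 \
--num-class=21 \
--pretrained-model-path=deeplab_nnabla.h5 \
--train-samples=1464 \
--val-samples=1449 \
--fine-tune
```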

##### NOTE:
1. The text files passed as arguments to the training scripts are the ones generated in the "Run the data preparation script" step.
2. To reproduce the paper's results, a batch size of at least 16 is suggested (for distributed training, set --batch-size = 16 / no. of devices, e.g. --batch-size=4 on 4 devices) and max-iter=250,000 when training from scratch.
3. To compute the accuracy (mean IOU) during training/validation, add the --compute-acc argument to the training command.

##### Typical Training Loss curve:
<p align="center">
<img src="results/Train-loss.png" width=600 height=350></br>
</p>


## Evaluate

To evaluate the trained model obtained from the previous step:

```bash
python eval.py \
--model-load-path=/model_save_path/param_xxx.h5 \
--val-samples=<no. of val samples in dataset> \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--batch-size=1 \
-c <'cudnn' or 'cpu'> \
--num-class=<no. of categories>
```
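
For instance, evaluating a checkpoint saved after 40,000 iterations on the VOC val split (the file name is illustrative and follows the param_xxx.h5 pattern above):
```bash
python eval.py \
--model-load-path=./model_output/param_040000.h5 \
--val-samples=1449 \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--batch-size=1 \
-c cudnn \
--num-class=21
```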

## Inference

Perform inference on a test image using the trained model.

```bash
python model_inference.py --model-load-path=<path to parameter file (.h5)> --image-width=<target width for input image> --test-image-file=<image file for inference> --num-class=<no. of categories> --label-file-path=<txt file listing categories> --output-stride=16
```

##### NOTE: model-load-path is the path to the trained parameter file (.h5) obtained from training.
119 changes: 119 additions & 0 deletions semantic-segmentation/deeplabv3plus/args.py
@@ -0,0 +1,119 @@
# Copyright (c) 2017 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


def get_args(monitor_path='tmp.monitor', max_iter=10000, model_save_path=None, learning_rate=1e-3, batch_size=128, weight_decay=1e-4, description=None):
    """
    Get command line arguments.
    Arguments set the default values of command line arguments.
    """
    import argparse
    import os
    if model_save_path is None:
        model_save_path = monitor_path
    if description is None:
        description = "DeepLab v3+ training/inference examples. The following help is shared among the examples in this folder, so some arguments are valid only in some examples."
    parser = argparse.ArgumentParser(description)
    parser.add_argument('--fine-tune', action='store_true',
                        default=False, help="Whether to fine-tune the model; False by default.")
    parser.add_argument('--distributed', action='store_true',
                        default=False, help="Whether to use distributed (multi-GPU) training; False by default.")
    parser.add_argument('--compute-acc', action='store_true',
                        default=False, help="Whether to compute the accuracy (mean IOU) during training/validation; False by default.")
    parser.add_argument("--input-ckpt-file", type=str)
    parser.add_argument("--output-nnabla-file", type=str, default='deeplab_nnabla.h5')
    parser.add_argument("--batch-size", "-b", type=int, default=batch_size)
    parser.add_argument("--label-path", type=str)
    parser.add_argument("--data-dir", type=str, help='Path to VOC dataset.')
    parser.add_argument("--train-file", type=str,
                        help='VOC train split text file.')
    parser.add_argument("--val-file",
                        type=str, help='VOC val split text file.')
    parser.add_argument("--train-dir", "-t",
                        type=str, default=model_save_path,
                        help='Path to training data.')
    parser.add_argument("--val-dir", "-v",
                        type=str, default=model_save_path,
                        help='Path to validation data.')
    parser.add_argument("--train-label-dir",
                        type=str, default=model_save_path,
                        help='Path to training data labels.')
    parser.add_argument("--val-label-dir",
                        type=str, default=model_save_path,
                        help='Path to validation data labels.')
    parser.add_argument("--learning-rate", "-l",
                        type=float, default=learning_rate)
    parser.add_argument("--output-stride",
                        type=int, default=16)
    parser.add_argument("--monitor-path", "-m",
                        type=str, default=monitor_path,
                        help='Path where monitoring logs are saved.')
    parser.add_argument("--max-iter", "-i", type=int, default=max_iter,
                        help='Max iterations of training.')
    parser.add_argument("--val-interval", type=int, default=100,
                        help='Validation interval.')
    parser.add_argument("--val-iter", "-j", type=int, default=10,
                        help='Each validation runs `val_iter` mini-batch iterations.')
    parser.add_argument("--accum-grad",
                        type=int, default=32,
                        help='Number of mini-batches over which gradients are accumulated before each update.')
    parser.add_argument("--weight-decay", "-w",
                        type=float, default=weight_decay,
                        help='Weight decay factor of SGD update.')
    parser.add_argument("--warmup-epoch", type=int, default=5)
    parser.add_argument("--device-id", "-d", type=str, default='0',
                        help='Device ID the training runs on. This is only valid if you specify `-c cudnn`.')
    parser.add_argument("--type-config", type=str, default='float',
                        help='Type of computation. e.g. "float", "half".')
    parser.add_argument("--model-save-interval", "-s", type=int, default=1000,
                        help='The interval of saving model parameters.')
    parser.add_argument("--model-save-path", "-o",
                        type=str, default=model_save_path,
                        help='Path where the model parameters are saved.')
    parser.add_argument("--pretrained-model-path",
                        type=str, default=model_save_path,
                        help='Path where the pretrained model parameters are saved.')
    parser.add_argument("--net", "-n", type=str,
                        default='lenet',
                        help="Neural network architecture type (used only in classification*.py).\n classification.py: ('lenet'|'resnet'), classification_bnn.py: ('bincon'|'binnet'|'bwn'|'bincon_resnet'|'binnet_resnet'|'bwn_resnet')")
    parser.add_argument('--context', '-c', type=str,
                        default='cpu', help="Extension modules. ex) 'cpu', 'cudnn'.")
    parser.add_argument('--augment-train', action='store_true',
                        default=False, help="Enable data augmentation of training data.")
    parser.add_argument('--augment-test', action='store_true',
                        default=False, help="Enable data augmentation of testing data.")
    parser.add_argument('--channel', default=1, type=int)
    parser.add_argument('--image-width', default=28, type=int)
    parser.add_argument('--image-height', default=28, type=int)
    parser.add_argument('--dataset-path', type=str)
    parser.add_argument("--model-load-path", "-T",
                        type=str, default=model_save_path,
                        help='Path from which the model parameters are loaded.')
    parser.add_argument('--label-file-path', type=str)
    parser.add_argument('--test-image-file', type=str)
    parser.add_argument('--num-class', default=10, type=int)
    parser.add_argument('--train-samples', default=10, type=int)
    parser.add_argument('--val-samples', default=10, type=int)
    parser.add_argument("--sync-weight-every-itr",
                        type=int, default=100,
                        help="Sync weights every specified iteration. NCCL uses "
                             "ring all-reduce, so gradients in each device are not exactly the same. When they "
                             "are accumulated in the weights, the weight values in each device diverge.")

    args = parser.parse_args()
    if not os.path.isdir(args.model_save_path):
        os.makedirs(args.model_save_path)
    return args
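
For reference, a minimal sketch of how a script in this folder might consume get_args() (the call parses sys.argv, so run it with the flags documented above; the override values shown are assumptions):
```python
# hypothetical driver, e.g. at the top of train.py
from args import get_args

# per-example defaults can be overridden via the function arguments
args = get_args(max_iter=40000, batch_size=1)
print(args.context, args.num_class, args.model_save_path)
```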