Merge pull request #76 from sony/feature/20180131-deeplabv3plus
Add Deeplab v3+ training and inference
TakuyaNarihira authored Feb 4, 2019
2 parents 2bbf7f9 + 4a9023f commit b102097
Showing 25 changed files with 1,778 additions and 0 deletions.
213 changes: 213 additions & 0 deletions semantic-segmentation/deeplabv3plus/README.md
@@ -0,0 +1,213 @@
# Neural Network Libraries - Examples

The installation guide is given at: https://github.com/sony/nnabla-examples/

# Setup Requirements

In a Linux environment:
1. OpenCV
```
conda install opencv
```
2. ImageIO
```
pip install imageio
```
3. TensorFlow
```
pip install tensorflow
```
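
Optionally, a quick sanity check that all three packages import correctly (it only prints their versions):
```bash
python -c "import cv2, imageio, tensorflow; print(cv2.__version__, imageio.__version__, tensorflow.__version__)"
```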


# Some segmentation results on VOC validation images:
<p align="center">
<img src="results/test3.jpg" width=200 height=250>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test3.png" width=200 height=250><br/>
<img src="results/test4.jpg" width=200 height=250>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test4.png" width=200 height=250><br/>
<img src="results/test5.jpg" width=200 height=250>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test5.png" width=200 height=250><br/>
<img src="results/test7.jpg" width=300 height=200>&nbsp;&nbsp;&nbsp;&nbsp;<img src="results/nn_test7.png" width=300 height=200><br/>
</p>



# Quick Start Inference

To run inference on a test image, perform the following steps:


## Download pretrained model

To download a pretrained model, trained on the [COCO+VOC trainaug dataset](https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md), from TensorFlow's repository:
```bash
python download_pretrained_tf_deeplabv3plus_coco_voc_trainaug.py
```

This will download and uncompress the pretrained TensorFlow model trained on the COCO + Pascal VOC 2012 trainaug dataset with Xception as the backbone.

## Weight Conversion

Convert the TensorFlow weights/checkpoints to an NNabla parameter file.
```bash
python convert_tf_nnabla.py --input-ckpt-file=<path to ckpt file> --output-nnabla-file=<output .h5 file>
```

##### NOTE: input-ckpt-file is the path to the ckpt file downloaded and uncompressed in the previous step. Please give the path to the model-*.ckpt.
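
As an illustration, assuming the archive extracted to a directory named deeplabv3_pascal_train_aug/ (both names here are assumptions; substitute the actual model-*.ckpt path from the extracted archive):
```bash
# hypothetical paths; point --input-ckpt-file at the real model-*.ckpt prefix
python convert_tf_nnabla.py \
--input-ckpt-file=deeplabv3_pascal_train_aug/model.ckpt \
--output-nnabla-file=deeplab_nnabla.h5
```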


## Inference

Perform inference on a test image using the converted TensorFlow model.
```bash
python model_inference.py --model-load-path=<path to parameter file> --image-width=<target width for input image> --test-image-file=<image file for inference> --num-class=<no. of categories> --label-file-path=<txt file listing categories> --output-stride=16
```

##### NOTE: model-load-path is the path to the converted parameter file (.h5) obtained in the previous step,
##### and num-class=21 in the case of using the default TensorFlow pretrained model downloaded in the "Download pretrained model" step (as it is trained on 21 categories).
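
As a concrete sketch (the image and label-file names are hypothetical; 513 is a common DeepLab input width, and num-class=21 matches the VOC pretrained model):
```bash
python model_inference.py \
--model-load-path=deeplab_nnabla.h5 \
--image-width=513 \
--test-image-file=test.jpg \
--num-class=21 \
--label-file-path=voc_labels.txt \
--output-stride=16
```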




# Training

To train a DeepLab v3+ model in NNabla, perform the following steps:

## Download Dataset

Support for the following dataset is provided:

##### VOC 2012 Semantic Segmentation dataset

Download the Pascal VOC 2012 dataset and uncompress it:
```bash
wget host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_11-May-2012.tar
```

## Run the data preparation script

- VOC 2012 Semantic Segmentation Dataset

To prepare the VOC data for training:
```bash
python dataset_utils.py --train-file=<train split txt file> --val-file=<val split txt file> --data-dir=<path to VOC2012 directory>
```
##### NOTE: train-file and val-file are the train and val split text files provided by VOC under VOC2012/ImageSets/Segmentation/; data-dir is the path that contains the JPEGImages directory (e.g. --data-dir=../../VOCdevkit/VOC2012/).
##### This will result in the files train_image.txt, train_label.txt, val_image.txt, and val_label.txt.
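
For example, with the directory layout shown below (a sketch; whether dataset_utils.py expects bare filenames or full paths for the split files should be confirmed against the script):
```bash
python dataset_utils.py \
--train-file=train.txt \
--val-file=val.txt \
--data-dir=../../VOCdevkit/VOC2012/
```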

After data preparation, the data directory structure should look like:
```
+ VOCdevkit
+ VOC2012
+ JPEGImages
+ SegmentationClass
+ encoded
```
and the current working directory should contain the following 4 files generated from running the above script:
```
+ train_image.txt
+ train_label.txt
+ val_image.txt
+ val_label.txt
```


## Download the backbone pre-trained model from TensorFlow's Model Zoo

```bash
wget download.tensorflow.org/models/deeplabv3_xception_2018_01_04.tar.gz
tar -xzf deeplabv3_xception_2018_01_04.tar.gz
```


## Convert the backbone pre-trained model to NNabla

```bash
python convert_tf_nnabla.py --input-ckpt-file=<path to ckpt file> --output-nnabla-file=<output .h5 file>
```
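
A sketch for the backbone checkpoint (the extracted directory and checkpoint names are assumptions; inspect the unpacked archive for the actual ones):
```bash
# hypothetical paths; check the contents of deeplabv3_xception_2018_01_04.tar.gz
python convert_tf_nnabla.py \
--input-ckpt-file=deeplabv3_xception/model.ckpt \
--output-nnabla-file=xception_backbone.h5
```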


## Run the training script

To run the training:

##### Single Process Training

```bash
python train.py \
--train-dir=train_image.txt \
--train-label-dir=train_label.txt \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--accum-grad=1 \
--warmup-epoch=5 \
--max-iter=40000 \
--model-save-interval=1000 \
--model-save-path=<path to save model> \
--val-interval=1000 \
--batch-size=1 \
--num-class=<no. of categories in dataset> \
--pretrained-model-path=<path to the pretrained model (.h5)> \
--train-samples=<no. of train samples in dataset> \
--val-samples=<no. of val samples in dataset>
```
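
A filled-in example for the VOC split prepared above (the standard VOC 2012 segmentation split has 1,464 train and 1,449 val images; the model-save and pretrained-model paths are illustrative):
```bash
python train.py \
--train-dir=train_image.txt \
--train-label-dir=train_label.txt \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--accum-grad=1 \
--warmup-epoch=5 \
--max-iter=40000 \
--model-save-interval=1000 \
--model-save-path=./model_output \
--val-interval=1000 \
--batch-size=1 \
--num-class=21 \
--pretrained-model-path=xception_backbone.h5 \
--train-samples=1464 \
--val-samples=1449
```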


##### Distributed Training
For the distributed binary installation, refer to: https://nnabla.readthedocs.io/en/latest/python/pip_installation_cuda.html#installation-with-multi-gpu-supported


```bash
mpirun -n <no. of devices> python train.py \
--train-dir=train_image.txt \
--train-label-dir=train_label.txt \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--accum-grad=1 \
--warmup-epoch=5 \
--max-iter=40000 \
--model-save-interval=1000 \
--model-save-path=<path to save model> \
--val-interval=1000 \
--batch-size=1 \
--num-class=<no. of categories in dataset> \
--pretrained-model-path=<path to the pretrained model (.h5)> \
--train-samples=<no. of train samples in dataset> \
--val-samples=<no. of val samples in dataset> \
--distributed
```

##### Fine Tuning
For fine-tuning on any dataset, prepare the dataset in the same way the VOC dataset is prepared (writing a data preparation script may be required; refer to dataset_utils.py) and add the --fine-tune argument, as in the sketch below.
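
A sketch of such a run (all file names and sample counts are hypothetical; only the --fine-tune flag differs from a normal training command):
```bash
python train.py \
--train-dir=my_train_image.txt \
--train-label-dir=my_train_label.txt \
--val-dir=my_val_image.txt \
--val-label-dir=my_val_label.txt \
--batch-size=1 \
--num-class=21 \
--pretrained-model-path=deeplab_nnabla.h5 \
--train-samples=1464 \
--val-samples=1449 \
--fine-tune
```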

##### NOTE:
1. The text files passed as arguments to the training scripts are the ones generated in the "Run the data preparation script" step.
2. To reproduce the paper's results, a batch size of at least 16 is suggested (for distributed training, set --batch-size = 16 / no. of devices, e.g. --batch-size=4 on 4 devices) and max-iter=250,000 when training from scratch.
3. To compute the accuracy (mean IOU) during training/validation, add the --compute-acc argument to the training command.

##### Typical Training Loss curve:
<p align="center">
<img src="results/Train-loss.png" width=600 height=350></br>
</p>


## Evaluate

To evaluate the trained model obtained from the previous step:

```bash
python eval.py \
--model-load-path=/model_save_path/param_xxx.h5 \
--val-samples=<no. of val samples in dataset> \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--batch-size=1 \
-c <'cudnn' or 'cpu'> \
--num-class=<no. of categories>
```
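
For instance, evaluating a checkpoint saved after 40,000 iterations on the VOC val split (the file name is illustrative and follows the param_xxx.h5 pattern above):
```bash
python eval.py \
--model-load-path=./model_output/param_040000.h5 \
--val-samples=1449 \
--val-dir=val_image.txt \
--val-label-dir=val_label.txt \
--batch-size=1 \
-c cudnn \
--num-class=21
```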

## Inference

Perform inference on a test image using the trained model.

```bash
python model_inference.py --model-load-path=<path to parameter file (.h5)> --image-width=<target width for input image> --test-image-file=<image file for inference> --num-class=<no. of categories> --label-file-path=<txt file listing categories> --output-stride=16
```

##### NOTE: model-load-path is the path to the trained parameter file (.h5) obtained from training.
119 changes: 119 additions & 0 deletions semantic-segmentation/deeplabv3plus/args.py
@@ -0,0 +1,119 @@
# Copyright (c) 2017 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


def get_args(monitor_path='tmp.monitor', max_iter=10000, model_save_path=None, learning_rate=1e-3, batch_size=128, weight_decay=1e-4, description=None):
    """
    Get command line arguments.
    Arguments set the default values of command line arguments.
    """
    import argparse
    import os
    if model_save_path is None:
        model_save_path = monitor_path
    if description is None:
        description = "DeepLab v3+ training/inference examples. The following help is shared among the examples in this folder, so some arguments are valid only in some examples."
    parser = argparse.ArgumentParser(description)
    parser.add_argument('--fine-tune', action='store_true',
                        default=False, help="Whether to fine-tune the model; False by default.")
    parser.add_argument('--distributed', action='store_true',
                        default=False, help="Whether to use distributed (multi-GPU) training; False by default.")
    parser.add_argument('--compute-acc', action='store_true',
                        default=False, help="Whether to compute the accuracy (mean IOU) during training/validation; False by default.")
    parser.add_argument("--input-ckpt-file", type=str)
    parser.add_argument("--output-nnabla-file", type=str, default='deeplab_nnabla.h5')
    parser.add_argument("--batch-size", "-b", type=int, default=batch_size)
    parser.add_argument("--label-path", type=str)
    parser.add_argument("--data-dir", type=str, help='Path to VOC dataset.')
    parser.add_argument("--train-file", type=str,
                        help='VOC train split text file.')
    parser.add_argument("--val-file",
                        type=str, help='VOC val split text file.')
    parser.add_argument("--train-dir", "-t",
                        type=str, default=model_save_path,
                        help='Path to training data.')
    parser.add_argument("--val-dir", "-v",
                        type=str, default=model_save_path,
                        help='Path to validation data.')
    parser.add_argument("--train-label-dir",
                        type=str, default=model_save_path,
                        help='Path to training data labels.')
    parser.add_argument("--val-label-dir",
                        type=str, default=model_save_path,
                        help='Path to validation data labels.')
    parser.add_argument("--learning-rate", "-l",
                        type=float, default=learning_rate)
    parser.add_argument("--output-stride",
                        type=int, default=16)
    parser.add_argument("--monitor-path", "-m",
                        type=str, default=monitor_path,
                        help='Path where monitoring logs are saved.')
    parser.add_argument("--max-iter", "-i", type=int, default=max_iter,
                        help='Max iterations of training.')
    parser.add_argument("--val-interval", type=int, default=100,
                        help='Validation interval.')
    parser.add_argument("--val-iter", "-j", type=int, default=10,
                        help='Each validation runs `val_iter` mini-batch iterations.')
    parser.add_argument("--accum-grad",
                        type=int, default=32,
                        help='Number of mini-batches over which gradients are accumulated before each update.')
    parser.add_argument("--weight-decay", "-w",
                        type=float, default=weight_decay,
                        help='Weight decay factor of SGD update.')
    parser.add_argument("--warmup-epoch", type=int, default=5)
    parser.add_argument("--device-id", "-d", type=str, default='0',
                        help='Device ID the training runs on. This is only valid if you specify `-c cudnn`.')
    parser.add_argument("--type-config", type=str, default='float',
                        help='Type of computation. e.g. "float", "half".')
    parser.add_argument("--model-save-interval", "-s", type=int, default=1000,
                        help='The interval of saving model parameters.')
    parser.add_argument("--model-save-path", "-o",
                        type=str, default=model_save_path,
                        help='Path where the model parameters are saved.')
    parser.add_argument("--pretrained-model-path",
                        type=str, default=model_save_path,
                        help='Path where the pretrained model parameters are saved.')
    parser.add_argument("--net", "-n", type=str,
                        default='lenet',
                        help="Neural network architecture type (used only in classification*.py).\n classification.py: ('lenet'|'resnet'), classification_bnn.py: ('bincon'|'binnet'|'bwn'|'bincon_resnet'|'binnet_resnet'|'bwn_resnet')")
    parser.add_argument('--context', '-c', type=str,
                        default='cpu', help="Extension modules. ex) 'cpu', 'cudnn'.")
    parser.add_argument('--augment-train', action='store_true',
                        default=False, help="Enable data augmentation of training data.")
    parser.add_argument('--augment-test', action='store_true',
                        default=False, help="Enable data augmentation of testing data.")
    parser.add_argument('--channel', default=1, type=int)
    parser.add_argument('--image-width', default=28, type=int)
    parser.add_argument('--image-height', default=28, type=int)
    parser.add_argument('--dataset-path', type=str)
    parser.add_argument("--model-load-path", "-T",
                        type=str, default=model_save_path,
                        help='Path from which the model parameters are loaded.')
    parser.add_argument('--label-file-path', type=str)
    parser.add_argument('--test-image-file', type=str)
    parser.add_argument('--num-class', default=10, type=int)
    parser.add_argument('--train-samples', default=10, type=int)
    parser.add_argument('--val-samples', default=10, type=int)
    parser.add_argument("--sync-weight-every-itr",
                        type=int, default=100,
                        help="Sync weights every specified iteration. NCCL uses "
                             "ring all-reduce, so gradients in each device are not exactly the same. When they "
                             "are accumulated in the weights, the weight values in each device diverge.")

    args = parser.parse_args()
    if not os.path.isdir(args.model_save_path):
        os.makedirs(args.model_save_path)
    return args
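
For reference, a minimal sketch of how a script in this folder might consume get_args() (the call parses sys.argv, so run it with the flags documented above; the override values shown are assumptions):
```python
# hypothetical driver, e.g. at the top of train.py
from args import get_args

# per-example defaults can be overridden via the function arguments
args = get_args(max_iter=40000, batch_size=1)
print(args.context, args.num_class, args.model_save_path)
```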