This tutorial introduces CPU performance considerations for three image recognition deep learning models and shows how to use Intel® Optimizations for TensorFlow to improve inference time on CPUs. It also provides copy-paste-ready code examples for running Model Zoo's pretrained models on synthetic and real data.
Image recognition with deep learning is a computationally expensive endeavor. This tutorial shows you how to reduce the inference runtime of your network. Convolutional neural networks (CNNs) have been shown to learn and extract usable features by layering many convolution filters. ResNet50, ResNet101, and InceptionV3 are among the popular topologies for image recognition in industry today. CNNs face two main performance setbacks:
- Deeply layering convolutions causes the number of training parameters to increase drastically.
- Linear convolution filters cannot learn size-invariant features without using a separate filter for each size regime.
ResNet models use gate and skip logic to address the first issue and lower the number of parameters, similar to a recurrent neural network (RNN). The InceptionV3 model utilizes "network in network" mini perceptrons to convert linear convolutions into non-linear convolutions in a compact step, addressing the second issue. InceptionV3 also includes optimizations that factor and vectorize the convolutions, further increasing the speed of the network.
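To make the skip logic concrete, here is a minimal sketch of a residual block, assuming tf.keras. It is illustrative rather than the actual ResNet50 implementation, which adds batch normalization and 1x1 bottleneck convolutions:

```python
import tensorflow as tf

def residual_block(x, filters):
    """Minimal residual block sketch: the skip path lets features (and
    gradients) bypass the convolutions, which keeps very deep networks
    trainable without a drastic parameter blow-up."""
    shortcut = x  # skip connection: carry the input forward unchanged
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    # Merge the skip and convolution paths (assumes x already has
    # `filters` channels so the shapes match).
    y = tf.keras.layers.Add()([shortcut, y])
    return tf.keras.layers.Activation("relu")(y)
```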
In addition to TensorFlow optimizations that use the Intel® oneAPI Deep Neural Network Library (Intel® oneDNN) to exploit the appropriate instruction sets, run-time settings also contribute significantly to performance. Tuning these options for CPU workloads is vital to optimizing TensorFlow performance on Intel® processors. Below are the run-time options recommended by Intel for ResNet50, ResNet101, and InceptionV3, determined through empirical testing.
| Run-time options | ResNet50 | InceptionV3 | ResNet101 |
|---|---|---|---|
| Batch Size | 128, regardless of the hardware | 128, regardless of the hardware | 128, regardless of the hardware |
| Hyperthreading | Enabled. Turn on in BIOS; requires a restart. | Enabled. Turn on in BIOS; requires a restart. | Enabled. Turn on in BIOS; requires a restart. |
| intra_op_parallelism_threads | # physical cores per socket | # physical cores per socket | # all physical cores |
| inter_op_parallelism_threads | 1 | 1 | 2 |
| Data Layout | NCHW | NCHW | NCHW |
| NUMA Controls | numactl --cpunodebind=0 --membind=0 | numactl --cpunodebind=0 --membind=0 | numactl --cpunodebind=0 --membind=0 |
| KMP_AFFINITY | granularity=fine,verbose,compact,1,0 | granularity=fine,verbose,compact,1,0 | granularity=fine,verbose,compact,1,0 |
| KMP_BLOCKTIME | 1 | 1 | 1 |
| OMP_NUM_THREADS | # intra_op_parallelism_threads | # intra_op_parallelism_threads | # physical cores per socket |
Note: Refer to the link here to learn more about the run-time options.
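For instance, the NUMA recommendation above is applied by launching your inference script under numactl; the script name below is a placeholder:

```bash
# Pin both compute and memory allocation to socket 0, per the table above.
numactl --cpunodebind=0 --membind=0 python inference_script.py
```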
Run the following commands to get your processor information:
a. #physical cores per socket:
lscpu | grep "Core(s) per socket" | cut -d':' -f2 | xargs
b. #all physical cores:
lscpu -b -p=Core,Socket | grep -v '^#' | sort -u | wc -l
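If you prefer to derive these values programmatically, the following sketch parses the same lscpu output from Python. It is an illustrative helper, not part of the Model Zoo scripts:

```python
import subprocess

def lscpu_value(field):
    """Return the integer value of an lscpu field such as 'Core(s) per socket'."""
    out = subprocess.check_output(["lscpu"], text=True)
    for line in out.splitlines():
        if line.startswith(field + ":"):
            return int(line.split(":")[1])
    raise RuntimeError(f"{field} not found in lscpu output")

cores_per_socket = lscpu_value("Core(s) per socket")
all_physical_cores = cores_per_socket * lscpu_value("Socket(s)")
print("physical cores per socket:", cores_per_socket)
print("all physical cores:", all_physical_cores)
```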
Below is a code snippet you can incorporate into your existing ResNet50, ResNet101, or InceptionV3 TensorFlow application to apply the best settings. You can set them either on the CLI or in the Python script. Note that the inter_op_parallelism_threads and intra_op_parallelism_threads settings can only be set in the Python script.
export OMP_NUM_THREADS=<# physical cores>
export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
(or)
import os
import tensorflow as tf

os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
os.environ["OMP_NUM_THREADS"] = "<# physical cores>"  # e.g. "28"

tf.config.threading.set_inter_op_parallelism_threads(1)  # use 2 for ResNet101
tf.config.threading.set_intra_op_parallelism_threads(<# physical cores>)
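As a quick sanity check, you can read the thread settings back after setting them; these getters exist alongside the setters in tf.config.threading:

```python
print(tf.config.threading.get_inter_op_parallelism_threads())  # expect 1 (2 for ResNet101)
print(tf.config.threading.get_intra_op_parallelism_threads())  # expect your physical core count
```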
This section shows how to measure inference performance with Intel's Model Zoo pretrained models (or your own pretrained model) by setting the run-time flags discussed above.
- Clone the IntelAI models repository into your home directory:
git clone https://github.com/IntelAI/models.git
- (Skip to the next step if you already have a pretrained model.) Download the pretrained models `resnet50_fp32_pretrained_model.pb`, `resnet101_fp32_pretrained_model.pb`, and `inceptionv3_fp32_pretrained_model.pb` into your home directory or any other directory of your choice:
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/resnet50_fp32_pretrained_model.pb
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/resnet101_fp32_pretrained_model.pb
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/inceptionv3_fp32_pretrained_model.pb
Refer to the following README files for the latest locations of the pretrained models:
a. ResNet50
b. ResNet101
c. InceptionV3
- (Optional) Download and set up a data directory that contains image files in TFRecord format if you are running inference on a real dataset. You can refer to the ImageNet or COCO datasets, which have images converted to TFRecords, or run the build_image_data.py script to convert raw images into TFRecords, as in the sketch below.
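If you are converting your own images, the following minimal sketch shows the kind of TFRecord serialization the build_image_data.py script performs. The file names, labels, and output path are placeholders, though the feature keys follow the ImageNet TFRecord convention:

```python
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Placeholder (image, label) pairs; replace with your own dataset listing.
pairs = [("img0.jpg", 0), ("img1.jpg", 1)]

with tf.io.TFRecordWriter("validation-00000-of-00001") as writer:
    for path, label in pairs:
        with open(path, "rb") as f:
            encoded = f.read()  # keep the original JPEG bytes
        example = tf.train.Example(features=tf.train.Features(feature={
            "image/encoded": _bytes_feature(encoded),
            "image/class/label": _int64_feature(label),
        }))
        writer.write(example.SerializeToString())
```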
- Install Docker, since the tutorial runs in a Docker container.
- Pull the relevant Intel-optimized TensorFlow Docker image; we'll run the pretrained model for inference in a Docker container. Click here to find all the available Docker images.
docker pull intel/intel-optimized-tensorflow:latest
- cd to the inference script directory
cd ~/models/benchmarks
- Run the Python script `launch_benchmark.py` with the pretrained model. The `launch_benchmark.py` script can be treated as an entry point to conveniently perform out-of-the-box, high-performance inference with pretrained models of popular topologies. The script automatically sets the recommended run-time options for supported topologies, but if you choose to set your own options, refer to the full list of available flags and a detailed explanation of the `launch_benchmark.py` script here. This step launches a new container on every run and terminates it when finished. Go to Step 4 to run the script interactively in the container.
3.1. Online inference (or real-time inference, batch_size=1)
3.1.1 ResNet50
Note: As per the recommended settings, `socket-id` is set to 0 for ResNet50. The workload will run on a single socket with `numactl` enabled. Remove the flag or set it to -1 to disable it.
Synthetic data
python launch_benchmark.py \
--in-graph /home/<user>/resnet50_fp32_pretrained_model.pb \
--model-name resnet50 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
Real data
python launch_benchmark.py \
--data-location /home/<user>/<tfrecords_dataset_directory> \
--in-graph /home/<user>/resnet50_fp32_pretrained_model.pb \
--model-name resnet50 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
3.1.2 ResNet101
Synthetic data
python launch_benchmark.py \
--in-graph /home/<user>/resnet101_fp32_pretrained_model.pb \
--model-name resnet101 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--docker-image intel/intel-optimized-tensorflow:latest
Real data
python launch_benchmark.py \
--data-location /home/<user>/<tfrecords_dataset_directory> \
--in-graph /home/<user>/resnet101_fp32_pretrained_model.pb \
--model-name resnet101 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--docker-image intel/intel-optimized-tensorflow:latest
3.1.3 InceptionV3
Note: As per the recommended settings, `socket-id` is set to 0 for InceptionV3. The workload will run on a single socket with `numactl` enabled. Remove the flag or set it to -1 to disable it.
Synthetic data
python launch_benchmark.py \
--in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb \
--model-name inceptionv3 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
Real data
python launch_benchmark.py \
--data-location /home/<user>/<tfrecords_dataset_directory> \
--in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb \
--model-name inceptionv3 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
3.2. Best batch inference (batch_size=128)
3.2.1 ResNet50
Note: As per the recommended settings, `socket-id` is set to 0 for ResNet50. The workload will run on a single socket with `numactl` enabled. Remove the flag or set it to -1 to disable it.
Synthetic data
python launch_benchmark.py \
--in-graph /home/<user>/resnet50_fp32_pretrained_model.pb \
--model-name resnet50 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 128 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
Real data
python launch_benchmark.py \
--data-location /home/<user>/<tfrecords_dataset_directory> \
--in-graph /home/<user>/resnet50_fp32_pretrained_model.pb \
--model-name resnet50 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 128 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
3.2.2 ResNet101
Synthetic data
python launch_benchmark.py \
--in-graph /home/<user>/resnet101_fp32_pretrained_model.pb \
--model-name resnet101 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 128 \
--benchmark-only \
--docker-image intel/intel-optimized-tensorflow:latest
Real data
python launch_benchmark.py \
--data-location /home/<user>/<tfrecords_dataset_directory> \
--in-graph /home/<user>/resnet101_fp32_pretrained_model.pb \
--model-name resnet101 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 128 \
--benchmark-only \
--docker-image intel/intel-optimized-tensorflow:latest
3.2.3 InceptionV3
Note: As per the recommended settings, `socket-id` is set to 0 for InceptionV3. The workload will run on a single socket with `numactl` enabled. Remove the flag or set it to -1 to disable it.
Synthetic data
python launch_benchmark.py \
--in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb \
--model-name inceptionv3 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 128 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
Real data
python launch_benchmark.py \
--data-location /home/<user>/<tfrecords_dataset_directory> \
--in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb \
--model-name inceptionv3 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 128 \
--benchmark-only \
--socket-id 0 \
--docker-image intel/intel-optimized-tensorflow:latest
Example Output
[Running warmup steps...]
steps = 10, ... images/sec
[Running benchmark steps...]
steps = 10, ... images/sec
steps = 20, ... images/sec
steps = 30, ... images/sec
steps = 40, ... images/sec
steps = 50, ... images/sec
Ran inference with batch size 128
Log location outside container: {--output-dir value}/benchmark_resnet50
The logs are captured in a directory outside of the container.
- If you want to run the model script interactively within the Docker container, run `launch_benchmark.py` with the `--debug` flag. This launches a Docker container based on the `--docker-image`, performs the necessary installs, runs the `launch_benchmark.py` script, and does not terminate the container process. As an example, this step demonstrates the ResNet50 real-time inference on synthetic data use case; you can apply the same strategy to the other use cases demonstrated in Step 3.
python launch_benchmark.py \
--in-graph /home/<user>/resnet50_fp32_pretrained_model.pb \
--model-name resnet50 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size 1 \
--benchmark-only \
--docker-image intel/intel-optimized-tensorflow:latest \
--debug
Example Output
root@a78677f56d69:/workspace/benchmarks/common/tensorflow#
To rerun the benchmarking script, execute the `start.sh` bash script from your existing directory with additional or modified flags. For example, to rerun with the best batch inference settings, run with `BATCH_SIZE` set to 128, and to skip reinstalling packages, pass True to `NOINSTALL`.
chmod +x ./start.sh
NOINSTALL=True BATCH_SIZE=128 ./start.sh
All other flags default to the values passed to the first `launch_benchmark.py` run that started the container. See here for the full list of flags.
Example Output
USE_CASE: image_recognition
FRAMEWORK: tensorflow
WORKSPACE: /workspace/benchmarks/common/tensorflow
DATASET_LOCATION: /dataset
CHECKPOINT_DIRECTORY: /checkpoints
IN_GRAPH: /in_graph/freezed_resnet50.pb
Mounted volumes:
/localdisk/<user>/models/benchmarks mounted on: /workspace/benchmarks
None mounted on: /workspace/models
/localdisk/<user>/models/benchmarks/../models/image_recognition/tensorflow/resnet50 mounted on: /workspace/intelai_models
None mounted on: /dataset
None mounted on: /checkpoints
SOCKET_ID: -1
MODEL_NAME: resnet50
MODE: inference
PRECISION: fp32
BATCH_SIZE: 128
NUM_CORES: -1
BENCHMARK_ONLY: True
ACCURACY_ONLY: False
NOINSTALL: True
.
.
.
.
.
Batch size = 128
Throughput: ... images/sec
Ran inference with batch size 128
Log location outside container: {--output-dir value}/benchmark_resnet50_inference_fp32_20190205_201632.log