Padding-free Downsampling And Hybrid Stem

This repository provides the code for the paper "Lightweight Deep Neural Network Model With Padding-free Downsampling".
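The core idea is to perform stride-2 downsampling without zero padding: dropping the padding makes each downsampled feature map slightly smaller, which trims FLOPs at every stage. A minimal PyTorch illustration of the effect (not code from this repository):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

# Conventional stride-2 downsampling pads the input, so the output
# spatial size is ceil(32 / 2) = 16x16.
padded = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
print(padded(x).shape)    # torch.Size([1, 16, 16, 16])

# Padding-free downsampling drops the padding, so the output is
# floor((32 - 3) / 2) + 1 = 15x15: fewer output positions, fewer FLOPs.
unpadded = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=0)
print(unpadded(x).shape)  # torch.Size([1, 16, 15, 15])
```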

Framework & Performance

(Figures: framework overview and performance comparison)

Experiment

I. For the CIFAR-100 dataset, using the first set of hyperparameters
II. For the CIFAR-100 dataset, using the second set of hyperparameters
III. For the Stanford Dogs dataset
IV. For the ImageNet dataset
V. Inference Latency
VI. Ablation Experiments on CIFAR-100
VII. Comparison with other downsampling (EfficientFormerv2)
VIII. For the VegFru-292 dataset

Instructions for use

Getting started
Using our module, with MobileNetv3 as an example
Acknowledgement

For the CIFAR-100 dataset, using the first set of hyperparameters

The first set of hyperparameters follows the settings of Haase et al.

"Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets", Daniel Haase and Manuel Amthor (ZEISS Microscopy).

Original (CIFAR-100)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 3.066M | 68.5M | 75.37% |
| MobileNetv3-large (BSConv-S) | 3.066M | 68.5M | 77.87% |
| ResNet-20 | 0.278M | 41.4M | 68.12% |
| ResNet-110 (BSConv-U) | 0.245M | 41.8M | 71.58% |
| WideResNet-40-3 | 5.056M | 735.8M | 76.23% |
| WideResNet-40-8 (BSConv-U) | 4.286M | 675.1M | 77.79% |

Ours (CIFAR-100)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 3.067M | 54.6M ↓ | 75.71% |
| MobileNetv3-large (BSConv-S) | 3.067M | 54.6M ↓ | 78.36% |
| ResNet-20 | 0.282M | 37.8M ↓ | 68.30% |
| ResNet-110 (BSConv-U) | 0.249M | 38.6M ↓ | 71.62% |
| WideResNet-40-3 | 5.287M | 668.7M ↓ | 76.28% |
| WideResNet-40-8 (BSConv-U) | 4.457M | 615.6M ↓ | 78.05% |

(↓ indicates a reduction relative to the corresponding original model.)

For the CIFAR-100 dataset, using the second set of hyperparameters

Original (CIFAR-100)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 4.330M | 68.8M | 76.00% |
| Parc-MobileNet-v2 | 2.348M | 91.3M | 76.20% |
| GhostNet | 4.029M | 44.6M | 74.00% |
| ShuffleNet-v2 | 1.356M | 46.2M | 70.90% |

Ours (CIFAR-100)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 4.331M | 54.7M ↓ | 76.60% |
| Parc-MobileNet-v2 | 2.348M | 73.0M ↓ | 76.60% |
| GhostNet | 4.030M | 34.8M ↓ | 74.10% |
| ShuffleNet-v2 | 1.358M | 35.7M ↓ | 71.50% |

For the Stanford Dogs dataset

Original (Stanford Dogs)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 3.086M | 230.1M | 51.07% |
| MobileNetv3-large (BSConv-S) | 3.086M | 230.1M | 59.68% |

Ours (Stanford Dogs)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 3.087M | 212.6M ↓ | 54.11% |
| MobileNetv3-large (BSConv-S) | 3.087M | 212.6M ↓ | 60.79% |

For the ImageNet dataset

Original (ImageNet)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 5.480M | 232.5M | 69.50% |

Ours (ImageNet)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 5.481M | 214.9M ↓ | 69.50% |

Inference Latency

Original (latency)

| Model | AMD Ryzen 5 5600H | MediaTek Dimensity 1000+ |
|---|---|---|
| MobileNetv3-large | 8.5 ms | 27.0 ms |
| Parc-MobileNet-v2 | 8.7 ms | 37.4 ms |
| GhostNet | 11.4 ms | 36.6 ms |
| ShuffleNet-v2 | 6.2 ms | 19.4 ms |

Ours (latency)

| Model | AMD Ryzen 5 5600H | MediaTek Dimensity 1000+ |
|---|---|---|
| MobileNetv3-large | 9.0 ms | 26.3 ms ↓ |
| Parc-MobileNet-v2 | 9.3 ms | 34.0 ms ↓ |
| GhostNet | 11.7 ms | 26.8 ms ↓ |
| ShuffleNet-v2 | 7.4 ms | 18.8 ms ↓ |
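For context, a common way to measure CPU inference latency in PyTorch looks like the sketch below. This is an illustrative protocol with assumed settings (batch size 1, 224x224 input, torchvision's MobileNetv3 as a stand-in), not the paper's exact measurement setup; on-device latency on the mobile SoC would typically be measured through a mobile deployment framework instead.

```python
import time
import torch
import torchvision.models as models

model = models.mobilenet_v3_large(num_classes=100).eval()
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    for _ in range(10):          # warm-up runs, excluded from timing
        model(x)
    t0 = time.perf_counter()
    for _ in range(100):         # timed runs
        model(x)
    t1 = time.perf_counter()

print(f"mean latency: {(t1 - t0) / 100 * 1000:.1f} ms")
```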

Ablation Experiments on CIFAR-100

| Model | Orig | +Stem | +Downsampling | Ours |
|---|---|---|---|---|
| MobileNetv3-large | 76.0% | 75.9% | 76.4% | 76.6% ↑ |
| Parc-MobileNet-v2 | 76.2% | 76.6% | 76.4% | 76.6% ↑ |
| GhostNet | 74.0% | 74.2% | 73.8% | 74.1% ↑ |
| ShuffleNet-v2 | 70.9% | 72.0% | 70.4% | 71.5% ↑ |

(↑ indicates improvement over the original model.)

Comparison with other downsampling (EfficientFormerv2)

"Rethinking Vision Transformers for MobileNet Size and Speed", Yanyu Li et al. (Snap Inc., Northeastern University).

EfficientFormerv2 downsampling (CIFAR-100)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 4.317M | 78.0M | 75.80% |
| Parc-MobileNet-v2 | 2.558M | 97.5M | 75.70% |
| GhostNet | 4.092M | 58.3M | 74.30% |
| ShuffleNet-v2 | 2.804M | 84.1M | 70.60% |

Ours (CIFAR-100)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 4.331M | 54.7M | 76.60% |
| Parc-MobileNet-v2 | 2.348M | 73.0M | 76.60% |
| GhostNet | 4.030M | 34.8M | 74.10% |
| ShuffleNet-v2 | 1.358M | 35.7M | 71.50% |

For the VegFru-292 dataset

Original (VegFru-292)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 4.576M | 224.5M | 89.20% |
| Parc-MobileNet-v2 | 2.605M | 314.8M | 89.10% |
| GhostNet | 4.276M | 147.9M | 89.60% |
| ShuffleNet-v2 | 1.553M | 148.1M | 88.40% |

Ours (VegFru-292)

| Model | Parameters | FLOPs | Accuracy |
|---|---|---|---|
| MobileNetv3-large | 4.577M | 205.7M ↓ | 89.90% |
| Parc-MobileNet-v2 | 2.605M | 305.5M ↓ | 90.00% |
| GhostNet | 4.276M | 136.9M ↓ | 90.30% |
| ShuffleNet-v2 | 1.554M | 130.7M ↓ | 87.70% |

Getting started

  • For the code in the BSConv folder, pass `--download` on first use to download the dataset:

```bash
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_mobilenetv3_large_w1 --download --gpu-id 0
```

  • `--data-root` is the dataset path, `--dataset` is the dataset name, and `--architecture` is the model name:

```bash
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_mobilenetv3_large_w1 --gpu-id 0
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_mobilenetv3_large_w1_bsconvs_p1d6 --gpu-id 0
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_wrn40_3 --gpu-id 0
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_wrn40_8_bsconvu --gpu-id 0
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_resnet20 --gpu-id 0
python bsconv_pytorch_train.py --data-root cifar100 --dataset cifar100 --architecture cifar_resnet110_bsconvu --gpu-id 0
```

Using our module, with MobileNetv3 as an example

  • Replace `init_conv` on line 321 of mobilenet.py with our stem layer (a hedged sketch of such a stem follows these steps):

```python
self.backbone.add_module("init_conv", StemBlock(in_channels, init_conv_channels))
```
  • Uncomment the `if stride == 2` branches on lines 157, 168, 237, and 261 of common.py:

```python
if stride == 2:
    self.maxx = nn.MaxPool2d(kernel_size=3, stride=2, padding=0)

if self.stride == 2:
    b = self.maxx(b)
    return x + b

if stride == 2:
    return ConvBlock(
        in_channels=channels,
        out_channels=channels,
        kernel_size=3,
        stride=stride,
        padding=0,
        groups=channels,
        use_bn=use_bn,
        activation=activation)

if stride == 2:
    return ConvBlock(
        in_channels=channels,
        out_channels=channels,
        kernel_size=5,
        stride=stride,
        padding=1,
        groups=channels,
        use_bn=use_bn,
        activation=activation)
```
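For orientation, here is a minimal sketch of what a hybrid stem combining a padding-free strided convolution with a max-pool branch could look like. The branch structure and channel handling are illustrative assumptions; the authoritative StemBlock is the one defined in this repository:

```python
import torch
import torch.nn as nn

class StemBlockSketch(nn.Module):
    """Illustrative hybrid stem (NOT this repository's StemBlock):
    a padding-free stride-2 conv branch fused with a max-pool branch
    of identical output geometry."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      stride=2, padding=0, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        # The 1x1 conv matches channel counts; the pool uses the same
        # padding-free kernel/stride as the conv branch, so the two
        # branches produce feature maps of the same spatial size.
        self.pool = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=0),
        )

    def forward(self, x):
        return self.conv(x) + self.pool(x)


# 32x32 input -> floor((32 - 3) / 2) + 1 = 15x15 output
print(StemBlockSketch(3, 16)(torch.randn(1, 3, 32, 32)).shape)
```

As a quick sanity check on the uncommented downsampling code (again illustrative, not repository code): the 3x3 padding-free depthwise conv, the 5x5 padding-1 depthwise conv, and the max-pool shortcut all produce the same spatial size, so the residual addition `x + b` stays shape-consistent:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 16, 16)

dw3 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=0, groups=32)
dw5 = nn.Conv2d(32, 32, kernel_size=5, stride=2, padding=1, groups=32)
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0)

# All three map 16x16 to 7x7: floor((16 - 3) / 2) + 1 = 7.
print(dw3(x).shape, dw5(x).shape, pool(x).shape)
```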

Acknowledgement

Our code is based on the BSConv and EfficientFormerv2 codebases; many thanks to their authors.
