Add benchmark for calculate_qparams (pytorch#42138)
Summary:
Adds a benchmark for `HistogramObserver.calculate_qparams` to the quantized op benchmarks. The next diff in this stack speeds up this benchmark by roughly 13x to 15x, depending on the qscheme.

Pull Request resolved: pytorch#42138

Test Plan:
From the `benchmarks/operator_benchmark` folder, run the benchmark with `python -m benchmark_all_quantized_test --operators HistogramObserverCalculateQparams`.

Benchmark results before speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 185818.566

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 165325.916
```

Benchmark results after speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 12242.241

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 12655.354
```
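
As a quick sanity check on the two listings above, the per-qscheme speedup can be worked out from the reported forward execution times. This is a minimal sketch: the timings are copied from the results above, while the dictionary and function names are made up for illustration and are not part of the benchmark suite.
```python
# Forward execution times in microseconds, copied from the listings above.
# The names before_us, after_us and speedup are illustrative only.
before_us = {
    "per_tensor_affine": 185818.566,
    "per_tensor_symmetric": 165325.916,
}
after_us = {
    "per_tensor_affine": 12242.241,
    "per_tensor_symmetric": 12655.354,
}


def speedup(before, after):
    """Return how many times faster the `after` timing is than `before`."""
    return before / after


speedups = {q: speedup(before_us[q], after_us[q]) for q in before_us}

for qscheme, ratio in speedups.items():
    print(f"{qscheme}: {ratio:.1f}x")
# per_tensor_affine: 15.2x
# per_tensor_symmetric: 13.1x
```
The affine case is where the ~15x figure comes from; the symmetric case comes out at about 13x.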

Reviewed By: supriyar

Differential Revision: D22779291

Pulled By: durumu

fbshipit-source-id: 1fe17d20eda5dd99e0e2590480142034c3574d4e
durumu authored and facebook-github-bot committed Aug 6, 2020
1 parent 79de9c0 commit 5ca08b8
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions benchmarks/operator_benchmark/pt/qobserver_test.py
@@ -70,6 +70,13 @@
    ]
)

qobserver_calculate_qparams_list = op_bench.op_list(
    attr_names=['op_name', 'op_func'],
    attrs=[
        ['HistogramObserverCalculateQparams', obs.HistogramObserver],
    ]
)


class QObserverBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, C, M, N, dtype, qscheme, op_func, device):
@@ -79,6 +86,15 @@ def init(self, C, M, N, dtype, qscheme, op_func, device):
    def forward(self):
        return self.op_func(self.f_input)

class QObserverBenchmarkCalculateQparams(op_bench.TorchBenchmarkBase):
    def init(self, C, M, N, dtype, qscheme, op_func, device):
        self.f_input = torch.rand(C, M, N, device=device)
        self.q_observer = op_func(dtype=dtype, qscheme=qscheme).to(device)
        self.q_observer(self.f_input)

    def forward(self):
        return self.q_observer.calculate_qparams()


op_bench.generate_pt_tests_from_op_list(
    qobserver_per_tensor_list,
@@ -90,6 +106,11 @@ def forward(self):
    qobserver_per_channel_configs_short + qobserver_per_channel_configs_long,
    QObserverBenchmark)

op_bench.generate_pt_tests_from_op_list(
    qobserver_calculate_qparams_list,
    qobserver_per_tensor_configs_short + qobserver_per_tensor_configs_long,
    QObserverBenchmarkCalculateQparams)


if __name__ == "__main__":
    op_bench.benchmark_runner.main()
