Performance improvement is not achieved with AVX #8

Open
sharon-k opened this issue Apr 24, 2018 · 7 comments

Comments

@sharon-k

My processor, an Intel(R) Core(TM) i7-3740QM, supports the AVX instruction set. I created two environments with Anaconda 4.5.0:

  • tf_avx: has tf installed with tensorflow-windows-wheel/1.5.0/py36/CPU/avx/tensorflow-1.5.0-cp36-cp36m-win_amd64.whl

  • tf_wo_simd: has tf installed with pip install tensorflow==1.2.0
    (I selected this version to be sure I was installing a build without SIMD; when the environment is activated, I can see the SIMD warnings printed by tf, as the sketch after this list shows)
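(A quick way to surface those warnings, assuming the TF 1.x API: creating any session runs the CPU feature guard, which logs the SIMD instruction sets the binary was, or was not, compiled to use.)

    import tensorflow as tf

    # In TF 1.x, creating a session runs the CPU feature guard, which logs
    # warnings about SIMD instruction sets (e.g. AVX) that the CPU supports
    # but the binary was not compiled to use.
    with tf.Session() as sess:
        print(sess.run(tf.constant("cpu feature guard check")))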

I ran the same code, evaluating a simple network with two fully connected layers, in each of the environments, and couldn't see any time improvement between the two. I should add that this came after I tried a more complicated network with a few conv layers, where no improvement was visible either.
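(A minimal timing harness along these lines, assuming the TF 1.x API; the layer sizes, batch size, and iteration count are illustrative:)

    import time
    import numpy as np
    import tensorflow as tf

    # Two fully connected layers; sizes are illustrative.
    x = tf.placeholder(tf.float32, [None, 784])
    h = tf.layers.dense(x, 512, activation=tf.nn.relu)
    y = tf.layers.dense(h, 10)

    data = np.random.rand(256, 784).astype(np.float32)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(y, {x: data})  # warm-up run, excluded from timing
        start = time.time()
        for _ in range(100):
            sess.run(y, {x: data})
        # 100 iterations -> elapsed * 1000 / 100 = elapsed * 10 ms each
        print("avg forward pass: %.2f ms" % ((time.time() - start) * 10))

Running the same script in both environments gives a per-iteration number to compare.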

Did I miss something?
Thank you for your help

@fo40225
Owner

fo40225 commented Apr 24, 2018

Indeed.

TensorFlow is currently built with CMake on Windows, and its SIMD configuration has an issue: the options are not applied to the submodules.

You can use this script to compare the speed.

https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks
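For example, a CPU-only run (flag names as of the 2018-era script; check its --help for your checkout) might look like:

    python tf_cnn_benchmarks.py --device=cpu --data_format=NHWC --model=alexnet --batch_size=32 --num_batches=100

Running it once per environment and comparing the reported images/sec gives a like-for-like measurement.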

In my observation, there is only a slight difference between the results with and without AVX on Windows, and it does not reach the improvement shown in the following table.

https://www.tensorflow.org/performance/performance_guide#comparing_compiler_optimizations

@sharon-k
Author

Thank you for your reply!

Are there plans to address this issue with CMake? Is there an open bug report? If not, could you briefly describe it; maybe I'd be able to help... :)

@fo40225
Owner

fo40225 commented Apr 26, 2018

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/cmake/external

Those CMake files don't pass tensorflow_WIN_CPU_SIMD_OPTIONS when building the libraries.
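Roughly, a fix would mean forwarding those options into each ExternalProject_Add call. A sketch of the idea only, with a hypothetical dependency name and URL, not the actual patch:

    include(ExternalProject)
    # Sketch: forward the SIMD compiler options (e.g. /arch:AVX) into the
    # external project's own CMake invocation, so the sub-library is built
    # with the same instruction set as TensorFlow itself.
    ExternalProject_Add(example_dep  # hypothetical dependency
      URL https://example.com/example_dep.tar.gz
      CMAKE_CACHE_ARGS
        "-DCMAKE_CXX_FLAGS:STRING=${CMAKE_CXX_FLAGS} ${tensorflow_WIN_CPU_SIMD_OPTIONS}")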

@fo40225
Owner

fo40225 commented May 4, 2018

fo40225/tensorflow@0a95e35

I tried building with those changes, but there is still no performance difference from SSE2 to AVX2.

It seems this issue is caused by something else.

@TheRedMudder

TheRedMudder commented Jun 13, 2018

In my testing, I found that the GPU build without AVX2 for the CPU outperformed the GPU build using AVX2! I thought AVX2 would give performance gains for operations that only have CPU implementations, but in my test AVX2 marginally decreased performance when the GPU flag was used. Of course, AVX2 did improve performance when no GPU flag was used. Have you seen similar results?

Benchmark Test

  • 1st place: GPU + No AVX2
  • 2nd place: GPU + AVX2
  • 3rd place: No GPU + AVX2
  • 4th place: No GPU + No AVX2

Test Specifications

The benchmarks were run on a Windows 10 system with an Intel i5-8400, 16 GB of RAM, and an NVIDIA GeForce GTX 1080 Ti with 11 GB of dedicated memory. The code and the video it processed were the same; only the AVX2 and GPU support in the TensorFlow build differed. YOLOv2 with TensorFlow as the backend was used for benchmarking, with the following command: python flow --model cfg/yolo.cfg --load bin/yolo.weights --demo ../video/Ron.mp4 --gpu .8 --saveVideo

Questions

  • I understand why AVX2 outperforms no AVX2 when no GPU is used, but why does AVX2 cause a marginal performance decrease when the GPU is used?
  • Do you see similar marginal performance decreases when using AVX2 with the GPU compared to not using AVX2 with the GPU?

Also, thanks @fo40225 for this repo, it is an awesome time saver!
Edits: Spelling

@fo40225
Owner

fo40225 commented Jun 14, 2018

@TheRedMudder I saw that the "GPU + AVX2" case contains an "out of VRAM" error. You should recheck your benchmark script.

If you can add my SSE2 version to the benchmark, it will clarify the results further (SSE2 vs AVX2, official vs custom build).
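(For reference, darkflow's --gpu .8 maps to TensorFlow's per-process GPU memory fraction; lowering it, or enabling memory growth, is the usual TF 1.x way to avoid out-of-VRAM errors. A sketch, assuming direct use of the session API:)

    import tensorflow as tf

    # Cap TensorFlow at a fraction of GPU memory and let the allocation
    # grow on demand instead of grabbing it all up front.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.8
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)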

@GuyTraveler

Any updates regarding this issue? I was testing the inference speed difference between the multiple optimized binaries for Windows and at best noticed a 20 ms improvement, which certainly does not measure up to expectations.
