Performance improvement is not achieved with AVX #8
Indeed. Currently, the cmake-based TensorFlow build on Windows has an issue in its SIMD configuration: the flags are not applied to the submodules. You can use this script to compare the speed: https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks In my observation, there is only a slight difference between building with and without AVX on Windows, nowhere near the improvement shown in the following table: https://www.tensorflow.org/performance/performance_guide#comparing_compiler_optimizations
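The comparison methodology behind tf_cnn_benchmarks (warm up, then time a fixed number of steps and report a rate) can be sketched in pure Python. This is only an illustrative timing harness, not the benchmark script itself; `matmul_step` is a stand-in workload, not a TensorFlow op:

```python
import time

def steps_per_sec(step_fn, warmup=3, steps=10):
    """Run step_fn a few times untimed, then time `steps` calls."""
    for _ in range(warmup):          # warm caches / allocators first
        step_fn()
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return steps / elapsed

# Illustrative stand-in for one training/inference step: a small
# dense matmul in pure Python.
def matmul_step(n=64):
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*b)] for row in a]

if __name__ == "__main__":
    print("steps/sec: %.1f" % steps_per_sec(matmul_step))
```

Running the same harness under both wheels (SSE2 vs AVX) on an identical workload is what makes the steps/sec numbers comparable.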
Thank you for your reply! Are there plans to address this issue with cmake? Is there an open bug report? If not, could you briefly describe it? Maybe I'd be able to help. :)
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/cmake/external These cmake files don't pass tensorflow_WIN_CPU_SIMD_OPTIONS when building the libraries.
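For context, `tensorflow_WIN_CPU_SIMD_OPTIONS` is supplied at configure time; the problem described above is that the external-project cmake files don't forward it. A hypothetical configure invocation (the exact generator and cache-variable plumbing are assumptions, not verified against the repo) might look like:

```shell
REM Hypothetical: configure the TensorFlow cmake build with AVX2 enabled.
REM tensorflow_WIN_CPU_SIMD_OPTIONS is the flag named in this thread;
REM the fix discussed here is making the external projects inherit it.
cmake .. -A x64 ^
  -Dtensorflow_WIN_CPU_SIMD_OPTIONS="/arch:AVX2"
```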
I tried building with those changes, but there is still no performance difference between SSE2 and AVX2. This issue seems to have another cause.
In my testing, I found that GPU with a non-AVX2 CPU build outperformed GPU with an AVX2 build! I expected AVX2 to give performance gains for operations that only have CPU implementations, but in my test AVX2 marginally decreased performance when the GPU flag was used. AVX2 did, of course, improve performance when no GPU flag was used. Have you seen similar results?

Benchmark Test

Test Specifications
The benchmarks were run on a Windows 10 system with an Intel i5-8400, 16 GB RAM, and an NVIDIA GeForce GTX 1080 Ti with 11 GB of dedicated memory. The code and the video being processed were identical; only the AVX2 and GPU support in the TensorFlow build differed. YOLOv2 with TensorFlow as the backend was used for benchmarking with the following command:

Questions
Also, thanks @fo40225 for this repo, it is an awesome time saver!
@TheRedMudder I saw that the "GPU + AVX2" case contains an "out of vram" error. You should recheck your benchmark script. If you can add my SSE2 version to the benchmark, it will clarify the results further (SSE2 vs AVX2, official vs custom build).
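An out-of-memory run silently skews a timing comparison, so it is worth scanning benchmark logs for OOM markers before trusting the numbers. A small hypothetical helper (the marker strings are assumptions based on the "out of vram" error mentioned above and common TensorFlow messages):

```python
# Hypothetical log-checking helper, not part of any benchmark script.
OOM_MARKERS = ("out of vram", "out of memory", "OOM when allocating")

def find_oom_lines(log_text):
    """Return the log lines that suggest the run ran out of memory."""
    return [line for line in log_text.splitlines()
            if any(m.lower() in line.lower() for m in OOM_MARKERS)]

sample = "step 10: 42 img/sec\nE tensorflow: out of vram\nstep 20: 41 img/sec"
print(find_oom_lines(sample))  # only the OOM line
```

Any run whose log trips this check should be discarded from the SSE2-vs-AVX2 comparison.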
Any updates on this issue? I was testing the inference-speed difference between the various optimized binaries for Windows and at best noticed a 20 ms improvement, which certainly does not live up to expectations.
My processor, an Intel(R) Core(TM) i7-3740QM, supports the AVX instruction set. I created two environments with Anaconda 4.5.0:
tf_avx: has tf installed with tensorflow-windows-wheel/1.5.0/py36/CPU/avx/tensorflow-1.5.0-cp36-cp36m-win_amd64.whl
tf_wo_simd: has tf installed with pip install tensorflow==1.2.0
(I selected this version to ensure that I'm installing a build without SIMD support; when this environment is activated, I can see tf's warnings about the missing SIMD instructions.)
I ran the same code, evaluating a simple network with two fully connected layers, in each of the environments, and couldn't see any time improvement between the two. The same holds for a more complicated network with a few conv layers; no improvement was seen there either.
Did I miss something?
Thank you for your help.
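One possible factor: a small two-layer network is often dominated by per-step Python and session overhead rather than by compute, so SIMD gains can be invisible. A pure-Python sketch of a fairer A/B comparison (illustrative only; the two `*_step` workloads are stand-ins, not the networks from this comment):

```python
import statistics
import time

def median_runtime(fn, warmup=2, repeats=5):
    """Warm up first, then return the median of several timed runs,
    so one-off startup cost doesn't hide (or fake) a SIMD gain."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# Overhead-dominated stand-in (like a tiny two-layer network) vs a
# compute-bound stand-in; compare both before concluding AVX does nothing.
def tiny_step():
    return sum(i * 0.5 for i in range(1_000))

def bigger_step():
    return sum(i * 0.5 for i in range(100_000))

if __name__ == "__main__":
    print("tiny step:   %.6f s" % median_runtime(tiny_step))
    print("bigger step: %.6f s" % median_runtime(bigger_step))
```

Running this under each wheel (tf_avx vs tf_wo_simd) and comparing medians of the compute-bound case is more likely to surface any real AVX effect.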