XLA can significantly increase computation speed.
I tried to measure the speed-up, but unfortunately didn't manage to get significant results:
```
$ python3 benchmark_vgg.py --batch_size 4000
WARNING:tensorflow:From benchmark_vgg.py:184: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2017-02-20 17:14:06.476874: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-20 17:14:06.476908: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-02-20 17:14:08.569973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Graphics Device
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-02-20 17:14:08.570458: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3eccad0
2017-02-20 17:14:09.183220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Graphics Device
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:41:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-02-20 17:14:09.183512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
2017-02-20 17:14:09.183570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
2017-02-20 17:14:09.183633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
2017-02-20 17:14:09.183658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y N
2017-02-20 17:14:09.183675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: N Y
2017-02-20 17:14:09.183905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:04:00.0)
2017-02-20 17:14:09.184090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Graphics Device, pci bus id: 0000:41:00.0)
2017-02-20 17:14:09.749515: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 2 visible devices
2017-02-20 17:14:09.749669: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 96 visible devices
2017-02-20 17:14:09.794250: I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
2017-02-20 17:14:09.794375: I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): <undefined>, <undefined>
2017-02-20 17:14:09.794871: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 2 visible devices
2017-02-20 17:14:09.794890: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 96 visible devices
2017-02-20 17:14:09.826939: I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
2017-02-20 17:14:09.827028: I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): Graphics Device, Compute Capability 6.0
2017-02-20 17:14:09.827054: I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (1): Graphics Device, Compute Capability 6.0
2017-02-20 17:14:14.149286: step 0, duration = 0.000
2017-02-20 17:14:14.152526: step 10, duration = 0.000
2017-02-20 17:14:14.155913: step 20, duration = 0.000
2017-02-20 17:14:14.158968: step 30, duration = 0.000
2017-02-20 17:14:14.161953: step 40, duration = 0.000
2017-02-20 17:14:14.165289: step 50, duration = 0.001
2017-02-20 17:14:14.168046: step 60, duration = 0.000
2017-02-20 17:14:14.172249: step 70, duration = 0.000
2017-02-20 17:14:14.174981: step 80, duration = 0.000
2017-02-20 17:14:14.177259: step 90, duration = 0.000
2017-02-20 17:14:14.179223: Forward across 100 steps, 0.000 +/- 0.000 sec / batch
2017-02-20 17:14:15.127072: step 0, duration = 0.006
2017-02-20 17:14:15.193918: step 10, duration = 0.006
2017-02-20 17:14:15.258036: step 20, duration = 0.006
2017-02-20 17:14:15.311999: step 30, duration = 0.006
2017-02-20 17:14:15.364200: step 40, duration = 0.005
2017-02-20 17:14:15.416405: step 50, duration = 0.005
2017-02-20 17:14:15.470125: step 60, duration = 0.006
2017-02-20 17:14:15.508636: step 70, duration = 0.003
2017-02-20 17:14:15.542784: step 80, duration = 0.003
2017-02-20 17:14:15.576780: step 90, duration = 0.003
2017-02-20 17:14:15.607214: Forward-backward across 100 steps, 0.005 +/- 0.001 sec / batch
```
(I used a P100 GPU for these measurements.)
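Forward durations of 0.000 sec, as in the log above, usually mean the timed call returns before real work is measured (e.g. no warm-up, or the fetched op does almost nothing). As a sanity check independent of TensorFlow, here is a minimal timing sketch with explicit warm-up; `step_fn` is a hypothetical callable standing in for one benchmark step:

```python
import statistics
import time

def time_per_step(step_fn, num_steps=100, warmup=10):
    """Return (mean, stddev) of per-step wall time in seconds."""
    for _ in range(warmup):
        step_fn()  # warm-up iterations, excluded from the statistics
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.pstdev(durations)
```

In a TF 1.x benchmark, `step_fn` could be something like `lambda: sess.run(target_op)` (names illustrative); if the mean is still near zero, the fetched op probably isn't forcing the computation you think it is.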
Could you post your benchmark code?
```python
config = tf.ConfigProto()
# Turns on XLA JIT compilation.
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
run_metadata = tf.RunMetadata()
sess = tf.Session(config=config)
tf.global_variables_initializer().run(session=sess)
```
I've added these lines to enable XLA.
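Besides the session-wide `global_jit_level` flag, TF 1.x also lets you restrict JIT compilation to part of the graph, which can help isolate where (or whether) XLA actually wins. A configuration sketch only, assuming TF 1.x with `tf.contrib` available; the placeholder and layer names are illustrative, not from the benchmark script:

```python
import tensorflow as tf
from tensorflow.contrib.compiler import jit

x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name="images")

# Only ops created inside this scope are marked for XLA compilation.
with jit.experimental_jit_scope():
    y = tf.layers.conv2d(x, filters=64, kernel_size=3, name="conv1")
```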