
Enable XLA support for Tensorflow #122

Open · Randl opened this issue Feb 20, 2017 · 2 comments

Randl commented Feb 20, 2017

XLA can significantly increase computation speed.

I tried to measure the speed-up, but unfortunately didn't get significant results:

$ python3 benchmark_vgg.py --batch_size 4000
WARNING:tensorflow:From benchmark_vgg.py:184: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2017-02-20 17:14:06.476874: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-20 17:14:06.476908: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-02-20 17:14:08.569973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Graphics Device
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-02-20 17:14:08.570458: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3eccad0
2017-02-20 17:14:09.183220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Graphics Device
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:41:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-02-20 17:14:09.183512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
2017-02-20 17:14:09.183570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
2017-02-20 17:14:09.183633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
2017-02-20 17:14:09.183658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y N
2017-02-20 17:14:09.183675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   N Y
2017-02-20 17:14:09.183905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:04:00.0)
2017-02-20 17:14:09.184090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Graphics Device, pci bus id: 0000:41:00.0)
2017-02-20 17:14:09.749515: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 2 visible devices
2017-02-20 17:14:09.749669: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 96 visible devices
2017-02-20 17:14:09.794250: I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
2017-02-20 17:14:09.794375: I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): <undefined>, <undefined>
2017-02-20 17:14:09.794871: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 2 visible devices
2017-02-20 17:14:09.794890: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 96 visible devices
2017-02-20 17:14:09.826939: I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
2017-02-20 17:14:09.827028: I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): Graphics Device, Compute Capability 6.0
2017-02-20 17:14:09.827054: I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (1): Graphics Device, Compute Capability 6.0
2017-02-20 17:14:14.149286: step 0, duration = 0.000
2017-02-20 17:14:14.152526: step 10, duration = 0.000
2017-02-20 17:14:14.155913: step 20, duration = 0.000
2017-02-20 17:14:14.158968: step 30, duration = 0.000
2017-02-20 17:14:14.161953: step 40, duration = 0.000
2017-02-20 17:14:14.165289: step 50, duration = 0.001
2017-02-20 17:14:14.168046: step 60, duration = 0.000
2017-02-20 17:14:14.172249: step 70, duration = 0.000
2017-02-20 17:14:14.174981: step 80, duration = 0.000
2017-02-20 17:14:14.177259: step 90, duration = 0.000
2017-02-20 17:14:14.179223: Forward across 100 steps, 0.000 +/- 0.000 sec / batch
2017-02-20 17:14:15.127072: step 0, duration = 0.006
2017-02-20 17:14:15.193918: step 10, duration = 0.006
2017-02-20 17:14:15.258036: step 20, duration = 0.006
2017-02-20 17:14:15.311999: step 30, duration = 0.006
2017-02-20 17:14:15.364200: step 40, duration = 0.005
2017-02-20 17:14:15.416405: step 50, duration = 0.005
2017-02-20 17:14:15.470125: step 60, duration = 0.006
2017-02-20 17:14:15.508636: step 70, duration = 0.003
2017-02-20 17:14:15.542784: step 80, duration = 0.003
2017-02-20 17:14:15.576780: step 90, duration = 0.003
2017-02-20 17:14:15.607214: Forward-backward across 100 steps, 0.005 +/- 0.001 sec / batch

(I used a P100 for these measurements.)
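For context, the timing loop is the usual per-step sess.run pattern. A minimal sketch of that kind of loop (TF 1.x session API; `time_run` and `target_op` are hypothetical stand-ins, not the actual benchmark_vgg.py code):

  import time
  import tensorflow as tf

  def time_run(sess, target_op, num_steps=100):
      """Time sess.run(target_op) per step (hypothetical helper)."""
      durations = []
      for i in range(num_steps + 10):
          start = time.time()
          sess.run(target_op)  # fetching the op forces the graph to execute
          if i >= 10:  # skip warm-up steps, which include XLA compilation time
              durations.append(time.time() - start)
      mean = sum(durations) / len(durations)
      sd = (sum((d - mean) ** 2 for d in durations) / len(durations)) ** 0.5
      print('%d steps, %.3f +/- %.3f sec / batch' % (num_steps, mean, sd))

With XLA the first few sess.run calls trigger JIT compilation, so warm-up steps matter; near-zero forward times like the ones above can also indicate that the fetched op doesn't force the full forward pass to run.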

@aodhan-domhnaill

Could you post your benchmark code?

Randl (Author) commented Mar 7, 2017

  import tensorflow as tf

  config = tf.ConfigProto()

  # Turn on XLA JIT compilation for the whole session.
  config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

  run_metadata = tf.RunMetadata()
  sess = tf.Session(config=config)
  tf.global_variables_initializer().run(session=sess)

I've added these lines to enable XLA.
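For finer-grained control, XLA could also be applied to just part of the graph via the experimental scope in tf.contrib (available in TF 1.x builds compiled with XLA support). A minimal sketch, where the shapes and variables are made up for illustration:

  import tensorflow as tf
  from tensorflow.contrib.compiler import jit

  x = tf.placeholder(tf.float32, shape=[None, 4096])  # hypothetical input
  w = tf.Variable(tf.random_normal([4096, 1000]))

  # Only ops created inside this scope are marked for XLA JIT compilation,
  # instead of setting the global JIT level for the whole session.
  with jit.experimental_jit_scope():
      logits = tf.matmul(x, w)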
