Setting up the system

Tianqi Tang edited this page Jun 13, 2018 · 17 revisions

Standalone Version

The standalone version does not require a TensorFlow environment. We wrote a simple tensor wrapper and buffer wrapper to mimic the behavior of TensorFlow. To access the standalone version, first download our code:

$ git clone https://github.com/miglopst/cs263_spring2018.git

The code is located in the folder:

cs263_spring2018/tensorflow/tensorgc/

To compile the code, run make in the above folder. Two tests are available. To run the linear allocation test, open main.cc and comment out the random initialization test:

  std::cout << "===start random initialization ===" << std::endl;
  random_initialization_test();
  std::cout << "===end random initialization ===" << std::endl;

To run the random allocation test, open main.cc and comment out the linear initialization test:

  std::cout << "===start linear initialization ===" << std::endl;
  linear_initialization_test();
  std::cout << "===end linear initialization ===" << std::endl;

To run the code, use:

$ ./tracing > output.log

To output debug information, use:

$ export DEBUG_FLAG=X

X indicates which part of TensorGC to debug. X=0 is main.cc; X=1 is tensor.cc; X=2 is buffer.cc; X=3 is roottracer.cc; X=4 is buftracer.cc.
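
The flag-to-file mapping above can be wrapped in a small helper so that each log file records which component was traced. This is only a sketch: debug_target and run_with_debug are hypothetical helper names, not part of TensorGC, and the per-flag log file name is an assumption. The flag values and file names come from the list above.

```shell
# Hypothetical helper: translate a DEBUG_FLAG value into the source file it targets.
debug_target() {
  case "$1" in
    0) echo "main.cc" ;;
    1) echo "tensor.cc" ;;
    2) echo "buffer.cc" ;;
    3) echo "roottracer.cc" ;;
    4) echo "buftracer.cc" ;;
    *) echo "unknown DEBUG_FLAG: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: run ./tracing with DEBUG_FLAG set for that one command only.
run_with_debug() {
  target="$(debug_target "$1")" || return 1
  echo "debugging ${target} (DEBUG_FLAG=$1)"
  if [ -x ./tracing ]; then
    # Setting the variable on the command line avoids leaving it exported in the shell.
    DEBUG_FLAG="$1" ./tracing > "output_flag$1.log"
  else
    echo "./tracing not found; run make first" >&2
  fi
}
```

For example, run_with_debug 3 would trace roottracer.cc and write output_flag3.log, provided ./tracing has been built.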

Integrated Version

Requirement

To run the integrated TensorGC code, you need to download and configure our Docker image, then download and build our GitHub code. You also need a GPU whose compute capability is supported by TensorFlow. Please check TensorFlow's website for more details.

Setup from docker image

We use a docker image for ubuntu 16.09 and CUDA 9. To get our docker image, run:

$ docker pull gupeng/tensorflow1-7

You can check the downloaded image name by:

$ docker images

You can check the names of running containers by:

$ docker container ps

Create a working directory on your host OS and download our TensorGC code (integrated with TensorFlow):

$ git clone https://github.com/miglopst/cs263_spring2018.git

Then start the Docker container (note that the working directory on your host OS should already contain the TensorFlow code):

$ nvidia-docker run -it -v [working directory in your current OS]:[target directory in the docker container] [image name]
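
For illustration, the placeholders might be filled in as follows. The host path and the container mount point are assumptions chosen for this sketch; only the image name comes from the docker pull step above. The script prints the command instead of running it, so it can be checked first.

```shell
# Assumed locations; adjust both to your own setup.
HOST_DIR="${HOME}/tensorgc_work/cs263_spring2018"   # where the repository was cloned (assumption)
CONTAINER_DIR="/workspace/cs263_spring2018"         # mount point inside the container (assumption)
IMAGE="gupeng/tensorflow1-7"                        # image pulled earlier

# Assemble the command; -v takes host_path:container_path.
DOCKER_CMD="nvidia-docker run -it -v ${HOST_DIR}:${CONTAINER_DIR} ${IMAGE}"
echo "${DOCKER_CMD}"
```

Inside the container, the repository would then be visible under the chosen mount point.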

Once the container is running, build TensorFlow with TensorGC. Go to the root directory of the downloaded GitHub code (the target directory in the Docker container) and configure TensorFlow using:

$ ./configure

During configuration, we disable all unnecessary settings except CUDA support. After configuration, we can build TensorFlow using bazel:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

We can also build a debug version which has more debugging information:

$ bazel build --config=cuda --compilation_mode=dbg --strip=never //tensorflow/tools/pip_package:build_pip_package

Then we build the package:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Before installing the TensorFlow package, uninstall the old version:

$ pip uninstall tensorflow

Finally, we install the tensorflow package:

$ pip install /tmp/tensorflow_pkg/[THE NEWEST BUILT TENSORFLOW PACKAGE]
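
The bracketed placeholder above stands for the package file produced by the build step. One way to find it, assuming the build wrote .whl files into /tmp/tensorflow_pkg, is to pick the most recently modified one. newest_wheel is a hypothetical helper name, and the sketch only prints the install command rather than running it.

```shell
# Hypothetical helper: print the most recently modified .whl file in a directory.
newest_wheel() {
  # ls -t sorts by modification time, newest first.
  ls -t "$1"/*.whl 2>/dev/null | head -n 1
}

PKG_DIR="/tmp/tensorflow_pkg"
WHEEL="$(newest_wheel "${PKG_DIR}")"
if [ -n "${WHEEL}" ]; then
  echo "pip install ${WHEEL}"
else
  echo "no wheel found in ${PKG_DIR}" >&2
fi
```

Sorting by modification time is a heuristic; if several builds ran close together, check the printed file name before installing.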

How to run evaluation benchmarks

Four benchmarks are available:

  • LeNet on MNIST, exploring the batch size:
$ python tf_example/tuturials/mnist/mnist_deep.py --batch_size [128/256/512/1024]
  • 4-layer DNN on MNIST, exploring the batch size:
$ python tf_example/tuturials/mnist/mnist_deep.py --batch_size [128/256/512/1024]
  • ResNet on MNIST:
$ python tf_example/tuturials/mnist_ensemble/train.py --model_name [resnet]
  • VGGNet on MNIST:
$ python tf_example/tuturials/mnist_ensemble/train.py --model_name [vggnet]
  • To try a different GC threshold, you currently have to modify the source code. Set the GC threshold on line 679 of tf_core/framework/tensor.cc:
BufTracer<TensorBuffer> TensorBuffer::buf_tracer = BufTracer<TensorBuffer>(1000*1024*1024);

Currently, the argument is left blank, so the default value of the BufTracer constructor is used (see tf_core/tensor_gc/buf_tracer.h). After modifying the value, you need to rebuild the project before running it.
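
As a quick check on the constructor argument shown above, the expression 1000*1024*1024 can be evaluated in the shell. Whether BufTracer interprets the value as bytes is an assumption here, since this page only shows the number. The 512 value is a hypothetical alternative, not a recommended setting.

```shell
# Evaluate the threshold expression from tensor.cc.
THRESHOLD=$((1000 * 1024 * 1024))
echo "threshold = ${THRESHOLD}"

# Assumption: if the unit is bytes, this converts the value to MiB.
THRESHOLD_MIB=$((THRESHOLD / 1024 / 1024))
echo "in MiB    = ${THRESHOLD_MIB}"

# A hypothetical smaller threshold, written the same way.
SMALLER=$((512 * 1024 * 1024))
echo "512 * 1024 * 1024 = ${SMALLER}"
```

When choosing a larger value, note that C++ evaluates an expression of plain integer literals as int, so 2048*1024*1024 or more would overflow where int is 32 bits wide; adding an LL suffix to one operand is one way to avoid that, provided the constructor parameter is wide enough.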

Get Debugging Information

We provide two ways to gather debugging information. The first uses std::cout to print information to stdout, and the second uses LOG(ERROR) to print information to stderr.

To collect debugging information, run an example script in tf_example/tutorials/mnist:

$ python mnist_deep.py > log.txt 2>err.txt

Here log.txt captures the stdout output and err.txt captures the stderr output.
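
The same redirection can be repeated for each batch size listed in the benchmarks, keeping a separate pair of log files per run. The sketch below only prints the commands (a dry run). The per-run file names are hypothetical, and sweep_commands is an illustrative helper name, not part of the repository.

```shell
# Print one command per batch size, each with its own stdout and stderr files.
# Batch sizes and script name come from this page; the log file names are assumptions.
sweep_commands() {
  for bs in 128 256 512 1024; do
    echo "python mnist_deep.py --batch_size ${bs} > log_bs${bs}.txt 2> err_bs${bs}.txt"
  done
}

sweep_commands
```

To execute the runs instead of printing them, the output can be piped to a shell from within tf_example/tutorials/mnist, for example sweep_commands | sh.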

Profile the Debugging Information

  • To process log.txt, run profile_log.py.
  • To process err.txt, run profile_err.py.