
GTA: Generating high-performance tensorized program with dual-task scheduling

This repo is based on TVM v0.14.0 and reuses some code from AMOS.

Install | Tutorials | Cite

What is GTA

GTA is a framework designed to generate high-performance tensorized programs for deep learning accelerators (DLAs). Unlike existing deep learning compilers, GTA coordinates an intrinsic-based mapping abstraction with a rule-based program generation strategy, and then applies resource-constrained rules to eliminate ineffective tensor program candidates from the search space. Additionally, GTA employs a dual-task scheduling strategy to allocate tuning resources across the subgraphs of a deep learning network and their mapping candidates.
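
To make the dual-task scheduling idea concrete, below is a minimal Python sketch of one way a fixed tuning budget could be spread across (subgraph, mapping candidate) pairs, re-allocating each round toward the pair whose best latency is still improving. All names here (Task, tune_round, dual_task_schedule) are hypothetical illustrations, not GTA's actual API.

import random

class Task:
    """One tuning task: a network subgraph paired with one intrinsic
    mapping candidate (hypothetical structure, for illustration)."""
    def __init__(self, subgraph, mapping):
        self.subgraph = subgraph
        self.mapping = mapping
        self.best_latency = float("inf")
        self.last_gain = float("inf")  # inf marks an unexplored task

def tune_round(task, trials):
    # Stand-in for one tuning round: GTA would generate, compile, and
    # measure `trials` candidate programs here; we only simulate a latency.
    return random.uniform(0.5, 1.0)

def dual_task_schedule(tasks, total_trials, round_size=64):
    spent = 0
    while spent < total_trials:
        # Greedily pick the task with the largest recent improvement,
        # so tuning effort flows to subgraphs/mappings that still pay off.
        task = max(tasks, key=lambda t: t.last_gain)
        prev = task.best_latency
        latency = tune_round(task, round_size)
        task.best_latency = min(prev, latency)
        # The first measurement counts as its own gain, so every task
        # gets at least one round before greedy re-allocation kicks in.
        task.last_gain = latency if prev == float("inf") else prev - task.best_latency
        spent += round_size
    return {(t.subgraph, t.mapping): t.best_latency for t in tasks}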

Install

GTA requires the following dependencies:

  • LLVM (recommended >= 15)
  • CUDA (recommended version: 11.6)
  • Python (recommended version: 3.7.16)
  • Conda (Miniconda recommended)

1. Download the source code

cd ~
git clone https://github.com/0to1boy/tvm.git

2. Prepare the conda environment

We recommend using Miniconda to manage the dependencies; see the Miniconda official website.

2.1 Create a new conda environment

conda create -n tvm-build python=3.7.16
conda activate tvm-build

2.2 Install the dependencies

sudo apt-get install -y libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
conda install conda-build git llvmdev numpy pytest cython cmake bzip2 make scipy pillow 
pip install -i https://mirrors.ustc.edu.cn/pypi/web/simple decorator attrs typing-extensions tornado psutil 'xgboost>=1.1.0' cloudpickle pebble ml_dtypes pytest-order pylint appdirs ninja

3. Configure and Build

mkdir build
cd build
cp ../cmake/config.cmake .
  1. Edit build/config.cmake to customize the compilation options:
  • Change set(USE_CUDA OFF) to set(USE_CUDA ON) to enable the CUDA backend.
  2. TVM requires LLVM for CPU code generation, so building with LLVM is recommended. If you have installed llvmdev via conda, no further installation is required; otherwise, download and build the appropriate version from LLVM releases.
  • Simply set set(USE_LLVM ON) to have CMake search for an available LLVM version (the resulting lines are shown below).
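
After these edits, the relevant lines in build/config.cmake should read:

set(USE_CUDA ON)
set(USE_LLVM ON)
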
cmake .. -G Ninja
ninja

If you are not familiar with TVM, stick to the steps above to configure config.cmake; otherwise, jump straight to the cmake step. We recommend referring to the TVM documentation for details.

Export environment variables

export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:$PYTHONPATH
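
To verify the build and the environment variables, try importing TVM from Python (assuming the build completed without errors, this should print the TVM version string):

python -c "import tvm; print(tvm.__version__)"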

4. Common build errors and solutions

  1. version 'GLIBCXX_3.4.30' not found
~/anaconda3/envs/tvm-build/lib$ rm libstdc++.so
~/anaconda3/envs/tvm-build/lib$ rm libstdc++.so.6
~/anaconda3/envs/tvm-build/lib$ ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so
~/anaconda3/envs/tvm-build/lib$ ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so.6
  2. No module named 'torch'
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
  3. No module named 'sklearn'
conda install scikit-learn

Tutorials

We have placed the experimental test cases in the benchmark folder; run them with the commands below. For example, the single-operator conv2d test for GTA is located under benchmark/GTA/single_op/conv2d.

GPU

cd benchmark/GTA/single_op/conv2d
python mapping_conv2d_GTA.py --in_dtype float16 --out_dtype float16 --begin 0 --num 1 --trials 200

CPU

When running on the CPU, change target = "cuda" to target = "llvm -mcpu=skylake-avx512" in the test file, as shown below. Note that this requires a CPU that supports AVX-512 instructions; otherwise, execution will fail.
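
Concretely, the edit in the test file looks like this (the exact location varies by file):

# Replace the GPU target with an AVX-512 CPU target:
# target = "cuda"
target = "llvm -mcpu=skylake-avx512"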

cd benchmark/GTA/single_op/conv2d
python mapping_conv2d_GTA.py --in_dtype float16 --out_dtype float16 --begin 0 --num 1 --trials 200

Example run commands for the other test functions are provided in the corresponding test files.

Cite us

@article{xie2025gta,
  title={GTA: Generating high-performance tensorized program with dual-task scheduling},
  author={Xie, Anxing and Hu, Yonghua and Wang, Yaohua and Li, Zhe and Gao, Yuxiang and Cheng, Zenghua},
  journal={Journal of Systems Architecture},
  pages={103359},
  year={2025},
  publisher={Elsevier}
}
