This repo is based on TVM v0.14.0 and reuses some code from AMOS.
GTA is a framework designed to generate high-performance tensorized programs for DLAs. Unlike existing deep learning compilers, GTA coordinates an intrinsic-based mapping abstraction with a rule-based program generation strategy, and then applies resource-constrained rules to eliminate ineffective tensor program candidates from the search space. Additionally, GTA employs a dual-task scheduling strategy to allocate tuning resources across multiple subgraphs of deep learning networks and their mapping candidates.
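As a rough illustration of the dual-task scheduling idea (a minimal sketch, not GTA's actual implementation; the function name and the proportional-allocation rule are assumptions), one can think of splitting a fixed tuning budget across subgraphs according to their estimated latency contribution:

```python
# Hypothetical sketch of dual-task budget allocation. The proportional rule
# below is illustrative; GTA's real scheduler is more sophisticated.
def allocate_trials(subgraph_costs, total_trials):
    """Split a tuning budget across subgraphs in proportion to their
    estimated cost (e.g., measured latency share)."""
    total_cost = sum(subgraph_costs.values())
    return {
        name: max(1, round(total_trials * cost / total_cost))
        for name, cost in subgraph_costs.items()
    }

# Example: three subgraphs with estimated latencies in milliseconds.
budget = allocate_trials({"conv2d_1": 6.0, "conv2d_2": 3.0, "dense_1": 1.0}, 200)
```

Subgraphs that dominate end-to-end latency receive more tuning trials, which is the intuition behind allocating resources per subgraph rather than uniformly.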
GTA requires the following dependencies:
- LLVM (recommended >= 15)
- CUDA (recommended version: 11.6)
- Python (recommended version: 3.7.16)
- Conda (recommended miniconda)
cd ~
git clone https://github.com/0to1boy/tvm.git
We recommend using Miniconda to manage the dependencies (see the Miniconda official website).
conda create -n tvm-build python=3.7.16
conda activate tvm-build
sudo apt-get install -y libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
conda install conda-build git llvmdev numpy pytest cython cmake bzip2 make scipy pillow
pip install -i https://mirrors.ustc.edu.cn/pypi/web/simple decorator attrs typing-extensions tornado psutil 'xgboost>=1.1.0' cloudpickle pebble ml_dtypes pytest-order pylint appdirs ninja
mkdir build
cd build
cp ../cmake/config.cmake .
- Edit build/config.cmake to customize the compilation options
- Change set(USE_CUDA OFF) to set(USE_CUDA ON) to enable the CUDA backend
- TVM requires LLVM for CPU code generation, so it is recommended to build with LLVM. If you have installed llvmdev via conda, no further installation is required. Otherwise, you can download and build the appropriate version from LLVM releases.
- Simply set set(USE_LLVM ON) to have CMake search for an available LLVM version
cmake .. -G Ninja
ninja
If you are not familiar with TVM, please follow the steps above to configure config.cmake; otherwise, jump straight to the cmake step. We recommend referring to the TVM documentation for details.
Export environment variables
export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:$PYTHONPATH
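If you prefer to configure the path from within Python (for example, in a notebook) rather than exporting shell variables, the following is equivalent; the fallback path is a placeholder you should point at your actual checkout:

```python
import os
import sys

# Equivalent of the export lines above, done from inside Python.
# The "~/tvm" fallback is a placeholder; adjust it to your checkout location.
tvm_home = os.environ.get("TVM_HOME", os.path.expanduser("~/tvm"))
sys.path.insert(0, os.path.join(tvm_home, "python"))
```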
- Version 'GLIBCXX_3.4.30' not found
~/anaconda3/envs/tvm-build/lib$ rm libstdc++.so
~/anaconda3/envs/tvm-build/lib$ rm libstdc++.so.6
~/anaconda3/envs/tvm-build/lib$ ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so
~/anaconda3/envs/tvm-build/lib$ ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so.6
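To diagnose this error, you can scan a libstdc++ binary for the GLIBCXX version tags it exports before creating the symlinks. A minimal pure-Python check (the system library path in the usage line is an assumption; adjust it to your distribution):

```python
import re

def glibcxx_versions(path):
    """Return the sorted GLIBCXX version tags found inside a shared library.

    Returns an empty list if the file cannot be read.
    """
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return []
    tags = re.findall(rb"GLIBCXX_\d+\.\d+(?:\.\d+)?", data)
    return sorted({t.decode() for t in tags})
```

For example, `glibcxx_versions("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")` should list `GLIBCXX_3.4.30` if your system library is new enough to satisfy the error above.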
- No module named 'torch'
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
- No module named 'sklearn'
conda install scikit-learn
We have placed the experimental test cases in the benchmark folder. For example, the single-operator conv2d test for GTA is located under benchmark/GTA/single_op/conv2d and can be run as follows.
cd benchmark/GTA/single_op/conv2d
python mapping_conv2d_GTA.py --in_dtype float16 --out_dtype float16 --begin 0 --num 1 --trials 200
When running on CPU, change target = "cuda" to target = "llvm -mcpu=skylake-avx512" in the test file. Note that this requires a CPU with AVX-512 support; otherwise, execution will fail.
cd benchmark/GTA/single_op/conv2d
python mapping_conv2d_GTA.py --in_dtype float16 --out_dtype float16 --begin 0 --num 1 --trials 200
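The flags in the command above can be sketched with argparse as below; this mirrors the invocation, but the actual parser in mapping_conv2d_GTA.py may declare the flags differently, and the comments on each flag's meaning are informed guesses:

```python
import argparse

# Sketch of the command-line interface used above; flag semantics are
# inferred from their names, not taken from the script itself.
parser = argparse.ArgumentParser()
parser.add_argument("--in_dtype", default="float16")    # input tensor dtype
parser.add_argument("--out_dtype", default="float16")   # output/accumulation dtype
parser.add_argument("--begin", type=int, default=0)     # presumably the first shape index to tune
parser.add_argument("--num", type=int, default=1)       # presumably how many shapes to tune
parser.add_argument("--trials", type=int, default=200)  # tuning trials budget

args = parser.parse_args(
    ["--in_dtype", "float16", "--out_dtype", "float16",
     "--begin", "0", "--num", "1", "--trials", "200"]
)
```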
Examples of running instructions for other test functions are provided within the respective test function files.
@article{xie2025gta,
title={GTA: Generating high-performance tensorized program with dual-task scheduling},
author={Xie, Anxing and Hu, Yonghua and Wang, Yaohua and Li, Zhe and Gao, Yuxiang and Cheng, Zenghua},
journal={Journal of Systems Architecture},
pages={103359},
year={2025},
publisher={Elsevier}
}