- Build Wukong with GPU support
- Configure and run Wukong+G
- Run SPARQL queries on Wukong+G
- Caveats
To use Wukong with GPU support (Wukong+G), you need a CUDA-enabled GPU with the CUDA toolkit installed, and the GPU must support GPUDirect RDMA (i.e., at least Kepler-class). Edit the `.bashrc` in your home directory to add `/usr/local/cuda-8.0/bin` to `PATH` and `/usr/local/cuda-8.0/include` to `CPATH`.

```bash
export CUDA_HOME=/usr/local/cuda-8.0
export PATH=$CUDA_HOME/bin:$PATH
export CPATH=${CUDA_HOME}/include:$CPATH
```
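After reloading your shell, a quick sanity check (not part of the original steps) confirms that the toolkit is visible:

```bash
source ~/.bashrc
which nvcc        # should print /usr/local/cuda-8.0/bin/nvcc
nvcc --version    # should report release 8.0
```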
Then edit the `CMakeLists.txt` in `$WUKONG_ROOT` to enable the `USE_GPU` option, go to `$WUKONG_ROOT/scripts`, and run `./build.sh` to build Wukong+G. Alternatively, you can simply run `./build.sh -DUSE_GPU=ON` to build with GPU support. Currently, the `USE_GPU` option conflicts with `USE_JEMALLOC`, `USE_DYNAMIC_GSTORE`, and `USE_VERSATILE`.
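Since the `./build.sh -DUSE_GPU=ON` form suggests the script forwards `-D` flags to CMake, you can also switch the conflicting options off explicitly in the same invocation. A sketch, assuming all four flags are ordinary CMake options:

```bash
cd $WUKONG_ROOT/scripts
# Enable GPU support and explicitly disable the options that conflict with it.
./build.sh -DUSE_GPU=ON -DUSE_JEMALLOC=OFF -DUSE_DYNAMIC_GSTORE=OFF -DUSE_VERSATILE=OFF
```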
Next, we need to install the kernel module for GPUDirect RDMA. Download it from the web and install it on each machine in the cluster.

```bash
tar xzf nvidia-peer-memory-1.0-3.tar.gz
cd nvidia-peer-memory-1.0
tar xzf nvidia-peer-memory_1.0.orig.tar.gz
cd nvidia-peer-memory-1.0
dpkg-buildpackage -us -uc
cd ..
sudo dpkg -i nvidia-peer-memory_1.0-3_all.deb
sudo dpkg -i nvidia-peer-memory-dkms_1.0-3_all.deb
```
After the installation, load the module manually and make sure it is loaded successfully.

```bash
sudo modprobe nv_peer_mem
lsmod | grep nv_peer_mem
```
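The steps above load the module only for the current boot. If you want it loaded automatically on every boot (an optional extra, not part of the original instructions), systemd-based distributions let you register it like this:

```bash
# Ask systemd-modules-load to load nv_peer_mem at boot time.
echo nv_peer_mem | sudo tee /etc/modules-load.d/nv_peer_mem.conf
```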
Note: You can use the `deviceQuery` utility provided by the CUDA toolkit to query the capabilities of your GPU. On Ubuntu, you can find it in `/usr/local/cuda-8.0/samples/1_Utilities`.
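Building and running the sample follows the standard CUDA-samples flow (the path matches the note above; `sudo` is only needed if the samples directory is owned by root):

```bash
cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery   # prints compute capability, memory size, and other device attributes
```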
- Edit the `config` file in `$WUKONG_ROOT/scripts` according to the capabilities of your GPU.

```bash
$ cd $WUKONG_ROOT/scripts
$ vim config
```
The configuration items related to GPU support are:

- `global_num_gpus`: set the number of GPUs (currently we only support one GPU)
- `global_gpu_rdma_buf_size_mb`: set the size (MB) of the buffer for one-sided RDMA operations
- `global_gpu_rbuf_size_mb`: set the size (MB) of the result buffer in GPU memory for query processing
- `global_gpu_kvcache_size_gb`: set the size (GB) of the key-value cache in GPU memory
- `global_gpu_key_blk_size_mb`: set the size (MB) of a key block in the key-value cache
- `global_gpu_value_blk_size_mb`: set the size (MB) of a value block in the key-value cache
- `global_enable_pipeline`: overlap query execution with memory copies between CPU and GPU
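For illustration only, the GPU-related part of `config` might look like the excerpt below. The values are hypothetical placeholders rather than tuned recommendations, and the one-item-per-line `name value` format is an assumption based on the rest of the file:

```
global_num_gpus               1
global_gpu_rdma_buf_size_mb   64
global_gpu_rbuf_size_mb       32
global_gpu_kvcache_size_gb    10
global_gpu_key_blk_size_mb    16
global_gpu_value_blk_size_mb  4
global_enable_pipeline        1
```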
- Sync the Wukong files to the machines listed in `mpd.hosts`.

```bash
$ cd ${WUKONG_ROOT}/scripts
$ ./sync.sh
sending incremental file list
...
```
- Launch the Wukong servers on your cluster (the argument of `run.sh` is the number of servers to launch).

```bash
$ cd ${WUKONG_ROOT}/scripts
$ ./run.sh 2
...
Input 'help' command to get more information

wukong>
```
Wukong+G adds a new argument `-g` to the `sparql` command, which tells the console to send the query to a GPU agent in the cluster. The GPU agent will use the GPU with device ID 0 to process queries.
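Since the agent is hard-wired to device ID 0, the standard CUDA way to steer it to a different physical GPU is the `CUDA_VISIBLE_DEVICES` environment variable, which renumbers the devices a process can see. Whether this interacts cleanly with Wukong+G's launcher is untested here, so treat it as an assumption:

```bash
# Make physical GPU 1 appear as device 0 to the launched processes
# (assumes run.sh propagates the environment to the Wukong servers).
CUDA_VISIBLE_DEVICES=1 ./run.sh 2
```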
- Wukong+G commands.

```bash
wukong> help
These are common Wukong commands:

help                display help information
quit                quit from the console
config <args>       run commands for configuration
  -v                  print current config
  -l <fname>          load config items from <fname>
  -s <string>         set config items by <str> (e.g., item1=val1&item2=...)
  -h [ --help ]       help message about config
logger <args>       run commands for logger
  -v                  print loglevel
  -s <level>          set loglevel to <level> (e.g., DEBUG=1, INFO=2,
                      WARNING=4, ERROR=5)
  -h [ --help ]       help message about logger
sparql <args>       run SPARQL queries in single or batch mode
  -f <fname>          run a [single] SPARQL query from <fname>
  -m <factor> (=1)    set multi-threading <factor> for heavy query processing
  -n <num> (=1)       repeat query processing <num> times
  -p <fname>          adopt user-defined query plan from <fname> for running
                      a single query
  -N <num> (=1)       do query optimization <num> times
  -v <lines> (=0)     print at most <lines> of results
  -o <fname>          output results into <fname>
  -g                  leverage GPU to accelerate heavy query processing
  -b <fname>          run a [batch] of SPARQL queries configured by <fname>
  -h [ --help ]       help message about sparql
...
```
- Run a single SPARQL query.

There are query examples in `$WUKONG_ROOT/scripts/sparql_query`. For example, enter `sparql -f sparql_query/lubm/basic/lubm_q7` to run the query `lubm_q7`. Note that if you did not enable the planner, you must specify a query plan manually (with `-p`, as in the second example below).
```bash
wukong> sparql -f sparql_query/lubm/basic/lubm_q7 -g
INFO: Parsing a SPARQL query is done.
INFO: Parsing time: 172 usec
INFO: Optimization time: 507 usec
INFO: Leverage GPU to accelerate query processing.
INFO: (last) result size: 1763
INFO: (average) latency: 1452 usec
wukong> sparql -f sparql_query/lubm/basic/lubm_q7 -p sparql_query/lubm/basic/osdi16_plan/lubm_q7.fmt -g
INFO: Parsing a SPARQL query is done.
INFO: Parsing time: 111 usec
INFO: User-defined query plan is enabled
INFO: Leverage GPU to accelerate query processing.
INFO: (last) result size: 1763
INFO: (average) latency: 1908 usec
```
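The `-g` flag composes with the other `sparql` options listed in the help text. For instance (flags taken from the help above; the output file name is hypothetical):

```bash
# Run the query on the GPU 10 times, print at most 5 result lines,
# and write the results to a file.
wukong> sparql -f sparql_query/lubm/basic/lubm_q7 -g -n 10 -v 5 -o results.txt
```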
Performance data can be found in the `docs/performance` folder.
Although Wukong+G can notably speed up query processing, there is still plenty of room for improvement:
- We have only tested Wukong+G on an RDMA-capable cluster with NVIDIA Tesla K40m GPUs and CUDA 8.0.
- Wukong+G assumes the predicate of every triple pattern in a query is known; queries with unknown predicates cannot be handled by the GPU engine.
- If a query produces an intermediate result that exceeds the size of the result buffer on the GPU (`global_gpu_rbuf_size_mb`), Wukong+G cannot handle it (it may crash or return wrong results).
- If the triples of a predicate cannot fit into the key-value cache on the GPU, Wukong+G cannot handle them.
- If the pipeline is enabled, there must be enough GPU memory to accommodate two triple patterns: the current pattern and the prefetched one.