- Deploying Wukong on a local cluster
- Downloading LUBM sample dataset
- Configuring and running Wukong
- Processing SPARQL queries on Wukong
- Dynamic data loading on Wukong
- Graph storage integrity check on Wukong
Install Wukong's dependencies (e.g., OpenMPI), using instructions in the INSTALL, on your master node (one of your cluster machines, e.g., node0.some.domain
) and copy necessities to the rest machines.
Note: suppose there are two machines in your cluster, namely
node0.some.domain
andnode1.some.domain
.
$cd ${WUKONG_ROOT}/scripts
$cat mpd.hosts
node0.some.domain
node1.some.domain
$./syncdeps.sh ../deps/dependencies mpd.hosts
$ mkdir $WUKONG_ROOT/datasets
$ cd $WUKONG_ROOT/datasets
$ wget http://ipads.se.sjtu.edu.cn/wukong/id_lubm_2.tar.gz
$ tar zxvf id_lubm_2.tar.gz
$ ls id_lubm_2
id_uni0.nt id_uni1.nt str_index str_normal str_normal_minimal
Move dataset (e.g., id_lubm_2
) to a distributed FS (e.g., path/to/input/
)which can be accessed by all machines in your cluster.
- Edit
config
andcore.bind
.
$cd $WUKONG_ROOT/scripts
$cat config
# general
global_num_proxies 1
global_num_engines 2
global_data_port_base 5500
global_ctrl_port_base 9576
global_mt_threshold 2
global_enable_workstealing 0
global_stealing_pattern 0
global_enable_planner 1
global_generate_statistics 1
global_enable_vattr 0
global_silent 0
# kvstore
global_input_folder path/to/input/id_lubm_2/
global_memstore_size_gb 40
# global_est_load_factor is used to calculate how many buckets one segment should be allocated.
global_est_load_factor 55
# RDMA
global_rdma_buf_size_mb 128
global_rdma_rbf_size_mb 32
global_use_rdma 1
global_rdma_threshold 300
global_enable_caching 0
# GPU
global_num_gpus 0
global_gpu_rdma_buf_size_mb 64
global_gpu_rbuf_size_mb 32
global_gpu_kvcache_size_gb 10
global_gpu_key_blk_size_mb 16
global_gpu_value_blk_size_mb 4
global_gpu_enable_pipeline 1
$
$cat core.bind
# One node per line (NOTE: the empty line means to skip a node)
0 1 2
The detail explanation of above config
file can be found in INSTALL
- Sync Wukong files to all machines.
$cd ${WUKONG_ROOT}/scripts
$./sync.sh
sending incremental file list
...
- Launch Wukong server on your cluster.
$cd ${WUKONG_ROOT}/scripts
$./run.sh 2
...
Input 'help' command to get more information
wukong>
- Wukong commands.
wukong> help
These are common Wukong commands: :
help display help infomation:
quit quit from the console:
config <args> run commands for configueration:
-v print current config
-l <fname> load config items from <fname>
-s <string> set config items by <str> (e.g., item1=val1&item2=...)
-h [ --help ] help message about config
logger <args> run commands for logger:
-v print loglevel
-s <level> set loglevel to <level> (e.g., DEBUG=1, INFO=2,
WARNING=4, ERROR=5)
-h [ --help ] help message about logger
sparql <args> run SPARQL queries in single or batch mode:
-f <fname> run a [single] SPARQL query from <fname>
-m <factor> (=1) set multi-threading <factor> for heavy query
processing
-n <num> (=1) repeat query processing <num> times
-p <fname> adopt user-defined query plan from <fname> for running
a single query
-N <num> (=1) do query optimization <num> times
-v <lines> (=0) print at most <lines> of results
-o <fname> output results info <fname>
-g leverage GPU to accelerate heavy query processing
-b <fname> run a [batch] of SPARQL queries configured by <fname>
-h [ --help ] help message about sparql
sparql-emu <args> emulate clients to continuously send SPARQL queries:
-f <fname> run queries generated from temples configured by
<fname>
-p <fname> adopt user-defined query plans from <fname> for
running queries
-d <sec> (=10) eval <sec> seconds (default: 10)
-w <sec> (=5) warmup <sec> seconds (default: 5)
-n <num> (=20) keep <num> queries being processed (default: 20)
-h [ --help ] help message about sparql-emu
load <args> load RDF data into dynamic (in-memmory) graph store:
-d <dname> load data from directory <dname>
-c check and skip duplicate RDF triples
-h [ --help ] help message about load
gsck <args> check the integrity of (in-memmory) graph storage:
-i check from index key/value pair to normal key/value
pair
-n check from normal key/value pair to index key/value
pair
-h [ --help ] help message about gsck
load-stat load statistics of SPARQL query optimizer:
-f <fname> load statistics from <fname> located at data folder
-h [ --help ] help message about load-stat
store-stat store statistics of SPARQL query optimizer:
-f <fname> store statistics to <fname> located at data folder
-h [ --help ] help message about store-stat
- run a single SPARQL query.
There are query examples in $WUKONG_ROOT/scripts/sparql_query
. For example, input sparql -f sparql_query/lubm/basic/lubm_q2
to run the query lubm_q2
.
wukong> sparql -f sparql_query/lubm/basic/lubm_q7 -v 5
(average) latency: 3660 usec
(last) result size: 73
The first 5 rows of results:
1: <http://www.Department8.University1.edu/FullProfessor5> <http://www.Department8.University1.edu/UndergraduateStudent204> <http://www.Department8.University1.edu/Course9>
2: <http://www.Department14.University1.edu/FullProfessor6> <http://www.Department14.University1.edu/UndergraduateStudent141> <http://www.Department14.University1.edu/Course7>
3: <http://www.Department4.University0.edu/FullProfessor0> <http://www.Department4.University0.edu/UndergraduateStudent312> <http://www.Department4.University0.edu/Course1>
4: <http://www.Department7.University1.edu/FullProfessor9> <http://www.Department7.University1.edu/UndergraduateStudent8> <http://www.Department7.University1.edu/Course14>
5: <http://www.Department8.University1.edu/FullProfessor7> <http://www.Department8.University1.edu/UndergraduateStudent47> <http://www.Department8.University1.edu/Course13>
wukong>
wukong> sparql -f sparql_query/lubm/basic/lubm_q4 -n 1000
(average) latency: 199 usec
(last) result size: 10
wukong>
- show and change the configuration of Wukong at runtime.
wukong> config -v
------ global configurations ------
the number of proxies: 1
the number of engines: 2
global_input_folder: path/to/input/id_lubm_2
global_memstore_size_gb: 2
global_est_load_factor: 55
global_data_port_base: 5500
global_ctrl_port_base: 9576
global_rdma_buf_size_mb: 0
global_rdma_rbf_size_mb: 0
global_use_rdma: 0
global_enable_caching: 0
global_enable_workstealing: 0
global_stealing_pattern: 0
global_rdma_threshold: 300
global_mt_threshold: 2
global_silent: 0
global_enable_planner: 1
global_generate_statistics: 1
global_enable_vattr: 0
global_num_gpus: 0
global_gpu_rdma_buf_size_mb: 0
global_gpu_rbuf_size_mb: 32
global_gpu_kvcache_size_gb: 10
global_gpu_key_blk_size_mb: 16
global_gpu_value_blk_size_mb: 4
global_gpu_enable_pipeline: 1
--
the number of servers: 2
the number of threads: 3
wukong>
wukong> config -s global_use_rdma=0
wukong> sparql -f sparql_query/lubm/basic/lubm_q4 -n 1000
(average) latency: 1128 usec
(last) result size: 10
Make sure that you have enable dynamic data loading support with parameter -USE_DYNAMIC_GSTORE=ON
.
- Load new dataset from directory, the structure of directory is just the same as which used to initialize.
wukong> load -d /home/datanfs/nfs0/rdfdata/id_lubm_2/
INFO: loading ID-mapping file: /home/datanfs/nfs0/rdfdata/id_lubm_2/str_index
INFO: loading ID-mapping file: /home/datanfs/nfs0/rdfdata/id_lubm_2/str_normal
INFO: loading ID-mapping file: /home/datanfs/nfs0/rdfdata/id_lubm_2/str_index
INFO: loading ID-mapping file: /home/datanfs/nfs0/rdfdata/id_lubm_2/str_normal
INFO: 2 data files and 0 attribute files found in directory (/home/datanfs/nfs0/rdfdata/id_lubm_2/) at server 0
INFO: 2 data files and 0 attribute files found in directory (/home/datanfs/nfs0/rdfdata/id_lubm_2/) at server 1
INFO: load 94802 triples from file /home/datanfs/nfs0/rdfdata/id_lubm_2/id_uni0.nt at server 0
INFO: load 122091 triples from file /home/datanfs/nfs0/rdfdata/id_lubm_2/id_uni1.nt at server 0
INFO: #0: 227ms for inserting into gstore
INFO: load 110672 triples from file /home/datanfs/nfs0/rdfdata/id_lubm_2/id_uni0.nt at server 1
INFO: load 145107 triples from file /home/datanfs/nfs0/rdfdata/id_lubm_2/id_uni1.nt at server 1
INFO: #1: 316ms for inserting into gstore
INFO: (average) latency: 660366 usec
- Add -c option to check and skip duplicate triples in the dataset.
wukong> load -c -d path/tp/input/id_lubm_2/
INFO: loading ID-mapping file: path/tp/input/id_lubm_2/str_index
INFO: loading ID-mapping file: path/tp/input/id_lubm_2/str_normal
INFO: loading ID-mapping file: path/tp/input/id_lubm_2/str_index
INFO: loading ID-mapping file: path/tp/input/id_lubm_2/str_normal
INFO: 2 data files and 0 attribute files found in directory (/home/datanfs/nfs0/rdfdata/id_lubm_2/) at server 0
INFO: 2 data files and 0 attribute files found in directory (/home/datanfs/nfs0/rdfdata/id_lubm_2/) at server 1
INFO: load 94802 triples from file path/tp/input/id_lubm_2/id_uni0.nt at server 0
INFO: load 122091 triples from file path/tp/input/id_lubm_2/id_uni1.nt at server 0
INFO: #0: 222ms for inserting into gstore
INFO: load 110672 triples from file path/tp/input/id_lubm_2/id_uni0.nt at server 1
INFO: load 145107 triples from file path/tp/input/id_lubm_2/id_uni1.nt at server 1
INFO: #1: 784ms for inserting into gstore
INFO: (average) latency: 1072962 usec
This command can help you make sure the correctness of current graph storage.
- Check the storage integrity related with both index vertex and normal vertex
wukong> gsck
INFO: Graph storage intergity check has started on server 0
INFO: Graph storage intergity check has started on server 1
INFO: Server#0 has checked 47 index vertices and 110115 normal vertices.
INFO: Server#1 has checked 49 index vertices and 110013 normal vertices.
INFO: (average) latency: 49943499 usec
- Check the storage integrity related with index vertex.
wukong> gsck -i
INFO: Graph storage intergity check has started on server 0
INFO: Graph storage intergity check has started on server 1
INFO: Server#0 has checked 47 index vertices and 0 normal vertices.
INFO: Server#1 has checked 49 index vertices and 0 normal vertices.
INFO: (average) latency: 18493196 usec
- Check the storage integrity related with normal vertex.
wukong> gsck -n
INFO: Graph storage intergity check has started on server 0
INFO: Graph storage intergity check has started on server 1
INFO: Server#0 has checked 0 index vertices and 110115 normal vertices.
INFO: Server#1 has checked 0 index vertices and 110013 normal vertices.
INFO: (average) latency: 36454664 usec