
use end-to-end DGL scripts run featGraph #13

Open
Ed-gong opened this issue May 23, 2022 · 17 comments

Ed-gong commented May 23, 2022

Hi, I want to run FeatGraph end-to-end.
I have already built DGL (with FeatGraph) and run the test.py file successfully using the instructions posted at https://github.com/dmlc/dgl/tree/master/featgraph.

  • If I want to run end-to-end GCN training on the Pubmed or Reddit dataset, can I just use the DGL GCN benchmark script I already have, without changing any kernel names? In other words, which parts of the DGL Python script do I need to change so that I can run FeatGraph (not DGL) end-to-end? Thank you.
yzh119 commented May 23, 2022

You might check out this branch of DGL:

https://github.com/kira-lin/dgl/tree/tvm_integration

Ed-gong commented May 24, 2022

Thanks for your reply. I just clarified my question by re-editing the post above. Can you respond again? Thank you.

Ed-gong commented Jun 1, 2022

I used the DGL test scripts to run GCN on the PubMed and Cora datasets with one extra line of code: dgl.sparse._CAPI_FG_LoadModule("../build/featgraph/libfeatgraph_kernels.so"). The Python script runs fine without any error, but the training time with FeatGraph is the same as with DGL. It seems like FeatGraph does not improve training time at all.
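
For reference, the only change to the script was the load call; everything else is the standard DGL GCN example. A minimal sketch (the library path is relative to my build directory):

import dgl
import dgl.sparse

# load the FeatGraph kernel library built under dgl/build/featgraph
dgl.sparse._CAPI_FG_LoadModule("../build/featgraph/libfeatgraph_kernels.so")

# ... the rest (dataset loading, GraphConv model, optimizer, training loop)
# is the unmodified DGL GCN example ...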

yzh119 commented Jun 1, 2022

I don't think FeatGraph has better performance than cuSPARSE for GCN on GPU (see Table IV in the paper). Since DGL uses cuSPARSE, it's normal that you don't observe any acceleration here.

Ed-gong commented Jun 2, 2022

Thank you very much for your response. I am closing this issue.

Ed-gong closed this as completed Jun 2, 2022
yzh119 commented Jun 7, 2022

Sorry, I just noticed that you were using dgl.sparse._CAPI_FG_LoadModule("../build/featgraph/libfeatgraph_kernels.so") to use FeatGraph as the backend. That integration was actually abandoned because TVM does not have native sparse support and we might encounter several issues when using it in production, so in most cases you will still be using DGL's native backend even if you load the module.

Only the branch I mentioned (https://github.com/kira-lin/dgl/tree/tvm_integration) contains the complete code that uses the FeatGraph backend. Regarding the question in #14, yes, GAT is also supported (it was mentioned in the paper), and you can use it by compiling the tvm_integration branch.

yzh119 commented Jun 7, 2022

If you are interested in native sparse support in TVM, our work is coming soon; please stay tuned.

Ed-gong reopened this Jun 9, 2022
Ed-gong commented Jun 10, 2022

Hi, thank you for the kind response. For the branch https://github.com/kira-lin/dgl/tree/tvm_integration, if I want to use the FeatGraph backend, what specific Python code do I need to write? For example, if I only write dgl.sparse._CAPI_FG_LoadModule("../build/featgraph/libfeatgraph_kernels.so"), will the FeatGraph backend be used automatically? If not, which Python code do I need so that I can use the FeatGraph GCN and GAT backends?

The README file at https://github.com/kira-lin/dgl/tree/tvm_integration/featgraph only shows how to run test.py to verify correctness. However, test.py only contains one test-case kernel, dgl.sparse._CAPI_FG_SDDMMTreeReduction(gidx, u, v, e), for the SDDMM kernels. It is a little hard for me to figure out how to run the other FeatGraph kernel backends. Could you provide more detailed instructions about which Python code I need to write so that I can use the FeatGraph GCN and GAT backend kernels? Thank you.

Ed-gong commented Jun 13, 2022

These are the steps we followed:

(base) ygong07@mira0:~/dgl_src/dgl_tvm/dgl/featgraph$ git branch
  master
* tvm_integration
(base) ygong07@mira0:~/dgl_src/dgl_tvm/dgl/build$ pwd
/home/ygong07/dgl_src/dgl_tvm/dgl/build
(base) ygong07@mira0:~/dgl_src/dgl_tvm/dgl/build$ cmake -DUSE_CUDA=ON -DUSE_TVM=ON ..
-- Start configuring project dgl
-- Build with CUDA support
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.2
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda-11.2/lib64/libcudart.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
-- -fopenmp -O2 -Wall -fPIC -std=c++11  -DUSE_AVX -DIDXTYPEWIDTH=64 -DREALTYPEWIDTH=32
-- Running GPU architecture autodetection
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
-- Found CUDA arch 8.0
-- CUDA flags: -Xcompiler ,-fopenmp,-O2,-Wall,-fPIC,,,-DUSE_AVX,-DIDXTYPEWIDTH=64,-DREALTYPEWIDTH=32;-gencode;arch=compute_80,code=sm_80;--expt-extended-lambda;-Wno-deprecated-declarations;-std=c++14
-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
-- /home/ygong07/dgl_src/dgl_tvm/dgl/third_party/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Start configuring project featgraph
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.2
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda-11.2/lib64/libcudart.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- /usr/local/cuda-11.2/include
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ygong07/dgl_src/dgl_tvm/dgl/build

(base) ygong07@mira0:~/dgl_src/dgl_tvm/dgl/build$ make -j4
[  1%] Creating featgraph kernels...
[  6%] Built target dmlc
[ 34%] Built target metis
/home/ygong07/tvm/python/tvm/driver/build_module.py:242: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
  warnings.warn(
[ 34%] Built target featgraph_kernel
[ 35%] Built target featgraph_runtime
[ 35%] Linking CXX shared library libdgl.so
[100%] Built target dgl

(base) ygong07@mira0:~/dgl_src/dgl_tvm/dgl/featgraph$ python3 test.py 
Using backend: pytorch
tensor([[[1.5832],
         [1.8842]],

        [[1.1876],
         [2.5858]],

        [[1.5149],
         [0.9924]],
         ...
[[2.2963],
         [1.3279]],

        [[1.7643],
         [1.2339]],

        [[2.3274],
         [1.7878]]], device='cuda:0')

[[[1.5831739]
  [1.8842214]]

 [[1.1875974]
  [2.5857563]]

 [[1.5148897]
  [0.9924001]]
....
[[2.2962904]
  [1.3278971]]

 [[1.7643319]
  [1.233911 ]]

 [[2.3274217]
  [1.7877729]]]

  • We ran the GCN and GAT scripts using dgl.sparse._CAPI_FG_LoadModule("/home/ygong07/dgl_src/dgl_tvm/dgl/build/featgraph/libfeatgraph_kernels.so").
  • The training times are the same as the DGL training times.
  • Please let us know if you see any issues, as these numbers will be reported in a research paper.

Thank you very much for your help.

yzh119 commented Jun 20, 2022

Oh sorry, what I meant is the tvm-kernel branch (https://github.com/kira-lin/dgl/tree/tvm-kernel).

Ed-gong commented Jun 23, 2022

Hi, the tvm-kernel branch you mentioned does not include the 'featgraph' folder, so I am not sure how to compile it specifically for FeatGraph or how to verify whether FeatGraph is installed correctly. Could you provide more instructions? Thank you.

yzh119 commented Jun 27, 2022

The tvm-kernel branch is fully Python based, and the featgraph kernels are triggered when you set the environment variable DGLENGINE to tvm.

See https://github.com/kira-lin/dgl/blob/tvm-kernel/python/dgl/sparse.py#L13-L16
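
Roughly, that means setting the variable before dgl is imported, since sparse.py reads it at import time. A minimal sketch (assuming the tvm-kernel branch and a compatible TVM are installed):

import os
os.environ['DGLENGINE'] = 'tvm'  # must be set before importing dgl

import dgl  # dgl.sparse now routes gspmm/gsddmm through the TVM-generated kernels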

yzh119 commented Jun 27, 2022

Btw, I don't think you should expect a speedup from featgraph over DGL 0.8, because most of the optimized kernels have already been merged into DGL.

Ed-gong commented Jul 7, 2022

13 use_tvm = True if 'DGLENGINE' in os.environ and os.getenv('DGLENGINE') == 'tvm' else False
14 if use_tvm:
15     import tvm
16     from .tvm import gsddmm, gspmm

Based on line 13, we made sure use_tvm is True; unfortunately, it crashes. When use_tvm is False it does run, but I suspect it is then calling the DGL kernels.

We are still interested in running FeatGraph end-to-end. Do let us know if there are any other instructions.

yzh119 commented Jul 10, 2022

Would you mind sharing the error message so that we can debug why it crashes?

Ed-gong commented Jul 23, 2022

Here is the error I got:


(base) ygong07@mira0:~/compare_graphPy/GraphPy_GPU/build$ python3 GCN_pubmed_dgl.py
Using backend: pytorch
use_tvm True
Output of Read function is 
/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/base.py:45: DGLWarning: Recommend creating graphs by `dgl.graph(data)` instead of `dgl.DGLGraph(data)`.
  return warnings.warn(message, category=category, stacklevel=1)
graph creation time is: 0:00:00.029156
Traceback (most recent call last):
  File "GCN_pubmed_dgl.py", line 244, in <module>
    logits = net(graph, feature)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "GCN_pubmed_dgl.py", line 193, in forward
    h = self.conv1(g, inputs)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/nn/pytorch/conv/graphconv.py", line 269, in forward
    graph.update_all(fn.copy_src(src='h', out='m'),
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/heterograph.py", line 4499, in update_all
    ndata = core.message_passing(g, message_func, reduce_func, apply_node_func)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/core.py", line 283, in message_passing
    ndata = invoke_gspmm(g, mfunc, rfunc)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/core.py", line 255, in invoke_gspmm
    z = op(graph, x)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/ops/spmm.py", line 171, in func
    return gspmm(g, 'copy_lhs', reduce_op, x, None)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/ops/spmm.py", line 62, in gspmm
    ret = gspmm_internal(g._graph, op,
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/backend/pytorch/sparse.py", line 235, in gspmm
    return GSpMM.apply(gidx, op, reduce_op, lhs_data, rhs_data)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/backend/pytorch/sparse.py", line 64, in forward
    out, (argX, argY) = _gspmm(gidx, op, reduce_op, X, Y)
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/sparse.py", line 87, in _gspmm
    return _gspmm_tvm(gidx, op, reduce_op, u, e) if use_tvm \
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/sparse.py", line 373, in _gspmm_tvm
    mod = gspmm.spmm(
  File "/home/ygong07/anaconda3/lib/python3.8/site-packages/dgl-0.6-py3.8-linux-x86_64.egg/dgl/tvm/gspmm.py", line 301, in spmm
    if topi.util.get_const_int(topi.util.prod(out.shape[1:])) < 16:
AttributeError: module 'tvm.topi' has no attribute 'util'

yzh119 commented Jul 24, 2022

This is due to the TVM version; you should use TVM 0.7.
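
As a quick check of the environment, a minimal sketch (assuming the problem is that later TVM releases renamed topi.util to topi.utils, which is why dgl/tvm/gspmm.py fails with the AttributeError above):

import tvm
import tvm.topi as topi

print(tvm.__version__)  # expect something like '0.7.0'
# the tvm-kernel branch calls topi.util.get_const_int / topi.util.prod,
# which only exist under this name in TVM 0.7
assert hasattr(topi, 'util'), "this TVM build is too new for the tvm-kernel branch"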
