
Some problem while running on GPU #17

Open · onlyoh opened this issue Sep 7, 2020 · 10 comments

@onlyoh commented Sep 7, 2020

I want to test the performance of conv layer C9 of YOLO after FlexTensor's optimization, but there seem to be some problems when running optimize_conv2d.py on the GPU.

$ python optimize_conv2d.py --shapes yolo --from 8 --to 9 --parallel 16 --target cuda
......
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394505.223908] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394508.009939] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394510.781969] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
Fail to find valid schedule, too many errors
warm up [1599394513.576313] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394516.424372] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
......

I have seen a previous issue, and the current code already uses 'spawn' for multiprocessing.
The run seems to keep going forever because it never finds a valid schedule.
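For reference, the 'spawn' fix from that earlier issue boils down to something like the sketch below (illustrative only; FlexTensor may set this elsewhere). CUDA contexts cannot survive fork(), so worker processes that touch the GPU must be started with 'spawn':

import multiprocessing

if __name__ == "__main__":
    # 'spawn' starts fresh interpreter processes instead of forking,
    # which is required when child processes initialize CUDA.
    multiprocessing.set_start_method("spawn")
    # ... launch the measurement workers here ...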

@KnowingNothing (Collaborator)

Please check your nvcc by typing nvcc --version in your terminal. If nvcc is not available, TVM's codegen will fail.
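For a quick sanity check beyond the version string, a trivial CUDA build through TVM fails immediately if nvcc or the codegen path is broken. A minimal sketch, assuming a TVM v0.7-style te API:

import shutil
import tvm
from tvm import te

# 1) Is nvcc on PATH at all?
assert shutil.which("nvcc") is not None, "nvcc not found on PATH"

# 2) Can TVM generate and compile a trivial CUDA kernel?
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))
mod = tvm.build(s, [A, B], target="cuda")  # raises here if nvcc is unusable
print("CUDA build OK")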

@onlyoh (Author) commented Sep 10, 2020

The result of nvcc --version is:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

And executing print(torch.version.cuda) in a Python interpreter outputs:

10.0.130

@KnowingNothing (Collaborator)

How about setting a larger timeout? Just try adding --timeout 20 to your command.
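For example, appended to the original command:

$ python optimize_conv2d.py --shapes yolo --from 8 --to 9 --parallel 16 --timeout 20 --target cuda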

@onlyoh (Author) commented Sep 10, 2020

This method does not seem to work...

@KnowingNothing (Collaborator)

Then I'd suggest uncommenting the two #print(msg) lines in scheduler.py and telling me the error message, if any.
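For anyone following along: the pattern being referred to is an except-block that swallows build errors. Roughly like the sketch below (names and the exact location in scheduler.py are approximate, not the literal source):

try:
    func = tvm.build(s, bufs, target)  # build one candidate schedule
except Exception as e:
    msg = "op build fail:" + str(e)
    # print(msg)  # uncomment to see why each candidate schedule failed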

@onlyoh (Author) commented Sep 10, 2020

It outputs these messages:

Optimize yolo convolution layer 9 shape (1, 512, 28, 28, 512, 512, 1, 1, 1, 1, 0, 1, 1)
graph space size 2
op 0 space size: 25344000
[Warning] Directory lib is not empty, but reusing it
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
warm up [1599727687.361282] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
op build fail:module 'tvm.tir' has no attribute 'ir_pass'
......

@KnowingNothing (Collaborator)

I see. TVM is under rapid development and its API keeps changing. To use FlexTensor, you can try TVM at commit 89da63e228eae2b0b4fe39770031a042858c52a7.
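For example, checking out that commit (repository URL as of 2020; adjust if it has since moved):

$ git clone --recursive https://github.com/apache/incubator-tvm.git tvm
$ cd tvm
$ git checkout 89da63e228eae2b0b4fe39770031a042858c52a7
$ git submodule update --init --recursive

then rebuild TVM and reinstall its Python package before rerunning FlexTensor.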

@onlyoh (Author) commented Sep 10, 2020

Thanks, I will try it!

@hecmay commented Sep 16, 2020

A follow-up to this issue: I got the following errors when running the same example. I am using TVM v0.7 (not exactly the commit you recommended). What could be the reason for these empty error messages?

$ python optimize_conv2d.py --shapes yolo --from 8 --to 9 --parallel 16 --target cuda
Optimize yolo convolution layer 9 shape (1, 512, 28, 28, 512, 512, 1, 1, 1, 1, 0, 1, 1)
graph space size 2
op 0 space size: 25344000
[Warning] Directory lib is not empty, but reusing it
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
op build fail:
warm up [1600224234.227920] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]

@KnowingNothing (Collaborator)

Did you check your nvcc?
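For empty messages like the ones above, str(e) of the caught exception is simply empty, so printing the full traceback is more informative. A hypothetical tweak to the same except-block in scheduler.py (exact location may differ):

import traceback

try:
    func = tvm.build(s, bufs, target)
except Exception:
    # traceback.format_exc() shows the real failure even when str(e) is empty
    print("op build fail:\n" + traceback.format_exc())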
