Trouble with profiling now that nvprof is defunct #4

Open
jdgh000 opened this issue Nov 23, 2024 · 0 comments

I was trying to get the output of the following command from p84:
nvprof --metrics branch_efficiency ./simpleDivergence
But nvprof complains that the GPU/CUDA toolkit I am using is too new and suggests using ncu instead. I have no idea what the equivalent parameters for ncu are.

  1. Any idea how to use ncu to get the same info?
  2. Is there a newer edition that uses ncu?
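
For reference, the nvprof metric comparison table in the Nsight Compute CLI documentation lists smsp__sass_average_branch_targets_threads_uniform.pct as the counterpart of branch_efficiency. An untested sketch of the equivalent invocation, assuming ncu is on the PATH and the same ./p84.out binary is used, would be:

```shell
# Assumption: ncu (Nsight Compute CLI) is installed and on the PATH.
# The metric name is taken from the nvprof-to-ncu metric comparison table;
# it has not been verified against this binary or this GPU.
ncu --metrics smsp__sass_average_branch_targets_threads_uniform.pct ./p84.out
```

Whether this reports the same numbers as the nvprof output is not confirmed here. On some systems ncu also needs elevated permissions to read GPU performance counters.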

nvprof --metrics branch_efficiency ./p84.out
======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability 7.5 and higher.
Use NVIDIA Nsight Compute for GPU profiling and NVIDIA Nsight Systems for GPU tracing and CPU sampling.
Refer https://developer.nvidia.com/tools-overview for more details.


==9792== NVPROF is profiling process 9792, command: ./p84.out
./p84.out using Device 0: NVIDIA GeForce RTX 2070 SUPER
Data size: 16777216.
Execution configured (block 1024 grid 16384).
Warmup <<<<16384 1024 >>> elapsed 0 sec 
MathKernel1 <<<16384 1024 >>> elapsed 0 sec 
MathKernel2 <<<16384 1024 >>> elapsed 0 sec 
MathKernel3 <<<16384 1024 >>> elapsed 0 sec 
MathKernel4 <<<16384 1024 >>> elapsed 0 sec 
==9792== Profiling application: ./p84.out
==9792== Profiling result:
No events/metrics were profiled.

Without --metrics branch_efficiency, however, it does show some basic info:

Warmup <<<<16384 1024 >>> elapsed 0 sec 
MathKernel1 <<<16384 1024 >>> elapsed 0 sec 
MathKernel2 <<<16384 1024 >>> elapsed 0 sec 
MathKernel3 <<<16384 1024 >>> elapsed 0 sec 
MathKernel4 <<<16384 1024 >>> elapsed 0 sec 
==9836== Profiling application: ./p84.out
==9836== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   55.77%  505.41us         3  168.47us  168.22us  168.74us  mathKernel1(float*)
                   25.35%  229.79us         1  229.79us  229.79us  229.79us  mathKernel2(float*)
                   18.88%  171.10us         1  171.10us  171.10us  171.10us  warmingup(float*)
      API calls:   98.23%  89.032ms         1  89.032ms  89.032ms  89.032ms  cudaMalloc
                    1.01%  917.22us         6  152.87us  5.2620us  231.59us  cudaDeviceSynchronize
                    0.42%  376.36us         5  75.271us  3.5550us  356.17us  cudaLaunchKernel
                    0.18%  163.80us       114  1.4360us     120ns  68.497us  cuDeviceGetAttribute
                    0.13%  115.92us         1  115.92us  115.92us  115.92us  cudaGetDeviceProperties
                    0.02%  17.388us         1  17.388us  17.388us  17.388us  cuDeviceGetName
                    0.01%  11.211us         1  11.211us  11.211us  11.211us  cuDeviceGetPCIBusId
                    0.00%  1.8360us         3     612ns     136ns  1.4860us  cuDeviceGetCount
                    0.00%     663ns         2     331ns     173ns     490ns  cuDeviceGet
                    0.00%     399ns         1     399ns     399ns     399ns  cuDeviceTotalMem
                    0.00%     274ns         1     274ns     274ns     274ns  cuModuleGetLoadingMode
                    0.00%     189ns         1     189ns     189ns     189ns  cuDeviceGetUuid
