Skip to content

Latest commit

 

History

History
79 lines (66 loc) · 3.63 KB

File metadata and controls

79 lines (66 loc) · 3.63 KB

Device Activity Tracing for OpenCL(TM)

Overview

OpenCL(TM) device activity tracing is the standard feature of OpenCL(TM) runtime that allows to measure execution time for the kernels and memory transfers running on the device.

All the device activities, which should be placed into the queue with clEnqueue* call, have an event that allows to track their execution status (usually it's the last argument of clEnqueue* function).

The same event could be used to get additional profiling information for the device activity, such as:

  • time counter in nanoseconds when the command identified by event is enqueued in a command-queue by the host (CL_PROFILING_COMMAND_QUEUED);
  • time counter in nanoseconds when the command identified by event that has been enqueued is submitted by the host to the device associated with the command queue (CL_PROFILING_COMMAND_SUBMIT);
  • time counter in nanoseconds when the command identified by event starts execution on the device (CL_PROFILING_COMMAND_START);
  • time counter in nanoseconds when the command identified by event has finished execution on the device (CL_PROFILING_COMMAND_END).

Intel(R) Xeon(R) Processor / Intel(R) Core(TM) Processor (CPU) Runtimes use QueryPerformanceCounter on Windows and CLOCK_MONOTONIC on Linux as time sources for the counters described above. Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver also uses QueryPerformanceCounter on Windows but CLOCK_MONOTONIC_RAW on Linux.

Supported Runtimes:

  • any OpenCL(TM) 1.0 and above

Supported OS:

  • Linux
  • Windows

Supported HW:

  • Any

Needed Headers:

Needed Libraries:

How To Use

To retrieve the timestamps described above, one may:

  1. Enable profiling mode for the command queue of the target device:
cl_queue_properties props[3] = { CL_QUEUE_PROPERTIES,
                                 CL_QUEUE_PROFILING_ENABLE, 0 };
cl_int status = CL_SUCCESS;
cl_command_queue queue =
  clCreateCommandQueueWithProperties(context, device, props, &status);
assert(status == CL_SUCESS);
  1. Introduce the event for the target device activity, e.g. for the kernel:
cl_event event = nullptr;
status = clEnqueueNDRangeKernel(queue, kernel, dim, nullptr,
    global_size, local_size, 0, nullptr, &event);
assert(status == CL_SUCESS);
  1. Set the callback to be notified when the activity is completed:
status = clSetEventCallback(event, CL_COMPLETE, EventNotify, nullptr);
assert(status == CL_SUCESS);
  1. Read the profiling data inside the callback:
void CL_CALLBACK EventNotify(cl_event event,
                             cl_int event_status,
                             void* user_data) {
    cl_int status = CL_SUCCESS;
    cl_ulong start = 0, end = 0;
    
    assert(event_status == CL_COMPLETE);

    status = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                                     sizeof(cl_ulong), &start, nullptr);
    assert(status == CL_SUCCESS);
    status = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                                     sizeof(cl_ulong), &end, nullptr);
    assert(status == CL_SUCCESS);
}

Usage Details

Samples