Expanded docstring as suggested in review
oleksandr-pavlyk committed Nov 15, 2024
1 parent d1011c5 commit 7752078
Showing 1 changed file with 41 additions and 12 deletions.
53 changes: 41 additions & 12 deletions dpctl/_sycl_timer.py
@@ -84,8 +84,7 @@ def get_event(self):

 class SyclTimer:
 """
-Context to measure device time and host wall-time of execution
-of commands submitted to :class:`dpctl.SyclQueue`.
+Context to time execution of tasks submitted to :class:`dpctl.SyclQueue`.
 :Example:
 .. code-block:: python
@@ -99,13 +98,18 @@ class SyclTimer:
 milliseconds_sc = 1e3
 timer = dpctl.SyclTimer(time_scale = milliseconds_sc)
+untimed_code_block_1
 # use the timer
 with timer(queue=q):
-    code_block1
+    timed_code_block1
+untimed_code_block_2
 # use the timer
 with timer(queue=q):
-    code_block2
+    timed_code_block2
+untimed_code_block_3
 # retrieve elapsed times in milliseconds
 wall_dt, device_dt = timer.dt
@@ -116,16 +120,41 @@ class SyclTimer:
 associated with these submissions to perform the timing. Thus
 :class:`dpctl.SyclTimer` requires the queue with ``"enable_profiling"``
 property. In order to be able to collect the profiling information,
-the ``dt`` property ensures that both submitted barriers complete their
-execution and thus effectively synchronizes the queue.
-`device_timer` keyword argument controls the type of tasks submitted.
-With `device_timer="queue_barrier"`, queue barrier tasks are used. With
-`device_timer="order_manager"`, a single empty body task is inserted
-instead relying on order manager (used by `dpctl.tensor` operations) to
+the ``dt`` property ensures that both tasks submitted by the timer
+complete their execution and thus effectively synchronizes the queue.
+Execution of the above example results in the following task graph,
+where each group of tasks is ordered after the one preceding it,
+``[tasks_of_untimed_block1]``, ``[timer_fence_start_task]``,
+``[tasks_of_timed_block1]``, ``[timer_fence_finish_task]``,
+``[tasks_of_untimed_block2]``, ``[timer_fence_start_task]``,
+``[tasks_of_timed_block2]``, ``[timer_fence_finish_task]``,
+``[tasks_of_untimed_block3]``.
+``device_timer`` keyword argument controls the type of tasks submitted.
+With ``"queue_barrier"`` value, queue barrier tasks are used. With
+``"order_manager"`` value, a single empty body task is inserted
+and order manager (used by all `dpctl.tensor` operations) is used to
 order these tasks so that they fence operations performed within
 timer's context.
+Timing offloading operations that do not use the order manager with
+the timer that uses ``"order_manager"`` as ``device_timer`` value
+will be misleading because the tasks submitted by the timer will not
+be ordered with respect to tasks we intend to time.
+Note, that host timer effectively measures the time of task
+submissions. To measure host timer wall-time that includes execution
+of submitted tasks, make sure to include synchronization point in
+the timed block.
+:Example:
+.. code-block:: python
+with timer(q):
+    timed_block
+    q.wait()
 Args:
 host_timer (callable, optional):
 A callable such that host_timer() returns current
@@ -134,7 +163,7 @@ class SyclTimer:
 device_timer (Literal["queue_barrier", "order_manager"], optional):
 Device timing method. Default: "queue_barrier".
 time_scale (Union[int, float], optional):
-Ratio of the unit of time of interest and one second.
+Ratio of one second and the unit of time-scale of interest.
 Default: ``1``.
 """
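
The docstring's note on the host timer, and the reworded ``time_scale`` description, can be illustrated without a SYCL device. What follows is a minimal sketch and not dpctl code: ``HostOnlyTimer`` and ``slow_task`` are hypothetical names, a single-worker thread pool stands in for an asynchronous queue, and ``Future.result()`` plays the role of ``q.wait()``. It assumes, as the docstring describes, that elapsed seconds are multiplied by ``time_scale`` (so ``1e3`` yields milliseconds).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from timeit import default_timer


class HostOnlyTimer:
    """Hypothetical host-side stand-in for illustration; not part of dpctl."""

    def __init__(self, host_timer=default_timer, time_scale=1):
        self.host_timer = host_timer  # callable returning current time in seconds
        self.time_scale = time_scale  # ratio of one second to the unit of interest
        self.elapsed = 0.0            # seconds accumulated over all timed blocks

    def __enter__(self):
        self._start = self.host_timer()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed += self.host_timer() - self._start
        return False  # do not swallow exceptions

    @property
    def dt(self):
        # Elapsed time expressed in the requested unit (1e3 -> milliseconds).
        return self.elapsed * self.time_scale


def slow_task():
    # Stand-in for work executed asynchronously after submission.
    time.sleep(0.1)


with ThreadPoolExecutor(max_workers=1) as pool:
    # Times only the submission: submit() returns before the task finishes.
    submit_only = HostOnlyTimer(time_scale=1e3)
    with submit_only:
        fut = pool.submit(slow_task)
    fut.result()  # wait outside the timed block

    # Times submission plus execution: the wait is inside the timed block.
    with_sync = HostOnlyTimer(time_scale=1e3)
    with with_sync:
        fut = pool.submit(slow_task)
        fut.result()  # analogous to q.wait() in the docstring example

print(f"submission only: {submit_only.dt:.3f} ms")
print(f"with synchronization: {with_sync.dt:.3f} ms")
```

In this sketch the first measurement covers little more than the cost of handing the task to the pool, while the second also covers the task's run time, which is the distinction the docstring draws for host wall-time.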
