Slow processing with batched dlpreproc #188

emmadrigal · 2022-02-11T00:12:22Z

Processing time for an N-buffer batched dlpreproc is much higher than the processing time for N individual dlpreproc instances running in parallel. For example, with 6 inputs:

IN_CAPS="video/x-raw, width=320, height=320, format=RGB"

GST_DEBUG="2,*tiovxsiso*:6,*perf*:6" gst-launch-1.0 \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
tiovxmux name=mux ! \
tiovxdlpreproc  ! "application/x-tensor-tiovx(memory:batched)" ! perf ! \
tiovxdemux name=demux \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink

This first (batched) pipeline runs at around 5fps.
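
To check how the batched rate changes with the number of inputs, the same pipeline can be generated for an arbitrary batch size. The helper below is only a sketch: `build_batched_pipeline` is a hypothetical name, it uses only the elements and caps already shown above, and whether the elements negotiate correctly for batch sizes other than 6 is an assumption that has not been verified here.

```shell
# Print a gst-launch-1.0 command line for a batched dlpreproc pipeline
# with N videotestsrc inputs (hypothetical helper, not part of the plugins).
build_batched_pipeline() {
  n="$1"
  caps="video/x-raw, width=320, height=320, format=RGB"
  sources=""
  sinks=""
  i=0
  while [ "$i" -lt "$n" ]; do
    # One source branch into the mux and one sink branch out of the demux
    sources="$sources videotestsrc ! \"$caps\" ! mux."
    sinks="$sinks demux. ! queue ! fakesink"
    i=$((i + 1))
  done
  printf '%s\n' "gst-launch-1.0$sources tiovxmux name=mux ! tiovxdlpreproc ! \"application/x-tensor-tiovx(memory:batched)\" ! perf ! tiovxdemux name=demux$sinks"
}
```

One possible way to run it for 2 inputs is `GST_DEBUG="2,*tiovxsiso*:6,*perf*:6" sh -c "$(build_batched_pipeline 2)"`.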

IN_CAPS="video/x-raw, width=320, height=320, format=RGB"

GST_DEBUG="2,*perf*:6" gst-launch-1.0 \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink

Each of the 6 individual pipelines runs at 40fps, roughly 8 times the rate of the batched pipeline.

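All the delay in the batched pipeline appears to be in the graph processing time. In the log below, about 179 ms elapse between the "Processing graph" and "Dequeueing parameters" messages: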

0:00:02.118839548  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:864:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Enqueueing parameters
0:00:02.118885055  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:883:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Processing graph
0:00:02.298343493  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:896:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Dequeueing parameters

The delay corresponds to the following code: https://github.com/TexasInstruments/edgeai-gst-plugins/blob/develop/gst-libs/gst/tiovx/gsttiovxsiso.c#L882

With the error handling removed, it can be summarized as:

  GST_LOG_OBJECT (self, "Enqueueing parameters");
  status =
      vxGraphParameterEnqueueReadyRef (priv->graph, INPUT_PARAMETER_INDEX,
      (vx_reference *) priv->input, priv->num_channels);
  status =
      vxGraphParameterEnqueueReadyRef (priv->graph, OUTPUT_PARAMETER_INDEX,
      (vx_reference *) priv->output, priv->num_channels);

  GST_LOG_OBJECT (self, "Processing graph");
  status = vxScheduleGraph (priv->graph);
  status = vxWaitGraph (priv->graph);

  GST_LOG_OBJECT (self, "Dequeueing parameters");
  status =
      vxGraphParameterDequeueDoneRef (priv->graph, INPUT_PARAMETER_INDEX,
      (vx_reference *) priv->input, priv->num_channels, &in_refs);
  status =
      vxGraphParameterDequeueDoneRef (priv->graph, OUTPUT_PARAMETER_INDEX,
      (vx_reference *) priv->output, priv->num_channels, &out_refs);
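
The time spent between the "Processing graph" and "Dequeueing parameters" messages (that is, in `vxScheduleGraph` plus `vxWaitGraph`) can be read from the log timestamps quoted earlier. As a rough sketch, the helper below subtracts two GStreamer debug timestamps; `gst_elapsed_ms` is a hypothetical name and it assumes the `H:MM:SS.nnnnnnnnn` format shown in the log.

```shell
# Print the elapsed time in milliseconds between two GStreamer debug
# timestamps of the form H:MM:SS.nnnnnnnnn (hypothetical helper).
gst_elapsed_ms() {
  awk -v a="$1" -v b="$2" '
    # Convert a timestamp to seconds; p is a local scratch array
    function to_s(ts,   p) { split(ts, p, ":"); return (p[1] * 60 + p[2]) * 60 + p[3] }
    BEGIN { printf "%.1f\n", (to_s(b) - to_s(a)) * 1000 }'
}

# Timestamps of "Processing graph" and "Dequeueing parameters" from the log
gst_elapsed_ms 0:00:02.118885055 0:00:02.298343493   # prints 179.5
```

About 179.5 ms per batch works out to roughly 5.6 batches per second, which is consistent with the 5fps reported by perf.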
