spi_bcm2708 situation

Facts

These traces were taken with the "gpio-timing-instrumentation" enabled and were run on an RPI2; a similar or worse pattern occurs on an RPI1.

Note that the traces use the following channels:

4 - CAN_INT - the CAN interrupt signaling a message has arrived
5 - D24 - work_run - inside the bcm2835_spi_transfer_one (potentially sleeping)
8 - D23 - trans_wait - waiting on completion inside of transmit code (in interrupt)
9 - D22 - spi_int - inside the SPI interrupt
10 - D18 - mcp-int - inside the interrupt handler of the mcp2515
11 - D17 - mcpcompl - inside one of the mcp2515 completion routines

spi-bcm2708 without modifications (besides instrumentation)

Measurements:

time between CAN-interrupt down and inside CAN interrupt handler: 6.16us
time between first CAN message scheduled and the workqueue getting woken up: 27.04us
time from Workqueue-start to CS-down: 3.64us
time from CS-Down to Spi-interrupt: 3.04us
time from CS-Down to first bit transferred: 3.96us
time taken to transfer the 2 bytes: 2.12us
time from last bit sent to spi_interrupt: 1.52us
time from last bit sent to workqueue woken up: 5.72us
time from workqueue woken to CS-up: 0.72us
time from CS-down to CS-up: 12.56us
time from Workqueue woken to mcp-completion code called: 1.68us (after 2 more transmits)
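The CS-down to CS-up window can be cross-checked against the individual phases listed above. The sketch below assumes the four intervals (CS-down to first bit, the 2 bytes on the wire, last bit to workqueue woken, workqueue woken to CS-up) are contiguous and cover the whole window; the phase labels are shorthand, not names from the driver.

```python
# Phase durations in microseconds, copied from the measurements above.
phases_us = {
    "CS-down to first bit": 3.96,
    "2 bytes on the wire": 2.12,
    "last bit to workqueue woken": 5.72,
    "workqueue woken to CS-up": 0.72,
}
measured_cs_low_us = 12.56


def breakdown(phases, measured_total):
    """Return (sum of phases, unexplained remainder, percent share per phase)."""
    total = sum(phases.values())
    shares = {name: 100.0 * t / measured_total for name, t in phases.items()}
    return total, measured_total - total, shares


total_us, residual_us, shares = breakdown(phases_us, measured_cs_low_us)
print(f"sum of phases: {total_us:.2f}us, measured: {measured_cs_low_us:.2f}us, "
      f"residual: {residual_us:.2f}us")
for name, pct in shares.items():
    print(f"{name:30s} {pct:5.1f}%")
```

Under that assumption the phases add up to within a few hundredths of a microsecond of the measured window, and the wait between the last bit and the workqueue wakeup is the largest single share.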

Issues observed:

CS is held low about 6 times as long as the data transfer itself takes:
- 2.12us data and 12.56us CS down for 2 bytes - that is about 17% utilization
- 5.56us data and 34.32us CS down for 5 bytes - that is about 16% utilization
- these values vary a lot - there is lots of "scheduling" jitter...
typically 3us pass between CS down and the first byte getting sent (but sometimes it takes longer)
- this is mostly because the interrupt handler takes so long to start and push data into the FIFO
we also see lots of waits for completion (when a single message consists of multiple transfers, like write X then read Y)
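The utilization figures follow directly from the two samples quoted above. A minimal sketch of the arithmetic, where utilization is taken to mean data time divided by CS-low time (an assumed definition that matches the percentages given):

```python
def utilization(data_us, cs_low_us):
    """Fraction of the CS-low window during which bits are actually clocked."""
    return data_us / cs_low_us


# (bytes, data time in us, CS-low time in us), copied from the list above.
samples = [(2, 2.12, 12.56), (5, 5.56, 34.32)]

for nbytes, data_us, cs_low_us in samples:
    u = utilization(data_us, cs_low_us)
    print(f"{nbytes} bytes: {100 * u:.1f}% utilization, "
          f"CS low for {1 / u:.1f}x the data time")
```

Both samples land at roughly one sixth of the window, so in these traces the overhead appears to scale with the transfer length rather than being a fixed cost.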

Summary:

task scheduling (wakeups) is a major bottleneck
the interrupt latencies are also quite high
for a single "simple" transfer we have:
- 2 interrupts (filling the FIFO initially and draining it at the end)
- 3 task switches:
  - scheduling process (IRQ or other) to queued thread (transmit_one)
  - sleeping and waiting for completion inside transmit_one
  - waking the transmit_one thread for further processing
- often one more task switch, when the completion code wakes up another kernel thread (especially when using the synchronous SPI interfaces)
we can remove one interrupt from the equation by filling the FIFO before enabling interrupts
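The effect of filling the FIFO before enabling interrupts can be illustrated with a toy count of interrupts per transfer. This is a simplified model and not the driver's code: `count_interrupts` and `FIFO_DEPTH` are hypothetical names, the depth of 16 is picked only for illustration, and each handler run is assumed to drain the received data and push the next chunk.

```python
def count_interrupts(nbytes, fifo_depth, prefill):
    """Count interrupts for one transfer in a toy model (not driver code).

    Assumed model: each handler run drains received data and pushes the
    next chunk of up to fifo_depth bytes; without prefill, the first
    handler run only does the initial fill.
    """
    interrupts = 0 if prefill else 1              # baseline: one interrupt just to fill
    remaining = nbytes - min(nbytes, fifo_depth)  # first chunk is now queued
    while True:
        interrupts += 1                           # chunk done: handler drains RX
        if remaining == 0:
            return interrupts
        remaining -= min(remaining, fifo_depth)   # handler refills TX


FIFO_DEPTH = 16  # hypothetical depth, chosen only for illustration

for nbytes in (2, 40):
    base = count_interrupts(nbytes, FIFO_DEPTH, prefill=False)
    pre = count_interrupts(nbytes, FIFO_DEPTH, prefill=True)
    print(f"{nbytes} bytes: {base} interrupts interrupt-driven, {pre} with prefill")
```

In this model the saving is always exactly one interrupt, so it matters most for short transfers like the 2-byte case measured here.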

spi-bcm2708 with the FIFO filled immediately (besides instrumentation)

Measurements:

time between CAN-interrupt down and inside CAN interrupt handler: 9.88us
time between first CAN message scheduled and the workqueue getting woken up: 29.04us
time from Workqueue-start to CS-down: 5.64us
time from CS-Down to first bit transferred: 1.12us (but sometimes also only 0.52us)
time taken to transfer the 2 bytes: 2.12us
time from last bit sent to spi_interrupt: 4.56us
time from last bit sent to workqueue woken up: 17.52us
time from workqueue woken to CS-up: 0.8us
time from CS-down to CS-up: 21.64us
time from Workqueue woken to mcp-completion code called: 2.44us (after 2 more transmits)

Issues observed:

the "starting" gap has been minimized and we have one fewer interrupt to handle
but it still leaves us with:
- lots of latency if we transfer more bytes
- the interrupt handler waking up the completion and the workqueue resuming its work
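To see where the two variants differ, the two measurement lists can be put side by side. The sketch below only subtracts the numbers quoted above; the row labels are shorthand. Each column comes from a single trace, and jitter is high, so only the large differences should be read as meaningful.

```python
# (label, unmodified_us, fifo_prefill_us), copied from the two measurement lists.
rows = [
    ("CAN-interrupt down to handler", 6.16, 9.88),
    ("message scheduled to workqueue woken", 27.04, 29.04),
    ("workqueue start to CS-down", 3.64, 5.64),
    ("CS-down to first bit", 3.96, 1.12),
    ("2 bytes on the wire", 2.12, 2.12),
    ("last bit to spi_interrupt", 1.52, 4.56),
    ("last bit to workqueue woken", 5.72, 17.52),
    ("workqueue woken to CS-up", 0.72, 0.80),
    ("CS-down to CS-up", 12.56, 21.64),
]


def deltas(rows):
    """Return (label, before, after, after - before) for every row."""
    return [(label, before, after, after - before) for label, before, after in rows]


table = deltas(rows)
for label, before, after, diff in table:
    print(f"{label:38s} {before:6.2f} {after:6.2f} {diff:+6.2f}")
```

The CS-down to first bit row shrinks by almost 3us, which matches the minimized "starting" gap, while the wait between the last bit and the workqueue wakeup is longer in this particular trace.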