
spi_bcm2708situation

msperl edited this page Feb 26, 2015 · 9 revisions

Facts

  • Uses/implements:
    • "deprecated" API, which results in code duplication/unmanaged code
    • separate worker thread/queue without changing RT-priority
    • interrupt handler
    • supports LoSSI/9Bit transfers
    • implemented as per "BCM2835 ARM Peripherals"; this includes:
      • recommendations for filling FIFOs (16 bytes, then 12) as per 10.6.2/Page 158
      • recommendations for CLK to be a power of 2 - as per 10.5 CLK-Register/Page 156
    • limited to 2/3 SPI devices - no support for arbitrary GPIOs as CS
  • Bugs:
    • Inverted CS does not work correctly if there are 2 or more devices
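
The power-of-2 clock recommendation listed above has a practical side effect: the requested SPI speed is usually rounded down to the next available step. The following is a minimal sketch of that rounding, not the driver's actual code. The 250 MHz core clock, the minimum divider of 2, and the names `CORE_CLK_HZ`, `spi_cdiv_pow2` and `spi_effective_hz` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nominal core clock feeding the SPI block (illustrative value). */
#define CORE_CLK_HZ 250000000u

/* Largest divider; a power of two like all other values returned here. */
#define CDIV_MAX 65536u

/*
 * Return the smallest power-of-two divider for which
 * CORE_CLK_HZ / divider does not exceed target_hz.
 * The result is capped at CDIV_MAX for very low target speeds.
 */
static uint32_t spi_cdiv_pow2(uint32_t target_hz)
{
    uint32_t cdiv = 2; /* assumed smallest usable divider */

    assert(target_hz > 0);
    while (cdiv < CDIV_MAX && CORE_CLK_HZ / cdiv > target_hz)
        cdiv <<= 1;
    return cdiv;
}

/* Bus clock that actually results from a requested speed. */
static uint32_t spi_effective_hz(uint32_t target_hz)
{
    return CORE_CLK_HZ / spi_cdiv_pow2(target_hz);
}
```

Under these assumptions a request for 10 MHz and a request for 8 MHz both end up with a divider of 32, i.e. 7.8125 MHz on the bus.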

Typical transfer pattern on the bus

These traces were captured with "gpio-timing-instrumentation" enabled on a RPI2; similar or worse patterns occur on a RPI1.

Note the following channels in the traces:

  • 4 - CAN_INT - the CAN interrupt signaling a message has arrived
  • 5 - D24 - work_run - inside the bcm2835_spi_transfer_one (potentially sleeping)
  • 8 - D23 - trans_wait - waiting on completion inside of transmit code (in interrupt)
  • 9 - D22 - spi_int - inside the SPI interrupt
  • 10 - D18 - mcp-int - inside the interrupt handler of the mcp2515
  • 11 - D17 - mcpcompl - inside one of the mcp2515 completion routines

default code

spi-bcm2708 without modifications (besides instrumentation)

Measurements:

  • time between CAN-interrupt down and inside CAN interrupt handler: 6.16us
  • time between first CAN message scheduled and the workqueue getting woken up: 27.04us
  • time from Workqueue-start to CS-down: 3.64us
  • time from CS-Down to Spi-interrupt: 3.04us
  • time from CS-Down to first bit transferred: 3.96us
  • time taken to transfer the 2 bytes: 2.12us
  • time from last bit sent to spi_interrupt: 1.52us
  • time from last bit sent to workqueue woken up: 5.72us
  • time from workqueue woken to CS-up: 0.72us
  • time from CS-down to CS-up: 12.56us
  • time from Workqueue woken to mcp-completion code called: 1.68us (after 2 more transmits)

Issues observed:

  • CS is held low about 6 times as long as the data transfer itself takes:
    • 2.12us data and 12.56us CS down for 2 bytes - that is about 17% utilization
    • 5.56us data and 34.32us CS down for 5 bytes - that is about 16% utilization
    • variations between these measurements are high - there is lots of "scheduling" jitter...
  • typically 3us pass between CS down and the first byte getting sent (but sometimes longer)
    • this is mostly because the interrupt handler takes so long to start and push data into the FIFO
  • we also see lots of waits for completion (when a single logical transfer consists of multiple transfers, like write X then read Y)
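
The utilization figures above are simply the data time divided by the time CS is held low. A small sketch of that arithmetic, with the hypothetical helper names `bus_utilization` and `overhead_factor`:

```c
#include <assert.h>

/* Fraction of the CS-low window in which data is actually clocked. */
static double bus_utilization(double data_us, double cs_low_us)
{
    assert(cs_low_us > 0.0);
    return data_us / cs_low_us;
}

/* How many times longer CS is held low than the data itself needs. */
static double overhead_factor(double data_us, double cs_low_us)
{
    assert(data_us > 0.0);
    return cs_low_us / data_us;
}
```

For the two measured cases, `bus_utilization(2.12, 12.56)` is roughly 0.17 and `bus_utilization(5.56, 34.32)` is roughly 0.16, which corresponds to an overhead factor of about 6 in both cases.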

Summary:

  • task scheduling (wakeups) is a major bottleneck
  • interrupt latencies are also quite high
  • for a single "simple" transfer we have:
    • 2 interrupts (filling the FIFO initially and draining it at the end)
    • 3 task switches:
      • scheduling process (IRQ or other) to queued thread (transmit_one)
      • sleeping and waiting for Completion inside transmit_one
      • waking the transmit_one thread for further processing
      • in addition, the completion code will often wake up another kernel thread (especially when using the synchronous spi interfaces)
  • we can remove one interrupt from the equation by filling the FIFO before enabling interrupts
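
The idea in the last point can be sketched as follows. This is a user-space model only, not the driver code: `struct mock_spi`, `FIFO_DEPTH` and `start_transfer_prefilled` are hypothetical names, and the structure does not reflect the real register layout. The point is the ordering: data goes into the FIFO first, interrupts are enabled afterwards, so no interrupt is needed just to start the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16 /* size of the initial fill, as in the 16-byte recommendation */

/* Hypothetical stand-in for the SPI hardware state. */
struct mock_spi {
    uint8_t fifo[FIFO_DEPTH];
    size_t  fifo_len;
    int     irq_enabled;
};

/*
 * Push as many bytes as fit into the TX FIFO, then enable interrupts.
 * Returns the number of bytes queued; the remainder (if any) would be
 * fed later from the interrupt handler.
 */
static size_t start_transfer_prefilled(struct mock_spi *hw,
                                       const uint8_t *tx, size_t len)
{
    size_t n = 0;

    assert(hw != NULL && tx != NULL);
    while (n < len && hw->fifo_len < FIFO_DEPTH)
        hw->fifo[hw->fifo_len++] = tx[n++];

    hw->irq_enabled = 1; /* only now can the "done"/"refill" interrupt fire */
    return n;
}
```

For a short transfer such as the 2-byte case measured above, the whole message fits into the FIFO in this model, so the only interrupt left is the one that signals the end of the transfer.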

Improved startup code prefilling FIFO

spi-bcm2708 with the FIFO filled immediately (besides instrumentation)

Measurements:

  • time between CAN-interrupt down and inside CAN interrupt handler: 9.88us
  • time between first CAN message scheduled and the workqueue getting woken up: 29.04us
  • time from Workqueue-start to CS-down: 5.64us
  • time from CS-Down to first bit transferred: 1.12us (but sometimes also only 0.52us)
  • time taken to transfer the 2 bytes: 2.12us
  • time from last bit sent to spi_interrupt: 4.56us
  • time from last bit sent to workqueue woken up: 17.52us
  • time from workqueue woken to CS-up: 0.8us
  • time from CS-down to CS-up: 21.64us
  • time from Workqueue woken to mcp-completion code called: 2.44us (after 2 more transmits)

Issues observed:

  • the "starting" gap has been minimized and we have one interrupt less to handle
  • but it still leaves us with:
    • lots of latency if we transfer more bytes
    • the interrupt handler waking up the completion and the workqueue resuming its work

improved interrupt code for handling multiple transfers as one