Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Alternate Manchester SWO implementation using DMA #1958

Closed

Conversation

ssimek
Copy link

@ssimek ssimek commented Oct 12, 2024

Fast DMA-based TRACESWO Manchester decoding

Detailed description

The current interrupt-based TRACESWO decoding suffers (IMO) from several issues

  • it is limited to ~100 kHz (~50 kbit/s raw, i.e. ~25 kbit/s decoded SWO console) signal
    • while this is usually sufficient for text-based debug output, it gets worse when more data needs to be sent out/graphed/etc
    • it also unnecessarily impacts performance of the tested code, because the MCU has to wait for the ITM to be available
  • high frequency hangs the probe due to an overload of interrupt requests
    • this happens easily when starting with new hardware before the init code is debugged

This PR contains a full reimplementation of the TRACESWO link decoder using DMA that makes capture a lot more efficient:

  1. all edge times of the signal are captured using a timer
  2. DMA is used to record the timings into a circular buffer
  3. the buffer is periodically processed in batches, transformig the edge stream into a byte stream for sending in another circular buffer, resulting in effective processing time per sample on the order of several clock cycles
  4. the output buffer is processed in a lower-priority ISR as time permits

The result is the ability to process SWO signal up to ~3 MHz with the probe being more resilient against higher frequency signals (it just fails to decode it properly). Continuous streaming at these speeds obviously makes USB a bottleneck (1.5 Mbit = ~180 kB/s, with the USB controller seemingly being able to handle up to about 70 kB/s without double buffering). But for reasonably intermittent flows, one big benefit is the lightened load on the target.

Your checklist for this pull request

  • I've read the Code of Conduct
  • I've read the guidelines for contributing to this repository
  • It builds for hardware native (see Building the firmware)
    • it does, but it doesn't fit, neither did the original one, at least not with GCC 13.3 I'm using - it fits with LTO, but then the GDB server doesn't work for some reason. I was developing it with a few targets removed so it fit
  • It builds as BMDA (see Building the BMDA)
    • not applicable, the only change is for the native hardware
  • I've tested it to the best of my ability
  • My commit messages provide a useful short description of what the commits do

Closing issues

None that I'm aware of

ALTracer and others added 17 commits November 5, 2023 04:57
* Try to print extended capabilities. If the adapter doesn't advertise
  support for them, then just print standard capabilities, like it was before.
…l byte

* According to Wireshark usbmon dumps, the firmware returns 112 bytes of version,
  including some copyright information after an effectively 0-terminator.
* Use strchr() instead of index() for Windows reasons.
* Make sure the version buffer contains at least one NULL
* And avoid replacing the only NULL with LF, losing null-termination in the process
* Asking for 1 byte less raises a libusb OVERFLOW error
  when using JLink V8 and newer with BMDA on Linux hosts
  after the proprietary software stack touches the adapter (but not before).
* libusb docs recommend receiving into a bigger buffer (1028=512*2+4)
  and checking whether all of expected data got indeed received.
* For firmwares which send that transaction-error byte in a separate packet,
  keep the second read call (V5 ones do this regardless).
… which would make wrongly it skip chunks of the buffer to write, corrupting the message to send
* Add a reduced loop like the one in bmp_poll_loop(), allowing BMP
  to service GDB accessing syscall buffers in target memory;
* Use both F-reply and Detach as exit conditions, passing
  a corresponding result code for the target to see when we resume it
DFU detach wasn't lowering the USB pullup. This meant that BMP was
rebooting into DFU bootloader mode, but the host wasn't re-enumerating
it, because it hadn't detected a detach.

In `platform_request_boot`, The `gpio_set_mode` config was for analog
instead of floating input. PA8 isn't defined as an analog pin on this
chip, so it's arguably undefined behavior.

Also confirmed that using `scb_reset_system` instead of `scb_reset_core`
to do the DFU reboot would correctly detach and enter the bootloader, if
the boot button was held down.
Fix two problems with SWD turnaround:

* When going from output to input, add a missing delay loop, so the
clock low period is consistent. This prevents problems with a missed
clock edge when the clock speed is intentionally slowed to accommodate
lower bandwidth connections to the target.

* Always set the clock low at the end of a turnaround. Otherwise, when
floating the clock after a read transaction, the most recent clock state
was driven high. If there's a pull down on the clock, a subsequent clock
reactivation would drive the clock high, producing an extra rising clock
edge. This sometimes causes (recoverable) protocol error states when
re-attaching after a detach.

Co-Authored-By: Stanislav Sotnikov <[email protected]>
Add missing idle cycles after SWD read transactions.

ADIv5 (ARM IHI0031G) §B4.1.1 requires at least 8 idle cycles after any
SWD transaction (not just write transactions) before stopping the clock.

This does add some delays to read transactions where there were
previously none. A future change could conditionally disable the idle
cycles for both read and write transactions if it is known that a new
transaction will immediately follow.
…itch, where a threshold violation is detected during normal operation.

Now we need to count 3 threshold violations to trigger the protection
mechanism.
@dragonmux
Copy link
Member

dragonmux commented Oct 13, 2024

Before we review this, please adjust all the nomenclature etc over to the current state of things - the pin is technically called TRACESWO in exactly one place in the CoreSight TRM, however it is referred to as SWO everywhere else including all the Architecture Reference Manuals so BMD takes the opinion that we should be consistent and use SWO only to refer to it to make sure nobody gets confused with parallel Trace.

Additionally, the file should be named for which encoding mode it implements - so, in this case.. swo_manchester_dma.c. This code likewise seems to take the old SWO init interface, please use and obey the current API given this also needs a way to be deinitialised and the pin returned back to use as TDO on most probes. The SWO_PIN and SWO_PORT macros exist for switching the pin over to the right mode for the decoder.

Also note that trace_buf_drain was entirely replaced with a routine that is interrupt safe, and that all the buffering for this was lifted out of the backends, simplifying them, the API, etc.

Once you have adjusted things for the current API, please run clang-format on your contribution as it contains many formatting errors.

Finally, we are retargeting this to main as this change will not be landing on the v1.10 firmware branch. That branch exists for bugfixes not features.

@dragonmux dragonmux changed the base branch from v1.10 to main October 13, 2024 02:25
@dragonmux dragonmux changed the title common/stm32/traceswo: add an alternate traceswo implementation using… Feature: Alternate Manchester SWO implementation using DMA Oct 13, 2024
@ssimek
Copy link
Author

ssimek commented Oct 13, 2024

Making this PR against 1.10 was fully intentional since main has diverged significantly and seems to be unfortunately quite unstable and under heavy development. If there's not going to be another 1.x release (the nature and cadency of changes in main has a very 2.0-ish vibe to it), I'm ok with maintaining this in my own fork for my own use for now. Once there is a stable release based on current main, I'll look into adjusting the code to the new standards.

@ssimek ssimek closed this Oct 13, 2024
@dragonmux
Copy link
Member

main should be stable, and if it is not for your target(s), we need to know please - open bug reports, tell us, that's how we get to know that something broke. The issue with Flash size will be addressed in the next couple of weeks as the firmware build is going to be split into target profiles. It is at this point seeing minor features only and bugfixes pending a v2.0 release candidate.

The SWO engine was entirely overhauled as it was entirely broken on GD32F1-based BMPs, and the buffer management rewritten so it does something more useful for both SWO modes + so the pin could be properly returned back to use for JTAG after as this is a bug in the v1.x firmware series. The new code should be considered stable pending any show-stopping bugs that require any more rework.

It's not so much that there won't be another v1.x release, so much as that branch is specifically for bugfixes backported from main, so unless a major bug were found in v1.10 there is unlikely to be another v1.10 release and if there was it would only be for a bugfix, not a feature. The release branches are always specifically for this purpose only.

@ssimek
Copy link
Author

ssimek commented Oct 13, 2024

I understand. I'll try to find the time to give 2.0 another chance - it may have been just bad luck, but the main I originally checked out about a week ago had issues with SWO, buffering, etc. (various artifacts appearing in the stream even in low speed scenarios) so I chose to base my work on the stable branch. In case it is stable enough for daily use, I'll port the changes and reopen this PR.

@ssimek
Copy link
Author

ssimek commented Oct 13, 2024

Created #1960 for the 2.0 implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants