pico: revert using nano.specs #867

Merged: 2 commits, Nov 5, 2024


Daft-Freak
Collaborator

I added nano.specs to avoid bloat from exception-related code in libstdc++, but there's a problem:

  • newlib-nano is usually built with -Os, which disables the optimised memcpy/memset/... implementations.
  • The fallback implementations just copy/fill one byte at a time. Very small, but not fast.
  • RP2040 has optimised memcpy/memset routines in the bootrom, so the newlib ones aren't used in the end. RP2350 does not... so they are.
  • GCC tries to be smart and turns anything that looks like a memcpy into a call to memcpy (similarly for memset), but if you were copying words, that makes it 4x slower.
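
To make the last two points concrete, here is a minimal sketch of the two kinds of copy involved. The function names `copy_bytes` and `copy_words` are made up for illustration, and `copy_bytes` only approximates what a size-optimised fallback looks like; it is not newlib-nano's actual source.

```cpp
#include <cstddef>
#include <cstdint>

// Roughly what a size-optimised fallback does: one byte per iteration.
// (Illustrative only; not copied from newlib-nano.)
void copy_bytes(void *dst, const void *src, std::size_t len) {
    auto *d = static_cast<unsigned char *>(dst);
    auto *s = static_cast<const unsigned char *>(src);
    while (len--)
        *d++ = *s++;
}

// A hand-written word copy: one 32-bit load/store per iteration.
// At higher optimisation levels GCC may recognise this loop as a memcpy
// and replace it with a call to memcpy, so its speed then depends on
// whichever memcpy ends up being linked.
void copy_words(std::uint32_t *dst, const std::uint32_t *src, std::size_t count) {
    for (std::size_t i = 0; i < count; i++)
        dst[i] = src[i];
}
```

If the linked memcpy is the byte-wise one, a loop like `copy_words` can end up doing four times as many loads and stores as it was written to do. GCC's `-fno-tree-loop-distribute-patterns` option controls this loop-to-memcpy transformation and may be worth experimenting with, though that is not what this PR does.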

So, this PR goes back to the default specs but tries to keep most of the space savings by avoiding the default terminate handler.
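
As a rough sketch of what "avoiding the default terminate handler" can look like: with GCC's libstdc++, the default handler is normally `__gnu_cxx::__verbose_terminate_handler`, which pulls in demangling and printing code. A commonly used trick is to provide your own definition of that symbol so the library version is not linked in a static build. Whether the PR does exactly this is an assumption, and the handler body below is hypothetical.

```cpp
#include <cstdlib>
#include <exception>

namespace __gnu_cxx {
// Assumption: this stands in for libstdc++'s verbose handler, which would
// otherwise bring in the demangler and message printing. The replacement
// prints nothing and just aborts.
void __verbose_terminate_handler() {
    std::abort();
}
} // namespace __gnu_cxx
```

The function is never called directly; `std::terminate()` reaches it when no other handler has been installed with `std::set_terminate`.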

For a somewhat extreme performance comparison, rasterise time in my 3D engine goes from 15 ms to 10 ms.

Size comparison:

logo before (with nano.specs)

  text    data     bss     dec     hex filename
282432       0  319696  602128   93010 examples/logo/logo.elf

logo without nano.specs

  text    data     bss     dec     hex filename
313484       0  320524  634008   9ac98 examples/logo/logo.elf

(ouch, +31k)

logo after (without nano.specs, overridden terminate handler)

  text    data     bss     dec     hex filename
288736       0  320516  609252   94be4 examples/logo/logo.elf

(just +6.3k)
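
For reference, the deltas quoted above can be checked directly from the text column of the `size` output. The constant names are made up for this sketch; the numbers are copied from the three tables.

```cpp
// text-section sizes in bytes, copied from the `size` output above
constexpr long text_nano = 282432;        // with nano.specs
constexpr long text_default = 313484;     // without nano.specs
constexpr long text_overridden = 288736;  // without nano.specs, overridden terminate handler

// Cost of dropping nano.specs with the default terminate handler: 31052 bytes
constexpr long cost_default = text_default - text_nano;
// Cost of dropping nano.specs with the overridden handler: 6304 bytes
constexpr long cost_overridden = text_overridden - text_nano;
// Bytes recovered by overriding the terminate handler: 24748 bytes
constexpr long saved_by_override = text_default - text_overridden;
```

So the override recovers about 24.7k of the 31k that dropping nano.specs would otherwise cost on this example.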

32blit-stm32 has a similar problem, which we can fix there by rebuilding our custom newlib with -O2 (+3k on logo, +2.4k on firmware).

Saves 20-30k when building with non-nano libstdc++
newlib-nano has really basic memcpy/memset code which hurts perf a lot on RP2350
@Gadgetoid
Contributor

I am reminded of this video which I need to watch again - https://www.youtube.com/watch?v=bY2FlayomlE

Judging by your findings here, I should probably not be using https://github.com/pimoroni/pimoroni-pico/blob/main/micropython/modules/micropython-disable-exceptions.cmake in MicroPython builds 😬

@Daft-Freak
Collaborator Author

It does seem to depend on the toolchain (I got the fast memcpy when I tried the default Ubuntu toolchain, but not with the default Arch one), but yeah...

The only way to be sure is digging in the disassembly:
[image: disassembly]

I need to see if there are any huge wins on the stm32 side, as we've been using the slow copies since adding the PIC stdlibs, maybe even before that 🤔

@Daft-Freak Daft-Freak merged commit 349bd9a into 32blit:master Nov 5, 2024
9 checks passed
@Daft-Freak Daft-Freak deleted the pico-no-nano branch November 5, 2024 11:02