This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

subset calling conventions for inline handlers pressure (wishfull thinking bias warning) #12

Open
jnk0le opened this issue Jul 27, 2022 · 4 comments

Comments

@jnk0le

jnk0le commented Jul 27, 2022

I read the document on ABI deviations proposed by Anders@IAR [1] and got the impression that it is a limited way to de facto standardize things like calling ILP32E code (with 16-byte stack alignment) from UABI, or RVE EABI from RVI EABI (just a bit different, with more possible deviations).

Up to 8 registers can be reserved simultaneously: t3-t6 (x28-x31)
and s8-s11 (x24-x27). A tool chain may however choose to only
support up to 4 reserved registers (t5-t6 & s10-s11), as the need
for reserving more than four registers is considered being very
rare.

This gives
the opportunity to build generic libraries (with reserved registers),
which can be used for applications which need reserved registers and
applications that do not need reserved registers.

If a toolchain choose to only allow four
registers to be locked/reserved (s10-s11 and t5-t6), most library
functions will probably generate the same code as if those registers
were not locked.

I think it is theoretically possible to implement compile-time ABI deviations (for static linking, or for internal functions) with greater benefits.

  • each function somehow "exports" the list of caller-saved registers it clobbers, with single-register granularity (e.g. a0, a1, a2, t0, and nothing else)
  • the caller uses those clobber lists to treat unclobbered registers as callee-saved, so child nodes need to be compiled and optimized before their parents
  • compilers need to learn not to reach for x10-x15 (UABI) when that is not needed for compression, and not to use more registers than necessary, so that parent nodes can take advantage of it (right now there is no reason for them to care)
  • any legacy code, shared object exports, etc. default to a full ABI clobber
  • many leaf functions use only a few registers, which allows more efficient code in both performance and size
  • some assembly directive will be needed to annotate such "exports" from pure assembly functions
  • the no. 1 issue will be function pointers, aka callbacks (popular in interrupts). I think the most common kinds are: callbacks initialized once at startup, callback queues (from structures etc.), and callback parameters for immediate use. Probably all of them could be resolved statically in most cases. The dispatch will have to use a common clobber (the OR of the clobbers of all possible calls)
  • resolves the push/pop wall issue in inline interrupt handlers
  • CLIC-like C function dispatchers: probably not
  • RVV can be handled similarly, with the ABI making all registers caller-saved by default
  • callee-saved registers can be handled similarly, starting from the root/parent nodes, but that is far more complex and, combined with the above, creates infinite two-way recursion. Rather a big no-no
  • this pattern can be ported to other architectures
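The bottom-up propagation in the list above can be modelled with plain bitmasks. This is only a sketch under stated assumptions: `regmask_t`, `struct func_summary`, `effective_clobber` and `dispatch_clobber` are hypothetical names invented for illustration, not part of any toolchain, and the call graph is assumed to be acyclic (a recursive function would have to fall back to the full ABI clobber).

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per integer register x0..x31 (illustrative encoding, not from any toolchain). */
typedef uint32_t regmask_t;
#define REG(n) ((regmask_t)1 << (n))

/* A few RISC-V register numbers, named by their ABI mnemonics. */
enum { RA = 1, T0 = 5, A0 = 10, A1 = 11, A2 = 12 };

/* Caller-saved integer registers of the standard calling convention:
 * ra (x1), t0-t2 (x5-x7), a0-a7 (x10-x17), t3-t6 (x28-x31). */
#define FULL_ABI_CLOBBER ((regmask_t)0xF003FCE2u)

/* Hypothetical per-function summary "exported" to callers. */
struct func_summary {
    regmask_t own;                             /* registers written by the body itself */
    const struct func_summary *const *callees; /* statically known direct callees */
    size_t n_callees;
    int opaque;                                /* legacy code, shared object export, ... */
};

/* Bottom-up clobber set: own registers ORed with everything the callees clobber.
 * Assumes an acyclic call graph. */
static regmask_t effective_clobber(const struct func_summary *f)
{
    if (f->opaque)
        return FULL_ABI_CLOBBER; /* unknown code defaults to the full ABI clobber */

    regmask_t mask = f->own;
    for (size_t i = 0; i < f->n_callees; i++)
        mask |= effective_clobber(f->callees[i]);
    return mask;
}

/* Common clobber for an indirect call: OR over all possible targets. */
static regmask_t dispatch_clobber(const struct func_summary *const *targets, size_t n)
{
    regmask_t mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= effective_clobber(targets[i]);
    return mask;
}
```

A caller may then treat every caller-saved register outside the returned mask as if it were callee-saved across that call; a callback dispatch site would use `dispatch_clobber` over the set of targets it can statically prove.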
It is also possible to make such calls to pure assembly functions with current toolchains, using some inline asm, e.g.:
  static inline int weird_call(int reason, void* param);
  static inline int weird_call(int reason, void* param)
  {
  	register int result asm("a0") = reason;
  	register void* r1 asm("a1") = param;

  	asm volatile(
  		"call	sth_func     \n\t" // potentially problematic on architectures without pseudoinstr for calls
  		: [ARG0] "+r" (result) // return in same register
  		: [ARG1] "r" (r1)
  		: "memory", "ra", "a2" // use clobber for any caller saved regs used
  	);

  	return result;
  }

https://godbolt.org/z/5jTn11jYv

[1] - https://lists.riscv.org/g/tech-psabi/topic/eabi_deviations/92622595?p=,,,20,0,0,0::recentpostdate/sticky,,,20,2,0,92622595,previd%3D1658930318801721437,nextid%3D1644242986809326219&previd=1658930318801721437&nextid=1644242986809326219

@jnk0le

jnk0le commented Jul 27, 2022

one more:

  • if the registers holding passed arguments are not reported as clobbered by this mechanism, they were not modified and the caller can reuse them
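With clobbers expressed as a bitmask, the check above is a single AND-NOT. A minimal sketch, assuming one bit per integer register; `regmask_t` and `reusable_args` are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* One bit per integer register x0..x31 (illustrative encoding). */
typedef uint32_t regmask_t;
#define REG(n) ((regmask_t)1 << (n))

/* RISC-V argument registers a0-a2 by register number. */
enum { A0 = 10, A1 = 11, A2 = 12 };

/* Argument registers whose values are still intact after the call:
 * those that were passed and are absent from the callee's clobber set. */
static regmask_t reusable_args(regmask_t args_passed, regmask_t callee_clobber)
{
    return args_passed & ~callee_clobber;
}
```

For example, if a0 and a1 were passed and the callee reports clobbering a0 and a2, only a1 can be reused by the caller without reloading.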

@jnk0le

jnk0le commented Jul 28, 2022

I was too quick with my conclusions; the register reservation part is for things like overlays.

The inline handlers mentioned above could use only the registers they need, but they currently emit a full push/pop wall as soon as anything is called, as has long been the case on AVR8.

https://godbolt.org/z/nhdjh3Gqa

Trampoline interrupts: from a C function dispatch back to inline handlers. I am not sure about that. There is also the issue of pure assembly handlers not knowing what the dispatcher preserves.

Also, the trampoline engine cannot reuse preserved registers across nesting levels, only across different handlers/priorities at the same nesting level.

@jnk0le jnk0le changed the title subset calling conventions as more efficient deviations (wishfull thinking bias warning) subset calling conventions (wishfull thinking bias warning) Jul 28, 2022
@jnk0le jnk0le changed the title subset calling conventions (wishfull thinking bias warning) subset calling conventions for inline handlers pressure (wishfull thinking bias warning) Jul 30, 2022
@jnk0le

jnk0le commented Dec 10, 2022

It turns out that LLVM already has something like this, hidden behind the -mllvm -enable-ipra flag.
No hits for GCC though.

@jnk0le

jnk0le commented Jan 8, 2023

Another idea is to also "export" deterministic constants/values that are left in registers on return.
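One way to picture such an export is a per-function table of registers whose value on return is a compile-time constant. This is a sketch under assumptions only: `struct ret_summary` and `known_on_return` are hypothetical names, and 32-bit registers (RV32) are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "export": which registers hold a known constant on return. */
struct ret_summary {
    uint32_t known;     /* bit n set: x<n> holds value[n] on return */
    uint32_t value[32]; /* the constant left in each such register */
};

/* Returns true and stores the constant in *out if register `reg` is
 * known to hold a deterministic value after the call. */
static bool known_on_return(const struct ret_summary *s, unsigned reg, uint32_t *out)
{
    if (reg >= 32 || !(s->known & (UINT32_C(1) << reg)))
        return false;
    *out = s->value[reg];
    return true;
}
```

A caller that needs the same constant right after the call could then read it from the register instead of materializing it again.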
