This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Floating point ABI need not be adjusted for interrupt latency #7

Open
marceg opened this issue May 26, 2019 · 11 comments

Comments

@marceg

marceg commented May 26, 2019

It is a very common rule in operating systems that interrupt handlers are not to touch any coprocessors, including floating point in particular, unless the handler specifically manages everything about it (enable access to the coprocessor, save/restore any registers used, etc). In that context, I don't really see the point of changing floating point parameter passing conventions for purposes of improved interrupt latency.

Of course, a software floating point ABI (that passes floating point values in integer registers and explicitly calls floating point functions) is much more efficient on a core without an FPU than emulating the FPU. And cores without an FPU are more common in embedded systems. All this, however, is independent of interrupt latency in common usage. (Except that if an RTOS runs in S rather than M, emulation may add to interrupt latency. The EABI as stated, however, does not seem to be about removing FPU emulation.)
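
To make "passes floating point values in integer registers" concrete, here is a minimal sketch of what a soft-float calling convention does with a single-precision argument: the 32-bit IEEE 754 pattern travels unchanged in an integer register. The helper names `float_to_xreg` and `xreg_to_float` are made up for illustration and are not part of any ABI document.

```python
import struct


def float_to_xreg(value: float) -> int:
    """Return the 32-bit pattern that would sit in an integer (x) register."""
    # Pack as little-endian binary32, then reinterpret the same bytes as uint32.
    return struct.unpack("<I", struct.pack("<f", value))[0]


def xreg_to_float(bits: int) -> float:
    """Reinterpret a 32-bit integer register value as a binary32 float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


bits = float_to_xreg(1.5)
print(hex(bits))            # 0x3fc00000
print(xreg_to_float(bits))  # 1.5
```

No conversion takes place, only a reinterpretation of bits, which is why the same convention can serve a core with no FPU and (presumably) a Zfinx-style core alike.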

@ilg-ul

ilg-ul commented May 26, 2019

> It is a very common rule in operating systems that interrupt handlers are not to touch any coprocessors, including floating point in particular, unless the handler specifically manages everything about it

In the Cortex-M world the common rule for RTOSes is to do lazy stacking/unstacking, i.e. the handler reserves stack space, but the registers are saved only at the first FP instruction. Thus interrupt latency is not affected, and the frame saved on the stack can be immediately reused as a context switch frame if a reschedule is requested by the interrupt.
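
A toy model of the lazy scheme, only to show the control flow: space is reserved on interrupt entry, and the copy of the FP registers happens only if the handler actually touches one. The class `LazyFpCore` and its methods are hypothetical, the 16-register file is a stand-in for the caller-saved half of the FPU, and nested interrupts are not modelled.

```python
class LazyFpCore:
    """Toy model of lazy FP context saving (not real hardware behaviour)."""

    def __init__(self):
        self.fp_regs = [0.0] * 16   # stand-in for the caller-saved FP registers
        self.frame = None           # reserved stack slot, filled only on demand
        self.lazy_pending = False   # True between entry and the first FP use
        self.saves = 0              # number of times a real save was paid for

    def enter_interrupt(self):
        # Reserve space only; nothing is copied, so entry latency is unchanged.
        self.frame = None
        self.lazy_pending = True

    def fp_write(self, index, value):
        # The first FP access inside the handler triggers the deferred save.
        if self.lazy_pending:
            self.frame = list(self.fp_regs)
            self.saves += 1
            self.lazy_pending = False
        self.fp_regs[index] = value

    def exit_interrupt(self):
        # Restore only if a save actually happened.
        if self.frame is not None:
            self.fp_regs = self.frame
        self.frame = None
        self.lazy_pending = False


core = LazyFpCore()
core.fp_regs[0] = 3.0

core.enter_interrupt()      # handler that never touches FP
core.exit_interrupt()
print(core.saves)           # 0

core.enter_interrupt()      # handler that uses FP once
core.fp_write(0, 9.0)
core.exit_interrupt()
print(core.saves, core.fp_regs[0])  # 1 3.0
```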

> And cores without an FPU are more common in embedded systems ...

Yes, but there are still lots of Cortex-M parts with float, even with double, and lots of applications making use of them, so I would not treat them as oddities.

@kasanovic
Contributor

As mentioned in the notes, the current EABI spec does not include a hard-FP-optimized EABI calling convention (though for Zfinx, floats are held in x registers anyway). Adding the variant makes sense technically, but the primary practical constraint is community energy to support yet another ABI.

Defining some FP registers as temporaries (but not arguments) would keep the calling convention the same but improve the performance and code size of FP routines. Interrupt handlers would then have to deal with saving/restoring FP temporaries either directly or via lazy traps.

@marceg
Author

marceg commented May 26, 2019

Yes, Cortex-M provides this convenience by implementing all this stacking/unstacking in hardware. It avoids a latency hit for handlers that don't touch the FPU (touching it costs 17 loads+stores+overhead, for the half of the FPU that is caller-saved). All this without impacting the performance of regular code using the FPU. (As an aside, on cores without that hardware feature, doing the lazy save/restore for handlers in software adds way more interrupt latency overhead and complexity than desirable; I've tried : ).

It really is simply a programmer convenience, otherwise FPU state has to be explicitly saved/restored, or the OS can provide a separate function to register handlers that use the FPU, or such things. Generally, the hard realtime handlers that care most about latency won't touch the FPU. IMO it's not worth degrading the main code's performance (something more people care about) for the sake of this bit of programmer convenience. (Maybe a future extension can add the convenience.)

It looks to me now that the real issue is concerns about ABI proliferation. (Which I see Krste just confirmed.) Not about interrupt latency.

The proposed EABI is neither hard nor soft float, it's a mix of the two: it passes floats as integers, and introduces a new FP register convention when using FP instructions. It's not clear that this is much simpler than separate hard and soft float variants of the EABI (as UABI already has), where:

  • the hard variant can use the same conventions as UABI for FP registers (I don't see any value in changing this for embedded, unlike GPRs)
  • the soft variant avoids all FP registers and instructions (or at least registers in the case of Zfinx), so no new FP register use convention is introduced

If separate hard and soft ABI variants are ultimately better, doing a mixed one adds yet one more ABI to the total long term number of ABIs.

@kasanovic
Contributor

kasanovic commented May 26, 2019

Strictly speaking, an ABI includes the ISA supported, so the current proposal is really a common calling convention used across six ABIs: (I, E) * (no FP instructions, hard FP in f, hard FP in x). Some of the ABIs are subsets of the others, and the current definition allows some mix-n-match across pre-compiled/hand-written functions.

The current form of proposal is optimized for no FP and FP-in-X, but doesn't require any change (including to interrupt handlers) to use FP-in-F hardware instructions.

Adding a second calling convention optimized for FP-in-F is a possibility, but I don't see how we'd get to three calling conventions.

@marceg
Author

marceg commented May 26, 2019

For the "hard FP in f" case, a hard float EABI is a clear win over the EABI-as-proposed, so when a hard float EABI eventually does get added, there's no point in supporting the EABI-as-proposed anymore for hard FP in f. Which means the FP register conventions for EABI-as-proposed get dropped, making it a purely soft float variant. That's a simplification more than a 3rd ABI, granted.

The point is that the single EABI as proposed is not obviously any less work than two (hard and soft float) EABIs. Of course, if you ask anyone about doing two ABIs without all the context, the obvious answer is no: that's more work, and everyone knows a new ABI is work. However, let's look at the details.

The EABI as proposed combines hard FP instructions with FP-in-integer calling conventions:

  • this hasn't been done in RISC-V UABI, it's totally new, and potentially a fair bit of work (e.g. any assembly-optimized math routines using FP instructions need to be updated for these new conventions)
  • it's ultimately throwaway work, assuming a hard float EABI eventually gets implemented

Whereas adding both hard and soft variants of EABI:

  • for the hard variant, the existing UABI FP register conventions can be used, except that only 4 instead of 8 FPRs are used to pass arguments (necessary to fit with EABI GPR conventions); the other 4 become temporaries or saved registers
  • most math routines don't pass more than 4 arguments and are thus unaffected
  • most of the work seems to be in the compiler; the UABI already supports both hard and soft variants so the compiler infrastructure is there
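
A small model of the register assignment described in the list above, to show why routines with at most 4 FP arguments are unaffected by shrinking the FP argument registers from 8 to 4. Everything here is an assumption for illustration: the function `assign_args`, the register names `fa0..`/`a0..`, the count of 4 integer argument registers, and the fallback order (FPR, then GPR, then stack), which loosely mirrors the existing UABI hard-float rule and is not a quote from any EABI draft.

```python
def assign_args(kinds, n_fpr=4, n_gpr=4):
    """Map each argument kind ("float" or "int") to a register name or "stack"."""
    fprs = [f"fa{i}" for i in range(n_fpr)]   # hypothetical FP argument registers
    gprs = [f"a{i}" for i in range(n_gpr)]    # hypothetical integer argument registers
    placement = []
    for kind in kinds:
        if kind not in ("float", "int"):
            raise ValueError(f"unknown argument kind: {kind}")
        if kind == "float" and fprs:
            placement.append(fprs.pop(0))     # FP argument goes in an FPR if one is free
        elif gprs:
            placement.append(gprs.pop(0))     # otherwise use the integer convention
        else:
            placement.append("stack")         # out of registers
    return placement


# A typical two-argument math routine is placed identically with 4 or 8 FPRs.
print(assign_args(["float", "float"], n_fpr=4))  # ['fa0', 'fa1']
print(assign_args(["float", "float"], n_fpr=8))  # ['fa0', 'fa1']

# Only the fifth FP argument is treated differently.
print(assign_args(["float"] * 5, n_fpr=4))
# ['fa0', 'fa1', 'fa2', 'fa3', 'a0']
```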

@kasanovic
Contributor

I agree it makes sense to define the two calling conventions (f-args-in-x and f-args-in-f) up front, to avoid wasted effort on an f-args-in-x + f-in-f-hardware tool chain.

Calling these hard-float and soft-float will be confusing given that Zfinx systems will have hardware floating-point and pass args using the "soft-float" calling convention.

@aswaterman

aswaterman commented Aug 29, 2019

Re: naming the calling conventions, the POSIX ones are identified with a tuple of (XLEN, data model, FP passing convention), e.g., (RV32, ILP32, F). GCC accepts this information with an arch string and an abi string, e.g., -march=rv32imafdc -mabi=ilp32f indicates that the F and D extensions are present but only single-precision floats are passed in F registers.

It probably makes sense to avoid reinventing the wheel here, and tell GCC something like -mabi=eabi-ilp32f.

@aswaterman

aswaterman commented Aug 29, 2019

I'll also just add a note that, while @marceg's original point that having FP caller-saved registers need not impact interrupt latency holds water, this approach will add software complexity to single-protection-domain programs that want to link the interrupt handlers together with the application code. The interrupt handlers and the functions they call must be compiled with a different ABI than the application code, and/or shims must be inserted. Supporting this in the linker would be a major effort, unless we go with the simple approach that interrupt handlers must spill all the caller-saved registers when they call other functions. So for the time being, this approach might only apply to situations where the application code and interrupt handlers are separately linked.

@ghost

ghost commented Sep 26, 2019

Currently I see two possible ways to reduce interrupt latency on a core with an FPU.

  1. Zfinx, which holds floats in x registers. It can share the ABI used without an FPU. However, it needs ISA support and a lot of compiler work.
  2. Support a software lazy stacking/unstacking optimization in the toolchain. Theoretically, the linker can traverse the call chain of each interrupt handler to find out whether FPU instructions are used, and even which registers are used, and then modify the interrupt handler's code to save/restore only the registers that are used.
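
A rough sketch of the traversal in option 2, assuming the linker already has a static call graph and a per-function set of FP registers written. The function `fp_regs_needed`, the dictionary shapes, and the register names are all hypothetical. The sketch gives up (returns `None`) as soon as it reaches a function that makes an indirect call, since the callee set is then unknown and the handler would have to save everything.

```python
def fp_regs_needed(handler, calls, fp_use, indirect):
    """Return the FP registers reachable from `handler`, or None if unknown.

    calls:    dict mapping a function name to the names it calls directly
    fp_use:   dict mapping a function name to the set of FP registers it writes
    indirect: set of function names that call through a pointer
    """
    seen = set()
    pending = [handler]
    regs = set()
    while pending:
        fn = pending.pop()
        if fn in seen:
            continue                    # already visited (handles recursion/cycles)
        seen.add(fn)
        if fn in indirect:
            return None                 # unknown callee: must assume any FP use
        regs |= fp_use.get(fn, set())
        pending.extend(calls.get(fn, ()))
    return regs


calls = {"isr": ["log", "filter"], "filter": ["mac"]}
fp_use = {"mac": {"ft0", "ft1"}}

print(sorted(fp_regs_needed("isr", calls, fp_use, indirect=set())))
# ['ft0', 'ft1']
print(fp_regs_needed("isr", calls, fp_use, indirect={"isr"}))
# None
```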

@ilg-ul

ilg-ul commented Sep 26, 2019

> Support software lazy stacking/unstacking optimization on toolchain.

Not realistic; an interrupt handler can call the actual driver via a dynamic pointer.

@ghost

ghost commented Sep 26, 2019

Yes, software optimization seems hard. Another way is to leave the optimization to hardware when the FPU is used, as ARM does.
On the other hand, Zfinx seems to be a good idea. It can reduce interrupt latency as well as share the ABI used without an FPU.
