Floating point ABI need not be adjusted for interrupt latency #7
In the Cortex-M world the common rule for RTOSes is to do lazy stacking/unstacking, i.e. the handler reserves stack space, but the register save is done only at the first FP instruction. Thus interrupt latency is not affected, and the frame saved on the stack can be immediately reused as a context switch frame if a reschedule is requested by the interrupt.
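To make the lazy stacking idea above concrete, here is a minimal host-side model in C. It is only a sketch of the control flow, not real Cortex-M or RISC-V code: the type `fp_model`, the functions `irq_enter`, `fp_first_use` and `irq_exit`, and the register count are all hypothetical names and values chosen for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_CALLER_SAVED_FP 16  /* assumption: model size only */

typedef struct {
    float    fp_regs[NUM_CALLER_SAVED_FP]; /* stand-in for live FP registers */
    float    frame[NUM_CALLER_SAVED_FP];   /* stack space reserved at entry */
    bool     lazy_pending;                 /* frame reserved, not yet filled */
    unsigned saves_done;                   /* number of real register saves */
} fp_model;

/* Interrupt entry: reserve the frame but defer the actual save. */
static void irq_enter(fp_model *m)
{
    m->lazy_pending = true;
}

/* Runs before the handler's first FP instruction: do the deferred save. */
static void fp_first_use(fp_model *m)
{
    if (m->lazy_pending) {
        memcpy(m->frame, m->fp_regs, sizeof m->frame);
        m->lazy_pending = false;
        m->saves_done++;
    }
}

/* Interrupt exit: restore only if the deferred save actually happened. */
static void irq_exit(fp_model *m)
{
    if (!m->lazy_pending) {
        memcpy(m->fp_regs, m->frame, sizeof m->fp_regs);
    }
    m->lazy_pending = false;
}
```

In this model a handler that never calls `fp_first_use` pays for no FP register traffic at all, which is the property the comment relies on.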
Yes, but there are still lots of Cortex-M parts with float, even with double, and lots of applications making use of them, so I would not treat them as oddities.
As mentioned in the notes, the current EABI spec does not include a hard-FP-optimized EABI calling convention (though for Zfinx, floats are held in x registers anyway). Adding the variant makes sense technically, but the primary practical constraint is community energy to support yet another ABI. Defining some FP registers as temporaries (but not arguments) would keep the calling convention the same but improve the performance and code size of FP routines. Interrupt handlers would then have to deal with saving/restoring FP temporaries either directly or via lazy traps.
Yes, Cortex-M provides this convenience by implementing all this stacking/unstacking in hardware. It avoids a latency hit for handlers not touching the FPU (touching it costs 17 loads+stores+overhead, for the half of the FPU that is caller-saved). All this without impacting the performance of regular code using the FPU. (As an aside, on cores without that hardware feature, doing the lazy save/restore for handlers in software adds way more interrupt latency overhead and complexity than desirable; I've tried : ).

It really is simply a programmer convenience; otherwise FPU state has to be explicitly saved/restored, or the OS can provide a separate function to register handlers that use the FPU, or such things. Generally, the hard realtime handlers that care most about latency won't touch the FPU. IMO it's not worth degrading the main code's performance (something more people care about) for the sake of this bit of programmer convenience. (Maybe a future extension can add the convenience.)

It looks to me now that the real issue is concern about ABI proliferation (which I see Krste just confirmed), not interrupt latency. The proposed EABI is neither hard nor soft float, it's a mix of the two: it passes floats as integers, and introduces a new FP register convention when using FP instructions. It's not clear that is much simpler than separate hard and soft float variants of the EABI (as UABI already has), where:
If separate hard and soft ABI variants are ultimately better, doing a mixed one adds yet one more ABI to the long-term total.
Strictly speaking, an ABI includes the ISA supported, so the current proposal is really a common calling convention used across six ABIs: (I, E) * (no FP instructions, hard FP in f, hard FP in x). Some of the ABIs are subsets of the others, and the current definition allows some mix-n-match across pre-compiled/hand-written functions. The current form of the proposal is optimized for no FP and FP-in-X, but doesn't require any change (including to interrupt handlers) to use FP-in-F hardware instructions. Adding a second calling convention optimized for FP-in-F is a possibility, but I don't see how we'd get to three calling conventions.
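The count of six in the comment above is just the product of two axes. A small C sketch that enumerates them, assuming nothing beyond the labels used in the comment (the function name `list_abis` and the output format are made up for illustration):

```c
#include <stdio.h>

/* The two axes named in the comment: base ISA and FP hardware flavour. */
static const char *const base_isa[] = { "I", "E" };
static const char *const fp_hw[] = {
    "no FP instructions", "hard FP in f", "hard FP in x"
};

/* Prints one line per combination and returns how many were printed. */
static int list_abis(FILE *out)
{
    int count = 0;
    for (size_t i = 0; i < sizeof base_isa / sizeof base_isa[0]; i++) {
        for (size_t j = 0; j < sizeof fp_hw / sizeof fp_hw[0]; j++) {
            /* Under the proposal as discussed, every combination shares
               one calling convention: FP values travel in x registers. */
            fprintf(out, "base %s, %s: FP arguments in x registers\n",
                    base_isa[i], fp_hw[j]);
            count++;
        }
    }
    return count;
}
```

Calling `list_abis(stdout)` prints the six combinations, each with the same argument-passing rule, which is the sense in which the proposal is one calling convention across six ABIs.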
For the "hard FP in f" case, a hard float EABI is a clear win over the EABI-as-proposed, so when a hard float EABI eventually does get added, there's no point in supporting the EABI-as-proposed anymore for hard FP in f. That means the FP register conventions for EABI-as-proposed get dropped, making it a purely soft float variant. That's a simplification more than a 3rd ABI, granted. The point is that the single EABI as proposed is not obviously any less work than two (hard and soft float) EABIs. Of course, if you ask anyone about doing two ABIs without all the context, the obvious answer is no: that's more work, and everyone knows a new ABI is work. However, let's look at the details. The EABI as proposed combines hard FP instructions with FP-in-integer calling conventions:
Whereas adding both hard and soft variants of EABI:
I agree it makes sense to define the two calling conventions (f-args-in-x and f-args-in-f) up front, to avoid wasted effort on an f-args-in-x + f-in-f-hardware tool chain. Calling these hard-float and soft-float will be confusing given that Zfinx systems will have hardware floating-point and pass args using the "soft-float" calling convention.
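The naming concern above comes down to two independent axes: whether the hardware has f registers, and where FP arguments are passed. As a rough illustration, the following C sketch reports both using the predefined macros from the RISC-V C API (`__riscv`, `__riscv_flen`, `__riscv_float_abi_soft`, `__riscv_float_abi_single`, `__riscv_float_abi_double`). The function names are hypothetical, and the sketch covers only the existing ABIs, not the EABI under discussion.

```c
/* Where does the selected ABI pass FP arguments? */
static const char *fp_passing_convention(void)
{
#if !defined(__riscv)
    return "not a RISC-V target";
#elif defined(__riscv_float_abi_soft)
    return "FP arguments in x registers";
#elif defined(__riscv_float_abi_single)
    return "single-precision arguments in f registers";
#elif defined(__riscv_float_abi_double)
    return "single- and double-precision arguments in f registers";
#else
    return "unknown";
#endif
}

/* Does the selected architecture have f registers at all? */
static int has_f_registers(void)
{
#if defined(__riscv_flen)
    return 1;
#else
    return 0;
#endif
}
```

A build can report `has_f_registers() == 1` together with "FP arguments in x registers", which is exactly the combination that makes the hard-float/soft-float labels ambiguous.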
Re: naming the calling conventions, the POSIX ones are identified with a tuple of (XLEN, data model, FP passing convention), e.g., (RV32, ILP32, F). GCC accepts this information with an arch string and an abi string, e.g., It probably makes sense to avoid reinventing the wheel here, and tell GCC something like
I'll also just add a note that, while @marceg's original point that having FP caller-saved registers need not impact interrupt latency holds water, this approach will add software complexity to single-protection-domain programs that want to link the interrupt handlers together with the application code. The interrupt handlers and the functions they call must be compiled with a different ABI than the application code, and/or shims must be inserted. Supporting this in the linker would be a major effort, unless we go with the simple approach that interrupt handlers must spill all the caller-saved registers when they call other functions. So for the time being, this approach might only apply to situations where the application code and interrupt handlers are separately linked.
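The "simple approach" mentioned above can be sketched as a host-side C model: before the handler calls any other function, it spills every caller-saved FP temporary, and it restores them afterwards. Everything here is hypothetical and for illustration only: `fp_state` stands in for the hardware registers, `NUM_FP_TEMPS` is an arbitrary count, and `irq_call_driver` and `clobbering_driver` are made-up names.

```c
#define NUM_FP_TEMPS 8  /* assumption: arbitrary count for the model */

typedef struct {
    float ft[NUM_FP_TEMPS];
} fp_temps;

/* Stand-in for the hardware FP temporaries of the interrupted code. */
static fp_temps fp_state;

typedef void (*driver_fn)(void);

/* Example callee that freely clobbers the FP temporaries. */
static void clobbering_driver(void)
{
    for (int i = 0; i < NUM_FP_TEMPS; i++) {
        fp_state.ft[i] = -1.0f;
    }
}

/* Handler-side shim: the callee is reached through a pointer, so the
   shim cannot know what it clobbers and spills every temporary. */
static void irq_call_driver(driver_fn fn)
{
    fp_temps saved = fp_state;  /* spill all caller-saved FP temporaries */
    fn();
    fp_state = saved;           /* restore before returning from the interrupt */
}
```

The cost is paid on every such call whether or not the callee touches the FPU, which is the trade-off against a linker-supported or separately linked solution.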
Currently I see two possible ways to reduce interrupt latency on a core with an FPU.
Not realistic: an interrupt handler can call the actual driver via a dynamic pointer.
Yes, software optimization seems hard. Another way is to leave the optimization to hardware when the FPU is used, as ARM does.
It is a very common rule in operating systems that interrupt handlers are not to touch any coprocessors, including floating point in particular, unless the handler specifically manages everything about it (enable access to the coprocessor, save/restore any registers used, etc). In that context, I don't really see the point of changing floating point parameter passing conventions for purposes of improved interrupt latency.
Of course, a software floating point ABI (that passes floating point values in integer registers and explicitly calls floating point functions) is much more efficient on a core without an FPU than emulating the FPU. And cores without an FPU are more common in embedded systems. All this, however, is independent of interrupt latency in common usage. (Except that if an RTOS runs in S rather than M, emulation may add to interrupt latency. The EABI as stated, however, does not seem to be about removing FPU emulation.)