Add large code model information. #388
I think I'd prefer to define a set of relocations to materialize a 64-bit address with four instructions and let the linker relax the sequence to one to three instructions, depending on the offset to the materialized address. That approach is easier to implement than the address pool and doesn't need a writable text segment. I'd also think it could be faster than reading addresses from the address pool, because 1) the processor could fuse three or four instructions into a single macro-op, and 2) loading an address from the address pool is just a waste of resources if the materialized address happens to be close to the PC.
@rui314 I'm not sure we can generate an arbitrary 64-bit address within four instructions. Would you mind sharing the instruction sequence?
@kito-cheng Apologies, we can't materialize a 64-bit value with four instructions in RISC-V. We actually need six instructions to, for example, load a value from an arbitrary 64-bit address as follows:
which can be relaxed to the following five instructions if the symbol is within ±2^44 bytes,
and of course to the following two instructions if it's within ±2GiB.
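The three size classes described above can be sketched as a small helper that picks a sequence length from the PC-relative offset. This is a hedged illustration, not linker code: the helper names are hypothetical, the two-instruction check uses the standard `auipc` plus signed 12-bit immediate reach, and the ±2^44 bound is copied from the comment above and assumed to be a half-open range relative to the start of the sequence.

```python
# Hypothetical helpers: how many instructions a relaxing linker could use to
# load from `target` when the sequence starts at `pc`, following the size
# classes described in the discussion (6 -> 5 -> 2 instructions).

def fits_auipc_imm12(offset: int) -> bool:
    """True if auipc (signed 20-bit immediate << 12) plus a signed 12-bit
    immediate can reach `offset` bytes from the auipc's own address."""
    # Round to the nearest 4 KiB page so the low part fits in [-2048, 2047].
    hi20 = (offset + 0x800) >> 12
    return -(1 << 19) <= hi20 <= (1 << 19) - 1


def load_sequence_length(pc: int, target: int) -> int:
    offset = target - pc
    if fits_auipc_imm12(offset):
        return 2  # auipc + ld, roughly +-2 GiB
    if -(1 << 44) <= offset < (1 << 44):
        return 5  # bound quoted from the discussion (assumed half-open)
    return 6      # arbitrary 64-bit address


print(load_sequence_length(0x10000, 0x10000 + 0x7FFF_0000))   # 2
print(load_sequence_length(0x10000, 0x10000 + (1 << 40)))     # 5
print(load_sequence_length(0x10000, 0xFFFF_FFF0_0000_0000))   # 6
```

The exact edge of the two-instruction case is asymmetric (from -2^31 - 2048 to 2^31 - 2049 bytes) because the low 12 bits are sign-extended.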
It seems to me that the RISC-V psABI's design choice to allow the linker to shrink sections really shines for this use case.
Creating new ABIs that only support position-dependent code seems like a questionable thing to be doing in this day and age.
I think using a constant pool for the large code model doesn't cost that much, because the compiler can use anchors to tag variables and load each variable by its offset from the anchor.
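A minimal sketch of the anchor idea, assuming one anchor register per pool and 8-byte entries (the class and function names are illustrative, not taken from GCC or LLVM): each distinct symbol gets one slot, and every later reference reuses that slot's offset from the anchor. A single `ld` can only reach slots within its signed 12-bit displacement, so with the anchor at the start of the pool about 256 entries are directly addressable.

```python
# Hypothetical model of a constant pool addressed from an anchor.
# Each distinct symbol gets one 8-byte slot; the compiler would then load
# the symbol's address with something like `ld rd, offset(anchor_reg)`.

class ConstantPool:
    ENTRY_SIZE = 8  # one 64-bit address per entry on RV64

    def __init__(self) -> None:
        self.offsets: dict[str, int] = {}

    def slot(self, symbol: str) -> int:
        """Return the byte offset of `symbol`'s slot from the anchor,
        allocating a new slot only the first time the symbol is seen."""
        if symbol not in self.offsets:
            self.offsets[symbol] = len(self.offsets) * self.ENTRY_SIZE
        return self.offsets[symbol]


def fits_load_imm12(offset: int) -> bool:
    # `ld` takes a signed 12-bit displacement: [-2048, 2047].
    return -2048 <= offset <= 2047


pool = ConstantPool()
for sym in ["foo", "bar", "foo", "baz"]:
    off = pool.slot(sym)
    print(sym, off, fits_load_imm12(off))
# prints: foo 0 True / bar 8 True / foo 0 True / baz 16 True
```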
I know there are many extremely large programs out there that might already need the large code model, but to my knowledge, most of these programs are server-side and run in datacenters. They naturally need to be built as position-independent executables, and their text segments need to be read-only (or execute-only if possible). This made me wonder about your motivation to define a position-dependent-only ABI in the first place. So, before diving into the details, I think we need to take a step back and start by understanding the context of this change. I'd like to understand your motivation, explore potential alternative specifications, and learn why you believe this is the best way to achieve the goal.
I have some notes about large code models in aarch64/powerpc64/x86-64: https://maskray.me/blog/2023-05-14-relocation-overflow-and-code-models#aarch64-code-models
I know that certain JIT programs may use large code models, possibly just the position-dependent form.
Agreed. Large server-side x86-64 applications can use the medium code model. This larger range makes it unlikely for AArch64 to encounter relocation overflow issues before the binary becomes excessively large for x86-64.
Without commenting on the merits of this particular code model, I'll remark that there is a distinct and very real use case: RV64 embedded systems, which might not consume that much memory in total but need to cope with a sparse address space. The text/rodata might be separated by gigabytes from the absolute-addressed I/O, and there might be multiple regions of each. There's no virtual memory, so it isn't possible to remap the relevant regions to improve virtual spatial locality.
Actually, a large code model based on constant pools can still produce position-independent executables. The static linker only needs to leave dynamic relocations so that the loader or memory manager can add the load offset when the executable is remapped.
Yes, constant pools are equivalent to a hand-rolled GOT.
Yes, that's a nice description. Thanks.
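To illustrate the "hand-rolled GOT" point, here is a hedged sketch of the loader side: each pool slot carries an `R_RISCV_RELATIVE` dynamic relocation whose addend is the link-time address it should hold, and the loader writes `base + addend` into the relocated slot. Memory is modeled as a plain dict, and the `Rela` type and `apply_relative_relocs` function are illustrative, not any real loader's API.

```python
from dataclasses import dataclass

# Relocation number of R_RISCV_RELATIVE in the RISC-V ELF psABI.
R_RISCV_RELATIVE = 3


@dataclass
class Rela:
    offset: int  # link-time address of the pool slot to patch
    rtype: int   # relocation type
    addend: int  # link-time address the slot should end up pointing to


def apply_relative_relocs(memory: dict[int, int], relocs: list[Rela], base: int) -> None:
    """Model of what a loader does for each pool slot: write base + addend."""
    for r in relocs:
        if r.rtype != R_RISCV_RELATIVE:
            raise ValueError(f"unsupported relocation type {r.rtype}")
        memory[base + r.offset] = base + r.addend


# Two pool slots at link-time addresses 0x3000 and 0x3008, pointing at
# symbols linked at 0x1234 and 0x8000; the executable is loaded at `base`.
memory: dict[int, int] = {}
relocs = [Rela(0x3000, R_RISCV_RELATIVE, 0x1234), Rela(0x3008, R_RISCV_RELATIVE, 0x8000)]
apply_relative_relocs(memory, relocs, base=0x5555_0000_0000)
print(hex(memory[0x5555_0000_3000]))  # 0x555500001234
```

Because the code reaches the pool PC-relatively, only the pool contents need patching, which is exactly what a GOT provides.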
I think I'm still waiting for a response to this comment...
Could we use lui rather than auipc? If we use auipc, we must either require the whole instruction sequence to stay together, or add a relocation that lets the last instruction point back to the auipc instruction, like
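To make the lui/auipc distinction concrete, here is a hedged sketch of how the high/low immediate pairs are derived. With `lui`, both halves depend only on the symbol's absolute value (and because `lui` sign-extends on RV64, the symbol must lie within about ±2 GiB of address zero). With `auipc`, both halves depend on the `auipc`'s own address, so the final instruction's relocation must identify that `auipc`. The function names are illustrative.

```python
# Sketch of hi20/lo12 computation for absolute (lui) vs PC-relative (auipc)
# address materialization. Function names are illustrative.

def split_hi_lo(value: int) -> tuple[int, int]:
    """Split `value` into (hi20, lo12) so that (hi20 << 12) + lo12 == value,
    with lo12 a signed 12-bit immediate in [-2048, 2047]."""
    hi20 = (value + 0x800) >> 12
    lo12 = value - (hi20 << 12)
    return hi20, lo12


def absolute_parts(sym: int) -> tuple[int, int]:
    # lui rd, %hi(sym) ; addi rd, rd, %lo(sym): depends only on sym.
    return split_hi_lo(sym)


def pcrel_parts(sym: int, auipc_pc: int) -> tuple[int, int]:
    # auipc rd, %pcrel_hi(sym) ; addi rd, rd, %pcrel_lo(label):
    # both halves are computed from the auipc's address, which is why the
    # low part refers to the auipc's label rather than to the symbol.
    return split_hi_lo(sym - auipc_pc)


hi, lo = absolute_parts(0x12345FFF)
print(hex(hi), lo)   # 0x12346 -1
hi, lo = pcrel_parts(0x12345FFF, 0x10000)
print(hex(hi), lo)   # 0x12336 -1
hi, lo = pcrel_parts(0x12345FFF, 0x10004)
print(hex(hi), lo)   # 0x12336 -5
```

Moving the `auipc` by four bytes changes the low immediate, so a scheme that lets instructions drift apart needs some way to tie the low part back to its `auipc`.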
I was involved in the design and implementation of this code model when I was still a colleague of @kuanlinchentw, so I can give a few details from memory. The design comes with several advantages: 1) it is simple to implement, because the implementation can be borrowed from AArch64 :P, and 2) no new relocations are required. However, the disadvantages are obvious: 1) every address needs to be loaded from the constant pool, and 2) the pool has duplicated entries. IIRC, the long-instruction-sequence scheme was also discussed somewhere before (publicly?), but it comes with more implementation overhead: new relocations and new linker relaxations. Also, the psABI TG didn't exist at that time, so we were trying to avoid touching the psABI as much as possible.
As @kito-cheng mentioned, it's easy to implement from the compiler's point of view, and it doesn't require modifying binutils.
If no new feature is required, what's the point of adding a new section to the psABI document for it? Does the AArch64 psABI have a section for its counterpart?
It needs a new code model option, just like
I couldn't find a section in https://github.com/ARM-software/abi-aa/blob/844a79fd4c77252a11342709e3b27b2c9f590cf1/aaelf64/aaelf64.rst about how to use a constant pool to load an object's address from memory. Could you share the URL? |
And which code model? It looks like the "large" code model in the AArch64 psABI is different from this proposal, because AArch64's large code model requires the GOT to be within 2 GiB of the text segment, and addresses seem to be read from the GOT.
I think you can find an example at https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/sysvabi64/sysvabi64.rst#get-the-address-of-a-symbol-defined-in-the-same-elf-file. I think the GOT distance there refers to the literal pool rather than a normal GOT, because that code model doesn't support PIC.
If "GOT" in the documentation doesn't mean the |
Some comments from the last LLVM sync meeting: the constant pool and the long instruction sequence each have their own use cases, so we may allow both schemes and let users choose between them with an option; the same applies to function calls. Also, some comments from the last psABI call: we haven't (officially) reserved an intra-procedure-call scratch register. AArch64 lists r16 and r17 as IP0 and IP1 and explicitly says they may be clobbered during a procedure call, which might be an issue when we implement range extension thunks. However, we already use t0, t1, t2 and t3 in the PLT, so we could use the same set of registers for thunks and specify that explicitly in the psABI. The only concern is that this looks like an incompatible ABI change, but it is less risky since it is already de facto behavior.
No; using custom calling conventions within an object has always been allowed (and that's done across architectures), but range extension thunks that clobber registers not previously reserved for them would break that. It's only safe in the PLT case because people know PLTs exist and need to be careful.
Yeah, fair enough. So let's move forward without range extension thunks, then extend this later with the necessary changes (e.g. adding a new tag) if needed.
I intend to move this forward and extend it further later, e.g. by adding the long-instruction-sequence scheme. One concern is that the latter requires new relocations and extra implementation work, so it should be split into a separate step to keep this PR from being stuck here too long.
For now, I think it would be great to add a note like "NOTE: We intend to extend the large code model with different code generation strategies in the future." to indicate that the long-instruction-sequence scheme, and possibly range extension thunks, may be added in the future.
My biggest concern here is that we're allocating the name "large" and creating a compatibility promise for a short-term code model. If in the future we have a fully designed large model, GCC won't be able to switch to it for -mcmodel=large, because that would regress functionality for anyone with an old binutils, so the new, better code model would be stuck with a worse name. @kito-cheng There is a fourth option: use a real GOT. RISC-V does not have a meaningful concept of a GOT base, so nothing forces the GOT to be contiguous; interleave text and GOT in 4 GiB chunks to support GOTPCREL_HI20 relocations in the large model. Obviously this won't work if you're generating a.out and need RX and RW memory to each be a single contiguous range, but it should work for ELF. I'm a strong supporter of range extension thunks and implemented them for the riscv Go linker a while ago. Ideally we would support them with both 4-byte and 8-byte call sites, which means we need a new relocation type JAL_THUNK anyway, so adding CALL_THUNK might not be so bad.
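A hedged sketch of the reach check behind range extension thunks: a 4-byte `jal` call site reaches about ±1 MiB, while an 8-byte `auipc`+`jalr` call site reaches about ±2 GiB; beyond that, a linker would have to redirect the call through a thunk. The function names and the `"jal"`/`"call"` kind strings are illustrative; the JAL_THUNK and CALL_THUNK relocations mentioned above are proposals, so this sketch does not model them.

```python
# Hypothetical check a linker might run to decide whether a call site can
# reach its target directly or needs a range extension thunk.

JAL_MIN, JAL_MAX = -(1 << 20), (1 << 20) - 2  # signed 21-bit, 2-byte aligned


def call_reaches(offset: int) -> bool:
    # auipc ra, hi20 ; jalr ra, lo12(ra): same reach as any auipc pair.
    hi20 = (offset + 0x800) >> 12
    return -(1 << 19) <= hi20 <= (1 << 19) - 1


def needs_thunk(kind: str, site: int, target: int) -> bool:
    offset = target - site
    if offset % 2:
        raise ValueError("call targets must be 2-byte aligned")
    if kind == "jal":    # 4-byte call site
        return not (JAL_MIN <= offset <= JAL_MAX)
    if kind == "call":   # 8-byte auipc+jalr call site
        return not call_reaches(offset)
    raise ValueError(f"unknown call kind: {kind}")


print(needs_thunk("jal", 0x10000, 0x10000 + 0x80000))    # False (512 KiB away)
print(needs_thunk("jal", 0x10000, 0x10000 + 0x200000))   # True (2 MiB away)
print(needs_thunk("call", 0x10000, 0x10000 + 0x200000))  # False
```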
I think I can post a brief summary here:
This is the specification for the RISC-V instruction set's ABI, and your 64-bit AMD processor is not a RISC-V processor. Unless you're cross-compiling for RISC-V (doubtful?), you seem quite lost; this is not the place for this kind of question, since it's about a completely different processor instruction set.
Implement large code model for GlobalAddressSDNode, BlockAddressSDNode and ExternalSymbolSDNode. See discussion on riscv-non-isa/riscv-elf-psabi-doc#388. co-authored by: Kuan-Lin Chen <[email protected]>
I'm inclined to accept the current proposal, with range extension thunks as an optional addition.
I will move forward with and merge this PR after the next psABI meeting. GCC support was merged a while ago, and LLVM has also provided a PoC.
LGTM
Implement large code model for GlobalAddressSDNode and ExternalSymbolSDNode. See discussion on riscv-non-isa/riscv-elf-psabi-doc#388. --------- Co-authored-by: Kuan-Lin Chen <[email protected]>
With riscv-non-isa/riscv-elf-psabi-doc#388 landed, it makes sense to have a define for the large code model, for consistency with medany and medlow.
Hi,
This PR adds a description of the large code model.
I was wondering whether we need a large+fpic model. In general, position-independent code puts external symbol addresses into the GOT.
Is there any case where we would have to lay out the GOT more than ±2 GiB away from the code?
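For the large+fpic question, the relevant constraint is that a GOT-indirect load (`auipc` with `R_RISCV_GOT_HI20`, then `ld` with `R_RISCV_PCREL_LO12_I`) only reaches about ±2 GiB. Below is a hedged sketch of checking whether a given text/GOT layout stays within that reach. The function names and the flat address-range model are assumptions; a real linker checks each relocation individually.

```python
# Sketch: can every auipc in the text range reach every GOT slot with
# auipc + ld? Function names and the range model are illustrative.

def fits_auipc_imm12(offset: int) -> bool:
    hi20 = (offset + 0x800) >> 12
    return -(1 << 19) <= hi20 <= (1 << 19) - 1


def got_reachable(text_start: int, text_end: int, got_start: int, got_end: int) -> bool:
    """True if every 4-byte instruction in [text_start, text_end) can reach
    every 8-byte GOT slot in [got_start, got_end) PC-relatively."""
    # Reachable offsets form one contiguous interval, so checking the two
    # extreme (pc, slot) pairs is enough.
    lowest = got_start - (text_end - 4)
    highest = (got_end - 8) - text_start
    return fits_auipc_imm12(lowest) and fits_auipc_imm12(highest)


print(got_reachable(0x10000, 0x20000, 0x30000, 0x31000))              # True
print(got_reachable(0x10000, 0x20000, 0x1_0000_0000, 0x1_0000_1000))  # False
```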