diff --git a/cfi_backward.adoc b/cfi_backward.adoc index 9de43e1..76de6b8 100644 --- a/cfi_backward.adoc +++ b/cfi_backward.adoc @@ -2,43 +2,101 @@ [[backward]] == Shadow Stack (Zicfiss) -To enforce backward-edge control-flow integrity, the Zicfiss extension -introduces a shadow stack. A shadow stack is a second stack used to store a -shadow copy of the return address in the link register if it needs to be spilled. +The Zicfiss extension introduces a shadow stack to enforce backward-edge +control-flow integrity. A shadow stack is a second stack used to store a +shadow copy of the return address in the link register if it needs to be +spilled. + +The shadow stack is designed to provide integrity to control transfers performed +using a _return_ (where the return may be from a procedure invoked using an +indirect call or a direct call), and this is referred to as backward-edge +protection. + +A program using backward-edge control-flow integrity has two stacks: a regular +stack and a shadow stack. The shadow stack is used to spill the link register, +if required, by non-leaf functions. An additional register, shadow-stack-pointer +(`ssp`), is introduced in the architecture to hold the address of the top of the +active shadow stack. The shadow stack, similar to the regular stack, grows downwards, i.e. from higher addresses to lower addresses. Each entry on the shadow stack is `XLEN` wide and holds the link register value. The `ssp` points to the top of the shadow stack, i.e. address of the last element stored on the shadow stack. -When Zicfiss is enabled, each function that needs to spill the link -register (e.g., non-leaf functions) stores the link register value to the regular -stack and a shadow copy of the link register value to the shadow stack when the +The shadow stack is architecturally protected from inadvertent corruptions and +modifications, as detailed later (See <>). 
+ +The Zicfiss extension provides instructions to store and load the link register +to/from the shadow stack and to check the integrity of the return address. The +extension provides instructions to support common stack maintenance operations +such as stack unwinding and stack switching. + +When Zicfiss is enabled, each function that needs to spill the link register +(e.g., non-leaf functions) stores the link register value to the regular stack +and a shadow copy of the link register value to the shadow stack when the function is entered (the prologue). When such a function returns (the epilogue), the function loads the link register from the regular stack and the shadow copy of the link register from the shadow stack. Then, the link register value from the regular stack and the shadow link register value from the shadow -stack are compared. A mismatch of the two values is indicative of a subversion -of the return address control variable and causes a software error exception -(cause=18) with `*tval` set to "shadow stack fault (code=3)". The software error -exception caused by the shadow stack fault is lower in priority than the load -access fault exception. +stack are compared. A mismatch of the two values is indicative of a subversion +of the return address control variable and causes a software error exception. + +The Zicfiss instructions are encoded using a subset of "May be op" instructions +defined by the Zimop and Zcmop extensions cite:[ZIMOP]. This subset of +instructions revert to their Zimop/Zcmop defined behavior when the Zicfiss +extension is not implemented or if the extension has not been activated at a +privilege mode. A program that is built with Zicfiss instructions can thus +continue to operate correctly, but without backward-edge control-flow integrity, +on processors that do not support the Zicfiss extension or if the Zicfiss +extension is not active. 
+ +The Zicfiss extension may be activated for use individually and independently +for each privilege mode. + +Compilers should flag each object file (for example, using flags in the ELF +attributes) to indicate if the object file has been compiled with the Zicfiss +instructions. The linker should flag (for example, using flags in the ELF +attributes) the binary/executable generated by linking objects as being +compiled with the Zicfiss instructions only if all the object files that are +linked have the same Zicfiss attributes. + +The dynamic loader should activate the use of the Zicfiss extension for an +application only if all executables (the application and the dependent +dynamically linked libraries) used by that application use the Zicfiss +extension. + +An application that has the Zicfiss extension active may request the dynamic +loader at runtime to load a new dynamic shared object (using dlopen() for +example). If the requested object does not have the Zicfiss attribute then +the dynamic loader, based on its policy configuration (e.g., established by the +operating system or the administrator), could either deny the request or +deactivate the Zicfiss extension for the application. It is strongly recommended +that the policy enforces a strict security posture and denies the request. + +When the Zicfiss extension is not active or not implemented, the Zicfiss +instructions revert to their Zimop/Zcmop defined behavior. This allows a program +compiled with Zicfiss instructions to operate correctly but without +backward-edge control-flow integrity. + +The Zicfiss extension depends on the A, Zicsr, Zimop, and Zcmop extensions. 
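The linker and dynamic loader rules above can be sketched as a small decision procedure. The following is a minimal, non-normative illustration in Python; the `Image` record, the `has_zicfiss` field (a stand-in for the ELF attribute flag), the `DlopenPolicy` values, and all function names are hypothetical and are not defined by this specification or by any toolchain.

```python
from dataclasses import dataclass
from enum import Enum


class DlopenPolicy(Enum):
    DENY = "deny"              # strict posture recommended by the text above
    DEACTIVATE = "deactivate"  # permissive: turn Zicfiss off for the application


@dataclass
class Image:
    name: str
    has_zicfiss: bool  # hypothetical stand-in for the ELF attribute flag


def link_output_has_zicfiss(objects):
    """Linker rule: flag the output only if every linked object is flagged."""
    return all(obj.has_zicfiss for obj in objects)


def activate_at_load(app, libraries):
    """Loader rule: activate only if the application and all libraries are flagged."""
    return app.has_zicfiss and all(lib.has_zicfiss for lib in libraries)


def handle_dlopen(zicfiss_active, new_object, policy):
    """Return (load_allowed, zicfiss_active_after) for a runtime load request."""
    if not zicfiss_active or new_object.has_zicfiss:
        # Nothing to protect, or the new object is compatible.
        return True, zicfiss_active
    if policy is DlopenPolicy.DENY:
        return False, True   # request refused, protection stays on
    return True, False       # request honored, protection turned off
```

Under these assumptions, `handle_dlopen(True, Image("plugin.so", False), DlopenPolicy.DENY)` refuses the load and keeps the shadow stack active, which is the behavior the text recommends.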
+ +=== Zicfiss Instructions Summary The Zicfiss extension introduces the following instructions: * Push to the shadow stack (See <>) -** `sspush x1` and `sspush x5` - encoded using `mop.r.7` -** `c.sspush x1` - encoded using `c.mop.0` +** `SSPUSH x1` and `SSPUSH x5` - encoded using `MOP.R.7` +** `C.SSPUSH x1` - encoded using `C.MOP.0` * Pop from the shadow stack (See <>) -** `sspopchk x1` and `sspopchk x5` - encoded using `mop.rr.28` -** `c.sspopchk x5` - encoded using `c.mop.2` +** `SSPOPCHK x1` and `SSPOPCHK x5` - encoded using `mop.rr.28` +** `C.SSPOPCHK x5` - encoded using `c.mop.2` * Read the value of `ssp` into a register (See <>) -** `ssrdp` - encoded using `mop.r.0` +** `SSRDP` - encoded using `mop.r.0` * Perform an atomic swap from a shadow stack location (See <>) -** `ssamoswap` +** `SSAMOSWAP` When a Zimop encoding is not used by the Zicfiss extension then the instruction follows its Zimop defined behavior. @@ -247,7 +305,7 @@ followed by a write of the link register at the new top of the shadow stack. {bits: 3, name: 'funct3', attr:['100']}, {bits: 5, name: 'rs1', attr:['00000']}, {bits: 5, name: 'rs2', attr:['00001', '00101']}, - {bits: 7, name: '1100111', attr:['sspush x1','sspush x5']}, + {bits: 7, name: '1100111', attr:['SSPUSH x1','SSPUSH x5']}, ], config:{lanes: 1, hspace:1024}} .... @@ -260,38 +318,41 @@ followed by a write of the link register at the new top of the shadow stack. {bits: 3, name: 'n[2:0]', attr:['000']}, {bits: 1, name: '0'}, {bits: 1, name: '0'}, - {bits: 3, name: '011', attr:['c.sspush x1']}, + {bits: 3, name: '011', attr:['C.SSPUSH x1']}, ], config:{lanes: 1, hspace:1024}} .... -Only `x1` and `x5` encodings are supported as `rs2` for `sspush`. Zicfiss -provides 16-bit versions of the `sspush x1` instruction using the Zcmop -defined `c.mop.0` encoding. The `c.sspush x1` expands to `sspush x1`. +Only `x1` and `x5` encodings are supported as `rs2` for `SSPUSH`. 
Zicfiss +provides a 16-bit version of the `SSPUSH x1` instruction using the Zcmop +defined `c.mop.0` encoding. The `C.SSPUSH x1` expands to `SSPUSH x1`. -The `sspush` instruction and its compressed form `c.sspush` can be used, to push -a link register on the shadow stack. The `sspush` and `c.sspush` instructions +The `SSPUSH` instruction and its compressed form `C.SSPUSH` can be used to push +a link register on the shadow stack. The `SSPUSH` and `C.SSPUSH` instructions perform a store identically to the existing `STORE` instruction, with the difference that the base is implicitly `ssp` and the width is implicitly `XLEN`. -The `sspush` and `c.sspush` instructions require the virtual address in `ssp` to -have a shadow stack attribute (see <>). Correct execution of `sspush` and -`c.sspush` requires that `ssp` refers to idempotent memory. If the memory -referenced by `ssp` is not idempotent, then the `sspush`/`c.sspush` instructions +The `SSPUSH` and `C.SSPUSH` instructions require the virtual address in `ssp` to +have a shadow stack attribute (see <>). Correct execution of `SSPUSH` and +`C.SSPUSH` requires that `ssp` refers to idempotent memory. If the memory +referenced by `ssp` is not idempotent, then the `SSPUSH`/`C.SSPUSH` instructions cause a store/AMO access fault exception. If the virtual address in `ssp` is not -`XLEN` aligned, then the `sspush`/`c.sspush` instructions cause a store/AMO +`XLEN` aligned, then the `SSPUSH`/`C.SSPUSH` instructions cause a store/AMO access fault exception. 
-The operation of the `sspush` and `c.sspush` instructions is as follows: +The operation of the `SSPUSH` and `C.SSPUSH` instructions is as follows: -.`sspush` and `c.sspush` operation +.`SSPUSH` and `C.SSPUSH` operation [listing] ---- -If (xSSE == 1) +if (xSSE == 1) mem[ssp - (XLEN/8)] = X(src) # Store src value to ssp - XLEN/8 ssp = ssp - (XLEN/8) # decrement ssp by XLEN/8 endif ---- +The `ssp` is decremented by `SSPUSH` and `C.SSPUSH` only if the store to the +shadow stack completes successfully. + [[SS_POP]] === Pop from the shadow stack @@ -306,7 +367,7 @@ current top of the shadow stack followed by an increment of the `ssp` by {bits: 5, name: 'rd', attr:['00000','00000']}, {bits: 3, name: 'funct3', attr:['100']}, {bits: 5, name: 'rs1', attr:['00001','00101']}, - {bits: 12, name: '110011011100', attr:['sspopchk x1','sspopchk x5']}, + {bits: 12, name: '110011011100', attr:['SSPOPCHK x1','SSPOPCHK x5']}, ], config:{lanes: 1, hspace:1024}} .... @@ -319,13 +380,13 @@ current top of the shadow stack followed by an increment of the `ssp` by {bits: 3, name: 'n[2:0]', attr:['010']}, {bits: 1, name: '0'}, {bits: 1, name: '0'}, - {bits: 3, name: '011', attr:['c.sspopchk x5']}, + {bits: 3, name: '011', attr:['C.SSPOPCHK x5']}, ], config:{lanes: 1, hspace:1024}} .... -Only `x1` and `x5` encodings are supported as `rs1` for `sspopchk`. Zicfiss -provides a 16-bit version of the `sspopchk x5` using Zcmop define `c.mop.2` -encoding. The `c.sspopchk x5` expands to `sspopchk x5`. +Only `x1` and `x5` encodings are supported as `rs1` for `SSPOPCHK`. Zicfiss +provides a 16-bit version of `SSPOPCHK x5` using the Zcmop defined `c.mop.2` +encoding. The `C.SSPOPCHK x5` expands to `SSPOPCHK x5`. Usually programs with a shadow stack push the return address onto the regular stack as well as the shadow stack in the function prologue of non-leaf @@ -335,10 +396,10 @@ the shadow stack. The two values are then compared. 
If the values do not match it is indicative of a corruption of the return address variable on the regular stack. -The `sspopchk` instruction and its compressed form `c.sspopchk` can be used to +The `SSPOPCHK` instruction and its compressed form `C.SSPOPCHK` can be used to pop the shadow return address value from the shadow stack and check that the value matches the contents of the link register and if not cause a software -integrity fault exception with `*tval` set to "shadow stack fault (code=3)". +integrity fault exception with `__x__tval` set to "shadow stack fault (code=3)". While any register may be used as link register, conventionally the `x1` or `x5` registers are used. The shadow stack instructions are designed to be most @@ -419,16 +480,16 @@ be held in the link register itself for the duration of the leaf function execution. ==== -The `c.sspopchk`, and `sspopchk` instructions perform a load identically to the +The `C.SSPOPCHK`, and `SSPOPCHK` instructions perform a load identically to the existing `LOAD` instruction, with the difference that the base is implicitly `ssp` and the width is implicitly `XLEN`. -The `sspopchk` and `c.sspopchk` instructions require the virtual address in +The `SSPOPCHK` and `C.SSPOPCHK` instructions require the virtual address in `ssp` to have a shadow stack attribute (see <>). Correct execution of -`sspopchk` and `c.sspopchk` requires that `ssp` refers to idempotent memory. If +`SSPOPCHK` and `C.SSPOPCHK` requires that `ssp` refers to idempotent memory. If the memory reference by `ssp` is not idempotent, then the instructions cause a load access fault exception. If the virtual address in `ssp` is not `XLEN` -aligned, then `sspopchk` and `c.sspopchk` instructions cause a load access +aligned, then `SSPOPCHK` and `C.SSPOPCHK` instructions cause a load access fault exception [NOTE] @@ -438,7 +499,7 @@ more secure to detect errors in the program. 
An access fault exception is raised instead of address misaligned exception in such cases to indicate fatality and that the instruction must not be emulated by a trap handler. -The `sspopchk` instruction performs a load followed by a check of the loaded +The `SSPOPCHK` instruction performs a load followed by a check of the loaded data value with the link register as source. If the check against the link register faults, and the instruction is restarted by the trap handler, then the instruction will perform a load again. If the memory from which the load is performed is @@ -450,9 +511,9 @@ usage, and requiring memory referenced by `ssp` to be idempotent does not pose a significant restriction. ==== -The operation of the `sspopchk` and `c.sspopchk` instructions is as follows: +The operation of the `SSPOPCHK` and `C.SSPOPCHK` instructions is as follows: -.`sspopchk` and `c.sspopchk` operation +.`SSPOPCHK` and `C.SSPOPCHK` operation [listing] ---- if (xSSE == 1) @@ -468,19 +529,24 @@ if (xSSE == 1) endif ---- -The `ssp` is incremented by `sspopchk` and `c.sspopchk` only if the load from -the shadow stack completes successfully. The `ssp` is decremented by `sspush` -and `c.sspush` only if the store to the shadow stack completes successfully. +If the value loaded from the address in `ssp` does not match the value in `rs1`, +a software error exception (cause=18) is raised with `__x__tval` set to "shadow +stack fault (code=3)". The software error exception caused by `SSPOPCHK`/ +`C.SSPOPCHK` is lower in priority than a load access fault exception. + +The `ssp` is incremented by `SSPOPCHK` and `C.SSPOPCHK` only if the load from +the shadow stack completes successfully and no software error exception is +raised. 
[NOTE] ==== -The use of the compressed instruction `c.sspush x1` to push on the shadow stack +The use of the compressed instruction `C.SSPUSH x1` to push on the shadow stack is most efficient when the ABI uses `x1` as the link register, as the link register may then be pushed without needing a register-to-register move in the -function prologue. To use the compressed instruction `c.sspopchk x5`, the +function prologue. To use the compressed instruction `C.SSPOPCHK x5`, the function should pop the return address from regular stack into the alternate -link register `x5` and use the `c.sspopchk x5` to compare the return address to -the shadow copy stored on the shadow stack. The function then uses `c.jr x5` to +link register `x5` and use the `C.SSPOPCHK x5` to compare the return address to +the shadow copy stored on the shadow stack. The function then uses `C.JR x5` to jump to the return address. [listing] @@ -504,17 +570,17 @@ jump to the return address. ==== Store-to-load forwarding is a common technique employed by high-performance processor implementations. Zicfiss implementations may prevent forwarding from -a non-shadow-stack store to the `sspopchk` or the `c.sspopchk` instructions. A +a non-shadow-stack store to the `SSPOPCHK` or the `C.SSPOPCHK` instructions. A non-shadow-stack store causes a fault if done to a page mapped as a shadow stack. However, such determination may be delayed till the PTE has been examined and thus may be used to transiently forward the data from such stores to -`sspopchk` or to `c.sspopchk`. +`SSPOPCHK` or to `C.SSPOPCHK`. ==== [[SSP_READ]] === Read `ssp` into a register -The `ssrdp` instruction is provided to move the contents of `ssp` to a destination +The `SSRDP` instruction is provided to move the contents of `ssp` to a destination register. [wavedrom, ,svg] @@ -524,18 +590,18 @@ register. 
{bits: 5, name: 'rd', attr:['dst']}, {bits: 3, name: 'funct3', attr:['100']}, {bits: 5, name: '00000'}, - {bits: 12, name: '100000011100', attr:['ssrdp']}, + {bits: 12, name: '100000011100', attr:['SSRDP']}, ], config:{lanes: 1, hspace:1024}} .... -Encoding `rd` as `x0` is not supported for `ssrdp`. +Encoding `rd` as `x0` is not supported for `SSRDP`. -The operation of the `ssrdp` instructions is as follows: +The operation of the `SSRDP` instructions is as follows: -.`ssrdp` operation +.`SSRDP` operation [listing] ---- -If (xSSE == 1) +if (xSSE == 1) X(dst) = ssp else X(dst) = 0 @@ -616,7 +682,7 @@ back_cfi_not_enabled: [[SSAMOSWAP]] === Atomic Swap from a shadow stack location -The `ssamoswap` instruction performs an atomic swap operation between the XLEN +The `SSAMOSWAP` instruction performs an atomic swap operation between the XLEN bits of the `src` register and the XLEN bits located on the shadow stack at the address specified in the `addr` register. The resulting value from the swap operation is then stored into the register specified in the `dst` operand. @@ -631,18 +697,18 @@ operation is then stored into the register specified in the `dst` operand. {bits: 5, name: 'rs2', attr:'src'}, {bits: 1, name: 'rl'}, {bits: 1, name: 'aq'}, - {bits: 5, name: '00101', attr:['ssamoswap.w', 'ssamoswap.d']}, + {bits: 5, name: '00101', attr:['SSAMOSWAP.W', 'SSAMOSWAP.D']}, ], config:{lanes: 1, hspace:1024}} .... -The `ssamoswap` instruction requires the virtual address in `addr` to have a +The `SSAMOSWAP` instruction requires the virtual address in `addr` to have a shadow stack attribute (see <>). If the virtual address is not XLEN -aligned, then `ssamoswap` causes a store/AMO access fault exception. If the -memory reference by the `ssp` is not idempotent, then `ssamoswap` causes a -store/AMO access fault exception. The operation of the `ssamoswap` instructions +aligned, then `SSAMOSWAP` causes a store/AMO access fault exception. 
If the +memory referenced by the `ssp` is not idempotent, then `SSAMOSWAP` causes a +store/AMO access fault exception. The operation of the `SSAMOSWAP` instruction +is as follows: -.`ssamoswap` operation +.`SSAMOSWAP` operation [listing] ---- X(rd) = mem[X(rs1)] @@ -693,9 +759,9 @@ restore it prior to returning from the trap. When a new shadow stack is created by the supervisor, it needs to store a checkpoint at the highest address on that stack. This enables the shadow stack -pointer to be switched using the process outlined in this note. The `ssamoswap` +pointer to be switched using the process outlined in this note. The `SSAMOSWAP` instruction can be used to store this checkpoint. When the old value at the -memory location operated on by `ssamoswap` is not required, `rd` can be set to +memory location operated on by `SSAMOSWAP` is not required, `rd` can be set to `x0`. ==== @@ -712,11 +778,11 @@ enhanced to support a shadow stack memory region for use by M-mode. ==== Virtual-Memory system extension for Shadow Stack The shadow stack memory is protected using page table attributes such that it -cannot be stored to by instructions other than `sspush`, and `c.sspush`. The -`sspopchk` and `c.sspopchk` instructions can only load from shadow stack memory. +cannot be stored to by instructions other than `SSPUSH` and `C.SSPUSH`. The +`SSPOPCHK` and `C.SSPOPCHK` instructions can only load from shadow stack memory. -The `sspush` and `c.sspush` instructions perform a store. The `sspopchk` and -`c.sspopchk` instructions perfom a load. +The `SSPUSH` and `C.SSPUSH` instructions perform a store. The `SSPOPCHK` and +`C.SSPOPCHK` instructions perform a load. The shadow stack can be read using all instructions that load from memory. @@ -730,16 +796,16 @@ page. When `menvcfg.SSE=0`, this encoding remains reserved. When `V=1` and The following faults may occur: . If the accessed page is a shadow stack page: -.. Stores other than `sspush` and `c.sspush` cause store/AMO access fault. 
+.. Stores other than `SSPUSH` and `C.SSPUSH` cause store/AMO access fault. .. Instruction fetches cause an instruction page fault. . If the accessed page is not a shadow stack page or if the page is in non-idempotent memory: -.. `c.sspush` and `sspush` cause a store/AMO access fault. -.. `c.sspopchk` and `sspopchk` cause a load access fault. +.. `C.SSPUSH` and `SSPUSH` cause a store/AMO access fault. +.. `C.SSPOPCHK` and `SSPOPCHK` cause a load access fault. [NOTE] ==== -Stores to shadow stack by instructions other than `sspush`, and `c.sspush` +Stores to shadow stack by instructions other than `SSPUSH`, and `C.SSPUSH` cause a store/AMO access fault exception, rather than a store/AMO page fault exception, to indicate fatality. @@ -748,8 +814,8 @@ system should service that fault and correct the condition. Correcting the condition is not possible in this case. The page fault handler would have to resort to decoding the opcode of the instruction that caused the page fault to determine if it was caused by non-shadow-stack-stores to shadow stack pages -(which is a fatal condition) vs. a page fault caused by an `sspush` or -`c.sspush` to a non-resident page (which is a recoverable condition). Since +(which is a fatal condition) vs. a page fault caused by an `SSPUSH` or +`C.SSPUSH` to a non-resident page (which is a recoverable condition). Since the operating system page fault handler is typically performance-critical, causing an access fault instead of a page fault enables the operating system to easily distinguish between the fatal/non-recoverable conditions and recoverable @@ -757,7 +823,7 @@ page faults. 
On implementations where address misaligned exception is prioritized higher than access fault exception, a trap handler that emulates misaligned stores -must cause an access fault exception if the store is not `sspush` or `c.sspush`, +must cause an access fault exception if the store is not `SSPUSH` or `C.SSPUSH`, and the store is being made to a shadow stack page. Shadow stack instructions cause an access fault if the accessed page is not a @@ -793,8 +859,8 @@ follows: fields of the `mstatus` register, stop and raise a page fault exception corresponding to the original access type. -The PMA checks are extended to require memory referenced by `sspush`, -`c.sspush`, `c.sspopchk`, and `sspopchk` to be idempotent. +The PMA checks are extended to require memory referenced by `SSPUSH`, +`C.SSPUSH`, `C.SSPOPCHK`, and `SSPOPCHK` to be idempotent. The `U` and `SUM` bit enforcement is performed normally for shadow stack instruction initiated memory accesses. The state of the `MXR` bit does not @@ -825,9 +891,9 @@ access fault exception. ==== The G-stage address translation and protections remain unaffected by Zicfiss -extension. When G-stage page tables are active, the `c.sspopchk` and `sspopchk` +extension. When G-stage page tables are active, the `C.SSPOPCHK` and `SSPOPCHK` instructions require the G-stage page table to have read permission for the -accessed memory, whereas the `c.sspush` and `sspush` instructions require write +accessed memory, whereas the `C.SSPUSH` and `SSPUSH` instructions require write permission. The `xwr == 010b` encoding in the G-stage PTE remains reserved. [NOTE] @@ -840,15 +906,15 @@ its guests. [[PMP_SS]] ==== PMP extension for shadow stack -When privilege mode is less than M, the PMP region accessed by `sspush` and -`c.sspush` must provide write permission and the PMP region accessed by -`c.sspopchk` and `sspopchk` must provide read permission. 
+When privilege mode is less than M, the PMP region accessed by `SSPUSH` and +`C.SSPUSH` must provide write permission and the PMP region accessed by +`C.SSPOPCHK` and `SSPOPCHK` must provide read permission. -The M-mode memory accesses by `sspush` and `c.sspush` instructions test for +The M-mode memory accesses by `SSPUSH` and `C.SSPUSH` instructions test for write permission in the matching PMP entry when permission checking is required. -The M-mode memory accesses by `c.sspopchk` and `sspopchk` instructions test for +The M-mode memory accesses by `C.SSPOPCHK` and `SSPOPCHK` instructions test for read permission in the matching PMP entry when permission checking is required. A new WARL field `SSPMP` is defined in the `mseccfg` CSR to identify a PMP entry @@ -859,11 +925,11 @@ When `mseccfg.MML` is 1, the `SSPMP` field is read-only else it may be written. When the `SSPMP` field is not zero, the following rules are additionally enforced for M-mode memory accesses: -* `sspush`, `c.sspush`, `sspopchk`, and `c.sspopchk` instructions must match the +* `SSPUSH`, `C.SSPUSH`, `SSPOPCHK`, and `C.SSPOPCHK` instructions must match the PMP entry identified by `SSPMP` else an access fault exception corresponding to the access type occurs. -* Write by instructions other than `sspush` and `c.sspush` that +* Writes by instructions other than `SSPUSH` and `C.SSPUSH` that match the PMP entry identified by `SSPMP` cause a store/AMO access fault exception. diff --git a/cfi_forward.adoc b/cfi_forward.adoc index 7d6021b..9b04f40 100644 --- a/cfi_forward.adoc +++ b/cfi_forward.adoc @@ -2,11 +2,89 @@ == Landing pad (Zicfilp) To enforce forward-edge control-flow integrity, the Zicfilp extension introduces -a landing pad (`lpad`) instruction. The `lpad` instruction that must be placed +a landing pad (`LPAD`) instruction. The `LPAD` instruction must be placed at the program locations that are valid targets of indirect jumps or calls. The 
The -`lpad` instruction (See <>) is encoded using the `AUIPC` major opcode +`LPAD` instruction (See <>) is encoded using the `AUIPC` major opcode with `rd=x0`. +Compilers emit a landing pad instruction as the first instruction of an +address-taken functions, as well as at any indirect jump targets. A landing pad +instruction is not required in functions that are only reached using a direct +call or direct jump. + +The landing pad is designed to provide integrity to control transfers performed +using indirect call and jumps, and this is referred to as forward-edge +protection. When the Zicfilp is active, the hart tracks an expected landing pad +(`ELP`) state that is updated by an _indirect_call_ or _indirect_jump_ to +require a landing pad instruction at the target of the branch. If the +instruction at the target is not a landing pad, then a software error exception +is raised. + +A landing pad may be optionally associated with a 20-bit label. With labeling +enabled, the number of landing pads that can be reached from an indirect call +or jump site can be defined using programming language-based policies. Labeling +of the landing pads enables software to achieve greater precision in pairing up +indirect call/jump sites with valid targets. When labeling of landing pads +is used, indirect call or indirect jump site can specify the expected label of +the landing pad and thereby constrain the set of landing pads that may be +reached from each indirect call or indirect jump site in the program. + +In the simplest form, a program can be built with a single label value to +implement a coarse-grained version of forward-edge control-flow integrity. By +constraining gadgets to be preceded by a landing pad instruction that marks +the start of indirect callable functions, the program can significantly reduce +the available gadget space. A second form of label generation may generate a +signature, such as a MAC, using the prototype of the function. 
Programs that use +this approach would further constrain the gadgets accessible from a call site to +only indirect callable functions that match the prototype of the called +functions. Another approach to label generation involves analyzing the +control-flow-graph (CFG) of the program, which can lead to even more stringent +constraints on the set of reachable gadgets. Such programs may further use +multiple labels per function, which means that if a function is called from two +or more call sites, the functions can be labeled as reachable from each of the +call sites. For instance, consider two call sites A and B, where A calls the +functions X and Y, and B calls the functions Y and Z. In a single label scheme, +functions X, Y, and Z would need to be assigned the same label so that both call +sites A and B can invoke the common function Y. This scheme would allow call +site A to also call function Z and call site B to also call function X. However, +if function Y was assigned two labels - one corresponding to call site A and the +other to call site B, then Y can be invoked by both call sites, but X can only be +invoked by call site A and Z can only be invoked by call site B. To support +multiple labels, the compiler could generate a call-site-specific entry point +for shared functions, with each entry point having its own landing pad +instruction followed by a direct branch to the start of the function. This would +allow the function to be labeled with multiple labels, each corresponding to a +specific call site. A portion of the label space may be dedicated to labeled +landing pads that are only valid targets of an indirect jump (and not an +indirect call). + +The `LPAD` instruction uses the code points defined as HINTs for the `AUIPC` +opcode. When Zicfilp is not active at a privilege level or when the extension +is not implemented, the landing pad instruction executes as a no-op. 
A program +that is built with the `LPAD` instruction can thus continue to operate correctly, +but without forward-edge control-flow integrity, on processors that do not +support the Zicfilp extension or if the Zicfilp extension is not active. + +Compilers and linkers should provide an attribute flag to indicate if the +program has been compiled with the Zicfilp extension and use that to determine +if the Zicfilp extension should be activated. The dynamic loader should activate +the use of the Zicfilp extension for an application only if all executables (the +application and the dependent dynamically linked libraries) used by that +application use the Zicfilp extension. + +When the Zicfilp extension is not active or not implemented, the hart does not +require landing pad instructions at targets of indirect calls/jumps and the +landing pad instructions revert to being no-ops. This allows a program compiled +with landing pad instructions to operate correctly but without forward-edge +control-flow integrity. + +The Zicfilp extension may be activated for use individually and independently +for each privilege mode. + +The Zicfilp extension depends on the Zicsr extension. + +=== Landing pad enforcement + To enforce that the target of an indirect call or indirect jump must be a valid landing pad instruction, the hart maintains an expected landing pad (`ELP`) state to determine if a landing pad instruction is required at the target of an @@ -31,7 +109,7 @@ indirect jump must land on a landing pad, as specified in <>. If An indirect branch using `JALR`, `C.JALR`, or `C.JR` with `rs1` as `x7` is termed a software guarded branch. Such branches do not need to land on a -`lpad` instruction and thus do not set `ELP` to `LP_EXPECTED`. +`LPAD` instruction and thus do not set `ELP` to `LP_EXPECTED`. [NOTE] ==== @@ -67,16 +145,16 @@ performing bounds checking on the index into the table, etc.). The landing pad may be labeled. 
Zicfilp extension designates the register `x7` for use as the landing pad label register. To support labeled landing pads, the indirect call/jump sites establish an expected landing pad label (e.g., using -the `lui` instruction) in the bits 31:12 of the `x7` register. The `lpad` +the `lui` instruction) in the bits 31:12 of the `x7` register. The `LPAD` instruction is encoded with a 20-bit immediate value called the landing-pad-label (`LPL`) that is matched to the expected landing pad label. When `LPL` is encoded -as zero, the `lpad` instruction does not perform the label check and in programs +as zero, the `LPAD` instruction does not perform the label check and in programs built with this single label mode of operation the indirect call/jump sites do not need to establish an expected landing pad label value in `x7`. When `ELP` is set to `LP_EXPECTED`, if the next instruction in the instruction -stream is not 4-byte aligned, or is not `lpad`, or if the landing pad label -encoded in `lpad` is not zero and does not match the expected landing pad label +stream is not 4-byte aligned, or is not `LPAD`, or if the landing pad label +encoded in `LPAD` is not zero and does not match the expected landing pad label in bits 31:12 of the `x7` register, then a software error exception (cause=18) with `*tval` set to "landing pad fault (code=2)" is raised else the `ELP` is updated to `NO_LP_EXPECTED`. @@ -91,14 +169,14 @@ and increases the difficulty of using techniques such as branch-target-injection also known as Spectre variant 2, which use speculative execution to leak data through side channels. -The `lpad` requires a 4-byte alignment to address the concatenation of two +The `LPAD` requires a 4-byte alignment to address the concatenation of two instructions `A` and `B` accidentally forming an unintended landing pad in the program. 
For example, consider a 32-bit instruction where the bytes 3 and 2 have a pattern of `?017h` (for example, the immediate fields of a `lui`, `auipc`, or a `jal` instruction), followed by a 16-bit or a 32-bit instruction. When patterns that can accidentally form a valid landing pad are detected, the assembler or linker can force instruction `A` to be aligned to a 4-byte -boundary to force the unintended `lpad` pattern to become misaligned and thus +boundary to force the unintended `LPAD` pattern to become misaligned and thus not a valid landing pad or may use an alternate register allocation to prevent the accidental landing pad. ==== @@ -137,7 +215,7 @@ following rules apply to S-mode: * The hart does not update the expected landing pad (`ELP`) state, and the `ELP` state remains `NO_LP_EXPECTED`. -* The `lpad` instruction operates as a no-op. +* The `LPAD` instruction operates as a no-op. If the `LPE` field is 0 and S-mode is not supported, these rules apply to U-mode. @@ -165,7 +243,7 @@ following rules apply to VU/U-mode: * The hart does not update the expected landing pad (`ELP`) state and the `ELP` state remains `NO_LP_EXPECTED`. -* The `lpad` instruction operates as a no-op. +* The `LPAD` instruction operates as a no-op. ==== Hypervisor environment configuration registers (`henvcfg and henvcfgh`) @@ -194,7 +272,7 @@ rules apply to VS-mode: * The hart does not update the expected landing pad (`ELP`) state and the `ELP` state remains `NO_LP_EXPECTED`. -* The `lpad` instruction operates as a no-op. +* The `LPAD` instruction operates as a no-op. ==== Machine status registers (`mstatus`) @@ -334,7 +412,7 @@ apply to M-mode. * The hart does not update the expected landing pad (`ELP`) state and the `ELP` state remains `NO_LP_EXPECTED`. -* The `lpad` instruction operates as a no-op. +* The `LPAD` instruction operates as a no-op. 
==== Debug Control and Status (`dcsr`) @@ -404,7 +482,7 @@ When S-mode is not supported, it is determined as follows: ==== The Zicfilp must be explicitly enabled for use at each privilege mode. -Programs compiled with the `lpad` instruction continue to function correctly, +Programs compiled with the `LPAD` instruction continue to function correctly, but without forward-edge CFI protection, when the Zicfilp extension is not implemented or is not enabled. ==== @@ -412,9 +490,9 @@ implemented or is not enabled. [[LP_INST]] === Landing pad instruction -When Zicfilp is enabled, `lpad` is the only instruction allowed to execute when +When Zicfilp is enabled, `LPAD` is the only instruction allowed to execute when the `ELP` state is `LP_EXPECTED`. If Zicfilp is not enabled then the instruction -is a no-op. If Zicfilp is enabled, the `lpad` instruction causes a software +is a no-op. If Zicfilp is enabled, the `LPAD` instruction causes a software error exception with `*tval` set to "landing pad fault (code=2)" if any of the following conditions are true: @@ -436,9 +514,9 @@ caused then the `ELP` is updated to `NO_LP_EXPECTED`. ], config:{lanes: 1, hspace:1024}} .... -The operation of the `lpad` instruction is as follows: +The operation of the `LPAD` instruction is as follows: -.`lpad` operation +.`LPAD` operation [listing] ---- if (xLPE != 0) @@ -470,9 +548,9 @@ of indirect call/jump was decoded, due to: The software error exception caused by Zicfilp has higher priority than an illegal instruction exception but lower priority than instruction access fault. -The software error exception due to the instruction not being an `lpad` +The software error exception due to the instruction not being an `LPAD` instruction when `ELP` is `LP_EXPECTED` or an software error exception caused by -the `lpad` instruction itself (See <>) leads to a trap being delivered +the `LPAD` instruction itself (See <>) leads to a trap being delivered to the same or to a higher privilege mode. 
In such cases, the `ELP` prior to the trap, the previous `ELP`, must be diff --git a/cfi_intro.adoc b/cfi_intro.adoc index 1152b34..a69ef67 100644 --- a/cfi_intro.adoc +++ b/cfi_intro.adoc @@ -59,149 +59,9 @@ and where the `rs1` is not `x1` or `x5` (i.e., not a return). A `C.JR` instruction where `rs1` is not `x1` or `x5` (i.e., not a return) is an _indirect-jump_. -The Zicfiss and Zicfilp extensions build on these conventions and hints. +The Zicfiss and Zicfilp extensions build on these conventions and hints and +provide backward-edge and forward-edge control-flow integrity, respectively. The +Zicfilp extension is specified in <> and the Zicfiss extension is +specified in <>. -=== Backward-edge control-flow integrity -To enforce backward-edge control-flow integrity, the Zicfiss extension -introduces a shadow stack. - -The shadow stack is designed to provide integrity to control transfers performed -using a _return_ (where the return may be from a procedure invoked using an -indirect call or a direct call), and this is referred to as backward-edge -protection. - -A program using backward-edge control-flow integrity has two stacks: a regular -stack and a shadow stack. The shadow stack is used to spill the link register, -if required, by non-leaf functions. An additional register, shadow-stack-pointer -(`ssp`), is introduced in the architecture to hold the address of the top of the -active shadow stack. - -The shadow stack is architecturally protected from inadvertent corruptions and -modifications, as detailed later (See <>). - -The Zicfiss extension provides instructions to store and load the link register -to/from the shadow stack and to check the integrity of the return address. The -extension provides instructions to support common stack maintenance operations -such as stack unwinding and stack switching. - -The Zicfiss instructions are encoded using a subset of "May be op" instructions -defined by the Zimop and Zcmop extensions cite:[ZIMOP].
This subset of -instructions revert to their Zimop/Zcmop defined behavior when the Zicfiss -extension is not implemented or if the extension has not been activated at a -privilege mode. A program that is built with Zicfiss instructions can thus -continue to operate correctly, but without backward-edge control-flow integrity, -on processors that do not support the Zicfiss extension or if the Zicfiss -extension is not active. - -The Zicfiss extensions may be activated for use individually and independently -for each privilege mode. - -Compilers should flag each object file (for example, using flags in the elf -attributes) to indicate if the object file has been compiled with the Zicfiss -instructions. The linker should flag (for example, using flags in the elf -attributes) the binary/executable generated by linking objects as being -compiled with the Zicfiss instructions only if all the object files that are -linked have the same Zicfiss attributes. - -The dynamic loader should activate the use of Zicfiss extension for an -application only if all executables (the application and the dependent -dynamically linked libraries) used by that application use the Zicfiss -extension. - -An application that has the Zicfiss extension active may request the dynamic -loader at runtime to load a new dynamic shared object (using dlopen() for -example). If the requested object does not have the Zicfiss attribute then -the dynamic loader, based on its policy (e.g, established by the operating -system or the administrator) configuration, could either deny the request or -deactivate the Zicfiss extension for the application. It is strongly recommended -that the policy enforces a strict security posture and denies the request. - -When the Zicfiss extension is not active or not implemented, the Zicfiss -instructions revert to their Zimop/Zcmop defined behavior. This allows a -compiled with Zicfiss instructions to operate correctly but without -backward-edge control-flow integrity. 
- -The Zicfiss extension is specified in <>. The Zicfiss extension -depends on the A, Zicsr, Zimop, and Zcmop extensions. - -=== Forward-edge control-flow integrity - -To enforce forward edge control-flow integrity, Zicfilp extension introduces -a landing pad (`lpad`) instruction that allows software to indicate valid -targets for indirect calls and jumps in a program. - -Compilers emit a landing pad instruction as the first instruction of an -address-taken functions, as well as at any indirect jump targets. A landing pad -instruction is not required in functions that are only reached using a direct -call or direct jump. - -The landing pad is designed to provide integrity to control transfers performed -using indirect call and jumps, and this is referred to as forward-edge -protection. When the Zicfilp is active, the hart tracks an expected landing pad -(`ELP`) state that is updated by an _indirect_call_ or _indirect_jump_ to -require a landing pad instruction at the target of the branch. If the -instruction at the target is not a landing pad, then a software error exception -is raised. - -A landing pad may be optionally associated with a 20-bit label. With labeling -enabled, the number of landing pads that can be reached from an indirect call -or jump site can be defined using programming language-based policies. Labeling -of the landing pads enables software to achieve greater precision in pairing up -indirect call/jump sites with valid targets. When labeling of landing pads -is used, indirect call or indirect jump site can specify the expected label of -the landing pad and thereby constrain the set of landing pads that may be -reached from each indirect call or indirect jump site in the program. - -In the simplest form, a program can be built with a single label value to -implement a coarse-grained version of forward-edge control-flow integrity. 
By -constraining gadgets to be preceded by a landing pad instruction that marks -the start of indirect callable functions, the program can significantly reduce -the available gadget space. A second form of label generation may generate a -signature, such as a MAC, using the prototype of the function. Programs that use -this approach would further constrain the gadgets accessible from a call site to -only indirect callable functions that match the prototype of the called -functions. Another approach to label generation involves analyzing the -control-flow-graph (CFG) of the program, which can lead to even more stringent -constraints on the set of reachable gadgets. Such programs may further use -multiple labels per function, which means that if a function is called from two -or more call sites, the functions can be labeled as reachable from each of the -call sites. For instance, consider two call sites A and B, where A calls the -functions X and Y, and B calls the functions Y and Z. In a single label scheme, -functions X, Y, and Z would need to be assigned the same label so that both call -sites A and B can invoke the common function Y. This scheme would allow call -site A to also call function Z and call site B to also call function X. However, -if function Y was assigned two labels - one corresponding to call site A and the -other to call site B, then Y can be invoked by both call sites, but X can only be -invoked by call site A and Z can only be invoked by call site B. To support -multiple labels, the compiler could generate a call-site-specific entry point -for shared functions, with each entry point having its own landing pad -instruction followed by a direct branch to the start of the function. This would -allow the function to be labeled with multiple labels, each corresponding to a -specific call site. A portion of the label space may be dedicated to labeled -landing pads that are only valid targets of an indirect jump (and not an -indirect call). 
- -The `lpad` instruction uses the code points defined as HINTs for the `AUIPC` -opcode. When Zicfilp is not active at a privilege level or when the extension -is not implemented, the landing pad instruction executes as a no-op. A program -that is built with `lpad` instruction can thus continue to operate correctly, -but without forward-edge control-flow integrity, on processors that do not -support the Zicfilp extension or if the Zicfilp extension is not active. - -As discussed earlier for the Zicfiss extension, compilers, linkers, and dynamic -loaders should provided an attribute flag to indicate if the program has been -compiled with the Zicfilp extension and use that to determine if the Zicfilp -extension should be activated. - -When Zicfilp extension is not active or not implemented, that hart does not -required landing pad instructions at targets of indirect calls/jumps and the -landing instructions revert to being a no-op. This allows a program compiled -with landing pad instructions to operate correctly but without forward-edge -control-flow integrity. - -The Zicfilp extensions may be activated for use individually and independently -for each privilege mode. - -The Zicfilp extension is specified in <>. The Zicfilp extension depends -on the Zicsr extension.
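
Reviewer note: the landing pad check that this change moves into the forward-edge chapter can be modeled in a few lines of Python, which may help when checking the reworded text against the intended behavior. This is only an illustrative sketch and not the specification's pseudocode. It assumes Zicfilp is enabled at the current privilege mode; the names `Elp`, `LandingPadFault`, and `check_landing_pad`, as well as the pre-decoded `is_lpad` and `lpl` arguments, are hypothetical. Only the two `ELP` states, the 4-byte alignment rule, and the comparison of `LPL` against bits 31:12 of `x7` are taken from the text.

```python
from enum import Enum


class Elp(Enum):
    NO_LP_EXPECTED = 0
    LP_EXPECTED = 1


class LandingPadFault(Exception):
    """Models the software error exception reported as a landing pad fault."""


def check_landing_pad(elp, pc, is_lpad, lpl, x7):
    """Return the next ELP state for the instruction at pc, or raise on a fault.

    is_lpad and lpl are hypothetical pre-decoded fields: whether the
    instruction is LPAD, and its 20-bit landing pad label (LPL).
    """
    if elp is Elp.NO_LP_EXPECTED:
        # No indirect call/jump is pending, so nothing is checked.
        return Elp.NO_LP_EXPECTED

    expected_label = (x7 >> 12) & 0xFFFFF  # bits 31:12 of x7

    if pc % 4 != 0:
        raise LandingPadFault("target is not 4-byte aligned")
    if not is_lpad:
        raise LandingPadFault("target is not an LPAD instruction")
    if lpl != 0 and lpl != expected_label:
        raise LandingPadFault("landing pad label mismatch")

    # All checks passed: the landing pad is consumed.
    return Elp.NO_LP_EXPECTED
```

For example, `check_landing_pad(Elp.LP_EXPECTED, 0x1000, True, 0, 0)` models the single-label mode, where an `LPL` of zero skips the label comparison, while a nonzero `LPL` that differs from bits 31:12 of `x7` raises `LandingPadFault`.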