🧪 [pmp] use time-multiplex approach (#1105)
stnolting authored Dec 19, 2024
2 parents 9d13cc5 + 8e1fffb commit 7047223
Showing 6 changed files with 200 additions and 207 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 19.12.2024 | 1.10.7.5 | :test_tube: use time-multiplex PMP architecture (reducing area footprint) | [#1105](https://github.com/stnolting/neorv32/pull/1105) |
| 14.12.2024 | 1.10.7.4 | :sparkles: add new module: I2C-compatible **Two-Wire Device Controller (TWD)** | [#1121](https://github.com/stnolting/neorv32/pull/1121) |
| 14.12.2024 | 1.10.7.3 | :warning: rework TRNG (change HAL; remove interrupt) | [#1120](https://github.com/stnolting/neorv32/pull/1120) |
| 12.12.2024 | 1.10.7.2 | add external memory configuration/initialization options to testbench | [#1119](https://github.com/stnolting/neorv32/pull/1119) |
48 changes: 30 additions & 18 deletions docs/datasheet/cpu.adoc
@@ -1093,28 +1093,40 @@ does not complete operation within this time window.

==== `Smpmp` ISA Extension

The NEORV32 physical memory protection (PMP) provides an elementary memory
protection mechanism that can be used to constrain read, write and execute rights of arbitrary memory regions.
The NEORV32 PMP is fully compatible to the RISC-V Privileged Architecture Specifications. In general, the PMP can
**grant permissions to user mode**, which by default has none, and can **revoke permissions from M-mode**, which
by default has full permissions. The PMP is configured via the <<_machine_physical_memory_protection_csrs>>.

Several <<_processor_top_entity_generics>> are provided to fine-tune the CPU's PMP capabilities:

* `PMP_NUM_REGIONS` defines the number of implemented PMP region
* `PMP_MIN_GRANULARITY` defines the minimal granularity of each region
* `PMP_TOR_MODE_EN` controls the implementation of the top-of-region (TOR) mode
* `PMP_NAP_MODE_EN` controls the implementation of the naturally-aligned-power-of-two (NA4 and NAPOT) modes
The NEORV32 physical memory protection (PMP) provides an elementary memory protection mechanism that can be used
to configure read/write/execute permissions of arbitrary memory regions. In general, the PMP can **grant permissions
to user mode**, which by default has none, and can **revoke permissions from M-mode**, which by default has full
permissions. The NEORV32 PMP is fully compatible with the RISC-V Privileged Architecture Specifications and is
configured via several CSRs (<<_machine_physical_memory_protection_csrs>>). Several <<_processor_top_entity_generics>>
are provided to adjust the CPU's PMP capabilities according to the application requirements (pre-synthesis):

. `PMP_NUM_REGIONS` defines the number of implemented PMP regions (0..16); setting this generic to zero will
result in absolutely no PMP logic being implemented
. `PMP_MIN_GRANULARITY` defines the minimal granularity of each region (has to be a power of 2, minimal
granularity = 4 bytes); note that a smaller granularity will lead to wider comparators and thus to a larger area footprint
and a longer critical path
. `PMP_TOR_MODE_EN` controls the implementation of the top-of-region (TOR) mode (default = true); disabling this mode
will reduce area footprint
. `PMP_NAP_MODE_EN` controls the implementation of the naturally-aligned-power-of-two (NA4 and NAPOT) modes (default =
true); disabling this mode will reduce area footprint and critical path length
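
As an illustration of the NAPOT mode and the granularity constraint listed above, the following Python sketch encodes and decodes a `pmpaddr` value the way the RISC-V privileged specification defines it. The helper names (`napot_encode`, `napot_decode`) and the `min_granularity` argument are hypothetical; the argument merely mirrors the `PMP_MIN_GRANULARITY` generic and this is not NEORV32 library code.

```python
def napot_encode(base: int, size: int, min_granularity: int = 4) -> int:
    """Return the pmpaddr value for a NAPOT region (hypothetical helper)."""
    if size < 8 or size & (size - 1):
        raise ValueError("NAPOT size must be a power of two and at least 8 bytes")
    if size < min_granularity:
        raise ValueError("region is smaller than the implemented granularity")
    if base % size:
        raise ValueError("base must be naturally aligned to size")
    # pmpaddr holds address bits [33:2]; the trailing ones encode the size
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr: int) -> tuple[int, int]:
    """Return (base, size) in bytes for a NAPOT pmpaddr value."""
    ones = 0
    while (pmpaddr >> ones) & 1:  # count trailing one bits
        ones += 1
    size = 1 << (ones + 3)
    base = (pmpaddr & ~((1 << (ones + 1)) - 1)) << 2
    return base, size


# 64 KiB region starting at 0x80000000
print(hex(napot_encode(0x80000000, 64 * 1024)))  # 0x20001fff
```

A larger `min_granularity` rejects small regions in this sketch, which corresponds to the hardware needing fewer comparator bits when the granularity is coarse.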

.PMP Permissions when in Debug Mode
[NOTE]
When in debug mode, all PMP rules are bypassed/ignored, granting the debugger maximum access permissions.

.PMP Rules when in Debug Mode
.PMP Time-Multiplex
[NOTE]
When in debug-mode all PMP rules are ignored making the debugger have maximum access rights.
Instructions are executed in a multi-cycle manner. Hence, data access (load/store) and instruction fetch cannot occur
at the same time. Therefore, the PMP hardware uses only a single set of comparators for memory access permission checks
that are switched in an iterative, time-multiplexed style, reducing hardware footprint by approx. 50% while maintaining
full security features and RISC-V compatibility.
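
The time-multiplex idea can be sketched as a small behavioral model in Python. This is only an assumption-laden illustration: the actual switching scheme inside the PMP module is not part of the hunks shown here, the model covers user-mode checks only (no M-mode defaults, no lock bits), and the names `Region`, `permitted` and `pmp_fault` are hypothetical. The arguments `addr_if` and `addr_ls` loosely mirror the `addr_if_i` and `addr_ls_i` ports, and the single return value mirrors the single `fault_o` output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int        # first byte of the region
    size: int        # region size in bytes
    r: bool = False  # read permission
    w: bool = False  # write permission
    x: bool = False  # execute permission


def permitted(regions, addr, access):
    """One shared comparator set: the lowest-numbered matching region decides."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return getattr(region, access)
    return False  # user mode: no matching region means no permission


def pmp_fault(regions, phase, addr_if, addr_ls, store=False):
    """Select ONE address per phase instead of checking both in parallel."""
    if phase == "fetch":
        addr, access = addr_if, "x"
    else:  # "load_store"
        addr, access = addr_ls, ("w" if store else "r")
    return not permitted(regions, addr, access)


regions = [
    Region(0x0000_0000, 0x1000, r=True, x=True),  # code
    Region(0x8000_0000, 0x1000, r=True, w=True),  # data
]
print(pmp_fault(regions, "fetch", 0x100, 0x8000_0010))       # False
print(pmp_fault(regions, "fetch", 0x8000_0000, 0x8000_0010))  # True
```

Because `permitted` is called once per phase with a single address, the model needs only one set of region comparisons, which is the area saving the note describes.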

.Protected Instruction Fetches
.PMP Memory Accesses
[IMPORTANT]
New instruction fetches are **always triggered even when denied** by a certain PMP rule. However, the fetched instruction(s)
will not be executed and will not change CPU core state. Instead, they will raise a bus exception when reaching the CPU's
execution stage.
Load/store accesses for which there are insufficient access permissions do not trigger any memory/bus accesses at all.
In contrast, instruction accesses for which there are insufficient access permissions nevertheless lead to a memory/bus
access (causing potential side effects on the memory side). However, the fetched instruction will be discarded and the
corresponding exception will still be triggered precisely.
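
The asymmetry between data and instruction accesses can be made concrete with a short Python sketch. It is a simplified model under stated assumptions: the function `access` and the `bus_log` list are hypothetical, the `permitted` flag stands in for the PMP check result, and the returned strings are the standard RISC-V access-fault exception names.

```python
def access(bus_log, kind, addr, permitted):
    """Model one access; return "ok" or the name of the raised exception."""
    if kind == "fetch":
        # the fetch request reaches the bus even when the PMP denies it
        bus_log.append(("fetch", addr))
        # a denied instruction is discarded and raises a precise exception
        return "ok" if permitted else "instruction access fault"
    if not permitted:
        # denied loads/stores never reach the bus
        return "load access fault" if kind == "load" else "store access fault"
    bus_log.append((kind, addr))
    return "ok"


log = []
print(access(log, "store", 0x9000_0000, permitted=False))  # store access fault
print(access(log, "fetch", 0x9000_0000, permitted=False))  # instruction access fault
print(log)  # [('fetch', 2415919104)]
```

Only the denied fetch shows up in `log`, which is why memory-mapped regions with read side effects deserve attention when they are reachable by instruction fetches.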


==== `Sdext` ISA Extension
26 changes: 12 additions & 14 deletions rtl/core/neorv32_cpu.vhd
@@ -121,11 +121,10 @@ architecture neorv32_cpu_rtl of neorv32_cpu is
signal csr_rdata : std_ulogic_vector(XLEN-1 downto 0); -- csr read data
signal lsu_mar : std_ulogic_vector(XLEN-1 downto 0); -- lsu memory address register
signal lsu_err : std_ulogic_vector(3 downto 0); -- lsu alignment/access errors
signal pc_fetch : std_ulogic_vector(XLEN-1 downto 0); -- pc for instruction fetch
signal pc_curr : std_ulogic_vector(XLEN-1 downto 0); -- current pc (for currently executed instruction)
signal pc_next : std_ulogic_vector(XLEN-1 downto 0); -- next PC (corresponding to next instruction)
signal pc_ret : std_ulogic_vector(XLEN-1 downto 0); -- return address
signal pmp_ex_fault : std_ulogic; -- pmp instruction fetch fault
signal pmp_rw_fault : std_ulogic; -- pmp read/write access fault
signal pmp_fault : std_ulogic; -- pmp permission violation
signal irq_machine : std_ulogic_vector(2 downto 0); -- risc-v standard machine-level interrupts

begin
@@ -234,17 +233,18 @@ begin
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_o => ctrl, -- main control bus
-- instruction fetch interface --
ibus_pmperr_i => pmp_ex_fault, -- instruction fetch pmp fault
ibus_req_o => ibus_req_o, -- request
ibus_rsp_i => ibus_rsp_i, -- response
-- pmp fault --
pmp_fault_i => pmp_fault, -- instruction fetch / execute pmp fault
-- data path interface --
alu_cp_done_i => alu_cp_done, -- ALU iterative operation done
alu_cmp_i => alu_cmp, -- comparator status
alu_add_i => alu_add, -- ALU address result
alu_imm_o => alu_imm, -- immediate
rf_rs1_i => rs1, -- rf source 1
pc_fetch_o => pc_fetch, -- instruction fetch address
pc_curr_o => pc_curr, -- current PC (corresponding to current instruction)
pc_next_o => pc_next, -- next PC (corresponding to next instruction)
pc_ret_o => pc_ret, -- return address
csr_rdata_o => csr_rdata, -- CSR read data
-- external CSR interface --
@@ -269,8 +269,8 @@ begin
xcsr_rdata_res <= xcsr_rdata_pmp or xcsr_rdata_alu;

-- CPU state --
sleep_o <= ctrl.cpu_sleep; -- set when CPU is sleeping (after WFI)
debug_o <= ctrl.cpu_debug; -- set when CPU is in debug mode
sleep_o <= ctrl.cpu_sleep;
debug_o <= ctrl.cpu_debug;


-- Register File --------------------------------------------------------------------------
@@ -365,7 +365,7 @@ begin
mar_o => lsu_mar, -- memory address register
wait_o => lsu_wait, -- wait for access to complete
err_o => lsu_err, -- alignment/access errors
pmp_fault_i => pmp_rw_fault, -- PMP read/write access fault
pmp_fault_i => pmp_fault, -- PMP read/write access fault
-- data bus --
dbus_req_o => dbus_req_o, -- request
dbus_rsp_i => dbus_rsp_i -- response
@@ -394,19 +394,17 @@ begin
csr_wdata_i => xcsr_wdata, -- write data
csr_rdata_o => xcsr_rdata_pmp, -- read data
-- address input --
addr_if_i => pc_fetch, -- instruction fetch address
addr_if_i => pc_next, -- instruction fetch address
addr_ls_i => alu_add, -- load/store address
-- faults --
fault_ex_o => pmp_ex_fault, -- instruction fetch fault
fault_rw_o => pmp_rw_fault -- read/write access fault
-- access error --
fault_o => pmp_fault -- permission violation
);
end generate;

pmp_inst_false:
if not RISCV_ISA_Smpmp generate
xcsr_rdata_pmp <= (others => '0');
pmp_ex_fault <= '0';
pmp_rw_fault <= '0';
pmp_fault <= '0';
end generate;


43 changes: 20 additions & 23 deletions rtl/core/neorv32_cpu_control.vhd
@@ -79,17 +79,18 @@ entity neorv32_cpu_control is
rstn_i : in std_ulogic; -- global reset, low-active, async
ctrl_o : out ctrl_bus_t; -- main control bus
-- instruction fetch interface --
ibus_pmperr_i : in std_ulogic; -- instruction fetch pmp fault
ibus_req_o : out bus_req_t; -- request
ibus_rsp_i : in bus_rsp_t; -- response
-- pmp fault --
pmp_fault_i : in std_ulogic; -- instruction fetch / execute pmp fault
-- data path interface --
alu_cp_done_i : in std_ulogic; -- ALU iterative operation done
alu_cmp_i : in std_ulogic_vector(1 downto 0); -- comparator status
alu_add_i : in std_ulogic_vector(XLEN-1 downto 0); -- ALU address result
alu_imm_o : out std_ulogic_vector(XLEN-1 downto 0); -- immediate
rf_rs1_i : in std_ulogic_vector(XLEN-1 downto 0); -- rf source 1
pc_fetch_o : out std_ulogic_vector(XLEN-1 downto 0); -- instruction fetch address
pc_curr_o : out std_ulogic_vector(XLEN-1 downto 0); -- current PC (corresponding to current instruction)
pc_next_o : out std_ulogic_vector(XLEN-1 downto 0); -- next PC (corresponding to next instruction)
pc_ret_o : out std_ulogic_vector(XLEN-1 downto 0); -- return address
csr_rdata_o : out std_ulogic_vector(XLEN-1 downto 0); -- CSR read data
-- external CSR interface --
@@ -250,7 +251,7 @@ architecture neorv32_cpu_control_rtl of neorv32_cpu_control is
end record;
signal csr : csr_t;

-- hpm event configuration CSRs --
-- HPM event configuration CSRs --
type hpmevent_cfg_t is array (3 to 15) of std_ulogic_vector(hpmcnt_event_width_c-1 downto 0);
type hpmevent_rd_t is array (3 to 15) of std_ulogic_vector(XLEN-1 downto 0);
signal hpmevent_cfg : hpmevent_cfg_t;
@@ -308,18 +309,11 @@ begin
fetch_engine.pc <= (others => '0');
fetch_engine.priv <= '0';
elsif rising_edge(clk_i) then
-- restart request --
if (fetch_engine.state = IF_RESTART) then -- restart done
fetch_engine.restart <= '0';
else -- buffer request
fetch_engine.restart <= fetch_engine.restart or fetch_engine.reset;
end if;

-- fsm --
case fetch_engine.state is

when IF_REQUEST => -- request next 32-bit-aligned instruction word
-- ------------------------------------------------------------
fetch_engine.restart <= fetch_engine.restart or fetch_engine.reset; -- buffer restart request
if (ipb.free = "11") then -- free IPB space?
fetch_engine.state <= IF_PENDING;
elsif (fetch_engine.restart = '1') or (fetch_engine.reset = '1') then -- restart because of branch
@@ -328,6 +322,7 @@ begin

when IF_PENDING => -- wait for bus response and write instruction data to prefetch buffer
-- ------------------------------------------------------------
fetch_engine.restart <= fetch_engine.restart or fetch_engine.reset; -- buffer restart request
if (fetch_engine.resp = '1') then -- wait for bus response
fetch_engine.pc <= std_ulogic_vector(unsigned(fetch_engine.pc) + 4); -- next word
fetch_engine.pc(1) <= '0'; -- (re-)align to 32-bit
@@ -340,17 +335,17 @@ begin

when others => -- IF_RESTART: set new start address
-- ------------------------------------------------------------
fetch_engine.pc <= exe_engine.pc2(XLEN-1 downto 1) & '0'; -- initialize from PC incl. 16-bit-alignment bit
fetch_engine.priv <= csr.privilege_eff; -- set new privilege level
fetch_engine.state <= IF_REQUEST;
fetch_engine.restart <= '0'; -- restart done
fetch_engine.pc <= exe_engine.pc2(XLEN-1 downto 1) & '0'; -- initialize from PC incl. 16-bit-alignment bit
fetch_engine.priv <= csr.privilege_eff; -- set new privilege level
fetch_engine.state <= IF_REQUEST;

end case;
end if;
end process fetch_engine_fsm;

-- PC output for instruction fetch --
ibus_req_o.addr <= fetch_engine.pc(XLEN-1 downto 2) & "00"; -- word aligned
pc_fetch_o <= fetch_engine.pc(XLEN-1 downto 2) & "00"; -- word aligned

-- instruction fetch (read) request if IPB not full --
ibus_req_o.stb <= '1' when (fetch_engine.state = IF_REQUEST) and (ipb.free = "11") else '0';
@@ -359,8 +354,8 @@ begin
fetch_engine.resp <= ibus_rsp_i.ack or ibus_rsp_i.err;

-- IPB instruction data and status --
ipb.wdata(0) <= (ibus_rsp_i.err or ibus_pmperr_i) & ibus_rsp_i.data(15 downto 0);
ipb.wdata(1) <= (ibus_rsp_i.err or ibus_pmperr_i) & ibus_rsp_i.data(31 downto 16);
ipb.wdata(0) <= ibus_rsp_i.err & ibus_rsp_i.data(15 downto 0);
ipb.wdata(1) <= ibus_rsp_i.err & ibus_rsp_i.data(31 downto 16);

-- IPB write enable --
ipb.we(0) <= '1' when (fetch_engine.state = IF_PENDING) and (fetch_engine.resp = '1') and
@@ -384,7 +379,7 @@ begin
prefetch_buffer_inst: entity neorv32.neorv32_fifo
generic map (
FIFO_DEPTH => 2, -- number of IPB entries; has to be a power of two, min 2
FIFO_WIDTH => ipb.wdata(i)'length, -- size of data elements in fifo
FIFO_WIDTH => ipb.wdata(i)'length, -- size of data elements in FIFO
FIFO_RSYNC => false, -- we NEED to read data asynchronously
FIFO_SAFE => false, -- no safe access required (ensured by FIFO-external logic)
FULL_RESET => true -- map to FFs and add a dedicated reset
@@ -564,6 +559,7 @@ begin

-- PC output --
pc_curr_o <= exe_engine.pc(XLEN-1 downto 1) & '0'; -- address of current instruction
pc_next_o <= exe_engine.pc2(XLEN-1 downto 1) & '0'; -- address of next instruction
pc_ret_o <= exe_engine.ra(XLEN-1 downto 1) & '0'; -- return address

-- simplified rv32 opcode --
@@ -572,7 +568,8 @@ begin

-- Execute Engine FSM Comb ----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
execute_engine_fsm_comb: process(exe_engine, debug_ctrl, trap_ctrl, hw_trigger_match, opcode, issue_engine, csr, alu_cp_done_i, lsu_wait_i, alu_add_i, branch_taken)
execute_engine_fsm_comb: process(exe_engine, debug_ctrl, trap_ctrl, hw_trigger_match, opcode, issue_engine, csr,
alu_cp_done_i, lsu_wait_i, alu_add_i, branch_taken, pmp_fault_i)
variable funct3_v : std_ulogic_vector(2 downto 0);
variable funct7_v : std_ulogic_vector(6 downto 0);
begin
@@ -688,11 +685,11 @@ begin
exe_engine_nxt.state <= BRANCHED; -- delay cycle to restart front-end

when EXECUTE => -- decode and execute instruction (control will be here for exactly 1 cycle in any case)
-- [NOTE] register file is read in this stage; due to the sync read, data will be available in the _next_ state
-- ------------------------------------------------------------
exe_engine_nxt.pc2 <= alu_add_i(XLEN-1 downto 1) & '0'; -- next PC (= PC + immediate)
trap_ctrl.instr_be <= pmp_fault_i; -- did this instruction cause a PMP-execute violation?

-- decode instruction class/type --
-- decode instruction class/type; [NOTE] register file is read in THIS stage; due to the sync read data will be available in the NEXT state --
case opcode is

-- register/immediate ALU operation --
@@ -1964,7 +1961,7 @@ begin
cnt_lo_rd(2) <= cnt.lo(2); -- instret
cnt_hi_rd(2) <= cnt.hi(2); -- instreth
end if;
-- hpm counters --
-- HPM counters --
if RISCV_ISA_Zihpm and (hpm_num_c > 0) then
for i in 3 to (hpm_num_c+3)-1 loop
if (hpm_cnt_lo_width_c > 0) then -- constrain low word size
@@ -2041,7 +2038,7 @@ begin
cnt.inc(0) <= (others => (cnt_event(hpmcnt_event_cy_c) and (not csr.mcountinhibit(0)) and (not debug_ctrl.run)));
cnt.inc(1) <= (others => '0'); -- time: not available
cnt.inc(2) <= (others => (cnt_event(hpmcnt_event_ir_c) and (not csr.mcountinhibit(2)) and (not debug_ctrl.run)));
-- hpm counters --
-- HPM counters --
for i in 3 to 15 loop
cnt.inc(i) <= (others => (or_reduce_f(cnt_event and hpmevent_cfg(i)) and (not csr.mcountinhibit(i)) and (not debug_ctrl.run)));
end loop;