Relocate clock gating switch (#1124)
stnolting authored Dec 22, 2024
2 parents d3c09c0 + facd50a commit 0d7e96c
Showing 17 changed files with 126 additions and 113 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 22.12.2024 | 1.10.7.7 | :warning: move clock gating switch from processor top to CPU clock; `CLOCK_GATING_EN` is now a CPU tuning option | [#1124](https://github.com/stnolting/neorv32/pull/1124) |
| 21.12.2024 | 1.10.7.6 | minor rtl cleanups and optimizations | [#1123](https://github.com/stnolting/neorv32/pull/1123) |
| 19.12.2024 | 1.10.7.5 | :test_tube: use time-multiplex PMP architecture (reducing area footprint) | [#1105](https://github.com/stnolting/neorv32/pull/1105) |
| 14.12.2024 | 1.10.7.4 | :sparkles: add new module: I2C-compatible **Two-Wire Device Controller (TWD)** | [#1121](https://github.com/stnolting/neorv32/pull/1121) |
72 changes: 51 additions & 21 deletions docs/datasheet/cpu.adoc
@@ -65,23 +65,22 @@ direction as seen from the CPU.
|=======================
| Signal | Width/Type | Dir | Description
4+^| **Global Signals**
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge, this clock can be switched off during <<_sleep_mode>>
| `clk_aux_i` | 1 | in | Always-on clock, used to keep the sleep control active when `clk_i` is switched off
| `rstn_i` | 1 | in | Global reset, low-active
| `sleep_o` | 1 | out | CPU is in <<_sleep_mode>> when set
| `debug_o` | 1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge.
| `rstn_i` | 1 | in | Global reset, low-active.
| `sleep_o` | 1 | out | CPU is in <<_sleep_mode>> when set.
| `debug_o` | 1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set.
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
| `msi_i` | 1 | in | RISC-V machine software interrupt
| `mei_i` | 1 | in | RISC-V machine external interrupt
| `mti_i` | 1 | in | RISC-V machine timer interrupt
| `firq_i` | 16 | in | Custom fast interrupt request signals
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>)
| `msi_i` | 1 | in | RISC-V machine software interrupt.
| `mei_i` | 1 | in | RISC-V machine external interrupt.
| `mti_i` | 1 | in | RISC-V machine timer interrupt.
| `firq_i` | 16 | in | Custom fast interrupt request signals.
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>).
4+^| **Instruction <<_bus_interface>>**
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request.
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response.
4+^| **Data <<_bus_interface>>**
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request.
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response.
|=======================

.Bus Interface Protocol
@@ -110,6 +109,7 @@ The generic type "suv(x:y)" represents a `std_ulogic_vector(x downto y)`.
[options="header",grid="rows"]
|=======================
| Name | Type | Description
| `HART_ID` | suv(31:0) | Value for the <<_mhartid>> CSR.
| `VENDOR_ID` | suv(31:0) | Value for the <<_mvendorid>> CSR.
| `BOOT_ADDR` | suv(31:0) | CPU reset address. See section <<_address_space>>.
| `DEBUG_PARK_ADDR` | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
@@ -119,6 +119,10 @@ The generic type "suv(x:y)" represents a `std_ulogic_vector(x downto y)`.
| `RISCV_ISA_Smpmp` | boolean | Implement RISC-V-compatible physical memory protection (PMP). See section <<_smpmp_isa_extension>>.
|=======================

.Tuning Option Generics
[TIP]
Additional generics that are related to certain _tuning options_ are listed in section <<_cpu_tuning_options>>.


<<<
// ####################################################################################################################
@@ -253,6 +257,21 @@ Note that these configuration options have no impact on the actual functionality
Software can check for configured tuning options via specific flags in the <<_mxisa>> CSR.


{empty} +
[discrete]
===== **`CLOCK_GATING_EN`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | Clock gating
| Type | `boolean`
| Default | `false`, disabled
| Description | When **enabled** the CPU's primary clock is switched off when the CPU enters <<_sleep_mode>>. See <<_cpu_clock_gating>>.
| | When **disabled** the CPU clock system is implemented as a single always-on clock domain.
|=======================


{empty} +
[discrete]
===== **`FAST_MUL_EN`**
@@ -314,7 +333,7 @@ like blockRAM. Note that these primitives do not provide any kind of hardware re
==== Sleep Mode

The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing
dynamic power consumption. Sleep mode is entered by executing the `wfi` ("wait for interrupt") instruction.
dynamic power consumption. Sleep mode is entered by executing the RISC-V `wfi` ("wait for interrupt") instruction.

.Execution Details
[NOTE]
@@ -323,22 +342,33 @@ if `TW` in <<_mstatus>> is set. When executed in debug-mode or during single-ste
simple `nop` without entering sleep mode.

After executing the `wfi` instruction the CPU's `sleep_o` signal (<<_cpu_top_entity_signals>>) will become set
as soon as the CPU has fully halted ("CPU is sleeping"):
as soon as the CPU has fully halted:

[start=1]
.The front-end (instruction fetch) has stopped. There is no pending instruction fetch bus access.
.The back-end (instruction execution) has stopped. There is no pending data bus access.
.There is no enabled interrupt pending.

CPU-external modules like memories, timers and peripheral interfaces are not affected by this. Furthermore, the CPU will
continue to buffer/enqueue incoming interrupt. The CPU will leave sleep mode as soon as any _enabled_ interrupt (via <<_mie>>)
continue to buffer/enqueue incoming interrupts. The CPU will leave sleep mode as soon as any _enabled_ interrupt (via <<_mie>>)
source becomes _pending_ or if a debug session is started.
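
The wake-up rule described above can be modeled as a small predicate. This is a minimal sketch and not the core's actual implementation: `cpu_leaves_sleep` is a hypothetical helper, `mie` is the enable mask referenced above, and `mip` (the standard RISC-V interrupt-pending CSR) is assumed to hold the matching pending bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the sleep-mode exit condition:
 * the CPU wakes up if at least one interrupt source is both enabled (mie)
 * and pending (mip), or if a debug session is started. */
static inline bool cpu_leaves_sleep(uint32_t mie, uint32_t mip, bool debug_request) {
  return ((mie & mip) != 0u) || debug_request;
}
```

A pending interrupt whose enable bit is cleared does not satisfy the predicate, which matches the statement that only _enabled_ interrupts end sleep mode.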

===== Power-Down Mode

Optionally, the sleep mode can also be used to shut down the CPU's main clock to further reduce power consumption
by halting the core's clock tree. This clock gating mode is enabled by the `CLOCK_GATING_EN` generic
(<<_processor_top_entity_generics>>). See section <<_processor_clocking>> for more information.
==== CPU Clock Gating

The single clock domain of the CPU core can be split into an always-on clock domain and a switchable clock domain.
The switchable clock domain can be deactivated to further reduce dynamic power consumption. CPU-external modules
like timers, interfaces and memories are not affected by the clock gating.

The splitting into two clock domains is enabled by the `CLOCK_GATING_EN` generic (<<_processor_top_entity_generics>> /
<<_cpu_tuning_options>>). When enabled, a generic clock switching gate is added to decouple the switchable clock from
the always-on clock domain. Whenever the CPU enters <<_sleep_mode>> the switchable clock domain is shut down.

.Clock Switch Hardware
[NOTE]
By default, a generic clock switch is used (`rtl/core/neorv32_clockgate.vhd`). Especially for FPGA setups it is highly
recommended to replace this default module by a technology-specific primitive or macro wrapper to improve synthesis results
(clock skew, global clock tree usage, etc.).


==== Full Virtualization
9 changes: 5 additions & 4 deletions docs/datasheet/cpu_csr.adoc
@@ -979,9 +979,10 @@ discover ISA sub-extensions and CPU configuration options
| 23 | `CSR_MXISA_ZBB` | r/- | <<_zbb_isa_extension>> available
| 24 | `CSR_MXISA_ZBS` | r/- | <<_zbs_isa_extension>> available
| 25 | `CSR_MXISA_ZALRSC` | r/- | <<_zalrsc_isa_extension>> available
| 27:26 | - | r/- | _reserved_, hardwired to zero
| 28 | `CSR_MXISA_RFHWRST` | r/- | full hardware reset of register file available when set (`REGFILE_HW_RST`)
| 29 | `CSR_MXISA_FASTMUL` | r/- | fast multiplication available when set (`FAST_MUL_EN`)
| 30 | `CSR_MXISA_FASTSHIFT` | r/- | fast shifts available when set (`FAST_SHIFT_EN`)
| 26 | - | r/- | _reserved_, hardwired to zero
| 27 | `CSR_MXISA_CLKGATE` | r/- | sleep-mode clock gating implemented when set (`CLOCK_GATING_EN`), see <<_cpu_tuning_options>>
| 28 | `CSR_MXISA_RFHWRST` | r/- | full hardware reset of register file available when set (`REGFILE_HW_RST`), see <<_cpu_tuning_options>>
| 29 | `CSR_MXISA_FASTMUL` | r/- | fast multiplication available when set (`FAST_MUL_EN`), see <<_cpu_tuning_options>>
| 30 | `CSR_MXISA_FASTSHIFT` | r/- | fast shifts available when set (`FAST_SHIFT_EN`), see <<_cpu_tuning_options>>
| 31 | `CSR_MXISA_IS_SIM` | r/- | set if CPU is being **simulated** (⚠️ not guaranteed)
|=======================
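
Since the clock gating flag moves from the SYSINFO register to `mxisa`, software that wants to detect the feature has to test bit 27 of that CSR. The following is a minimal sketch under stated assumptions: the bit positions are copied from the table above and redefined locally, `mxisa_flag_set` is a hypothetical helper, and reading the CSR itself is target-specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the mxisa table above (local definitions for this sketch). */
enum {
  CSR_MXISA_CLKGATE   = 27, /* sleep-mode clock gating (CLOCK_GATING_EN) */
  CSR_MXISA_RFHWRST   = 28, /* full register file hardware reset (REGFILE_HW_RST) */
  CSR_MXISA_FASTMUL   = 29, /* fast multiplication (FAST_MUL_EN) */
  CSR_MXISA_FASTSHIFT = 30  /* fast shifts (FAST_SHIFT_EN) */
};

/* Hypothetical helper: returns true if the given flag bit is set in a raw mxisa value. */
static inline bool mxisa_flag_set(uint32_t mxisa, unsigned bit) {
  return ((mxisa >> bit) & 1u) != 0u;
}
```

Passing the raw CSR value as an argument keeps the helper independent of how the value was obtained, so it can also be exercised on a host machine with constant inputs.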
28 changes: 7 additions & 21 deletions docs/datasheet/soc.adoc
@@ -210,7 +210,6 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| Name | Type | Default | Description
4+^| **<<_processor_clocking>>**
| `CLOCK_FREQUENCY` | natural | 0 | The clock frequency of the processor's `clk_i` input port in Hertz (Hz).
| `CLOCK_GATING_EN` | boolean | false | Enable clock gating when CPU is in sleep mode (see sections <<_sleep_mode>> and <<_processor_clocking>>).
4+^| **Core Identification**
| `HART_ID` | suv(31:0) | x"00000000" | The hart thread ID of the CPU (passed to <<_mhartid>> CSR).
| `JEDEC_ID` | suv(10:0) | "00000000000" | JEDEC ID; continuation codes plus vendor ID (passed to <<_mvendorid>> CSR and to the <<_debug_transport_module_dtm>>).
@@ -243,7 +242,8 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `RISCV_ISA_Zksh` | boolean | false | Enable <<_zksh_isa_extension>> (scalar cryptography ShangMi hash functions).
| `RISCV_ISA_Zmmul` | boolean | false | Enable <<_zmmul_isa_extension>> (hardware-based integer multiplication).
| `RISCV_ISA_Zxcfu` | boolean | false | Enable NEORV32-specific <<_zxcfu_isa_extension>> (custom RISC-V instructions).
4+^| **CPU <<_architecture>> Tuning Options**
4+^| **<<_cpu_tuning_options>>**
| `CLOCK_GATING_EN` | boolean | false | Implement sleep-mode clock gating (see sections <<_sleep_mode>> and <<_processor_clocking>>).
| `FAST_MUL_EN` | boolean | false | Implement fast but large full-parallel multipliers (trying to infer DSP blocks); see section <<_cpu_arithmetic_logic_unit>>.
| `FAST_SHIFT_EN` | boolean | false | Implement fast but large full-parallel barrel shifters; see section <<_cpu_arithmetic_logic_unit>>.
| `REGFILE_HW_RST` | boolean | false | Implement full hardware reset for register file (use individual FFs instead of BRAM); see section <<_cpu_register_file>>.
@@ -329,27 +329,13 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt

The processor is implemented as fully-synchronous logic design using a single clock domain that is driven entirely
by the top's `clk_i` signal. This clock signal is used by all internal registers and memories. All of them trigger
on the **rising edge** of this clock signal - the only exception is the default <<_clock_gating>> module. External
"clocks" like the OCD's JTAG clock or the SDI's serial clock are synchronized into the processor's clock domain
before being used as "general logic signal" (and not as a dedicated clock).
on the **rising edge** of this clock signal. External "clocks" like the OCD's JTAG clock or the SDI's serial clock
are synchronized into the processor's clock domain before being used as "general logic signal" (and not as a dedicated clock).

==== Clock Gating

The single clock domain of the processor can be split into an always-on clock domain and a switchable clock domain.
The switchable clock domain is used to clock the CPU core, the CPU's bus switch and - if implemented - the caches.
This domain can be deactivated to reduce power consumption. The always-on clock domain is used to clock all other
processor modules like peripherals, memories and IO devices. Hence, these modules can continue operation (e.g. a
timer keeps running) even if the CPU is shut down.

The splitting into two clock domains is enabled by the `CLOCK_GATING_EN` generic (<<_processor_top_entity_generics>>).
When enabled, a generic clock switching gate is added to decouple the switchable clock from the always-on clock domain
(VHDL file `neorv32_clockgate.vhd`). Whenever the CPU enters <<_sleep_mode>> the CPU clock domain is shut down.

.Clock Switch Hardware
.CPU Clock Gating
[NOTE]
By default, a generic clock gate is used (`rtl/core/neorv32_clockgate.vhd`) to shut down the CPU clock.
Especially for FPGA setups it is highly recommended to replace this default version by a technology-specific primitive
or macro wrapper to improve efficiency (clock skew, global clock tree usage, etc.).
The CPU core provides an optional clock-gating feature to switch off large parts of the core when sleep mode is entered.
See section <<_cpu_clock_gating>> for more information.

==== Peripheral Clocks

2 changes: 1 addition & 1 deletion docs/datasheet/soc_sysinfo.adoc
@@ -80,7 +80,7 @@ Bit fields in this register are set to all-zero if the according memory system i
| `4` | `SYSINFO_SOC_OCD` | set if on-chip debugger is implemented (via top's `OCD_EN` generic)
| `5` | `SYSINFO_SOC_ICACHE` | set if processor-internal instruction cache is implemented (via top's `ICACHE_EN` generic)
| `6` | `SYSINFO_SOC_DCACHE` | set if processor-internal data cache is implemented (via top's `DCACHE_EN` generic)
| `7` | `SYSINFO_SOC_CLOCK_GATING` | set if CPU clock gating is implemented (via top's `CLOCK_GATING_EN` generic)
| `7` | - |_reserved_, read as zero
| `8` | `SYSINFO_SOC_XBUS_CACHE` | set if external bus interface cache is implemented (via top's `XBUS_CACHE_EN` generic)
| `9` | `SYSINFO_SOC_XIP` | set if XIP module is implemented (via top's `XIP_EN` generic)
| `10` | `SYSINFO_SOC_XIP_CACHE` | set if XIP cache is implemented (via top's `XIP_CACHE_EN` generic)
47 changes: 34 additions & 13 deletions rtl/core/neorv32_cpu.vhd
@@ -54,6 +54,7 @@ entity neorv32_cpu is
RISCV_ISA_Sdtrig : boolean; -- implement trigger module extension
RISCV_ISA_Smpmp : boolean; -- implement physical memory protection
-- Tuning Options --
CLOCK_GATING_EN : boolean; -- enable clock gating when in sleep mode
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
REGFILE_HW_RST : boolean; -- implement full hardware reset for register file
@@ -69,7 +70,6 @@ entity neorv32_cpu is
port (
-- global control --
clk_i : in std_ulogic; -- switchable global clock, rising edge
clk_aux_i : in std_ulogic; -- always-on clock, rising edge
rstn_i : in std_ulogic; -- global reset, low-active, async
sleep_o : out std_ulogic; -- cpu is in sleep mode when set
debug_o : out std_ulogic; -- cpu is in debug mode when set
@@ -108,6 +108,7 @@ architecture neorv32_cpu_rtl of neorv32_cpu is
signal xcsr_rdata_res : std_ulogic_vector(XLEN-1 downto 0);

-- local signals --
signal clk_gated : std_ulogic; -- switchable clock (clock gating)
signal ctrl : ctrl_bus_t; -- main control bus
signal alu_imm : std_ulogic_vector(XLEN-1 downto 0); -- immediate
signal rf_wdata : std_ulogic_vector(XLEN-1 downto 0); -- register file write data
@@ -178,6 +179,25 @@ begin
assert not is_simulation_c report "[NEORV32] Assuming this is a simulation." severity warning;


-- Clock Gating ---------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
neorv32_cpu_clockgate_enabled:
if CLOCK_GATING_EN generate
neorv32_cpu_clockgate_inst: entity neorv32.neorv32_clockgate
port map (
clk_i => clk_i,
rstn_i => rstn_i,
halt_i => ctrl.cpu_sleep,
clk_o => clk_gated
);
end generate;

neorv32_cpu_clockgate_disabled:
if not CLOCK_GATING_EN generate
clk_gated <= clk_i;
end generate;


-- Control Unit ---------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
neorv32_cpu_control_inst: entity neorv32.neorv32_cpu_control
@@ -219,6 +239,7 @@ begin
RISCV_ISA_Sdtrig => RISCV_ISA_Sdtrig, -- implement trigger module extension
RISCV_ISA_Smpmp => RISCV_ISA_Smpmp, -- implement physical memory protection
-- Tuning Options --
CLOCK_GATING_EN => CLOCK_GATING_EN, -- enable clock gating when in sleep mode
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
REGFILE_HW_RST => REGFILE_HW_RST, -- implement full hardware reset for register file
@@ -228,8 +249,8 @@ begin
)
port map (
-- global control --
clk_i => clk_i, -- global clock, rising edge
clk_aux_i => clk_aux_i, -- always-on clock, rising edge
clk_i => clk_gated, -- global clock, rising edge
clk_aux_i => clk_i, -- always-on clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_o => ctrl, -- main control bus
-- instruction fetch interface --
@@ -283,14 +304,14 @@ begin
)
port map (
-- global control --
clk_i => clk_i, -- global clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_i => ctrl, -- main control bus
clk_i => clk_gated, -- global clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_i => ctrl, -- main control bus
-- operands --
rd_i => rf_wdata, -- destination operand rd
rs1_o => rs1, -- source operand rs1
rs2_o => rs2, -- source operand rs2
rs3_o => rs3 -- source operand rs3
rd_i => rf_wdata, -- destination operand rd
rs1_o => rs1, -- source operand rs1
rs2_o => rs2, -- source operand rs2
rs3_o => rs3 -- source operand rs3
);

-- all buses are zero unless there is an according operation --
@@ -324,7 +345,7 @@ begin
)
port map (
-- global control --
clk_i => clk_i, -- global clock, rising edge
clk_i => clk_gated, -- global clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_i => ctrl, -- main control bus
-- CSR interface --
@@ -355,7 +376,7 @@ begin
)
port map (
-- global control --
clk_i => clk_i, -- global clock, rising edge
clk_i => clk_gated, -- global clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_i => ctrl, -- main control bus
-- cpu data access interface --
@@ -385,7 +406,7 @@ begin
)
port map (
-- global control --
clk_i => clk_i, -- global clock, rising edge
clk_i => clk_gated, -- global clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_i => ctrl, -- main control bus
-- CSR interface --