From 2b40d18d95df787beb93d653f1f880038f7e9866 Mon Sep 17 00:00:00 2001
From: stnolting
Date: Sun, 15 Dec 2024 20:25:50 +0100
Subject: [PATCH 1/3] [sw/lib] minor comment cleanups

---
 sw/lib/include/neorv32_twi.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sw/lib/include/neorv32_twi.h b/sw/lib/include/neorv32_twi.h
index f0cc26293..2039c6c4d 100644
--- a/sw/lib/include/neorv32_twi.h
+++ b/sw/lib/include/neorv32_twi.h
@@ -27,8 +27,8 @@
 /**@{*/
 /** TWI module prototype */
 typedef volatile struct __attribute__((packed,aligned(4))) {
-  uint32_t CTRL; /**< offset 0: control register (#NEORV32_TWI_CTRL_enum) */
-  uint32_t DCMD; /**< offset 4: data/cmd register (#NEORV32_TWI_DCMD_enum) */
+  uint32_t CTRL; /**< offset 0: control register (#NEORV32_TWI_CTRL_enum) */
+  uint32_t DCMD; /**< offset 4: data/cmd register (#NEORV32_TWI_DCMD_enum) */
 } neorv32_twi_t;
 
 /** TWI module hardware access (#neorv32_twi_t) */
@@ -46,8 +46,8 @@ enum NEORV32_TWI_CTRL_enum {
   TWI_CTRL_CDIV3 = 7, /**< TWI control register(7) (r/w): Clock divider bit 3 */
   TWI_CTRL_CLKSTR = 8, /**< TWI control register(8) (r/w): Enable/allow clock stretching */
 
-  TWI_CTRL_FIFO_LSB = 15, /**< SPI control register(15) (r/-): log2(FIFO size), lsb */
-  TWI_CTRL_FIFO_MSB = 18, /**< SPI control register(18) (r/-): log2(FIFO size), msb */
+  TWI_CTRL_FIFO_LSB = 15, /**< TWI control register(15) (r/-): log2(FIFO size), lsb */
+  TWI_CTRL_FIFO_MSB = 18, /**< TWI control register(18) (r/-): log2(FIFO size), msb */
 
   TWI_CTRL_SENSE_SCL = 27, /**< TWI control register(27) (r/-): current state of the SCL bus line */
   TWI_CTRL_SENSE_SDA = 28, /**< TWI control register(28) (r/-): current state of the SDA bus line */

From adca89713a6ab64b7c3d21d6ae4252d12ffba657 Mon Sep 17 00:00:00 2001
From: stnolting
Date: Sun, 15 Dec 2024 20:25:57 +0100
Subject: [PATCH 2/3] [docs] minor typo fix

---
 docs/datasheet/soc_imem.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/datasheet/soc_imem.adoc
b/docs/datasheet/soc_imem.adoc
index d68434fe5..06b0f17fb 100644
--- a/docs/datasheet/soc_imem.adoc
+++ b/docs/datasheet/soc_imem.adoc
@@ -25,7 +25,7 @@ Note that this size should be a power of two to optimize physical implementation
 the IMEM is mapped to base address `0x00000000` (see section <<_address_space>>).
 
 By default the IMEM is implemented as true RAM so the content can be modified during run time. This is
-required when using the <<_bootloader>> (or the <<_on_chip_debugger>>) so it can update the content of the IMEM at
+required when using the <<_bootloader>> (or the <<_on_chip_debugger_ocd>>) so it can update the content of the IMEM at
 any time.
 
 Alternatively, the IMEM can be implemented as **pre-initialized read-only memory (ROM)**, so the processor can

From 5bde2af7103bd30376e2f4b58957ae99af5c388f Mon Sep 17 00:00:00 2001
From: stnolting
Date: Sun, 15 Dec 2024 20:26:09 +0100
Subject: [PATCH 3/3] [docs] cpu: add new section "tuning options"

---
 docs/datasheet/cpu.adoc | 119 +++++++++++++++++++++++++++++-----------
 1 file changed, 86 insertions(+), 33 deletions(-)

diff --git a/docs/datasheet/cpu.adoc b/docs/datasheet/cpu.adoc
index 8a7df9475..e77d091ec 100644
--- a/docs/datasheet/cpu.adoc
+++ b/docs/datasheet/cpu.adoc
@@ -158,23 +158,10 @@ Up to four individual synchronous read ports allow to fetch up to 4 register ope
 are mutually exclusive as they happen in separate cycles. Hence, there is no need to consider things like
 "read-during-write" behavior.
 
-The register file provides two different implementation options configured via the top's `REGFILE_HW_RST` generic.
-
-* `REGFILE_HW_RST = false` (default): In this configuration the register file is implemented as plain memory array without a
-dictated hardware reset. This architecture allows to infer FPGA block RAM for the entire register file resulting in minimal
-general logic utilization.
-* `REGFILE_HW_RST = true`: This configuration is based on individual FFs that do provide a dedicated hardware reset.
-Hence, the register cannot be mapped to FPGA block RAM. This optional can be selected if the application requires a
-reset of the register file (e.g. for security reasons) or if the design shall be synthesized for an **ASIC** implementation.
-Using individual FFs for th register file might also improve timing as no long routing lines are required to connect to
-block RAM primitives.
-
-The state of this configuration generic can be checked by software via the <<_mxisa>> CSR.
-
-.FPGA Implementation
-[WARNING]
-Enabling the `REGFILE_HW_RST` option for FPGA implementation is not recommended as this will massively increase the amount
-of required logic resources.
+.Memory Tuning Options
+[TIP]
+The physical implementation of the register file's memory core can be tuned for certain design goals like area or throughput.
+See section <<_cpu_tuning_options>> for more information.
 
 .Implementation of the `zero` Register within FPGA Block RAM
 [NOTE]
@@ -208,12 +195,6 @@ and <<_b_isa_extension>>).
 The CPU control will raise an illegal instruction exception if a multi-cycle functional unit (like the <<_custom_functions_unit_cfu>>)
 does not complete processing in a bound amount of time (configured via the package's `monitor_mc_tmo_c` constant; default = 512 clock cycles).
 
-.Tuning Options
-[TIP]
-The ALU architecture can be tuned for an application-specific area-vs-performance trade-off. The `FAST_MUL_EN` and `FAST_SHIFT_EN`
-generics can be used to implement performance-optimized barrel shifters and DSP blocks, respectively. See sections <<_i_isa_extension>>,
-<<_b_isa_extension>> and <<_m_isa_extension>> for specific examples.
-
 :sectnums:
 ==== CPU Bus Unit
 
@@ -261,6 +242,75 @@ CPU back-end for actual execution. Execution is conducted by a state-machine tha
 includes the <<_control_and_status_registers_csrs>> as well as the trap controller.
 
 
+:sectnums:
+==== CPU Tuning Options
+
+The top module provides several tuning options to optimize the CPU for a specific goal.
+Note that these configuration options have no impact on the actual functionality (e.g. ISA compatibility).
+
+.Software Tuning Options Discovery
+[TIP]
+Software can check for configured tuning options via specific flags in the <<_mxisa>> CSR.
+
+
+{empty} +
+[discrete]
+===== **`FAST_MUL_EN`**
+
+[cols="<1,<8"]
+[frame="topbot",grid="none"]
+|=======================
+| Name | Fast multiplication
+| Type | `boolean`
+| Default | `false`, disabled
+| Description | When **enabled** the `M`/`Zmmul` extension's multiplier is implemented as "plain multiplication" allowing the
+synthesis tool to infer DSP blocks / multiplication primitives. Multiplication operations only require a few cycles due to the
+DSP-internal register stages. The execution time is independent of the provided operands.
+| | When **disabled** the `M`/`Zmmul` extension's multiplier is implemented as a bit-serial multiplier that computes one
+result bit in every cycle. Multiplication operations require at least 32 cycles, but the execution time is still
+independent of the provided operands.
+|=======================
+
+
+{empty} +
+[discrete]
+===== **`FAST_SHIFT_EN`**
+
+[cols="<1,<8"]
+[frame="topbot",grid="none"]
+|=======================
+| Name | Fast bit shifting
+| Type | `boolean`
+| Default | `false`, disabled
+| Description | When **enabled** the ALU's shifter unit is implemented as a fully parallel barrel shifter that is capable
+of shifting a data word by an arbitrary number of positions within a single cycle. Hence, the execution time of any base-ISA
+shift operation is independent of the provided operands. Note that the barrel shifter requires a lot of hardware resources and
+might also increase the core's critical path.
+| | When **disabled** the ALU's shifter unit is implemented as a bit-serial shifter that can shift the input data
+only by one position per cycle. Hence, several cycles might be required to complete any base-ISA shift-related operation.
+Therefore, the execution time of the serial approach is **not** independent of the provided operands. However, the serial
+approach requires only a few hardware resources and does not impact the critical path.
+|=======================
+
+
+{empty} +
+[discrete]
+===== **`REGFILE_HW_RST`**
+
+[cols="<1,<8"]
+[frame="topbot",grid="none"]
+|=======================
+| Name | Register file hardware reset
+| Type | `boolean`
+| Default | `false`, disabled
+| Description | When **enabled** the CPU register file is implemented using individual flip-flops that provide a full hardware reset.
+The register file is reset to all-zero after each hardware reset. Note that this option requires a lot of flip-flops and LUTs to
+build the register file. However, timing might improve as there is no need to route to distant block RAM resources.
+| | When **disabled** the CPU register file is implemented in a way that allows synthesis to infer memory primitives
+like block RAM. Note that these primitives do not provide any kind of hardware reset. Hence, the data content is undefined after reset.
+|=======================
+
+
 ==== Sleep Mode
 
 The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing
@@ -555,11 +605,10 @@ platform-compatibility and to indicate the actual intention of the according fen
 The `wfi` instruction is used to enter <<_sleep_mode>>. Executing the `wfi` instruction in user-mode will raise an illegal
 instruction exception if the `TW` bit of <<_mstatus>> is set.
 
-.Barrel Shifter
+.Shifter Tuning Options
 [TIP]
-The shift operations are implemented as multi-cycle ALU co-process (`rtl/core/neorv32_cpu_cp_shifter.vhd`).
-These operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_SHIFT_EN`
-configuration option that will replace the (time-variant) bit-serial shifter by a (time-constant) barrel shifter.
+The physical implementation of the bit-shifter can be tuned for certain design goals like area or throughput.
+See section <<_cpu_tuning_options>> for more information.
 
 
 ==== `M` ISA Extension
@@ -576,10 +625,10 @@ This ISA extension is implemented as multi-cycle ALU co-process (`rtl/core/neorv
 | Division | `div` `divu` `rem` `remu` | 36
 |=======================
 
-.DSP Blocks
+.Multiplication Tuning Options
 [TIP]
-Multiplication operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_MUL_EN`
-configuration option that will replace the (time-variant) bit-serial multiplier by (time-constant) FPGA DSP blocks.
+The physical implementation of the multiplier can be tuned for certain design goals like area or throughput.
+See section <<_cpu_tuning_options>> for more information.
 
 
 ==== `U` ISA Extension
@@ -803,10 +852,10 @@ generic. This ISA extension is implemented as multi-cycle ALU co-processor (`rtl
 | Byte-reverse | `rev8` | 4
 |=======================
 
-.Shift Operations
+.Shifter Tuning Options
 [TIP]
-Shift operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_SHIFT_EN`
-configuration option that will replace the (time-variant) bit-serial shifter by a (time-constant) barrel shifter.
+The physical implementation of the bit-shifter can be tuned for certain design goals like area or throughput.
+See section <<_cpu_tuning_options>> for more information.
 
 
 ==== `Zbs` ISA Extension
@@ -1164,6 +1213,10 @@ provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 p
 The following tables show all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization and
 the CSR side-effects.
 
+.FIRQ Mapping
+[TIP]
+See section <<_neorv32_specific_fast_interrupt_requests>> for the mapping of the FIRQ channels to the corresponding hardware modules.
+
 **Table Annotations**
 
 The "Prio." column shows the priority of each trap with the highest priority being 1. The "RTE Trap ID" aliases are