⚠️✨ replace Zalrsc ISA extension by Zaamo ISA extension (#1141)
stnolting authored Jan 4, 2025
2 parents 4df1c72 + 7a54bb6 commit 651732d
Showing 35 changed files with 322 additions and 1,254 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 03.01.2025 | 1.10.8.7 | :warning: :sparkles: replace `Zalrsc` ISA extension (reservation-set operations) by `Zaamo` ISA extension (atomic read-modify-write operations) | [#1141](https://github.com/stnolting/neorv32/pull/1141) |
| 01.01.2025 | 1.10.8.6 | :sparkles: :test_tube: add smp dual-core option | [#1135](https://github.com/stnolting/neorv32/pull/1135) |
| 29.12.2024 | 1.10.8.5 | :test_tube: add multi-hart support to debug module | [#1132](https://github.com/stnolting/neorv32/pull/1132) |
| 29.12.2024 | 1.10.8.4 | :warning: rename `SYSINFO.MEM -> SYSINFO.MISC`; add new `SYSINFO.MISC` entry for number of CPU cores (hardwired to one) | [#1134](https://github.com/stnolting/neorv32/pull/1134) |
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -106,7 +106,7 @@ setup according to your needs. Note that all of the following SoC modules are en
[[`B`](https://stnolting.github.io/neorv32/#_b_isa_extension)]
[[`U`](https://stnolting.github.io/neorv32/#_u_isa_extension)]
[[`X`](https://stnolting.github.io/neorv32/#_x_isa_extension)]
[[`Zalrsc`](https://stnolting.github.io/neorv32/#_zalrsc_isa_extension)]
[[`Zaamo`](https://stnolting.github.io/neorv32/#_zaamo_isa_extension)]
[[`Zba`](https://stnolting.github.io/neorv32/#_zba_isa_extension)]
[[`Zbb`](https://stnolting.github.io/neorv32/#_zbb_isa_extension)]
[[`Zbkb`](https://stnolting.github.io/neorv32/#_zbkb_isa_extension)]
78 changes: 30 additions & 48 deletions docs/datasheet/cpu.adoc
Expand Up @@ -415,7 +415,8 @@ always valid when set.
| `rw` | 1 | Access direction (`0` = read, `1` = write)
| `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store)
| `priv` | 1 | Set if privileged (M-mode) access
| `rvso` | 1 | Set if current access is a reservation-set operation (`lr` or `sc` instruction, <<_zalrsc_isa_extension>>)
| `amo` | 1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>)
| `amoop` | 4 | Type of atomic memory operation (<<_atomic_memory_access>>)
3+^| **Out-Of-Band Signals**
| `fence` | 1 | Data/instruction fence request; single-shot
| `sleep` | 1 | Set if ALL upstream devices are in <<_sleep_mode>>
Expand Down Expand Up @@ -463,36 +464,31 @@ additional latency). However, _all_ bus signals (request and response) need to b


:sectnums:
==== Atomic Accesses
==== Atomic Memory Access

The load-reservate (`lr.w`) and store-conditional (`sc.w`) instructions from the <<_zalrsc_isa_extension>> execute as standard
load/store bus transactions but with the `rvso` ("reservation set operation") signal being set. It is the task of the
<<_reservation_set_controller>> to handle these LR/SC bus transactions accordingly. Note that these reservation set operations
are intended for processor-internal usage only (i.e. the reservation state is not available for processor-external modules yet).
The <<_zaamo_isa_extension>> adds atomic read-modify-write memory operations. Since the <<_bus_interface_protocol>>
only supports read-or-write operations, the atomic memory requests are handled by a dedicated module of the bus
infrastructure - the <<_atomic_memory_operations_controller>>.

.Reservation Set Controller
[NOTE]
See section <<_address_space>> / <<_reservation_set_controller>> for more information.

The figure below shows three exemplary bus accesses (1 to 3 from left to right). The `req` signal record represents
the CPU-side of the bus interface. For easier understanding the current state of the reservation set is added as `rvs_valid` signal.
From the CPU's point of view, an atomic memory access is issued as a plain "load" operation, but with the `amo`
signal set and with write data provided as well (see <<_bus_interface>>). The `amoop` signal defines the actual
atomic processing operation:

[start=1]
. A load-reservate (LR) instruction using `addr` as address. This instruction returns the loaded data `rdata` via `rsp.data`
and also registers a reservation for the address `addr` (`rvs_valid` becomes set).
. A store-conditional (SC) instruction attempts to write `wdata1` to address `addr`. This SC operation **succeeds**, so
`wdata1` is actually written to address `addr`. The successful operation is indicated by a **0** being returned via
`rsp.data` together with `ack`. As the LR/SC is completed the registered reservation is invalidated (`rvs_valid` becomes cleared).
. Another store-conditional (SC) instruction attempts to write `wdata2` to address `addr`. As the reservation set is already
invalidated (`rvs_valid` is `0`) the store access fails, so `wdata2` is **not** written to address `addr` at all. The failed
operation is indicated by a **1** being returned via `rsp.data` together with `ack`.

.Three Exemplary LR/SC Bus Transactions (showing only in-band signals)
image::bus_interface_atomic.png[700]

.Store-Conditional Status
[NOTE]
The "normal" load data mechanism is used to return success/failure of the `sc.w` instruction to the CPU (via the LSB of `rsp.data`).
.AMO Operation Type Encoding
[cols="<1,<4"]
[options="header",grid="rows"]
|=======================
| `bus_req_t.amoop` | Description
| `-000` | swap
| `-001` | unsigned add
| `-010` | logical xor
| `-011` | logical and
| `-100` | logical or
| `0110` | unsigned minimum
| `0111` | unsigned maximum
| `1110` | signed minimum
| `1111` | signed maximum
|=======================
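
The encoding table above can be read as a small decode-and-compute function. The following C sketch is only a software model for illustration, not the hardware implementation: the function name `amo_alu` is hypothetical, and returning the unmodified memory value for encodings that the table does not list is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the AMO processing step.
 * amoop : 4-bit operation type as listed in the encoding table
 * mem   : data word loaded from the addressed memory cell
 * wdata : write data provided by the CPU
 * Returns the word that would be written back to memory. */
static uint32_t amo_alu(uint8_t amoop, uint32_t mem, uint32_t wdata) {
    assert(amoop <= 0xF); /* amoop is a 4-bit signal */

    /* Rows with a leading '-' ignore bit 3 of the operation type. */
    switch (amoop & 0x7) {
        case 0x0: return wdata;       /* -000: swap        */
        case 0x1: return mem + wdata; /* -001: add         */
        case 0x2: return mem ^ wdata; /* -010: logical xor */
        case 0x3: return mem & wdata; /* -011: logical and */
        case 0x4: return mem | wdata; /* -100: logical or  */
        default:  break;              /* min/max: all four bits matter */
    }

    switch (amoop) {
        case 0x6: return (mem < wdata) ? mem : wdata;                   /* 0110: unsigned minimum */
        case 0x7: return (mem > wdata) ? mem : wdata;                   /* 0111: unsigned maximum */
        case 0xE: return ((int32_t)mem < (int32_t)wdata) ? mem : wdata; /* 1110: signed minimum   */
        case 0xF: return ((int32_t)mem > (int32_t)wdata) ? mem : wdata; /* 1111: signed maximum   */
        default:  return mem; /* encoding not listed in the table (assumption: no change) */
    }
}
```

For example, `amo_alu(0xE, 0xFFFFFFFF, 1)` treats the memory word as -1 and therefore keeps it as the signed minimum, whereas `amo_alu(0x6, 0xFFFFFFFF, 1)` selects 1 as the unsigned minimum.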

.Cache Coherency
[IMPORTANT]
Expand Down Expand Up @@ -521,7 +517,7 @@ This chapter gives a brief overview of all available ISA extensions.
| <<_m_isa_extension,`M`>> | Integer multiplication and division instructions | <<_processor_top_entity_generics, `RISCV_ISA_M`>>
| <<_u_isa_extension,`U`>> | Less-privileged _user_ mode extension | <<_processor_top_entity_generics, `RISCV_ISA_U`>>
| <<_x_isa_extension,`X`>> | Platform-specific / NEORV32-specific extension | Always enabled
| <<_zalrsc_isa_extension,`Zalrsc`>> | Atomic reservation-set instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zalrsc`>>
| <<_zaamo_isa_extension,`Zaamo`>> | Atomic memory operations | <<_processor_top_entity_generics, `RISCV_ISA_Zaamo`>>
| <<_zba_isa_extension,`Zba`>> | Shifted-add bit manipulation instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zba`>>
| <<_zbb_isa_extension,`Zbb`>> | Basic bit manipulation instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zbb`>>
| <<_zbkb_isa_extension,`Zbkb`>> | Scalar cryptographic bit manipulation instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zbkb`>>
Expand Down Expand Up @@ -689,37 +685,23 @@ RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* There are <<_neorv32_specific_csrs>>.


==== `Zalrsc` ISA Extension

The `Zalrsc` ISA extension is a sub-extension of the RISC-V _atomic memory access_ (`A`) ISA extension and includes
instructions for reservation-set operations (load-reservate `lr` and store-conditional `sc`) only.
It is enabled by the top's <<_processor_top_entity_generics, `RISCV_ISA_Zalrsc`>> generic.
==== `Zaamo` ISA Extension

.AMO / `A` Emulation
[NOTE]
The atomic memory access / read-modify-write operations of the `A` ISA extension can be emulated using the
LR and SC operations (quote from the RISC-V spec.: "_Any AMO can be emulated by an LR/SC pair._").
The NEORV32 <<_core_libraries>> provide an emulation wrapper for emulating AMO/read-modify-write instructions that is
based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.
The `Zaamo` ISA extension is a sub-extension of the RISC-V `A` ISA extension and comprises instructions for read-modify-write
<<_atomic_memory_access>> operations. It is enabled by the top's <<_processor_top_entity_generics, `RISCV_ISA_Zaamo`>> generic.

.Instructions and Timing
[cols="<2,<4,<3"]
[cols="<2,<4,<1"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Load-reservate word | `lr.w` | 5
| Store-conditional word | `sc.w` | 5
| Atomic memory operations | `amoswap.w` `amoadd.w` `amoand.w` `amoor.w` `amoxor.w` `amomax[u].w` `amomin[u].w` | 5 + 2 * _memory_latency_
|=======================
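
From software, these instructions are usually reached through the C11 `<stdatomic.h>` interface rather than written by hand. The sketch below is a minimal illustration under assumptions: the names `event_count`, `count_events`, `lock_word`, `try_lock` and `unlock` are hypothetical, and whether a given call is lowered to a single `amoadd.w` or `amoswap.w` depends on the toolchain and on the ISA string including `Zaamo` (or `A`). On other targets the same code still runs, just with different instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared 32-bit event counter (illustrative name). */
static _Atomic uint32_t event_count = 0;

/* Atomically add n to the counter and return the previous value.
 * A RISC-V compiler targeting Zaamo would typically emit amoadd.w here. */
static uint32_t count_events(uint32_t n) {
    return atomic_fetch_add_explicit(&event_count, n, memory_order_relaxed);
}

/* Simple test-and-set lock word (illustrative name): 0 = free, 1 = taken. */
static _Atomic uint32_t lock_word = 0;

/* Try to take the lock with an atomic swap (typically amoswap.w).
 * Returns 1 if the lock was free and is now owned by the caller, 0 otherwise. */
static int try_lock(void) {
    return atomic_exchange_explicit(&lock_word, 1, memory_order_acquire) == 0;
}

/* Release the lock; the previous value must have been 1 (lock was held). */
static void unlock(void) {
    uint32_t prev = atomic_exchange_explicit(&lock_word, 0, memory_order_release);
    assert(prev == 1);
}
```

The acquire/release orderings requested here are a property of the C memory model. How they map onto the instruction word's ordering bits is up to the compiler.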

.`aq` and `rl` Bits
[NOTE]
The instruction word's `aq` and `rl` memory ordering bits are not evaluated by the hardware at all.

.Atomic Memory Access on Hardware Level
[NOTE]
More information regarding the atomic memory accesses and the according reservation
sets can be found in section <<_reservation_set_controller>>.


==== `Zifencei` ISA Extension

14 changes: 7 additions & 7 deletions docs/datasheet/cpu_csr.adoc
Expand Up @@ -435,10 +435,10 @@ However, any write-access will be ignored and will not cause an exception to mai
[options="header",grid="rows"]
|=======================
| Bit | Name [C] | R/W | Function
| 3 | `CSR_MIP_MSIP` | r/- | **MSIP**: Machine _software_ interrupt pending, triggered by `msi_i` top port (see <<_cpu_top_entity_signals>>); _cleared by source-specific mechanism_
| 7 | `CSR_MIP_MTIP` | r/- | **MTIP**: Machine _timer_ interrupt pending, triggered by `mei_i` top port (see <<_cpu_top_entity_signals>>)or by the processor-internal <<(from <<_core_local_interruptor_clint>>)>>; _cleared by source-specific mechanism_
| 11 | `CSR_MIP_MEIP` | r/- | **MEIP**: Machine _external_ interrupt pending, triggered by `mti_i` top port (see <<_cpu_top_entity_signals>>) or by the processor-internal <<(from <<_core_local_interruptor_clint>>)>>; _cleared by source-specific mechanism_
| 31:16 | `CSR_MIP_FIRQ15P` : `CSR_MIP_FIRQ0P` | r/- | **FIRQxP**: Fast interrupt channel 15..0 pending, see <<_neorv32_specific_fast_interrupt_requests>>; _cleared by source-specific mechanism_
| 3 | `CSR_MIP_MSIP` | r/- | **MSIP**: Machine _software_ interrupt pending, triggered by `msi_i` top port (see <<_cpu_top_entity_signals>>); cleared by source-specific mechanism
| 7 | `CSR_MIP_MTIP` | r/- | **MTIP**: Machine _timer_ interrupt pending, triggered by `mti_i` top port (see <<_cpu_top_entity_signals>>) or by the processor-internal <<_core_local_interruptor_clint>>; cleared by source-specific mechanism
| 11 | `CSR_MIP_MEIP` | r/- | **MEIP**: Machine _external_ interrupt pending, triggered by `mei_i` top port (see <<_cpu_top_entity_signals>>) or by the processor-internal <<_core_local_interruptor_clint>>; cleared by source-specific mechanism
| 31:16 | `CSR_MIP_FIRQ15P` : `CSR_MIP_FIRQ0P` | r/- | **FIRQxP**: Fast interrupt channel 15..0 pending, see <<_neorv32_specific_fast_interrupt_requests>>; cleared by source-specific mechanism
|=======================

.FIRQ Channel Mapping
Expand Down Expand Up @@ -770,8 +770,8 @@ caused by a fence instruction, a control flow transfer or a instruction fetch bu
| 5 | `HPMCNT_EVENT_WAIT_ALU` | r/w | any delay/wait cycle caused by a _multi-cycle_ <<_cpu_arithmetic_logic_unit>> operation
| 6 | `HPMCNT_EVENT_BRANCH` | r/w | any executed branch instruction (unconditional, conditional-taken or conditional-not-taken)
| 7 | `HPMCNT_EVENT_BRANCHED` | r/w | any control transfer operation (unconditional jump, taken conditional branch or trap entry/exit)
| 8 | `HPMCNT_EVENT_LOAD` | r/w | any executed load operation (including atomic memory operations, <<_zalrsc_isa_extension>>)
| 9 | `HPMCNT_EVENT_STORE` | r/w | any executed store operation (including atomic memory operations, <<_zalrsc_isa_extension>>)
| 8 | `HPMCNT_EVENT_LOAD` | r/w | any executed load operation (including any atomic memory operations)
| 9 | `HPMCNT_EVENT_STORE` | r/w | any executed store operation (including any atomic memory operations)
| 10 | `HPMCNT_EVENT_WAIT_LSU` | r/w | any memory/bus/cache/etc. delay/wait cycle while executing any load or store operation (caused by a data bus wait cycle)
| 11 | `HPMCNT_EVENT_TRAP` | r/w | starting processing of any trap (<<_traps_exceptions_and_interrupts>>)
|=======================
Expand Down Expand Up @@ -979,7 +979,7 @@ discover ISA sub-extensions and CPU configuration options
| 22 | `CSR_MXISA_ZBA` | r/- | <<_zba_isa_extension>> available
| 23 | `CSR_MXISA_ZBB` | r/- | <<_zbb_isa_extension>> available
| 24 | `CSR_MXISA_ZBS` | r/- | <<_zbs_isa_extension>> available
| 25 | `CSR_MXISA_ZALRSC` | r/- | <<_zalrsc_isa_extension>> available
| 25 | `CSR_MXISA_ZAAMO` | r/- | <<_zaamo_isa_extension>> available
| 28:26 | - | r/- | _reserved_, hardwired to zero
| 27 | `CSR_MXISA_CLKGATE` | r/- | sleep-mode clock gating implemented when set (`CPU_CLOCK_GATING_EN`), see <<_cpu_tuning_options>>
| 28 | `CSR_MXISA_RFHWRST` | r/- | full hardware reset of register file available when set (`CPU_RF_HW_RST_EN`), see <<_cpu_tuning_options>>
78 changes: 26 additions & 52 deletions docs/datasheet/soc.adoc
Expand Up @@ -226,7 +226,7 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `RISCV_ISA_E` | boolean | false | Enable <<_e_isa_extension>> (reduced register file size).
| `RISCV_ISA_M` | boolean | false | Enable <<_m_isa_extension>> (hardware-based integer multiplication and division).
| `RISCV_ISA_U` | boolean | false | Enable <<_u_isa_extension>> (less-privileged user mode).
| `RISCV_ISA_Zalrsc` | boolean | false | Enable <<_zalrsc_isa_extension>> (atomic reservation-set operations).
| `RISCV_ISA_Zaamo` | boolean | false | Enable <<_zaamo_isa_extension>> (atomic memory operations).
| `RISCV_ISA_Zba` | boolean | false | Enable <<_zba_isa_extension>> (shifted-add bit-manipulation instructions).
| `RISCV_ISA_Zbb` | boolean | false | Enable <<_zbb_isa_extension>> (basic bit-manipulation instructions).
| `RISCV_ISA_Zbkb` | boolean | false | Enable <<_zbkb_isa_extension>> (scalar cryptography bit manipulation instructions).
Expand Down Expand Up @@ -576,67 +576,41 @@ explicit specific processor generic. See section <<_processor_external_bus_inter


:sectnums:
==== Reservation Set Controller
==== Atomic Memory Operations Controller

The reservation set controller is responsible for handling the load-reservate and store-conditional bus transactions that
are triggered by the `lr.w` (LR) and `sc.w` (SC) instructions from the CPU's <<_zalrsc_isa_extension>>.
The atomic memory operations (AMO) controller is responsible for handling the read-modify-write operations issued by the
CPU's <<_zaamo_isa_extension>>. For each AMO request, the controller executes an atomic set of three operations:

A "reservation" defines an address or address range that provides a guarding mechanism to support atomic accesses. A new
reservation is registered by the LR instruction. The address provided by this instruction defines the memory location
that is now monitored for atomic accesses. The according SC instruction evaluates the state of this reservation. If
the reservation is still valid the write access triggered by the SC instruction is finally executed and the instruction
returns a "success" state (`rd` = 0). If the reservation has been invalidated the SC instruction will not write to memory
and will return a "failed" state (`rd` = 1).

.Reservation Set(s) and Granule
[NOTE]
The reservation set controller supports only **a single** global reservation set with a **word-aligned 4-byte granule**.

The reservation is invalidated if...

* an SC instruction is executed that accesses an address **outside** of the reservation set of the previous LR instruction.
This SC instruction will **fail** (not writing to memory).
* an SC instruction is executed that accesses an address **inside** of the reservation set of the previous LR instruction.
This SC instruction will **succeed** (finally writing to memory).
* a normal store operation accesses an address **inside** of the current reservation set (by the CPU or by the DMA).
* a hardware reset is triggered.

.Consecutive LR Instructions
[NOTE]
If an LR instruction is followed by another LR instruction the reservation set of the former one is overridden
by the reservation set of the latter one.
.Simplified AMO Controller Operation
[cols="^1,<3,<6"]
[options="header",grid="rows"]
|=======================
| Step | Pseudo Code | Description
| 1 | `tmp1 <= MEM[address];` | Perform a read operation accessing the addressed memory
cell and store the loaded data into an internal buffer (`tmp1`).
| 2 | `tmp2 <= tmp1 OP cpu_wdata;` | The buffered data from the first step is processed
using the write data provided by the CPU. The result is stored to another internal buffer (`tmp2`).
| 3 | `MEM[address] <= tmp2;` `cpu_rdata <= tmp1;` | The data from the second buffer (`tmp2`) is
written to the addressed memory cell. In parallel, the data from the first buffer (`tmp1` = original
content of the addressed memory cell) is sent back to the requesting CPU.
|=======================

.Bus Access Errors
[IMPORTANT]
If the LR operation causes a bus access error (raising a load access exception) the reservation **is registered anyway**.
If the SC operation causes a bus access error (raising a store access exception) an already registered reservation set
**is invalidated anyway**.
The controller performs two bus transactions: a read operation and a write operation. Only the acknowledge/error
handshake of the last transaction is sent back to the CPU.

.Strong Semantic
[IMPORTANT]
The LR/SC mechanism follows the _strong semantic_ approach: the LR/SC instruction pair fails only if there is a write
access to the referenced memory location between the LR and SC instructions (by the CPU itself or by the DMA).
Context changes, interrupts, traps, etc. do not affect or invalidate the reservation state at all.
As the AMO controller is the memory-nearest instance (see <<_bus_system>>) the previously described set of operations
cannot be interrupted. Hence, they execute in an atomic way.
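
The three-step sequence of the AMO controller can be sketched as a small behavioral model in C. This is not the controller's VHDL and makes several assumptions for illustration: the type `amo_model_t`, the function `amo_execute`, the callback type `amo_op_fn` and the two example operations are hypothetical names, and memory is modeled as a plain word array with counters standing in for the two bus transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback computing "tmp1 OP cpu_wdata". */
typedef uint32_t (*amo_op_fn)(uint32_t mem, uint32_t wdata);

/* Hypothetical behavioral model: word-indexed memory plus transaction counters. */
typedef struct {
    uint32_t *mem;       /* backing memory, one entry per 32-bit word */
    size_t    words;     /* number of words in mem                    */
    unsigned  bus_reads; /* read transactions issued so far           */
    unsigned  bus_writes;/* write transactions issued so far          */
} amo_model_t;

/* Execute one AMO request; returns the data sent back to the CPU (cpu_rdata). */
static uint32_t amo_execute(amo_model_t *m, size_t index, amo_op_fn op, uint32_t cpu_wdata) {
    assert(index < m->words);

    uint32_t tmp1 = m->mem[index];       /* step 1: read the addressed memory cell */
    m->bus_reads++;

    uint32_t tmp2 = op(tmp1, cpu_wdata); /* step 2: process with the CPU write data */

    m->mem[index] = tmp2;                /* step 3: write the result back ...       */
    m->bus_writes++;
    return tmp1;                         /* ... and return the original content     */
}

/* Two example operations (swap and add). */
static uint32_t op_swap(uint32_t mem, uint32_t wdata) { (void)mem; return wdata; }
static uint32_t op_add(uint32_t mem, uint32_t wdata)  { return mem + wdata; }
```

Calling `amo_execute(&m, 0, op_add, 5)` on a cell holding 10 returns 10 to the caller and leaves 15 in memory, with one read and one write counted. In the model nothing can run between the two accesses because it is a single function call; in hardware the same property follows from the controller's position next to the memory, as described above.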

.Physical Memory Attributes
[NOTE]
The reservation set can be set for _any_ address (only constrained by the configured granularity). This also
includes cached memory, memory-mapped IO devices and processor-external address spaces.

Bus transactions triggered by the LR instruction register a new reservation set and are delegated to the addressed
memory/device. Bus transactions triggered by the SC instruction remove a reservation set and are forwarded to the addressed
memory/device only if the SC operation succeeds. Otherwise, the access request is not forwarded and a local ACK is
generated to terminate the bus transaction.

.LR/SC Bus Protocol
[NOTE]
More information regarding the LR/SC bus transactions and the according protocol can be found in section
<<_bus_interface>> / <<_atomic_accesses>>.
Atomic memory operations can be executed for _any_ address. This also includes
cached memory, memory-mapped IO devices and processor-external address spaces.

.Cache Coherency
[IMPORTANT]
Atomic operations **always bypass** the cache using direct/uncached accesses. Care must be taken
to maintain data cache coherency (e.g. by using the `fence` instruction).
Atomic operations **always bypass** the CPU's <<_processor_internal_data_cache_dcache, data cache>>
using direct/uncached accesses. Care must be taken to maintain data cache coherency when accessing
cached memory (e.g. by using the `fence` instruction).


:sectnums: