From db6a4b77684e41bbfb2eb1224c413b32c2822ec7 Mon Sep 17 00:00:00 2001
From: stnolting <stnolting@gmail.com>
Date: Fri, 10 Jan 2025 12:03:30 +0100
Subject: [PATCH] [docs] smp updates and cleanups

---
 docs/datasheet/cpu_dual_core.adoc | 65 ++++++++++++++++++-------------
 docs/datasheet/software.adoc      |  3 +-
 2 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/docs/datasheet/cpu_dual_core.adoc b/docs/datasheet/cpu_dual_core.adoc
index a70e0ed0f..fec414fc5 100644
--- a/docs/datasheet/cpu_dual_core.adoc
+++ b/docs/datasheet/cpu_dual_core.adoc
@@ -1,9 +1,9 @@
 :sectnums:
 === Dual-Core Configuration
 
-.Dual-Core Example
+.Dual-Core Example Programs
 [TIP]
-A simple dual-core example program can be found in `sw/example/demo_dual_core`.
+A set of rather simple dual-core example programs can be found in `sw/example/demo_dual_core*`.
 
 Optionally, the CPU core can be implemented as **symmetric multiprocessing (SMP) dual-core** system.
 This dual-core configuration is enabled by the `DUAL_CORE_EN` <<_processor_top_entity_generics, top generic>>.
@@ -24,24 +24,30 @@ The following table summarizes the most important aspects when using the dual-co
 [cols="<2,<10"]
 [grid="rows"]
 |=======================
+| **CPU configuration** | Both cores use the same cache, CPU and ISA configuration provided by the according top generics.
 | **Debugging** | A special SMP openOCD script (`sw/openocd/openocd_neorv32.dual_core.cfg`) is required to
-debug both cores at one. SMP-debugging is fully supported by RISC-V gdb port.
+debug both cores at one. SMP-debugging is fully supported by the RISC-V gdb port.
 | **Clock and reset** | Both cores use the same global processor clock and reset. If <<_cpu_clock_gating>>
-is enabled the clock of each core can be individually halted by putting it into <<_sleep_mode>>.
-| **Address space** | Both cores have access to the same <<_address_space>>.
+is enabled, the clock of each core can be individually halted by putting the core into <<_sleep_mode>>.
+| **Address space** | Both cores have full access to the same physical <<_address_space>>.
 | **Interrupts** | All <<_processor_interrupts>> are routed to both cores. Hence, each core has access to
 all <<_neorv32_specific_fast_interrupt_requests>> (FIRQs). Additionally, the RISC-V machine-level _external
 interrupt_ (via the top `mext_irq_i` port) is also send to both cores. In contrast, the RISC-V machine level
-_software_ and _timer_ interrupts are exclusive for each core (provided by the <<_core_local_interruptor_clint>>).
-| **RTE** | The <<_neorv32_runtime_environment>> also supports the dual-core configuration. However, it needs
-to be explicitly initialized on each core individually. The RTE trap handling provides a individual handler
-tables for each core.
+_software_ and _timer_ interrupts are core-exclusive (provided by the <<_core_local_interruptor_clint>>).
+| **RTE** | The <<_neorv32_runtime_environment>> fully supports the dual-core configuration and provides
+core-individual trap handler tables. However, the RTE needs to be explicitly initialized on each core
+(executing `neorv32_rte_setup()`).
 | **Memory** | Each core has its own stack. The top of stack of core 0 is defined by the <<_linker_script>>
 while the top of stack of core 1 has to be explicitly defined by core 0 (see <<_dual_core_boot>>). Both
-cores share the same heap, `.data` and `.bss` sections.
-| **Constructors and destructors** | Constructors and destructors are executed on core 0 only.
-(see )
-| **Core communication** | See section <<_inter_core_communication_icc>>.
+cores share the same heap, `.data` and `.bss` sections. Hence, only core 0 setups the `.data` and `.bss`
+sections at boot-up.
+| **Constructors and destructors** | Constructors and destructors are executed by core 0 only
+(see section <<_c_standard_library>>).
+| **Cache coherency** | Be aware that there is no cache snooping available. If any level-1 cache is enabled
+(<<_processor_internal_instruction_cache_icache>> and/or <<_processor_internal_data_cache_dcache>>) care
+must be taken to prevent access to outdated data - either by using cache synchronization (`fence[.]`
+instructions) or by using <<_atomic_memory_access>>.
+| **Inter-core communication** | See section <<_inter_core_communication_icc>>.
 | **Bootloader** | Only core 0 will boot and execute the bootloader while core 1 is held in standby.
 | **Booting** | See section <<_dual_core_boot>>.
 |=======================
@@ -55,22 +61,18 @@ shared-memory communication. Additionally, communication using these links is gu
 
 The inter-core communication (ICC) module is implemented as dedicated hardware module within each CPU core
 (VHDL file `rtl/core/neorv32_cpu_icc.vhd`). This module is automatically included if the dual-core option
-is enabled. Each core provides a 32-bit wide and 4 entries deep FIFO for sending data to the other core.
+is enabled. Each core provides a **32-bit wide** and **4 entries deep** FIFO for sending data to the other core.
 Hence, there are two FIFOs: one for sending data from core 0 to core 1 and another one for sending data the
 opposite way.
 
-The ICC communication links are accessed via NEORV32-specific CSRs. Hence, those FIFOs are accessible only
-by the CPU core itself and cannot be accessed by the DMA or any other CPU core. In total, three CSRs are
-provided to handle communications:
+The ICC communication links are accessed via two NEORV32-specific CSRs. Hence, those FIFOs are accessible only
+by the CPU core itself and cannot be accessed by the DMA or any other CPU core.
 
-The <<_mxiccsr>> is used to select the core with which to communicate. In the dual-core configuration core 1
-can only select core 0 and vice versa. The core selection in this register allows access to the according
-message FIFOs via the two other CSRs. Additionally, the CSR provides status flags (TX FIFO data available;
-RX FIFO free space) related to the selected communication link.
-
-The <<_mxiccrxd>> and <<_mxicctxd>> CSRs are used for the actual data read and write operations. Writing data
-to <<<<_mxicctxd>>> will send to the message queue of the core selected by <<_mxiccsr>>. Conversely, reading
-data from <<_mxiccrxd>> will return data received from the core selected by <<_mxiccsr>>.
+The <<_mxiccsreg>> provides read-only status information about the core's ICC links: bit 0 becomes set if
+there is RX data available for _this_ core (send from the the other core). Bit 1 is set as long there is
+free space in _this_ core's TX data FIFO. The <<_mxiccdata>> CSR is used for actual data send/receive operations.
+Writing this register will put the according data word into the TX link FIFO of _this_ core. Reading this CSR
+will return a data word from the RX FIFO of _this_ core.
 
 The ICC FIFOs do not provide any interrupt capabilities. Software is expected to use the machine-software
 interrupt of the receiving core (provided by the <<_core_local_interruptor_clint>>) to inform it about
@@ -94,11 +96,20 @@ To boot-up core 1, the primary core has to use a special library function provid
 .CPU Core 1 launch function prototype (note that this function can only be executed on core 0)
 [source,c]
 ----
-int neorv32_smp_launch(int hart_id, void (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
+int neorv32_smp_launch(int (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
 ----
 
-When executed, core 0 use the <<_inter_core_communication_icc>> to send launch data that includes the entry point
+When executed, core 0 uses the <<_inter_core_communication_icc>> to send launch data that includes the entry point
 for core 1 (via `entry_point`) and the actual stack configuration (via `stack_memory` and `stack_size_bytes`).
+Note that the main function for core 1 has to use a specific type (return `int`, no arguments):
+
+.CPU Core 1 Main Function
+[source,c]
+----
+int core1_main(void) {
+  return 0; // return to crt0 and go to sleep mode
+}
+----
 
 .Core 1 Stack Memory
 [NOTE]
diff --git a/docs/datasheet/software.adoc b/docs/datasheet/software.adoc
index 944084796..61e643439 100644
--- a/docs/datasheet/software.adoc
+++ b/docs/datasheet/software.adoc
@@ -380,7 +380,8 @@ Note that `\n` (newline) is automatically converted to `\r\n` (carriage-return a
 .Constructors and Destructors
 [NOTE]
 Constructors and destructors for plain C code or for C++ applications are supported by the software framework.
-See `sw/example/hello_cpp` for a minimal example.
+See `sw/example/hello_cpp` for a minimal example. Note that constructor and destructors are only executed
+by core 0 (primary core) in the SMP <<_dual_core_configuration>>.
 
 .Newlib Test/Demo Program
 [TIP]