SpinalHDL · Dolu1990 · Sep 8, 2024 · Sep 8, 2024
diff --git a/source/VexiiRiscv/BranchPrediction/index.rst b/source/VexiiRiscv/BranchPrediction/index.rst
@@ -1,54 +1,54 @@
 Branch Prediction
-==================
+=================
 
-The branch prediction is implemented as follow : 
+The branch prediction is implemented as follows :
 
 - During fetch, a BTB, GShare, RAS memory is used to provide an early branch prediction (BtbPlugin / GSharePlugin)
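-- In Decode, the DecodePredictionPlugin will ensure that no "none jump/branch instruction"" predicted as a jump/branch continues down the pipeline.
+- In Decode, the DecodePredictionPlugin will ensure that no "non jump/branch instruction" predicted as a jump/branch continues down the pipeline.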
-- In Execute, the prediction made is checked and eventualy corrected. Also a stream of data is generated to feed the BTB / GShare memories with good data to learn.
+- In Execute, the prediction made is checked and corrected if needed. Also, a stream of data is generated to feed the BTB / GShare memories with good data to learn from.
 
-Here is a diagram of the whole architecture : 
+Here is a diagram of the whole architecture :
 
 .. image:: /asset/picture/branch_prediction.png
 
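-While it would have been possible in the decode stage to correct some miss prediction from the BTB / RAS, it isn't done to improve timings and reduce Area.
+While it would have been possible to correct some BTB / RAS mispredictions in the decode stage, this isn't done, in order to improve timings and reduce area.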
 
 BtbPlugin
--------------------------
+---------
 
 Will :
 
 - Implement a branch target buffer in the fetch pipeline
 - Implement a return address stack buffer
 - Predict which slices of the fetched word are the last slice of a branch/jump
-- Predict the branch/ĵump target
+- Predict the branch/jump target
 - Use the FetchConditionalPrediction plugin (GSharePlugin) to  know if branch should be taken
 - Apply the prediction (flush + pc update + history update)
-- Learn using the LearnPlugin interface. Only learn on missprediction. To avoid write to read hazard, the fetch stage is blocked when it learn.
-- Implement "ways" named chunks which are staticaly assigned to groups of word's slices, allowing to predict multiple branch/jump present in the same word
+- Learn using the LearnPlugin interface. Only learn on misprediction. To avoid write to read hazards, the fetch stage is blocked when it learns.
+- Implement "ways", named chunks, which are statically assigned to groups of the word's slices, allowing multiple branches/jumps present in the same word to be predicted
 
 GSharePlugin
--------------------------
+------------
 
-Will : 
+Will :
 
 - Implement a FetchConditionalPrediction (GShare flavor)
 - Learn using the LearnPlugin interface. Write to read hazard are handled via a bypass
 - Will not apply the prediction via flush / pc change, another plugin will do that
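+
+As a rough illustration of what a "GShare flavor" predictor does, here is a minimal software sketch in plain Scala (not SpinalHDL, and not the GSharePlugin implementation). It assumes the textbook gshare scheme : a table of 2 bits saturating counters indexed by the PC XORed with a global branch history. The names GShareModel, indexBits, predict and learn are hypothetical.
+
+.. code:: scala
+
+    // Textbook gshare software model (assumption : not the actual GSharePlugin hardware)
+    class GShareModel(indexBits: Int) {
+      private val mask = (1 << indexBits) - 1
+      // 2 bits saturating counters, all starting as "weakly not taken" (1)
+      private val counters = Array.fill(1 << indexBits)(1)
+      // Global history of the last branch outcomes, 1 bit per branch
+      private var history = 0
+
+      // gshare hashing : PC (without its always-zero bit 0) XOR global history
+      private def index(pc: Long): Int = ((pc >> 1).toInt ^ history) & mask
+
+      // Predict taken when the counter is in one of its two upper states
+      def predict(pc: Long): Boolean = counters(index(pc)) >= 2
+
+      // Update the counter with the real outcome, then shift it into the history
+      def learn(pc: Long, taken: Boolean): Unit = {
+        val i = index(pc)
+        counters(i) = if (taken) math.min(3, counters(i) + 1) else math.max(0, counters(i) - 1)
+        history = ((history << 1) | (if (taken) 1 else 0)) & mask
+      }
+    }
+
+In this model, a branch which is always taken ends up predicted as taken once the history has filled up with ones and the matching counter has been incremented.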
 
 DecodePredictionPlugin
--------------------------
+----------------------
 
 The purpose of this plugin is to ensure that no branch/jump prediction was made for non branch/jump instructions.
 In case this is detected, the plugin will just flush the pipeline and set the fetch PC to redo everything, but this time with a "first prediction skip"
 
 BranchPlugin
---------------
+------------
 
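-Placed in the execute pipeline, it will ensure that the branch prediction was correct, else it correct it. It also generate a learn interface.
+Placed in the execute pipeline, it will ensure that the branch prediction was correct, else it corrects it. It also generates a learn interface.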
 
 LearnPlugin
---------------
+-----------
 
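-This plugin will collect all the learn interface (generated by the BranchPlugin) and produce a single stream of learn interface for the BtbPlugin / GShare plugin to use.
+This plugin will collect all the learn interfaces (generated by the BranchPlugin) and produce a single learn stream for the BtbPlugin / GSharePlugin to use.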
 
diff --git a/source/VexiiRiscv/Debug/index.rst b/source/VexiiRiscv/Debug/index.rst
@@ -1,6 +1,5 @@
 Debug
-============
-
+=====
 
 .. toctree::
    :maxdepth: 2

diff --git a/source/VexiiRiscv/Debug/jtag.rst b/source/VexiiRiscv/Debug/jtag.rst
@@ -1,5 +1,5 @@
 JTAG
-==============================
+====
 
-VexiiRiscv support debugging by implementing the official RISC-V debug spec.
+VexiiRiscv supports debugging by implementing the official RISC-V debug spec.
 

diff --git a/source/VexiiRiscv/Decode/index.rst b/source/VexiiRiscv/Decode/index.rst
@@ -1,7 +1,7 @@
 Decode
-============
+======
 
-A few plugins operate in the fetch stage : 
+A few plugins operate in the decode stage :
 
 - DecodePipelinePlugin
 - AlignerPlugin
@@ -11,62 +11,62 @@ A few plugins operate in the fetch stage :
 
 
 DecodePipelinePlugin
--------------------------
+--------------------
 
 Provide the pipeline framework for all the decode related hardware.
-It use the spinal.lib.misc.pipeline API but implement multiple "lanes" in it.
+It uses the spinal.lib.misc.pipeline API but implements multiple "lanes" in it.
 
 
 AlignerPlugin
--------------------------
+-------------
 
-Decode the words froms the fetch pipeline into aligned instructions in the decode pipeline. Its complexity mostly come from the necessity to support having RVC [and BTB], mostly by adding additional cases to handle.
+Decode the words from the fetch pipeline into aligned instructions in the decode pipeline. Its complexity mostly comes from the need to support RVC [and BTB], which add additional cases to handle.
 
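-1) RVC allows 32 bits instruction to be unaligned, meaning they can cross between 2 fetched words, so it need to have some internal buffer / states to work.
+1) RVC allows 32 bits instructions to be unaligned, meaning they can cross between 2 fetched words, so the aligner needs to have some internal buffer / states to work.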
 
-2) The BTB may have predicted (falsly) a jump instruction where there is none, which may cut the fetch of an 32 bits instruction in the middle.
+2) The BTB may have predicted (falsely) a jump instruction where there is none, which may cut the fetch of a 32 bits instruction in the middle.
 
-The AlignerPlugin is designed as following : 
+The AlignerPlugin is designed as follows :
 
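-- Has a internal fetch word buffer in oder to support 32 bits instruction with RVC
-- First it scan at every possible instruction position, ex : RVC with 64 bits fetch words => 2x64/16 scanners. Extracting the instruction length, presence of all the instruction data (slices) and necessity to redo the fetch because of a bad BTB prediction.
-- Then it has one extractor per decoding lane. They will check the scanner for the firsts valid instructions.
+- Has an internal fetch word buffer in order to support 32 bits instructions with RVC
+- First it scans every possible instruction position, ex : RVC with 64 bits fetch words => 2x64/16 scanners. Each scanner extracts the instruction length, the presence of all the instruction data (slices) and the necessity to redo the fetch because of a bad BTB prediction.
+- Then it has one extractor per decoding lane. They will check the scanners for the first valid instructions.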
-- Then each extractor is feeded into the decoder pipeline.
+- Then each extractor is fed into the decoder pipeline.
 
 .. image:: /asset/picture/aligner.png
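+
+To give an idea of what finding instruction boundaries involves, here is a small software sketch in plain Scala. It is only a model under assumptions : it walks the 16 bits slices of a single fetch word sequentially, whereas the AlignerPlugin scans all positions in parallel and also handles the word buffer and BTB cases. The only rule it relies on is the RISC-V encoding one : an instruction whose two lowest bits are 11 is 32 bits long, otherwise it is a 16 bits RVC instruction. The names AlignerSketch, is32Bits, slices and scan are hypothetical.
+
+.. code:: scala
+
+    object AlignerSketch {
+      // RISC-V length rule : lowest two bits == 11 => 32 bits instruction, else 16 bits (RVC)
+      def is32Bits(slice: Int): Boolean = (slice & 0x3) == 0x3
+
+      // Split a fetch word into 16 bits slices, lowest address first
+      def slices(word: Long, wordBits: Int): Seq[Int] =
+        (0 until wordBits / 16).map(i => ((word >>> (i * 16)) & 0xFFFF).toInt)
+
+      // Returns (first slice index, length in slices) for every instruction starting in the word,
+      // assuming that slice 0 is the start of an instruction
+      def scan(word: Long, wordBits: Int): Seq[(Int, Int)] = {
+        val s = slices(word, wordBits)
+        val found = scala.collection.mutable.ArrayBuffer[(Int, Int)]()
+        var i = 0
+        while (i < s.size) {
+          val len = if (is32Bits(s(i))) 2 else 1
+          found += ((i, len))
+          i += len
+        }
+        found.toSeq
+      }
+    }
+
+For instance, scanning the 64 bits word 0x0001000000130001 (c.nop, nop, c.nop) gives instructions starting at slices 0, 1 and 3. When the last slice starts a 32 bits instruction, its reported length goes past the end of the word, which is the case where a buffer across fetch words is needed.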
 
 DecoderPlugin
--------------------------
+-------------
 
 Will :
 
 - Decode instruction
-- Generate ilegal instruction exception
+- Generate illegal instruction exception
 - Generate "interrupt" instruction
 
 DecodePredictionPlugin
--------------------------
+----------------------
 
 The purpose of this plugin is to ensure that no branch/jump prediction was made for non branch/jump instructions.
 In case this is detected, the plugin will just flush the pipeline and set the fetch PC to redo everything, but this time with a "first prediction skip"
 
 See more in the Branch prediction chapter
 
 DispatchPlugin
--------------------------
+--------------
 
-Will : 
+Will :
 
 - Collect instruction from the end of the decode pipeline
 - Try to dispatch them ASAP on the multiple "layers" available
 
-Here is a few explenation about execute lanes and layers : 
+Here are a few explanations about execute lanes and layers :
 
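-- A execute lane represent a path toward which an instruction can be executed.
-- A execute lane can have one or many layers, which can be used to implement things as early ALU / late ALU
-- Each layer will have static a scheduling priority
+- An execute lane represents a path through which an instruction can be executed.
+- An execute lane can have one or many layers, which can be used to implement things such as early ALU / late ALU
+- Each layer will have a static scheduling priority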
 
-The DispatchPlugin doesn't require lanes or layers to be symetric in any way.
+The DispatchPlugin doesn't require lanes or layers to be symmetric in any way.
 
 
 

diff --git a/source/VexiiRiscv/Execute/custom.rst b/source/VexiiRiscv/Execute/custom.rst
@@ -1,10 +1,10 @@
 Custom instruction
-==============================
+==================
 
 There are multiple ways you can add custom instructions into VexiiRiscv. The following chapter will provide some demo.
 
 SIMD add
------------
+--------
 
 Let's define a plugin which will implement a SIMD add (4x8bits adder), working on the integer register file.
 
@@ -22,7 +22,7 @@ For instance the Plugin configuration could be :
     plugins += new SimdAddPlugin(early0) // <- We will implement this plugin
 
 Plugin implementation
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
 
-Here is a example how this plugin could be implemented :
+Here is an example of how this plugin could be implemented :
 
@@ -40,69 +40,69 @@ Here is a example how this plugin could be implemented :
     import vexiiriscv.compat.MultiPortWritesSymplifier
     import vexiiriscv.riscv.{IntRegFile, RS1, RS2, Riscv}
 
-    //This plugin example will add a new instruction named SIMD_ADD which do the following :
+    // This plugin example will add a new instruction named SIMD_ADD which does the following :
     //
-    //RD : Regfile Destination, RS : Regfile Source
-    //RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
-    //RD(16 downto  8) = RS1(16 downto  8) + RS2(16 downto  8)
-    //RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
-    //RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
+    // RD : Regfile Destination, RS : Regfile Source
+    // RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
+    // RD(15 downto  8) = RS1(15 downto  8) + RS2(15 downto  8)
+    // RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
+    // RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
     //
-    //Instruction encoding :
-    //0000000----------000-----0001011   <- Custom0 func3=0 func7=0
-    //       |RS2||RS1|   |RD |
+    // Instruction encoding :
+    // 0000000----------000-----0001011   <- Custom0 func3=0 func7=0
+    //        |RS2||RS1|   |RD |
     //
-    //Note :  RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA
+    // Note :  RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA
 
 
     object SimdAddPlugin{
-      //Define the instruction type and encoding that we wll use
+      // Define the instruction type and encoding that we will use
       val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
     }
 
-    //ExecutionUnitElementSimple is a plugin base class which will integrate itself in a execute lane layer
-    //It provide quite a few utilities to ease the implementation of custom instruction.
-    //Here we will implement a plugin which provide SIMD add on the register file.
-    class SimdAddPlugin(val layer : LaneLayer) extends ExecutionUnitElementSimple(layer)  {
+    // ExecutionUnitElementSimple is a plugin base class which will integrate itself in an execute lane layer
+    // It provides quite a few utilities to ease the implementation of custom instructions.
+    // Here we will implement a plugin which provides SIMD add on the register file.
+    class SimdAddPlugin(val layer : LaneLayer) extends ExecutionUnitElementSimple(layer) {
 
-      //Here we create an elaboration thread. The Logic class is provided by ExecutionUnitElementSimple to provide functionalities
+      // Here we create an elaboration thread. The Logic class is provided by ExecutionUnitElementSimple to provide functionalities
       val logic = during setup new Logic {
-        //Here we could have lock the elaboration of some other plugins (ex CSR), but here we don't need any of that
-        //as all is already sorted out in the Logic base class.
-        //So we just wait for the build phase
+        // Here we could have locked the elaboration of some other plugins (ex CSR), but here we don't need any of that
+        // as all is already sorted out in the Logic base class.
+        // So we just wait for the build phase
         awaitBuild()
 
-        //Let's assume we only support RV32 for now
+        // Let's assume we only support RV32 for now
         assert(Riscv.XLEN.get == 32)
 
-        //Let's get the hardware interface that we will use to provide the result of our custom instruction
+        // Let's get the hardware interface that we will use to provide the result of our custom instruction
         val wb = newWriteback(ifp, 0)
-        
-        //Specify that the current plugin will implement the ADD4 instruction
+
+        // Specify that the current plugin will implement the ADD4 instruction
         val add4 = add(SimdAddPlugin.ADD4).spec
 
-        //We need to specify on which stage we start using the register file values
+        // We need to specify on which stage we start using the register file values
         add4.addRsSpec(RS1, executeAt = 0)
         add4.addRsSpec(RS2, executeAt = 0)
 
-        //Now that we are done specifying everything about the instructions, we can release the Logic.uopRetainer
-        //This will allow a few other plugins to continue their elaboration (ex : decoder, dispatcher, ...)
+        // Now that we are done specifying everything about the instructions, we can release the Logic.uopRetainer
+        // This will allow a few other plugins to continue their elaboration (ex : decoder, dispatcher, ...)
         uopRetainer.release()
 
-        //Let's define some logic in the execute lane [0]
+        // Let's define some logic in the execute lane [0]
         val process = new el.Execute(id = 0) {
-          //Get the RISC-V RS1/RS2 values from the register file
+          // Get the RISC-V RS1/RS2 values from the register file
           val rs1 = el(IntRegFile, RS1).asUInt
           val rs2 = el(IntRegFile, RS2).asUInt
 
-          //Do some computation
+          // Do some computation
           val rd = UInt(32 bits)
           rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
-          rd(16 downto  8) := rs1(16 downto  8) + rs2(16 downto  8)
+          rd(15 downto  8) := rs1(15 downto  8) + rs2(15 downto  8)
           rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
           rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)
 
-          //Provide the computation value for the writeback
+          // Provide the computation value for the writeback
           wb.valid := SEL
           wb.payload := rd.asBits
         }
@@ -111,7 +111,7 @@ Here is a example how this plugin could be implemented :
 
 
 VexiiRiscv generation
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
 
 Then, to generate a VexiiRiscv with this new plugin, we could run the following App :
 
@@ -144,7 +144,7 @@ To run this App, you can go to the NaxRiscv directory and run :
     sbt "runMain vexiiriscv.execute.VexiiSimdAddGen"
 
 Software test
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^
 
 Then let's write some assembly test code : (https://github.com/SpinalHDL/NaxSoftware/tree/849679c70b238ceee021bdfd18eb2e9809e7bdd0/baremetal/simdAdd)
 
@@ -157,16 +157,16 @@ Then let's write some assembly test code : (https://github.com/SpinalHDL/NaxSoft
     #include "../../driver/sim_asm.h"
     #include "../../driver/custom_asm.h"
 
-        //Test 1
+        // Test 1
         li x1, 0x01234567
         li x2, 0x01FF01FF
-        opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) //x3 = ADD4(x1, x2)
+        opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) // x3 = ADD4(x1, x2)
 
-        //Print result value
+        // Print result value
         li x4, PUT_HEX
         sw x3, 0(x4)
 
-        //Check result
+        // Check result
         li x5, 0x02224666
         bne x3, x5, fail
 
@@ -184,15 +184,15 @@ Compile it with
     make clean rv32im
 
 Simulation
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^
 
-You could run a simulation using this testbench : 
+You could run a simulation using this testbench :
 
 - Bottom of https://github.com/SpinalHDL/VexiiRiscv/blob/dev/src/main/scala/vexiiriscv/execute/SimdAddPlugin.scala
 
 .. code:: scala
 
-    object VexiiSimdAddSim extends App{
+    object VexiiSimdAddSim extends App {
       val param = new ParamSimple()
       val testOpt = new TestOptions()
 
@@ -231,7 +231,7 @@ Which will output the value 02224666 in the shell and show traces in simWorkspac
-Note that --no-rvls-check is required as spike do not implement that custom simdAdd.
+Note that --no-rvls-check is required as spike does not implement that custom simdAdd.
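+
+As a cross-check of the expected 02224666 value, here is a small software reference model of the instruction in plain Scala. It is only a sketch of the byte-wise add described in the plugin comments, not code from the VexiiRiscv repository; the names SimdAddModel and add4 are hypothetical.
+
+.. code:: scala
+
+    object SimdAddModel {
+      // Four independent 8 bits adds, the carry of each lane is dropped
+      def add4(rs1: Int, rs2: Int): Int =
+        (0 until 4).map { lane =>
+          val shift = lane * 8
+          val a = (rs1 >>> shift) & 0xFF
+          val b = (rs2 >>> shift) & 0xFF
+          ((a + b) & 0xFF) << shift
+        }.reduce(_ | _)
+    }
+
+With the operands of the assembly test, SimdAddModel.add4(0x01234567, 0x01FF01FF) gives 0x02224666 : the 0x67 + 0xFF and 0x23 + 0xFF lanes wrap around without carrying into their neighbours.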
 
 Conclusion
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^
 
 So overall this example didn't introduce how to specify some additional decoding, nor how to define multi-cycle ALU. (TODO).
 But you can take a look in the IntAluPlugin, ShiftPlugin, DivPlugin, MulPlugin and BranchPlugin which are doing those things using the same ExecutionUnitElementSimple base class.