Fix spelling, typos, space before {, // #3

Merged
merged 1 commit into from
Sep 8, 2024
26 changes: 13 additions & 13 deletions source/VexiiRiscv/BranchPrediction/index.rst
Original file line number Diff line number Diff line change
@@ -1,54 +1,54 @@
Branch Prediction
==================
=================

The branch prediction is implemented as follow :
The branch prediction is implemented as follow :

- During fetch, BTB, GShare and RAS memories are used to provide an early branch prediction (BtbPlugin / GSharePlugin)
- In Decode, the DecodePredictionPlugin ensures that no non-jump/branch instruction predicted as a jump/branch continues down the pipeline.
- In Execute, the prediction made is checked and eventualy corrected. Also a stream of data is generated to feed the BTB / GShare memories with good data to learn.
- In Execute, the prediction is checked and, if wrong, corrected. A stream of data is also generated to feed the BTB / GShare memories with good data to learn from.

Here is a diagram of the whole architecture :
Here is a diagram of the whole architecture :

.. image:: /asset/picture/branch_prediction.png

While it would have been possible to correct some BTB / RAS mispredictions in the decode stage, this is deliberately not done, in order to improve timings and reduce area.

BtbPlugin
-------------------------
---------

Will :

- Implement a branch target buffer in the fetch pipeline
- Implement a return address stack buffer
- Predict which slices of the fetched word are the last slice of a branch/jump
- Predict the branch/ĵump target
- Predict the branch/jump target
- Use the FetchConditionalPrediction plugin (GSharePlugin) to know if branch should be taken
- Apply the prediction (flush + pc update + history update)
- Learn using the LearnPlugin interface. Only learn on missprediction. To avoid write to read hazard, the fetch stage is blocked when it learn.
- Implement "ways" named chunks which are staticaly assigned to groups of word's slices, allowing to predict multiple branch/jump present in the same word
- Learn using the LearnPlugin interface. Only learn on misprediction. To avoid write-to-read hazards, the fetch stage is blocked while it learns.
- Implement "ways", named chunks, which are statically assigned to groups of word slices, allowing multiple branches/jumps present in the same word to be predicted

GSharePlugin
-------------------------
------------

Will :
Will :

- Implement a FetchConditionalPrediction (GShare flavor)
- Learn using the LearnPlugin interface. Write-to-read hazards are handled via a bypass
- Will not apply the prediction via flush / pc change, another plugin will do that
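
To make the GShare idea concrete, here is a minimal software-only sketch of a gshare predictor in plain Scala. It is not the GSharePlugin implementation: the class name ``GShareModel``, the ``historyBits`` / ``tableBits`` parameters, the 2-bit saturating counters and the way the PC is hashed are all assumptions of this sketch. Like the plugin, it only answers "taken or not" and leaves applying the prediction to someone else.

.. code:: scala

    // Software-only model of a gshare predictor (illustrative, not the GSharePlugin code).
    // historyBits, tableBits and the 2-bit counters are assumptions of this sketch.
    final class GShareModel(historyBits: Int = 8, tableBits: Int = 10) {
      // 2-bit saturating counters, initialised to "weakly not taken"
      private val counters = Array.fill(1 << tableBits)(1)
      // Global history of the most recent branch outcomes
      private var history = 0

      // Table index = PC bits XOR global history, truncated to the table size
      private def index(pc: Long): Int =
        (((pc >>> 1) ^ history.toLong) & ((1L << tableBits) - 1)).toInt

      // Predict taken when the counter is in one of its two upper states
      def predictTaken(pc: Long): Boolean = counters(index(pc)) >= 2

      // Learn from a resolved branch: update the counter, then shift the history.
      // For simplicity the current history is used, not the one seen at prediction time.
      def learn(pc: Long, taken: Boolean): Unit = {
        val i = index(pc)
        counters(i) = if (taken) math.min(3, counters(i) + 1) else math.max(0, counters(i) - 1)
        history = ((history << 1) | (if (taken) 1 else 0)) & ((1 << historyBits) - 1)
      }
    }

With this model, a branch that is reported as taken many times in a row through ``learn`` ends up predicted as taken by ``predictTaken`` once the history has saturated and the matching counter has been incremented.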

DecodePredictionPlugin
-------------------------
----------------------

The purpose of this plugin is to ensure that no branch/jump prediction was made for non branch/jump instructions.
In case this is detected, the plugin will just flush the pipeline and set the fetch PC to redo everything, but this time with a "first prediction skip"

BranchPlugin
--------------
------------

Placed in the execute pipeline, it ensures that the branch prediction was correct and corrects it otherwise. It also generates a learn interface.

LearnPlugin
--------------
-----------

This plugin collects all the learn interfaces (generated by the BranchPlugin) and produces a single stream of learn interface for the BtbPlugin / GSharePlugin to use.

3 changes: 1 addition & 2 deletions source/VexiiRiscv/Debug/index.rst
@@ -1,6 +1,5 @@
Debug
============

=====

.. toctree::
:maxdepth: 2
2 changes: 1 addition & 1 deletion source/VexiiRiscv/Debug/jtag.rst
@@ -1,5 +1,5 @@
JTAG
==============================
====

VexiiRiscv supports debugging by implementing the official RISC-V debug spec.

30 changes: 15 additions & 15 deletions source/VexiiRiscv/Decode/index.rst
@@ -1,7 +1,7 @@
Decode
============
======

A few plugins operate in the fetch stage :
A few plugins operate in the fetch stage :

- DecodePipelinePlugin
- AlignerPlugin
@@ -11,62 +11,62 @@ A few plugins operate in the fetch stage :


DecodePipelinePlugin
-------------------------
--------------------

Provides the pipeline framework for all the decode related hardware.
It uses the spinal.lib.misc.pipeline API but implements multiple "lanes" in it.


AlignerPlugin
-------------------------
-------------

Decode the words froms the fetch pipeline into aligned instructions in the decode pipeline. Its complexity mostly come from the necessity to support having RVC [and BTB], mostly by adding additional cases to handle.
Decode the words from the fetch pipeline into aligned instructions in the decode pipeline. Its complexity mostly comes from the need to support RVC [and BTB], mostly by adding additional cases to handle.

1) RVC allows 32-bit instructions to be unaligned, meaning they can cross between 2 fetched words, so the plugin needs some internal buffer / state to work.

2) The BTB may have predicted (falsly) a jump instruction where there is none, which may cut the fetch of an 32 bits instruction in the middle.
2) The BTB may have (falsely) predicted a jump instruction where there is none, which may cut the fetch of a 32-bit instruction in the middle.

The AlignerPlugin is designed as following :
The AlignerPlugin is designed as follows :

- Has an internal fetch word buffer in order to support 32-bit instructions with RVC
- First it scans every possible instruction position, ex : RVC with 64 bits fetch words => 2x64/16 scanners. Each scanner extracts the instruction length, the presence of all the instruction data (slices) and the necessity to redo the fetch because of a bad BTB prediction.
- Then it has one extractor per decoding lane. They check the scanners for the first valid instructions.
- Then each extractor is feeded into the decoder pipeline.
- Then each extractor is fed into the decoder pipeline.
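
The scan step can be illustrated with a small software model in plain Scala. This is only a sketch and not the AlignerPlugin code: the object ``AlignerScanModel`` and its function names are hypothetical, the walk is sequential whereas the hardware scans every position in parallel, and reading "2x64/16" as two fetch words' worth of 16-bit slices is an assumption. The length rule itself comes from the RISC-V encoding: a 16-bit parcel whose two low bits are ``11`` starts a 32-bit instruction.

.. code:: scala

    // Illustrative software model of instruction-boundary scanning (not the AlignerPlugin code).
    object AlignerScanModel {
      // RISC-V encoding rule: low bits "11" mean a 32-bit instruction, anything else is 16-bit RVC.
      // Encodings longer than 32 bits are ignored in this sketch.
      def is32Bit(parcel: Int): Boolean = (parcel & 0x3) == 0x3

      // Assumed reading of "2x64/16": two fetch words, one scanner per 16-bit slice
      def scannerCount(fetchWordBits: Int): Int = 2 * fetchWordBits / 16

      // Returns the slice indices at which complete instructions start.
      // A trailing 32-bit instruction whose second slice is missing is not reported,
      // which mirrors the need to wait for (or buffer) the next fetch word.
      def instructionStarts(slices: Seq[Int]): Seq[Int] = {
        val starts = Seq.newBuilder[Int]
        var i = 0
        while (i < slices.length) {
          val len = if (is32Bit(slices(i))) 2 else 1
          if (i + len <= slices.length) starts += i
          i += len
        }
        starts.result()
      }
    }

For example, ``AlignerScanModel.scannerCount(64)`` gives the 8 scanners mentioned above for 64 bits fetch words with RVC.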

.. image:: /asset/picture/aligner.png

DecoderPlugin
-------------------------
-------------

Will :

- Decode instruction
- Generate ilegal instruction exception
- Generate illegal instruction exception
- Generate "interrupt" instruction

DecodePredictionPlugin
-------------------------
----------------------

The purpose of this plugin is to ensure that no branch/jump prediction was made for non branch/jump instructions.
In case this is detected, the plugin will just flush the pipeline and set the fetch PC to redo everything, but this time with a "first prediction skip"

See more in the Branch prediction chapter

DispatchPlugin
-------------------------
--------------

Will :
Will :

- Collect instruction from the end of the decode pipeline
- Try to dispatch them ASAP on the multiple "layers" available

Here is a few explenation about execute lanes and layers :
Here are a few explanations about execute lanes and layers :

- An execute lane represents a path through which an instruction can be executed.
- An execute lane can have one or many layers, which can be used to implement things such as early ALU / late ALU
- Each layer has a static scheduling priority

The DispatchPlugin doesn't require lanes or layers to be symetric in any way.
The DispatchPlugin doesn't require lanes or layers to be symmetric in any way.



86 changes: 43 additions & 43 deletions source/VexiiRiscv/Execute/custom.rst
@@ -1,10 +1,10 @@
Custom instruction
==============================
==================

There are multiple ways you can add custom instructions into VexiiRiscv. This chapter provides some demos.

SIMD add
-----------
--------

Let's define a plugin which will implement a SIMD add (4x8bits adder), working on the integer register file.

@@ -22,7 +22,7 @@ For instance the Plugin configuration could be :
plugins += new SimdAddPlugin(early0) // <- We will implement this plugin

Plugin implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^

Here is an example of how this plugin could be implemented :

@@ -40,69 +40,69 @@ Here is a example how this plugin could be implemented :
import vexiiriscv.compat.MultiPortWritesSymplifier
import vexiiriscv.riscv.{IntRegFile, RS1, RS2, Riscv}

//This plugin example will add a new instruction named SIMD_ADD which do the following :
// This plugin example will add a new instruction named SIMD_ADD which do the following :
//
//RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto 0) = RS1( 7 downto 0) + RS2( 7 downto 0)
//RD(16 downto 8) = RS1(16 downto 8) + RS2(16 downto 8)
//RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
//RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
// RD : Regfile Destination, RS : Regfile Source
// RD( 7 downto 0) = RS1( 7 downto 0) + RS2( 7 downto 0)
// RD(15 downto 8) = RS1(15 downto 8) + RS2(15 downto 8)
// RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
// RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
//
//Instruction encoding :
//0000000----------000-----0001011 <- Custom0 func3=0 func7=0
// |RS2||RS1| |RD |
// Instruction encoding :
// 0000000----------000-----0001011 <- Custom0 func3=0 func7=0
// |RS2||RS1| |RD |
//
//Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA
// Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA


object SimdAddPlugin{
//Define the instruction type and encoding that we wll use
// Define the instruction type and encoding that we will use
val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}

//ExecutionUnitElementSimple is a plugin base class which will integrate itself in a execute lane layer
//It provide quite a few utilities to ease the implementation of custom instruction.
//Here we will implement a plugin which provide SIMD add on the register file.
class SimdAddPlugin(val layer : LaneLayer) extends ExecutionUnitElementSimple(layer) {
// ExecutionUnitElementSimple is a plugin base class which integrates itself in an execute lane layer
// It provides quite a few utilities to ease the implementation of custom instructions.
// Here we implement a plugin which provides SIMD add on the register file.
class SimdAddPlugin(val layer : LaneLayer) extends ExecutionUnitElementSimple(layer) {

//Here we create an elaboration thread. The Logic class is provided by ExecutionUnitElementSimple to provide functionalities
// Here we create an elaboration thread. The Logic class is provided by ExecutionUnitElementSimple to provide functionalities
val logic = during setup new Logic {
//Here we could have lock the elaboration of some other plugins (ex CSR), but here we don't need any of that
//as all is already sorted out in the Logic base class.
//So we just wait for the build phase
// Here we could have lock the elaboration of some other plugins (ex CSR), but here we don't need any of that
// as all is already sorted out in the Logic base class.
// So we just wait for the build phase
awaitBuild()

//Let's assume we only support RV32 for now
// Let's assume we only support RV32 for now
assert(Riscv.XLEN.get == 32)

//Let's get the hardware interface that we will use to provide the result of our custom instruction
// Let's get the hardware interface that we will use to provide the result of our custom instruction
val wb = newWriteback(ifp, 0)
//Specify that the current plugin will implement the ADD4 instruction

// Specify that the current plugin will implement the ADD4 instruction
val add4 = add(SimdAddPlugin.ADD4).spec

//We need to specify on which stage we start using the register file values
// We need to specify on which stage we start using the register file values
add4.addRsSpec(RS1, executeAt = 0)
add4.addRsSpec(RS2, executeAt = 0)

//Now that we are done specifying everything about the instructions, we can release the Logic.uopRetainer
//This will allow a few other plugins to continue their elaboration (ex : decoder, dispatcher, ...)
// Now that we are done specifying everything about the instructions, we can release the Logic.uopRetainer
// This will allow a few other plugins to continue their elaboration (ex : decoder, dispatcher, ...)
uopRetainer.release()

//Let's define some logic in the execute lane [0]
// Let's define some logic in the execute lane [0]
val process = new el.Execute(id = 0) {
//Get the RISC-V RS1/RS2 values from the register file
// Get the RISC-V RS1/RS2 values from the register file
val rs1 = el(IntRegFile, RS1).asUInt
val rs2 = el(IntRegFile, RS2).asUInt

//Do some computation
// Do some computation
val rd = UInt(32 bits)
rd( 7 downto 0) := rs1( 7 downto 0) + rs2( 7 downto 0)
rd(15 downto 8) := rs1(15 downto 8) + rs2(15 downto 8)
rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

//Provide the computation value for the writeback
// Provide the computation value for the writeback
wb.valid := SEL
wb.payload := rd.asBits
}
@@ -111,7 +111,7 @@ Here is a example how this plugin could be implemented :


VexiiRiscv generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^

Then, to generate a VexiiRiscv with this new plugin, we could run the following App :

@@ -144,7 +144,7 @@ To run this App, you can go to the NaxRiscv directory and run :
sbt "runMain vexiiriscv.execute.VexiiSimdAddGen"

Software test
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^

Then let's write some assembly test code : (https://github.com/SpinalHDL/NaxSoftware/tree/849679c70b238ceee021bdfd18eb2e9809e7bdd0/baremetal/simdAdd)

@@ -157,16 +157,16 @@ Then let's write some assembly test code : (https://github.com/SpinalHDL/NaxSoft
#include "../../driver/sim_asm.h"
#include "../../driver/custom_asm.h"

//Test 1
// Test 1
li x1, 0x01234567
li x2, 0x01FF01FF
opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) //x3 = ADD4(x1, x2)
opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) // x3 = ADD4(x1, x2)

//Print result value
// Print result value
li x4, PUT_HEX
sw x3, 0(x4)

//Check result
// Check result
li x5, 0x02224666
bne x3, x5, fail

@@ -184,15 +184,15 @@ Compile it with
make clean rv32im

Simulation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^

You could run a simulation using this testbench :
You could run a simulation using this testbench :

- Bottom of https://github.com/SpinalHDL/VexiiRiscv/blob/dev/src/main/scala/vexiiriscv/execute/SimdAddPlugin.scala

.. code:: scala

object VexiiSimdAddSim extends App{
object VexiiSimdAddSim extends App {
val param = new ParamSimple()
val testOpt = new TestOptions()

@@ -231,7 +231,7 @@ Which will output the value 02224666 in the shell and show traces in simWorkspac
Note that --no-rvls-check is required as Spike does not implement that custom simdAdd.
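
As a cross-check of the expected value, the ADD4 semantics can also be modelled in plain Scala, without generating any hardware. This is a sketch only: ``SimdAddModel`` and ``add4`` are hypothetical names that are not part of VexiiRiscv, and the model assumes the four 8-bit lanes wrap around independently, as described in the plugin comments.

.. code:: scala

    // Plain-Scala reference model of the ADD4 semantics (illustrative, no hardware is generated).
    object SimdAddModel {
      // Adds the four 8-bit lanes of a and b independently; carries never cross a lane boundary
      def add4(a: Int, b: Int): Int =
        (0 until 4).map { lane =>
          val shift = lane * 8
          val sum = ((a >>> shift) & 0xFF) + ((b >>> shift) & 0xFF)
          (sum & 0xFF) << shift
        }.reduce(_ | _)

      // Uses the same operands as the assembly test (x1 = 0x01234567, x2 = 0x01FF01FF)
      def main(args: Array[String]): Unit =
        println(f"${add4(0x01234567, 0x01FF01FF)}%08x")
    }

Lane by lane, 0x67 + 0xFF wraps to 0x66, 0x45 + 0x01 gives 0x46, 0x23 + 0xFF wraps to 0x22 and 0x01 + 0x01 gives 0x02, which matches the 02224666 checked by the assembly test.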

Conclusion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^

Overall, this example did not cover how to specify additional decoding, nor how to define a multi-cycle ALU (TODO).
But you can take a look at the IntAluPlugin, ShiftPlugin, DivPlugin, MulPlugin and BranchPlugin, which do those things using the same ExecutionUnitElementSimple base class.