RTL design is the implementation of a given specification using a hardware description language (HDL) such as Verilog. The RTL design is checked for adherence to the specification by simulating it; the simulator used here is iVerilog. The RTL design can be a single module or can be split across multiple modules/files. To test the functionality of the design, a stimulus is also written in Verilog/SystemVerilog. The simulator watches for changes on the inputs of the design module and re-evaluates the outputs based on those changes. If no input changes, the outputs do not change. An RTL design has a set of primary inputs and a set of primary outputs. A test bench tests the functionality of the RTL design by applying a set of stimuli to the design inputs and observing the outputs. A simple flow is shown in the figure below.
The outputs of the design can be viewed as waveforms. For functional verification, both the RTL design and the test bench are provided to the simulator, which in our case is iVerilog. The simulator dumps a VCD (value change dump) file, which is then visualized using another tool, GTKWave. GTKWave displays the waveforms, from which one can analyze the output signals with respect to the input signals and so verify the functionality of the RTL design. A general flow for functional verification of an RTL design using iVerilog + GTKWave is shown in the figure below.
If a bug is found, the RTL design is modified, run through the above flow again, and re-verified using the waveforms.
A 2x1 mux is designed in Verilog, as shown in the figure below.
A testbench is also written in Verilog, in which this 2x1 mux, whose module name is good_mux, is instantiated as uut. The uut ports are connected to the testbench signals.
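Since the figures are not reproduced here, the following is only a minimal sketch of what such a design and testbench pair might look like. The port names (i0, i1, sel, y), the toggle periods and the run length are assumptions made for illustration; only the module name good_mux, the instance name uut and the dump file name tb_good_mux.vcd come from the text.

```verilog
`timescale 1ns / 1ps

// Design: 2x1 mux. y follows i1 when sel is 1, otherwise i0.
// Port names are assumed for illustration.
module good_mux (input i0, input i1, input sel, output reg y);
  always @(*) begin
    if (sel)
      y = i1;
    else
      y = i0;
  end
endmodule

// Testbench: drives the inputs and dumps a VCD file for GTKWave.
module tb_good_mux;
  reg  i0, i1, sel;   // stimulus driven by the testbench
  wire y;             // output observed from the design

  // Instantiate the design as the unit under test (uut)
  good_mux uut (.i0(i0), .i1(i1), .sel(sel), .y(y));

  initial begin
    $dumpfile("tb_good_mux.vcd");
    $dumpvars(0, tb_good_mux);
    i0 = 1'b0; i1 = 1'b0; sel = 1'b0;   // known starting values
    #300 $finish;                        // assumed run length
  end

  // Free-running toggles with assumed periods
  always #75 sel = ~sel;
  always #10 i0  = ~i0;
  always #55 i1  = ~i1;
endmodule
```

In the workshop flow the design and the testbench live in separate files (good_mux.v and tb_good_mux.v); they are shown together here only so that the sketch is self-contained.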
The following commands are used to simulate good_mux.v with its test bench tb_good_mux.v:
- iverilog good_mux.v tb_good_mux.v
- ./a.out
- gtkwave tb_good_mux.vcd
GTKWave shows the output waveform as below.
Logic synthesis is a process in which an RTL design written in an HDL such as Verilog, SystemVerilog or VHDL is mapped onto the standard logic cells of a particular technology library. The figure below depicts the synthesis process.
An RTL design written in Verilog is converted to gate-level logic according to the constructs it uses. For example, the top module's inputs and outputs become the ports of the design. The assign statement in the figure below, which is written using a ternary operator, is converted to a mux. Similarly, an always block with the clock in its sensitivity list is converted to a register, as shown in the figure below.
Yosys is an RTL synthesizer. A synthesizer is a tool that converts RTL into a gate-level netlist. Yosys takes the Verilog RTL design and a standard cell library as inputs and generates a netlist.
iVerilog, on the other hand, is a simulator. It takes the RTL code of a design (or a netlist) together with a testbench and dumps a VCD file, which is read by GTKWave. GTKWave displays the resulting waveforms of the design.
First, launch the pre-installed Yosys tool by typing the command yosys in the terminal.
Read the sky130 .lib technology library file.
Read the RTL design file "good_mux.v".
Elaborate and synthesize the design, specifying the top module name, with the command "synth -top good_mux".
General statistics of the cells used are reported here.
Map the design with the "abc -liberty sky130.lib" command, which maps the design onto the library cells and performs optimizations.
This "abc" command also reports the number of input/output signals and the cells present in the netlist.
The "show" command is used for graphical visualization of the netlist.
In the graphical schematic, a sky130 library cell is used for the mux implementation, along with some buffers.
Finally, the netlist is written out using the command below in Yosys.
The netlist for the 2x1 mux generated using Yosys and the sky130 PDK is shown in the figure below.
In this workshop we are using the SkyWater 130 PDK. This sky130 PDK has different timing libraries for the 130 nm process node. A timing library is a collection of standard cells such as AND, OR, NOT and flip-flops. A timing library provides information about:
- Cell power for each input pin of the cells
- Timing delays for each cell in the form of lookup tables
- Area footprint of the cells
These timing libraries are categorized by PVT corner, that is, process, voltage and temperature corner. For example, the library we are using is sky130_fd_sc_hd__tt_25c_1v80.lib. Here "tt" represents the process, "25c" represents a temperature of 25 degrees centigrade and "1v80" represents a voltage of 1.80 V. PVT directly affects the performance of the cells.
Process represents the variations that occur during fabrication of the chip. They can be due to temperature, pressure, dopant concentration or the instruments used in manufacturing. Because of these process variations, transistors may have different channel lengths across the chip: transistors with shorter channel lengths behave faster, whereas those with longer channel lengths are slower. Based on the process, a timing library can be one of:
- Typical, Typical (tt): typical process variation
- Fast, Fast (ff): fast process variation
- Slow, Slow (ss): slow process variation
Voltage affects the performance of the cells. In a chip, cells that see a higher voltage (placed closer to the supply) have less delay, whereas cells that see a lower voltage because of high IR drop have more delay.
The chip will be used in different parts of the world under different temperatures. Where the temperature is high the cell delay increases, so the cells behave slower, whereas where the temperature is low they behave faster. Timing libraries are therefore also categorized by operating temperature.
When we open sky130_fd_sc_hd__tt_25c_1v80.lib, the name of the library appears on the very first line. The file also states the technology, which is CMOS. A snapshot of the top lines is shown below:
There are many cells in this library, of which the and2 cell is shown in the figure below. It is a 2-input AND gate. For these 2 inputs there are four possible combinations, and the leakage power is given for each combination. The area and total leakage power of the cell are also given, as highlighted in yellow in the figure below.
This .lib also contains multiple versions of the same cell with different drive strengths. For example, the 2-input AND gate above has two more versions, as shown in the figure below, which compares "and2_0", "and2_2" and "and2_4". Moving from "and2_0" on the left to "and2_4" on the right, both the cell area and the leakage power increase, which reflects wider transistors and therefore a faster cell. So "and2_0" is the slowest and "and2_4" is the fastest with the minimum delay, but at the cost of larger area and power.
Hierarchical synthesis is synthesis in which the generated netlist preserves the hierarchy present in the original RTL code. When we have a large design to code, we write it in Verilog by first designing its submodules and then combining these submodules into a top module. This is a hierarchical way of building a digital design. As a simple example, take an RTL design named multiple_modules which has three inputs A, B and C and a single output Y. This top module has two submodules: one is an AND gate and the other is an OR gate. The hierarchical schematic is shown below:
The RTL code for the design is written in Verilog as follows:
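As the code figure is not reproduced here, the following is a minimal sketch of how such a hierarchy might be written. The submodule names, port names, instance names and internal net name are assumptions; only the top module name multiple_modules, its inputs A, B, C and its output Y come from the text.

```verilog
// Submodule 1: 2-input AND gate (name and ports assumed)
module sub_module1 (input a, input b, output y);
  assign y = a & b;
endmodule

// Submodule 2: 2-input OR gate (name and ports assumed)
module sub_module2 (input a, input b, output y);
  assign y = a | b;
endmodule

// Top module: Y = (A & B) | C, built from the two submodules
module multiple_modules (input a, input b, input c, output y);
  wire net1;   // output of the AND stage

  sub_module1 u1 (.a(a),    .b(b), .y(net1));
  sub_module2 u2 (.a(net1), .b(c), .y(y));
endmodule
```

A hierarchical netlist would keep u1 and u2 as separate module instances, while a flattened netlist would place the gates directly inside multiple_modules.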
Hierarchical synthesis generates a netlist that preserves this hierarchy, in which the top module contains the submodules. Such a netlist is known as a hierarchical netlist. To generate it, the design is synthesized using Yosys with the following steps:
- Read the .lib of sky130 in Yosys
- Read the RTL design multiple_modules
- Synthesize the design, giving the top module name as "multiple_modules"
- Perform technology mapping using "abc -liberty sky130_fd_sc_hd__tt_25c_1v80.lib"
- Obtain the schematic of the hierarchical netlist using the "show" command in Yosys, as shown in the figure below. It contains submodule1 and submodule2, with multiple_modules as the top module, which means the hierarchy is the same as in the RTL.
Hierarchical synthesis is beneficial in the following cases:
- When a design has multiple instances of the same module, the module can be synthesized once and replicated in the top module, which saves time and synthesizer effort.
- When we want to apply a divide-and-conquer approach to a massive design.
- In a hierarchical netlist the pins of the submodules are accessible, which helps in functional verification as well as in static timing analysis of the design.
If a synthesizer does not preserve the hierarchy of the RTL design but instead generates a flat, single-module netlist, the result is known as a flattened netlist, and this type of synthesis is known as flat synthesis.
- A flattened netlist is generated by giving the command "flatten" to Yosys before the command that writes the netlist. The flattened netlist for the above RTL design "multiple_modules" is shown below. It can be seen that there are no submodules; the design is generated directly in the form of basic AND and OR gates.
- The schematic of the flattened netlist is shown below.
In digital circuits, a flip-flop is used to contain the glitches that combinational circuits produce because of their propagation delays. When an input is applied to a combinational circuit, its output changes only after the propagation delay, and this can make the output glitch. To understand this, suppose we have a simple combinational circuit consisting of an AND gate and an OR gate. There are three inputs A, B and C and a single output Y. The Boolean equation for the circuit is Y = (A & B) | C.
Inputs A, B and C are applied to the circuit according to the waveform shown in the figure below. At t=0ns, initially A=0, B=0 and C=1, so the output is Y=1 at the start.
At t=1ns, both A and B are set to 1 and C is set to 0. As the propagation delay of the AND gate is 2ns, its output, represented by the wire "i", goes high after 2ns, that is at t=3ns, as shown in the figure below. Until then "i" is zero, so the OR gate, which computes Y = (i | C), drives Y=0 at t=2ns, since the propagation delay of the OR gate is 1ns. At t=3ns the updated value of "i" appears, and as a result Y=1 again at t=4ns. According to the applied inputs, Y should remain stable at 1, as can be seen in the table below:
A | B | C | Y |
---|---|---|---|
0 | 0 | 1 | 1 |
1 | 1 | 0 | 1 |
But because of the propagation delays, the waveform shows that Y does not remain 1 throughout; instead it glitches. So in combinational circuits, output glitches are an issue.
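The timing argument above can be tried out with a small delay-annotated model. This is only a sketch: the module name is hypothetical, and the time origin is shifted by 10 ns (an assumption made so that the gates settle before the inputs change), so the input change described at t=1ns happens here at t=11ns.

```verilog
`timescale 1ns / 1ns

// Hypothetical demo of the AND-OR glitch with gate delays.
module glitch_demo;
  reg  A, B, C;
  wire i, Y;

  assign #2 i = A & B;   // AND gate with 2 ns propagation delay
  assign #1 Y = i | C;   // OR gate with 1 ns propagation delay

  initial begin
    $monitor("t=%0t A=%b B=%b C=%b i=%b Y=%b", $time, A, B, C, i, Y);
    A = 0; B = 0; C = 1;          // initial inputs, Y settles to 1
    #11 A = 1; B = 1; C = 0;      // corresponds to t=1ns in the text
    #9  $finish;
  end
endmodule
```

With the shifted origin, the dip on Y would be expected roughly between 12 ns and 14 ns, matching the 2 ns to 4 ns window described in the text.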
Now suppose we have a series of combinational circuits in which the output of one combinational circuit is fed to the next as input, and so on, as shown in the figure below. Then not only would the output of each combinational block glitch, but the output of the whole circuit would be continuously glitch-prone.
To contain the glitches, a flip-flop (FF) is used. The flop output changes only at the clock edge; otherwise it remains stable. So even if the D input of the flop is glitching, the Q output that it feeds to the next combinational block is stable. The next combinational block therefore sees a stable input, and its outputs also settle instead of glitching.
A flip-flop is an edge-triggered sequential element, that is, it updates at an edge of the clock (the positive edge in the examples here). One of the most commonly used flops is the D flip-flop, whose output Q follows the input D, but only at the clock edge. A simple schematic of a D flip-flop is shown below.
The D-FF can be coded in different styles in Verilog. Four types of D flip-flop are covered below.
An asynchronous reset flip-flop is a flop whose output Q can be reset to zero irrespective of the clock edge. Whenever a high reset signal appears on the reset input of the D-FF, its output Q goes low; otherwise it follows the input D on every clock edge. The Verilog code for the asynchronous reset flip-flop is shown below:
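A minimal sketch of this coding style, with the module and port names assumed for illustration:

```verilog
// D flip-flop with asynchronous, active-high reset (names assumed)
module dff_asyncres (input clk, input async_reset, input d, output reg q);
  // The reset is in the sensitivity list, so it acts without waiting for clk
  always @(posedge clk or posedge async_reset) begin
    if (async_reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule
```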
This flop is simulated in iVerilog using a testbench; the output waveform is shown below.
An asynchronous set flip-flop is a flop whose output Q can be set to 1 irrespective of the clock edge. Whenever a high 'set' signal appears on the set input of the D-FF, its output Q goes high; otherwise it follows the input D on every clock edge. The Verilog code for the asynchronous set flip-flop is shown below:
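A minimal sketch of this coding style, with the module and port names assumed for illustration:

```verilog
// D flip-flop with asynchronous, active-high set (names assumed)
module dff_async_set (input clk, input async_set, input d, output reg q);
  // The set is in the sensitivity list, so it acts without waiting for clk
  always @(posedge clk or posedge async_set) begin
    if (async_set)
      q <= 1'b1;
    else
      q <= d;
  end
endmodule
```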
This flop is simulated in iVerilog using a testbench; the output waveform is shown below.
A synchronous reset flip-flop is a flop whose output Q can be reset to 0 only at a clock edge. Whenever a high 'reset' signal appears on the reset input of the D-FF, its output Q goes low, but only at the positive edge of the clock. Otherwise it follows the input D on every clock edge. The Verilog code for the synchronous reset flip-flop is shown below:
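A minimal sketch of this coding style, with the module and port names assumed for illustration:

```verilog
// D flip-flop with synchronous, active-high reset (names assumed)
module dff_syncres (input clk, input sync_reset, input d, output reg q);
  // Only clk is in the sensitivity list, so the reset is sampled at the clock edge
  always @(posedge clk) begin
    if (sync_reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule
```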
This flop is simulated in iVerilog using a testbench; the output waveform is shown below.
A flip-flop can also have both an asynchronous reset and a synchronous reset. In that case the asynchronous reset has higher priority than the synchronous reset. The Verilog code for this flip-flop is shown below:
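A minimal sketch of this coding style, with the module and port names assumed for illustration. The priority follows from the order of the if/else-if branches.

```verilog
// D flip-flop with both asynchronous and synchronous reset (names assumed)
module dff_asyncres_syncres (input clk, input async_reset, input sync_reset,
                             input d, output reg q);
  always @(posedge clk or posedge async_reset) begin
    if (async_reset)        // highest priority, independent of clk
      q <= 1'b0;
    else if (sync_reset)    // takes effect only at the clock edge
      q <= 1'b0;
    else
      q <= d;
  end
endmodule
```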
This flop is simulated in iVerilog using a testbench; the output waveform is shown below.
Logic optimization means squeezing the logic to obtain the most optimized design, one that is efficient in terms of area, power and performance. In other words, it has optimum PPA. Different techniques are used for logic optimization of both combinational and sequential logic. For combinational logic there are two techniques:
- Constant propagation
- Boolean logic optimization
In the constant propagation technique, the logic is optimized based on a signal that constantly carries either 0 or 1. For example, suppose we have a circuit based on Y = ((AB) + C)'. The circuit diagram for this expression is shown in the figure below. If the signal A is constantly 0, this circuit reduces to just an inverter (Y = C').
This can also be observed in terms of CMOS logic. The original expression is implemented using 6 CMOS transistors, whereas with constant propagation of signal 'A' the logic reduces to only 2 CMOS transistors, which reduces the area.
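The constant propagation example can be written as a small module to try in a synthesizer. This is a sketch with an assumed module name and port names; A is tied to 0 inside the module so that the tool can propagate the constant.

```verilog
// Y = ((A & B) | C)' with A tied to constant 0 (names assumed)
module const_prop_demo (input b, input c, output y);
  wire a = 1'b0;               // constant input

  // With a = 0 the AND term drops out, leaving y = ~c
  assign y = ~((a & b) | c);
endmodule
```

A synthesizer would be expected to reduce this to a single inverter on c, leaving input b unconnected.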
In the second technique, Boolean logic optimization, the circuit is optimized using techniques such as K-maps. The Boolean expression is reduced to a minimal number of literals. To understand this, suppose we have the following expression:
assign Y = a?(b?c:(c?a:0)):(!c)
This expression is implemented in terms of muxes, as shown in the figure below. Writing it out using the Boolean equation of a mux, it simplifies to a'c' + ac, that is, an XNOR gate of 'a' and 'c'.
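The reduction can be checked exhaustively in simulation. The following is a minimal self-checking sketch with a hypothetical module name; it compares the nested ternary against the reduced form (the output equals c when a is 1 and !c when a is 0, which is a XNOR c) for all eight input combinations.

```verilog
// Hypothetical exhaustive check of the Boolean reduction
module bool_opt_check;
  reg  a, b, c;
  wire y_mux  = a ? (b ? c : (c ? a : 1'b0)) : (!c);  // original expression
  wire y_xnor = ~(a ^ c);                             // reduced expression
  integer k;

  initial begin
    for (k = 0; k < 8; k = k + 1) begin
      {a, b, c} = k;   // low three bits of k drive a, b, c
      #1;              // let the continuous assignments settle
      if (y_mux !== y_xnor)
        $display("MISMATCH a=%b b=%b c=%b", a, b, c);
    end
    $display("check done");
    $finish;
  end
endmodule
```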
For sequential logic optimization the techniques fall into two groups, basic and advanced:
- Basic
  - Sequential constant propagation
- Advanced
  - State optimization
  - Retiming
  - Sequential logic cloning/Floor plan aware synthesis
Gate level simulation (GLS) means running the testbench with the netlist as the design under test (DUT). The netlist is logically the same as the RTL code, so the same testbench works with the netlist as well. Why is GLS needed? GLS is used for:
- Verifying the logical correctness of the design after synthesis
- Ensuring that the timing of the design is met. For this, GLS needs to be run with delay annotation.
iVerilog, which is used for simulating the RTL design, can also be used for netlist simulation. The workflow is quite similar. One more thing has to be included, namely the Verilog models of the standard cells, because the netlist contains instances of different standard cells. The following figure shows the GLS workflow using iVerilog.
Synthesis-simulation mismatch can occur for the following reasons:
- Missing sensitivity list
- Blocking vs non-blocking assignments
- Non-standard Verilog coding
A simulator works on activity: it evaluates the output only when there is a change on the inputs, or more precisely on the signals included in the sensitivity list. A synthesizer, on the other hand, does not look at the sensitivity list; it only looks at the logic. To understand this, suppose we have implemented a mux using two different methods, shown in the figure below as bad_mux.v and good_mux.v.
In bad_mux, the sensitivity list of the always block contains only "sel", so the output is evaluated only when "sel" changes; it is not sensitive to the inputs "i0" and "i1". This is undesirable, because in simulation the output then holds its value like a latch, while the synthesizer still builds a mux. In good_mux.v the sensitivity list is @(*), so the output is evaluated whenever any input changes. This results in a mux in both the simulator and the synthesizer.
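The difference between the two styles can be sketched as follows. The module names bad_mux and good_mux and the signals sel, i0 and i1 come from the text; the output name y and the exact coding are assumptions.

```verilog
// Incomplete sensitivity list: re-evaluated only when sel changes,
// so changes on i0 or i1 alone are missed in simulation.
module bad_mux (input i0, input i1, input sel, output reg y);
  always @(sel) begin
    if (sel)
      y <= i1;
    else
      y <= i0;
  end
endmodule

// Complete sensitivity list: re-evaluated when any input changes.
module good_mux (input i0, input i1, input sel, output reg y);
  always @(*) begin
    if (sel)
      y = i1;
    else
      y = i0;
  end
endmodule
```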
Blocking and non-blocking assignments are used inside an always block in Verilog code. The blocking assignment (=) executes the statements in the order in which they are written. So the first statement is evaluated before the second statement, just as in C code.
The non-blocking assignment (<=), in contrast, evaluates all the right-hand sides first when the always block is entered and only then assigns them to the left-hand sides. In effect, the assignments execute in parallel.
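A common way to see the difference is a two-stage shift register. This is an illustrative sketch that is not taken from the original figures; the module and signal names are assumptions.

```verilog
// Non-blocking: both right-hand sides are sampled first,
// so q receives the OLD q0 and two flops are inferred.
module shift_nonblocking (input clk, input d, output reg q);
  reg q0;
  always @(posedge clk) begin
    q0 <= d;
    q  <= q0;
  end
endmodule

// Blocking: q0 is updated before the second statement runs,
// so q receives the NEW q0 and the two stages collapse into one.
module shift_blocking (input clk, input d, output reg q);
  reg q0;
  always @(posedge clk) begin
    q0 = d;
    q  = q0;
  end
endmodule
```

This is why non-blocking assignments are the usual choice for sequential logic.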
Let's write Verilog code for a 2x1 mux using the ternary operator, as shown in the figure below.
We simulate it using iVerilog with a testbench. It clearly behaves as a 2x1 mux.
Next we generate the netlist for the 2x1 mux using Yosys. The Yosys statistics are shown below.
We then generate the schematic of the synthesized design using Yosys, which is clearly a 2x1 mux cell.
Write the Verilog netlist using Yosys.
Now we perform GLS using iVerilog. For this we need primitives.v and the Verilog models of the standard cells, along with the netlist and the testbench.
Observing the waveform in GTKWave, it can clearly be concluded that the netlist behaves like a 2x1 mux. So the design is functionally verified by GLS.
Let's take another version of the 2x1 mux, coded differently this time. It is coded using an always block with only 'sel' in its sensitivity list.
We simulate it using a test bench in iVerilog; the resulting waveform, viewed in GTKWave, is shown below. The waveform does not look like the output of a 2x1 mux; instead the design behaves like a register.
Now we generate the netlist using Yosys. The Yosys statistics show that Yosys infers a mux from the bad_mux.v RTL.
Write the Verilog netlist using Yosys.
Now we perform GLS on this bad_mux.v netlist using iVerilog. Here the design clearly behaves like a 2x1 mux, the opposite of what the RTL simulation showed. So this is a clear case of synthesis-simulation mismatch.
Let's take a simple piece of logic coded with blocking statements in Verilog, as shown in the figure below.
According to the code, it appears to OR 'a' and 'b' and assign the result to 'x', which is then ANDed with 'c' and assigned to 'd'.
We first simulate it using iVerilog. The simulation waveform is shown below. The design behaves as if a latch were present that stores the previous value, and 'd' is evaluated from these stored values.
Next we perform GLS. For this we first generate the netlist using Yosys. In the Yosys netlist viewer it can be seen that the design is synthesized into an OR-AND gate.
The GLS is performed using iVerilog as follows. Here there is no latch-like behaviour; the value of 'd' is correctly evaluated from the current values of 'a' and 'b'. Given this synthesis-simulation mismatch, one should be very careful when using blocking statements.
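One way such a mismatch can arise is from the order of blocking statements. The following is a hedged sketch rather than the exact code from the figure; the module names are assumptions, while the signals a, b, c, x and d follow the text.

```verilog
// Order-dependent version: d is computed from x BEFORE x is updated,
// so in simulation d uses the value of x from the previous evaluation.
module blocking_caveat (input a, input b, input c, output reg d);
  reg x;
  always @(*) begin
    d = x & c;
    x = a | b;
  end
endmodule

// Reordered version: x is updated first, so d always uses the
// current value of (a | b) in both simulation and synthesis.
module blocking_fixed (input a, input b, input c, output reg d);
  reg x;
  always @(*) begin
    x = a | b;
    d = x & c;
  end
endmodule
```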
The if construct is used in Verilog to implement priority logic. A sample of the if construct is shown below. If condition 1 is true, the 'c1' part of the code is executed; else if condition 2 is true, the 'c2' part runs; else if condition 3 is true, the 'c3' part runs; and if none of the above conditions is true, the 'else' part of the code, 'c4', is executed.
In hardware, this if construct is implemented as a chain of muxes, as shown in the figure below. If condition 1 is true, 'c1' appears at the output; else if condition 2 is true, 'c2' appears at the output; else if condition 3 is true, 'c3' appears at the output; otherwise 'c4' reaches the output.
The if construct infers latches when bad coding practices are used or when an else branch is missing. For example, the code in the figure below follows the bad practice of omitting the final else.
In hardware this infers a latch whose enable is driven by an OR gate of the conditions that are tested. Whenever all of these conditions are false, the latch is disabled, so it holds the stored value of Y and provides it until a condition becomes true again. This latch is not intentional; it is created by the bad coding of the if construct.
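The effect of a missing else can be sketched with a two-input example. The module and port names here are assumptions chosen for illustration.

```verilog
// Incomplete if: when i0 is 0 nothing is assigned, so y must
// remember its previous value and a latch is inferred.
module incomp_if (input i0, input i1, output reg y);
  always @(*) begin
    if (i0)
      y = i1;
  end
endmodule

// Complete if: y is assigned on every path, so the logic
// stays purely combinational.
module complete_if (input i0, input i1, output reg y);
  always @(*) begin
    if (i0)
      y = i1;
    else
      y = 1'b0;
  end
endmodule
```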
There are some cases where holding the previous value through a missing else is intentional and needed for the proper operation of the circuit. For example, suppose we write a 3-bit counter as follows: in the always block there is a reset, at which the counter resets to 0, and an enable 'en', at which it increments; otherwise it stays at the same number.
As can be seen in the figure below, because of the missing else for 'en', the counter holds its previous value when 'en' is zero. So here the storage behaviour is intentional.
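A minimal sketch of such a counter, with the module and port names assumed and an asynchronous reset chosen arbitrarily:

```verilog
// 3-bit counter with reset and enable (names assumed)
module counter3 (input clk, input reset, input en, output reg [2:0] count);
  always @(posedge clk or posedge reset) begin
    if (reset)
      count <= 3'b000;
    else if (en)
      count <= count + 1'b1;
    // no final else: count intentionally keeps its value when en is 0
  end
endmodule
```

Because the block is clocked, the held value lives in the counter's flip-flops, which is the intended behaviour.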
In combinational circuits, inferred latches are not allowed.
Like the if construct, 'case' is used inside an always block, and whatever you assign must be a register (reg) variable. Here we have a simple 'case' statement. It is evaluated for all possible combinations of 'sel'. For every possible combination, 'Y' is assigned a value. In terms of hardware, this code implements a mux.
If the output Y is not assigned for every possible combination of the case selector, a latch is inferred for the missing case items.
To avoid inferring a latch for missing case items, we can use a default case item, e.g.
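A minimal sketch of a case statement completed with a default item. The module name, port names and the choice of which input the default selects are assumptions.

```verilog
// Case statement with a default item (names assumed)
module case_default (input i0, input i1, input i2,
                     input [1:0] sel, output reg y);
  always @(*) begin
    case (sel)
      2'b00:   y = i0;
      2'b01:   y = i1;
      default: y = i2;   // covers 2'b10 and 2'b11, so no latch is inferred
    endcase
  end
endmodule
```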
If a case statement has missing or partial assignments, that is, some outputs are not assigned in every case item, latches are also inferred for the outputs with missing assignments, e.g.
In a case statement, all the listed combinations of the case variable are evaluated. If overlapping combinations are used in the case statement, the output becomes unpredictable, e.g.
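The following is a hedged sketch of one way an overlapping case item can be written. The module and port names are assumptions, and the exact behaviour depends on the tool.

```verilog
// Case items that overlap (names assumed)
module bad_case (input i0, input i1, input i2, input i3,
                 input [1:0] sel, output reg y);
  always @(*) begin
    case (sel)
      2'b00: y = i0;
      2'b01: y = i1;
      2'b10: y = i2;
      2'b1?: y = i3;   // '?' is intended as "10 or 11", overlapping the item above
    endcase
  end
endmodule
```

In a plain case statement the simulator compares the selector literally, so the last item may never match in RTL simulation and y may simply hold its value for sel=11, while a synthesizer may treat the '?' as a don't-care. Writing mutually exclusive items avoids the ambiguity.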
As mentioned previously, an incomplete if results in a latch. Now we code an incomplete if and perform its simulation and synthesis to check whether it infers a latch. For this we have the following Verilog code:
According to the code, its hardware would be a D latch, as shown in the figure below.
Now we simulate it in iVerilog using a testbench. The waveform shows that a latch is inferred.
Now we synthesize it to check whether a latch is also inferred in synthesis. The Yosys statistics are shown below.
In the synthesis schematic generated by Yosys it can also be seen that a latch is inferred.
Now we code another incomplete if and perform its simulation and synthesis to check whether it infers a latch. For this we have the following Verilog code:
According to the code, its hardware would infer a latch, as shown in the figure below.
Now we simulate it in iVerilog using a testbench. The waveform shows that a latch is inferred.
Now we synthesize it to check whether a latch is also inferred in synthesis. The Yosys statistics are shown below.
In the synthesis schematic generated by Yosys it can also be seen that a latch is inferred.
As mentioned previously, an incomplete case statement results in a latch. Now we code an incomplete case statement and perform its simulation and synthesis to check whether it infers a latch. For this we have the following Verilog code:
According to the code, its hardware would be a D latch, as shown in the figure below.
Now we simulate it in iVerilog using a testbench. The waveform shows that a latch is inferred.
Now we synthesize it to check whether a latch is also inferred in synthesis. The Yosys statistics are shown below.
The synthesis schematic is generated by Yosys. It can also be seen that a latch is inferred.
As mentioned previously, an overlapping case statement results in unpredictable output. Now we code an overlapping case statement and perform its simulation and synthesis to check whether the RTL simulation and the synthesized netlist agree. For this we have the following Verilog code:
Now we simulate it in iVerilog using a testbench. The waveform shows that the output is unpredictable in the highlighted box.
Now we perform its synthesis in Yosys.
The synthesis schematic is generated by Yosys.
Now we generate the netlist for performing GLS.
The generated Verilog netlist is shown below.
On performing GLS in iVerilog, the waveform shows no unpredictability in the output. Instead it follows 'i0' when 'sel=00', 'i1' when 'sel=01', 'i2' when 'sel=10' and 'i3' when 'sel=11'.
In Verilog, two types of looping constructs are used.
- For loop: it is used inside an always block and is used for evaluating expressions.
- For generate: it is used outside the always block, mainly for instantiating hardware multiple times.
To understand the use of the for loop, let's take the example of a mux. It is quite easy to write a 2x1 mux using if constructs. Similarly, a 4x1 mux can be written using case statements. But writing a 32x1 mux with case statements is a laborious and lengthy task, as can be seen in the figure below: we have to write all 32 possible combinations.
To make the task easier, we can use a for loop inside the always block to generate a 32x1 mux, or wider muxes, in a few lines of code.
So the for loop is helpful in creating a very wide mux or demux.
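A minimal sketch of a 32x1 mux written with a for loop. The module name, port names and the default value of the output are assumptions.

```verilog
// 32x1 mux using a for loop inside an always block (names assumed)
module mux32_for (input [31:0] in, input [4:0] sel, output reg y);
  integer k;
  always @(*) begin
    y = 1'b0;                      // default so that y is always assigned
    for (k = 0; k < 32; k = k + 1)
      if (k == sel)
        y = in[k];                 // pick the input whose index matches sel
  end
endmodule
```

The loop is unrolled by the synthesizer; it describes selection logic rather than anything that iterates over time in hardware.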
The for-generate loop is used for instantiating hardware multiple times. Suppose that for some reason we have to instantiate an AND gate 8 times in a module. We can instantiate it using a for-generate loop as follows.
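A minimal sketch of a for-generate loop that instantiates the built-in `and` primitive eight times. The module name, port names and the generate block label are assumptions.

```verilog
// Eight AND gates created with a for-generate loop (names assumed)
module and_array (input [7:0] a, input [7:0] b, output [7:0] y);
  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1) begin : gen_and
      and u_and (y[i], a[i], b[i]);   // one gate instance per bit
    end
  endgenerate
endmodule
```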
Suppose we generate a mux using a for loop. The Verilog code is shown below. Here we have used 'i_int', which is a bus combining {i0,i1,i2,i3} into a single variable.
Now we simulate this code using iVerilog and a testbench. It can clearly be seen that it accurately simulates a 4x1 mux. For sel=00 'y' follows 'i0', for sel=01 'y' follows 'i1', for sel=10 'y' follows 'i2' and for sel=11 'y' follows 'i3'.
Now we perform its synthesis in Yosys.
The synthesis schematic is generated by Yosys.
Now we generate the netlist for performing GLS.
The generated Verilog netlist is shown below.
On performing GLS in iVerilog, the waveform shows that the output follows 'i0' when 'sel=00', 'i1' when 'sel=01', 'i2' when 'sel=10' and 'i3' when 'sel=11'. So there is no simulation-synthesis mismatch here.
Suppose we generate a demux using a for loop. The Verilog code is shown below.
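A minimal sketch of a 1x8 demux written with a for loop. The module name and the use of a single 8-bit output bus instead of separate outputs o0 to o7 are assumptions.

```verilog
// 1x8 demux using a for loop (names assumed)
module demux_for (input i, input [2:0] sel, output reg [7:0] o);
  integer k;
  always @(*) begin
    o = 8'b0;                      // all outputs low by default
    for (k = 0; k < 8; k = k + 1)
      if (k == sel)
        o[k] = i;                  // route the input to the selected output
  end
endmodule
```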
Now we simulate this code using iVerilog and a testbench. It can clearly be seen that it accurately simulates a 1x8 demux. For sel=000 'o0' follows 'i', for sel=001 'o1' follows 'i', for sel=010 'o2' follows 'i', for sel=011 'o3' follows 'i', and so on.
Now we perform its synthesis in Yosys.
The synthesis schematic is generated by Yosys.
Now we generate the netlist for performing GLS.
On performing GLS in iVerilog, the waveform shows that the selected output follows 'i'. For sel=000 'o0' follows 'i', for sel=001 'o1' follows 'i', for sel=010 'o2' follows 'i', for sel=011 'o3' follows 'i', and so on.
A simple adder for adding binary numbers is the ripple carry adder. Its building block is the full adder. For example, a 4-bit ripple carry adder is formed by connecting four full adders back to back. So, to write a 4-bit ripple carry adder we need to instantiate 4 full adders in our code.
If we have to make an 8-bit ripple carry adder, we have to instantiate 8 full adders in our code, and for a 32-bit ripple carry adder we need 32 instances of the same basic building block, the full adder. This can be done efficiently using a for-generate loop. Let's take the example of an 8-bit ripple carry adder coded using a for-generate loop, as shown in the figure below.
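A minimal sketch of an 8-bit ripple carry adder built with a for-generate loop. The module names, port names and the carry-in tied to 0 are assumptions made for illustration.

```verilog
// Full adder building block (names assumed)
module fa (input a, input b, input c, output co, output sum);
  assign {co, sum} = a + b + c;
endmodule

// 8-bit ripple carry adder: eight full adders chained through carry[]
module rca8 (input [7:0] num1, input [7:0] num2, output [8:0] sum);
  wire [8:0] carry;
  assign carry[0] = 1'b0;          // no carry into the first stage

  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1) begin : gen_fa
      fa u_fa (.a(num1[i]), .b(num2[i]), .c(carry[i]),
               .co(carry[i+1]), .sum(sum[i]));
    end
  endgenerate

  assign sum[8] = carry[8];        // final carry becomes the top sum bit
endmodule
```

Changing the loop bound and the bus widths would give a 32-bit adder with the same few lines.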
Now we perform its simulation using iVerilog and a testbench. The resulting waveform is shown in the figure below.
Now we perform its synthesis in Yosys.
The synthesis schematic is generated by Yosys.
Now we perform the GLS of the ripple carry adder.
This concludes the 5-day workshop.