<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>The ZipCPU by Gisselquist Technology</title>
<description>The ZipCPU blog, featuring how-to discussions of FPGA and soft-core CPU design. This site will be focused on Verilog solutions, using exclusively OpenSource IP products for FPGA design. Particular focus areas include topics often left out of more mainstream FPGA design courses such as how to debug an FPGA design.
</description>
<link>https://zipcpu.com/</link>
<atom:link href="https://zipcpu.com/feed.xml" rel="self" type="application/rss+xml"/>
<pubDate>Wed, 06 Nov 2024 09:46:48 -0500</pubDate>
<lastBuildDate>Wed, 06 Nov 2024 09:46:48 -0500</lastBuildDate>
<generator>Jekyll v4.2.0</generator>
<image>
<url>https://zipcpu.com/img/gt-rss.png</url>
<title></title>
<link></link>
</image>
<item>
<title>Your problem is not AXI</title>
<description><p>The following was a request for help from my inbox. It illustrates a common
problem students have. Indeed, the problem is common enough that <a href="/fpga-hell.html">this blog
was dedicated</a> to its solution. Let me
repeat the question here for reference:</p>
<blockquote>
<p>I’ve read some of your articles and old comments on forums in trying to
get something resembling Xilinx’ AXI4 Peripheral to work with my current
project in VIVADO for my FPGA. My main problem is that whenever I so much as
add a customizable AXI to my block design and connect it to my AXI
peripheral, generate a bitstream (with no failures), then build a platform
using it in VITIS (with no failures), my AXI GPIO connections which should
not be connected to the recently added customizable AXI, do not operate at
all (LEDs act as if tied to 0, although I’m sending all 1s). I tried a
solution I found online talking about incorrect “Makefile”s but to no avail.
I have also tried just adding some of your files <a href="https://github.com/ZipCPU/wb2axip">you provided on
github</a> instead of the Xilinx’ broken IP
including
“<a href="https://github.com/ZipCPU/wb2axip/blob/master/rtl/demoaxi.v">demoaxi.v</a>” and
“<a href="https://github.com/ZipCPU/wb2axip/blob/master/rtl/easyaxil.v">easyaxi.v</a>”
[sp]. The
“<a href="https://github.com/ZipCPU/wb2axip/blob/master/rtl/demoaxi.v">demoaxi.v</a>”
has the exact same problem as Xilinx’ AXI, just adding it to the
block design and connecting it to my AXI peripheral causes the GPIO not
connect somehow. Your
“<a href="https://github.com/ZipCPU/wb2axip/blob/master/rtl/easyaxil.v">easyaxi.v</a>”
[sp] does not cause this issue right away,
however adding an output and assigning it with the slave register “r0” then
results in the same issue. I am at a loss for what to do. I’m not very
familiar with the specifics of how AXI works, even after re-reading some of
your articles multiple times (I’m still a student with very little
experience), so I can’t be certain why I am running into this issue. My
guess at what is happening is that adding an AXI block with a certain
characteristic somehow causes the addresses for my GPIO and other connections
to “bug out”. But I have no idea why adding this kind of AXI block does
this (or something else that causes my issue). I’m reaching out because I
… might as well do something other
than making small changes to my design and waiting for 30+ minutes in between
tests to see if something breaks or doesn’t break my GPIO. Do you have any
idea what might be causing my issue or how to fix it?</p>
<p>Thanks,</p>
<p>(Student)</p>
</blockquote>
<p>(Links have been added …)</p>
<p>Let’s start with the easy question:</p>
<blockquote>
<p>Do you have any idea what might be causing my issue or how to fix it?</p>
</blockquote>
<p>No. Without looking at the design, the schematic, or digging into the design
files, I can’t really comment on something like this. Debugging hardware
designs is hard work, it takes time, and it takes a lot of attention to detail.
Without the details, I won’t be able to find the bug.</p>
<p>That said, let’s back up and address the root problem, and it’s not AXI.</p>
<p>Yes, I said that right: This student’s problem is not AXI.</p>
<p>If anything, AXI is just the symptom. If you don’t deal with the actual
problem, you will not succeed in this field.</p>
<h2 id="iterative-debugging">Iterative Debugging</h2>
<p>The fundamental problem is the method of debugging. The problem is that the
design doesn’t work, and this student doesn’t know how to figure out why not.
This was why I created my blog in the first place–to address this type of
problem.</p>
<table align="center" style="float: right"><caption>Fig 1. This is not how to do debugging</caption><tr><td><img src="/img/not-axi/broken-process.svg" width="320" /></td></tr></table>
<p>Here’s what I am hearing from the description: I tried A. It didn’t work.
I don’t know why not. So I tried B. That didn’t work either. I still don’t
know why not. Let me try asking an expert to see if he knows. It’s as though
the student expects me to be able, from these symptoms alone, to figure
out what’s wrong.</p>
<p>That’s not how this works. Indeed, this debugging process will lead you
straight to <a href="/fpga-hell.html">FPGA Hell</a>.</p>
<p>As an illustration, and for a fun story, consider the problem I’ve been working
on for the past couple weeks. I’m trying to get the FPGA processing working
for <a href="https://www.youtube.com/watch?v=vSB9BcLcUhM">this video project (fun promo video
link)</a>.</p>
<p>I got stuck for about two weeks at the point where I commanded the algorithm
to start and it didn’t do anything. Now what?</p>
<table align="center" style="padding: 25px; float: left"><caption>Fig 2. Voodoo computing defined</caption><tr><td><img src="/img/sdrxframe/voodoo.svg" width="320" /></td></tr></table>
<p>One approach to this problem would be to just change things, with no
understanding of what’s going on. I like to call this “Voodoo Computing”.
Sadly, it’s a common method of debugging that just … doesn’t work.</p>
<p>I use this definition because … it’s just so true. Even I often find myself
doing “voodoo computing” at times, and somehow expecting things to suddenly
fix themselves. The reality is, that’s not how engineering works.</p>
<p>Engineering works by breaking a problem down into smaller problems, and then
breaking those problems into smaller ones still. In this student’s case,
he has a problem where his AXI slave doesn’t work. Let’s break that down by
asking a question: Is it your design that’s failing, or the Vivado-created
“rest-of-the-system” that’s failing? Draw a line. Measure. Which one is it?</p>
<table align="center" style="float: right"><caption>Fig 3. Iterative Debugging</caption><tr><td><img src="/img/not-axi/iterative-debugging.svg" width="320" /></td></tr></table>
<p>Well, how would you know? You know by adding a test point of some type.
“Look” inside the system. Look at what’s going on. Look for any internal
evidence of a bug. For example, this student wants to write to his component
and to see a pin change. Perfect. Now trigger a capture on any writes to this
component, and see if you can watch that pin change from within the capture
and on the board. Does the component actually get written to? Do the
<code class="language-plaintext highlighter-rouge">AWVALID</code>, <code class="language-plaintext highlighter-rouge">AWREADY</code>, <code class="language-plaintext highlighter-rouge">WVALID</code>, <code class="language-plaintext highlighter-rouge">WREADY</code>, <code class="language-plaintext highlighter-rouge">BVALID</code>, and <code class="language-plaintext highlighter-rouge">BREADY</code> signals toggle
appropriately? How about <code class="language-plaintext highlighter-rouge">WDATA</code> and <code class="language-plaintext highlighter-rouge">WSTRB</code>? What of <code class="language-plaintext highlighter-rouge">AWADDR</code>? (You might
need to reduce this to a
single bit: <code class="language-plaintext highlighter-rouge">mydbg = (AWADDR == mydevices_register);</code>) If all these are
getting set appropriately, then the problem is in your design. Voila! You’ve
just narrowed down the issue.</p>
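<p>To make that concrete, here is the sort of single-bit trigger I have in mind,
written out as a minimal sketch. Every name in it (the module name, the
<code class="language-plaintext highlighter-rouge">S_AXI_*</code> ports, the watched
address) is a placeholder of my own invention, not something taken from the
student’s design or from any Xilinx template, so adapt it to whatever your
slave actually exposes.</p>
<pre><code class="language-verilog">// A hypothetical helper: raise a sticky flag the first time the write-address
// channel accepts a request aimed at one particular register.
module axi_write_trigger #(
    parameter                   ADDR_WIDTH = 8,
    parameter [ADDR_WIDTH-1:0]  WATCH_ADDR = 'h10   // placeholder address
) (
    input  wire                  S_AXI_ACLK, S_AXI_ARESETN,
    input  wire                  S_AXI_AWVALID, S_AXI_AWREADY,
    input  wire [ADDR_WIDTH-1:0] S_AXI_AWADDR,
    output reg                   dbg_write_seen
);

    // Latch (rather than pulse) the event, so even a one-cycle write
    // stays visible once the capture completes
    initial dbg_write_seen = 1'b0;
    always @(posedge S_AXI_ACLK)
    if (!S_AXI_ARESETN)
        dbg_write_seen <= 1'b0;
    else if (S_AXI_AWVALID && S_AXI_AWREADY
            && (S_AXI_AWADDR == WATCH_ADDR))
        dbg_write_seen <= 1'b1;

endmodule
</code></pre>
<p>Feed <code class="language-plaintext highlighter-rouge">dbg_write_seen</code>, together with the raw handshake
wires listed above, into your ILA probes, and trigger the capture on its
rising edge.</p>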
<p>Let’s illustrate this idea. You have a design that doesn’t work. You need
to figure out where the bug lies. So we first break this design into three
parts. I’ll call them 1) the AXI IP, 2) the LED output, and 3) the rest of the
design.</p>
<table align="center" style="float: none"><caption>Fig 4. Breaking down the problem</caption><tr><td><img src="/img/not-axi/decomposition.svg" width="560" /></td></tr></table>
<p>I would suggest two test points–although these can probably be merged into
the same “scope” (ILA). The first one would be between the AXI IP and the
rest of the design. This test point should look at all the AXI signals.
The second one should look at the LED output from your design.</p>
<p>Yes, I can hear you say, but of course the problem is within my AXI IP! Ahm,
no, you don’t get it. Earlier this year, I shipped a design to a well-paying
customer, and they came back and complained that my design wasn’t properly
acknowledging write transactions. As I recall, either BID or BVALID was
getting corrupted or some such. What should I say as a professional engineer
to a comment like that? Do I tell the customer, gosh, I don’t know, that’s
never happened to me before? Do I tell him, not at all, my stuff works? Or
do I make random changes for him to try to see if these would fix his problem?
Frankly, none of these answers would be acceptable. Instead, I asked if he
could provide a trace or other evidence of the problem that we could inspect
together–much like I illustrated above in Fig. 4. When he did so, I was able
to clearly point out that my design was working–it was just Vivado’s IP
integrator that hadn’t properly connected it to the AXI bus. Yes, these
things happen. You, as the engineer, need to narrow down where the bug is
and getting a “trace” of what is going on is one clear way to do this.</p>
<table align="center" style="padding: 25px; float: left"><caption>Fig 5. Yes, it's hard. Get over it.</caption><tr><td><img src="/img/not-axi/encouragement.svg" width="320" /></td></tr></table>
<p>This problem is often both iterative and time consuming. Yes, it’s hard.
As my Ph.D. advisor used to say, “Take an Aspirin. Get over it.” It’s a
fact of life. This field isn’t easy. That’s why it pays well. Personally,
that’s also why I find it so rewarding to work in this field. I enjoy the
excitement of getting something working!</p>
<p>If we go back to the <a href="https://www.youtube.com/watch?v=vSB9BcLcUhM">video processing example I mentioned
earlier</a>, I eventually found
several bugs in my Verilog IP.</p>
<ol>
<li>
<p>A bus arbiter was broken, and so the arbiter would get locked up following
any bus error.</p>
<p>(Yes, this was <a href="https://github.com/ZipCPU/eth10g/blob/master/rtl/wbmarbiter.v">my own
arbiter</a>, and
one I had borrowed from <a href="https://github.com/ZipCPU/eth10g">another
project</a>. It had no problems in
<a href="https://github.com/ZipCPU/eth10g">that other project</a>.)</p>
</li>
<li>
<p>Every time the video chain got reset, the memory address got written to
zero–and so the design tried accessing a NULL memory pointer. This was then
the source of the bus error the arbiter was struggling with.</p>
</li>
<li>
<p>The CPU was faulting since the video controller was writing video data to
CPU instruction memory.</p>
<p>I traced this to using the wrong linker description file. Sure, a
simplified block RAM only description is great for initial bringup testing,
but there’s no way a 1080p image frame will fit in block RAM in addition
to the C library.</p>
</li>
<li>
<p>A key video component was dropping pixels any time Xilinx’s MIG had a
hiccup on the last return beat.</p>
<p>This was a bit more insidious than it sounds. The component in question
was the video frame buffer. This component reads video data from memory
and generates an outgoing video stream. A broken signaling flag caused the
frame buffer to drop the bus transaction while one word was still
outstanding. This left the memory request and memory recovery FSMs off by
one (more) beat.</p>
<p>If you’ve ever stared at traces from Xilinx’s MIG, you’ll notice that it
generates a lot of hiccups. Not only does it need to take the memory off
line periodically for refreshes, but it also needs to take it off line more
often for return clock phase tracking. This means that the ready wire,
in this case <code class="language-plaintext highlighter-rouge">ARREADY</code>, will have a lot of hiccups in it, and
consequently the <code class="language-plaintext highlighter-rouge">RVALID</code> (and <code class="language-plaintext highlighter-rouge">BVALID</code>) acknowledgments will have similar
hiccups.</p>
<p>What happens, as it did in my case, when your design is sensitive to such
a hiccup at one particular clock cycle in your operation but not others?
The design might pass a simulation check, but still fail in hardware.</p>
<p>Fig 6. shows the basic trace of what was going on.</p>
</li>
</ol>
<table align="center" style="float: none"><caption>Fig 6. The missing ACK</caption><tr><td><img src="/img/not-axi/hlast-bug-annotated.png" width="760" /></td></tr></table>
<p>Notice what I just did there? I created a test point within the design, looked
at signals from within that test point, captured a trace of what was going on,
and hence was able to identify the problem. No, this wasn’t the first test
point–it took a couple to get to this point. Still, this is an example of
debugging a design within hardware.</p>
<p>The story of this video development goes on.</p>
<table align="center" style="float: right"><caption>Fig 7. The 3-board Stack</caption><tr><td><img src="/img/not-axi/stacked-woled.jpg" width="320" /></td></tr></table>
<p>At this point, though, I’ve now moved from one board to three. On the one
hand, that’s a success story. I only moved on once the single board was
working. On the other hand, the three boards aren’t talking to each other
(yet). I think I’ve now narrowed the problem down to a <a href="https://x.com/zipcpu/status/1853895732266516793">complex electrical
interaction between the two
boards</a>.</p>
<p>How did I do that? The key was to be able to capture a trace of what was
going on from within the system. Sound familiar? First, I captured a trace
indicating that the I2C master on the middle board was attempting to contact
the I2C slave on the bottom board and … the bottom board wasn’t
acknowledging. Then I captured a trace from the bottom board showing that
the I2C pins weren’t even getting toggled. Indeed, I eventually got to the
point where I was toggling the I2C pins by hand using the on board
switches–and even then the boards weren’t showing a connection between
them.</p>
<p>Generate a test. Test. Narrow down the problem. Continue.</p>
<h2 id="enumerating-debug-methods">Enumerating Debug Methods</h2>
<p>In many ways, debugging can be thought of as a feedback loop–much like
<a href="https://en.wikipedia.org/wiki/John_Boyd_(military_strategist)">Col Boyd</a>’s
<a href="https://en.wikipedia.org/wiki/OODA_loop">OODA loop</a>.</p>
<table align="center" style="float: none"><caption>Fig 8. Debugging Feedback Loop</caption><tr><td><img src="/img/not-axi/feedback-loop.svg" width="560" /></td></tr></table>
<p>The faster you can go through this loop, the faster you can find bugs, the
better your design will be.</p>
<p>Given this loop, let’s now go back and enumerate the basic methods for
debugging a hardware design.</p>
<ol>
<li>
<p><strong>Desk checking</strong>. This is the type of debugging where you stare at your
design, and hopefully just happen to see whatever the bug was. Yes, I do
this a lot. Yes, after a decade or two of doing design it does get easier
to find bugs this way. After a while, you start to see patterns and learn to
look for them. No, I’m still not very successful using this
approach–and I’ve been doing digital design for a living for many years.</p>
<p>In the case of this student’s design, I’m sure he’d stared at his design
quite a bit and wasn’t seeing anything. Yeah. I get that. I’ve been there
too.</p>
<p>Build time required for desk checking? None.</p>
<p>Test time? This doesn’t involve testing, so none.</p>
<p>Analysis time? Well, it depends. Usually I give up before spending too
much time doing this.</p>
</li>
<li>
<p><strong>Lint</strong>, sometimes called “Static Design Analysis”. This type of
debugging takes place any time you use a tool to examine your design.</p>
<p>I personally like to use <code class="language-plaintext highlighter-rouge">verilator -Wall -cc mydesign.v</code>. Using Verilator,
I can get my design to have <em>zero</em> lint errors. Since this process tends
to be so quick and easy, I rarely discuss bugs found this way. They’re just
found and fixed so quickly that there’s no story to tell.</p>
<p>Vivado also produces a list of lint errors (warnings) every time it
synthesizes my design. The list tends to be long and filled with false
alarms. Every once in a long while I’ll examine this list for bugs.
Sometimes I’ll even find one or two.</p>
<p>From the student’s email above, I gather he believed his design was good
enough from this standpoint. Still, it’s a place worth looking when things
take unexpected turns.</p>
<p>Build time? None.</p>
<p>Test time? Almost instantaneous when using Verilator.</p>
<p>Analysis time? Typically very fast.</p>
</li>
<li>
<p><strong>Formal methods</strong>. Formal methods involve first <em>assuming</em> things about
your inputs, and then making <em>assertions</em> about how the design is supposed
to work. A solver can then be used to logically <em>prove</em> that if your
assumptions hold, then your assertions will as well. If the solver fails,
it will provide you with a very short trace illustrating what might happen.</p>
<p>You can read about <a href="/blog/2017/10/19/formal-intro.html">my own first experience with formal methods
here</a>, although that’s
no longer where I’d suggest you start. Were I to recommend a starting
place, it would probably be <a href="/tutorial/">my Verilog design
tutorial</a>.</p>
<p>Many of the bugs I mentioned in the <a href="https://www.youtube.com/watch?v=vSB9BcLcUhM">video design I’m working
with</a> <em>should’ve</em> been found
via formal methods. However, some of the key components didn’t get
formally verified. (Yes, that’s on me. This was supposed to be a
<em>prototype</em>…) The
<a href="https://github.com/ZipCPU/eth10g/blob/master/rtl/wbmarbiter.v">arbiter</a>,
however, had gone through a formal verification process. Sadly, at one point
I had placed an assumption into the design that there would never be any bus
errors. What do you know? That kept it from finding bus errors!
Likewise, the <a href="https://x.com/zipcpu/status/1852735323161207089">frame buffer’s proof never passed
induction</a>, so it
never completed a full bus request to see what would happen if the two got
out of sync. The excuses go on. I’m now working on formally verifying
these components.</p>
<p>In the case of the student above, he mentions using some formally verified
designs, but says nothing about whether or not he formally verified the LED
output of those designs.</p>
<p>Build time? For formal methods, this typically references how long it
takes to translate the design into a formal language of some type–such as
SMT. When using Yosys, the time it takes to do this is usually so quick I
don’t notice it.</p>
<p>Test time? <a href="/formal/2019/08/03/proof-duration.html">We measured formal proof solver time some time
ago</a>. Bottom
line, 87% of the time a formal proof will take less than two minutes, and
only 5% of the time will it ever take longer than ten minutes.</p>
<p>Analysis time? This tends to only take a minute or two. One of the
good things about formal proofs is that the solver will lead you directly
to the error.</p>
</li>
<li>
<p><strong>Simulation</strong>.</p>
<p>Simulation is a very important debugging tool. It’s one of the easiest
ways to find bugs. In general, if a design doesn’t work in simulation,
then it will never work in hardware.</p>
<p>However, simulation depends upon <em>models</em> of all of the components in
question–both those written in Verilog and those only available via
data sheet, for which Verilog (or other) models need to be written
and are thus only approximations. As a result, there are often gaps between how
the models work and what happens in reality.</p>
<p>A second reality of simulation is that it’s not complete. There will always
be cases that don’t get simulated. A good engineer will work to limit the
number of these cases, but it’s very hard to eliminate them entirely.
For example:</p>
<ul>
<li>Not simulating jumping to the last instruction in a cache line left me with <a href="/zipcpu/2017/12/28/ugliest-bug.html">quite a confusing mix of symptoms</a>.</li>
<li>Not simulating bus errors led to missing a bus lockup in the arbiter above.</li>
<li>Not simulating ACK dropping at the last beat in a series of requests led to the frame buffer perpetually resynchronizing.</li>
<li>Not simulating stalls and multiple outstanding requests led Xilinx to believe their AXI demo worked. (A bench sketch for randomizing such stalls follows Fig. 9 below.)</li>
</ul>
<p>Considering the <a href="https://www.youtube.com/watch?v=vSB9BcLcUhM">video processing
example</a> I’ve been discussing,
I’ll be the first (and proudest) to
declare that all of the video algorithms worked nicely in simulation.
Yes, they worked in simulation–they just didn’t work in hardware.
Why? My simulation didn’t include the MIG or the DDR3 SDRAM. Instead, I
had <em>approximated</em> their performance with a basic block RAM implementation.
This usually works for me, since I like to formally verify everything–only
I didn’t formally verify everything this time. The result were some bugs
that slipped through the cracks, and so among other things my simulation
never fully exercised the design. My simulation also didn’t include the
CPU, nor did it accurately have the same type and amount of memory as the
final design had. These were all problems with my simulation that kept me
from catching some of these last bugs.</p>
<p>While simulation is the “easiest” type of debugging, it does tend to be slow and
resource (i.e. memory and disk) consuming. Traces from my video tests are
often 200GB or larger. Indeed, this is one of the reasons why the simulation
doesn’t include either the MIG DDR3 SDRAM controller, the CPU, the
<a href="/blog/2019/03/27/qflexpress.html">flash</a>,
<a href="/zipcpu/2018/07/13/memories.html">block RAM</a>, or the
<a href="/blog/2019/07/17/crossbar.html">Wishbone crossbar</a>.</p>
<p>I would be very curious to know if the student who wrote me had fully
simulated his design–from ARM software to LED.</p>
<p>Build time? When using Verilator, I’ve seen this take up to a minute or
two for a large and complex design, although I rarely notice it.</p>
<p>Test time? The video simulations I’ve been running take about an hour or
so when using Verilator. A full ZipCPU test suite can take two hours using
Verilator, or about a week when using Icarus Verilog.</p>
<p>Test time gets annoying when using Vivado, since it doesn’t automatically
capture every signal from within the design as Verilator will. I
understand there’s a setting to make this happen, but … I haven’t found
it yet.</p>
<p>Analysis time? This tends to be longer than formal methods, since I
typically find myself tracing bugs through simulations of very large and
complex designs, and it takes a while to trace back from the evidence of the
bug to the actual bug itself. The worst examples of simulation analysis
I’ve had to do were of <a href="https://www.arasan.com/products/nand-flash/">NAND flash
simulations</a>, where you don’t
realize you have a problem until you read results from the flash. Then you
need to first find the evidence of the problem in the trace (expected
value doesn’t match actual value), then trace it from the AXI bus to the
flash read bus, across multiple flash transactions to the critical one
that actually programmed the block in question, back across the flash bus
to the host IP, and then potentially back further to the AXI transaction
that provided the information in the first place. While doable, this can
be quite painful.</p>
</li>
</ol>
<table align="center" style="float: center"><caption>Fig 9. Tracing from cause to effect can require a lot of investigation</caption><tr><td><img src="/img/not-axi/longsim.svg" width="760" /></td></tr></table>
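<p>Regarding the un-simulated stalls in the list above, the gap Xilinx’s demo
fell into: one cheap countermeasure is to randomize backpressure in the bench
itself, so that hiccups show up on every cycle boundary rather than never.
The following is only a rough sketch with invented names; it is not a model of
any particular bus, nor of any vendor’s IP.</p>
<pre><code class="language-verilog">// A hypothetical bench fragment: randomly withhold READY so the design under
// test sees back-pressure in places the happy path never exercises.
module tb_backpressure;
    reg  clk = 1'b0;
    reg  m_valid = 1'b0;
    reg  allow_ready = 1'b0;
    wire m_ready;

    always #5 clk = !clk;

    // Randomly deassert READY, even while VALID is high.  (AXI permits
    // this; a compliant master must simply hold VALID until READY.)
    always @(posedge clk)
        allow_ready <= (($random & 3) == 0);

    // In a real bench, this would gate the slave model's ready output
    assign m_ready = allow_ready;

    // ... drive m_valid from your stimulus here, and check that no request
    // is ever lost or duplicated across the injected stalls ...

    initial begin
        repeat (200) @(posedge clk);
        $finish;
    end
endmodule
</code></pre>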
<ol start="5">
<li>
<p><strong>Debug in hardware</strong>. Getting to hardware is painful–it requires building
a complete design, handling timing exceptions, and a typically long
synthesis process. Once you get there, tests can typically be run very
fast. However, such tests are often unrevealing. Trying something else
on hardware often requires a design change, rebuild, and … a substantial
stall in your process which will slow you down. In the case of this student,
he measured this stall time at 30min.</p>
<p>This <em>stall</em> time while things are rebuilding can make hardware debugging
slow and expensive. Why is it expensive? Because time is expensive. I
charge by the hour. I can do that. I’m not a student. Students on the
other hand are often overloaded for time. They have other projects to do,
and one class (or lab) consuming a majority of their time will quickly
become a serious problem on the road to graduation.</p>
<p>Knowing what’s wrong when things fail in
hardware is … difficult–else I wouldn’t be writing this note.</p>
<p>However, it’s a skill you need to have if you are going to work in this
field. How can you do it? You can use LEDs. You can use your UART. If
you are on an ARM based FPGA, you can often use printf. You can use a
companion CPU (PC), or even an on-board CPU (ARM or softcore). You can
use the ILA, or you can build your own (that’s me). In all cases, you
need to be able to extract the key information regarding the “bug” (whatever
it might be) from the design. That key information needs to point you to
the bug. Is it in Vivado generated IP? Is it in the Verilog? If it’s in
your Verilog, where is it? You need to be able to bisect your design
repeatedly to figure this out.</p>
<p>In the case of <a href="https://www.youtube.com/watch?v=vSB9BcLcUhM">the video project I’m working
on</a>, this is (currently) where
I’m at in my development.</p>
<p>In the case of the student above, I’d love to know whether <code class="language-plaintext highlighter-rouge">assign led=1;</code>
would work, if the LED control wire was mapped to the correct pin, or
if the LED’s control was inverted. Without more information, I might
never know. (A minimal LED-latching sketch along these lines follows this list.)</p>
<p>Build time? That is, how long does it take to turn the design Verilog
into a bit file? Typically I deal with build times of roughly 12-15 minutes.
The student above was dealing with a 30min build time. I’ve heard horror
stories of Vivado even taking as long as a day for particularly large
designs, but never had to deal with delays that long myself.</p>
<p>Test time? Most hardware tests take longer to set up than to perform, so
I’ll note this as “almost instantaneous.” Certainly my video tests tended
to be very quick.</p>
<p>Analysis time? “What just happened?” seems to be a common refrain in
hardware testing. Sure, you just ran a test, but … what really happened
in it? This is the problem with testing in hardware. It can take a lot
of work to get to the “success” or “failure” measure. In the video
processing case, video processing takes place on a pixel at a time at over
80M pixels per second, but the final “success” (once I got there) was
watching the effects of the video processing as applied to a 4 minute video.
Indeed, I was so excited (once I got there), that I called everyone from
my family to come and watch.</p>
</li>
</ol>
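<p>As promised above, here is roughly what I mean by bisecting with LEDs. This
is a sketch only: the module and its two event inputs are hypothetical
stand-ins for whatever signals sit on either side of the line you have drawn
through your design.</p>
<pre><code class="language-verilog">// A hypothetical bisecting aid: park known values on the board's LEDs so a
// failing test at least tells you which half of the design to suspect.
module led_bisect (
    input  wire       clk,
    input  wire       axi_write_seen, // e.g. AWVALID && AWREADY seen
    input  wire       user_logic_ack, // e.g. your own core raising BVALID
    output wire [3:0] led
);
    reg [1:0] sticky;
    initial   sticky = 2'b00;

    // Latch each event, so even a single-cycle pulse stays visible
    always @(posedge clk)
    begin
        if (axi_write_seen)
            sticky[0] <= 1'b1;
        if (user_logic_ack)
            sticky[1] <= 1'b1;
    end

    // LED[3] stuck at one and LED[2] stuck at zero double as the
    // "assign led=1" style check of pin mapping and polarity
    assign led = { 1'b1, 1'b0, sticky };
endmodule
</code></pre>
<p>If the two constant LEDs never show the expected one/zero pattern, the
problem lies in the pin constraint or the LED polarity, not in the AXI logic
at all.</p>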
<p>While I’d love to say one debugging method is better than another, the reality
is that they each have their strengths and weaknesses. Formal methods, for
example, don’t often work on medium to large designs. Lint tends to miss
things. You get the picture. Still, you need to be familiar with
every technique, to have them in your tool belt for when something doesn’t
work.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Again, the bottom line is that you need to know how to debug a design
to succeed in this field. This is a prerequisite for anything that might
follow–such as building an AXI slave. Perhaps a <a href="https://zipcpu.com/zipcpu/2019/02/04/debugging-that-cpu.html">fun
story</a> might
help illustrate my points.</p>
<p>You might also find the <a href="https://zipcpu.com/blog/2017/06/02/design-process.html">first article I wrote on this hardware debugging
topic</a> to be valuable.</p>
<p>Or how about <a href="https://zipcpu.com/blog/2017/06/10/lost-college-student.html">the response from a student who then commented on that article,
after struggling with these same
issues</a>?</p>
<p>In all of this, the hard reality remains:</p>
<ol>
<li>
<p>Hardware debugging is hard.</p>
</li>
<li>
<p>There is a methodology to it. I might even use the word “methodical”,
but that would be redundant.</p>
</li>
<li>
<p>You will need to learn that methodology to debug your design.</p>
</li>
<li>
<p>Once you understand the methodology of hardware debugging, you can then
debug any design–to include any AXI design.</p>
</li>
</ol>
<p>Hardware design isn’t for everybody. Not everyone will make it through
their learning process–be it college or self-taught. Yes, there are
<a href="https://reddit.com/r/FPGA">design communities</a> that would love to help
and encourage you. On the bright side, hard work pays well in any field.</p>
<hr /><p><em>Seest thou a man diligent in his business? He shall stand before kings; he shall not stand before mean men. (Prov 22:29)</em></p></description>
<pubDate>Wed, 06 Nov 2024 00:00:00 -0500</pubDate>
<link>https://zipcpu.com/blog/2024/11/06/not-axi.html</link>
<guid isPermaLink="true">https://zipcpu.com/blog/2024/11/06/not-axi.html</guid>
<category>blog</category>
</item>
<item>
<title>My Personal Journey in Verification</title>
<description><p>This week, I’ve been testing a CI/CD pipeline. This has been my opportunity
to shake the screws and kick the tires on what should become a new verification
product shortly.</p>
<p>I thought that a good design to check might be my
<a href="https://github.com/ZipCPU/sdsdpi">SDIO project</a>. It has roughly all the
pieces in place, and so makes sense for an automated testing pipeline.</p>
<p>This weekend, the CI project engineer shared with me:</p>
<blockquote>
<p>It’s literally the first time I get to know a good hardware project needs
such many verifications and testings! There’s even a real SD card
simulation model and RW test…</p>
</blockquote>
<p>After reminiscing about this for a bit, I thought it might be worth taking a
moment to tell how I got here.</p>
<h2 id="verification-the-goal">Verification: The Goal</h2>
<p>Perhaps the best way to explain the “goal” of verification is by way of an
old “war story”–as we used to call them.</p>
<p>At one time, I was involved with a DOD unit whose whole goal and purpose was
to build quick reaction hardware capabilities for the warfighter. We bragged
about our ability to respond to a call on a Friday night with a new product
shipped out on a C-130 before the weekend was over.</p>
<p>Anyone who has done engineering for a while will easily recognize that this
sort of concept violates all the good principles of engineering. There’s no
time for a requirements review. There’s no time for prototyping–or perhaps
there is, to the extent that it’s always the <em>prototype</em> that heads out the
door to the warfighter as if it were a <em>product</em>. There’s no time to build a
complete test suite, to verify the new capability against all things that could
go wrong. However, we’d often get only one chance to do this right.</p>
<p>Now, how do you accomplish quality engineering in that kind of environment?</p>
<p>The key to making this sort of shop work lay in the “warehouse”, and what
sort of capabilities we might have “lying on the shelf” as we called it.
Hence, we’d spend our time polishing prior capabilities, as well as
anticipating new requirements. We’d then spend our time building, verifying,
and testing these capabilities against phantom requirements, in the hopes that
they’d be close to what we’d need to build should a real requirement arise.
We’d then place these concept designs in the “warehouse”, and show them off
to anyone who came to visit wondering what it was that our team was able to
accomplish. Then, when a new requirement arose, we’d go into this “warehouse”
and find whatever capability was closest to what the customer required and
modify it to fit the mission requirement.</p>
<p>That was how we achieved success.</p>
<table align="center" style="float: right"><tr><td><img src="/img/vlog-wait/rule-of-gold.svg" width="320" /></td></tr></table>
<p>The same applies in digital logic design. You want to have a good set of
tried, trusted, and true components in your “library” so that whenever a new
customer comes along, you can leverage these components quickly to meet his
needs. This is why I’ve often said that well-written, well-tested,
well-verified design components are gold in this business. Such components
allow you to go from zero to product in short order. Indeed, the more
well-tested components you have that you can
<a href="/blog/2020/01/13/reuse.html">reuse</a>, the faster you’ll get
to market with any new need, and the less it will cost you to get there.</p>
<p>That’s therefore the ultimate goal: a library of
<a href="/blog/2020/01/13/reuse.html">reusable</a>
components that can be quickly composed into new products for customers.</p>
<p>As I’ve tried to achieve this objective over the years, my approach to
component verification has changed, or rather grown, many times over.</p>
<h2 id="hardware-verification">Hardware Verification</h2>
<p>When I first started learning FPGA design, I understood nothing about
simulation. Rather than learning how to do simulation properly, I instead
learned quickly how to test my designs in hardware. Most of these designs
were DSP based. (My background was DSP, so this made sense …) Hence,
the following approach tended to work for me:</p>
<ul>
<li>
<p>I created access points in the hardware that allowed me to read and write
registers at key locations within the design.</p>
</li>
<li>
<p>One of these “registers” I could write to controlled the inputs to my DSP
pipeline.</p>
</li>
<li>
<p>Another register, when written to, would cause the design to “step” the
entire DSP pipeline as if a new sample had just arrived from the A/D.</p>
</li>
<li>
<p>A set of registers within the design then allowed me to read the state of
the entire pipeline, so I could do debugging.</p>
</li>
</ul>
<p>This worked great for “stepping” through designs. When I moved to processing
real-time information, such as the A/D results from the antenna connected to
the design, I built an internal logic analyzer to catch and capture key
signals along the way.</p>
<p>I called this “Hardware in the loop testing”.</p>
<p>Management thought I was a genius.</p>
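<p>For illustration, here is a stripped-down sketch of what one of those
“step” registers amounted to. The Wishbone-flavored port names and the
two-register address map below are invented for this example; the point is
simply that one bus write queues the next sample, and a second write issues a
single clock enable to the pipeline.</p>
<pre><code class="language-verilog">// A hypothetical "step" control: write address 0 to load the next sample,
// write address 1 to advance the DSP pipeline by exactly one sample.
module dsp_stepper #(
    parameter DW = 16
) (
    input  wire          i_clk,
    input  wire          i_wb_stb, i_wb_we,
    input  wire          i_wb_addr,  // 0 = sample register, 1 = step command
    input  wire [DW-1:0] i_wb_data,
    output reg  [DW-1:0] o_sample,   // feeds the head of the DSP pipeline
    output reg           o_ce        // single-cycle clock enable
);
    initial o_ce = 1'b0;
    always @(posedge i_clk)
    begin
        o_ce <= 1'b0;
        if (i_wb_stb && i_wb_we)
        begin
            if (i_wb_addr == 1'b0)
                o_sample <= i_wb_data;  // queue the next input sample
            else
                o_ce <= 1'b1;           // step the pipeline once
        end
    end
endmodule
</code></pre>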
<p>This approach worked … for a while. Then I started realizing how painful it
was. I think the transition came when I was trying to debug
<a href="/2018/10/02/fft.html">my FFT</a> by writing test vectors to
an Arty A7 circuit board via UART, and reading the results back to display
them on my screen. Even with the hardware in the loop, hitting all the test
vectors was painfully slow.</p>
<p>Eventually, I had to search for a new and better solution. This was just too
slow. Later on, I would start to realize that this solution didn’t catch
enough bugs–but I’ll get to that in a bit.</p>
<h2 id="happy-path-simulation-testing">Happy Path Simulation Testing</h2>
<p><a href="https://en.wikipedia.org/wiki/Happy_path">“Happy path” testing</a>
is a reference to simply testing working paths
through a project’s environment. To use an aviation analogy, a <a href="https://en.wikipedia.org/wiki/Happy_path">“happy path”
test</a>
might make sure the ground avoidance radar never alerted when you
weren’t close to the ground. It doesn’t make certain that the radar
necessarily does the right thing when you are close to the ground.</p>
<p>So, let’s talk about my next project: the
<a href="/about/zipcpu.html">ZipCPU</a>.</p>
<p>Verification of the <a href="/about/zipcpu.html">CPU</a>
began with an <a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/bench/asm/simtest.s">assembly
program</a>
the <a href="/about/zipcpu.html">ZipCPU</a> would run. The
<a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/bench/asm/simtest.s">program</a>
was designed to test all the instructions of the
<a href="/about/zipcpu.html">CPU</a>
with sufficient fidelity to know when/if the
<a href="/about/zipcpu.html">CPU</a> worked.</p>
<p>The test had one of two outcomes. If the program halted, then the test was
considered a success. If it detected an error, the
<a href="/about/zipcpu.html">CPU</a> would execute a
<code class="language-plaintext highlighter-rouge">BUSY</code> instruction (i.e. jump to current address) and then perpetually loop.
My test harness could then detect this condition and end with a failing exit
code.</p>
<p>When the <a href="/about/zipcpu.html">ZipCPU</a> acquired a software
tool chain (GCC+Binutils) and C-library support, this <a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/bench/asm/simtest.s">assembly
program</a>
was abandoned and replaced with a <a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/sim/zipsw/cputest.c">similar program in
C</a>.
While I still use <a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/sim/zipsw/cputest.c">this
program</a>,
it’s no longer the core of the <a href="/about/zipcpu.html">ZipCPU</a>’s
verification suite. Instead, I tend to use it to shake out any bugs in any
new environment the <a href="/about/zipcpu.html">ZipCPU</a> might be
placed into.</p>
<p>This approach failed horribly, however, when I tried integrating an <a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/rtl/core/pfcache.v">instruction
cache</a>
into the <a href="/about/zipcpu.html">ZipCPU</a>. I built the
<a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/rtl/core/pfcache.v">instruction
cache</a>.
I tested the <a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/rtl/core/pfcache.v">instruction
cache</a>
in isolation. I tested the
<a href="https://github.com/ZipCPU/zipcpu/blob/a20b6064ea794d66fdeb2e00929287d7f2dc9ac6/rtl/core/pfcache.v">cache</a>
as part of the
<a href="/about/zipcpu.html">CPU</a>. I convinced myself that it worked.
Then I placed my “working” design onto hardware and <a href="/zipcpu/2017/12/28/ugliest-bug.html">all
hell broke loose</a>.</p>
<p>This was certainly not “the way.”</p>
<h2 id="formal-verification">Formal Verification</h2>
<p>I was then asked to <a href="/blog/2017/10/19/formal-intro.html">review a new, open source, formal verification tool called
SymbiYosys</a>. The tool
handed my cocky attitude back to me, and took my pride down a couple steps. In
particular, I found a bunch of bugs in a FIFO I had used for years. The bugs
had never shown up in hardware testing (that I had noticed at least), and
certainly hadn’t shown up in any of my <a href="https://en.wikipedia.org/wiki/Happy_path">“Happy path”
testing</a>. This left me wondering,
how many other bugs did I have in my designs that I didn’t know about?</p>
<p>I then started <a href="/blog/2018/01/22/formal-progress.html">working through my previous projects, formally verifying all my
prior work</a>. In every
case, I found more bugs. By the time I got to the
<a href="/about/zipcpu.html">ZipCPU</a>–<a href="/blog/2018/04/02/formal-cpu-bugs.html">I found a myriad of bugs
in what I thought was a “working”</a>
<a href="/about/zipcpu.html">CPU</a>.</p>
<p>I’d like to say that the quality of my IP went up at this point. I was
certainly finding a lot of bugs I’d never found before by using formal methods.
I now knew, for example, how to guarantee I’d never have any more of those
cache bugs I’d had before.</p>
<p>So, while it is likely that my IP quality was going up, the unfortunate
reality was that I was still finding bugs in my “formally verified”
IP–although not nearly as many.</p>
<p>A <a href="/formal/2020/06/12/four-keys.html">couple of improvements</a>
helped me move forward here.</p>
<ol>
<li>
<p>Bidirectional formal property sets</p>
<p>The biggest danger in formal verification is that you might <code class="language-plaintext highlighter-rouge">assume()</code>
something that isn’t true. The first way to limit this is to make
sure you never <code class="language-plaintext highlighter-rouge">assume()</code> a property within the design, but rather you
only <code class="language-plaintext highlighter-rouge">assume()</code> properties of inputs–never outputs, and never local
registers.</p>
<p>But how do you know when you’ve assumed too much? This can be a challenge.</p>
<p>One of the best ways I’ve found to do this is to create a bidirectional
property set. A bus master, for example, would make assumptions about
how the slave would respond. A similar property set for the bus slave
would make assumptions about what the master would do. Further, the slave
would turn the master’s assumptions into verifiable assertions–guaranteeing
that the master’s assumptions were valid. If you can use the same property
set for both master and slave, swapping assumptions and assertions between
the two, then you can verify each in isolation while only ever assuming
things that get asserted, and hence verified, on the other side. (A minimal
sketch of this swap follows this list.)</p>
<p>Creating such property sets for both AXI-Lite and AXI led me to find
many bugs in Xilinx IP. This alone suggested that I was on the “right path”.</p>
</li>
<li>
<p>Cover checking</p>
<p>I also learned to use <a href="/formal/2018/07/14/dev-cycle.html">formal coverage
checking</a>, in
addition to straight assertion
based verification. Cover checks weren’t the end all, but they could
be useful in some key situations. For example, a quick cover check might
help you discover that you had gotten the reset polarity wrong, and so
all of your formal assertions were passing because your design was assumed
to be held in reset. (This has happened to me more than once. Most
recently, the <a href="/blog/2024/06/13/kimos.html">cost was a couple of months
delay</a> on what should’ve
otherwise been a straightforward hardware bringup–but that wasn’t really
a <em>formal</em> verification issue.)</p>
<p>For a while, I also <a href="/formal/2018/07/14/dev-cycle.html">used cover checking to quickly discover (with minimal
work) how a design component might work within a larger
environment</a>. I’ve
since switched to simulation checking (with assertions enabled) for my
most recent examples of this type of work, but I do still find it valuable.</p>
</li>
<li>
<p><a href="/blog/2018/03/10/induction-exercise.html">Induction</a></p>
<p><a href="/blog/2018/03/10/induction-exercise.html">Induction</a> isn’t
really a “new” thing I learned along the way, but it is worth mentioning
specially. As I learned formal verification, I learned to use
<a href="/blog/2018/03/10/induction-exercise.html">induction</a>
right from the start and so I’ve tended to use
<a href="/blog/2018/03/10/induction-exercise.html">induction</a>
in every proof I’ve ever done. It’s just become my normal practice from day
one.</p>
<p><a href="/blog/2018/03/10/induction-exercise.html">Induction</a>,
however, takes a lot of work. Sometimes it takes so much work I wonder
if there’s really any value in it. Then I tend to find some key bug or
other–perhaps a buffer overflow or something–some bug I’d have never found
without
<a href="/blog/2018/03/10/induction-exercise.html">induction</a>.
That alone keeps me running
<a href="/blog/2018/03/10/induction-exercise.html">induction</a>.
every time I can. Even better, once the
<a href="/blog/2018/03/10/induction-exercise.html">induction</a>.
proof is complete, you can often <a href="/formal/2019/08/03/proof-duration.html">trim the entire formal proof down from
15-20 minutes down to less than a single
minute</a>.</p>
</li>
<li>
<p>Contract checking</p>
<p>My initial formal proofs were haphazard. I’d throw assertions at the wall
and see what I could find. Yes, I found bugs. However, I never really had
the confidence that I was “proving” a design worked. That is, not until I
learned of the idea of a “formal contract”. The “formal contract” simply
describes the essence of how a component works.</p>
<p>For example, in a memory system, the formal contract might have the solver
track a single value of memory. When written to, the value should change.
When read, the value should be returned. If this contract holds for all such
memory addresses, then the memory acts (as you would expect) … like a
<em>memory</em>.</p>
</li>
<li>
<p>Parameter checks</p>
<p>For a while, I was maintaining <a href="https://github.com/ZipCPU/zbasic">“ZBasic”–a basic ZipCPU
distribution</a>. This was where I did all
my simulation based testing of the
<a href="/about/zipcpu.html">ZipCPU</a>. The problem was, this
approach didn’t work. Sure, I’d test the
<a href="/about/zipcpu.html">CPU</a> in one configuration, get it
to work, and then put it down believing the
“<a href="/about/zipcpu.html">CPU</a>” worked. Some time later,
I’d try the <a href="/about/zipcpu.html">CPU</a> in a different
configuration–such as pipelined vs non-pipelined, and … it
would fail in whatever mode it had not been tested in. The problem with the
<a href="https://github.com/ZipCPU/zbasic">ZBasic approach</a> is that it tended to only
check one mode–leaving all of the others unchecked.</p>
<p>This led me to adjust the proofs of the
<a href="/about/zipcpu.html">ZipCPU</a> so that the
<a href="/about/zipcpu.html">CPU</a> would at least be formally
verified with as many parameter configurations as I could to make sure it
would work in all environments.</p>
</li>
</ol>
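<p>As a small illustration of the first item above, the swap can be as simple
as a parameter selecting between <code class="language-plaintext highlighter-rouge">assume()</code> and
<code class="language-plaintext highlighter-rouge">assert()</code> for each rule. The module below is a sketch written for
this article and covers exactly one AXI-Lite rule; a real property file
carries far more rules than this.</p>
<pre><code class="language-verilog">// A minimal "bidirectional" property: the same rule is an assumption when
// checking one side of the bus, and an assertion when checking the other.
module f_axil_awvalid #(
    // 1'b1 when bound into a slave proof (the master is the environment),
    // 1'b0 when bound into a master proof
    parameter [0:0] F_OPT_SLAVE = 1'b1
) (
    input wire S_AXI_ACLK, S_AXI_ARESETN,
    input wire S_AXI_AWVALID, S_AXI_AWREADY
);
`ifdef FORMAL
    reg f_past_valid;
    initial f_past_valid = 1'b0;
    always @(posedge S_AXI_ACLK)
        f_past_valid <= 1'b1;

    // Rule: once raised, AWVALID must hold until AWREADY accepts it
    always @(posedge S_AXI_ACLK)
    if (f_past_valid && S_AXI_ARESETN && $past(S_AXI_ARESETN)
            && $past(S_AXI_AWVALID && !S_AXI_AWREADY))
    begin
        if (F_OPT_SLAVE)
            // The environment (the master) drives AWVALID: assume it behaves
            assume(S_AXI_AWVALID);
        else
            // Verifying the master itself: the very same rule is asserted
            assert(S_AXI_AWVALID);
    end
`endif
endmodule
</code></pre>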
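<p>And, as promised under “Contract checking” above, here is what such a
contract might look like when bolted onto the simplest possible memory.
Again, this is a toy written for this article rather than a property set
lifted from any of my actual cores.</p>
<pre><code class="language-verilog">// A toy synchronous RAM carrying the "formal contract" described above:
// pick one arbitrary-but-fixed address, shadow whatever was last written
// there, and insist the memory never disagrees with that shadow.
module contract_ram #(
    parameter AW = 10,
    parameter DW = 32
) (
    input  wire          i_clk, i_we,
    input  wire [AW-1:0] i_addr,
    input  wire [DW-1:0] i_data,
    output reg  [DW-1:0] o_data
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    always @(posedge i_clk)
    begin
        if (i_we)
            mem[i_addr] <= i_data;
        o_data <= mem[i_addr];
    end

`ifdef FORMAL
    // One arbitrary, but fixed, address to watch
    (* anyconst *) wire [AW-1:0] f_addr;
    reg [DW-1:0] f_value;
    reg          f_valid, f_past_valid;

    initial { f_valid, f_past_valid } = 2'b00;
    always @(posedge i_clk)
        f_past_valid <= 1'b1;

    // Shadow whatever was last written to that address
    always @(posedge i_clk)
    if (i_we && i_addr == f_addr)
    begin
        f_value <= i_data;
        f_valid <= 1'b1;
    end

    // The contract: the memory must never disagree with the shadow ...
    always @(*)
    if (f_valid)
        assert(mem[f_addr] == f_value);

    // ... and any (non-colliding) read of that address must return it
    always @(posedge i_clk)
    if (f_past_valid && f_valid
            && $past(f_valid && !i_we && i_addr == f_addr))
        assert(o_data == f_value);
`endif
endmodule
</code></pre>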
<p>I’ve written more about <a href="/formal/2020/06/12/four-keys.html">these parts of a proof some time
ago</a>, and I still stand
by them today.</p>
<p>Yes, formal verification is hard work. However, a well verified design is
highly valuable–on the shelf, waiting for that new customer requirement to
come in.</p>
<p>The problem with all this formal verification work lies in its (well known)
Achilles heel. Because formal verification includes an exhaustive
combinatorial search for bugs across all potential design inputs and states,
it can be computationally expensive. Yeah, it can take a while. To reduce
this expense, it’s important to limit the scope of what is verified. As a
result, I tend to verify design <em>components</em> rather than entire designs. This
leaves open the possibility of a failure in the logic used to connect all
these smaller, verified components together.</p>
<h2 id="autofpga-and-better-crossbars">AutoFPGA and Better Crossbars</h2>
<p>Sure enough, the next class of bugs I had to deal with was integration bugs.</p>
<p>I had to deal with several. Common bugs included:</p>
<ol>
<li>
<p>Using unnamed ports, and connecting module ports to the wrong signals.</p>
<p>At one point, I decided the
<a href="/zipcpu/2017/11/07/wb-formal.html">Wishbone</a>
“stall” port should come before the
<a href="/zipcpu/2017/11/07/wb-formal.html">Wishbone</a>
acknowledgment port. Now, how many designs had to change to accommodate
that?</p>
</li>
<li>
<p>I had a bunch of problems with my <a href="/blog/2017/06/22/simple-wb-interconnect.html">initial interconnect
design</a>
methodology. Initially, I used the slave’s
<a href="/zipcpu/2017/11/07/wb-formal.html">Wishbone</a>
strobe signal as an address decoding signal. I then had a bug where the
address would move off of the slave of interest, and the acknowledgment
was never returned. The result of that bug was that the design hung any
time I tried to read the entirety of <a href="/blog/2019/03/27/qflexpress.html">flash
memory</a>.</p>
<p>Think about how much simulation time and effort I had to go through to
simulate reading an <em>entire</em> <a href="/blog/2019/03/27/qflexpress.html">flash
memory</a>–just to find
this bug at the end of it. Yes, it was painful.</p>
</li>
</ol>
<p>Basically, when connecting otherwise “verified” modules together by hand,
I had problems where the result didn’t reliably work.</p>
<p>The first and most obvious solution to something like this is to use a linting
tool, such as <code class="language-plaintext highlighter-rouge">verilator -Wall</code>.
<a href="https://www.veripool.org/verilator/">Verilator</a> can find things like
unconnected pins and such. That’s a help, but I had been doing that from
early on.</p>
<p>My eventual solution was twofold. First, I redesigned my <a href="/blog/2019/07/17/crossbar.html">bus
interconnect</a> from the
top to the bottom. You can find the new and redesigned
<a href="/blog/2019/07/17/crossbar.html">interconnect</a> components
in my <a href="https://github.com/ZipCPU/wb2axip">wb2axip repository</a>. Once these
components were verified, I then had a proper guarantee: all masters would get
acknowledgments (or errors) from all slave requests they made. Errors would
no longer be lost. Attempts to interact with a non-existent slave would
(properly) return bus errors.</p>
<p>To deal with problems where signals were connected incorrectly, I built a tool
I call <a href="/zipcpu/2017/10/05/autofpga-intro.html">AutoFPGA</a> to
connect components into designs. A special tag given to the tool would
immediately connect all bus signals to a bus component–whether it be a slave
or master, whether it be connected to a
<a href="/zipcpu/2017/11/07/wb-formal.html">Wishbone</a>,
<a href="/formal/2018/12/28/axilite.html">AXI-Lite</a>, or
<a href="/formal/2019/05/13/axifull">AXI</a> bus. This required that my
slaves follow one of two conventions. Either all the bus ports had to
follow a basic port ordering convention, or they needed to follow a bus
naming convention. Ideally, a slave should follow both. Further, after
finding even more port connection bugs, I’m slowly moving towards the practice
of naming all of my port connections.</p>
<p>This works great for composing designs of bus components. Almost all of my
designs now use this approach, and only a few (mostly test bench) designs
remain where I connect bus components by hand.</p>
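<p>For what it’s worth, the cost of that last practice is tiny. The fragment
below uses an invented <code class="language-plaintext highlighter-rouge">wbslave</code> module and signal names purely
for illustration; the named form survives a port reordering, the positional
form does not.</p>
<pre><code class="language-verilog">// Inside some larger design, with "wbslave" and the wb_* wires standing in
// for whatever module and signals you actually have:

// Positional connections: these silently break the moment the module's
// port order changes
wbslave u_bad  (i_clk, i_reset, wb_cyc, wb_stb, wb_we, wb_addr, wb_data);

// Named connections: every wire lands on the port it was meant for,
// regardless of port order
wbslave u_good (
    .i_clk(i_clk), .i_reset(i_reset),
    .i_wb_cyc(wb_cyc), .i_wb_stb(wb_stb), .i_wb_we(wb_we),
    .i_wb_addr(wb_addr), .i_wb_data(wb_data)
);
</code></pre>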
<h2 id="mcy">MCY</h2>
<p>At one time along the way, I was asked to review <a href="https://github.com/YosysHQ/mcy">MCY: Mutation Coverage with
Yosys</a>. My review back to the team was …
mixed.</p>
<p><a href="https://github.com/YosysHQ/mcy">MCY</a>
works by intentionally breaking your design. Such changes to the design are
called “mutations”. The goal is to determine whether or not the mutated
(broken) design will trigger a test failure. In this fashion, the test suite
can be evaluated. A “good” test suite will be able to find any mutation.
Hence, <a href="https://github.com/YosysHQ/mcy">MCY</a>
allows you to measure how good your test suite is in the first place.</p>
<p>Upon request, I tried <a href="https://github.com/YosysHQ/mcy">MCY</a> with the
<a href="/about/zipcpu.html">ZipCPU</a>. This turned into a bigger
challenge than I had expected. Sure, <a href="https://github.com/YosysHQ/mcy">MCY</a>
works with <a href="https://github.com/steveicarus/iverilog">Icarus Verilog</a>,
<a href="https://www.veripool.org/verilator/">Verilator</a>, and even (perhaps) some other
(not so open) simulators as well. However, when I ran a design under
<a href="https://github.com/YosysHQ/mcy">MCY</a>, my simulations tended to find only a