-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gowin BSRAM inference -- We need B suffix (such as DPB instead of current DP) #4098
Comments
A patch that purely mechanically corrects memory synthesis. Now there will be less swearing when creating images, but this should not make them work. A good image is with direct use of the DPB primitive. prim-good.mp4Not a very good image - when we infer. infer-bad.mp4And now a little of my stupid reasoning as a newbie. DPB mem(
.DOB({dummy, read_data}),
.DIA({{8{gnd}}, write_data}),
.DIB({16{gnd}}),
.ADA({write_ad, gnd, gnd, gnd}),
.ADB({ read_ad, gnd, gnd, gnd}),
.CLKA(write_clk),
.CLKB(clk),
.OCEA(vcc),
.OCEB(vcc),
.CEA(write_ce),
.CEB(vcc),
.WREA(vcc),
.WREB(read_wre),
.BLKSELA(3'b000),
.BLKSELB(3'b000),
.RESETA(write_reset),
.RESETB(reset)
);
defparam mem.READ_MODE0 = 1'b1;
defparam mem.READ_MODE1 = 1'b1;
defparam mem.WRITE_MODE0 = 2'b01;
defparam mem.WRITE_MODE1 = 2'b01;
defparam mem.BIT_WIDTH_0 = 8;
defparam mem.BIT_WIDTH_1 = 8;
defparam mem.BLK_SEL_0 = 3'b000;
defparam mem.BLK_SEL_1 = 3'b000;
defparam mem.RESET_MODE = "SYNC"; In the case of inferring, yosys selects the primitive, parameters and used ports based on an analysis of memory usage. The Gowin documentation has a list of boilerplate pieces of code that lead to the generation of certain primitives. This is SUG550-1.6E_GowinSynthesis User Guide.pdf Here's a typical snippet from there: Instructions on how to select a primitive during inferring are in the files: techlibs/gowin/brams.txt and techlibs/gowin/brams_map.v You can see what yosys chose for each case in the logs (with the -g switch) and in the output .json file after synthesis. This is how I tried to get yosys to generate DPB for the goblin faces above. (* ram_style = "block" *) reg [7:0] mem[10:0];
always @(posedge clk or posedge reset) begin
if (reset) begin
read_data <= 0;
end else begin
read_data <= mem[read_ad];
end
end
always @(posedge write_clk) begin
if (write_ce) begin
mem[write_ad] <= write_data;
end
end That's what yosys did: "vmem.mem.0.0": {
"hide_name": 0,
"type": "DPX9B",
"parameters": {
"BIT_WIDTH_0": "00000000000000000000000000001001",
"BIT_WIDTH_1": "00000000000000000000000000001001",
"BLK_SEL_0": "000",
"BLK_SEL_1": "000",
"READ_MODE0": "0",
"READ_MODE1": "0",
"RESET_MODE": "ASYNC",
"WRITE_MODE0": "00000000000000000000000000000010",
"WRITE_MODE1": "00000000000000000000000000000010"
},
"attributes": {
"module_not_derived": "00000000000000000000000000000001",
"src": "/usr/local/bin/../share/yosys/gowin/brams_map.v:282.4-303.3"
},
"port_directions": {
"ADA": "input",
"ADB": "input",
"BLKSELA": "input",
"BLKSELB": "input",
"CEA": "input",
"CEB": "input",
"CLKA": "input",
"CLKB": "input",
"DIA": "input",
"DIB": "input",
"DOA": "output",
"DOB": "output",
"OCEA": "input",
"OCEB": "input",
"RESETA": "input",
"RESETB": "input",
"WREA": "input",
"WREB": "input"
},
"connections": {
"ADA": [ 45, 45, 45, 804, 805, 806, 807, 45, 45, 45, 45, 45, 45, 45 ],
"ADB": [ 45, 45, 45, 885, 884, 883, 882, 45, 45, 45, 45, 45, 45, 45 ],
"BLKSELA": [ 45, 45, 45 ],
"BLKSELB": [ 45, 45, 45 ],
"CEA": [ 66 ],
"CEB": [ 66 ],
"CLKA": [ 880 ],
"CLKB": [ 30 ],
"DIA": [ 886, 887, 888, 889, 890, 891, 892, 893, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 ],
"DIB": [ 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 ],
"DOA": [ 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911 ],
"DOB": [ 761, 728, 694, 660, 425, 794, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923 ],
"OCEA": [ 45 ],
"OCEB": [ 45 ],
"RESETA": [ 45 ],
"RESETB": [ 196 ],
"WREA": [ 66 ],
"WREB": [ 45 ]
}
}, Considering that wires 45 are GND, and 66 are VCC, it is already clear why there is garbage on the screen (partially): the WREA and CEA signals are always VCC, which means writing is always going on, although I clearly indicated that writing needs to be done only at certain moments. And that’s not all that is not entirely correct with the parameters and ports. To summarize: if for a guru it’s immediately obvious what little things need to be corrected for proper synthesis, then for me it’s a dark forest. Dual Port is too complex for inferring to understand and I plan to start experimenting with ROM and gradually move towards more complex options. And it doesn’t help matters that BSRAM in Tangnano9k may behave differently than in the documentation. |
The bit order (reading from the "connection" entry) seems to be reversed in the inferred version. Could this be part of the problem? |
Thank you, at this stage any advice is useful to me. "BLKSELA": [ 39, 39, 39 ],
"BLKSELB": [ 39, 39, 39 ],
"CEA": [ 66 ],
"CEB": [ 66 ],
"CLKA": [ 843 ],
"CLKB": [ 30 ],
"DIA": [ 904, 905, 906, 907, 908, 909, 910, 911, 39, 39, 39, 39, 39, 39, 39, 39 ],
"DIB": [ 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39 ],
"DOB": [ 724, 691, 657, 623, 392, 757, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921 ],
"OCEA": [ 66 ],
"OCEB": [ 66 ], However, your observation made me reconsider my examples and I found a couple of errors. Thanks! |
I'm happy to hear that, and glad to contribute however I can. I'll keep looking to see if I observe any other differences. Is there any chance you could share your examples? |
This is just a simplified example from the apicula distribution: https://github.com/YosysHQ/apicula/tree/master/examples/himbaechel Here are two directories that are copies except for the DPB-video-ram.v file. The build goes like this: make -f Makefile.himbaechel bsram-DPB-tangnano9k.fs And yes, I know that I use an 8-bit primitive, and yosys makes 9-bit, but at the moment this is unimportant: as long as inferring ignores CE and WRE for writing, there will be garbage on the screen. |
Something interesting. I wonder what I wrote incorrectly in the top-level code that yosys decided that 4 bits for the address is enough? @Seyviour You have all the source codes, can you tell me where I managed to turn the address [10:0] into [3:0]? |
Thank you for sharing! Would a Yosys build from source (without the patch) work as well? |
As I understand it, since the array has 11 elements ([10:0]), only 4 bits are required to represent any index into the array (log2(11)). For a memory with a 11-bit address, the declaration would be something like: |
I told that I'm very bad at verilog :) And thanks to @Seyviour we have working memory!!! inferr-02.mp4It works like crap, but still! Here is a reference video from a manual primitive inferr-03.mp4 |
This is amazing, @yrabbit. Thank you very much! I'm currently trying to figure out SDRAM for the GW2A chips; do you have any tips on how I could go about getting that to work? |
Well, the happy ending is still far away - the picture should be the same as that of a hand-made primitive. As for SDRAM, sorry, I've never done this. Is this something external to the FPGA? Not primitive? |
... this is a unique trait of Gowin devices which have SDRAM chiplet connected to FPGA die within package (SIP). This SDRAM is essentially external to FPGA and not a primitive. However, it is also "loosely internal", because it is consuming FPGA pins which are therefore not externally visible. Those pins in Gowin proprietary tools don't have pin numbers. They have specific pin names which tool recognizes as "internal connections" to then, under the hood, translate to the actual FPGA pin number. RTL tapping into this in-the-package SDRAM has to use the exact pin names that Gowin tool recognizes. For back-and-forth RTL porting, opensource tools should adhere to the same naming convention. For more, see: https://github.com/nand2mario/nestang/blob/master/src/sdram.v |
For the preassigned SDRAM pin names, see: https://github.com/nand2mario/nestang/blob/master/src/nestang_top.sv
|
If we slightly adjust the JSON that is obtained at the output of the inferring version, namely, turn on reading mode 1 and enable the output of the output register built into the memory by applying VCC to the OCEB input, then we will get a perfect match of the pictures. work-infer.mp4 |
I noticed one more thing - the infer and primitive versions start differently - the primitive normally handles the initial memory filling, while the infer version has a black screen. |
Infer version start: start-infer.mp4Infer version with READ_MODE=1 start: start-infer-dff.mp4The lower video especially shows how the horizontal resolution improves dramatically when the version with READ_MODE=1 is loaded and launched into the FPGA. It seems that with READ_MODE=0 we somehow manage to use one byte from memory to output two adjacent points, while with READ_MODE=1 each point uses its own byte from memory. |
Thank you, @chili-chips for your comments on the SDRAM. |
Ah. I also noticed that the pixels were getting repeated along the horizontal but I thought it was the writes that were failing because the instantiated version used the clock enable of the BRAM, unlike the version inferred by Yosys. Evidently, I was wrong about that :) I'll look at the timing diagram from the Gowin manual to see if I find anything that might be useful w.r.t the read modes. @yrabbit, it seems we don't have an equivalent of the |
Well, don’t bother with that - here’s the readmemh version. It’s more interesting to understand what’s happening with the output signal :) |
That's right. We recommend diving into the depths of the final bit-file for this example: |
I was in a hurry. Apparently I'm still doing something wrong. This is how I initialize the memory: (* ram_style = "block" *) reg [7:0] mem[2**11-1:0];
always @(posedge clk or posedge reset) begin
if (reset) begin
read_data <= 0;
end else begin
read_data <= mem[read_ad];
end
end
always @(posedge write_clk) begin
if (write_ce) begin
mem[write_ad] <= write_data;
end
end
initial begin
$readmemh("img0.hex", mem);
end Here is the very beginning of the data file:
Here is the entire data file: And this is what happens in JSON after synthesis: "vmem.mem.0.0": {
"hide_name": 0,
"type": "SPX9",
"parameters": {
"BIT_WIDTH": "00000000000000000000000000001001",
"BLK_SEL": "000",
"INIT_RAM_00": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110100000010100000000100",
"INIT_RAM_01": "000001001000001001000001001000001010000001010000001011000001100000001100000001101000001101000001110000001110000001110000001110000001111000001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", Of course, yosys chose a 9-bit primitive, but even the 9-bit initialization line looks strange: already from the third byte there was a discrepancy with the data file:
How can 34 come out of 54? I messed something up. |
Changed the initial memory initialization bytes to
Result on the address inputs of the primitive and display pins (! this is important) in the READ_MODE=1 option The Clock, the least significant three bits of the address, and all the green channel bits are shown. Why is it important that the data is taken from the display pins - between the memory outputs and the display pins there are several LUTs and MUXes (which, for now, we will assume that they do not introduce delay) and one DFF, which can have some effect. And I made a repository so as not to constantly pack these files: https://github.com/yrabbit/bram-infer |
Given the impact of the register (as dictated by the read_mode), perhaps we should consider that the design somehow doesn't meet timing when READ_MODE is 0. Maybe you can check how the instantiated version performs when it has READ_MODE=0 🤔 |
I'll try to rework the address generation so that it happens on the preceding posedge rather than a negedge to see if that helps. |
This is what I propose. Forgive the weird naming and formatting here :)
I'll create a PR on the repo you created |
I'm waiting:) |
Well, I play with it exclusively - I don’t touch the version with the primitive. |
Done :) |
Ohhh. I thought it would be a good idea to change the version with the primitive to READ_MODE=0 to see how it performs then. |
I must have got something wrong. I'll double check and send an update. |
I think I found my mistake:
|
I've submitted a new PR |
READ_MODE=0 PR-2.mp4 |
READ_MODE=1 PR-2-1.mp4 |
I recognize by the eyes when there are byte gaps and when there are not: if the eyes consist of three horizontal dots, then there is no gap :) And it looks like you removed the check that prohibited drawing the goblin outside the central square, now his ghosts are everywhere :) |
Oh my. Did I affect the color of the goblins too? :) I'll keep looking to see what I got wrong. |
@yrabbit. This might be worth trying: With READ_MODE=0, left shift or right shift the address by 1-bit to see what happens. Probably, things get worse, but there's a small chance we observe something interesting. |
Of course it’s the same: I change READ_MODE directly in JSON after synthesis, so routing and so on are exactly the same ;)
Yeah, I think that it is not processed anywhere, but is simply set as 0 once and for all. yosys/techlibs/gowin/brams_map.v Line 349 in 242ae4e
|
Yes, it may be something related to the input lines, today I will read how this primitive is encoded into bits. I couldn’t make a cardinal mistake - in this case, not a single primitive would work at all - but I could have missed some rare feature. |
Congratulations!!! Now no more half-eyed goblins :) And thanks for the shout-out on the PR 🙏 I think there might be one more thing to consider:
|
Fixed by #4137 |
Well, it doesn't work like that. I made a repository with an example for Gowin IDE (https://github.com/yrabbit/gw-bsram-infer). It uses SDPB with READ_MODE=0, the outputs of the primitive are connected directly to pins without intermediate pieces. Launch result: You can compile the project, but I think it’s enough to look at the intermediate verilog: https://github.com/yrabbit/gw-bsram-infer/blob/master/impl/gwsynthesis/bram-infer-gw.vg There you can see that the outs go straight to OBUFs. So I think we won't dig any further if their own IDE doesn't do what the documentation says 😉 |
Ah. I see. Their IDE should be the final source of truth. Thanks for looking into it!! 🚀 |
Well, yeah, images generated by the vendor’s IDE are the last resort, and not because they are true - there may be a lot of errors in IDE - but because no one will give us internal documentation 😄 |
Version
35
On which OS did this happen?
Linux
Reproduction Steps
YosysHQ/apicula#208 (comment)
Expected Behavior
It's been a while since Gowin has added B suffix to their BSRAM primitives, to differentiate them from S(hadow) SRAM.
nextpnr-gowin (project Apicula) has updated the naming. However, Yosys seems to still be using the original nomenclature, creating a disconnect.
We need BSRAM to be inferred as DPB, not DP.
Actual Behavior
BSRAM is inferred as DP instead of DPB.
The text was updated successfully, but these errors were encountered: