Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gowin BSRAM inference -- We need B suffix (such as DPB instead of current DP) #4098

Closed
chili-chips opened this issue Dec 31, 2023 · 52 comments
Closed
Labels
pending-verification This issue is pending verification and/or reproduction

Comments

@chili-chips
Copy link

Version

35

On which OS did this happen?

Linux

Reproduction Steps

YosysHQ/apicula#208 (comment)

Expected Behavior

It's been a while since Gowin has added B suffix to their BSRAM primitives, to differentiate them from S(hadow) SRAM.
nextpnr-gowin (project Apicula) has updated the naming. However, Yosys seems to still be using the original nomenclature, creating a disconnect.

We need BSRAM to be inferred as DPB, not DP.

Actual Behavior

BSRAM is inferred as DP instead of DPB.

@chili-chips chili-chips added the pending-verification This issue is pending verification and/or reproduction label Dec 31, 2023
@yrabbit
Copy link
Contributor

yrabbit commented Jan 9, 2024

A patch that purely mechanically corrects memory synthesis. Now there will be less swearing when creating images, but this should not make them work.
b.path.txt

A good image is with direct use of the DPB primitive.

prim-good.mp4

Not a very good image - when we infer.

infer-bad.mp4

And now a little of my stupid reasoning as a newbie.
When we use a primitive directly, we set its parameters (data width, address width, write mode, read mode, etc.), and also explicitly connect signals to the inputs of the primitive - that is, we decide what to connect to ClockEnable and what to WriteEnable.

    DPB mem(
        .DOB({dummy, read_data}),
        .DIA({{8{gnd}}, write_data}),
        .DIB({16{gnd}}),
        .ADA({write_ad, gnd, gnd, gnd}),
        .ADB({ read_ad, gnd, gnd, gnd}),
        .CLKA(write_clk),
        .CLKB(clk),
        .OCEA(vcc),
        .OCEB(vcc),
        .CEA(write_ce),
        .CEB(vcc),
        .WREA(vcc),
        .WREB(read_wre),
        .BLKSELA(3'b000),
        .BLKSELB(3'b000),
        .RESETA(write_reset),
        .RESETB(reset)
    );
    defparam mem.READ_MODE0 = 1'b1;
    defparam mem.READ_MODE1 = 1'b1;
    defparam mem.WRITE_MODE0 = 2'b01;
    defparam mem.WRITE_MODE1 = 2'b01;
    defparam mem.BIT_WIDTH_0 = 8;
    defparam mem.BIT_WIDTH_1 = 8;
    defparam mem.BLK_SEL_0 = 3'b000;
    defparam mem.BLK_SEL_1 = 3'b000;
    defparam mem.RESET_MODE = "SYNC";

In the case of inferring, yosys selects the primitive, parameters and used ports based on an analysis of memory usage. The Gowin documentation has a list of boilerplate pieces of code that lead to the generation of certain primitives. This is SUG550-1.6E_GowinSynthesis User Guide.pdf

Here's a typical snippet from there:
shot-0

Instructions on how to select a primitive during inferring are in the files: techlibs/gowin/brams.txt and techlibs/gowin/brams_map.v

You can see what yosys chose for each case in the logs (with the -g switch) and in the output .json file after synthesis.

This is how I tried to get yosys to generate DPB for the goblin faces above.

    (* ram_style = "block" *) reg [7:0] mem[10:0];

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            read_data <= 0;
        end else begin
            read_data <= mem[read_ad];
        end
    end

    always @(posedge write_clk) begin
        if (write_ce) begin
            mem[write_ad] <= write_data;
        end
    end

That's what yosys did:

        "vmem.mem.0.0": {
          "hide_name": 0,
          "type": "DPX9B",
          "parameters": {
            "BIT_WIDTH_0": "00000000000000000000000000001001",
            "BIT_WIDTH_1": "00000000000000000000000000001001",
            "BLK_SEL_0": "000",
            "BLK_SEL_1": "000",
            "READ_MODE0": "0",
            "READ_MODE1": "0",
            "RESET_MODE": "ASYNC",
            "WRITE_MODE0": "00000000000000000000000000000010",
            "WRITE_MODE1": "00000000000000000000000000000010"
          },
          "attributes": {
            "module_not_derived": "00000000000000000000000000000001",
            "src": "/usr/local/bin/../share/yosys/gowin/brams_map.v:282.4-303.3"
          },
          "port_directions": {
            "ADA": "input",
            "ADB": "input",
            "BLKSELA": "input",
            "BLKSELB": "input",
            "CEA": "input",
            "CEB": "input",
            "CLKA": "input",
            "CLKB": "input",
            "DIA": "input",
            "DIB": "input",
            "DOA": "output",
            "DOB": "output",
            "OCEA": "input",
            "OCEB": "input",
            "RESETA": "input",
            "RESETB": "input",
            "WREA": "input",
            "WREB": "input"
          },
          "connections": {
            "ADA": [ 45, 45, 45, 804, 805, 806, 807, 45, 45, 45, 45, 45, 45, 45 ],
            "ADB": [ 45, 45, 45, 885, 884, 883, 882, 45, 45, 45, 45, 45, 45, 45 ],
            "BLKSELA": [ 45, 45, 45 ],
            "BLKSELB": [ 45, 45, 45 ],
            "CEA": [ 66 ],
            "CEB": [ 66 ],
            "CLKA": [ 880 ],
            "CLKB": [ 30 ],
            "DIA": [ 886, 887, 888, 889, 890, 891, 892, 893, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 ],
            "DIB": [ 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 ],
            "DOA": [ 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911 ],
            "DOB": [ 761, 728, 694, 660, 425, 794, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923 ],
            "OCEA": [ 45 ],
            "OCEB": [ 45 ],
            "RESETA": [ 45 ],
            "RESETB": [ 196 ],
            "WREA": [ 66 ],
            "WREB": [ 45 ]
          }
        },

Considering that wires 45 are GND, and 66 are VCC, it is already clear why there is garbage on the screen (partially): the WREA and CEA signals are always VCC, which means writing is always going on, although I clearly indicated that writing needs to be done only at certain moments.

And that’s not all that is not entirely correct with the parameters and ports.

To summarize: if for a guru it’s immediately obvious what little things need to be corrected for proper synthesis, then for me it’s a dark forest. Dual Port is too complex for inferring to understand and I plan to start experimenting with ROM and gradually move towards more complex options. And it doesn’t help matters that BSRAM in Tangnano9k may behave differently than in the documentation.

@Seyviour
Copy link

The bit order (reading from the "connection" entry) seems to be reversed in the inferred version. Could this be part of the problem?

@yrabbit
Copy link
Contributor

yrabbit commented Jan 13, 2024

Thank you, at this stage any advice is useful to me.
But I think that in JSON, after synthesis, the low-order bits of the buses are located on the left. Well, at least when using the primitive explicitly, I get:

            "BLKSELA": [ 39, 39, 39 ],
            "BLKSELB": [ 39, 39, 39 ],
            "CEA": [ 66 ],
            "CEB": [ 66 ],
            "CLKA": [ 843 ],
            "CLKB": [ 30 ],
            "DIA": [ 904, 905, 906, 907, 908, 909, 910, 911, 39, 39, 39, 39, 39, 39, 39, 39 ],
            "DIB": [ 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39 ],
            "DOB": [ 724, 691, 657, 623, 392, 757, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921 ],
            "OCEA": [ 66 ],
            "OCEB": [ 66 ],

However, your observation made me reconsider my examples and I found a couple of errors. Thanks!
:)

@Seyviour
Copy link

Seyviour commented Jan 14, 2024

I'm happy to hear that, and glad to contribute however I can. I'll keep looking to see if I observe any other differences. Is there any chance you could share your examples?

@yrabbit
Copy link
Contributor

yrabbit commented Jan 14, 2024

This is just a simplified example from the apicula distribution: https://github.com/YosysHQ/apicula/tree/master/examples/himbaechel

bsram.ZIP

Here are two directories that are copies except for the DPB-video-ram.v file.
If you applied the yosys patch (see previous posts) and have himbaechel nextpnr installed (see YosysHQ/apicula#220 (comment)) then everything should compile.

The build goes like this:

make -f Makefile.himbaechel bsram-DPB-tangnano9k.fs

And yes, I know that I use an 8-bit primitive, and yosys makes 9-bit, but at the moment this is unimportant: as long as inferring ignores CE and WRE for writing, there will be garbage on the screen.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 15, 2024

Corrected version (I had to regenerate the JSON) now infer the WRE port as connected to the network. This is not CE, but it will also do.
inerr-1

The picture, however, is still garbage, so we think further about what’s going on here: did I screw it up in the top-level code or is it still not good to use a 9-bit primitive for 8-bit memory. Or maybe force yosys to use registers on the memory output first.

inferr-01.mp4

@yrabbit
Copy link
Contributor

yrabbit commented Jan 15, 2024

Something interesting. I wonder what I wrote incorrectly in the top-level code that yosys decided that 4 bits for the address is enough?
The upper JSON is inferring, the lower one is manual use of the DPB primitive.

@Seyviour You have all the source codes, can you tell me where I managed to turn the address [10:0] into [3:0]?

addr

@Seyviour
Copy link

This is just a simplified example from the apicula distribution: https://github.com/YosysHQ/apicula/tree/master/examples/himbaechel

bsram.ZIP

Here are two directories that are copies except for the DPB-video-ram.v file.
If you applied the yosys patch (see previous posts) and have himbaechel nextpnr installed (see YosysHQ/apicula#220 (comment)) then everything should compile.

The build goes like this:

make -f Makefile.himbaechel bsram-DPB-tangnano9k.fs

And yes, I know that I use an 8-bit primitive, and yosys makes 9-bit, but at the moment this is unimportant: as long as inferring ignores CE and WRE for writing, there will be garbage on the screen.

Thank you for sharing! Would a Yosys build from source (without the patch) work as well?

@Seyviour
Copy link

@yrabbit

As I understand it, since the array has 11 elements ([10:0]), only 4 bits are required to represent any index into the array (log2(11)).

For a memory with a 11-bit address, the declaration would be something like: reg [7:0] mem [2**11-1 :0]

@yrabbit
Copy link
Contributor

yrabbit commented Jan 15, 2024

Thank you for sharing! Would a Yosys build from source (without the patch) work as well?

No. but you can avoid compiling yosys if after installation you find where the brams.txt and brams_map.v files are located in your system and correct it right there.
installed-failes

@yrabbit
Copy link
Contributor

yrabbit commented Jan 15, 2024

@yrabbit

As I understand it, since the array has 11 elements ([10:0]), only 4 bits are required to represent any index into the array (log2(11)).

For a memory with a 11-bit address, the declaration would be something like: reg [7:0] mem [2**11-1 :0]

I told that I'm very bad at verilog :)

And thanks to @Seyviour we have working memory!!!

inferr-02.mp4

It works like crap, but still! Here is a reference video from a manual primitive

inferr-03.mp4

@Seyviour
Copy link

This is amazing, @yrabbit. Thank you very much!

I'm currently trying to figure out SDRAM for the GW2A chips; do you have any tips on how I could go about getting that to work?

@yrabbit
Copy link
Contributor

yrabbit commented Jan 15, 2024

Well, the happy ending is still far away - the picture should be the same as that of a hand-made primitive.

As for SDRAM, sorry, I've never done this. Is this something external to the FPGA? Not primitive?

@chili-chips-ba
Copy link

... this is a unique trait of Gowin devices which have SDRAM chiplet connected to FPGA die within package (SIP).

This SDRAM is essentially external to FPGA and not a primitive. However, it is also "loosely internal", because it is consuming FPGA pins which are therefore not externally visible.

Those pins in Gowin proprietary tools don't have pin numbers. They have specific pin names which tool recognizes as "internal connections" to then, under the hood, translate to the actual FPGA pin number. RTL tapping into this in-the-package SDRAM has to use the exact pin names that Gowin tool recognizes. For back-and-forth RTL porting, opensource tools should adhere to the same naming convention.

For more, see: https://github.com/nand2mario/nestang/blob/master/src/sdram.v

@chili-chips-ba
Copy link

For the preassigned SDRAM pin names, see: https://github.com/nand2mario/nestang/blob/master/src/nestang_top.sv

// SDRAM
// For Primer 25K: https://github.com/MiSTer-devel/Hardware_MiSTer/blob/master/releases/sdram_xsds_3.0.pdf
// For Nano 20K: 8MB 32-bit SDRAM
output O_sdram_clk,
output O_sdram_cke,
output O_sdram_cs_n, // chip select
output O_sdram_cas_n, // columns address select
output O_sdram_ras_n, // row address select
output O_sdram_wen_n, // write enable
inout [SDRAM_DATA_WIDTH-1:0] IO_sdram_dq, // bidirectional data bus
output [SDRAM_ROW_WIDTH-1:0] O_sdram_addr, // multiplexed address bus
output [1:0] O_sdram_ba, // two banks
output [SDRAM_DATA_WIDTH/8-1:0] O_sdram_dqm,

@yrabbit
Copy link
Contributor

yrabbit commented Jan 16, 2024

If we slightly adjust the JSON that is obtained at the output of the inferring version, namely, turn on reading mode 1 and enable the output of the output register built into the memory by applying VCC to the OCEB input, then we will get a perfect match of the pictures.
Which, naturally, suggests an idea, I don’t know what yet.
Something in the code generated by yosys stretches the pixels by half horizontally. It is logical to assume that it is the address, but no - this would not be cured by turning on the register at the output. Hmm... unclear...

shot-2

work-infer.mp4

@yrabbit
Copy link
Contributor

yrabbit commented Jan 16, 2024

I noticed one more thing - the infer and primitive versions start differently - the primitive normally handles the initial memory filling, while the infer version has a black screen.
Let's deal with this first.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 16, 2024

Infer version start:

start-infer.mp4

Infer version with READ_MODE=1 start:

start-infer-dff.mp4

The lower video especially shows how the horizontal resolution improves dramatically when the version with READ_MODE=1 is loaded and launched into the FPGA.

It seems that with READ_MODE=0 we somehow manage to use one byte from memory to output two adjacent points, while with READ_MODE=1 each point uses its own byte from memory.
But this is not a property of this parameter - it just shifts the byte output by one clock cycle. No ideas yet :(

@Seyviour
Copy link

Thank you, @chili-chips for your comments on the SDRAM.
My apologies to you, @yrabbit for taking this long to respond; @chili-chips comments pretty much sum it up. I'd like to find the location and (hidden) pins that map to the SDRAM and figure out routing to those pins. It's probably more something to be done in Nextpnr rather than in Yosys.

@Seyviour
Copy link

Seyviour commented Jan 16, 2024

Ah. I also noticed that the pixels were getting repeated along the horizontal but I thought it was the writes that were failing because the instantiated version used the clock enable of the BRAM, unlike the version inferred by Yosys. Evidently, I was wrong about that :) I'll look at the timing diagram from the Gowin manual to see if I find anything that might be useful w.r.t the read modes.

@yrabbit, it seems we don't have an equivalent of the include "img-video-ram.vh" statement that initializes the ram for the primitive version. I'll try to initialize via readmemh and report what I find.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 16, 2024

@yrabbit, it seems we don't have an equivalent of the include "img-video-ram.vh" statement that initializes the ram for the primitive version. I'll try to initialize via readmemh and report what I find.

Well, don’t bother with that - here’s the readmemh version. It’s more interesting to understand what’s happening with the output signal :)
bsram-infer.ZIP

@chili-chips-ba
Copy link

I'd like to find the location and (hidden) pins that map to the SDRAM and figure out routing to those pins. It's probably more something to be done in Nextpnr rather than in Yosys.

That's right. We recommend diving into the depths of the final bit-file for this example:
https://wiki.sipeed.com/hardware/en/tang/tang-nano-20k/example/unbox.html

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

I was in a hurry. Apparently I'm still doing something wrong. This is how I initialize the memory:

    (* ram_style = "block" *) reg [7:0] mem[2**11-1:0];
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            read_data <= 0;
        end else begin
            read_data <= mem[read_ad];
        end
    end

    always @(posedge write_clk) begin
        if (write_ce) begin
            mem[write_ad] <= write_data;
        end
    end
    initial begin
        $readmemh("img0.hex", mem);
    end

Here is the very beginning of the data file:

04
14
54
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00

Here is the entire data file:
img0.hex.txt

And this is what happens in JSON after synthesis:

        "vmem.mem.0.0": {
          "hide_name": 0,
          "type": "SPX9",
          "parameters": {
            "BIT_WIDTH": "00000000000000000000000000001001",
            "BLK_SEL": "000",
            "INIT_RAM_00": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110100000010100000000100",
            "INIT_RAM_01": "000001001000001001000001001000001010000001010000001011000001100000001100000001101000001101000001110000001110000001110000001110000001111000001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",

Of course, yosys chose a 9-bit primitive, but even the 9-bit initialization line looks strange: already from the third byte there was a discrepancy with the data file:

...000110100000010100000000100 -> 0 0011 0100 0 0001 0100 0 0000 0100 = 34 14 04

How can 34 come out of 54? I messed something up.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

Changed the initial memory initialization bytes to

00
04
08
0C
10
14
18
1C
20
24
28
2C
30
34
38
3C
00
00
00
00
00

Result on the address inputs of the primitive and display pins (! this is important) in the READ_MODE=1 option
read-mode-1

The Clock, the least significant three bits of the address, and all the green channel bits are shown.
An enlarged piece, where you can see after how many clock cycles and on which edge the data is changed after the address change:

read-mode-1-0

Why is it important that the data is taken from the display pins - between the memory outputs and the display pins there are several LUTs and MUXes (which, for now, we will assume that they do not introduce delay) and one DFF, which can have some effect.

And I made a repository so as not to constantly pack these files: https://github.com/yrabbit/bram-infer

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

And here’s what happens with READ_MODE_0, again on the display pins:
read-mode-0
Some thought is floating in the air, something opaque... maybe it’s all about this DFF that we have in front of the display? What if we get rid of it and just say that we already have DFF inside the primitive? I do not know yet...

@Seyviour
Copy link

Given the impact of the register (as dictated by the read_mode), perhaps we should consider that the design somehow doesn't meet timing when READ_MODE is 0.

Maybe you can check how the instantiated version performs when it has READ_MODE=0 🤔

@Seyviour
Copy link

I'll try to rework the address generation so that it happens on the preceding posedge rather than a negedge to see if that helps.

@Seyviour
Copy link

This is what I propose. Forgive the weird naming and formatting here :)

        reg 		[15:0] __pixel_count;
	reg	        [15:0] __line_count;
	reg 		[7:0] __vmem_start_col;
	reg 		[7:0] __vmem_start_row; 
	
	always @(*) begin
		if (pixel_count == PixelForHS) begin
                        __pixel_count      =  16'b0;
                       __line_count       =  line_count + 1'b1;
                end
                else if (line_count == LineForVS) begin
                       __line_count       =  16'b0;
                       __pixel_count      =  16'b0;
                end
                else begin
                       __pixel_count      =  pixel_count + 1'b1;
                end

		      __vmem_start_col = __pixel_count - `START_X;
		      __vmem_start_row = __line_count - `START_Y;
	end

	reg           [7:0] vmem_start_col;
	reg           [7:0] vmem_start_row;

	always @(posedge pixel_clk or negedge rst)begin
        if (!rst) begin
                         line_count           <=  16'b0;    
                        pixel_count           <=  16'b0;
			vmem_start_col	 <=  0 - `START_X;
			vmem_start_row	 <=  0 - `START_Y; 
            end
        else begin
			pixel_count 	         <= __pixel_count;
			line_count 		 <= __line_count;
			vmem_start_col	 <= __vmem_start_col;
			vmem_start_row	 <= __vmem_start_col; 
		end
	end

I'll create a PR on the repo you created

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

I'll create a PR on the repo you created

I'm waiting:)

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

Maybe you can check how the instantiated version performs when it has READ_MODE=0 🤔

Well, I play with it exclusively - I don’t touch the version with the primitive.
The last pictures with both READ_MODE=1 and READ_MODE=0 refer to it.

@Seyviour
Copy link

I'll create a PR on the repo you created

I'm waiting:)

Done :)

@Seyviour
Copy link

Maybe you can check how the instantiated version performs when it has READ_MODE=0 🤔

Well, I play with it exclusively - I don’t touch the version with the primitive. The last pictures with both READ_MODE=1 and READ_MODE=0 refer to it.

Ohhh. I thought it would be a good idea to change the version with the primitive to READ_MODE=0 to see how it performs then.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

I'll create a PR on the repo you created

I'm waiting:)

Done :)
Uploading PR.jpeg…

mmm... this is not good - I can't orient the logic analyzer because there are so many overlays on the screen. and the goblin disappeared :(

I'll try with READ_MODE=1

@Seyviour
Copy link

I must have got something wrong. I'll double check and send an update.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

nope, still
PR-1

@Seyviour
Copy link

I think I found my mistake:

vmem_start_row <= __vmem_start_col;

@Seyviour
Copy link

I've submitted a new PR

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

READ_MODE=0

PR-2.mp4

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

READ_MODE=1

PR-2-1.mp4

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

I recognize by the eyes when there are byte gaps and when there are not: if the eyes consist of three horizontal dots, then there is no gap :)

And it looks like you removed the check that prohibited drawing the goblin outside the central square, now his ghosts are everywhere :)

@Seyviour
Copy link

Oh my. Did I affect the color of the goblins too? :)

I'll keep looking to see what I got wrong.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 17, 2024

So, I can say with absolute confidence that the green color is connected directly to the memory output, there are no DFFs, LUTs or MUXs between the memory and the pin.
G-wire

In light of this we have:
Let me remind you that the values 0h, 4h, 8h, Ch, 10h, 14h, 18h and so on are written from address 0.
The option with READ_MODE_1 is a good picture, bytes are not thrown out, edges are shown when address 1 is set and when the number 4h stored there is output:
856-read-mode-1

And now the option with READ_MODE=0
Let me remind you that I only change the line with the READ_MODE parameter in JSON, so no routes or anything else changes. The addresses are set identically along the same edges, maintained for the same number of clock cycles, but the output is who knows what.
Still a mystery.
858-read-mode-0

@Seyviour
Copy link

So, I can say with absolute confidence that the green color is connected directly to the memory output, there are no DFFs, LUTs or MUXs between the memory and the pin. G-wire

In light of this we have: Let me remind you that the values 0h, 4h, 8h, Ch, 10h, 14h, 18h and so on are written from address 0. The option with READ_MODE_1 is a good picture, bytes are not thrown out, edges are shown when address 1 is set and when the number 4h stored there is output: 856-read-mode-1

And now the option with READ_MODE=0 Let me remind you that I only change the line with the READ_MODE parameter in JSON, so no routes or anything else changes. The addresses are set identically along the same edges, maintained for the same number of clock cycles, but the output is who knows what. Still a mystery. 858-read-mode-0

Is this connection the same for READ_MODE=0 and READ_MODE=1? I guess that might make sense since/if the output register is internal to the RAM?

Also, I'm trying to find any code that references the READ_MODE and I did not find anything particularly useful across yosys, apicula, and nextpnr. From brams.txt, I can see WRITE_MODE, but no READ_MODE.

Do you have any pointers on where I might find the code that decides READ_MODE or handles it? Maybe the answer is not in yosys 😫

@Seyviour
Copy link

@yrabbit. This might be worth trying: With READ_MODE=0, left shift or right shift the address by 1-bit to see what happens. Probably, things get worse, but there's a small chance we observe something interesting.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 18, 2024

Is this connection the same for READ_MODE=0 and READ_MODE=1? I guess that might make sense since/if the output register is internal to the RAM?

Of course it’s the same: I change READ_MODE directly in JSON after synthesis, so routing and so on are exactly the same ;)

Also, I'm trying to find any code that references the READ_MODE and I did not find anything particularly useful across yosys, apicula, and nextpnr. From brams.txt, I can see WRITE_MODE, but no READ_MODE.

Do you have any pointers on where I might find the code that decides READ_MODE or handles it? Maybe the answer is not in yosys 😫

Yeah, I think that it is not processed anywhere, but is simply set as 0 once and for all.
And this is just funny - if mode 1 had been set there, then we would not have noticed this problem in real life - everything would have worked (well, with a delay of one clock, but it would have looked correct) 😉

.READ_MODE(1'b0),

@yrabbit
Copy link
Contributor

yrabbit commented Jan 18, 2024

@yrabbit. This might be worth trying: With READ_MODE=0, left shift or right shift the address by 1-bit to see what happens. Probably, things get worse, but there's a small chance we observe something interesting.

Yes, it may be something related to the input lines, today I will read how this primitive is encoded into bits. I couldn’t make a cardinal mistake - in this case, not a single primitive would work at all - but I could have missed some rare feature.

@yrabbit
Copy link
Contributor

yrabbit commented Jan 19, 2024

I found it 😄 When READ_MODE=0, the OCEx inputs must be set to 1.
Now the data is spit out on the next clock cycle after the address is set and no bytes are skipped.
Time to brush up on PR.

1509-read-mode-0

@Seyviour
Copy link

Congratulations!!! Now no more half-eyed goblins :) And thanks for the shout-out on the PR 🙏

I think there might be one more thing to consider:

  • As I understand things, the idea for the 'bypass' mode is that the read operation should be combinational, in the sense that the output should be available before the next rising edge, and the timing diagram from the manual seems to support this. How do we test that this is the case?

@mmicko
Copy link
Member

mmicko commented Jan 19, 2024

Fixed by #4137

@mmicko mmicko closed this as completed Jan 19, 2024
@yrabbit
Copy link
Contributor

yrabbit commented Jan 19, 2024

I think there might be one more thing to consider:

  • As I understand things, the idea for the 'bypass' mode is that the read operation should be combinational, in the sense that the output should be available before the next rising edge, and the timing diagram from the manual seems to support this. How do we test that this is the case?

Well, it doesn't work like that. I made a repository with an example for Gowin IDE (https://github.com/yrabbit/gw-bsram-infer). It uses SDPB with READ_MODE=0, the outputs of the primitive are connected directly to pins without intermediate pieces. Launch result:
gw

You can compile the project, but I think it’s enough to look at the intermediate verilog: https://github.com/yrabbit/gw-bsram-infer/blob/master/impl/gwsynthesis/bram-infer-gw.vg

There you can see that the outs go straight to OBUFs.

So I think we won't dig any further if their own IDE doesn't do what the documentation says 😉

@Seyviour
Copy link

Ah. I see. Their IDE should be the final source of truth. Thanks for looking into it!! 🚀

@yrabbit
Copy link
Contributor

yrabbit commented Jan 19, 2024

Well, yeah, images generated by the vendor’s IDE are the last resort, and not because they are true - there may be a lot of errors in IDE - but because no one will give us internal documentation 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending-verification This issue is pending verification and/or reproduction
Projects
None yet
Development

No branches or pull requests

5 participants