Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added nordshift attribute #3877

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Added nordshift attribute #3877

wants to merge 9 commits into from

Conversation

daglem
Copy link
Contributor

@daglem daglem commented Aug 5, 2023

The nordshift attribute on a packed array prohibits the generation of shift type circuits for reads from that array, akin to nowrshmsk for lvalue indexing.

To facilitate this, the AST transformations for rvalue indexing are moved from genrtlil.cc to simplify.cc, bringing them in line with transformations for lvalue indexing.

tests/techmap/shiftx2mux.ys Outdated Show resolved Hide resolved
@daglem daglem force-pushed the nordshift branch 2 times, most recently from e68b0d7 to 5574fa8 Compare August 7, 2023 12:20
@daglem daglem mentioned this pull request Aug 8, 2023
@phsauter phsauter mentioned this pull request Aug 8, 2023
3 tasks
@daglem
Copy link
Contributor Author

daglem commented Aug 12, 2023

I did a rebase / squash to simplify any later comparison with genrtlil.cc

@daglem
Copy link
Contributor Author

daglem commented Nov 13, 2023

FIXME: tests/simple/partsel.v doesn't pass.

It is currently not possible to use AST_SHIFTX to generate the exact RTLIL $shiftx previously generated in genrtlil.cc. genrtlil.cc uses an intermediate "fake_ast" to trick binop2rtlil into generating RTLIL with different width for the result (Y) and sign for the shift value (B).

@daglem daglem marked this pull request as draft November 13, 2023 13:12
@daglem
Copy link
Contributor Author

daglem commented Nov 27, 2023

All tests now pass both with and without a forced use_case_method = true, except for, as expected for (* nordshift *), a few tests which count cells:

  • tests/arch/*/mux.ys
  • tests/arch/xilinx/xilinx_srl.ys
  • tests/svtypes/struct_dynamic_range.ys
  • tests/various/peepopt.ys

Also, for tests/techmap/shift2mux.ys to pass with (* nordshift *), it would have to run proc in order to handle the introduced CASE process.

@daglem
Copy link
Contributor Author

daglem commented Nov 27, 2023

Commenting in reg_demux_noshift in the Makefile in shift_issue3.zip in #3875 and running the tests via make now results in:

   Chip area for module '\reg_demux_noshift': 18689.907600
   Chip area for module '\reg_demux_nowrshmsk_mdim_arr': 19213.891200
   Chip area for module '\reg_demux_nowrshmsk_typedef': 19115.271000
   Chip area for module '\reg_demux_pow2': 5513.961600
   Chip area for module '\reg_demux': 21857.661000

reg_demux_noshift yields further savings, and the other results are exactly the same as in #3875, i.e. no regressions.

@daglem daglem force-pushed the nordshift branch 4 times, most recently from c4f6855 to a4b558f Compare December 9, 2023 20:59
@daglem daglem marked this pull request as ready for review January 10, 2024 20:18
@povik povik self-requested a review January 15, 2024 15:14
@povik povik self-assigned this Jan 15, 2024
@povik
Copy link
Member

povik commented Feb 22, 2024

@daglem I had hard time reviewing the code for the default case when the attribute is not in use, given that this PR touched on that too. I decided to rewrite that part to something that should be more obvious. Please see if it looks good to you.

@povik
Copy link
Member

povik commented Feb 22, 2024

Not sure what the CI failure is about, the runners almost seem to be building from some other revision of the source since if you take e.g.

../frontends/ast/simplify.cc:2320:49: error: too many arguments to function call, expected 0, have 1
                AST::AstNode *member_node = get_struct_member(this);
                                            ~~~~~~~~~~~~~~~~~ ^~~~

that's not what you find on line 2320 in simplify.cc.

@povik
Copy link
Member

povik commented Feb 23, 2024

Pushed a rebase to see if that helps with CI...

@povik
Copy link
Member

povik commented Feb 23, 2024

It turns out the CI runs on the state of the tree after the PR is merged in, not on the head commit of the PR itself. That's of course desirable but caught me by surprise still.

@daglem
Copy link
Contributor Author

daglem commented Feb 26, 2024

@daglem I had hard time reviewing the code for the default case when the attribute is not in use, given that this PR touched on that too. I decided to rewrite that part to something that should be more obvious. Please see if it looks good to you.

Either way the branch doesn't play well with tests/arrays03.sv. The RTLIL for that test now doesn't seem to make sense - I'll try to investigate what's going wrong.

// Decode the index based on wire dimensions
int idx_signed_nbits = shift_expr_width_hint + !shift_expr_sign_hint;
if (!id2ast->range_swapped) {
int raw_idx_nbits = 1 + std::max(idx_signed_nbits, ceil_log2(std::abs(wire_offset) + 1) + 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the + 1 in ceil_log2(std::abs(wire_offset) + 1) really needed here? It's not included in similar code below.

new AstNode(AST_TO_SIGNED,
new AstNode(AST_CAST_SIZE, node_int(raw_idx_nbits), shift_expr)
),
node_int(wire_offset));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid the AST_SUB here if wire_offset == 0?
This ends up in RTLIL even when it's not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@povik I tested to only do the the offset calculations if id2ast->range_swapped || wire_offset, and that makes the test of unsigned offsets in partsel_test008 in tests/simple/partsel.v fail 😱
I now suspect that the implementation of $shiftx doesn't handle unsigned shifts correctly - would this be in techlibs/common/techmamp.v?
I'm all for improving this code, which I translated more or less directly from genrtlil.cc, however I'd like it to be completely understandable - having to sign an unsigned shift amount doesn't feel quite right 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@povik The plot thickens. Skipping the offset calculations as mentioned above, the following code in tests/simple/shiftx.v passes make -f ../tools/autotest.mk shiftx.v. However if you change the constant from 31'sd0 to 32'sd0, it fails!

module shiftx_test (
	input [1:0] din,
	input [1:0] uoffset,
	output [1:0] dout
);

assign dout = din[31'sd0 -uoffset +: 2];
endmodule

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holy cow! It seems like iverilog is actually the culprit here! 🤦‍♂️

Simplifying the test even further and running ../tools/autotest.sh -S 1 shiftx.v yields the same result for shiftx.out/shiftx_out_syn1 (iverilog via Yosys techmap) for both 31'sd0 and 32'sd0, while the results in shiftx.out/shiftx_out_ref (directly processed by iverilog) are incompatible.

module shiftx_test (
	input [1:0] din,
	input uoffset,
	output [1:0] dout
);
	assign dout = din[31'sd0 - uoffset +: 2];
endmodule
../tools/cmp_tbdata shiftx_out_ref-32 shiftx_out_ref-31
Error in testbench output compare (line=10):
-#OUT# 011 x 1x                 1000           1
+#OUT# 011 x xx                 1000           1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a (currently not passing) test I cooked up for iverilog, which incidentally only one of the commercial simulators on EDA Playground handles correctly 🙈
That simulator also warns that it is truncating MSBs if uoffset is made wider than 32 bits, which seems reasonable.

/*
 * partsel_outside
 * Check that an unsigned integer offset in an indexed part-select is not converted to a signed integer.
 * This would yield incorrect out of bound results for part-selects like arr[idx*8 - uoffset +: 4].
 */

module main;

   reg [1:0] arr = 1;
   int unsigned uoffset = '1;
   wire [1:0] outside = arr[uoffset +: 2];

   initial begin
      #1 if (outside !== 'x) begin
	 $display("FAILED -- out of bounds value %b != xx", outside);
	 $finish;
      end

      $display("PASSED");
      $finish;
   end

endmodule // main

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid the AST_SUB here if wire_offset == 0?
This ends up in RTLIL even when it's not needed.

We can, but I am convinced to improve the code quality in the frontend, we shouldn't do any specialized optimizations that can be delegated to general machinery -- either to AST transformations (AST_SUB of zero can be removed, if the sizing effect is kept) or netlist passes (opt_expr should be able to remove the no-op subtraction later on).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for improving this code, which I translated more or less directly from genrtlil.cc, however I'd like it to be completely understandable - having to sign an unsigned shift amount doesn't feel quite right 😅

The rationale was the following: We want the result of the subtraction to be signed, since that's the general case, and for that to happen per the Verilog rules, we need both operands signed. Since the signed cast is after resizing, there shouldn't be any hazard of wraparound. Later optimizations can optimize away the sign bit and any extra higher bits, if they are superfluous, so this shouldn't affect QoR negatively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duly noted, but there is a limit to what spinoff problems I can work on 😅
Currently at least the AST transformations only optimize AST_ADD/AST_SUB if both operands are constant. In the case of removing addition or subtraction of 0, one would possibly want to optimize beforehand anyway, since it is not necessary to set aside an extra bit for overflow in this case. One could conceivably introduce new AST nodes for addition and subtraction with self-determined widths guaranteed not to overflow, but I digress.

In any case, testing the simplest possible AST here revealed a bug in iverilog, as far as I can tell. I believe the same kind of bug is actually present in Yosys as well. If I understand correctly, if the index expressions here are >= 32 bits, the additions and subtractions may cause the expressions to wrap around, causing actual vector bits instead of x bits to be returned:

'[' expr TOK_POS_INDEXED expr ']' {
$$ = new AstNode(AST_RANGE);
AstNode *expr = new AstNode(AST_SELFSZ, $2);
$$->children.push_back(new AstNode(AST_SUB, new AstNode(AST_ADD, expr->clone(), $4), AstNode::mkconst_int(1, true)));
$$->children.push_back(new AstNode(AST_ADD, expr, AstNode::mkconst_int(0, true)));
} |
'[' expr TOK_NEG_INDEXED expr ']' {
$$ = new AstNode(AST_RANGE);
AstNode *expr = new AstNode(AST_SELFSZ, $2);
$$->children.push_back(new AstNode(AST_ADD, expr, AstNode::mkconst_int(0, true)));
$$->children.push_back(new AstNode(AST_SUB, new AstNode(AST_ADD, expr->clone(), AstNode::mkconst_int(1, true)), $4));

This may not be a big deal for Yosys, however it's certainly not ideal for a simulator to return anything but x bits for out of bounds addresses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of removing addition or subtraction of 0, one would possibly want to optimize beforehand anyway, since it is not necessary to set aside an extra bit for overflow in this case.

I operate under the assumption that if this bit isn't necessary, it will get optimized away at the netlist stage anyway. If not, I fully agree we need to handle it.

In any case, testing the simplest possible AST here revealed a bug in iverilog, as far as I can tell.

Feel free to opt for any simple resolution of this PR, since the non-nordshift handling isn't the focus of it. If iverilog doesn't handle something correctly, we can disable that part of the test until it does. The fixed requirement is that the CI passes and that the tests are in a reasonable state, working around shortcomings of external tools should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I've just made some minor simplifications/corrections to your code, keeping the expression widening and addition/subtraction even when it's not strictly needed. This just so happens to pass all iverilog tests both before and after the iverilog PR.

Only the extra widening seems to be left after optimization (not running synthesis), but I guess that doesn't matter much - the widths are already excessive to start with in many cases, since a minimum of 32 bits are used for any offset calculation. In any case excessive widening seems to be pruned after synthesis 👍

@daglem daglem marked this pull request as draft March 4, 2024 07:49
@daglem
Copy link
Contributor Author

daglem commented Mar 4, 2024

I converted this to draft while I attempt to make sense of things in iverilog.

@daglem
Copy link
Contributor Author

daglem commented Mar 5, 2024

I've made a pull request for iverilog - we'll see how that pans out: steveicarus/iverilog#1106

With those changes in place, the default AST_SHIFTX code can be simplified even further. I'll hold off commiting until the iverilog PR is (hopefully) merged.

I'll look into whether the AST_CASE code can also be simplified now - I suspect the iverilog PR would have saved me from a lot of struggles there 🙈

@daglem daglem marked this pull request as ready for review March 6, 2024 19:52
@daglem
Copy link
Contributor Author

daglem commented Mar 7, 2024

@povik I did a final force push to correct my latest correction 😅

Without the latest commit, one could just as well have calculated raw_idx_nbits as 1 + std::max(idx_signed_nbits, 32), since node_int creates a 32 bit wide constant.

Hopefully the AST_SHIFTX code now does exactly what you wanted.

@povik
Copy link
Member

povik commented Mar 7, 2024

Without the latest commit, one could just as well have calculated raw_idx_nbits as 1 + std::max(idx_signed_nbits, 32), since node_int creates a 32 bit wide constant.

Yeah, I was aware the constant node is always 32 bits wide, but still put in the code to calculate raw_idx_nbits as the least amount of bits for the arithmetic to never overflow. I saw there was no harm in it, and thought we could eventually replace node_int to avoid it overflowing on constants that are more than 32-bit wide.

I will try to get back to this PR to give it a final read-through.

@daglem daglem force-pushed the nordshift branch 3 times, most recently from 39cf48a to e80dbfe Compare March 7, 2024 17:46
daglem and others added 9 commits September 18, 2024 21:51
nordshift on a net or variable yields generation of muxes instead of shift
circuits for dynamic rvalue indexing, akin to nowrshmsk for lvalue indexing.

To facilitate this, the AST transformations for rvalue indexing are moved
from genrtlil.cc to simplify.cc, bringing them in line with transformations
for lvalue indexing.
… AST_SHIFTX

AST_CAST_SIZE on the right operand caused an unsigned operand to be signed.

This is corrected by handling the right operand like in AST_POW.
This also corrects the calculation of bit widths, using the new
function min_bit_width.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants