-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added nordshift attribute #3877
base: main
Are you sure you want to change the base?
Conversation
e68b0d7
to
5574fa8
Compare
I did a rebase / squash to simplify any later comparison with genrtlil.cc |
FIXME: tests/simple/partsel.v doesn't pass. It is currently not possible to use AST_SHIFTX to generate the exact RTLIL $shiftx previously generated in genrtlil.cc. genrtlil.cc uses an intermediate "fake_ast" to trick binop2rtlil into generating RTLIL with different width for the result (Y) and sign for the shift value (B). |
All tests now pass both with and without a forced
Also, for tests/techmap/shift2mux.ys to pass with |
Commenting in
|
c4f6855
to
a4b558f
Compare
@daglem I had hard time reviewing the code for the default case when the attribute is not in use, given that this PR touched on that too. I decided to rewrite that part to something that should be more obvious. Please see if it looks good to you. |
Not sure what the CI failure is about, the runners almost seem to be building from some other revision of the source since if you take e.g.
that's not what you find on line 2320 in |
Pushed a rebase to see if that helps with CI... |
It turns out the CI runs on the state of the tree after the PR is merged in, not on the head commit of the PR itself. That's of course desirable but caught me by surprise still. |
Either way the branch doesn't play well with |
frontends/ast/simplify.cc
Outdated
// Decode the index based on wire dimensions | ||
int idx_signed_nbits = shift_expr_width_hint + !shift_expr_sign_hint; | ||
if (!id2ast->range_swapped) { | ||
int raw_idx_nbits = 1 + std::max(idx_signed_nbits, ceil_log2(std::abs(wire_offset) + 1) + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the + 1
in ceil_log2(std::abs(wire_offset) + 1)
really needed here? It's not included in similar code below.
frontends/ast/simplify.cc
Outdated
new AstNode(AST_TO_SIGNED, | ||
new AstNode(AST_CAST_SIZE, node_int(raw_idx_nbits), shift_expr) | ||
), | ||
node_int(wire_offset)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you avoid the AST_SUB here if wire_offset == 0
?
This ends up in RTLIL even when it's not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@povik I tested to only do the the offset calculations if id2ast->range_swapped || wire_offset
, and that makes the test of unsigned offsets in partsel_test008
in tests/simple/partsel.v
fail 😱
I now suspect that the implementation of $shiftx
doesn't handle unsigned shifts correctly - would this be in techlibs/common/techmamp.v
?
I'm all for improving this code, which I translated more or less directly from genrtlil.cc
, however I'd like it to be completely understandable - having to sign an unsigned shift amount doesn't feel quite right 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@povik The plot thickens. Skipping the offset calculations as mentioned above, the following code in tests/simple/shiftx.v
passes make -f ../tools/autotest.mk shiftx.v
. However if you change the constant from 31'sd0
to 32'sd0
, it fails!
module shiftx_test (
input [1:0] din,
input [1:0] uoffset,
output [1:0] dout
);
assign dout = din[31'sd0 -uoffset +: 2];
endmodule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy cow! It seems like iverilog is actually the culprit here! 🤦♂️
Simplifying the test even further and running ../tools/autotest.sh -S 1 shiftx.v
yields the same result for shiftx.out/shiftx_out_syn1
(iverilog via Yosys techmap) for both 31'sd0
and 32'sd0
, while the results in shiftx.out/shiftx_out_ref
(directly processed by iverilog) are incompatible.
module shiftx_test (
input [1:0] din,
input uoffset,
output [1:0] dout
);
assign dout = din[31'sd0 - uoffset +: 2];
endmodule
../tools/cmp_tbdata shiftx_out_ref-32 shiftx_out_ref-31
Error in testbench output compare (line=10):
-#OUT# 011 x 1x 1000 1
+#OUT# 011 x xx 1000 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a (currently not passing) test I cooked up for iverilog, which incidentally only one of the commercial simulators on EDA Playground handles correctly 🙈
That simulator also warns that it is truncating MSBs if uoffset
is made wider than 32 bits, which seems reasonable.
/*
* partsel_outside
* Check that an unsigned integer offset in an indexed part-select is not converted to a signed integer.
* This would yield incorrect out of bound results for part-selects like arr[idx*8 - uoffset +: 4].
*/
module main;
reg [1:0] arr = 1;
int unsigned uoffset = '1;
wire [1:0] outside = arr[uoffset +: 2];
initial begin
#1 if (outside !== 'x) begin
$display("FAILED -- out of bounds value %b != xx", outside);
$finish;
end
$display("PASSED");
$finish;
end
endmodule // main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you avoid the AST_SUB here if wire_offset == 0?
This ends up in RTLIL even when it's not needed.
We can, but I am convinced to improve the code quality in the frontend, we shouldn't do any specialized optimizations that can be delegated to general machinery -- either to AST transformations (AST_SUB of zero can be removed, if the sizing effect is kept) or netlist passes (opt_expr
should be able to remove the no-op subtraction later on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for improving this code, which I translated more or less directly from genrtlil.cc, however I'd like it to be completely understandable - having to sign an unsigned shift amount doesn't feel quite right 😅
The rationale was the following: We want the result of the subtraction to be signed, since that's the general case, and for that to happen per the Verilog rules, we need both operands signed. Since the signed cast is after resizing, there shouldn't be any hazard of wraparound. Later optimizations can optimize away the sign bit and any extra higher bits, if they are superfluous, so this shouldn't affect QoR negatively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duly noted, but there is a limit to what spinoff problems I can work on 😅
Currently at least the AST transformations only optimize AST_ADD
/AST_SUB
if both operands are constant. In the case of removing addition or subtraction of 0
, one would possibly want to optimize beforehand anyway, since it is not necessary to set aside an extra bit for overflow in this case. One could conceivably introduce new AST nodes for addition and subtraction with self-determined widths guaranteed not to overflow, but I digress.
In any case, testing the simplest possible AST here revealed a bug in iverilog, as far as I can tell. I believe the same kind of bug is actually present in Yosys as well. If I understand correctly, if the index expressions here are >= 32 bits, the additions and subtractions may cause the expressions to wrap around, causing actual vector bits instead of x
bits to be returned:
yosys/frontends/verilog/verilog_parser.y
Lines 955 to 965 in e9cd6ca
'[' expr TOK_POS_INDEXED expr ']' { | |
$$ = new AstNode(AST_RANGE); | |
AstNode *expr = new AstNode(AST_SELFSZ, $2); | |
$$->children.push_back(new AstNode(AST_SUB, new AstNode(AST_ADD, expr->clone(), $4), AstNode::mkconst_int(1, true))); | |
$$->children.push_back(new AstNode(AST_ADD, expr, AstNode::mkconst_int(0, true))); | |
} | | |
'[' expr TOK_NEG_INDEXED expr ']' { | |
$$ = new AstNode(AST_RANGE); | |
AstNode *expr = new AstNode(AST_SELFSZ, $2); | |
$$->children.push_back(new AstNode(AST_ADD, expr, AstNode::mkconst_int(0, true))); | |
$$->children.push_back(new AstNode(AST_SUB, new AstNode(AST_ADD, expr->clone(), AstNode::mkconst_int(1, true)), $4)); |
This may not be a big deal for Yosys, however it's certainly not ideal for a simulator to return anything but x
bits for out of bounds addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of removing addition or subtraction of 0, one would possibly want to optimize beforehand anyway, since it is not necessary to set aside an extra bit for overflow in this case.
I operate under the assumption that if this bit isn't necessary, it will get optimized away at the netlist stage anyway. If not, I fully agree we need to handle it.
In any case, testing the simplest possible AST here revealed a bug in iverilog, as far as I can tell.
Feel free to opt for any simple resolution of this PR, since the non-nordshift handling isn't the focus of it. If iverilog doesn't handle something correctly, we can disable that part of the test until it does. The fixed requirement is that the CI passes and that the tests are in a reasonable state, working around shortcomings of external tools should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I've just made some minor simplifications/corrections to your code, keeping the expression widening and addition/subtraction even when it's not strictly needed. This just so happens to pass all iverilog tests both before and after the iverilog PR.
Only the extra widening seems to be left after optimization (not running synthesis), but I guess that doesn't matter much - the widths are already excessive to start with in many cases, since a minimum of 32 bits are used for any offset calculation. In any case excessive widening seems to be pruned after synthesis 👍
I converted this to draft while I attempt to make sense of things in iverilog. |
I've made a pull request for iverilog - we'll see how that pans out: steveicarus/iverilog#1106 With those changes in place, the default I'll look into whether the |
@povik I did a final force push to correct my latest correction 😅 Without the latest commit, one could just as well have calculated Hopefully the |
Yeah, I was aware the constant node is always 32 bits wide, but still put in the code to calculate I will try to get back to this PR to give it a final read-through. |
39cf48a
to
e80dbfe
Compare
nordshift on a net or variable yields generation of muxes instead of shift circuits for dynamic rvalue indexing, akin to nowrshmsk for lvalue indexing. To facilitate this, the AST transformations for rvalue indexing are moved from genrtlil.cc to simplify.cc, bringing them in line with transformations for lvalue indexing.
… AST_SHIFTX AST_CAST_SIZE on the right operand caused an unsigned operand to be signed. This is corrected by handling the right operand like in AST_POW.
This also corrects the calculation of bit widths, using the new function min_bit_width.
The
nordshift
attribute on a packed array prohibits the generation of shift type circuits for reads from that array, akin tonowrshmsk
for lvalue indexing.To facilitate this, the AST transformations for rvalue indexing are moved from genrtlil.cc to simplify.cc, bringing them in line with transformations for lvalue indexing.