Redo internal cell memory layout #4461

widlarizer · 2024-06-17T15:27:44Z

Brief

What are the reasons/motivation for this change?
small is fast!
Explain how this is achieved.
make cell small.
If applicable, please suggest to reviewers how they can test the change.
right now, nothing works

Full

Core idea

Yosys is conceptually flexible in what a cell can be. This makes it easy to do odd custom things that match use cases. The way this is achieved is inefficient for commonly represented types. Each cell has a dict from interned parameter name strings to parameter value constants. It also has a dict from port name strings to signals represented by SigSpecs. However, most cells that Yosys uses in common flows have a fixed set of ports and parameters. So instead of dynamically growing space for a dict and hashing to access the value stored in it, we can have preallocated space and a compile-time known mapping if we just use a struct specific to a given cell type:

// $not etc
struct RTLIL::Unary {
	SigSpec a;
	SigSpec y;
	Const a_width;
	Const y_width;
	Const is_signed;
};

We can then use a tagged union to construct a new, slimmer, faster to access cell type, with a fallback pointer to the old cell structure.

struct RTLIL::Cell
{
	RTLIL::IdString type;
	RTLIL::IdString name;
	RTLIL::Module* module;
	union {
		RTLIL::Unary not_;
		RTLIL::Unary pos;
		RTLIL::Unary neg;
		RTLIL::OldCell* flexible;
	};
};

Wait, wasn't this supposed to be a Yosys rewrite in Rust?

This isn't next generation Yosys (project nyan). Rust or Zig would be better to do what I'm doing, but Yosys is a C++ codebase, so C++ it is. This is a surgical, gradual transition to more efficient internals with minimal impact on stuff outside kernel/rtlil.h and kernel/rtlil.cc. The Yosys codebase (~200k LOC) and user plugins have a large number of code like cell->getParam(ID::A_WIDTH), we have to reproduce these interfaces rather than change them all to cell->not_.a_width. There's a good number of such functions. The annoying part is mocking for example the parameters dictionary. Instead, there's struct FakeParams which provides iterators to hide this. Same goes for connections. Implementations of getParam etc are simple switches over the cell type so they should inline well.

However, there's a lot of "internal" cells we may want to be fast, and their struct layouts and methods are basically duplicating the same information. Manually writing these methods is also very error prone, accessing tagged unions incorrectly is common UB. This is why, to add support for the various cell types, the next stage will get a bit weird:

Writing a compiler

Modern C++ offers multiple ways to have code that thinks about types. However, what we need here is to construct functions and construct their return types based on struct layouts. This level of reflection if possible seems too difficult to achieve with templates and constexpr and such. We don't have Rust's proc_macro, we don't have Zig's comptime, so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see passes/pmgen/pmgen.py.

Beyond the cell

It's possible that the changes outlined above alone won't make a huge difference. RTLIL::Const is fairly inefficient as well. Size is virtuous in a positive feedback loop: the fewer bytes per cell, the more it helps to save a single byte per cell.

Size goals

The current flexible RTLIL::Cell is 192 bytes. The proposed one is a bit bigger while it supports only unary elements, but doesn't require external allocations. With the current flexible cell, this simple verilog eats 6kB memory per inverter-and-FF-bit:

module invert_n_times #(parameter N = 10000) (
    input wire in,
    output wire out
);
    wire [N:0] intermediate;

    assign intermediate[0] = in;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : invert_loop
            assign intermediate[i+1] = ~intermediate[i];
        end
    endgenerate

    assign out = intermediate[N];
endmodule

This suggests that focused effort can bring it down by a significant factor.

widlarizer · 2024-06-17T15:28:53Z

This was meant to be a draft PR and I hit the wrong button writing the description. Heck

povik · 2024-06-17T17:25:48Z

fundamental issues with the Yosys IR

What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?

don't come with benchmarks

I pushed for @widlarizer to open a draft PR no matter what state this is in. Benchmarks can come later.

povik · 2024-06-17T17:31:32Z

so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see passes/pmgen/pmgen.py.

FWIW pmgen-generated C++ code is not checked into the repository. It's treated as a build artifact.

whitequark · 2024-06-17T18:12:32Z

What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?

That's an improvement, although other issues remain (like the unnecessary coarse/fine cell split, or the combined treatment of logic and inout wires in the same netlist).

widlarizer · 2024-06-17T19:59:58Z

Benchmarks are coming. The work presented here right now is exclusively an incomplete proof of concept. Incomplete as in it immediately segfaults on anything in the current state. I want to find a case where some cells being fast tracked really shows improvements. If we find better fundamental approach, all this is free to get canned, in such a case it still will have been a helpful exercise in my eyes

widlarizer · 2024-06-17T20:07:41Z

the unnecessary coarse/fine cell split

Can you elaborate on the adverse effects of this? From a performance point of view it doesn't seem too concerning, but I may be missing something.

combined treatment of logic and inout wires in the same netlist

I wanted to redesign the netlist first, but that was before I learned about bufnorm, after which it was suggested to me by povik that redoing how connections are handled would require excessive code changes and that bufnorm + smaller internal cells takes priority as a result.

whitequark · 2024-06-17T20:13:27Z

From a performance point of view it doesn't seem too concerning, but I may be missing something.

I don't think this affects performance but in general I don't care a whole lot about Yosys performance; as you may know I'm quite happy using the YoWASP version despite a ~2x perf hit, so clearly that much performance loss isn't an issue to me. But the IR is poorly defined and difficult to work with (I think the single most obnoxious issue from my perspective is the batshit behavior of \init, which is directly downstream from the coarse/fine split and some other bad decisions) and I'd much rather see that addressed than any performance improvements.

povik · 2024-06-17T20:16:26Z

I think the single most obnoxious issue from my perspective is the batshit behavior of \init, which is directly downstream from the coarse/fine split and some other bad decisions

I don't follow. How is \init related to the coarse/fine split, if that's what you are saying?

whitequark · 2024-06-17T20:20:37Z

I don't follow. How is \init related to the coarse/fine split, if that's what you are saying?

\init is only an attribute and not a parameter because of fine cells (which would otherwise have to triple in quantity), which in turn only lack parameters because of BLIF export. Nobody cares how many fine cells Yosys has, there's hundreds already, just do the 0, 1, X variant for them and stop imposing the cost of \init handling on every other pass.

widlarizer · 2024-06-17T20:59:25Z

YoWASP

I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.

IR is poorly defined and difficult to work with

As I understand, there are many open questions on design decisions in Yosys, as viewed from the perspective of correctness and developer experience and maintainability, but I'm not aware of any interaction this PR would have on those topics, other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells.

povik · 2024-06-17T21:26:48Z

Nobody cares how many fine cells Yosys has, there's hundreds already, just do the 0, 1, X variant for them and stop imposing the cost of \init handling on every other pass.

Huh, so e.g. the state writeback in sim would involve changing the cell types on fine flip-flops, and adjusting a parameter on coarse flip-flops. In exchange we make init disappear. I think I like the change.

I'd much rather see that addressed than any performance improvements.

We should mention a bit about the background for the PR. It's motivated by company interests which allow us to invest significant effort into performance improvements. If we can fix some outstanding ills of the IR along the way, that's all the better, but it's not the case that we can pick any IR work instead. I guess we shouldn't be doubling down on a wrong design decision by writing specialized memory layouts around it, but that doesn't seem to be happening, at least not with the issues brought up so far.

whitequark · 2024-06-17T21:33:00Z

I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.

Interestingly, YoWASP already greatly benefits from such size reduction compared to native builds because it uses an ABI with 32-bit pointers. This is for example why nextpnr is ~on par with a native build, despite other inefficiencies.

I do welcome performance improvements (who doesn't?) but I feel that there are many other things to address in Yosys that I'm a lot more excited about, since the IR structure is a major pain point and performance rarely is, at least from where I stand.

other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells.

I'm uncertain on whether this design would further entrench the existing IR flaws or not.

I think ultimately I'll choose to withhold any judgement I would have before I see benchmarks. That seems only right if this was intended to be a draft (which I didn't realize at first); I don't want to be negative on something that is clearly not the final work.

Huh, so e.g. the state writeback in sim would involve changing the cell types on fine flip-flops, and adjusting a parameter on coarse flip-flops.

Yes. The problem with \init is that it's antithetical to correctness-by-construction because it's just ignored on wires with no flops, and conflicts resolve kind of randomly? We've had a lot of difficulty with emitting it in Amaranth correctly as well, but that's not the main motivation.

whitequark · 2024-06-17T21:34:35Z

It's motivated by company interests which allow us to invest significant effort into performance improvements.

So on second thought and having read this, I think I'll just leave this PR to be driven by the company interests. Looks like it pinged me because of the CXXRTL code owner bit, I'll pre-emptively sign off on any changes there since they're quite minimal.

widlarizer · 2024-06-18T10:19:32Z

This PR was initiated by my personal interest which I found to be well aligned with company interests of making specifically opt_expr faster

widlarizer · 2024-06-20T11:55:33Z

Major road block identified: Yosys until now has been built with the assumption that at will one can modify cell types and erase and add params and ports. At coverage tag "opt.opt_expr.eqneq.isnot" that's exactly what's done. With my changes, it is possible to attempt it but it's union access UB. As a solution, the type field could become private, which will create large code change where read-only type access is replaced with a const getter (from cell->type to cell->type()). This will bring large code changes. Or we can just fix the places where types are modified (there seem to be 79 locations where type is set outside of addCell) and temporarily keep the UB hazard for new code

…ells

widlarizer added 25 commits May 28, 2024 23:51

tracy: init, zones per pass execute method, pointer-colored

0fb3f3a

tracy: frame pointer, track memory allocations

60bf00e

fix stuff, don't strip

b85062f

oldcell

277c10e

delete conn iter attempt

e243968

iterators

8403eee

add functions to new cell

2213b5d

add functions to new cell

930a9f6

wip

c0a51c8

wip

d7251df

iterator hell

33910bc

iterator hell

1c2fb07

references - breaking

919e210

dead end, backtracking

33987d9

clean, not backtracking

2d6c454

consty stuff

36289ab

consty stuff

8bdcc69

look at all those chickens

61cf4b6

silly compat file

cc10ef7

erase, clear, fork replace, idk

193a43e

conns and params from dict, oldcell no longer attrobject

866b7a7

tiny fix

4c9f682

73%

eeb15ea

100%

65d50db

placement new, fix empty probably

fbdfff1

widlarizer requested review from eddiehung, Ravenslofty, whitequark and nakengelhardt as code owners June 17, 2024 15:27

experimenting with test_cell

43d8c7f

widlarizer added 3 commits June 18, 2024 16:16

fix swapped conns, params in interators

b190055

cells can now be created, techmap broken

81f783b

$not now passes test_cell!

76102f0

widlarizer added 10 commits June 20, 2024 16:43

reconsidering unset

d2107a9

add morphCell instead of type assignments, test_cell passes for all c…

1be8f80

…ells

fix proc, reduce warnings

66c6293

rtlil: rename InternalOldCellChecker back to InternalCellChecker

343bc66

shim: init

6ab2e77

shim: fix

b13d092

rtlil: fix erase

d04d810

mem: fix morphCell calls

5ca78ae

shim: fix, add forgotten generator

6922a68

hierarchy: morphCell

d3d5738

widlarizer mentioned this pull request Jul 18, 2024

Initial VAY draft #4498

Draft

widlarizer added the status-superseded Status: Work continues in a different PR or was made redundant label Aug 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redo internal cell memory layout #4461

Redo internal cell memory layout #4461

widlarizer commented Jun 17, 2024 •

edited

Loading

widlarizer commented Jun 17, 2024

povik commented Jun 17, 2024

povik commented Jun 17, 2024

whitequark commented Jun 17, 2024 •

edited

Loading

widlarizer commented Jun 17, 2024

widlarizer commented Jun 17, 2024

whitequark commented Jun 17, 2024

povik commented Jun 17, 2024

whitequark commented Jun 17, 2024 •

edited

Loading

widlarizer commented Jun 17, 2024

povik commented Jun 17, 2024

whitequark commented Jun 17, 2024

whitequark commented Jun 17, 2024

widlarizer commented Jun 18, 2024

widlarizer commented Jun 20, 2024

Redo internal cell memory layout #4461

Are you sure you want to change the base?

Redo internal cell memory layout #4461

Conversation

widlarizer commented Jun 17, 2024 • edited Loading

Brief

Full

Core idea

Wait, wasn't this supposed to be a Yosys rewrite in Rust?

Writing a compiler

Beyond the cell

Size goals

widlarizer commented Jun 17, 2024

povik commented Jun 17, 2024

povik commented Jun 17, 2024

whitequark commented Jun 17, 2024 • edited Loading

widlarizer commented Jun 17, 2024

widlarizer commented Jun 17, 2024

whitequark commented Jun 17, 2024

povik commented Jun 17, 2024

whitequark commented Jun 17, 2024 • edited Loading

widlarizer commented Jun 17, 2024

povik commented Jun 17, 2024

whitequark commented Jun 17, 2024

whitequark commented Jun 17, 2024

widlarizer commented Jun 18, 2024

widlarizer commented Jun 20, 2024

widlarizer commented Jun 17, 2024 •

edited

Loading

whitequark commented Jun 17, 2024 •

edited

Loading

whitequark commented Jun 17, 2024 •

edited

Loading