[RFC][Sim] Add triggered simulation procedures #7676

Open
wants to merge 5 commits into
base: main


fzi-hielscher
Contributor

@fzi-hielscher fzi-hielscher commented Oct 8, 2024

Continuing the series of #7314 and #7335 (and hoping to finally get to lower the sim.proc.print operation) this PR adds trigger-related types and operations to the Sim Dialect. The primary point is to be able to express the execution order of side-effecting ops and procedures without having to rely on operation order within a HWModule's graph region. As added benefits, triggers allow us to:

  • Call procedures on simulation start without using register initializers
  • Have explicitly concurrent procedures within a single HWModule
  • Impose an execution order on operations in different modules and instances
  • Efficiently look up the order, if any, of two given triggers

Triggers span virtual clock trees. Their root node is either an edge event of a "real" clock (sim.on_edge) or the start of simulation (sim.on_init). When the root event occurs, all leaf operations of the given tree are triggered. In contrast to normal clock trees, trigger trees impose a partial order on their leaf nodes from which we can derive their execution order. Two leaf nodes are unordered (incomparable) if they are not part of the same trigger tree. They are concurrent (equal) if their lowest common ancestor operation is not a TriggerSequenceOp. If the lowest common ancestor is a TriggerSequenceOp the order depends on the result indices of the sequence op.

So... in practice:

%init = sim.on_init
%sequenced:2 = sim.trigger_sequence %init, 2 : !sim.trigger.init
sim.triggered on (%sequenced#0 : !sim.trigger.init) {
  // This first
}
sim.triggered on (%sequenced#1 : !sim.trigger.init) {
  // This second (concurrent with below)
}
sim.triggered on (%sequenced#1 : !sim.trigger.init) {
  // This also second (concurrent with above)
}
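
The ordering rules above can be modelled outside of MLIR. The following is only a sketch of one reading of those rules, not the PR's implementation: the `Trigger` class and the `sequence` and `compare` helpers are hypothetical names introduced for illustration. Two triggers are compared by walking their paths from the root and inspecting the first point where the paths diverge.

```python
class Trigger:
    """A trigger value: a root event or one result of a sequence op (hypothetical model)."""

    def __init__(self, name, parent=None, seq_op=None, index=0):
        self.name = name
        self.parent = parent  # trigger consumed by the defining sequence op
        self.seq_op = seq_op  # identity of the defining sequence op (None for roots)
        self.index = index    # result index within that sequence op


def sequence(parent, count):
    """Model of sim.trigger_sequence: split `parent` into `count` ordered triggers."""
    op = object()  # unique identity for this sequence op
    return [Trigger(f"{parent.name}#{i}", parent, op, i) for i in range(count)]


def path_from_root(trigger):
    """Return the chain of triggers from the root event down to `trigger`."""
    path = []
    while trigger is not None:
        path.append(trigger)
        trigger = trigger.parent
    return path[::-1]


def compare(a, b):
    """Relate procedures triggered on `a` to procedures triggered on `b`."""
    pa, pb = path_from_root(a), path_from_root(b)
    if pa[0] is not pb[0]:
        return "unordered"  # different trigger trees
    for x, y in zip(pa, pb):
        if x is y:
            continue
        if x.seq_op is not y.seq_op:
            return "concurrent"  # lowest common ancestor is not a sequence op
        return "before" if x.index < y.index else "after"
    return "concurrent"  # same trigger, or one is an ancestor of the other


init = Trigger("init")
sequenced = sequence(init, 2)
print(compare(sequenced[0], sequenced[1]))  # first procedure vs. the two second ones
print(compare(sequenced[1], sequenced[1]))  # the two procedures sharing %sequenced#1
```

With nested sequences the same walk applies, since only the first diverging pair of triggers decides the relation.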

sim.triggered provides a region in which we can place procedural operations. These operations can have side-effects. However, they are required to make forward progress and eventually terminate independently of all other procedures and simulation events. This means that concurrent procedures are not actually required to be run in parallel. Any chosen serialization should be legal / dead-lock free. Note that during lowering previously unordered procedures can become concurrent, e.g., by CSEing their root triggers.

TriggeredOps can also produce results via the sim.yield_seq terminator. The "seq" is to indicate an implicit register at the output. I.e., results are produced in a clock synchronous fashion. At some point we'll probably need an asynchronous sim.yield_comb. But this can create all sorts of complex interactions, so I try to put it off as long as I can 😅.
All results of sim.triggered must have an explicit tie-off constant specified. These are used both as results outside of simulation contexts (i.e., synthesis), and as (pre)initial value of the implicit register.

I have a very much proof-of-conceptish implementation of an arcilator lowering in my github fork. It can compile this little gadget, showing how to do sequenced calls to a side-effecting procedure during initialization across module instances (and print stuff).

Member

@uenoku uenoku left a comment


Interesting, thank you for working on this! Introducing triggered to core dialects might be controversial since it essentially represents behavioral constructs. I have a couple of questions:

  1. How does this relate to LLHD? I think LLHD is really good at this kind of representation and is more flexible. Is it possible to promote LLHD to a core dialect and use it for behavioral constructs?
  2. on_init as an operation seems a bit weird to me. Also, when on_init is provided to TriggerSequenceOp it must be the first element, correct? It may be more reasonable to put on_init as an attribute on TriggeredOp.
  3. TriggeredOp could capture values from outside, so I think it's fairly easy to cause race conditions. If two triggered ops are triggered on the same edge and one depends on the other's results, what is the expected behavior? Also, I think this is the same problem we talked about for seq.to_immutable: if a triggered op operand is a port, there is an initialization ordering problem.

@fabianschuiki
Contributor

fabianschuiki commented Oct 10, 2024

Really cool 😎!

I'm wondering how this relates to seq.initial and its !seq.immutable<T> wrapper type. Might that already do what you need? You could return a dummy value to have multiple seq.initials get ordered in a predefined way. We might even want to introduce something like a void type for that purpose. Or use i0? If seq.initial also works, we could extend hw.triggered with a similar !hw.triggered<T> wrapper and ability to return results, and thus allow you to schedule side-effecting op execution on a clock edge. What do you think?

@fzi-hielscher
Contributor Author

fzi-hielscher commented Oct 14, 2024

Thank you both for your feedback, once again. Let's see if I can defend my design decisions. Apologies if this gets a bit long:

Introducing triggered to core dialects might be controversial since it essentially represents behavioral constructs.

I would argue that sim is the place in the core dialects to put behavioral constructs in, so we can keep them out of the other dialects. Initializers are a difficult corner case, but more on that later. I also would not necessarily call it "behavioral", I'd plainly call it "software". The body of sim.triggered can be used to describe software which becomes part of the simulator and which is very explicitly not meant for synthesis. That's why there is a mandatory tieoff constant for each result, so we have well-defined behavior for both simulation and synthesis. We have to live with the fact that there will be differences in behavior. My goal is to make them obvious and explicit rather than hidden and subtle, as they often tend to be in Verilog. Ideally, if you only use hw, comb and seq ops, you should have a guarantee that simulated behavior matches synthesized behavior.
Using sim.triggered you can then do something like this:

%init = sim.on_init
%isSimulation = sim.triggered () on (%init : !sim.trigger.init)  tieoff [0 : i1] {
  %true = hw.constant true
  sim.yield_seq %true : i1
} : () -> i1

In SV this would become:

logic isSimulation = 1'b0;
`ifndef SYNTHESIS
initial isSimulation <= 1'b1;
`endif

I'm not saying that you should do that, but at least the difference clearly originates from a sim operation.
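
To make the role of the tie-off constant concrete, here is a small behavioural model. It is only a sketch under the assumption that the tie-off serves both as the synthesis value and as the (pre)initial register value, as described earlier in the thread; the `TriggeredResult` class and its `fire` method are hypothetical names, not part of the Sim dialect.

```python
class TriggeredResult:
    """Result of a triggered procedure with an implicit output register (hypothetical model)."""

    def __init__(self, tieoff, simulation):
        self.simulation = simulation  # False models a synthesis context
        self.value = tieoff           # tie-off doubles as the (pre)initial register value

    def fire(self, procedure):
        """The root event occurred: the body only runs when simulating."""
        if self.simulation:
            self.value = procedure()
        return self.value


# Mirrors the isSimulation example: tie-off 0, body yields true.
in_simulation = TriggeredResult(tieoff=False, simulation=True)
in_synthesis = TriggeredResult(tieoff=False, simulation=False)
in_simulation.fire(lambda: True)
in_synthesis.fire(lambda: True)
print(in_simulation.value, in_synthesis.value)
```

The point of the model is that the two contexts differ only through the explicit tie-off, which matches the goal of keeping simulation/synthesis mismatches visible.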

How does this relate to LLHD? I think LLHD is really good at this kind of representation and is more flexible. Is it possible to promote LLHD to a core dialect and use it for behavioral constructs?

I have to shamefully admit that I only have superficial knowledge of LLHD. But from what I have picked up so far, it is mostly aimed at event-queue and time-based simulation. It's great that we can do that if we must. But for frontends like FIRRTL, which don't really have a concept of time, it seems like overkill to me. For sim I'm aiming for a mechanism that can cover most basic use-cases, but is restrictive enough to remain easy to analyze and to lower with different backends. Let me try to outline it:

  • Every trigger tree has an event as its root. If, and only if, the event occurs, the entire trigger tree is executed.
  • The exact conditions and time at which an event 'occurs' are determined by the simulation environment. The only requirement is that it is the same mechanism which is used to trigger the sampling and updating of registers.
  • During the execution of a trigger tree, simulation time (if it exists) is suspended.
  • For any two operations contained in the same tree, we can determine whether they are executed before, after or concurrently to each other.
  • Operations in a trigger tree must make forward progress independently of all other operations in the model. Notably, they must not wait for any event originating from within the model.
  • Side-effects of operations in a triggered operation must not be observable by operations outside of the same operation, unless they are passed as a result.

So, I guess the body of a TriggeredOp is pretty much the same as a "function" in LLHD. Thinking of arcilator, I reckon the difference between using Sim vs. LLHD would be like the difference between using --no-timing vs. --timing for verilator. But I'll happily let @fabianschuiki have the last word here. 😅

on_init as an operation seems a bit weird to me. Also, when on_init is provided to TriggerSequenceOp it must be the first element, correct? It may be more reasonable to put on_init as an attribute on TriggeredOp.

on_init is kind of like a clock that pulses exactly once before all other clocks. I'm not sure I get what you mean by "the first element". TriggerSequenceOp takes precisely one trigger argument that defines its parent trigger. This can be an on_init or an on_edge root, or the result of another sequence. Having an on_init attribute on TriggeredOp would break the single root event concept.

TriggeredOp could capture values from outside, so I think it's fairly easy to cause race conditions. If two triggered ops are triggered on the same edge and one depends on the other's results, what is the expected behavior?

TriggeredOps simultaneously capture their argument at the occurrence of their root event. A chain of TriggeredOps on the same clock/event would behave like a shift register or a clocked pipeline. This is meant to avoid race conditions. If I ended up creating them, I did something wrong. 😬
Generally, I'd expect the majority of trigger user ops to not carry results. My primary motivation for adding this option was to be able to model the current behavior of clocked DPI calls with a procedural call operation inside a TriggeredOp. I'd like to have a unified mechanism to deal with clock synchronous function calls, independently of them being defined inline, at the top-level or externally.
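
The shift-register behaviour described above can be illustrated with a two-phase update. This is a sketch, not the proposed lowering: the `fire` function and its arguments are hypothetical, and it assumes a simple linear chain in which stage `i` consumes the registered result of stage `i - 1`.

```python
def fire(stages, registers, data_in):
    """Model one occurrence of the root event shared by a chain of triggered ops.

    stages[i] is the body of the i-th op; registers[i] is its implicit
    output register. All arguments are captured before any register
    is updated, so no stage observes a value produced on the same event.
    """
    sampled = [data_in] + registers[:-1]            # phase 1: simultaneous capture
    return [body(arg) for body, arg in zip(stages, sampled)]  # phase 2: commit


stages = [lambda x: x + 1, lambda x: x * 2]
registers = [0, 0]  # tie-off constants act as the initial register values

registers = fire(stages, registers, 5)
print(registers)  # the second stage still sees the tie-off of the first
registers = fire(stages, registers, 7)
print(registers)  # the second stage now sees the result from the previous event
```

Because capture and commit are separated, the result does not depend on the order in which the stage bodies are evaluated.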

Also, I think this is the same problem we talked about for seq.to_immutable: if a triggered op operand is a port, there is an initialization ordering problem.

Yes. It is frustrating but at least for SV I'm afraid we cannot avoid it. As I mentioned in the other PR, I think our best option here is some sort of interface contract, either encoded by type or by an attribute, promising that any initialization of the port has occurred before the initial processes are started. For the Arc backend I don't see this as much of a problem. We should be able to insert a hook between state allocation and invocation of the initializer function that allows user code to specify the "pre-initial" value of input ports.

I'm wondering how this relates to seq.initial and its !seq.immutable<T> wrapper type. Might that already do what you need? You could return a dummy value to have multiple seq.initials get ordered in a predefined way.

There is definitely a functional overlap with seq.initial, but I would argue that conceptually a register initializer and a procedure called at the start of simulation are different things. The former is tied to a physical register while the latter is purely a simulation artifact. Initializers occupy this weird space where they may or may not be synthesizable depending on whether you are targeting ASICs or FPGAs. Since we do not specify a tie-off value for seq.initial my interpretation of it is that it provides a (potentially) synthesizable "elaboration-time constant". On the other hand sim.triggered provides a "simulation runtime constant" with an explicit constant tie-off for synthesis. A not-too-accurate analogy could be constexpr vs. const in C++.

We might even want to introduce something like a void type for that purpose. Or use i0? If seq.initial also works, we could extend hw.triggered with a similar !hw.triggered<T> wrapper and ability to return results, and thus allow you to schedule side-effecting op execution on a clock edge. What do you think?

If I understand you correctly you are suggesting to schedule operations via the topological order of their arguments and results, like @uenoku did for seq.initial, right? If so, my primary concern here is that we would need to handle !hw.triggered<T> operands differently depending on whether they are produced on the same or a different event as the TriggeredOp consuming them. If we were to wait on a result produced on a different clock edge, we would deadlock. If we fail to wait for a result on our "own" clock edge, we mess up execution order. This becomes especially problematic when we start passing them through module boundaries, as this dependency might change depending on how the IOs are connected in the parent module. In contrast, the tree structure guarantees that we only depend on a single, well-defined parent trigger.
Note that it is fairly easy to convert the tree structure to a topological dependency graph, while the other way around is generally not possible. I implemented this "unraveling" in my arc lowering prototype. But it happens after module inlining, when we can see the entire picture. I would have talked more about this, but given your recent wave of PRs, I need to first check how much of it has become obsolete. 😉
I would also argue that, for a middle-end representation, a tree has some minor "ergonomic" benefits, like not having to deal with unit value results for print and similar operations, and more efficient look-up of order between two arbitrary operations (assuming trigger trees will be mostly flat).
The only real drawback I can see so far is the inability to pass values between TriggeredOps "immediately", i.e. on the same clock edge. There are ways around it, but they would violate the "side-effects are not observable outside of the current op" property. I think in many real-world cases where this becomes necessary we should also be able to just merge the TriggeredOps.
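
The conversion from a trigger tree to a topological dependency graph mentioned above might look roughly like the following. This is not the arc lowering prototype, only a sketch of the idea: `TriggerNode`, `leaves` and `unravel` are hypothetical names, and the tree is assumed to be fully visible (i.e., after module inlining).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TriggerNode:
    """A trigger value together with its users (hypothetical model)."""
    procs: list = field(default_factory=list)      # procedures triggered directly on this value
    sequences: list = field(default_factory=list)  # each entry: ordered result TriggerNodes of one sequence op


def leaves(node):
    """Collect all procedures reachable from `node`."""
    found = list(node.procs)
    for results in node.sequences:
        for child in results:
            found += leaves(child)
    return found


def unravel(node, deps=None):
    """Map each procedure to the set of procedures that must run before it."""
    deps = {} if deps is None else deps
    for proc in node.procs:
        deps.setdefault(proc, set())
    for results in node.sequences:
        earlier = []  # procedures of the nearest earlier non-empty result
        for child in results:
            unravel(child, deps)
            current = leaves(child)
            for proc in current:
                deps[proc].update(earlier)
            if current:
                earlier = current
    return deps


root = TriggerNode(sequences=[[
    TriggerNode(procs=["first"]),
    TriggerNode(procs=["second_a", "second_b"]),
]])
deps = unravel(root)
print(list(TopologicalSorter(deps).static_order()))
```

Going the other way is generally not possible, as noted above: an arbitrary dependency graph need not correspond to any tree of sequence ops.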

TL;DR: I like trees. 🌲

@fzi-hielscher
Contributor Author

After letting this settle for a month I still think it is a viable approach (which, for me, is somewhat unusual 😅). So, let me nudge it out of draft mode. My condensed argument in favor of trigger trees instead of a token-flow approach would be that they structurally guarantee deadlock freedom even through opaque interfaces, while clinging to the familiar concept of clock trees. The only new concept that is added is sim.trigger_sequence.

I think it is worth highlighting that sim.triggered cannot be used to initialize (hardware) registers and was never meant to do that. I've grown increasingly convinced that it would make sense to add a sim.initial op that can work in tandem with seq.initial to have a clean separation between non-synthesizable and synthesizable register initialization. But that's for another PR.

The problem of non-deterministic initialization order in SV remains a pain point. I don't know how much of that should bleed into the core dialects. At the moment my gut-feeling is to avoid being overly restrictive in the middle-end and then have a legalization/sanitizer pass try to convert it into deterministic behavior during SV lowering.

I haven't gotten around to writing a lowering for the new arc passes yet. I've been eyeing the discussion on arc.task in #7650. At first glance it looks to me like a good fit to lower sim.triggered to, even if we don't actually want to do multithreading.

@fzi-hielscher fzi-hielscher marked this pull request as ready for review November 4, 2024 22:26
@darthscsi
Contributor

Starting to go through this. Is it simpler to say that the ordering of tasks is the in-order traversal of the tree from the root of trigger-sequences with multiple leaves on the same edge being unordered?

@fzi-hielscher
Contributor Author

Starting to go through this. Is it simpler to say that the ordering of tasks is the in-order traversal of the tree from the root of trigger-sequences with multiple leaves on the same edge being unordered?

Going by my nomenclature above leaves on the same edge would be "concurrent". "Unordered" would imply that they are not part of the same tree. But other than that, yes. In-order traversal always provides a legal order. Iff no trigger value has more than one user, it is the only legal order.
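
As an illustration of the in-order traversal, here is a short sketch using a nested-list encoding chosen only for this example: a trigger is a list of its users, and a user is either a procedure name or a tuple of child triggers standing for the ordered results of a sequence op. The `in_order` and `has_unique_order` functions are hypothetical helpers.

```python
def in_order(trigger):
    """Return the procedures under `trigger` in one legal execution order."""
    schedule = []
    for user in trigger:
        if isinstance(user, str):
            schedule.append(user)  # a triggered procedure
        else:
            for child in user:     # results of a sequence op, in index order
                schedule.extend(in_order(child))
    return schedule


def has_unique_order(trigger):
    """True if no trigger value in the tree has more than one user."""
    if len(trigger) > 1:
        return False
    return all(
        has_unique_order(child)
        for user in trigger
        if not isinstance(user, str)
        for child in user
    )


# The on_init example from the PR description: one sequence op with two results.
tree = [(["first"], ["second_a", "second_b"])]
print(in_order(tree))
print(has_unique_order(tree))  # %sequenced#1 has two users, so other orders are legal too
```

When `has_unique_order` is false, the traversal result is still legal; it is simply one of several valid serializations of the concurrent procedures.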

Flatten nested sequences in a single rewrite rather than iteratively.