-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reasoning behind the push-then-pull model #196
Comments
It would seem that “effects” control the pulling of the graph when they are “mounted”. Another way of saying this is that the system is push based to only the active parts (the pulled parts). This would not require a “dirty flag” but instead an active flag which is pulled by effects. The system would be pushed to only the active portion of the graph (one trip, instead of a round trip with the push-then-pull model). It seems like this is just a lazy push system, where pushes only propagate when the paths are activated by some effects at the end of the graph. I believe this is how Preact works, but don’t quote me on this. If an effect mounts (activating a portion of the graph) and then it unmounts and is garbage collected, then if source nodes are weakly held references, the cleanup would cascade up to where the root nodes may be held in memory (which may be capturing some events). So, it would be clear that resource management happens at the root nodes, and not the endpoints. The endpoints are held in memory for enabling/disabling effects in our program, but ultimately they inherit their life cycle from their sources. The top-most source is a root. As long as a root exist in memory (perhaps as an event listener capturing events in our program) then its sources live with it only where the effects are instantiated. If a root node is GC'd, then the entire sub-graph depending on it should be GC'd as well, regardless as to whether the effects were "unmounted". Why? Because why persist an effect which will no longer be re-computed from any new state change? This is exactly how Flash handles the graph: all source references are held weakly, and all target references (sinks) are held strongly. If a root is removed, all strong references to targets are removed. If no other strong references are held to these targets either by other source (sink) nodes, or by some JavaScript reference to the node, then they're scheduled for cleanup by the GC. This cascades down to effects, where they'll be automatically cleaned regardless of any unmounting. |
A batch of changes will invalidate lots of dependencies. Having the dirty flag can save a lot of evaluation. Effects should only happen at the end of transaction. |
Can you be a bit more exhaustive with your point here? I'm having to read between the lines of what you're saying; some examples could help. My question was about justifying push-then-pull model outside of it being novel or having a narrow case for optimization. I tried to expand on the case where there would be optimizations which would be only when the graph is not used. When the graph is not used, then there is remains the low overhead of pushing dirty flags. The purpose of an unused graph is to maintain idle code paths which may later become active by mounting an effect. Though, why not then just mark the idle code paths as idle and not propagate a dirty flag at all? When the code path becomes active by a mounted effect, then the push of changes propagate without any need to check for dirty flags. This means the model is a lazy push model and achieve the same result if not a better result (less unnecessary steps). I haven't mentioned batching in this issue, so I'm curious why it's brought up? No batching occurs in a lazy-push model, unless you consider pulling that occurs once when mounting an effect? Again, can't read between the lines of what you're saying, so please clarify. |
I think you're mistaking the "pull" part as a single distinct phase of the algorithm, but it isn't really. The pull happens when user code calls Some parts of the graph may not end up getting pulled at all, if other changes make them irrelevant -- e.g. if they're only consumed in one branch of an The basic guarantees of the push-then-pull model are:
That is:
Checking the dirty flag on a Computed if it's "live" is also an optimization in a similar vein. It doesn't change the outcome of when or which computeds rerun (guideline 3). It's just that if a computed does happen to be in the live part of the graph and you see it's not marked dirty, you know (without having to look at them) that none of its dependencies could possibly have changed. |
@shaylew Thanks for the thorough explanation. I can see why the model can be beneficial as I reason about your explanation. What point stood out to me was the mention of consumers that no longer need to pull a path by the change of a condition in a branch ( const computed = new Signal.Computed(() => a.get() ? b.get() : c.get) In this example a.set(false)
b.set(val) In this case, an update to b.set(val) // causes `computed` to re-evaluate `b`
a.set(false) // causes `computed` to re-evaluate `c` In that case, the push will cause It seems to me that the cases here really are based on the order of changes in the system and the races for those changes to whatever batching may be implemented for optimizations. All of that said, you're going to get the performance benefits of the system you're given, and the next question will be what cases will you need to be more explicit in how you use of the system to gain the optimizations that are relevant for the cases for your app. |
I'm trying to contrive an example where push-than-pull leads to some optimization when it comes to re-evaluation, but I only see added complexity. If you look at the example, the pull system simply triggers depth-first re-evaluations down the graph. During re-evaluations, we can assume the current cached state in each node is the latest state since we're pushing changes down the graph (from left to right). After each node is re-evaluated, its state is updated (yellow to blue). After its update it cascades to another push. There is never a need to track a "dirty" flag since updates are pushed immediately. Any value "pulled" during evaluation is taken directly from the cached state in the node which should be up-to-date for all parent nodes. I'm looking for an example where an optimization can be had. I can only imagine an optimization case if the first phase (push) was able to allocate multiple updates of root-nodes before end-nodes pulled values down as they re-evaluate. Which is why I ask about a "batching process of some kind". |
So, two things: Some kind of batching process is the norm for UI use cases. The proposal doesn't explicitly need to include this as a named feature because it leaves all the pull timing to user or framework code. Effectively if you call The second thing is: when you say
it turns out that "from left to right" is doing a lot of work here, and depth-first and left-to-right are different things. For an example of where the naïve push-based approach goes wrong, think about a graph like this: So if you want a working push system:
You can come up with coherent answers to these issues, and you end up with something like Jane Street's Incremental. But the thing you end up with isn't compatible with autotracking as far as I know, and it just bites the bullet on the "unnecessarily recomputing F" front and decides to live with that issue. Autotracking is one of the main reasons for this being a language proposal, so I don't think this sort of topologically-sorted push system is a good fit for its goals. |
So now I realized that the spec leaves the behavior of how the graph gets pulled up to the watchers (user-land). This gives flexibility of optimizing the way the graph is read / pulled. So, a user of the graph must be aware of which tool they're using to subcribe to the graph. Very interesting, and makes really good sense for the motivation. Though, all the benefits for this model which leaves the pulling up to how watchers are implemented comes with the overhead of managing a little bit of extra state (dirty flag) and an extra few trivial calls between pulls on the graph (the real evaluation of the state). I'm curious at what case is the extra work a worthwhile trade-off? Is there any analysis on this trade-off specifically? Given a naive watcher which pulls on every notification from the graph, it wouldn't be more optimal than a push-system. So what are some rudimentary watcher implementations which give you some net gain in optimization and at what minimum topology and update frequency would the gains begin? |
Given this spec moves the responsibility of how to most optimally read from the graph to the user-land, have we considered the alternative approach to the problem: moving the responsibility of how most optimally update (push) the graph to the user-land? This is the basis of my questioning for the push-than-pull model. With the spec, we consider all the strategies of batching, and what not, to be done at the watcher, which sits between two stages push and pull. The alternative to this, that i'm willing to explorer for the sake of this proposal, is to allow for strategies to be implemented at any node in the graph. This can be done with a push-based system where each node tracks some state and can reduce over its previous state while evaluating (see Flash Reducers). And with such a capability, the entire architecture of push-then-pull model could be opt-in with an user-land implementation of such node types. What's beneficial about this is that you can then apply this strategy to only parts of your graph and mix and match other strategies as well. I'll have to contrive an example soon to illustrate. |
This is just some chicken-scratch code to illustrate how a push model, which can access some execution context values, could fully emulate the push-then-pull model. const onPubSub = compute => {
const computed = on(() => compute())
// This is the computed signal derived from the compute function.
// It will auto-track all of the signals invoked in the compute function.
const dirty = on(() => computed() !== off(cached))
// Track whether the computed becomes dirty by comparing it to the cached value.
const cached = on(() => {
const { cache } = own() ?? {}
// Get the cached value from internal state (or default to undefined)
if (dirty()) {
const newCache = when('pull') ? off(compute) : cache
// Only recompute the value if the current execution context contains
// 'pull', otherwise get the cached value.
own({ value })
// Always update the internal state to propagate dirtiness.
// By setting a new object always, the signal will re-evaluate and propagate
// a change to sinks in the graph.
}
return cache
// return internal cached value.
// Signals can act like "views" over internal state in this way.
// Even though cache may not have changed, the internal state change can force
// push changes downstream.
})
// This is a signal that caches and returns the computed value.
// Initially, the cache is undefined, which will result in `dirty() === true`.
// Pulling while `dirty() === true` will recompute the value and update the cache.
// The `cached` signal updating will _not_ invoke the dirty signal because it
// doesn't track `cached` by use of `off(cached)`, avoiding cycles
// In summary, cached only recomputes for the value when pulled and the signal
// tracking the compute value is dirty.
return cached
}
const source = on<number>(initialValue)
// Create some regular non-pubsub signal as a source.
// This to illustrate compatibility with regular signals.
const square = onPubSub(() => source() ** 2)
// Create our first pub-sub signal which is the square of the source signal.
// Internally a signal is tracking the source state, but only to update the
// internal dirty state. The dirty state is what propagates as a push through
// the signal graph.
const b = onPubSub(() => square() / 2)
// Create our second pub-sub signal which is half of the square.
// It is tracking another pubsub signal so it will only receive dirty pushes.
// Usage:
const effect = on(() => {
b(false)
// Track `b` without pulling it.
// Subscribes to dirty notifications with `when` utility which returns true
// when the signal is what caused the re-compute for this signal.
if (when(b)) {
// ...do extra logic to determine whether to pull `b` or not...
b.with('pull') // pull!
// Step 1:
// `b.with('pull')` will pull the compute function for `b` if `b` is dirty.
// This is done by adding 'pull' to the execution context stack.
// Step 2:
// `compute()` will re-evaluate `square() / 2` which will pull `square()`
// if its dirty.
// Step 3:
// `square()` will re-evaluate `source() ** 2` which will pull `source()`.
}
})() We first define an abstraction for the push-than-pull signal called In this example, Alternatively, one could implement the same execution context trick for changing signal evaluation modes by defining a impure compute function with the context parameterized: let isPushingContext = false // at module scope; shared with all onPubSub instances
// Inside the onPushPull implementation:
const cached = on((push) => {
// set the context value to match the push parameter if it's set.
// Otherwise, set the push parameter to the current state of the context.
// Evaluate the compute function if `push === true` and `dirty()`...
// If push argument was given, reset the isPushingContext state in a setImmediate
}) Though this attempt is more hacky and error prone than being provided with an execution context API built in. ConclusionI want to illustrate the point here that a purely push model is all the that is needed to define other models at the user-land level. This is case for defaulting to the push-model which is the missing model in the JavaScript language; we already have a pull-model with function invocation and iterators. We have a push model with promises for single values, but we don't have a standard push model for multiple values standardized yet. Observables was the original attempt at this initiative. I think Signals are a better and more refined concept to take the place of the multiple-value push-based primitive than Observables for the main reason of auto-tracking and minimalist API and semantic constructs. Over-complicating their implementation without any means of opting-out of standard behavior, would miss the opportunity to cover all applications which simply need a multiple-value push-based primitive. Exhausting my case for what signals should beSo far, my multi-year research on the topic of signals (or observables, or reactors) has lead me to the conclusion that all that is missing is the ability to define functions which are invoked implicitly instead of explicitly: a function By realizing that the push system is just an inversion of the call-stack, it becomes the obvious answer to the missing push state primitive in any functional language (including JavaScript). Whether this kind of reactive primitive fits the specification for what defines a "signal" or not, remains irrelevant because it is at least more pure to the concept of a reactive primitive standard from which all other reactive systems could derive. |
It's true that a pull is just two pushes. However, one of the goals of this proposal is to ensure that we have interoperability between frameworks that use Signals with the guarantees that Shay mentioned above. I think requiring people to build custom push protocols to emulate pull messages makes interoperability between frameworks harder. |
@modderme123 By my point isn't that a pull is just two pushes. I shall look at the context you've provided to understand what is meant here by pull is two pushes. Instead, my point is that the language already has 'pull' mechanisms. And introducing a push mechanism can enable any protocol desired in the graph. The push-then-pull model is simply a protocol within the graph that may be interoperable with most signal libraries, but certainly it's not interoperable with all of them. If only a push model is standardized, then it is possible for the standard to be interoperable with all reactive frameworks whether signal-based or observable-based. For example, there is no way for this signals proposal to be used as the implementation details for Flash, because Flash, like observables, is push-based. Therefore, in order for Flash to be interoperable with other libraries which consume a standard reactive primitive such as signals, then Flash would need to compromise on the design entirely. There may not even be ways to achieve the design decisions in Flash using the standard. This would render the signals standard useless for Flash's application. Which is unfortunate, because then the language could become fragmented with a standard for reactivity, and many identical user-land reactivity implementations that only require a push-model. That is until the push-model primitive is standardized as something else, and then we'd have two primitives in the language and a bit of redundancy. |
After re-reading this, I wonder: How does a system which pushes "dirtiness state" down the graph solve for this problem? The watcher will receive a notification that a node is dirty before the change has fully propagated the dirtiness state through the graph, right? Unless the push phase of the dirty flag happens in one transaction and then all relevant watchers are called once this phase has completed? |
This is essentially the reason you're not allowed to |
I think this is solved with flatmap on pretty much any push stream, no? Here is an example using rxjs, but it'd work fine in mostjs, etc. import * as S from "npm:[email protected]";
const State = new S.BehaviorSubject({
c: true,
f: "Hello",
g: "World",
});
// Nodes A and B omitted for brevity
const C = State.pipe(
S.map((state) => state.c),
S.tap(() => console.log("C Computed")),
);
const F = State.pipe(
S.map((s) => s.f),
S.tap(() => console.log("F Computed")),
);
const G = State.pipe(
S.map((s) => s.g),
S.tap(() => console.log("G Computed")),
);
// Logs "Hello\nWorld"
const subscription = C.pipe(
S.switchMap((c) => c ? F : G),
S.tap(() => console.log("Result Computed")),
).subscribe({ next: console.log });
// Change root state
State.next({ ...State.value, c: false }); Outputs:
Here, the map calls in F and G are, I believe, not computed until they are returned in switchmap operator. Am I missing something? This also leads me to another question, can a signal be nested in another signal? If not, that is a big indication to me that it is probably better classified as a design pattern over an interface than a language primitive. If so, what does that look like? |
I’d like to better understand the reasoning behind the push-then-pull model specified in this proposal. I’m looking for strong justification for it besides it being novel.
From my understanding, signals seem to be a push based system much like observables. The benefit of signals over observables would be their ergonomics and resource management. That would be my understanding, but I am open to being corrected in this understanding.
My understanding of the push-then-pull model is that dirty flags are pushed first through the graph, then only signals which are attached to some watcher endpoints (effects) are then pulling the graph where it’s dirty. The pulling starts at the beginning of the graph and the state changes propagate down the dirty paths until either reaching the end of the graph or a node where the state hasn’t changed (a
===
comparison).If this understanding is correct, it seems to assuming that some optimization is to be had by eagerly propagating cache invalidation and then lazily propagating state changes. This would be an optimization assuming that portions of the graph may be frequently idle (not watched) only to become active when an effect is “mounted” which activates those portions of the graph.
From this, I can conclude that this proposal of signals is to suggest that signals are to be a state management system where the application state control flow is to be held in memory at all times, but the propagation of the state passing through the control flow is to be inactive until actively enabled by some watcher (effects). If this is a valid assessment, then my curiosity is why not leverage the language for this design: control flow can exist as function references in memory, and state only exists at the endpoints. Allow me to elaborate…
If a portion of the graph is not active, because no endpoints are watching this portion of the graph, then why hold reference to this sub-graph in memory at all? If all in-memory references to the instances are of the signals in this subgraph are not held, then the GC should clean them up. However, I could imagine that the references to them are held in memory while waiting for some event to mount a watcher or effect to activate them, but then the question is why track cache invalidation for them with a dirty flag? If they were in-active before the effect/watcher is mounted, then they are dirty because they haven’t been observed yet. So, we only seem to need to track whether the signal is being observed/pulled by some endpoint in the graph. If it is not, then sources shouldn’t propagate changes to them (nor dirty flags). If a signal is actively contacted to some endpoint then sources will push the changes. This avoids the two step process of push then pull, and simplifies it to push-but-only-where-necessary.
If I’m missing something critical to understanding the advantages of a push-the-pull model over a push model which tracks the active portions of the graph for optimizations, then please share that insight.
The text was updated successfully, but these errors were encountered: