Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typeguide spike #20

Draft
wants to merge 2 commits into
base: rust
Choose a base branch
from
Draft

Conversation

alexswan10k
Copy link

@alexswan10k alexswan10k commented Oct 16, 2021

Firstly, not ready yet, do not merge! (some tests are broken, although a lot of stuff does work). I am prepared to throw half of this away, but I think the new concepts are definitely progress and worth discussing. This all started when I tried to uncomment a seemingly innocuous closure test that captures a mutable value, and then tried to turn mutables into wrapped cells for capture purposes.

[<Fact>]
let ``Mutable capture works`` () =
    let mutable x = 0
    let incrementX () =
        x <- x + 1
        ()

    incrementX()
    x |> equal 1
    incrementX()
    x |> equal 2
    incrementX()
    x |> equal 3

@ncave , I could do with some input really. This is kind of where I am going. The new concepts:

  • A typeguide concept, allowing you to specify (for any expr transform) what you want (wrapped, owned, unwrapped, ref, etc)
  • A FitToGuide transform that lets you take any expression of any wrapped/unwrapped/ref state, and turn it into any other combination
  • In theory this should massively simplify things in the long term, as you would no longer have to think exactly where the logical place is to clone something, when to wrap or unwrap something etc etc. You can just say "I started with an X and I need a Y here"
  • The ScopedVarAttrs is really the actual representation of the computed expression, so FitToGuide uses this to determine what to start from.

Herein lies the problem - I think we are missing a fundamental piece of the puzzle. The ScopedVarAttrs can obviously be pulled from the ScopedVarAttrs in the ident context if you are working with an ident. If you are working with a class/record field you can infer what the default is for that field. The issue is when you have hierarchies where multiple steps might call FitToGuide, it is not obvious, or perhaps even resolvable who has done what unwrapping and when. A simple example I am hitting up against is a let which defines a variable. If the let guides and the create value guides, you double wrap. The problem is you need it in both places, because let might call a function that emits an incompatible type (not a cell when you want a cell).

On a side note create value should presumably always wrap, and this is actually where we do it today, just very unstructured and ad-hoc. In this pr we can just throw the output through the guide fn, and tada, wrapped/cell/whatever.

A possible solution

What if every transform expression returned both an expression and a ScopedVarAttrs (it probably needs another name i guess). By doing this you could subsequently understand what you actually have after any expression transform. This would allow you to know not to double clone for example, or double unwrap a ref.

transform (com, ctx, fableExpr) -> rustExpr * ScopedVarAttrs

The problem with this is every trasnformExpr derivative changes its return signature, which is a massive ballache. An alternative pattern I was initially considering is how the Fable AST derives types - by effectively walking the subtree and pulling types out. This seemed cool in principle, but in practice I do not think the AST actually has all the info we need at this point, because it might just be an ident that actually points to something else (the context has been lost). This is what we need to track against each transformed expression (at least that is my conclusion):

    IsCellWrapped: bool
    IsRef: bool
    IsRefCountWrapped: bool
    HasMultipleUses: bool //or some other clone/borrowed/owned capture info

Again apologies if I am holding you up with other stuff, feel free to just carry on if you need and I will have to just work around it. I am just finding the existing architecture has limits I am hitting up against which I am not sure we can solve without these extra pieces. As you can imagine fixing this has been non-trivial either as when you fix 1 thing you break 2 other things... Let me know anyway, happy to follow your lead on this.

@alexswan10k
Copy link
Author

alexswan10k commented Oct 16, 2021

It has also just crossed my mind that maybe create (new value) is not the place to be wrapping/unwrapping at all, although this does lead to some complexities around things that get returned from libcalls that are already wrapped such as strings.

@ncave
Copy link
Owner

ncave commented Oct 16, 2021

@alexswan10k First of all, I really appreciate your effort, you obviously spent a great deal of time thinking about this. Here are some random thoughts about it, in no particular order (and it's just an opinion, so don't put too much weight on it):

  • What you're describing sounds a lot like an AST transformation that we could be doing first, to map desired Rust output to F# language concepts, and then just generate on that modified AST. For example, mutable values can be transformed as reference cells, refs can be byrefs, etc.

  • That said, my first instinct would be to strive for simplicity, and try to get most of the answers statically from the AST, where possible. For example, IsRef can be by the shape of AST (by convention for function parameters), IsOwned by the shape of AST (for all blocks), IsRcCounted is really not(IsValueType), so the type can give a static answer to that, etc.

  • In theory we still have a small window of opportunity to propose changes in the Fable AST if we need to, if they're general enough and really needed to express something, before the next Fable version solidifies the AST again.

  • Perhaps some of the answers (is something wrapped etc.) can be statically pulled out of the already generated (partial) Rust AST, where needed. There is a sample visitor for Rust AST here, we can build a helper to deep inspect Rust AST for something.

  • So perhaps first we can try to simplify and see if we can cut away some of the state we carry around, and from what's left, what can be moved into the AST with a transformation. Perhaps it's just the mutables that need to be transformed into FSharpRef mutable cells.

Again, just random thoughts, what do you think? Perhaps start (as you already did) with something that doesn't work and see what's the simplest way to make it work.

@alexswan10k
Copy link
Author

Happy to help. I really feel that this has wings, and it is quite exciting to see it starting to come together. Just need to crack these damn low level issues and then I am sure it will become far easier.

Ok, good ideas here. I am completely with you in that whatever we can push out of tracked state and into the AST would be hugely beneficial.

  • The visitor is great to have as a tool. I am continually surprised at how much groundwork has already been done. I appreciate that porting this from the rust AST helped, but i am sure it was not free either! Anyway, to that point I agree that pulling things like refs and wrapped status out of the AST would be preferred. My concern here is that the information we need is probably not always there unless the status of these things is directly encoded in the expression. If the expression points to an ident, for example, the information exists somewhere else on the AST. Is there a way we can get around this? I guess again we are back to tracking. It might be that if we combined this approach with the existing ident tracking, that we actually have everything we need. I guess what gets complicated is when an ident is part of a bigger expression, when it is not obvious what is necessarily coming out the other end. This is kind of how I came to the conclusion that you would need to follow the transforms.
  • Pushing cell into dotnet cell - this is a great idea, and might well work. We would have to do this is as a pre-transform presumably. I do think this depends on many of the decisions we make around mutables (more below).
  • Wrapped/unwrapped seems on the surface to be something we can statically determine, which is pretty much what we are doing today. This works for 90% of cases. If it is not a value type, it should be wrapped etc. I am still not that confident we can statically determine this just from a type though.

So I think where this is going to get difficult to statically analyze is actually with mutable values. What we would ideally like to do is one of the following:

  • If the value is trivially mutated without leaving scope, it should be a mutable.
  • If the value is passed over a function or captured, it presumably needs to be wrapped in a Rc<Cell>
  • Is there a case for a Cell ever? Again arrays use these internally so maybe this can remain invisible.

Two other pain points are operations and pattern matching:

  • Pattern matching always emits new idents as refs, so presumably this is resolvable. The complexity comes from the fact that you normally also need to unwrap values to match on them. If I have a union for example, I need to unwrap its Rc container first.
  • Operations (addition, subtraction, etc) mostly require symmetry of both operands, and also often require both sides to be dereferenced. On the other side they presumably need to be rewrapped to their old status. This can get quite hard to follow, when you have something that may or may not have been wrapped in a cell for example. Should the result be wrapped in a cell?

Saying all this though, it may well be that the mess I have got into is simply because we need to define mutables as their lowest common denominator which is statically derivable, the Rc<Cell>. This is kind of sub par though from a performance point of view, especially for trivial counters etc.

Apologies for my incoherent stream of consciousness.. i guess this is just part of the process haha

@ncave
Copy link
Owner

ncave commented Oct 16, 2021

@alexswan10k

If the value is trivially mutated without leaving scope, it should be a mutable.

I don't think we lose any performance by keeping all mutable values everywhere uniformly wrapped in Cells. There is no panic with Cells, and numeric benchmarks with tight loops confirm that performance is good, so at least for now we can assume it's ok. So perhaps we don't need to track it in this case, since it doesn't leave scope.

If the value is passed over a function or captured, it presumably needs to be wrapped in a Rc

Passing to function should always unwrap the cell value at call site (and it does, already, in transformIdentGet).
For local uses (including capturing), idents (and fields) already carry a IsMutable property in their AST definition.

I wonder if we can temporarily leave the perhaps complicated case of capturing mutables by closures for later, and just use your refactoring to clean up and simplify the other tracking. What do you think?

@alexswan10k
Copy link
Author

alexswan10k commented Oct 17, 2021

I would love to but the problem is stuff still isn't working properly because this pattern breaks other assumptions we have made elsewhere. The double unwrap scenario seems to be the issue I just cannot work around any more, because I have attempted to pull out the inline wrapping code in Create and handle it as a post-process step. This is conflicting with the let bindings which also do the same op, but have no context of the inner expr. I had to add this to let bindings because of another assumptionelsewhere around returning mutable values, which breaks when you start wrapping them.

On a side note the problem with the cell scenario is closures cannot clone cells, so they have to wrap them in Rc<Cell<T>>, a 'Cell' will not suffice.

I had a random idea yesterday. Rather than tear the whole program apart to change all the return statements, we could just track it on the compiler against the rust expr. You can get away with this because actual context only ever comes from a directly computed child. Let me try and push this approach a little further as this would theoretically give us a mechanism to permanently solve double clones, double wraps under certain edge cases, and a number of other frustrating inconsistencies that come about because context is assumed rather than known.

@ncave
Copy link
Owner

ncave commented Oct 17, 2021

@alexswan10k

we could just track it on the compiler against the rust expr

Yes, that's what I meant by inspecting the generated Rust AST, something like this:

    let isCallExpr pathNames (expr: Rust.Expr): bool =
        let pathExpr = mkGenericPathExpr pathNames None
        match expr.kind with
        | Rust.ExprKind.Call(callee, _) when callee = pathExpr -> true
        | _ -> false

    let isRcWrapped (value: Rust.Expr): bool =
        value |> isCallExpr ["Rc";"from"]

    let isCellWrapped (value: Rust.Expr): bool =
        value |> isCallExpr ["Cell";"from"]

(assuming values were created like this):

    let makeCallExpr pathNames (args: Rust.Expr list) =
        let callee = mkGenericPathExpr pathNames None
        mkCallExpr callee args

    let makeRefValue (value: Rust.Expr) =
        makeCallExpr ["Rc";"from"] [value]

    let makeMutValue (value: Rust.Expr) =
        makeCallExpr ["Cell";"from"] [value]

@alexswan10k
Copy link
Author

alexswan10k commented Oct 17, 2021

Pulling stuff out of the rust AST is definitely worth keeping in mind, I am just not convinced we can get around a lot of the non-trivial edge cases, namely:

  • When the result is computed from various sub expressions which all return something. It might do a libcall for example where wrapping happens internally eg strings.
  • When the data is actually in an ident, or expression of an ident, so the metadata does not exist in the rust AST, but in the context. Sure we can reach into the context for trivial cases, but what do we do when this is non-trivially complex. For example it might be an ident expr that calls a field that calls a function.

Just a quick update:
Managed to get a lot more working, still quite a few things broken but it is definitely starting to look like the approach has legs. I have added a real return type to the compiler, which is working well, but due to its inherent mutable nature, is fiddly to get working because state can bleed across when you forget to reset it, so probably needs some more tweaking. This is why I think the return statement approach (immutable approach) is probably the correct one, because it is far harder to accidently mix the result up with some unrelated value. When it works though, the values definitely seem to be more consistent, regardless of how many layers are contributing to the fitting process. I have everything from create value to leave context exprs to use this, so there is only 1 mechanism to work out how to wrap/unwrap/clone.

Anyway, still not really sure where we go from here, but happy to keep pushing this if you think it is worth it. I don't want to be a blocker though so happy to park it and use as an idea spike or whatever.

@ncave
Copy link
Owner

ncave commented Oct 17, 2021

@alexswan10k Does keeping all mutable values uniformly wrapped in Cells help? They're not traveling across boundaries, except in mutable containers like arrays or records with mutable fields.

@ncave
Copy link
Owner

ncave commented Oct 17, 2021

@alexswan10k I guess I don't quite understand the actual problem with capturing mutable values, perhaps I can use some education. If Rust closures capture by reference (in this case, a reference to a cell with internal mutability), is that not what we want for some reason? Or am I completely off-base?

@alexswan10k
Copy link
Author

alexswan10k commented Oct 17, 2021

So the problem in the closure example is you cannot just capture a Cell or a RefCell because they can only have 1 owner and are not cloneable, so they will be moved into the inner scope. To make it behave the way we want (basically multi-owner, closure owned and outside scope owned), you need to wrap it in a Rc too, allowing it to be cloned on capture. Then you are confronted with the problem that you basically have to turn everything into Rc<Cell<T>>, which will cause an extra pointer lookup in tight loops so not ideal. What you really want is to be able to choose based on usage.

I should also add - closures in our implementation are capture by move, not by reference. This is the safest approach to get around the lifecycle problems you would have to deal with if you allowed non-owned references to be captured.

@ncave ncave force-pushed the rust branch 2 times, most recently from 8b18ae5 to a7936ff Compare December 10, 2021 00:56
@ncave ncave force-pushed the rust branch 2 times, most recently from 7ead1e3 to 88f0c4c Compare June 19, 2022 10:32
@ncave ncave force-pushed the rust branch 4 times, most recently from 95f0008 to c8b6491 Compare July 26, 2022 18:56
@ncave ncave force-pushed the rust branch 5 times, most recently from 03e6514 to ab03971 Compare August 1, 2022 21:49
@ncave ncave force-pushed the rust branch 3 times, most recently from c55ecba to e294395 Compare August 7, 2022 21:28
@ncave ncave force-pushed the rust branch 2 times, most recently from 9259fe7 to 1a1854a Compare May 29, 2024 08:53
@ncave ncave force-pushed the rust branch 3 times, most recently from e7afdaa to 6ae8390 Compare June 13, 2024 19:36
@ncave ncave force-pushed the rust branch 2 times, most recently from c237e91 to 291c69f Compare August 19, 2024 02:03
@ncave ncave force-pushed the rust branch 2 times, most recently from 95acf2a to 2c1fe37 Compare September 19, 2024 22:06
@ncave ncave force-pushed the rust branch 4 times, most recently from 43b31bb to 573e28f Compare October 13, 2024 18:49
@ncave ncave force-pushed the rust branch 2 times, most recently from 4453d24 to 122083b Compare November 8, 2024 01:19
@ncave ncave force-pushed the rust branch 3 times, most recently from 31628b7 to 9cde400 Compare December 30, 2024 17:15
@ncave ncave force-pushed the rust branch 2 times, most recently from 96148c4 to 7c93b15 Compare January 9, 2025 03:59
@ncave ncave force-pushed the rust branch 2 times, most recently from 8e2fe54 to 4f8930b Compare January 19, 2025 19:58
@ncave ncave force-pushed the rust branch 3 times, most recently from 8588ba5 to 8052758 Compare February 23, 2025 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants