Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework AST to be more type safe and allow mutation #95

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

lukel97
Copy link

@lukel97 lukel97 commented Dec 22, 2021

This is the result of a three week yak shave that began with trying to use the visitor utility to mutate the AST.
A lot has changed to make things work with my use case, so this PR may not be suitable for this library, but the gist is:

  1. The AST now uses value semantics (structs not classes)
  2. Use keypaths instead of strings (get/set) for traversing the AST
  3. Existential types throughout the AST (instance fields that are of a protocol type, e.g. Value, Definition) have been removed and replaced with enums
  4. Visitors are now implemented as protocols
  5. The enter and leave methods are now type safe: You can only return new nodes that are of the same type
  6. In general, more functions that took in existential types have been replaced with generic functions
  7. As a bonus, have AST nodes implement the TextOutputStreamable protocol so they can be printed out again

Using structs

I first ran into the fact that the editing part of the visit function wasn't yet implemented: returning a new node in the visit function didn't actually change anything.
So I began to translate over the graphql-js reference implementation when I realised a fundamentally limiting factor of the current implementation:

The visit function should return a new copy of the AST with the modifications applied, so in the javascript implementation they do a deep copy of each node whenever it's mutated. However in Swift, classes have no practical way of doing such a deep copy.

In Swift, types that need to be copied about with value semantics are typically structs: Copying them then becomes a matter of using doing a var astCopy = ast.

And although this is a significant semantic change, changing this didn't seem to affect much of the existing code nor the test suite.

The next step was to finish implementing the set functions on the Node protocol, which uncovered two more problems:

Using KeyPaths to traverse the AST

The set functions on the nodes were not implemented yet so I began to fill them out, only to quickly realise that it wasn't possible to assign arbitrary Node values to the concrete instance variables in a type-safe manner. For reference, the types of the get and set functions for traversing the AST were:

protocol Node {
    func get(key: String) -> NodeResult?
    func set(value: Node?, key: String)
}
public enum NodeResult {
    case node(Node)
    case array([Node])
}

The value passed to set is just any arbitrary Node, and so would need to be downcasted before being set. On top of this, it would require a lot of boilerplate, matching up each string to each variable.

extension Variable {
func set(value: Node?, key: String) {
    switch key {
    case "name":
        name = value as! Name
    default:
        fatalError()
  }
}

This is less than ideal, since it's using arbitrary strings and duplicates most of the boilerplate that get is doing anyway. This seemed like a good use for key paths in Swift, which allows for strongly typed paths to the nested fields in each struct.

The immediately obvious solution would be to have each Node return an array of WritableKeyPaths to its various children. However we would end up with a heterogenous array, which if we were to type erase would prevent us from actually being able to write to the nodes. And even if we were to try this, there is no type-erased version of WritableKeyPath: you simply need to know the type of the field you are writing to at compile time, and no you can't just make it a WritableKeyPath<Self, Node> – Using Self within the Node protocol definition prevents it from being used as an existential type i.e. let x: Node (more on this later), and you can't assign a concrete type Variable to a Node either.

So instead the resulting function on each Node looks like this now:

extension Field {
    public mutating func descend(descender: inout Descender) {
        descender.descend(&self, \.name)
        descender.descend(&self, \.variableDefinitions)
        descender.descend(&self, \.directives)
        descender.descend(&self, \.selectionSet)
    }
}

The Descender struct contains three generic functions:

struct Descender {
    mutating func descend<T, U: Node>(_ node: inout T, _ kp: WritableKeyPath<T, U>)
    mutating func descend<T, U: Node>(_ node: inout T, _ kp: WritableKeyPath<T, U?>)
    mutating func descend<T, U: Node>(_ node: inout T, _ kp: WritableKeyPath<T, [U]>)
}

And this works great for allowing us to traverse the children in the AST, handling nullable children as well as any lists that may occur.

However, some of the Nodes' children were existentially typed: For example, SelectionSet has an array of Selections:

struct SelectionSet {
    var definitions: [Selection]
}
protocol Selection {}
extension Field : Definition {}
extension FragmentSpread  : Definition {}
extension InlineFragment  : Definition {}

And because of this, we get the "Protocol 'Selection' as a type cannot conform to 'Node'" error when trying to use the key path.

Replacing existential types with enums

Existential types in Swift – that is, naked uses of protocols in type signatures like var definitions: [Definition] – are a bit of a foot-gun, and carry with them some restrictions as mentioned above. There's an evolution proposal to make this syntactically explicit, and a lot of the time generics are a better fit.

But until we get generalised existential types, I had to look around for an alternative.

I decided to replace the existential types with enums, essentially turning the AST into an algebraic data type:

public enum Selection: EnumNode, Equatable {
    case field(Field)
    case fragmentSpread(FragmentSpread)
    case inlineFragment(InlineFragment)
}

Using enums to represent the sum types has sum extra benefits like not needing to downcast to extract the inner types and being able to exhaustively pattern match.

AnyKeyPath

Because we're using key paths to access the fields within the nodes, we could represent the path that gets passed to the Visitor by chaining them together with appending(path: KeyPath). This would have been a cool use of key paths, but unfortunately, Swift does not currently have support for key paths on enums.

So instead we have to use just an array of AnyKeyPath, and the descend method is implemented a bit differently on the enum nodes so that they don't actually leave a path.

Type safe enter and leave functions

The key paths passed to the Descender still need to be strongly typed i.e. WritableKeyPath with a known type, since Swift needs to be able to type check that we are assigning a value of the type to the path. But if we look at the type signature for the enter and leave functions, we find ourselves with two more existential types:

struct Visitor {
    typealias Visit = (Node, IndexPathElement?, NodeResult?, [IndexPathElement], [NodeResult]) -> VisitResult
}
enum VisitResult {
    case `continue`
    case skip
    case `break`
    case node(Node?)
}

The existential type passed as an argument, and the existential type returned from the function. As mentioned earlier, existential types come with a lot of restrictions, one of them being that we can't use a value of Node to write to a key path of type WritableKeyPath<Field, Name>: We simply need the guarantee that the Node value inside the VisitResult actually is of the correct type.

And with the current API, there's nothing stopping the user from returning a Field to replace a Name node in the tree.

Since we need to work with concrete types again, lets start by making VisitResult generic so we can make sure that users are replacing the node with the same type of node.

enum VisitResult<T: Node> {
    case `continue`, skip, `break`, node(T?)
}

Then, we just need to make the closure generic so it takes in a node of type T and returns a result of type VisitResult<T> – but as it turns out, closures cannot be generic in Swift.

Functions can be generic though, and you can define generic function requirements on protocols. So why not try adopting a more protocol-orientated design and make Visitor a protocol that users of the library can adopt?

protocol Visitor {
    func enter<T: Node>(node: T, key: AnyKeyPath?, parent: VisitorParent?, path: [AnyKeyPath], ancestors: [VisitorParent]) -> VisitResult<T>
    func leave<T: Node>(node: T, key: AnyKeyPath?, parent: VisitorParent?, path: [AnyKeyPath], ancestors: [VisitorParent]) -> VisitResult<T>
}

This looks like a nice design, now we can just do a quick cast at runtime to differentiate the concrete type:

struct MyVisitor: Visitor {
    func enter<T: Node>(node: T, key: AnyKeyPath?, parent: VisitorParent?, path: [AnyKeyPath], ancestors: [VisitorParent]) -> VisitResult<T> {
        switch node {
        case let name as Name:
            return .node(Name(loc: name.loc, value: "hello")) // Cannot convert value of type 'Name' to expected argument type 'T?'
        default:
            return .continue
        }
    }
}

But we run into a type error when trying to return a VisitResult of a concrete type. We know for a fact that Name is the same type asT because we matched on it, so unfortunately we have to force downcast it. But we can hide this away from users by doing the down casting within the library, and delegating to functions with concrete types for each node.

protocol Visitor {
    func enter(name: Name, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Name>
    func leave(name: Name, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Name>
    
    func enter(document: Document, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Document>
    func leave(document: Document, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Document>
    
    func enter(definition: Definition, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Definition>
    func leave(definition: Definition, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Definition>
}

extension Visitor {
    func enter(name: Name, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Name> { .continue }
    func leave(name: Name, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Name> { .continue }
    
    func enter(document: Document, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Document> { .continue }
    func leave(document: Document, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Document> { .continue }
    
    func enter(definition: Definition, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Definition> { .continue }
    func leave(definition: Definition, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<Definition> { .continue }

    func enter<T: Node>(node: T, key: AnyKeyPath?, parent: VisitorParent?, path: [AnyKeyPath], ancestors: [VisitorParent]) -> VisitResult<T> {
        switch node {
        case let name as Name:
            return enter(name: name, key: key, parent: parent, ancestors: ancestors) as! VisitResult<T>
        case let document as Document:
            return enter(document: document, key: key, parent: parent, ancestors: ancestors) as! VisitResult<T>
        case let definition as Definition:
            return enter(definition: definition, key: key, parent: parent, ancestors: ancestors) as! VisitResult<T>
        ...
    }
}

This is the same approach that SwiftSyntax takes to their visitor:
https://github.com/apple/swift-syntax/blob/7117363d54ed5dfdbcb5eaa4c81697923c9768db/Sources/SwiftSyntax/SyntaxVisitor.swift.gyb#L40-L53

But they are using gyb, and it leaves us with a lot of boilerplate. If anyone has any ideas for a better design/way to improve this, I'd love to hear them.

In any case, the finished design gives us a nice API surface for editing the AST:

struct IntIncrementer: Visitor {
    func enter(intValue: IntValue, key: AnyKeyPath?, parent: VisitorParent?, ancestors: [VisitorParent]) -> VisitResult<IntValue> {
        let newVal = Int(intValue.value)! + 1
        return .node(IntValue(loc: nil, value: String(newVal)))
    }
}
let document = try! parse(source: """
    query foo {
        bar(a: 1, b: 2, c: 3)
    }
""")
let visitor = IntIncrementer()
let newDocument = visit(root: document, visitor: visitor)

In a way thats type-safe, and plays well with value semantics.

Also replace existential types with enums
They would be necessary in a duck-typed language such as TypeScript but
we can check at runtime with casts in Swift
@paulofaria
Copy link
Member

Hey, @bubba! Thanks for all this. The AST visitor was begging for a rework for a long time! At a glance, your rationale makes a lot of sense. I didn't have time to do a proper review, but I'm looking forward to taking some time to thoroughly digest all of the changes. You mentioned no KeyPaths for enums. Are you aware of pointfree.co's CasePath library?

@lukel97
Copy link
Author

lukel97 commented Dec 22, 2021

@paulofaria Thanks, apologies for this being such a large PR.

I've indeed heard of some of these enum case path libraries, and I'm not opposed to using them at all.
I just didn't want to pull in another dependency/resort to reflection just for the sake of the path field, which doesn't actually seem to be used anywhere in the validation rules, but happy to reconsider

@paulofaria
Copy link
Member

Don't worry. I'll do a proper review when I have a chance, then I'll get back to you about this. 🙂

@lukel97 lukel97 force-pushed the mutatable-visitor branch from 09d32c6 to edcfcba Compare June 9, 2022 16:15
@paulofaria
Copy link
Member

Hey, @bubba. Looks like this PR has conflicts with master. Can you please update from master? The PR is still in draft mode. Please, let us know if you think you're done or if you think there's still work pending.

@lukel97
Copy link
Author

lukel97 commented Jul 18, 2022

Hi @paulofaria , this branch probably needs readjusted as it has become a catch-all for any tweaks needed for use in https://github.com/minm-inc/swiftui-graphql

There’s a few extra validations copied over from the spec and some bug fixes.
But the resolvers have also been made lazy now to allow for schemas to be constructed without them, i.e. in the case of our client library swiftui-graphql, we just need to parse the schema so we can reason about types when doing code generation. I’m not sure if you want this upstreamed or not, or at the very least there’s probably a better design for it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants