Skip to content
This repository has been archived by the owner on Oct 17, 2021. It is now read-only.

Crash when referencing children of deallocated element #28

Open
Lukas-Stuehrk opened this issue Feb 23, 2021 · 4 comments
Open

Crash when referencing children of deallocated element #28

Lukas-Stuehrk opened this issue Feb 23, 2021 · 4 comments

Comments

@Lukas-Stuehrk
Copy link
Member

Lukas-Stuehrk commented Feb 23, 2021

We fixed a memory leak that existed for a long time, when we merged #22. Unfortunately, this introduced a new issue: When you create a document, we now mark it as managed. When you then reference a child node of the document and no longer keep a reference to the document itself, we will free the node for the document: https://github.com/SwiftDocOrg/CommonMark/blob/main/Sources/CommonMark/Nodes/Node.swift#L35 But this also frees all the child nodes and we suddenly have a use after free when we still have a reference to any child.

Minimal test case to reproduce:

    func testCrashingChild() throws {
        var heading: Heading?
        try autoreleasepool {
            var document: Document? = try Document(Fixtures.udhr)
            heading = document!.children.first! as? Heading
            document = nil
        }

        // Will crash because of use after free.
        XCTAssertEqual("# [Universal Declaration of Human Rights][udhr]", heading?.render(format: .commonmark))
    }

Before, this was not happening, because our document node was always leaking, therefore "preventing" this crash 😄.

I'd say that this is a very common use case of this library. E.g. when you try to upgrade SwiftMarkup to use CommonMark version 0.5.0, then you will run into this crash.

I'd happy to provide a fix for this, but I think we should discuss the approach first. If I'm not mistaken, this is somehow a design flaw in the parent to child relation and a simple managed flag is not enough to represent the relationship.

@regexident
Copy link
Member

regexident commented Feb 23, 2021

In its current state you need to call document.remove(child: heading) in order to detach heading from the document, turning it into an independent managed node. Or let children = node.removeChildren() to detach all children of a node at once, effectively unwrapping it.

@mattt
Copy link
Contributor

mattt commented Feb 24, 2021

I'd say that this is a very common use case of this library. E.g. when you try to upgrade SwiftMarkup to use CommonMark version 0.5.0, then you will run into this crash.

Thanks for pointing this out, @Lukas-Stuehrk. @regexident is correct about the correct resolution, which I followed for this PR updating SwiftMarkup to the latest release of CommonMark.

I'd happy to provide a fix for this, but I think we should discuss the approach first. If I'm not mistaken, this is somehow a design flaw in the parent to child relation and a simple managed flag is not enough to represent the relationship.

The managed flag is an admittedly primitive tool for reference management. Did you have another approach in mind?

@Lukas-Stuehrk
Copy link
Member Author

I'm sorry, I got confused by the changes between 0.4.0 and 0.5.0 and wrongfully assumed it's a bug. I can see that the current behavior is well documented. My bad.

Nevertheless, I think the current behavior can be rather surprising. And it does not feel very swifty. As a user of the library, you always need to keep a mental model of the lifetime of the nodes, otherwise you get a dangling pointer. Basically you can get a dangling pointer by accessing a property. This does not feel right.

In my point of view, a library should make it very easy to use it correctly and very hard to use incorrectly. I think the current state of things was a good step to solve the memory leak problems, but it makes it also very easy to mistakenly use the API.

I can see a couple of different approaches:

  1. We could go to a more DOM-like approach. Currently, the Node instances of children are always created ad-hoc when accessing children. We could create all instances of child nodes upfront and then keep them on the parent. Then, when accessing properties, you will always get the same node instance. Then we would move the lifetime problem to the lifetime of the Swift objects and always free the cmark_node when the Swift Node is deinited.
    Then you could work with it like you would with DOM. When you keep a reference to a node, this also keeps the rest of the hierarchy in memory and traversing the tree works like you expect. If you don't want to keep a node in it's hierarchy, you would need to unlink it. But you could never create a dangling pointer and you don't have leaks. The memory model is easy to understand, because it's Swift's memory model.
    Depending on the use case, this could even be a performance improvement because the nodes are not constantly recreated when walking the node tree. But if you walk the tree only once, then the performance could be worse.
  2. We could go for a primitive reference counting model. Basically we could use the OpaquePointer as identifier (it's Hashable) and increase a counter when a Node for it is created and decrease when a Node for it is deallocated. But it would require some global state. And a lot more thought, because we would also need to consider the hierarchy. I don't know how feasible this is. And it needs some thoughts when nodes get detached.
  3. We could rename the APIs. Don't have a simple .children, but make it more clear that you should not access and store the children without detaching. Or even make it more complicated by design to work with them, e.g. you need to pass-in a closure that then gets the children and we document it clearly that the children should not escape the closure.

However, all of this also sounds like a lot of work. So maybe we should simply add better user guides and keep the current state 😄.

@regexident
Copy link
Member

regexident commented Feb 27, 2021

@Lukas-Stuehrk I 100% agree with you. The fix we went for (for now) was something we could do "today". It's far from good, but it at least allows you to use the API safely. Before you couldn't really.

I'm not entirely sure if it's feasible to safely tack new improved semantics onto the underlying memory-model of cmark's heap-allocated C pointers. I might be mistaken though and there is a clean and efficient way. In fact I very much hope so.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants