Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rooting objects #60

Open
CeleritasCelery opened this issue Jan 23, 2024 · 7 comments
Open

Rooting objects #60

CeleritasCelery opened this issue Jan 23, 2024 · 7 comments

Comments

@CeleritasCelery
Copy link

I was reading over the section about dereferencing safely and noticed that there was not any way to root objects. I don't know if this already planned to be covered in the section on garbage collection, but I think it is a really interesting problem that doesn't have a easy solution. It seems like the two approaches used for Rust thus far are either mutation sessions where you can only access heap objects in a closure or having some macro with drop guards. Each have their own trade-offs and I am curious what rust-hosted-langs recommends.

@pliniker
Copy link
Member

In this project we're using mutation sessions. See https://github.com/rust-hosted-langs/book/blob/master/interpreter/src/memory.rs#L142 for the interface and usage. This is essentially just the interior mutability pattern, and thus as a pattern that Rust widely supports, has seemed to me the more likely to be sustainable in a larger codebase.

As for rooting, you're right, I haven't explicitly addressed this yet. I intend to in the WIP section on GC. In short, though, everything referenced by a Thread object should be considered as roots: https://github.com/rust-hosted-langs/book/blob/master/interpreter/src/vm.rs#L178

There are some holes in the safety of the mutation session implementation currently: I think at present you could instantiate two separate VMs each with their own heap, but they do not validate that a Thread and its roots are associated with a particular instance. That's a TODO.

If you see any other gaps/holes I would love to know! Thank you for your interest :)

@CeleritasCelery
Copy link
Author

CeleritasCelery commented Jan 29, 2024

Thank you. Just looking over the code, and had a few questions:

Can roots be stored outside of Memory, or each time you open a mutating session do you need to reaccess your roots? If they are stored outside, how to find them during garbage collection?

With the Mutator trait you can only have a single run method with static input and output types. How would handle a struct that has many methods, like vec or hashmap? Would you have to define you input type as some sort of enum that tells the run method which operation to perform?

Looking at the repl it seems like you would be unable to call GC at all during the execution of a function (which could be arbitrarily long) because the whole thing is evaluated inside a mutation session. Is that the case?

What would prevent me from doing the following?

  1. inside a mutation session clone a CellPtr<T>
  2. change something so that the original T is not accessible (this is not a problem because GC can't run, so it is still safe to access)
  3. pass the cloned CellPtr<T> out of the mutation session via the Output type or Thread-local storage.
  4. run garbage collection (the original T is now collected and freed).
  5. pass the CellPtr<T> back into a mutation session and then convert it to a ScopedPtr via get.
  6. try to access T (triggering UB via UAF)

@pliniker
Copy link
Member

Thank you for provoking me to write some things down that have only sat in my own head so far! 😁 Sorry for the months gone by, hopefully this is still interesting!

Can roots be stored outside of Memory, or each time you open a mutating session do you need to reaccess your roots? If they are stored outside, how to find them during garbage collection?

My goal, not yet fully thought through into implementation, is that roots would strictly reside only inside Memory. That said, I can foresee some kind of pinning mechanism to allow references to roots from outside of Memory. This mechanism would probably use reference counting and also prevent the GC from moving objects.

With the Mutator trait you can only have a single run method with static input and output types. How would handle a struct that has many methods, like vec or hashmap? Would you have to define you input type as some sort of enum that tells the run method which operation to perform?

I think you're asking how data structures would be shared between the language runtime and external Rust code? Which in some way is also asking what is the input/output interface between the two domains.

Since native Rust and the hosted language are, in this project, a lot more independent than I think most Rust language implementations, it would be up to the user to implement aa translation layer. The language Memory is not intended to be shared between the hosted language and external Rust data structures (Vec, HashMap etc), so some kind of serialization/deserialization protocol would be needed to make an interface for more complex data structures. The only thing implemented at this time is a basic repl, which provides simply for string input and printing to screen!

Looking at the repl it seems like you would be unable to call GC at all during the execution of a function (which could be arbitrarily long) because the whole thing is evaluated inside a mutation session. Is that the case?

This is a work in progress on my own code branch. The code as-is hasn't been refactored towards answering this situation.

My thought thus far is that if GC is needed, the allocator would signal as such by returning a AllocPressure enum variant. This would then pause bytecode evaluation and tell the Mutator that a GC iteration is needed.

What would prevent me from doing the following?

  1. inside a mutation session clone a CellPtr
  2. change something so that the original T is not accessible (this is not a problem because GC can't run, so it is still safe to access)
  3. pass the cloned CellPtr out of the mutation session via the Output type or Thread-local storage.
  4. run garbage collection (the original T is now collected and freed).
  5. pass the CellPtr back into a mutation session and then convert it to a ScopedPtr via get.
  6. try to access T (triggering UB via UAF)

Right, another form of causing a root to escape.

I am hoping that the answer to be that generalized negative impls will solve this in a future stable Rust release.

If Mutator::Input and Mutator::Output cannot implement or contain anything that implements, a trait, say, Pointer, and if CellPtr implements Pointer, then perhaps the hole can be closed?

I am not sure if this would work or if I have understood all the implications of generalized negative impls.

What do you think?

@CeleritasCelery
Copy link
Author

I think you're asking how data structures would be shared between the language runtime and external Rust code? Which in some way is also asking what is the input/output interface between the two domains.

I was more asking about how you would handle the case where you wanted your entry point to have more then 1 method. For example ReadEvalPrint has only a run method. What if you wanted it to have other functionality (like parsing, batch-mode, or linting mode)?

My thought thus far is that if GC is needed, the allocator would signal as such by returning a AllocPressure enum variant. This would then pause bytecode evaluation and tell the Mutator that a GC iteration is needed.

But you would have to unwind all the way back to the mutator scope to call GC correct? How would you resume what you were executing before you needed to call GC?

Right, another form of causing a root to escape.
I am hoping that the answer to be that generalized negative impls will solve this in a future stable Rust release.
If Mutator::Input and Mutator::Output cannot implement or contain anything that implements, a trait, say, Pointer, and if CellPtr implements Pointer, then perhaps the hole can be closed?

Is there not a solution that people can use now to write a sound interpreter?

@pliniker
Copy link
Member

pliniker commented Dec 4, 2024

What would prevent me from doing the following?

1. inside a mutation session clone a `CellPtr<T>`
2. change something so that the original T is not accessible (this is not a problem because GC can't run, so it is still safe to access)
3. pass the cloned `CellPtr<T>` out of the mutation session via the `Output` type or Thread-local storage.
4. run garbage collection (the original T is now collected and freed).
5. pass the `CellPtr<T>` back into a mutation session and then convert it to a `ScopedPtr` via `get`.
6. try to access `T` (triggering UB via UAF)

I've been thinking more on this but ultimately, I cannot see a way to absolutely prevent it. If somebody wants to, they will find a way to circumvent type based roadblocks. However, I wouldn't consider myself a Rust or type logic expert by any means and I've tried to avoid deep/clever type logic in this project to keep it more accessible. Do you have any insight into a solution?

Looking at the repl it seems like you would be unable to call GC at all during the execution of a function (which could be arbitrarily long) because the whole thing is evaluated inside a mutation session. Is that the case?

I've thought more about this, too. In this particular project implementation, GC will never move objects. I'm certain I can implement stop-the-world transparently to the Mutator instance inside an alloc() call. Obviously I need to add the relevant code to identify roots.

@CeleritasCelery
Copy link
Author

if somebody wants to, they will find a way to circumvent type based roadblocks

I would be a lot more worried about people accidentally passing an object out and hitting UB without realizing it. I think the only way to really prevent this would be to disallow passing CellPtr out of the mutation session. Or when it does get handed out, wrap it in a Rc or Arc.

Obviously I need to add the relevant code to identify roots.

I think that is 90% of the problem. freeing unused memory is easy if you know the roots. But getting the roots is hard. How would you identify them?

@pliniker
Copy link
Member

It's been a while since reading the available articles out there on rooting, so honestly, thanks for prompting me to revisit.

I think I agree that a Pin based mechanism as originally prototyped by withoutboats is the right approach. I think, early on, understanding shifgrethor did not come easy and so I deferred a deeper dive but it's clearer what's going on, now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants