Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't copy imported module to filter out its dependencies #776

Closed
wants to merge 5 commits into from

Conversation

vsbogd
Copy link
Collaborator

@vsbogd vsbogd commented Sep 24, 2024

At the moment contents of the imported module are copied to filter out its dependencies. It makes modules depend heavily on some way of iterating through the space. Bigger downside is that it duplicates the space on import and makes import slower as mentioned in Luke's comment in the code.

In this PR ModuleSpace is introduced which keeps main space and its dependencies separately. This allows querying them separately and when dependent space is queried it doesn't queried recursively. Thus all transitive dependencies are queried only once from the top level space.

For example if we have space A which imports space B and C and D is imported from B and C then A contains B, C and D as dependencies in a separate collection. Query to A goes through its atoms first then goes to B, C and D but dependencies of these spaces are not queried.

ModuleSpace::atom_iter() iterates through main atomspace only. This breaks tests in f1_imports.metta which are disabled in this change. There are two ways of solving this: (1) adding dependency spaces as atoms into ModuleSpace iterator which resembles previous behavior, (2) add separate function which gets dependencies from space. I would like to implement solution after discussing a way.

Second big change in this PR is implementation of the Space::atom_iter for DynSpace. This change was necessary because now ModuleSpace is a top space and it contains DynSpace as a main space. When client gets atom_iter from ModuleSpace the call is forwarded to DynSpace which returns None. Any non-standard combination of the DynSpace and other spaces can fall into this issue. Thus I decided to implement it despite it cannot be implemented without unsafe.

I am raising this as a draft PR to discuss two questions.

  1. @Necr0x0Der, @luketpeterson Do we really need to return dependent atomspaces as atoms when iterating through the space. Probably more general it should sound like: do we need to represent imported spaces as atoms? On the one hand it is implemented like this now but trying to add dependency manually putting it into space leads to the duplication of the results.
  2. @luketpeterson Could you please share your opinion on Space::atom_iter implementation for DynSpace I implemented. Is it correct? Do you think it is needed?
  3. @luketpeterson What do you think about ModuleSpace as a way of solving dependency duplication issue?

ModuleSpace doesn't keep imported spaces as atoms. It keeps them in a
separate collection instead. This allows constructing new instances of
the ModuleSpace without filtering original imported space.
@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 24, 2024

old_interpreter CI fails because Space::common for DynSpace has issue similar to the Space:atom_iter. DynSpace cannot return FlexRef if underlying space returns FlexRef::RefCell.

@luketpeterson
Copy link
Contributor

Should also update the docs/modules_internal_discussion.md file to indicate the discussed solution is implemented.

@luketpeterson
Copy link
Contributor

luketpeterson commented Sep 25, 2024

  1. @luketpeterson What do you think about ModuleSpace as a way of solving dependency duplication issue?

I think, at the conceptual level, we both hit upon the same solution; so our thinking is very well aligned. My thoughts were written up in the docs/modules_internal_discussion.md file. I'll do a thorough line-by-line review soon.

@luketpeterson
Copy link
Contributor

luketpeterson commented Sep 25, 2024

  1. @luketpeterson Could you please share your opinion on Space::atom_iter implementation for DynSpace I implemented. Is it correct? Do you think it is needed?

I have conflicting opinions on the question of "is it correct"? I believe your implementation will be safe in practice. (See comment below) In effect you take two parallel references to the same data. One Ref<T> and also the iterator. The Ref<T> is along for the ride and just guarantees the RefCell's rules are followed so the iterator's accesses will be ok. So, on the upside, this will guarantee no RefMut can be created while the iterator still exists.

On the downside, this is relying on implementation details of RefCell, so I think it would not technically be considered sound. For example, if RefCell implemented a layer of internal indirection this could break. Also, this approach would have a race condition if we changed the RefCell to a Mutex or RwLock in the future.

I am sad because, in a perfect world, this should not require unsafe. I wish there were a Ref::map_into_box API that allowed the creation of a boxed object that had the lifetime of the borrowed ref.

I did not find a crate that already does this, but I have wished for one myself in the past. (in fact, this is why I left the method unimplemented for DynSpace earlier in the year.)

If I had a few spare days I would try and generalize this pattern to a stand-alone crate and get it code-reviewed by the wider community, but I don't have the spare time to do that right now.

Bottom line: I don't love it, but I don't see a better short-term solution so I am ok with it as a short-term fix and we can revisit it in the future

@luketpeterson
Copy link
Contributor

luketpeterson commented Sep 25, 2024

UPDATE on the DynSpace implementation of Space::atom_iter
I also found this discussion on the topic here - rust-lang/rust#54776 But it looks like it's not a very active topic.

Sadly, it also looks like owning_ref (another possible solution) still has one remaining soundness hole as well. https://users.rust-lang.org/t/whats-the-current-state-of-the-owning-ref-crate/102098

@luketpeterson
Copy link
Contributor

luketpeterson commented Sep 25, 2024

Ugh. I realized that there is an unfixable* safety hole in the implementation of DynSpace impl of Space::atom_iter. (*unfixable without changing the Space trait's method signature) It comes from the lifetimes of the Iterator::next function. fn next(&mut self) -> Option<Self::Item>;

Say the user borrows an iterator from the space behind a RefCell, then uses the iterator to get references to some atoms, saves those references, and then drops the iterator. The user could then use borrow_mut to mutably access the space and delete those atoms. This would leave the original atom references pointing to garbage.

To make this safe, the lifetime of the returned references can't outlive the iterator itself. So you'd need a method like fn next(&'i mut self) -> Option<&'i Self::Item>; We have been calling this the "cursor" pattern. Alternatively the iterator could clone the atoms as it goes. Either way, the Space::atom_iter method will need to change or an alternative way to access the atoms is needed.

@luketpeterson
Copy link
Contributor

old_interpreter CI fails because Space::common for DynSpace has issue similar to the Space:atom_iter. DynSpace cannot return FlexRef if underlying space returns FlexRef::RefCell.

For FlexRef the issue is less fundamental. It could be reworked using a pattern similar to https://crates.io/crates/nested-ref (or you could use that crate)

@vsbogd
Copy link
Collaborator Author

vsbogd commented Oct 3, 2024

Ugh. I realized that there is an unfixable* safety hole in the implementation of DynSpace impl of Space::atom_iter.

Thanks Luke, it took a while to realize it but after reading RefVal related links I realized that implementing iterator on collection internal mutability will always has such flaw. Thus I agree the only way is to change atom_iter API and I would replace it by adding fold of the atomspace.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Oct 3, 2024

For FlexRef the issue is less fundamental. It could be reworked using a pattern similar to https://crates.io/crates/nested-ref (or you could use that crate)

What bother me here is that using DynSpace along with space implementations which can include other spaces can lead to the uncontrolled increasing level of the indirection. To simplify it a little I would pass specific space instance into MettaMod constructor instead of passing DynSpace. It doesn't fully remove indirection because ModuleSpace::deps anyway should keep shared pointers to the dependency modules, but should solve specific issue with old_interpreter behaviour.

@luketpeterson
Copy link
Contributor

I realized that implementing iterator on collection internal mutability will always has such flaw. ...

In practice, I think this concern is slightly tempered by the realization that iterating every atom in a space is only a viable solution for relatively small spaces. So copying the atoms might not be so bad. For large spaces, iterating the whole space is a non-starter, with or without a copy.

So you could return RefMut<dyn Iterator> or some equivalent type, and then it would be caller's responsibility to copy the atoms if they need to outlive the borrow of the space.

Thus I agree the only way is to change atom_iter API and I would replace it by adding fold of the atomspace.

In Mork, the approach we are taking is that the outcome of a space-wide operation is another space. And it's rarely necessary to iterate a space. Although it is possible. Essentially it's the fold idea taken all the way. So this is very compatible with our thinking. Hopefully we can start sketching out how MORK fits in, this December.

What bother me here is that using DynSpace along with space implementations which can include other spaces can lead to the uncontrolled increasing level of the indirection

This actually bothered me too. This is what I meant by matryoshka nesting spaces, when we were talking about the general DynSpace approach a year ago.

I think your solution is good. But to be honest I might go even further. (if not immediately, then eventually, as time permits) I'm personally not a fan of the pattern of a smart pointer (DynSpace in this case) implementing the same interface as the thing it points to. I think it's useful for a client to know if they have a space itself or a shared smart pointer to a space. For example, the abstract iter method interface would be different for DynSpace than for &dyn Space

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants