Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds an array reference type #1440

Open
wants to merge 35 commits into
base: master
Choose a base branch
from
Open

Conversation

akern40
Copy link
Collaborator

@akern40 akern40 commented Oct 4, 2024

This PR would add an array reference type into ndarray. This type's relationship to ArrayBase is analogous to the relationship between [T] and Vec<T>. Closes #879; look there for more information on the motivation of this type.

EDITED: See below

This implementation is chosen to make as few breaking changes as possible, introducing only one: array views are no longer Copy. This is a break that the compiler will immediately error on, and can be fixed relatively easily with calls to .clone(). As a result, the behavior is functionally equivalent, but the calling convention has to change. This change must happen because the concept of a reference type cannot implement Clone (since reference types must only be available via the Deref traits as &RawRef or &ArrRef); as a result, it cannot be Copy, and neither can any variant of ArrayBase.

I had originally tried to implement RawRef and ArrRef as two different types, and it's worth discussing why I didn't. The goal there was to further reduce monomorphization bloat (the original motivation for the RFC mentioned above, although I don't think it's the most important reason) and to improve ergonomics of the reference type. However, this would fundamentally require breaking ArrayBase into two variants: raw views and everything else, since a single type cannot implement Deref twice (trust me, I tried). If generic_const_exprs or implement_specialization land, then this approach is possible and perhaps preferred.

Right now, this PR just introduces the type and some documentation. It does not do the logical next step, which is moving most of ndarray's functionality into the RefBase type. This would accomplish the goal of creating a single way for users to write functions that accept arrays of all kinds. Neither does it add the functionality necessary for AddAssign and its variants, which is also an important motivator for this change.

@akern40 akern40 changed the title Adds and array reference type Adds an array reference type Oct 5, 2024
@bluss
Copy link
Member

bluss commented Oct 6, 2024

Copyable-ArrayViews 😮 nice!

@akern40
Copy link
Collaborator Author

akern40 commented Oct 6, 2024

Early on I'd been trying to avoid the "layout-compatible structs" approach because I was worried about maintainability. But once I realized we could have 0 breaking changes at the cost of a maintenance concern, I went back and changed it. It helps that as long as nobody touches the internals of ArrayBase or RefBase, we should be good.

@akern40 akern40 marked this pull request as draft October 7, 2024 07:07
@akern40
Copy link
Collaborator Author

akern40 commented Oct 14, 2024

Ok, for real this time: add array types (plural!) and change the entire library to use them.

There is a lot to write about this PR; frankly, I could make a blog post about it, and maybe we should. Certainly, the changelog is going to need to be extensive. I guess this is the first cut at that.

This PR still needs significant documentation work before it is ready to merge; however, the vast majority of the implementation is now complete.

What's Changed?

This PR introduces three new types: ArrayRef, RawRef, and LayoutRef. Most functionality has been moved to those types, and users are encouraged to switch to using them as soon as possible.

ArrayRef

As the original RFC suggested, this type is the Deref target of ArrayBase when S: Data; in other words, it represents a look into an array whose data is safe to read. In addition, the DerefMut implementation guarantees that any user with &mut ArrayRef<...> holds unique, mutable access to the underlying data.

Unlike the original RFC, this implementation allows users to modify the shape and strides (although not dimensionality) of an array by using ArrayRef; this capability actually extends all the way down to LayoutRef. Practically, separating out data mutation from shape/stride mutation leads to a tricky problem for users: they'd have to juggle between functions that take ArrayRef and functions that take ArrayBase. Because you can't call ArrayBase functions from ArrayRef functions, users will be forced to frequently use views and write functions with two different paradigms.

RawRef

This type represents a look into an array whose data is not safe to read. It is the Deref(Mut) target of ArrayRef.

Due to limitations of Rust's compiler, a RawRef is not the Deref(Mut) target of RawArrayView(Mut). Those types have no Deref implementation. As a result, getting a RawRef from a RawArrayView(Mut) must be done using view.as_ref().

LayoutRef

This type represents read and write access to an array's layout, but never its data. It is the Deref(Mut) target of RawRef. Due to the aforementioned compiler limitations, LayoutRefs can only be obtained from RawArrayView(Mut)s via view.as_ref().

This type is the home to very basic methods such as is_empty() as well as in-place slicing methods such as slice_axis_inplace. Maintainers of ndarray must guarantee going forward that LayoutRef never leaks its underlying data. Although RawRef and ArrayRef are neither Clone nor Copy, LayoutRef implements both; as long as data isn't leaked, this is safe.

When to Use Each Type

When writing a function or trait implementation, users can use the following table to decide what type they want to use:

Element Deref Safety Read elements Mutate elements Read shape / strides Mutate shape / strides / pointer, but not data
Safe &ArrRef &mut ArrRef &LayoutRef &mut LayoutRef
Unsafe &RawRef &mut RawRef &LayoutRef &mut LayoutRef

If users need to a) alter the underlying buffer size of an array (e.g., shrink, reserve, append, etc), b) split an array into multiple pieces, or c) move data, they will need to take ArrayBase or ArrayView.

Deref and AsRef Implementations

To connect these types, we use a series of Deref and AsRef implementations, captured as follows:

                 ┌─────────┐                   
              ┌──┼ArrayBase│                   
              │  └────┬────┘                   
              │       │Deref when S: Data      
AsRef         │       │DerefMut when S: DataMut
when          │       │                        
S: RawData    │  ┌────▼────┐                   
              │  │ArrayRef ┼───────┐           
AsMut         │  └────┬────┘       │           
when          │       │Deref       │AsRef      
S: RawDataMut │       │DerefMut    │AsMut      
              │       │            │           
              │  ┌────▼────┐       │           
              ├──►RawRef   ◄───────┤           
              │  └────┬────┘       │           
        AsRef │       │ Deref      │AsRef      
        AsMut │       │ DerefMut   │AsMut      
              │       │            │           
              │  ┌────▼────┐       │           
              └──►LayoutRef◄───────┘           
                 └─────────┘                   

The result of this is that ArrayBase<S: RawData, D>, RawArrayView, or RawArrayViewMut can only access methods on RawRef and LayoutRef via the AsRef trait.

Writing Functions on RawRef and LayoutRef

Users who want to write functions on RawRef and LayoutRef should write their functions or traits on a generic T where T: AsRef<RawRef<A, D>> (and analogously for LayoutRef). It's a slightly more cumbersome way to write functions, but provides two major benefits:

  1. Users can call the function directly, instead of having to call .as_ref() at every call site
  2. If a function mutably take these types directly and is called on a shared array (without the callee using .as_ref()), the shared array will go through the dereferencing chain down to these types. As it passes through ArrayRef it will guarantee unique access to the data, a potentially costly operation to ensure an invariant that is not guaranteed on RawRef or LayoutRef. By writing the generic : AsRef implementation as described, codebases mitigate this computational footgun.

Future Designs

This limitations imposed on interactions with raw views, etc, can be fixed if the Trait specialization RFC lands.

The reference type, `RefBase<A, D, R>`, is parameterized by its element
type, a dimensionality type, and its "referent" type. This last type
is only a `PhantomData` marker, and acts to differentiate references
that are "safe" from those that are "raw", i.e., references that come
from an array whose storage is `Data` vs an array whose stoarge is
`RawData`. The `DerefMut` implementation contains a check of uniqueness,
guaranteeing that an `&mut RefBase` is safe to mutate.

This comes with one breaking change and one important internal library
note. The breaking change is that `ArrayBase` no longer implements `Copy`
for views. This is because a `RefBase` does not know where it is pointing,
and therefore cannot implement `Copy`; this, in turn, prohibits
`ArrayBase` from implementing `Copy`. Users will now have to explicitly
invoke `Clone` in the same way that they do for non-view arrays.

The important library note has to do with the `.try_ensure_unique` check
that occurs in the `DerefMut` implementation. The library's methods are
allowed to *temporarily* break the invariant that an array's `ptr`
is contained within `data`. The issue comes when trying to then alter
`ptr`, `shape`, or `dim` while that invariant is broken. Code was
using `self.ptr = ...` (where `self` is `ArrayBase`), which triggers
`DerefMut`, which calls `try_ensure_unique`, which panics on the
pointer inbounds check. As a result, code that is altering `RefBase`
fields while the array's invariants are broken must be sure to write
`self.aref.ptr = ...` instead.
…ow to write functionality with the new types
@akern40 akern40 linked an issue Oct 20, 2024 that may be closed by this pull request
@bluss
Copy link
Member

bluss commented Oct 20, 2024

Hm, what does this mean "Maintainers of ndarray must guarantee going forward that LayoutRef never leaks its underlying data."

Avoiding memory leaks in general is hard and not part of memory safety, but I'm not sure what this means, looking around in the PR to try to find out.

You're getting very far ahead - that's exciting, but this is a lot to take in one bite or one PR! Plan for new types - AsRef vs Deref - backwards compat situation - should this be discussed?

Why is AsRef needed? From the diagram it looks like deref coercion should work. I.e &ArrayRef will coerce to &LayoutRef, I thought.

@akern40
Copy link
Collaborator Author

akern40 commented Oct 20, 2024

I know this is a big PR, and I'd be happy to break it up into smaller chunks that can be more easily reviewed and discussed! I did all of it because I kept running into design problems when I went to actually implement functionality. I think I've settled onto a very defensible design, but we should discuss and potentially break up the PR now that I feel more confident about the choices.

Leak is the wrong word here, that's my bad. I meant "leak" in the sense of an API boundary: a LayoutRef contains a pointer to data, which it must have to support slicing (the pointer may move into the interior of the array or view). But that pointer should never be exposed to the user.

On the note of backwards compatibility: I've tried very hard to design this such that it is fully backwards compatible. I think I achieved about 99% of that. Breaking this up into smaller PRs would let us identify where any incompatibilities may have snuck in.

On the note of new types, while I'm fairly confident about the need for LayoutRef, I'm actually on the fence about keeping RawRef (another reason I implemented everything: I wanted to figure out what use RawRef really had). Turns out, not a lot! These are the methods that ndarray provides that previously operated on ArrayBase<S: RawData(Mut), D> (i.e., not including those operations explicitly on raw views):

  • as_mut_ptr
  • as_ptr
  • get_mut_ptr
  • get_ptr
  • raw_view
  • raw_view_mut

Removing RawRef would simplify the types, and we could simply implement the above functionality directly on ArrayRef and ArrayBase (you'd have to do it both places for backwards compatibility because these functions don't guarantee the unique hold of their underlying data, which always happens when you have ArrayRef). The question is whether we want a type that can unsafely access data without guaranteeing that data is a) safe to read or b) uniquely held.

The AsRef is there for two reasons. The first is performance on shared arrays. Functionality implemented on LayoutRef does not need to guarantee the uniqueness of the underlying data, a potentially expensive operation. Relying on the deref chain from shared arrays down to LayoutRef will go through ArrayRef, thereby triggering un-sharing. In contrast, the AsRef implementation just exposes the underlying layout without guaranteeing data uniqueness. When users write functions with T: AsRef<LayoutRef>>, they allow callers to pass in shared arrays without triggering that deref path.
Unfortunately, this one performance necessity triggers a cascade of AsRef: once we ask users to write LayoutRef functionality using AsRef, you have to implement AsRef everywhere. This playground illustrates why: Rust won't auto-deref to satisfy a generic trait bound.

The second reason is to imbue raw views with the functionality implemented on LayoutRef. They don't get to be part of the Deref chain (I tried so many ways, I can't figure out how), so they're essentially "cut off" from the main stream of functionality. By using AsRef, you also imbue raw arrays with the capability that the rest of the arrays get through the Deref chain.

@akern40 akern40 marked this pull request as ready for review October 20, 2024 17:48
@akern40
Copy link
Collaborator Author

akern40 commented Oct 20, 2024

I believe this PR is now fully ready for review, with the exception of documenting RawRef (since we should decide whether to keep it or get rid of it). @bluss do you want me to break it up into smaller PRs? I think I could do the following:

  1. PR that provides the types, the documentation, and their various Deref and AsRef ties
  2. PR that moves relevant methods to LayoutRef
  3. PR that moves relevant methods to RawRef (if we decide to keep it)
  4. Various PRs that move the remaining methods to ArrayRef

Or we can just leave it all as one. Let me know.

Since this is such a big change, I'll also ping @nilgoyette, @adamreichold, @jturner314, and @termoshtt; I know everyone is busy, but I really want to avoid (even the perception of) unilaterally changing the library's internals without a larger maintainer consensus. Feel free to look, skim, criticize, question, etc.

@bluss
Copy link
Member

bluss commented Oct 20, 2024

I'm impressed by this work.

No tests even have to change, best possible backwards compatibility. Surely something must be a breaking change, some methods have been given new homes under different types (?)

The diff is big but most of the changes are repetitive and can be skipped over, this is probably ok to review as it is. That's my initial feeling

@akern40
Copy link
Collaborator Author

akern40 commented Oct 20, 2024

Thanks! While it's true that methods have new homes, the deref structure means that most things are callable from ArrayBase. The exception there is methods that have been re-homed to RawRef or LayoutRef. For those, the file alias_asref.rs provides shadowing functions on ArrayBase that use the AsRef / AsMut to skip the uniqueness guarantee, thereby maintaining the functional backwards-compatibility. If there's a breaking change, it's probably there: I should double-check that I've added all of those aliases.

@akern40
Copy link
Collaborator Author

akern40 commented Oct 20, 2024

I should update the serde serialization. We can fold it in here or just do a follow-up PR, but either way we should make sure to include it before the next release.

Let's push it to another PR. The serialize/deserialize code still works, since it uses the types' APIs instead of their underlying representations. Great coding by @dtolnay on that front.

@akern40
Copy link
Collaborator Author

akern40 commented Oct 21, 2024

I should double-check that I've added all of those aliases

Done! I was missing a few.

Also added documentation to RawRef, deciding to keep it. By keeping it, we duplicate the method signatures and docs twice (in RawRef and in ArrayBase) and the body once (in RawRef). If we got rid of it, we'd have to duplicate the body as well.

Copy link
Collaborator

@nilgoyette nilgoyette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big "Thank you" for taking the time and effort to do this! This is kinda crazy that it will be invisible for the users.

As I wrote earlier, I don't know much about the internals, so my review is mainly questions, not criticism.

tests/array.rs Show resolved Hide resolved
examples/functions_and_traits.rs Show resolved Hide resolved
crates/blas-tests/tests/oper.rs Show resolved Hide resolved
tests/array.rs Outdated Show resolved Hide resolved
/// Return an iterator over the length and stride of each axis.
pub fn axes(&self) -> Axes<'_, D>
{
<Self as AsRef<LayoutRef<_, _>>>::as_ref(self).axes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the logic, this saves work by using as ref?

Could we add an (initially private) helper method here - something that's the same as as_ref but without the types specified, so it would look like self.as_layout_ref().axes()?

Copy link
Collaborator Author

@akern40 akern40 Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the logic, this saves work by using as ref?

The AsRef / AsMut aliasing functions serve two purposes. First off, a function like axes that got moved to LayoutRef would not longer be directly available on a raw view - i.e., raw_view.axes() would be an error - because raw views do not dereference. (They would still exist on any array with S: Data, though). So these aliases restore that capability to raw views.

The second is a performance consideration, only relevant for mutating functions on shared arrays. Functions defined on LayoutRef and RawRef are directly available to shared arrays, even without these aliases. However, they do that by going through &mut ArrayRef, which un-shares the data. But we don't want that in these cases; LayoutRef and RawRef don't require unique data. So these aliases shadow these mutating functions and create a method that doesn't un-share data.

Could we add an (initially private) helper method here - something that's the same as as_ref but without the types specified, so it would look like self.as_layout_ref().axes()?

I've actually gone ahead and added those methods publicly - as you've pointed out, .as_ref and .as_mut, when used without type hints, are totally ambiguous. I think having explicit methods is helpful not just for us, but for our end users as well.

pub fn slice_collapse<I>(&mut self, info: I)
where I: SliceArg<D>
{
self.as_mut().slice_collapse(info);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal philosophy is that AsMut/AsRef are conversion traits and should be used for argument conversion.

That philosophy would lead us to having more specific methods here - .as_layout_mut() for example, and so on. The reason is among others that there are many AsMut implementations in the wild, and the number of them sometimes even grows. (Maybe that doesn't apply to our types, so maybe disregard this comment!) Any new AsRef/AsMut implementation - in a different crate - can lead to type inference failure in a new version.

I am mostly worried about that we would lead our users to use AsRef and AsMut in ways that are ultimately fragile for their code and for our API evolution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great call. I've added the functions publicly (I think our users will want that stability and unambiguity as well) and replaced our usage in the crate.

tests/array.rs Show resolved Hide resolved
@bluss
Copy link
Member

bluss commented Oct 24, 2024

I think that we should treat this change as worthy of the 'breaking-changes' label, unless we have very good evidence it's almost entirely free of existing code breakage.

  • I know that bumping to ndarray 0.17 is not so fun for the ecosystem and users
  • But this is also big change and it might have consequences for people out there writing generic functions with ArrayBase<S,D> where S: Data... and so on.
    • If the version bump had no adverse costs, increasing to ndarray 0.17 would be a no-brainer.

I think I accidentally moved this over to ArrayRef, but we're not sure it should still require DataOwned
@akern40
Copy link
Collaborator Author

akern40 commented Oct 26, 2024

I think that we should treat this change as worthy of the 'breaking-changes' label, unless we have very good evidence it's almost entirely free of existing code breakage.

Ya, as much as I've done my best to avoid breaking changes, I think Rust's type inference will introduce errors, if nothing else. I agree we should likely treat this as a bump to 0.17.

However, I think we could hold off on the release for a bit. We've got some other PRs we're trying to clean up, so we can roll those into a 0.17. As I've said on Zulip, I want to take a crack at the shape / stride problem, too. If I can make some headway, would that be good to roll into 0.17 as well? Or should we just aim for one major crate redesign per version?

  • If we wait, users only have to deal with a version bump once, not twice
  • If we do it in two releases, users don't have to deal with learning two new major redesigns at once

@akern40 akern40 linked an issue Oct 26, 2024 that may be closed by this pull request
This is unfortunately out of our control: users can select their blas implementations, and those implementations may not be MSRV 1.64, like ndarray itself. So we should add some documentation that makes this clear (or bump our MSRV). For now, we'll just add a gate in our test driver to handle this case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants