Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Advanced) Indexing, Slicing, Masking, Reshaping and Transposition for Random Variables and Distributions #134

Conversation

marvinpfoertner
Copy link
Collaborator

This PR enables full numpy-style indexing for random variables and distributions (currently Normal and Dirac) as in

https://numpy.org/doc/stable/reference/arrays.indexing.html

Closes #11.

@marvinpfoertner
Copy link
Collaborator Author

marvinpfoertner commented Aug 11, 2020

I noticed that the .reshape() call on a random variable changes the shape of the random variable in-place.

def reshape(self, newshape):
"""
Give a new shape to a random variable.
Parameters
----------
newshape : int or tuple of ints
New shape for the random variable. It must be compatible with the original shape.
Returns
-------
reshaped_rv : ``self`` with the new dimensions of ``shape``.
"""
self._shape = newshape
self._distribution.reshape(newshape=newshape)
return self

To be consistent with the numpy API, I would propose we change this to return a new random variable with the desired shape.

Any thoughts @JonathanWenger, @pnkraemer, @nathanaelbosch?

@marvinpfoertner marvinpfoertner self-assigned this Aug 11, 2020
@marvinpfoertner marvinpfoertner added improvement Improvements of existing functionality randvars Issues related to random variables refactoring Refactoring of existing functionality labels Aug 11, 2020
@marvinpfoertner marvinpfoertner added this to the Initial Release milestone Aug 11, 2020
@marvinpfoertner marvinpfoertner marked this pull request as ready for review August 11, 2020 08:14
@JonathanWenger
Copy link
Contributor

I noticed that the .reshape() call on a random variable changes the shape of the random variable in-place.

def reshape(self, newshape):
"""
Give a new shape to a random variable.
Parameters
----------
newshape : int or tuple of ints
New shape for the random variable. It must be compatible with the original shape.
Returns
-------
reshaped_rv : ``self`` with the new dimensions of ``shape``.
"""
self._shape = newshape
self._distribution.reshape(newshape=newshape)
return self

To be consistent with the numpy API, I would propose we change this to return a new random variable with the desired shape.

Any thoughts @JonathanWenger, @pnkraemer, @nathanaelbosch?

In general we should be consistent with the numpy API, so I agree that .reshape() should not perform the operation in place, but return a new random variable. However rv.shape = (a,b) should perform the operation in place to be consistent with NumPy.

It seems to be a bit more complicated (see also: https://stackoverflow.com/questions/46574568/does-numpy-reshape-create-a-copy). While some_array.reshape(size) creates a new object (a so-called view) it still shares the same data buffer, which is not the same as reshape creating a new array.

@marvinpfoertner
Copy link
Collaborator Author

I noticed that the .reshape() call on a random variable changes the shape of the random variable in-place.

def reshape(self, newshape):
"""
Give a new shape to a random variable.
Parameters
----------
newshape : int or tuple of ints
New shape for the random variable. It must be compatible with the original shape.
Returns
-------
reshaped_rv : ``self`` with the new dimensions of ``shape``.
"""
self._shape = newshape
self._distribution.reshape(newshape=newshape)
return self

To be consistent with the numpy API, I would propose we change this to return a new random variable with the desired shape.
Any thoughts @JonathanWenger, @pnkraemer, @nathanaelbosch?

In general we should be consistent with the numpy API, so I agree that .reshape() should not perform the operation in place, but return a new random variable. However rv.shape = (a,b) should perform the operation in place to be consistent with NumPy.

It seems to be a bit more complicated (see also: https://stackoverflow.com/questions/46574568/does-numpy-reshape-create-a-copy). While some_array.reshape(size) creates a new object (a so-called view) it still shares the same data buffer, which is not the same as reshape creating a new array.

I agree but since we'll call numpy's resize internally, our random variables inherit the same data buffer. Actually I'm pretty sure that a view is essentially a new array which just shares the pointer to the contiguous piece of memory of the old array.

@JonathanWenger
Copy link
Contributor

I noticed that the .reshape() call on a random variable changes the shape of the random variable in-place.

def reshape(self, newshape):
"""
Give a new shape to a random variable.
Parameters
----------
newshape : int or tuple of ints
New shape for the random variable. It must be compatible with the original shape.
Returns
-------
reshaped_rv : ``self`` with the new dimensions of ``shape``.
"""
self._shape = newshape
self._distribution.reshape(newshape=newshape)
return self

To be consistent with the numpy API, I would propose we change this to return a new random variable with the desired shape.
Any thoughts @JonathanWenger, @pnkraemer, @nathanaelbosch?

In general we should be consistent with the numpy API, so I agree that .reshape() should not perform the operation in place, but return a new random variable. However rv.shape = (a,b) should perform the operation in place to be consistent with NumPy.
It seems to be a bit more complicated (see also: https://stackoverflow.com/questions/46574568/does-numpy-reshape-create-a-copy). While some_array.reshape(size) creates a new object (a so-called view) it still shares the same data buffer, which is not the same as reshape creating a new array.

I agree but since we'll call numpy's resize internally, our random variables inherit the same data buffer. Actually I'm pretty sure that a view is essentially a new array which just shares the pointer to the contiguous piece of memory of the old array.

That's also my understanding from a quick scan of the topic.

@marvinpfoertner
Copy link
Collaborator Author

I noticed that the .reshape() call on a random variable changes the shape of the random variable in-place.

def reshape(self, newshape):
"""
Give a new shape to a random variable.
Parameters
----------
newshape : int or tuple of ints
New shape for the random variable. It must be compatible with the original shape.
Returns
-------
reshaped_rv : ``self`` with the new dimensions of ``shape``.
"""
self._shape = newshape
self._distribution.reshape(newshape=newshape)
return self

To be consistent with the numpy API, I would propose we change this to return a new random variable with the desired shape.
Any thoughts @JonathanWenger, @pnkraemer, @nathanaelbosch?

In general we should be consistent with the numpy API, so I agree that .reshape() should not perform the operation in place, but return a new random variable. However rv.shape = (a,b) should perform the operation in place to be consistent with NumPy.
It seems to be a bit more complicated (see also: https://stackoverflow.com/questions/46574568/does-numpy-reshape-create-a-copy). While some_array.reshape(size) creates a new object (a so-called view) it still shares the same data buffer, which is not the same as reshape creating a new array.

I agree but since we'll call numpy's resize internally, our random variables inherit the same data buffer. Actually I'm pretty sure that a view is essentially a new array which just shares the pointer to the contiguous piece of memory of the old array.

That's also my understanding from a quick scan of the topic.

Should I make these changes in this PR?

@JonathanWenger
Copy link
Contributor

Fine with me.

@marvinpfoertner
Copy link
Collaborator Author

Fine with me.

Done

Copy link
Contributor

@JonathanWenger JonathanWenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on the PR! I saw some things that will surely changed based on the discussion surrounding the removal of Distribution. So just keep those in mind before you invest too much time into making improvements based on my comments.

src/probnum/prob/distributions/dirac.py Outdated Show resolved Hide resolved
src/probnum/prob/distributions/normal.py Outdated Show resolved Hide resolved
src/probnum/prob/distributions/normal.py Outdated Show resolved Hide resolved
tests/test_prob/test_distributions/test_normal.py Outdated Show resolved Hide resolved
tests/test_prob/test_distributions/test_normal.py Outdated Show resolved Hide resolved
tests/test_prob/test_distributions/test_normal.py Outdated Show resolved Hide resolved
@marvinpfoertner marvinpfoertner force-pushed the feature/indexing-reshaping-transpose-rv branch from 58624dc to 3204b2a Compare August 17, 2020 09:38
)

def _reshape_inplace(self, newshape):
raise NotImplementedError
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inplace reshaping for univariate normals is actually impossible due to the fact that the attributes of numpy's array scalars are immutable. See also https://numpy.org/doc/stable/reference/arrays.scalars.html. We should overthink #130. It would be best if scalar quantities behave exactly the same as arrays. Maybe we should use arrays of shape () internally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i think representing everything internally as arrays makes a lot of sense. This removes quite a few annoying conditionals in the code as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll rephrase #130 then?

@marvinpfoertner
Copy link
Collaborator Author

@JonathanWenger I resolved all your comments. The PR would be ready to merge. Any objections?

@marvinpfoertner marvinpfoertner merged commit 4999842 into probabilistic-numerics:master Aug 17, 2020
@marvinpfoertner marvinpfoertner deleted the feature/indexing-reshaping-transpose-rv branch August 17, 2020 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvements of existing functionality randvars Issues related to random variables refactoring Refactoring of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement indexing and reshaping for random variables
2 participants