-
-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new features to the ndcube.__add__ method #794
base: main
Are you sure you want to change the base?
Conversation
ndcube/ndcube.py
Outdated
# addition | ||
new_data = self.data + value_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition should be done as part of the masked array addition. You've already done this below, you just need to extract the added data from the results as well as the mask.
ndcube/ndcube.py
Outdated
return self._new_instance( | ||
data=new_data, uncertainty=new_uncertainty, mask=new_mask | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a separate return
here for the NDData
case, I think we should build a dictionary of kwargs
that we can give self._new_instance
, here. So, you can create an empty kwargs dictionary at the start of the method, and add the new data, uncertainty, etc. in the relevant place, e.g.
kwargs["uncertainty"] = new_uncertainty
Then the final line of the method would become
return self._new_instance(**kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if this doesn't make sense
Co-authored-by: DanRyanIrish <[email protected]>
Co-authored-by: DanRyanIrish <[email protected]>
…o nddataArithmetic and further modify the _add_ method.
ndcube/ndcube.py
Outdated
if self.uncertainty is not None and value.uncertainty is not None: | ||
new_uncertainty = self.uncertainty.propagate( | ||
np.add, value.uncertainty, correlation=0 | ||
np.add, value.uncertainty, result_data = value.data, correlation=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result_data
needs to be the result of the operation. So, assuming you moved the addition of the datas using the masked array to before the uncertainty propagation, you could do:
np.add, value.uncertainty, result_data = value.data, correlation=0 | |
np.add, value.uncertainty, result_data = kwargs["data"], correlation=0 |
ndcube/ndcube.py
Outdated
# combine mask | ||
self_ma = np.ma.MaskedArray(self.data, mask=self.mask) | ||
value_ma = np.ma.MaskedArray(value_data, mask=value.mask) | ||
|
||
# addition | ||
result_ma = self_ma + value_ma | ||
new_mask = result_ma.mask | ||
|
||
# extract new mask and new data | ||
kwargs["mask"] = result_ma.mask | ||
kwargs["data"] = result_ma.data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in above comment, I think it makes sense to do this before the uncertainty propagation so you can use the kwargs["data"]
value in that propagation.
ndcube/ndcube.py
Outdated
kwargs["data"] = result_ma.data | ||
|
||
# return the new NDCube instance | ||
return self._new_instance(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this line to the end of the method and use the kwargs
approach when handling the other cases, e.g. Quantity
. So, for example, L1082 would become:
kwargs["data"] = self.data + value.to_value(cube_unit)
A changelog file needs to be added. And your branch needs to be updated with the latest version of main. |
Hi @DanRyanIrish, as we have discussed in our project meetings, below are the issues we encountered and may need further discussions with others in the community: The issue is mainly around how NumPy handles masks when performing an addition for two NumPy.MaskedArray. However, from experimentation, it can be seen that I find this confusing because even if it does combine the mask and then apply it on the result, it should be: Please correct me if there is anything wrong in my understanding. |
@DanRyanIrish, Secondly, we also encountered some issues around the propagate method:
|
Hi @PCJY. I think the first thing we need to do is decide what behaviours we want to implement in the case where at least one of the Firstly, if an object has no mask, that is equivalent to all pixels being unmasked.
Alternatives for parts of the scheme could include:
Once we agree on a scheme, the way forward on your uncertainty questions will become clear. @Cadair what are your thoughts on this scheme. I also think we should bring this up at the sunpy weekly meeting to get other thoughts. |
This is where I also find numpy masked arrays counter-intuitive. However, the logic is as follows:
Notice that this is not the same as the scheme I've proposed in my previous comment, in part because it's confusing, as you've found. |
@PCJY, until we agree a way forward with the mask, you should proceed by implementing in the case for which neither object has a mask. So no need for masked arrays. |
I haven't thought too much each of these individual cases, but the fact there is a list is enough to make me think we probably need a way for the user to choose. This is obviously not possible with the Is this also not a problem for other operators as well? |
I think this is a good idea. As well as As far as I can see, this ambiguity only arises when there are masks involved. So we could still implement the dunder methods as wrappers around the above methods, but have the raise and error/return |
Here is an example of including fixtures in parameterisations: https://github.com/sunpy/ndcube/blob/main/ndcube/tests/test_ndcube.py#L48 |
…esolve parameterized fixtures via getfixturevalue().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test function names are now much clearer, and it's easier to see the cases that are tested.
A couple more comments on your test writing:
-
At least one of your tests is somewhat circular, i.e. you use the result of the operation you're testing to form the expected values. This means that an error might make it into the expected values and the test will pass when it should fail. See inline comments for a better way to structure tests. These are for information only. There is an even better way to structure these tests. See item below.
-
ndcube provides a test helper function that checks if two cubes are equal. This means your test only needs to construct the expected cube, and then pass it and the result of the addition to that helper function. See below for an example of how to do this for one test
def test_cube_add_cube_unit_unc_nddata_unit_unc(ndc, value):
output_cube = ndc + value # perform the addition
# Construct expected cube
expected_unit = u.ct
expected_data = ((ndc.data * ndc.unit) + (value.data * value.unit)).to_value(expected_unit)
expected_uncertainty = ndc.uncertainty.propagate(
operation=np.add,
other_nddata=value,
result_data=expected.data*expected.unit,
correlation=0,
)
expected_cube = NDCube(expected_data, ndc.wcs, uncertainty=expected_uncertainty, unit=expected_unit)
# Assert output cube is same as expected cube
assert_cubes_equal(output_cube, expected_cube)
All your tests in this PR should be written like this to ensure the output cubes are fully tested. It should also make them easier to read.
ndcube/tests/test_ndcube.py
Outdated
expected_uncertainty = ndc.uncertainty.propagate( | ||
operation=np.add, | ||
other_nddata=value, | ||
result_data=new_cube.data*new_cube.unit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is a bit circular. You shouldn't use outputs of the test to form expected values. Otherwise the test may pass when it shouldn't.
expected_uncertainty = ndc.uncertainty.propagate( | |
operation=np.add, | |
other_nddata=value, | |
result_data=new_cube.data*new_cube.unit, | |
expected_unit = u.ct | |
expected_data = (ndc.data * ndc.unit).to_value(expected_unit) + (value.data * value.unit).to_value(expected_unit) | |
expected_uncertainty = ndc.uncertainty.propagate( | |
operation=np.add, | |
other_nddata=value, | |
result_data=expected_data*expected_unit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DanRyanIrish , thank you for the suggestions. I checked the code for the assert_cubes_equal method. I am unsure whether it checks the values, type and units of the uncertainty attributes as well.
It looks like it only checks whether the shapes of uncertainties are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanRyanIrish Or maybe I can
either: use the assert_cube_equal method together with a few more lines that checks the values, type and units of the uncertainty attributes,
or: adding a few more lines into the assert_cube_equal method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @PCJY. Well spotted. You're right. You should include a check_uncertainty_values=False
kwarg to assert_cube_equal
and make it check those aspects of the uncertainty if set to True
. So the code here would be replaced by something like:
if check_uncertainty_values:
# Check output and expected uncertainty are of same type. Remember they could be None.
# If the uncertainties are not None,...
# Check units, shape, and values of the uncertainty.
elif test_input.uncertainty:
assert test_input.uncertainty.array.shape == expected_cube.uncertainty.array.shape
Then you can set check_uncertainty_values
to True
when you call assert_cubes_equal
in your tests, and that should do what you need it to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanRyanIrish I see what you mean, thank you for this suggestion! I will implement this.
ndcube/tests/test_ndcube.py
Outdated
assert np.allclose(new_cube.data, ndc.data + value.data) | ||
assert new_cube.unit == u.ct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using above suggested changes:
assert np.allclose(new_cube.data, ndc.data + value.data) | |
assert new_cube.unit == u.ct | |
assert np.allclose(new_cube.data, expected_data) | |
assert new_cube.unit == expected_unit |
Co-authored-by: DanRyanIrish <[email protected]>
Hi @PCJY. The tests are looking good now. However, could you please rename them |
ndcube/tests/helpers.py
Outdated
else: | ||
assert (test_input.uncertainty is None and expected_cube.uncertainty is None) | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the way you have it, you could do this:
pass | |
assert type(test_input.uncertainty) is type(expected_cube.uncertainty) | |
assert np.allclose(test_input.uncertainty.array, expected_cube.uncertainty.array), \ | |
f"Expected uncertainty: {expected_cube.uncertainty}, but got: {test_input.uncertainty.array}" |
and then remove lines 135-139. Up to you though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanRyanIrish Thank you for your suggestion. I re-understood what the code aims to do, and changed it again.
Hi @PCJY. I've edited your mask discussion in the PR description. I've created a table for all the scenarios for determining the result of adding the masked data. Can you fill in the table with the result of each case? Either |
Hi @DanRyanIrish, thank you for providing the logic structures. I have filled in the results, please feel free to tell me whether I made any mistakes. (I showed my understanding process in the comments as well.) |
Hi @PCJY Regarding you question in the PR description:
Yes, that's right. Regarding your two options, |
@PCJY: Regarding your comments on the distinct cases in the PR description:
Yes :)
Yes!
Yet to be decided So I think a way to implement this would be: self_data, value_data = self.data, value.data # May require a copy
self_mask, value_mask = self.mask, value.mask # May require handling/converting of cases when masks aren't boolean arrays but are None, True, or False.
if not operation_ignores_mask:
no_op_value = 0 # Value to set masked values since we are doing addition. (Would need to be 1 if we were doing multiplication.)
idx = np.logical_and(self_mask, np.logical_not(value_mask))
self_data[idx] = no_op_value
idx = np.logical_and(value_mask, np.logical_not(self_mask))
value_data[idx] = no_op_value
idx = np.logical_and(self_mask, value_mask)
#self_data[idx], value_data[idx] = ?, ? # Handle case when both values are masked here. # We are yet to decide the best behaviour here.
else:
pass # If operation ignores mask, nothing needs to be done. This line not needed in actual code. Only here for clarity.
# Perform addition
new_data = self_data + value_data
# Calculate new mask.
new_mask = handle_mask(self_mask, value_mask) if handle_mask else None |
PR Description
This PR aims to fix issue #734 created by @DanRyanIrish .
write down the scenarios.
(list, to see what has been covered)
Testing scenarios without mask
redrafting the tests.
One and only one of them has a unit. [Error]
Both NDCube and NDData have unit
Both have uncertainty.
NDCube has uncertainty.
NDData has uncertainty.
Neither has uncertainty.
Neither has a unit.
Both have uncertainty.
NDCube has uncertainty.
NDData has uncertainty.
Neither has uncertainty.
name them again:
test_cube_add_cube_unit_mask_nddata_unc_unit_mask
Handling Masks
Determing data result of operation
The
operation_ignores_mask
kwarg determines the resulting data value when adding objects, and accounting for mask values. Below are the different scenarios for the addition of a single pixelNDCube
, namedcube
with a data value of 1, and a single pixelNDData
object, namednddata
with a data value of 2:my draft
what this means:
when OIM is T, does it ignore the mask after the addition or during the addition?
during the addition, because we want to see the arithmetic operation results here but not the mask. if it deals with the mask after the addition, there would be no need to check the arithmetic values here
no need to do boolean operations here for the masks?
this is why mask handling and arithmetic operation are separate with each other.
data and mask are two separate things: First data, then mask;
addition is done on both the data and the mask.
cube.data = 1
,nddata.data = 2
What the distinct cases are:
When OIM is T, the result is always 3, i.e. the actual result of the addition of the two data values.
When OIM is F, the result is always the addition result of any value with its corresponding mask being F (when there is not one, the result is None).
Determining the new mask value produced by the operation
The
handle_mask
kwarg takes a function which determines the new mask value resulting from the addition. While this function can be anything that takes two boolean arrays and outputs a resulting boolean array of the shape, the most commonly used are expected to bynumpy.logical_and
andnumpy.logical_or
. Since the user supplies the function, the resulting mask can be implemented something like:My understanding:
handle_mask is a function,
as long as it has a value, it will be True, and does the operation on the two masks,
otherwise, the new_mask value would be None, meaning:
1), the user did not set anything for the handle_mask kwarg, should an error be raised here?
2), there is no need to set any value for the handle_mask kwarg.