Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the return type of Enumerable#sum/product for union elements #15314

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

rvprasad
Copy link

Using the first type of a union type as the type of the result of Enumerable#sum/product() call can cause runtime failures, e.g. [1, 10000000000_u64].sum/product will result in an OverflowError. A safer alternative is to flag/disallow the use of union types with Enumerable#sum/product() and recommend the use of Enumerable#sum/product(initial) with an initial value of the expected type of the sum/product call.

@Blacksmoke16 Blacksmoke16 added breaking-change topic:stdlib:numeric kind:bug A bug in the code. Does not apply to documentation, specs, etc. labels Dec 26, 2024
@Blacksmoke16 Blacksmoke16 linked an issue Dec 26, 2024 that may be closed by this pull request
@rvprasad
Copy link
Author

This PR expands on the (accidentally closed) PR 15308.

Tagging @Blacksmoke16, @straight-shoota, and @BlobCodes to continue the discussion here.

Using the first type of a union type as the type of the result of
`Enumerable#sum/product()` call can cause runtime failures, e.g. `[1,
10000000000_u64].sum/product` will result in an ``OverflowError``. A
safer alternative is to flag/disallow the use of union types with
`Enumerable#sum/product()` and recommend the use of
`Enumerable#sum/product(initial)` with an initial value of the
expected type of the sum/product call.
The error message and documentation specifically refers to the variants
of Enumerable#sum/product that do not support union types.
@Sija
Copy link
Contributor

Sija commented Dec 28, 2024

This change is rather ill-placed, given it modifies the internal Enumerable::Reflect(X).first method.

@rvprasad
Copy link
Author

@Sija can you please elaborate on "this change is rather ill-placed'?

@Sija
Copy link
Contributor

Sija commented Dec 29, 2024

@rvprasad As I've already mentioned, this change modifies the internal Enumerable::Reflect(X).first method, defying its sole purpose, since now it doesn't return the first element of the type X anymore.

@rvprasad
Copy link
Author

@Sija I see what you mean. Here are a few thoughts and observations.

  1. I agree that the member named first becomes redundant with this change. So, we could rename it (e.g., as type) and update the comments in the source.
  2. As is, Reflect seems to be a common reusable unit of code that is used at different program points to check the availability of identity . So, we might not want to do away with Reflect. However, since the type argument to Reflect is always the type parameter of the enclosing enumerable, we could replace Reflect struct by a class method of Enumerable but this class method will have to have public visibility, which will be undesirable.
  3. Looking at sum(&: T -> _) and product(&: T -> _), they fail with OverFlowError when the block argument yields values of type different from the element type of the enumerable, e.g., puts [1, 2].sum {|i| i * 10000000000_u64} and puts [1_u64, 2].sum {|i| i * 10000000000_u64} fails but puts [1_u64, 2_u64].sum {|i| i * 10000000000_u64} works fine. I think the cause is that the type parameter of the enclosing enumerable is provided as the type argument to Reflect instead of the type of the value yielded by the block argument. We can solve this issue by using the return type of the block argument or by removing the block variants of sum and product that do not accept initial values.
  4. While all of the below tests fail when I execute make std_spec, the backtrace in the failing product tests does not have a line from enumerable_spec.cr, which seems strange.
    it { [1, 2].sum { |x| x * 20000000000_u64 }.should eq(60000000000_u64) }
    it { [1, 2].sum(1) { |x| x * 20000000000_u64 }.should eq(60000000001_u64) }

    it { [1, 2].product { |x| x * 100000_u64 }.should eq(20000000000_u64) }
    it { [1, 2].product(1) { |x| x * 100000_u64 }.should eq(20000000000_u64) }

While I realize surfacing more issues is not ideal, I hope they can help clear up a small corner of the language.

@straight-shoota
Copy link
Member

I don't think this change is ill-placed. The purpose of Reflect.first is to resolve a union type to a single type for #sum and #product. Of course that isn't the case anymore, so we may want to rename the method. And of course, the comment needs to be updated.
Reflect is a reusable component, but it's only used internally for this specific use case. There's no sense in preserving its original behaviour. If that's no longer used, the type is obsolete.
That could actually be an alternative, to drop Reflect entirely. We would end up calling .multiplicative_identity directly on the element type. In case of a union type, that would result in an error message about the missing method.

I believe the approach taken here to handle this case explicitly with a custom error message is better. The specific implementation such as the method name of the internal helper can be adjusted easily.

Any issue with the yielding overloads sum(&: T -> _) and product(&: T -> _) should be discussed separately. But I believe they are entirely correct though. The overflow error comes from the multiplication in the block in your test code (e.g. 1 * 10000000000_u64, the result is Int32) not the aggregation method itself. It works perfectly fine when you adjust the product type to Int64: [1, 2].sum { |i| 1000000000_i64 * i } # => 3000000000.

With this PR, Reflect is not used to determine the first type of a union type.
This commit aligns Reflect's method and related comments with the class'
purpose.
@rvprasad
Copy link
Author

rvprasad commented Dec 31, 2024

I have pushed a revision that updates the implementation and comments in Reflect.

Regarding the yielding overloads, I did not realize that, during type coercion, Crystal uses the type of expression based on the type of the first sub-expression of the expression, i.e., typeof(1 * 1u64) is Int32 while typeof(1u64 * 1) is UInt64.

On a related note, I believe typeof(yield Enumerable.element_type(self)) will return the element type of the Enumerable and not the type of valued yielded by the block. Is this correct? If so, why can't we instead use the type parameter T of the Enumerable?

Furthermore, when I tried to use T instead of Reflect(T), I encountered the following errors.

  1. Tuple(String, String, String)#sum() at string.cr@5717 (string interpolation) failed as additive_identity was undefined for String.class.
  2. Compiling Array(Array(Char) | Array(Int32))#product() at indexable.cr@226 (cartesian product) failed as multiplicative_identity was undefined for (Array(Char) | Array(Int32)).class.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2025

With regards to typeof(yield Enumerable.element_type(self)), I think the use of yield executes the block and gets the type of the returned value. If so, I get it now.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2025

While Reflect can be extended to flag heterogeneous tuples, {"a", "b", 3}.sum(&.to_s) fails. Interestingly, this code snippet fails with v1.14.0 with the error message Error: undefined method 'zero' for String.class.

Also, this observation is true of heterogeneous arrays, i.e., arrays whose element type is a union type.

Edit1: I have captured this observation in issue #15317.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2025

Assuming we can pursue other issues in separate dedicated tickets, are there any objections to the proposed change to address the primary issue captured in this ticket?

@rvprasad
Copy link
Author

rvprasad commented Jan 2, 2025

@straight-shoota thanks for the approval!

Do I need to do anything more for these changes to be merged?

Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that + and * uses the type of the left operand one could argue that using the type of the first argument is consistent with that. Not allowing unions would be aligned to disable 1i32 + 1i64.

But still I think this change make sense, code that will break should be easy to fix, even if polymorphic since there is multiplicative_identity and additive_identity.

@straight-shoota straight-shoota added this to the 1.16.0 milestone Feb 21, 2025
Co-authored-by: Sijawusz Pur Rahnama <[email protected]>
@straight-shoota
Copy link
Member

Seems like a gave bad advise to use assert_error in #15269 (comment)
This won't compile in the interpreter. Of course we could skip in interpreted mode, but there's more: It's a test helper for the compiler specs and should never be used in the stdlib specs.

While it generally seems like a good idea to test semantics by directly using the compiler, this also requires building a huge part of the compiler just to run enumerable_spec. And this would even happen if we skip these examples (e.g. via --tag ~slow) because filters only apply at runtime. The compiler semantics still need to be compiled.

If we want to build some code from stdlib specs we should instead run the compiler as an external program. std/spec_helper.cr has a compile_file helper for successful compilation. I'll add a similar helper to detect compiler errors.

@straight-shoota straight-shoota removed this from the 1.16.0 milestone Feb 24, 2025
@straight-shoota straight-shoota self-assigned this Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return type of Enumerable#sum and #product for union elements
6 participants