-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Preserve concrete Union types in collect() with generators #23940
Conversation
When return_type() gives a Union of concrete types, use it as the element type of the array instead of the type of the first element and progressively making it broader as needed (often ending with element type Any). This means that no reallocation will be needed anymore if/when encoutering an element with a new type. Note that the inferred element type may still be broader than the actual contents of the array, for example with (x for x in Union{Int, Void}[1]). Using a Union element type also has the advantage of using the more efficient layout for isbitsunion types and of allowing for more efficient generated code when using the resulting array. This is particularly noticeable with array comprehensions, which inherit the behavior of collect().
I really don't think we can handle this exactly this way --- there are too many cases where inference might return an unexpected Union type, or might decide to widen the Union instead. I would rather do something like looking at |
isconcrete(T) | ||
end | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have Base.isbitsunion
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is slightly different, it also includes things like Union{String, Void}
.
It might be enough to find some syntax to allow declaring the eltype of a Generator upon construction; I'm not sure if we could somehow use/detect something like
Not sure if this is too disruptive for Generators in general, but it would certainly help solve the "eltype" problem they have. Happy to spec some code out and do a PR if people think it's viable; would need some help on parser changes though (if we indeed think catching a type assert is viable or if some other syntax is possible). |
@JeffBezanson The problem is, Or maybe we could restrict this to @quinnj Why wouldn't you want to preserve an inferred |
Oh, and maybe my misunderstanding is related to the fact that currently we observe this: # OK: element type is the most specific given array contents
julia> map(identity, Union{Int,Void}[1])
1-element Array{Int64,1}:
1
# Not OK: element type is always Any
julia> map(identity, Union{Int,Void}[1, nothing])
2-element Array{Any,1}:
1
nothing That is, when the return type is not based on what the array actually contains, but is always I guess the solution would be, first to always return an array with the most specific type ( |
One thing we can easily do is make |
I imagine there will be certain restrictions on this though, like only leaftypes? Would this just require changing the last line of |
Closing in favor of the more general #24332. |
When
return_type()
gives aUnion
of concrete types, use it as the elementtype of the array instead of the type of the first element and progressively
making it broader as needed (often ending with element type
Any
).This means that no reallocation will be needed anymore if/when encoutering
an element with a new type. Note that the inferred element type may still be
broader than the actual contents of the array, for example with
(x for x in Union{Int, Void}[1])
.Using a
Union
element type also has the advantage of using the more efficientlayout for
isbitsunion
types and of allowing for more efficient generated codewhen using the resulting array. This is particularly noticeable with array
comprehensions, which inherit the behavior of
collect()
.Cf. JuliaData/Missings.jl#6 (comment) and following comments.
This change is surprisingly not disruptive at all, probably because code is not supposed to rely on inference. I'm still uncertain about at least one issue: whether it makes sense to apply this strategy only to
Union
s of concrete types. The clearest performance gain should be forisbitsunion
types (efficient memory layout for the resulting array), but it should also be noticeable for otherUnion
s of concrete types (at least for code reusing the resulting array, since branches on the type will be used).Union
s of abstract types shouldn't get a better performance, but maybe for simplicity/consistency we could treat them the same?Simple benchmark on realistic data with missing values:
Note that
s == 0
returnsnull
fornull
inputs (with Nulls master), so with this PR the output ofmap
is aVector{Union{Bool, Null}}
, which means that its narrow typing will help performance down the line too. The remaining allocations will probably be reduced by general optimizations forUnion
.Cc: @davidanthoff @quinnj