Use promote_typejoin for Tuple and NamedTuple promotion #25924
Conversation
Even if we do not want to call promote_type() on the entries when promoting tuples (#25553 (comment)), we still have to use promote_typejoin() so that e.g. promote_type(Tuple{Int}, Tuple{Missing}) returns Tuple{Union{Int,Missing}}. In particular, this ensures array concatenation uses Union{T,Missing/Nothing} instead of Any, which is consistent with map(identity, x).
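Concretely, the intent is along these lines (a small illustrative session assuming this PR is applied; exact printing may differ):

julia> promote_type(Tuple{Int}, Tuple{Missing})
Tuple{Union{Missing, Int64}}

julia> vcat([(1,)], [(missing,)])   # concatenation then also keeps the Union element type
2-element Vector{Tuple{Union{Missing, Int64}}}:
 (1,)
 (missing,)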
It's not clear to me why we would want it to be this type |
Because that allows using efficient storage and faster code down the line? Storing values with an imprecise type is particularly problematic when using |
That part is what is not clear to me, as being likely / soon :) |
Wouldn't the minimum requirement for an efficient in-memory representation of |
What do you mean?

# Creates a Vector{Tuple{Union{Missing, Int64}}} thanks to my recent changes to map
julia> x = map(_ -> (rand(Bool) ? missing : 1,), 1:10_000);
# This is the kind of vector you get with the array literal syntax
julia> y = convert(Array{Tuple{Any}}, x);
julia> @btime map(identity, x);
48.031 μs (13 allocations: 78.66 KiB)
julia> @btime map(identity, y);
3.175 ms (13 allocations: 78.66 KiB)
I don't know, but as shown above the performance gain is so large that, whether or not the storage itself is more efficient, using a more precise type already sounds worth it. Anyway, even disregarding performance, without this PR we have a weird exception for tuples, which is that array literal syntax gives a less precise element type than |
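To make the discrepancy concrete (a hypothetical session without this PR; the printed types are taken from claims made elsewhere in this thread):

julia> [(1,), (missing,)]   # array literal syntax: element type is only Tuple{Any}
2-element Vector{Tuple{Any}}:
 (1,)
 (missing,)

julia> eltype(map(identity, [(1,), (missing,)]))   # map already widens to the precise tuple type
Tuple{Union{Missing, Int64}}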
I'm not advocating against this PR! I'm just surprised that there is a performance improvement by going from Tuple{Any} to Tuple{Union{Missing, Int64}}. I wonder how this looks for tuples with multiple elements? Maybe for a tuple with one element the compiler in some clever way just "removes" (whatever that might mean) the tuple wrapper and treats it as the element type, in which case all the small union optimizations might kick in, and that is why there is this perf improvement? |
Looking at the generated code, it appears that knowing the type of the elements allows the compiler to use small-Union optimizations.
I've just tested with a 4-tuple, and the performance difference is still there, even without heterogeneous types. |
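A quick way to check what the compiler knows in each case (assuming the x and y vectors from the benchmark above):

julia> @code_warntype getindex(x, 1)   # return type Tuple{Union{Missing, Int64}}: a small, known union
julia> @code_warntype getindex(y, 1)   # return type Tuple{Any}: nothing for small-Union optimizations to work with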
Isn't that just a side-effect of

Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{T}}) where {T} = T
Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{S}}) where {T, S} = Any |
Interesting. I get this:

julia> Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{T}}) where {T} = T
julia> Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{S}}) where {T, S} = Any
julia> x = map(_ -> (rand(Bool) ? missing : 1,), 1:10_000); # ::Vector{Any}
julia> y = convert(Array{Tuple{Any}}, x);
julia> z = convert(Array{Tuple{Union{Int,Missing}}}, x);
julia> @btime map(identity, x);
244.888 μs (5041 allocations: 235.11 KiB)
julia> @btime map(identity, y);
40.441 μs (9 allocations: 156.48 KiB)
julia> @btime map(identity, z);
34.031 μs (9 allocations: 156.48 KiB)
# And for a more complex operation involving a call
julia> @btime map(x -> sin(x[1]), x);
1.504 ms (5039 allocations: 244.89 KiB)
julia> @btime map(x -> sin(x[1]), y);
1.517 ms (5039 allocations: 244.89 KiB)
julia> @btime map(x -> sin(x[1]), z);
306.159 μs (5039 allocations: 244.89 KiB)
So AFAICT the narrowest type still gives better performance, in particular when a function call is involved (I guess because of dynamic dispatch when the type isn't known statically). I'm surprised that the
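One way to see the dispatch difference (an inspection suggestion, assuming the y and z arrays from the session above):

julia> g(v, i) = sin(v[i][1])
g (generic function with 1 method)

julia> @code_warntype g(y, 1)   # v[i][1]::Any, so the call to sin must be resolved at runtime

julia> @code_warntype g(z, 1)   # v[i][1]::Union{Missing, Int64}, so the compiler can union-split the call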
The change that this comment and #25553 (comment) referred to was defining |
I suppose this makes sense for the sake of synchronizing `cat` and `map`.
Notice: we returned a
Solving that seems quite tricky though. |
OK. So should we merge this then? Regarding the fact that |
This also helps to re-synchronize `cat` and `map`. Closes #25924
Discussed on triage, and I think I agree with @vtjnash that |
Wait, this PR isn't about |
I'm confused here. I've been working on a package called KeyedTables because I'd been given to understand that pure tuple implementations of namedtuples would be better able to handle inference. I'd gotten to this point:

julia> z = keyed_tuple(x = [1, 2, missing], y = [missing, "a", "b"])
((.x, Union{Missing, Int64}[1, 2, missing]), (.y, Union{Missing, String}[missing, "a", "b"]))
julia> collect(rows(z))
3-element Array{Tuple{Tuple{Key{:x},Any},Tuple{Key{:y},Any}},1}:
((.x, 1), (.y, missing))
((.x, 2), (.y, "a"))
((.x, missing), (.y, "b"))

So I find myself in strong agreement with my personal downvote fairy @ararslan above. This is especially confusing because:

julia> Base.@default_eltype rows(z)
Tuple{Tuple{Key{:x},Union{Missing, Int64}},Tuple{Key{:y},Union{Missing, String}}} |
What would people think of something like this?

function typejoin(@nospecialize(a), @nospecialize(b); recur = typejoin)
    # same, but use recur instead of typejoin when recursing
end
promote_typejoin(@nospecialize(a), @nospecialize(b)) = typejoin(a, b, recur = promote_typejoin) |
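If I understand correctly, the point is that the Missing/Nothing special case would then apply recursively inside tuple parameters. A toy sketch of the keyword idea (hypothetical helper, not Base's actual typejoin code):

function tuplejoin_sketch(a::Type{<:Tuple}, b::Type{<:Tuple}; recur = typejoin)
    pa, pb = a.parameters, b.parameters
    length(pa) == length(pb) || return Tuple  # give up on length mismatch in this sketch
    return Tuple{ntuple(i -> recur(pa[i], pb[i]), length(pa))...}
end

# tuplejoin_sketch(Tuple{Int}, Tuple{Missing})                                # Tuple{Any}
# tuplejoin_sketch(Tuple{Int}, Tuple{Missing}; recur = Base.promote_typejoin) # Tuple{Union{Missing, Int64}}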
I think you want |
Um sure, I can try with pairs. Would that improve inference? |
Doesn't seem to really help:

julia> z = keyed_tuple(x = [1, 2, missing], y = [missing, "a", "b"])
(.x => Union{Missing, Int64}[1, 2, missing], .y => Union{Missing, String}[missing, "a", "b"])
julia> collect(rows(z))
3-element Array{Tuple{Pair{Keys.Key{:x},B} where B,Pair{Keys.Key{:y},B} where B},1}:
(.x => 1, .y => missing)
(.x => 2, .y => "a")
(.x => missing, .y => "b")

EDIT: Nope, pairs make things worse:

julia> Base.@default_eltype rows(z)
Tuple{Union{Pair{Keys.Key{:x},Missing}, Pair{Keys.Key{:x},Int64}},Union{Pair{Keys.Key{:y},Missing}, Pair{Keys.Key{:y},String}}} |
|
Currently it's returning a generator? |
Sure – I meant that it should generate objects of that specific type |
Oh yes, it should. That would definitely be nice, not currently happening in 0.7 tho. |
Sorry, don't mean to derail yet another thread, but here's my function for rows:

function rows(k)
keys = map(key, k)
Generator(
values -> map(tuple, keys, values),
zip(map(value, k)...)
)
end

or with the pairs version:

function rows(k)
keys = map(key, k)
Generator(
values -> map(Pair, keys, values),
zip(map(value, k)...)
)
end |
These are wrong: they drop the type parameters of the source. It's probably easier/clearer (and more efficient) to implement this as a simple view-wrapper(*), rather than eagerly copying the elements to a (named)tuple. But here's the eager version anyways:

function rows(k)
keys = map(key, k) # list of keys
values_t = map(value, k) # list of Arrays of values
Vs = map(eltype, values_t) # list of value Types
element(V, key, value) = Pair{typeof(key), V}(key, value) # element constructor
return (map(element, Vs, keys, values) for values in zip(values_t...))
end (* if the original table was able to indexed with |
Oooo, got it. So if I ensure each element is the same type then I won't have to worry about promote_typejoin at all. |
Was this PR about removing the following "discrepancy"?

julia> [1, missing]
2-element Vector{Union{Missing, Int64}}:
1
missing
julia> [(1,), (missing,)]
2-element Vector{Tuple{Any}}:
(1,)
(missing,)

I don't understand why that's the behavior, instead of producing a |
Yes that was the goal. |