Use promote_typejoin for Tuple and NamedTuple promotion #25924

nalimilan · 2018-02-07T11:27:29Z

Even if we do not want to call promote_type on the entries when promoting tuples (#25553 (comment)), we still have to use promote_typejoin so that e.g. promote_type(Tuple{Int}, Tuple{Missing}) returns Tuple{Union{Int,Missing}}. In particular, this ensures array concatenation uses Union{T,Missing/Nothing} instead of Any, which is consistent with map(identity, x).

Even if we do not want to call promote_type() on the entries when promoting tuples, we still have to use promote_typejoin() so that e.g. promote_type(Tuple{Int}, Tuple{Missing}) returns Tuple{Union{Int,Missing}}. In particular, this ensures array concatenation uses Union{T,Missing/Nothing} instead of Any, which is consistent with map(identity, x).

vtjnash · 2018-02-07T16:06:09Z

returns Tuple{Union{Int,Missing}}

It's not clear to me why we would want it to be this type

nalimilan · 2018-02-07T17:00:41Z

It's not clear to me why we would want it to be this type

Because that allows using an efficient storage and fastercode down the line? Storing [(missing, 1), (1, missing)] as a Vector{Tuple{Any,Any}} is really bad in that regard. That's really the same reasoning as the one which prompted us to make [1, missing] return a Vector{Union{Int,Missing}} rather than a Vector{Any}.

This is particularly problematic when using NamedTuple to represent data frame rows. See also #25925.

vtjnash · 2018-02-07T17:43:05Z

Because that allows using an efficient storage and fastercode down the line?

That part is what is not clear to me, as being likely / soon :)

davidanthoff · 2018-02-08T05:45:55Z

Wouldn't the minimum requirement for an efficient in-memory representation of Tuple{Union{Int,Missing}} be that it is a concrete type? And wouldn't that require that tuples are not covariant? And that would probably be a breaking change (assuming it would be desirable in the first place, which I'm not convinced off), i.e. if it isn't done for 1.0 it would have to wait at least for 2.0?

nalimilan · 2018-02-08T08:51:17Z

That part is what is not clear to me, as being likely / soon :)

What do you mean? Tuple{Union{T,Missing}}} is already much more efficient than Tuple{Any}. For example:

# Creates a Vector{Tuple{Union{Missing, Int64}} thanks to my recent changes to map
julia> x = map(_ -> (rand(Bool) ? missing : 1,), 1:10_000);

# This is the kind of vector you get with the array literal syntax
julia> y = convert(Array{Tuple{Any}}, x);

julia> @btime map(identity, x);
  48.031 μs (13 allocations: 78.66 KiB)

julia> @btime map(identity, y);
  3.175 ms (13 allocations: 78.66 KiB)

Wouldn't the minimum requirement for an efficient in-memory representation of Tuple{Union{Int,Missing}} be that it is a concrete type? And wouldn't that require that tuples are not covariant? And that would probably be a breaking change (assuming it would be desirable in the first place, which I'm not convinced off), i.e. if it isn't done for 1.0 it would have to wait at least for 2.0?

I don't know, but as shown above the performance gain is so large that whether or not the storage is more efficient using a more precise type sounds worth it already.

Anyway, even disregarding performance, without this PR we have a weird exception for tuples, which is that array literal syntax gives a less precise element type than map, while for other types it's the contrary (due to promotion, e.g. [1, 2.0] vs map(identity, Any[1, 2.0])). I think the burden of the proof should be on people who advocate for keeping this inconsistency.

davidanthoff · 2018-02-08T17:30:48Z

I'm not advocating against this PR! I'm just surprised that there is a performance improvement by going from Tuple{Any} to Tuple{Union{T,Missing}}. Given that neither is a a concrete type I would have thought that perf is similar between them. But clearly, that is not the case :)

I wonder how this looks for tuples with multiple elements? Maybe for a tuple with one element the compiler in some clever way just "removes" (whatever that might mean) the tuple wrapper and treats it as the element type, in which case all the small union optimizations might kick in, and that is why there is this perf improvement?

nalimilan · 2018-02-08T18:00:56Z

I'm not advocating against this PR! I'm just surprised that there is a performance improvement by going from Tuple{Any} to Tuple{Union{T,Missing}}. Given that neither is a a concrete type I would have thought that perf is similar between them. But clearly, that is not the case :)

Looking at the generated code, it appears that knowing the type of the elements allows the compiler to use small-Union optimizations, which makes a big difference.

I wonder how this looks for tuples with multiple elements? Maybe for a tuple with one element the compiler in some clever way just "removes" (whatever that might mean) the tuple wrapper and treats it as the element type, in which case all the small union optimizations might kick in, and that is why there is this perf improvement?

I've just tested with a 4-tuple, and the performance difference is still there, even without heterogeneous types.

vtjnash · 2018-02-08T18:01:43Z

Tuple{Union{T,Missing}}} is already much more efficient than Tuple{Any}

Isn't that just a side-effect of map not giving Tuple{Any} (or Any) for this case, so it's expending extra work trying to narrow the type for a case that we don't optimize? (I had thought we were avoiding changing this based on the comment at #25553 (comment)). I get the best performance on that example above by removing that promote rule entirely (followed by going back to Tuple{Any}):

Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{T}}) where {T} = T
Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{S}}) where {T, S} = Any

nalimilan · 2018-02-08T18:27:36Z

Interesting. I get this:

julia> Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{T}}) where {T} = T

julia> Base.promote_typejoin(::Type{Tuple{T}}, ::Type{Tuple{S}}) where {T, S} = Any

julia> x = map(_ -> (rand(Bool) ? missing : 1,), 1:10_000); # ::Vector{Any}

julia> y = convert(Array{Tuple{Any}}, x);

julia> z = convert(Array{Tuple{Union{Int,Missing}}}, x);

julia> @btime map(identity, x);
  244.888 μs (5041 allocations: 235.11 KiB)

julia> @btime map(identity, y);
  40.441 μs (9 allocations: 156.48 KiB)

julia> @btime map(identity, z);
  34.031 μs (9 allocations: 156.48 KiB)

# And for a more complex operation involving a call
julia> @btime map(x -> sin(x[1]), x);
  1.504 ms (5039 allocations: 244.89 KiB)

julia> @btime map(x -> sin(x[1]), y);
  1.517 ms (5039 allocations: 244.89 KiB)

julia> @btime map(x -> sin(x[1]), z);
  306.159 μs (5039 allocations: 244.89 KiB)

So AFAICT the narrowest type still gives better performance, in particular when a function call is involved (I guess because of dynamic dispatch when the type isn't known statically).

I'm surprised that the promote_typejoin call reduces performance so much for the Tuple{Any} case in the identity benchmark from my previous post, though. AFAIK it should only be called once (the first time encountering a missing or a valid value). But I guess for short operations like that any inference operation has a significant cost.

(I had thought we were avoiding changing this based on the comment at #25553 (comment))

The change that this comment and #25553 (comment) referred to was defining promote_type on tuples to call promote_type on the element types, which was more radical. Here I just propose calling promote_typejoin on the element types, which only affects Union{T, Missing/Nothing}.

JeffBezanson · 2018-02-17T18:35:32Z

I suppose this makes sense for the sake of synchronizing cat with map, but we're not quite there yet. The remaining problem is that, for containers of containers, we don't generally want to convert all the inner containers to have abstract element types. For example, we do this:

julia> [[1], [:a]]
2-element Array{Array{T,1} where T,1}:
 [1]       
 Symbol[:a]

Notice: we returned a Vector{Vector}, not a Vector{Vector{Any}}. If we returned a Vector{Vector{Any}}, that would require copying both inner arrays, and boxing every element of the [1] array. If it had a million elements, that would be bad. Unfortunately, we do that for NamedTuples:

julia> map(x->(rand(Bool) ? (a=1,) : (a=:a,)), 1:10)
10-element Array{NamedTuple{(:a,),Tuple{Any}},1}:
 NamedTuple{(:a,),Tuple{Any}}((1,))

Solving that seems quite tricky though.

nalimilan · 2018-02-18T12:27:56Z

OK. So should we merge this then?

Regarding the fact that Base.promote_typejoin(typeof((a=1,)), typeof((a=:a,))) returns NamedTuple{(:a,),Tuple{Any}} (already on master), AFAICT it's relatively logical if we want promote_typejoin to return a named tuple type with Union{T, Missing} fields where it makes sense. Is that really a problem? Contrary to arrays, named tuples are immutable, so does changing their type require any additional computations? If that's a problem, I guess we could use a custom function similar to promote_typejoin which would leave unspecified the type of fields which would end up not being concrete or Union{T, Missing/Nothing} with T concrete.

This also helps to re-synchronize `cat` and `map` Closes #25924

JeffBezanson · 2018-02-22T21:00:40Z

Discussed on triage, and I think I agree with @vtjnash that map should just collect the exact values you give it and not meddle with their types. The union type issue is better solved upstream, by making named tuples with the element types you want in the first place.

nalimilan · 2018-02-22T21:06:07Z

Wait, this PR isn't about map at all, it's about promotion and cat. As I said, I don't care a lot about this PR. OTOH #26109 is about map, and the things it reverts seem much more problematic to me.

This also helps to re-synchronize `cat` and `map` Closes #25924

bramtayl · 2018-03-17T15:55:06Z

I'm confused here. I've been working on a package called KeyedTables because I'd been given to understand that pure tuple implementations of namedtuples would be better able to handle inference. I'd gotten to this point:

julia> julia> z = keyed_tuple(x = [1, 2, missing], y = [missing, "a", "b"])
((.x, Union{Missing, Int64}[1, 2, missing]), (.y, Union{Missing, String}[missing, "a", "b"]))

julia> collect(rows(z))
3-element Array{Tuple{Tuple{Key{:x},Any},Tuple{Key{:y},Any}},1}:
 ((.x, 1), (.y, missing))
 ((.x, 2), (.y, "a"))
 ((.x, missing), (.y, "b"))

So I find myself in strong agreement with my personal downvote fairy @ararslan above.

This is especially confusing because

julia> Base.@default_eltype rows(z)
Tuple{Tuple{Key{:x},Union{Missing, Int64}},Tuple{Key{:y},Union{Missing, String}}}

bramtayl · 2018-03-17T17:48:36Z

What would people think of something like this?

function typejoin(@nospecialize(a), @nospecialize(b); recur = typejoin)
    # same but use recur instead of typejoin when recurring
end


promote_typejoin(@nospecialize(a), @nospecialize(b)) = typejoin(a, b, recur = promote_typejoin)

vtjnash · 2018-03-17T22:25:08Z

Tuple{Tuple{Key{:x},Union{Missing, Int64}},Tuple{Key{:y},Union{Missing, String}}}

I think you want Tuple{Pair{Key{:x}, Union{Missing, Int64}}, Pair{Key{:y}, Union{Missing, String}}} here. Is that possible?

bramtayl · 2018-03-17T22:26:29Z

Um sure I can try with pairs would that improve inference?

bramtayl · 2018-03-17T22:37:38Z

Doesn't seem to really help:

julia>  z = keyed_tuple(x = [1, 2, missing], y = [missing, "a", "b"])
(.x => Union{Missing, Int64}[1, 2, missing], .y => Union{Missing, String}[missing, "a", "b"])

julia> collect(rows(z))
3-element Array{Tuple{Pair{Keys.Key{:x},B} where B,Pair{Keys.Key{:y},B} where B},1}:
 (.x => 1, .y => missing)
 (.x => 2, .y => "a")
 (.x => missing, .y => "b")

EDIT: Nope, pairs make things worse:

julia> Base.@default_eltype rows(z)
Tuple{Union{Pair{Keys.Key{:x},Missing}, Pair{Keys.Key{:x},Int64}},Union{Pair{Keys.Key{:y},Missing}, Pair{Keys.Key{:y},String}}}

vtjnash · 2018-03-17T23:04:39Z

rows(z) should also be returning an object of that type (Pair{Key{:x}, Union{T, Missing}}(key, val))

bramtayl · 2018-03-18T01:52:56Z

Currently it's returning a generator?

vtjnash · 2018-03-18T02:21:27Z

Sure – I meant that it should generate objects of that specific type

bramtayl · 2018-03-18T02:23:33Z

Oh yes, it should. That would definitely be nice, not currently happening in 0.7 tho.

bramtayl · 2018-03-18T02:31:23Z

Sorry don't mean to derail yet another thread but here's my function for rows

function rows(k)
    keys = map(key, k)
    Generator(
        values -> map(tuple, keys, values),
        zip(map(value, k)...)
    )
end

where key is just x -> x[1] and value is x -> x[2]

or with the pairs version

function rows(k)
    keys = map(key, k)
    Generator(
        values -> map(Pair, keys, values),
        zip(map(value, k)...)
    )
end

vtjnash · 2018-03-18T03:47:31Z

    values -> map(Pair, keys, values),

These are wrong: they drop the type parameters of the source. It's probably easier/clear (and more efficient) to implement this as a simple view-wrapper(*), rather than eagerly copying the elements to a (named)tuple. But here's the eager version anyways:

function rows(k)
    keys = map(key, k) # list of keys
    values_t = map(value, k) # list of Arrays of values
    Vs = map(eltype, values_t) # list of value Types
    element(V, key, value) = Pair{typeof(key), V}(key, value) # element constructor
    return (map(element, Vs, keys, values) for values in zip(values_t...))
end

(* if the original table was able to indexed with [row, :col], then this is perhaps just Transpose?)

bramtayl · 2018-03-18T12:26:24Z

Oooo got it so if I ensure each element is the same type then I won't have to worry about promote_typejoin at all

goretkin · 2021-04-15T17:51:25Z

Was this PR about removing the following "discrepancy"?

julia> [1, missing]
2-element Vector{Union{Missing, Int64}}:
 1
  missing

julia> [(1,), (missing,)]
2-element Vector{Tuple{Any}}:
 (1,)
 (missing,)

I don't understand why that's the behavior, instead of producing a Vector{Tuple{Union{Missing, Int64}}}.

nalimilan · 2021-04-15T18:20:08Z

Yes that was the goal.

nalimilan added the missing data Base.missing and related functionality label Feb 7, 2018

nalimilan requested a review from JeffBezanson February 7, 2018 11:27

chethega mentioned this pull request Feb 9, 2018

Regression with arrays of singletons #25983

Closed

Add fast path for NTuple{N,Any} promotion

7aada26

vtjnash added a commit that referenced this pull request Feb 19, 2018

ensure promotion rules don't alter eltype values

4a20236

This also helps to re-synchronize `cat` and `map` Closes #25924

vtjnash mentioned this pull request Feb 19, 2018

ensure promotion rules don't alter eltype values #26109

Merged

nalimilan added the triage This should be discussed on a triage call label Feb 21, 2018

vtjnash closed this Feb 22, 2018

JeffBezanson removed the triage This should be discussed on a triage call label Feb 22, 2018

vtjnash added a commit that referenced this pull request Feb 27, 2018

ensure promotion rules don't alter eltype values (#26109)

32e4610

This also helps to re-synchronize `cat` and `map` Closes #25924

bramtayl mentioned this pull request Mar 18, 2018

Make promote_typejoin recursive #26501

Closed

nalimilan deleted the nl/typejoin_tuple branch December 1, 2018 17:02

nalimilan mentioned this pull request Feb 15, 2019

type promotion of missing inside tuples #31077

Open

goretkin mentioned this pull request Apr 15, 2021

Match setindex!, push!, and append! behavior of AbstractArray Interface JuliaArrays/StructArrays.jl#184

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use promote_typejoin for Tuple and NamedTuple promotion #25924

Use promote_typejoin for Tuple and NamedTuple promotion #25924

nalimilan commented Feb 7, 2018

vtjnash commented Feb 7, 2018

nalimilan commented Feb 7, 2018

vtjnash commented Feb 7, 2018

davidanthoff commented Feb 8, 2018

nalimilan commented Feb 8, 2018

davidanthoff commented Feb 8, 2018

nalimilan commented Feb 8, 2018

vtjnash commented Feb 8, 2018

nalimilan commented Feb 8, 2018

JeffBezanson commented Feb 17, 2018

nalimilan commented Feb 18, 2018

JeffBezanson commented Feb 22, 2018

nalimilan commented Feb 22, 2018

bramtayl commented Mar 17, 2018 •

edited

Loading

bramtayl commented Mar 17, 2018 •

edited

Loading

vtjnash commented Mar 17, 2018

bramtayl commented Mar 17, 2018

bramtayl commented Mar 17, 2018 •

edited

Loading

vtjnash commented Mar 17, 2018

bramtayl commented Mar 18, 2018

vtjnash commented Mar 18, 2018

bramtayl commented Mar 18, 2018

bramtayl commented Mar 18, 2018 •

edited

Loading

vtjnash commented Mar 18, 2018

bramtayl commented Mar 18, 2018

goretkin commented Apr 15, 2021

nalimilan commented Apr 15, 2021

Use promote_typejoin for Tuple and NamedTuple promotion #25924

Use promote_typejoin for Tuple and NamedTuple promotion #25924

Conversation

nalimilan commented Feb 7, 2018

vtjnash commented Feb 7, 2018

nalimilan commented Feb 7, 2018

vtjnash commented Feb 7, 2018

davidanthoff commented Feb 8, 2018

nalimilan commented Feb 8, 2018

davidanthoff commented Feb 8, 2018

nalimilan commented Feb 8, 2018

vtjnash commented Feb 8, 2018

nalimilan commented Feb 8, 2018

JeffBezanson commented Feb 17, 2018

nalimilan commented Feb 18, 2018

JeffBezanson commented Feb 22, 2018

nalimilan commented Feb 22, 2018

bramtayl commented Mar 17, 2018 • edited Loading

bramtayl commented Mar 17, 2018 • edited Loading

vtjnash commented Mar 17, 2018

bramtayl commented Mar 17, 2018

bramtayl commented Mar 17, 2018 • edited Loading

vtjnash commented Mar 17, 2018

bramtayl commented Mar 18, 2018

vtjnash commented Mar 18, 2018

bramtayl commented Mar 18, 2018

bramtayl commented Mar 18, 2018 • edited Loading

vtjnash commented Mar 18, 2018

bramtayl commented Mar 18, 2018

goretkin commented Apr 15, 2021

nalimilan commented Apr 15, 2021

bramtayl commented Mar 17, 2018 •

edited

Loading

bramtayl commented Mar 17, 2018 •

edited

Loading

bramtayl commented Mar 17, 2018 •

edited

Loading

bramtayl commented Mar 18, 2018 •

edited

Loading