-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match setindex!
, push!
, and append!
behavior of AbstractArray
Interface
#184
Conversation
but now missing definition of linear indexing that probably relied on an `Base.AbstractArray` fallback.
i.e. `Base.append!(s::StructVector, rows)`
Fixes remaining broken tests.
I probably need to think harder about this, but the original design of this package was very much "table" oriented, so it made sense that one could support s = StructArray(rand(ComplexF64, 10))
push!(s, (re = 1.0, im = 1.0) IMO, we either choose that we support My main concerning with the promotion is approach is that it would fail to promote fields. For example julia> struct A{T}
x::T
end
julia> s = A(1)
A{Int64}(1)
julia> convert(A{Float64}, s)
ERROR: MethodError: Cannot `convert` an object of type
A{Int64} to an object of type
A{Float64}
Closest candidates are:
convert(::Type{T}, ::T) where T at essentials.jl:205
A{T}(::Any) where T at REPL[10]:2
Stacktrace:
[1] top-level scope
@ REPL[12]:1 So if I have a |
That concern makes sense. Note that this is consistent with e.g. julia> struct A{T}
x::T
end
julia> push!([A{Float64}(1.1)], A{Int}(2))
ERROR: MethodError: Cannot `convert` an object of type
A{Int64} to an object of type
A{Float64}
And so concretely, if julia> d = Dict{Int, Complex{Float64}}()
Dict{Int64, ComplexF64}()
julia> d[3.0] = true
true
julia> d
Dict{Int64, ComplexF64} with 1 entry:
3 => 1.0+0.0im At the same time, it seems useful to support |
Gotcha. I would love to simply swap out an array-of-structs layout like In fact, perhaps my perspective is exactly opposite. This package gives a way to "wrap" the first argument of That would separate two things: 1. the memory layout, 2. the "behavior". The second point reminds me of https://github.com/JuliaObjects/ConstructionBase.jl and is kind of the essence of my ugly kludge here: |
Unrelated, but ConstructionsBase seems super useful for things that I do here. I should definitely check it out, thanks for mentioning it! Regarding the "convert or not" discussion, it's a valid point that one should be able to separate memory layout and behavior. We should also see how this plays with the widening machinery in The problem there is that if one collects an iterable of named tuples and the type of the values of the named tuple changes (for example because of missing data) the promoted type becomes too general, so that is a scenario where StructArrays needs to differ from base. For example julia> Base.promote_typejoin(typeof((a=1,)), typeof((a=missing,)))
NamedTuple{(:a,), T} where T<:Tuple whereas the "field by field" |
Ah, hm. Here's what I dug up: JuliaLang/julia#25924 [EDIT, not sure, actually] It seems like The behavior is julia> function commutator(f, A, B)
Tuple{f(A, B)}, f(Tuple{A}, Tuple{B})
end
commutator (generic function with 2 methods)
julia> commutator(Base.promote_typejoin, Int64, Missing)
(Tuple{Union{Missing, Int64}}, Tuple{Any})
julia> commutator(Base.promote_typejoin, Int64, String)
(Tuple{Any}, Tuple{Any}) |
I was trying to identify where the behavior diverges with e.g. |
Sure, this is probably the simplest example: julia> using StructArrays
julia> iter = (i for i in ((a=1,), (a=missing,)))
Base.Generator{Tuple{NamedTuple{(:a,), Tuple{Int64}}, NamedTuple{(:a,), Tuple{Missing}}}, typeof(identity)}(identity, ((a = 1,), (a = missing,)))
julia> StructArray(iter)
2-element StructArray(::Vector{Union{Missing, Int64}}) with eltype NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}:
NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}((1,))
NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}((missing,))
julia> collect(iter)
2-element Vector{NamedTuple{(:a,), T} where T<:Tuple}:
(a = 1,)
(a = missing,) |
I think that is just some julia> StructArray(i for i in 1:5)
0-element StructArray() with eltype Int64 with indices 1:0
julia> StructArray((;i) for i in 1:5)
5-element StructArray(::Vector{Int64}) with eltype NamedTuple{(:i,), Tuple{Int64}}:
(i = 1,)
(i = 2,)
(i = 3,)
(i = 4,)
(i = 5,)
julia> Vector(i for i in 1:5)
ERROR: MethodError: no method matching (Vector{T} where T)(::Base.Generator{UnitRange{Int64}, typeof(identity)})
[...]
julia> Array((;i) for i in 1:5)
ERROR: MethodError: no method matching Array(::Base.Generator{UnitRange{Int64}, var"#5#6"})
[...] |
I'm slowly convincing myself that yes, we should convert on I am a bit perplexed about the I'm becoming convinced that in the end it's only important that one can |
I agree that we should not allocate a new array.
A few tests did break, but not many. I can take a closer look. I tend to agree with embracing the breaking change. The I am not sure if this is directly applicable, but the "kind" of In any case, I would remove |
Fixes #131 Closes #184 Co-authored-by: Gustavo Goretkin <[email protected]>
Glad to hear that! Coming to this package now, rather than having followed it from its inception, my expectations are that it should be an AbstractArray with SOA layout, and its heritage as a "Tables" implementation is a surprise. In any case of conflict between the two, I would rather see it act like an AbstractArray. |
See #182
There are a few TODOs with respect to efficiency, but otherwise this does not break any tests in this test suite, although it does change the behavior and so this could be a breaking change.
If someone defines for their own type a method
Base.convert(::Type{MyType}, ::NamedTuple)
,StructArrays.jl
will disregard it, and it will in my opinion deviate from theAbstractArray
.One possible way to rectify this is to introduce a wrapper
StructArrays.Structural
. Anyone who add methods to that would be committing type piracy, and so this could be a sentinel value for indicating "do the structural (not nominal) thing".This means a test like
would be written (now) as
and it would perhaps be worthwhile to define methods such that the following worked too: