-
See, basically, lines 344 to 351 in c34dea5.
-
Notice that views can be nested:
julia> a = collect(1:10);
julia> b = @view a[1:5];
julia> c = @view b[1:3];
julia> c
3-element view(::Vector{Int64}, 1:3) with eltype Int64:
1
2
3
julia> reverse!(a);
julia> c
3-element view(::Vector{Int64}, 1:3) with eltype Int64:
10
9
8
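For contrast, here is a minimal sketch using only Base Julia of how a range `getindex` on an Array differs from a view. The variable names `copied` and `viewed` are illustrative.

```julia
a = collect(1:10)

copied = a[1:3]        # getindex with a range allocates a new Vector
viewed = @view a[1:3]  # a view shares memory with `a`

reverse!(a)            # mutate the parent in place

copied  # still [1, 2, 3], because it is an independent copy
viewed  # now [10, 9, 8], because it reads through to `a`
```

The copy is a snapshot, while the view keeps tracking the parent array.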
-
The situation with Array is different because it has a setindex! method and is itself an Array. The proposal is that getindex on a LazyBranch returns a LazyBranch, just as getindex on a Tuple returns a Tuple and getindex on an Array returns an Array. getindex would then allocate a new LazyBranch, and the LazyBranch struct would have two extra fields to keep track of the offset and the maximum length.
Philippe.
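A rough sketch of that idea, under stated assumptions: `LazySlice` below is a hypothetical stand-in and not an UnROOT type, the field names `offset` and `len` are made up for illustration, and an ordinary vector plays the role of the file-backed branch.

```julia
# Hypothetical wrapper: NOT part of UnROOT, only an illustration of the proposal.
struct LazySlice{T, S<:AbstractVector{T}} <: AbstractVector{T}
    source::S    # stands in for the file-backed branch
    offset::Int  # number of leading entries skipped
    len::Int     # number of entries visible through this window
end

# Window covering the whole source.
LazySlice(source::AbstractVector) = LazySlice(source, 0, length(source))

Base.size(s::LazySlice) = (s.len,)
Base.IndexStyle(::Type{<:LazySlice}) = IndexLinear()

# Scalar indexing reads a single entry from the source.
function Base.getindex(s::LazySlice, i::Int)
    @boundscheck checkbounds(s, i)
    return s.source[s.offset + i]
end

# Range indexing only adjusts the window; no entries are read or copied.
function Base.getindex(s::LazySlice, r::AbstractUnitRange{<:Integer})
    @boundscheck checkbounds(s, r)
    return LazySlice(s.source, s.offset + first(r) - 1, length(r))
end

data = collect(101:110)  # pretend this lives in a file
s = LazySlice(data)
w = s[3:6]               # still a LazySlice, window over entries 3 to 6
n = w[2:3]               # nesting composes the offsets
collect(n)               # explicit materialization: [104, 105]
```

In this sketch materialization happens only through `collect`, which is the behavior the proposal asks for. Whether this fits the AbstractArray conventions discussed in this thread is exactly the point under debate.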
-
The guideline on "getindex() shouldn't return lazy view" is for all If We have argued If you want to keep the property
-
He will use collect(). The purpose of collect is to return an Array from a collection or an iterable, so it matches the intention exactly. There is nothing wrong with view; it is just that the getindex behavior is confusing: it is counterintuitive that adding a 1:n to limit the number of processed events in a [...] I don't see any contradiction between my proposal and the guidelines: you wrote that it must allocate a copy, and it allocates a copy of the LazyBranch, which is your "array" (in the sense of subtyping AbstractArray). The LazyBranch contains references to data stored in a file; "copy" just means copying the reference but not the referred content, as is the case for the elements of an array that are references. The important point is the rationale behind the guideline. Maybe there is some reason I missed that requires a copy of the data.
-
If [...]
Arrow.jl
julia> df = Arrow.Table("/tmp/a.feather");
julia> typeof(df.x) # this is memory-mapped, i.e. lazy
Arrow.Primitive{Int64, Vector{Int64}}
julia> typeof(df.x[1:3])
Vector{Int64} (alias for Array{Int64, 1})
MappedArrays
julia> using MappedArrays
julia> a = mappedarray(sqrt, [1,3,4]);
julia> typeof(a) # lazy because `sqrt` only evaluates upon indexing
ReadonlyMappedArray{Float64, 1, Vector{Int64}, typeof(sqrt)}
julia> typeof(a[1:2])
Vector{Float64} (alias for Array{Float64, 1})
julia> t = LazyTree(UnROOT.samplefile("NanoAODv5_sample.root"), "Events", r"Muon_(pt|eta|phi)$");
julia> collect(t[1:2])
2-element Vector{UnROOT.LazyEvent}:
UnROOT.LazyEvent at index 1 with 3 columns:
(Muon_phi = Float32[], Muon_pt = Float32[], Muon_eta = Float32[])
UnROOT.LazyEvent at index 2 with 3 columns:
(Muon_phi = Float32[-0.30541992, 0.98999023], Muon_pt = Float32[19.93826, 15.303187], Muon_eta = Float32[0.53015137, 0.2286377])
julia> t[1:2] # materialized but still is a table
Row │ Muon_phi Muon_pt Muon_eta
│ Vector{Float32} Vector{Float32} Vector{Float32}
─────┼───────────────────────────────────────────────────
1 │ [] [] []
2 │ [-0.305, 0.99] [19.9, 15.3] [0.53, 0.229]
we cannot make [...]
-
Let's conclude that it is natural for Julia experts, but not for a particle physicist. I still think that, given the size of the files we are dealing with, copying the file content into memory should be avoided unless explicitly requested by the user.
-
I highly appreciate the intention, and we definitely try our best to design everything for the target users whenever possible. But at some point users gotta know what they are doing; we can't make things surprising for everyone except the most naive user who just tried whatever came to mind. I just feel that if we tailor designs around "day 1 users", it's not good for the long term. And I believe that, given HEP physicists' literacy in software, it would be good to be consistent with the ecosystem we're in. I think we have room to improve for sure, so let's keep the discussion open. I am also open to other proposals if any come to mind in the future.
-
An interesting example:
julia> r = 1:5
1:5
julia> isa(r, AbstractVector)
true
julia> r[2:3]
2:3
julia> r[:]
1:5
julia> r[[1,2,3,4,5]]
5-element Vector{Int64}:
1
2
3
4
5
julia> r[iseven.(r)]
2-element Vector{Int64}:
2
4
-
In my first experience with LazyTree, I was confused by how readily it materializes and triggers full-branch reads.
Is there any motivation for making lazy_branch[n:m] and lazy_tree[n:m] "eager", or is it to simplify the implementation?
I would naturally expect them to limit the range (and offset the indexing) without breaking the laziness, while the collect function can be used for materialization.
Philippe.