Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple broadcast expressions expansion #22376

Closed
joshbode opened this issue Jun 15, 2017 · 6 comments
Closed

Multiple broadcast expressions expansion #22376

joshbode opened this issue Jun 15, 2017 · 6 comments
Labels
broadcast Applying a function over a collection compiler:lowering Syntax lowering (compiler front end, 2nd stage) performance Must go faster

Comments

@joshbode
Copy link
Contributor

The code expansion for broadcast expressions such as f.(g.(x)) looks to be incorrect (involving :composite_type, for example). For some vector types, this is also leading to incorrect return types (see JuliaStats/DataArrays.jl#263).

0.6.0-rc3.0> expand(:(f.(g.(x))))
:($(Expr(:thunk, CodeInfo(:(begin
        $(Expr(:thunk, CodeInfo(:(begin
        global ##13#14
        const ##13#14
        $(Expr(:composite_type, Symbol("##13#14"), :((Core.svec)()), :((Core.svec)()), :(Core.Function), :((Core.svec)()), false, 0))
        return
    end))))
        $(Expr(:method, false, :((Core.svec)((Core.svec)(##13#14, Core.Any), (Core.svec)())), CodeInfo(:(begin
        #temp#@_3 = g(#temp#@_2)
        return f(#temp#@_3)
    end)), false))
        #13 = $(Expr(:new, Symbol("##13#14")))
        SSAValue(0) = #13
        return (Base.broadcast)(SSAValue(0), x)
    end)))))

versus the more intuitive expansion when the inner expression is put inside a begin...end block:

0.6.0-rc3.0> expand(:(f.(begin g.(x) end)))
:($(Expr(:thunk, CodeInfo(:(begin
        # meta: location REPL[33] # REPL[33], line 1:
        # meta: pop location
        SSAValue(0) = (Base.broadcast)(g, x)
        return (Base.broadcast)(f, SSAValue(0))
    end)))))

It is also very slow, even for very simple expressions:

0.6.0-rc3.0> f(x) = x;
0.6.0-rc3.0> g(x) = x;
0.6.0-rc3.0> x = [1,2,3];

(each ran a few times to burn-in)

0.6.0-rc3.0> @time f.(g.(x));
0.063864 seconds (4.26 k allocations: 237.958 KiB)

compared to:

0.6.0-rc3.0> @time f.(begin g.(x) end);
0.000213 seconds (58 allocations: 2.563 KiB)
@joshbode joshbode changed the title Multiple broadcast expressions. Multiple broadcast expressions expansion/performance Jun 15, 2017
@ararslan ararslan added compiler:lowering Syntax lowering (compiler front end, 2nd stage) performance Must go faster broadcast Applying a function over a collection labels Jun 15, 2017
@martinholters
Copy link
Member

In short: Don't benchmark in global scope.

f.(g.(x)) creates an anonymous function x -> f(g(x)) (which is then passed to broadcast). Run in the REPL, it will indeed create and compile this function every time. If you place it inside a function, only once during compilation:

julia> @time f.(g.(x)); # warm up
  0.876215 seconds (64.20 k allocations: 3.565 MiB)

julia> @time f.(g.(x));
  0.054706 seconds (4.38 k allocations: 244.556 KiB)

julia> foo(x) = f.(g.(x));

julia> @time foo(x); # warm up
  0.050510 seconds (14.72 k allocations: 681.195 KiB)

julia> @time foo(x);
  0.000005 seconds (5 allocations: 272 bytes)

This should then be faster than bar(x) = f.(begin g.(x) end) (and allocate less), too. Please reopen if this is not the case for you.

@joshbode
Copy link
Contributor Author

Thanks - I wasn't aware of that.
That covers the performance, however the following is still an issue:

0.6.0-rc3.0> using DataArrays

0.6.0-rc3.0> x = DataArray(1:3);

0.6.0-rc3.0> foo(x) = .!isna.(x)
foo (generic function with 1 method)

0.6.0-rc3.0> foo(x)
3-element DataArrays.DataArray{Bool,1}:
 true
 true
 true

0.6.0-rc3.0> bar(x) = (y = isna.(x); .!y)
bar (generic function with 1 method)

0.6.0-rc3.0> bar(x)
3-element BitArray{1}:
 true
 true
 true

i.e. the return of foo(x) should be a BitArray, like from bar(x) (and indeed from isna.(x)).

@joshbode
Copy link
Contributor Author

joshbode commented Jun 15, 2017

Or, a slightly contrived example with types from Base only, which has different return values:

0.6.0-rc3.0> f(x) = x != 2
f (generic function with 1 method)

0.6.0-rc3.0> x = SparseVector(10, [1,2,5,9,10], [1,2,3,4,5])
10-element SparseVector{Int64,Int64} with 5 stored entries:
  [1 ]  =  1
  [2 ]  =  2
  [5 ]  =  3
  [9 ]  =  4
  [10]  =  5

0.6.0-rc3.0> .!f.(x)
10-element SparseVector{Bool,Int64} with 1 stored entry:
  [2 ]  =  true

0.6.0-rc3.0> .!begin f.(x) end
10-element SparseVector{Bool,Int64} with 10 stored entries:
  [1 ]  =  false
  [2 ]  =  true
  [3 ]  =  false
  [4 ]  =  false
  [5 ]  =  false
  [6 ]  =  false
  [7 ]  =  false
  [8 ]  =  false
  [9 ]  =  false
  [10]  =  false

@joshbode
Copy link
Contributor Author

Please reopen, @martinholters :)

@joshbode joshbode changed the title Multiple broadcast expressions expansion/performance Multiple broadcast expressions expansion Jun 15, 2017
@martinholters
Copy link
Member

DataArrays specializes broadcast for isna which is of questionable merit:

julia> x = DataArray(1:3);

julia> typeof(isna.(x))
BitArray{1}

julia> _isna(x) = isna(x)
_isna (generic function with 1 method)

julia> typeof(_isna.(x))
DataArrays.DataArray{Bool,1}

I'd say that's an issue of DataArrays, not Base.

And the two results for SparseVector are in fact == to each other.

@joshbode
Copy link
Contributor Author

Ahh - so they are. Interesting that the representation differs, though.
Thank you for your help on this - I'll follow up in DataArrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection compiler:lowering Syntax lowering (compiler front end, 2nd stage) performance Must go faster
Projects
None yet
Development

No branches or pull requests

3 participants