
specialize loops inside functions #5654

Open
johnmyleswhite opened this issue Feb 3, 2014 · 13 comments
Labels
performance Must go faster

Comments

johnmyleswhite (Member) commented Feb 3, 2014

While trying to optimize the performance of operations on DataFrames, I've found that a recurring pattern underlies many of the performance bottlenecks people encounter: functions that take DataFrames as input don't have enough type information to specialize their implementations for the specific types of the columns stored in a DataFrame. This means that a two-part function that (a) selects a subset of columns to compute on and then (b) performs an intensive computation on those columns can't take advantage of the type information available at the end of part (a) to improve the performance of part (b).

It's possible to articulate this pattern without any reference to DataFrames: you get exactly the same effect if you pull whole columns out of a Vector{Any} and then compute on them. The only workaround I've found is to use an inner function defined inside the two-part function. This inner function defers type specialization until a point at which it can be carried out effectively.

The three examples below highlight this problem:

function dot1(vals::Vector{Any})
    a, b = vals[1], vals[2]
    x = 0.0
    for i in 1:length(a)
        x += a[i] * b[i]
    end
    return x
end

function dot2(a::Vector, b::Vector)
    x = 0.0
    for i in 1:length(a)
        x += a[i] * b[i]
    end
    return x
end

function dot3(vals::Vector{Any})
    a, b = vals[1], vals[2]
    function inner(a, b)
        x = 0.0
        for i in 1:length(a)
            x += a[i] * b[i]
        end
        return x
    end
    return inner(a, b)
end

If you run these functions with the following data, you get very substantial performance differences between the initial naive implementation and the other two:

n = 1_000_000
a, b = randn(n), randn(n)
vals = {a, b}
dot1(vals)
dot2(a, b)
dot3(vals)

@elapsed dot1(vals)
@elapsed dot2(a, b)
@elapsed dot3(vals)

On my machine, I get relative timings of the following orders of magnitude:

  • dot1: 230x slower
  • dot2: 1x
  • dot3: 1x

It would be a huge gain to future users of DataFrames if there were a way to make the compiler recognize that a little bit of second-stage type inference could give major performance gains.
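[Editorial note: the inner-function trick in dot3 is what the Julia manual later came to call a "function barrier". As a self-contained sketch (the names `kernel` and `dot_barrier` are illustrative, not from the thread): the outer function does the untyped extraction, and the call into the kernel is a single dynamic dispatch that specializes the loop on the runtime element types.]

```julia
# Function-barrier sketch: `kernel` is compiled separately for the
# concrete types of `a` and `b`, even though `vals` is a Vector{Any}.
function kernel(a, b)
    x = 0.0
    for i in 1:length(a)
        x += a[i] * b[i]
    end
    return x
end

function dot_barrier(vals::Vector{Any})
    a, b = vals[1], vals[2]  # `a` and `b` are inferred as Any here
    return kernel(a, b)      # one dynamic dispatch; the loop itself is specialized
end

dot_barrier(Any[[1.0, 2.0], [3.0, 4.0]])  # 1.0*3.0 + 2.0*4.0 = 11.0
```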

cdsousa (Contributor) commented Feb 3, 2014

The dot3 warm-up call must be dot3(vals), not dot3(a, b).
On my system, dot3 is as performant as dot2; you probably got that timing for dot3 because it wasn't warmed up correctly...

johnmyleswhite (Member, Author)

You're right. Thanks for catching that!

quinnj (Member) commented Aug 14, 2014

Superseded by #7311.

quinnj closed this as completed Aug 14, 2014
simonster (Member)

@quinnj I'm not sure how staged functions solve this. Can you elaborate?

JeffBezanson (Member)

Yeah, this is not the same issue.

JeffBezanson reopened this Aug 14, 2014
quinnj (Member) commented Aug 14, 2014

Sorry. I thought that

if there were a way to make the compiler recognize that a little bit of second-stage type inference could give major performance gains

was exactly what staged functions were getting us.

JeffBezanson (Member)

The work item here is to identify poorly-typed loops, and carve them out into separate functions that can be specialized at run time. It's an extra compiler optimization pass.

timholy (Member) commented Aug 14, 2014

@quinnj, a staged function will receive that ::Vector{Any} as its input, so it won't have any more type information than the parent.

Keno (Member) commented Aug 14, 2014

While that is true, a staged function can actually throw an error in that case, and it will then be called again at runtime with the correct types. That's part of the magic :).

timholy (Member) commented Aug 14, 2014

Not sure I understand. What if I explicitly feed it an Any[[1,2,3],[1.0,2.0,3.0]]? At no point will you get good type information about this object.

Now, if you wrote it as

function dot4{T}(vals::T)
...

and passed it ([1,2,3],[1.0,2.0,3.0]) then it would be a different story. But you don't need a staged function for that.
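[Editorial note: spelled out in full, timholy's tuple version might look like the sketch below (`dot4` here is illustrative, not code from the thread). The element types are part of the tuple's concrete type, so the method specializes without an inner function.]

```julia
# Tuple version: Tuple{Vector{Int}, Vector{Float64}} is a concrete type,
# so this method is specialized on the element types at compile time.
function dot4(vals::Tuple)
    a, b = vals
    x = 0.0
    for i in 1:length(a)
        x += a[i] * b[i]
    end
    return x
end

dot4(([1, 2, 3], [1.0, 2.0, 3.0]))  # 1*1.0 + 2*2.0 + 3*3.0 = 14.0
```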

Keno (Member) commented Aug 14, 2014

I think the issue is a case where the runtime type information is there but the compile-time information isn't, so the loop won't be optimized. If there's never good type information, then we can't do anything anyway.

barucden (Contributor) commented Mar 21, 2022

Is this still an issue? On Julia 1.7.2, I see this result on my machine:

julia> using BenchmarkTools

julia> n = 1_000_000
1000000

julia> a, b = randn(n), randn(n);

julia> vals = Any[a, b];

julia> function dot1(vals::Vector)
           a, b = vals[1], vals[2]
           x = 0.0
           for i in 1:length(a)
               x += a[i] * b[i]
           end
           return x
       end
dot1 (generic function with 1 method)

julia> function dot2(a::Vector, b::Vector)
           x = 0.0
           for i in 1:length(a)
               x += a[i] * b[i]
           end
           return x
       end
dot2 (generic function with 1 method)

julia> function dot3(vals::Vector)
           a, b = vals[1], vals[2]
           function inner(a, b)
               x = 0.0
               for i in 1:length(a)
                   x += a[i] * b[i]
               end
               return x
           end
           return inner(a, b)
       end
dot3 (generic function with 1 method)

julia> @btime dot1(vals);
  158.501 ms (6998980 allocations: 122.05 MiB)

julia> @btime dot2(a, b);
  1.528 ms (1 allocation: 16 bytes)

julia> @btime dot3(vals);
  1.529 ms (1 allocation: 16 bytes)

EDIT: Updated the measurement according to @Keno's remark. The issue still stands.

Keno (Member) commented Mar 21, 2022

The issue is about vals = Any[a, b]. The {} syntax was removed many years ago; {a, b} corresponded to Any[a, b].
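[Editorial note: in current syntax, the setup from the original post would therefore be written as follows.]

```julia
# The original post's `vals = {a, b}`, in modern syntax:
n = 1_000_000
a, b = randn(n), randn(n)
vals = Any[a, b]  # a Vector{Any} holding the two arrays
```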
