# Type instabilities in `predict` (#959)
Nice observation. May be tricky to fix quickly. I'm curious how you came to identify the issue. Do you have a concrete use case with performance substantially compromised by this issue?
---

Sure, here is a 50x slowdown with a really simple function. But keep in mind that this is just the most minimal example. Ultimately this type instability will infect any code it touches; the larger the codebase depending on a particular `machine`, the more code is affected.

```julia
using SymbolicRegression, MLJBase, Random
using Statistics: mean, std  # needed for mean/std below

rng = Random.MersenneTwister(0)
X = randn(rng, 30, 3)
y = @. cos(X[:, 1] * 2.1 - 0.9) * X[:, 2] - X[:, 3]
model = SRRegressor(deterministic=true, seed=0, parallelism=:serial, maxsize=10)
mach = machine(model, X, y)
fit!(mach)

# With predict(mach, X):
function f(mach, X)
    y = predict(mach, X)
    return mean(y), std(y)
end

# With low-level calling:
function g(eqn, options, X)
    y, _ = eval_tree_array(eqn, X, options)
    return mean(y), std(y)
end
```

With the following performance comparison (Julia 1.10.2):

```julia
using BenchmarkTools

@btime f($mach, $X)
# 5.521 μs (259 allocations: 18.79 KiB)

@btime g(eqn, options, $(copy(X'))) setup=(r = report(mach); eqn = r.equations[r.best_idx]; options = mach.fitresult.options)
# 134.497 ns (3 allocations: 368 bytes)
```

(I highly recommend Cthulhu.jl for narrowing down type instabilities.)
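Alongside Cthulhu's interactive `@descend`, a quick non-interactive check is `Test.@inferred`, which errors whenever the actual return type does not match the inferred one. A minimal sketch, independent of SymbolicRegression (the `stable`/`unstable` functions are illustrative, not from the packages above):

```julia
using Test

stable(x::Vector{Float64}) = sum(x)      # return type inferred as Float64
unstable(flag::Bool) = flag ? 1 : 1.0    # inferred as Union{Int64, Float64}

@inferred stable([1.0, 2.0, 3.0])        # passes silently

# @inferred throws an ErrorException when the inferred type is not exact:
caught = try
    @inferred unstable(true)
    false
catch err
    err isa ErrorException
end
@assert caught
```

This makes a handy regression test for instabilities like the one above, since it can run in a package's CI.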
---

Perhaps one way to fix this without deep structural changes would be to allow package developers to specify the fitresult type. For example:

```julia
MMI.fitresult_type(::Type{<:MyRegressor{A,B,C}}) where {A,B,C} = MyFitResultType{A,B,C}
```

and simply have the default method be

```julia
MMI.fitresult_type(_) = Any
```

then, at the call site:

```julia
ret = $(operator)(
    model,
    mach.fitresult::MMI.fitresult_type(model),
    ...
)
```

This would be completely backwards compatible, because the default method returns `Any`. Using this type assertion would actually be enough for Julia to specialize. What do you think?

Edit: looks like there is also no specialization to the array type passed to `predict`.
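To see why a plain type assertion is enough for specialization, here is a self-contained sketch with hypothetical stand-in types (`FakeMachine`, `square_all`, and `op_*` are illustrations, not MLJBase API):

```julia
struct FakeMachine           # stand-in for MLJBase.Machine; field is untyped
    fitresult::Any
end

square_all(fr) = fr .^ 2

# Without an assertion, inference only sees `Any`, so the broadcast is dynamic:
op_unstable(mach::FakeMachine) = square_all(mach.fitresult)

# With a type assertion (what a `fitresult_type` trait would supply), everything
# after the assert compiles against the concrete type:
op_stable(mach::FakeMachine) = square_all(mach.fitresult::Vector{Float64})

mach = FakeMachine([1.0, 2.0, 3.0])
@assert op_stable(mach) == [1.0, 4.0, 9.0]
@assert only(Base.return_types(op_stable, (FakeMachine,))) == Vector{Float64}
```

The runtime cost of the assertion itself is a single type check; the payoff is that the whole downstream call graph can be inferred.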
---

Thanks @MilesCranmer for the further details and the very nice idea for a fix. Perhaps after I have a chance to look at a direct fix we can go this route.
---

Okay, the type instability in …
---

I had that issue too. I put the following in my startup.jl: JuliaDebug/Cthulhu.jl#546 (comment)
---

Actually it looks like it's fixed on the master branch of Cthulhu, just not released yet. So just install master.
---

Also, this tutorial from Tim Holy was extremely helpful for learning this type of analysis:
---

I have made some more progress removing some type instability with `predict`. Will cross-reference to a PR shortly. However, while this reduces bloat in the call, I am now benchmarking with:

```julia
function g(model, fitresult, X)
    y = predict(model, fitresult, X)
    return mean(y), std(y)
end
```

so that I am just comparing the prediction calls themselves.

Furthermore, there appears to be a type instability in the following setup:

```julia
rng = Random.MersenneTwister(0)
X = randn(rng, 30, 3)
y = @. cos(X[:, 1] * 2.1 - 0.9) * X[:, 2] - X[:, 3]
model = SRRegressor(deterministic=true, seed=0, parallelism=:serial, maxsize=10)
mach = machine(model, X, y)
fit!(mach, verbosity=0)
fitresult = mach.fitresult;
```

After descending into this: note the boldface `Any`s. Am I missing something here?
---

Due to the definition of `last_model` being an `Any`, all of the operations in `OPERATIONS` cannot infer output type. This is okay for `report`, but for `predict`, the output type might be important for downstream operations, and so this can slow things down.