Feature request: Terminal Leaves Cubist #47
Comments
Hi! Thanks for bringing this suggestion. I wasn't aware of that Cubist package. It seems to be similar to what is performed by the piecewise-linear library. There are some aspects of the implementation I'd need to figure out:
The papers on the original implementation seem to require paid registration. But if the general idea of how the trees should be grown is figured out, my best guess is that it should be feasible to implement, subject to some specific tricks as noted above for the prediction. This is something Julia's multiple dispatch should help simplify; for instance, something like the rough sketch below. Let me know what you think!
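(Rough sketch only: LeafModel, MeanLeaf, LinearLeaf, fit_leaf, and predict_leaf are made-up names for illustration, not an actual EvoTrees API.)

# Rough sketch: dispatch on a leaf-model type so users can swap the
# terminal-leaf predictor (constant mean vs. Cubist-style linear model).
using Statistics

abstract type LeafModel end
struct MeanLeaf   <: LeafModel end   # constant leaf (the usual GBT behaviour)
struct LinearLeaf <: LeafModel end   # linear-regression leaf (Cubist-style)

fit_leaf(::MeanLeaf, x, y)       = mean(y)               # parameter: a scalar mean
predict_leaf(::MeanLeaf, θ, x)   = fill(θ, size(x, 1))
fit_leaf(::LinearLeaf, x, y)     = x \ y                  # parameter: OLS coefficients
predict_leaf(::LinearLeaf, θ, x) = x * θ

# usage
x = rand(20, 3); y = x * [1.0, 2.0, 3.0] .+ 0.1 .* randn(20);
θ = fit_leaf(LinearLeaf(), x, y);
ŷ = predict_leaf(LinearLeaf(), θ, x);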
Here is Quinlan 1992 & Quinlan 1993, a short presentation & a longer one. I'm not an expert on the underlying algorithm, but I've found that when I run all Caret models on different datasets, Cubist tends to do well, sometimes outperforming xgb. Yes, the idea is similar to piecewise-linear. AnyBoost.jl is boosting on steroids.
Here's a trivial example of a tree with two leaves where the user can define a custom loss L and leaf model f:

using Statistics
using LinearAlgebra   # for det, used by the guarded linear-leaf model further down
AZ_Tree = function(x, y; j=1)
    splits = sort(unique(x[:, j]))[2:end]   # skip the smallest value so neither side is ever empty
    sse = []
    for sp in splits
        il = x[:, j] .< sp    # observations going to the left leaf
        ig = x[:, j] .>= sp   # observations going to the right leaf
        ℓ = L(y[il], f(x[il, :], y[il])) + L(y[ig], f(x[ig, :], y[ig]))
        push!(sse, ℓ)
    end
    split_at = splits[argmin(sse)]
    return (sc = minimum(sse), split = split_at)
end
X = rand(10, 2);
y = X * rand(2) + rand(10);   # rand(2) is a vector, so X * rand(2) matches rand(10)
#
L(y, ŷ) = sum((y .- ŷ).^2);   # L2 loss
#
f(x, y) = mean(y) * ones(size(y, 1));   # constant in leaf
AZ_Tree(X, y, j=1)
#
f(x, y) = x * (x \ y);   # linear model in leaf (OLS fit, in-sample prediction)
AZ_Tree(X, y, j=1)
# Boston dataset.
using MLJ
X, y = @load_boston; X = hcat(X...);   # flatten the column table into a Matrix
#
f(x, y) = mean(y) * ones(size(y, 1));   # constant in leaf
AZ_Tree(X, y, j=1)
sc= []; #use all variables
for j= 1 : size(X,2)
s = AZ_Tree(X,y, j=j)
push!(sc, s)
end
sc
f(x, y) = det(x'x) ≈ 0.0 ? 0.0 : x * (x \ y);   # linear model in leaf, guarded against singular x'x
AZ_Tree(X, y, j=1)
sc= [];
for j= 1 : size(X,2)
s = AZ_Tree(X,y, j=j)
push!(sc, s)
end
sc
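And a rough sketch of how the same split search could be applied recursively to grow a deeper tree with linear leaves; grow, Node, Leaf, and predict_tree are made-up names, and it assumes AZ_Tree, L, and the guarded linear f above are already defined:

# Rough sketch: apply the same greedy SSE split search recursively.
struct Leaf; θ::Vector{Float64}; end                    # terminal leaf: OLS coefficients
struct Node; j::Int; split::Float64; left; right; end   # internal node: feature + threshold

function grow(x, y; depth=2, minobs=10)
    usable = [j for j in 1:size(x, 2) if length(unique(x[:, j])) > 1]  # splittable features
    (depth == 0 || size(x, 1) <= minobs || isempty(usable)) && return Leaf(x \ y)
    scores = [AZ_Tree(x, y, j=j) for j in usable]       # best split per feature
    k      = argmin([s.sc for s in scores])
    jbest, sp = usable[k], scores[k].split
    il, ig = x[:, jbest] .< sp, x[:, jbest] .>= sp
    Node(jbest, sp,
         grow(x[il, :], y[il]; depth=depth-1, minobs=minobs),
         grow(x[ig, :], y[ig]; depth=depth-1, minobs=minobs))
end

predict_tree(t::Leaf, xrow) = dot(xrow, t.θ)            # dot comes from LinearAlgebra
predict_tree(t::Node, xrow) = xrow[t.j] < t.split ? predict_tree(t.left, xrow) : predict_tree(t.right, xrow)

# usage on the Boston data loaded above
tree = grow(X, y; depth=2)
ŷ = [predict_tree(tree, X[i, :]) for i in 1:size(X, 1)]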
I can see that such an approach would be functional for a linear predictor. From what I can see, however, it would be a costly operation: evaluating a split involves a full fit on the left and right observations for each candidate split, rather than the online approach used in gradient-based methods like EvoTrees.

In particular, I see challenges relating to the aggregation of observations performed by the histogram method, which is what allows fast training on large datasets. Aggregating the first- and second-order derivatives by a simple sum works for the Taylor approximation currently used, but could such a trick be used for an arbitrarily defined function?

These concerns apply because my priority for now is performance (training speed), so being able to train with the histogram method is needed. I can see the applicability, though, in a tree algorithm working on the exact method, where no prior aggregation is performed.

A potential linear effect I could see being feasible within the EvoTrees context would be a dual pass, where a linear fit is performed on each variable in addition to the regular split checks, capturing a linear effect whenever its gain is larger than that of the regular binary split.
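To make the aggregation point concrete, here is a rough sketch of the histogram trick for a second-order (Taylor) objective; it is not EvoTrees' actual code, but it shows why simple per-bin sums of gradients and hessians are enough to score every candidate split without refitting any model:

# Rough sketch of histogram-based split scoring (not EvoTrees internals).
# bin[i] is the pre-computed bin index of observation i for one feature;
# for the L2 loss, g = ŷ .- y and h = ones(length(y)).
function best_split_hist(bin::Vector{Int}, g::Vector{Float64}, h::Vector{Float64};
                         nbins = 32, λ = 1.0)
    Gb, Hb = zeros(nbins), zeros(nbins)
    for i in eachindex(bin)                 # one pass: accumulate per-bin sums
        Gb[bin[i]] += g[i]
        Hb[bin[i]] += h[i]
    end
    G, H = sum(Gb), sum(Hb)
    score(Gs, Hs) = Gs^2 / (Hs + λ)         # leaf objective, up to constants
    best_gain, best_bin, GL, HL = -Inf, 0, 0.0, 0.0
    for b in 1:nbins-1                      # scan thresholds using running left sums
        GL += Gb[b]; HL += Hb[b]
        gain = score(GL, HL) + score(G - GL, H - HL) - score(G, H)
        if gain > best_gain
            best_gain, best_bin = gain, b
        end
    end
    return (gain = best_gain, bin = best_bin)
end

With linear models in the leaves, the analogous aggregation would presumably need per-bin sufficient statistics such as x'x and x'y instead of two scalars per bin, so both memory and per-split cost would grow with the number of features, which seems consistent with the concern above.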
Hi and thank you for your package!
I love how flexible it is.
I've had good results w/ Cubist models, in which terminal leaves contain linear regression models rather than simple averages.
Would it be hard to include this type of option in EvoTrees?