Obscenely slow prediction #159
Comments
Thanks for reporting. That high prediction time is intriguing. Could you please provide enough detail here for others to reproduce the results, using publicly available or synthetic data?
In that example above, I've run 1000 trees on a 1-dimensional binary classification data set with ~70,000 entries. It should be pretty easy to generate something like this. If you have a look at https://github.com/tecosaur/TreeComparison and run
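For anyone without that repository to hand, a data set of roughly the shape described (one feature, binary labels, ~70,000 rows) can be generated synthetically. This is only a sketch: the function name `synthetic_data`, the seed, and the noise level are assumptions made for illustration, not what was used in the run reported here.

```julia
using Random

# Hypothetical generator: one feature column, binary labels, fixed seed.
function synthetic_data(n::Integer; seed::Integer = 1)
    rng = MersenneTwister(seed)
    features = rand(rng, n, 1)                      # n×1 matrix, values in [0, 1)
    noisy = features[:, 1] .+ 0.1 .* randn(rng, n)  # blur the class boundary a little
    labels = Int.(noisy .> 0.5)                     # 0/1 class labels
    return features, labels
end

features, labels = synthetic_data(70_000)
```

With DecisionTree.jl loaded, training a 1000-tree forest on these arrays with `build_forest` and timing `apply_forest` (for example with `@time`) should show whether the slowdown reproduces. The exact arguments used in the original run are not shown in this thread.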
Ok, I've started looking into this, and I've identified at least two major sub-optimalities in the design. One is the implementation of tree evaluation/prediction, the other is the design of the […]. I'm currently trying to replace Leaf with this structure:

    struct Leaf{T, N}
        features :: NTuple{N, T}
        majority :: Int
        values   :: NTuple{N, Int}
        total    :: Int
    end

Which should make a prediction with probability on a leaf […]
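To illustrate why a leaf laid out this way could make probability prediction cheap, here is a minimal sketch. The field meanings given in the comments, and the helpers `make_leaf`, `predict`, and `predict_proba`, are assumptions made for this example; they are not taken from DecisionTree.jl or from the eventual PR.

```julia
# Proposed leaf layout from the comment above; the field meanings in the
# comments are assumptions made for this sketch.
struct Leaf{T, N}
    features :: NTuple{N, T}    # assumed: the distinct class labels
    majority :: Int             # assumed: index into `features` of the most common label
    values   :: NTuple{N, Int}  # assumed: per-label sample counts, aligned with `features`
    total    :: Int             # assumed: sum of `values`
end

# Hypothetical helper: build a leaf from the labels of the samples that reached it.
function make_leaf(classes::NTuple{N, T}, labels::AbstractVector{T}) where {T, N}
    counts = ntuple(i -> count(==(classes[i]), labels), N)
    return Leaf{T, N}(classes, argmax(counts), counts, sum(counts))
end

# Majority-vote prediction is a single tuple lookup.
predict(leaf::Leaf) = leaf.features[leaf.majority]

# Class probabilities come from the stored counts and total,
# so the leaf never has to rescan its training labels.
predict_proba(leaf::Leaf) = leaf.values ./ leaf.total

leaf = make_leaf(("a", "b"), ["a", "b", "a", "a"])
predict(leaf)        # "a"
predict_proba(leaf)  # (0.75, 0.25)
```

Under these assumptions the counting happens once, when the leaf is built, and every later probability query costs only a fixed-size tuple division.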
I've just done a bit more than the bare minimum, and so far the improvement in prediction-with-probability performance is 2-10x on a sample iris dataset and a large-ish one-dimensional data set. See: https://tecosaur.com/public/treeperf.html
This looks like progress to me. Do you think we could get away with marking the proposed change to the […] Happy to review a PR.
Ok, I was hoping to make further improvements, but it would probably be worth PRing the basic tree improvements I've made.
Hello,
I'd love to use DecisionTree.jl for a project I'm currently working on, as it's great in a lot of ways. Speedy to train, plays nicely with AbstractTrees, etc.
Unfortunately, saying the prediction performance is "not good" is putting things mildly. I did a test run with a simplified version of one of the data sets I'm working with, and recorded the training and prediction times of DecisionTree.jl as well as a number of other common random forest implementations.
The competitiveness of the training time gives me hope that DecisionTree.jl should be able to be competitive on prediction performance too 🙂.