Need help creating an MLJModelInterface.Model interface for a complex model #744
Thank you for taking the time to wrap your head around the MLJ API. Also, happy to take a call at some point if you think that would be helpful. I think implementing a single end-to-end model first is a good idea.

Side question: the existing DecisionTree.jl models handle categorical features only via an implicit ordering; I assume your trees do something similar?
Yes, hyper-parameters are viewed as data-independent concepts. If a hyper-parameter naturally depends on the data (as `maxFeatures` does), give it a data-independent default and resolve the actual value inside `fit`. Per-sample weights or class weights are declared separately, via the `supports_weights` / `supports_class_weights` traits.
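For illustration, a minimal sketch of that pattern, using a hypothetical `MyForestRegressor` (not BetaML's actual interface code):

```julia
import MLJModelInterface
const MMI = MLJModelInterface

# Hypothetical model: `max_features` naturally depends on the data, so it gets a
# data-independent sentinel default (0) that is resolved inside `fit`.
mutable struct MyForestRegressor <: MMI.Deterministic
    n_trees::Int
    max_features::Int   # 0 means "decide from the data in fit"
end
MyForestRegressor(; n_trees=30, max_features=0) = MyForestRegressor(n_trees, max_features)

function MMI.fit(model::MyForestRegressor, verbosity, X, y)
    Xmat = MMI.matrix(X)                       # tabular input -> plain matrix
    p = size(Xmat, 2)
    max_features = model.max_features == 0 ? ceil(Int, sqrt(p)) : model.max_features
    # ... call the underlying training routine here, passing `max_features` ...
    fitresult = nothing                        # placeholder for the trained object
    cache, report = nothing, NamedTuple()
    return fitresult, cache, report
end
```

The point is only the shape of the pattern: the resolved value is computed in `fit` and never written back into the model struct.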
Yes! More models is the MLJ way (XGBoost, for example, is wrapped as three or four separate MLJ models).
Yes. For example, for an ordinary regression target with missing values, you would declare a target scitype whose element type is a Union that includes `Missing`.
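Continuing the hypothetical `MyForestRegressor` from the sketch above, such a declaration might look like this (a sketch, not BetaML's actual trait values):

```julia
# Widen the declared target scitype so targets containing missing values are
# accepted (element scitype Union{Missing,Continuous}).
MMI.target_scitype(::Type{<:MyForestRegressor}) =
    AbstractVector{Union{Missing,MMI.Continuous}}
```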
This sounds like an unsupervised model (aka "transformer"), yes? These implement `transform` (and optionally `inverse_transform`) instead of `predict`.
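A bare-bones sketch of that contract, with made-up names and a trivial min-max rescaling standing in for a real algorithm (continuing with the `MMI` alias from the earlier sketch):

```julia
mutable struct MyMinMaxScaler <: MMI.Unsupervised end

function MMI.fit(::MyMinMaxScaler, verbosity, X)
    Xm = MMI.matrix(X)
    fitresult = (minimum(Xm, dims=1), maximum(Xm, dims=1))   # per-column min / max
    return fitresult, nothing, NamedTuple()
end

function MMI.transform(::MyMinMaxScaler, fitresult, Xnew)
    lo, hi = fitresult
    Xm = MMI.matrix(Xnew)
    scaled = (Xm .- lo) ./ max.(hi .- lo, eps())   # guard against zero range
    return MMI.table(scaled)                       # hand back a table
end
```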
Examples of "native" implementations of the MLJ model API can be found in the MLJ documentation.
No. They can live anywhere. The most common arrangement is to keep the interface code in the algorithm-providing package itself and register that package with MLJ's model registry.
The key is to make sure that the predictions of classifiers (probabilistic ones) are `UnivariateFinite` distributions over the levels of the training target.
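A toy sketch of that contract (a constant classifier with hypothetical names, reusing the `MMI` alias from above), showing the shape of the returned object rather than any real algorithm:

```julia
# Probabilistic classifiers subtype Probabilistic and return UnivariateFinite
# distributions whose support is the full pool of levels seen in training.
mutable struct MyConstantClassifier <: MMI.Probabilistic end

function MMI.fit(::MyConstantClassifier, verbosity, X, y)
    classes = MMI.classes(y[1])                        # all levels, in pool order
    probs = [count(==(c), y) / length(y) for c in classes]
    return (classes, probs), nothing, NamedTuple()
end

function MMI.predict(::MyConstantClassifier, fitresult, Xnew)
    classes, probs = fitresult
    # one (identical) distribution per new observation
    return [MMI.UnivariateFinite(classes, probs) for _ in 1:MMI.nrows(Xnew)]
end
```

In practice one would call `MMI.UnivariateFinite(classes, P)` on a whole probability matrix `P` rather than building the vector element by element, but the returned element type is the same.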
I suggest that you use the `metadata_pkg` and `metadata_model` helpers to declare the model traits.
You're welcome.
Thank you :-) I'll work on your comments. I am not sure I understood your side question, as my decision trees simply treat categorical features as sortable values; is that what you mean by "handling" them?
Hello, I did add the metadata. I can test locally with the fit/predict and machine/evaluate workflows, but how do I test the MLJ interface for automatic model discovery and check that the metadata has been picked up correctly?
It seems that adding `metadata_pkg` the way I did creates a CI test error on Julia > 1.3.
I don't have this error when testing locally, and if I remove the MLJ interface it works: https://github.com/sylvaticus/BetaML.jl/actions/runs/603243482. It seems related to an evaluation-scope issue. How (where) should I then add the MLJ metadata for the interface?
I think this is essentially a bug in the metadata_utils.jl macro, which turns up when your interface code is wrapped in a module. Is that what you are doing?

Edit: or see the workaround discussed in the next comment.
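As an aside, and only as a sketch reusing the hypothetical `MyForestRegressor` from above: the traits the helper generates can also be written out by hand, which involves no `eval` and so sidesteps the module issue entirely:

```julia
# Equivalent hand-written trait declarations; fill in the real UUID and load path.
MMI.load_path(::Type{<:MyForestRegressor})     = "BetaML.MyForestRegressor"   # hypothetical
MMI.package_name(::Type{<:MyForestRegressor})  = "BetaML"
MMI.package_url(::Type{<:MyForestRegressor})   = "https://github.com/sylvaticus/BetaML.jl"
MMI.package_uuid(::Type{<:MyForestRegressor})  = "<package UUID here>"
MMI.is_pure_julia(::Type{<:MyForestRegressor}) = true
MMI.input_scitype(::Type{<:MyForestRegressor}) = MMI.Table(MMI.Continuous)
```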
This is a bit tricky to do yourself. Point me to your code and I'll check the metadata registration for you.

Re: side question.
Yes, it will "work", but an ordering is implicitly used, which means you don't get the optimal splitting at each node, only splittings based on a separation of the classes consistent with the ordering; see this issue. It is for this reason that we restricted the allowed input scitype of the existing DecisionTree.jl wrappers. You can get the optimal split using an ordering that depends on the node; this ordering is defined (in the case of regression) by the mean value of the target on each class, for the training data that arrives at the given node. Breiman says somewhere that this works (finds the optimal split), but I can't remember if I found a proof. I implemented this here, but this is very old, poorly tested code 😢
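A sketch of the node-local ordering described above (regression case), with a hypothetical helper name; splits are then searched along prefixes of the returned order, as if the feature were ordinal:

```julia
using Statistics: mean

# Order the categorical levels of one feature column by the mean target value
# among the training rows that reach the current node.
function level_order(xcol::AbstractVector, y::AbstractVector{<:Real})
    lvls = unique(xcol)
    level_means = [mean(y[xcol .== l]) for l in lvls]
    return lvls[sortperm(level_means)]   # candidate splits = prefixes of this ordering
end
```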
Okay, see also this issue, which states: "A quirk of submodules and evaluation scopes makes it necessary to load this submodule in the package init function."
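One literal reading of that workaround, sketched with hypothetical file and module names (the real packages involved may organise this differently):

```julia
module BetaML

# ... core package code ...

function __init__()
    # Load the file defining the MLJ interface submodule at package load time,
    # so any eval performed by the metadata helpers runs outside precompilation.
    include(joinpath(@__DIR__, "mlj_interface.jl"))
end

end # module
```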
Hello, concerning the nature of the categorical features allowed, you were correct: the decision-tree models work only with features that are sortable. So, yes, please let me know if the interface registration is working. I will now work on the hyper-parameter range definitions, but before moving on to wrap the other models I would like to have some feedback from you :-)

EDIT: what is the `hyperparameter_ranges` trait supposed to contain?
Your metadata looks good to me. Are you willing to tag a new patch release? That will make testing discoverability easier. I could hold off releasing the updated registry until your say-so, if you like.
You can just ignore this one for now. The idea is that you specify default ranges over which to optimise each hyper-parameter. Caret has this for all their models, which I understand is dearly loved, so we added this trait for future use. You could implement it, but it would mean adding MLJBase as a dependency, because that's where the range constructor lives.
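For what it's worth, a sketch of what implementing it might look like for the hypothetical regressor above (exact keyword names for `range` may differ between MLJBase versions):

```julia
import MLJBase   # the range constructor lives here

# One entry per hyper-parameter, in field order: a range object or `nothing`.
MMI.hyperparameter_ranges(::Type{<:MyForestRegressor}) = (
    MLJBase.range(Int, :n_trees, lower=10, upper=500, scale=:log),  # n_trees
    nothing,                                                        # max_features: no default range
)
```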
Hello, I just issued a new release. On top of the decision tree / random forest models, I also wrapped the perceptron-style classifiers (PerceptronClassifier, KernelPerceptronClassifier and PegasosClassifier). I am well aware that these are rather obsolete now, but they could be interesting for historical reasons or to allow comparisons with newer, more performant models.
Congratulations on the new release 🎉. And many thanks for implementing those extra models. I will add your package to the registry and test discoverability of the models shortly.
Worked first time. I don't think that ever happened before!

```julia
julia> using MLJ

julia> models("BetaML")
7-element Array{NamedTuple{(:name, :package_name, :is_supervised, :docstring, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :prediction_type, :supports_class_weights, :supports_online, :supports_weights, :input_scitype, :target_scitype, :output_scitype),T} where T<:Tuple,1}:
 (name = DecisionTreeClassifier, package_name = BetaML, ... )
 (name = DecisionTreeRegressor, package_name = BetaML, ... )
 (name = KernelPerceptronClassifier, package_name = BetaML, ... )
 (name = PegasosClassifier, package_name = BetaML, ... )
 (name = PerceptronClassifier, package_name = BetaML, ... )
 (name = RandomForestClassifier, package_name = BetaML, ... )
 (name = RandomForestRegressor, package_name = BetaML, ... )
```

This goes live after I tag a new MLJModels release.
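Once that release is out, a user-side check might look like the following (a sketch with toy data; depending on the MLJ version, `@load` returns the model type or an instance):

```julia
using MLJ

# toy data: two continuous features and a two-level categorical target
X = (x1 = rand(100), x2 = rand(100))
y = coerce(rand(["a", "b"], 100), Multiclass)

Tree = @load DecisionTreeClassifier pkg=BetaML   # pick the BetaML implementation
tree = Tree()

mach = machine(tree, X, y)
fit!(mach)
yhat = predict(mach, X)    # probabilistic predictions
```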
New models go live when JuliaRegistries/General#31325 merges. Thanks again for this contribution. I'm closing this, but feel free to add API-related queries to this thread.
From this Discourse thread:
Hi there,
I am trying to build an MLJ interface for some of the ML algorithms in the BetaML package.
I am starting from the decision trees (I know decision trees are already available in MLJ, but I thought they were the easiest to start with), but I have a few questions.
The function creating (and fitting) the tree takes several keyword arguments. My questions:

- `maxFeatures` depends on the dimensionality of the explanatory variables. I understood that model parameters should be part of the model struct, but how do I set defaults without seeing the data? [SOLVED: I set a default of the default]
- `forceClassification`: in MLJ there are different types of model, probabilistic and deterministic, so which one do I choose? Or should I wrap it as two separate MLJ models? [SOLVED: I created different MLJ models]
- `Missing` data in the input: I read that `Missing` is a scientific type per se. Should I then declare a Union of the supported types, including `Missing`? [UNSOLVED, but a later step]
- The `predict(model,X)` method returns a vector of dictionaries of `label => prob`. I normally use arrays of T for the Y, but I saw that it also works with Y being a `CategoricalArray`. However, I am stuck here now and don't know how to return the predictions in the format wanted by MLJ. [SOLVED, but it was a pain]
- Naming: should I call the model `DecisionTree`, so that the user then selects the desired one with the `pkg` keyword (there are already two available in MLJ), or would a more specific name like `BetaMLDecisionTree` be preferable?

Thank you!
(PS: the "Simple User Defined Models" documentation still refers to `MLJBase` rather than `MLJModelInterface`.)