Implement Parameter validation and normalization #7
Conversation
lib/exgboost/parameters.ex
Outdated
default: false,
doc: """
Whether to use RAPIDS Memory Manager for memory allocation.
This This option is only applicable when XGBoost is built (compiled)
- This This option is only applicable when XGBoost is built (compiled)
+ This option is only applicable when XGBoost is built (compiled)
type: :boolean,
default: false,
doc: """
Whether to use RAPIDS Memory Manager for memory allocation.
If this is only available in certain instances it should probably be a library config set globally and then grabbed from the application env.
Users would do:
config :exgboost, use_rmm: true
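If it goes that route, the library side would read the flag from the application environment rather than accepting it per call. A minimal sketch, assuming a `:use_rmm` key defaulting to `false` (the key name and default are assumptions, not the merged API):

```elixir
# Hypothetical: read the global RMM flag once from the application env.
use_rmm = Application.get_env(:exgboost, :use_rmm, false)
```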
lib/exgboost/parameters.ex
Outdated
* `:gbtree`: tree-based models
* `:gblinear`: linear models
* `:dart`: tree-based models with dropouts
- * `:gbtree`: tree-based models
- * `:gblinear`: linear models
- * `:dart`: tree-based models with dropouts
+ * `:gbtree` - tree-based models
+ * `:gblinear` - linear models
+ * `:dart` - tree-based models with dropouts
I think this looks better
* `:dart`: tree-based models with dropouts
"""
],
verbosity: [
Why is verbosity in both?
@@ -0,0 +1,968 @@
defmodule EXGBoost.Parameters do
@global_params [
Oh if these are set globally perhaps they should be config values
that all parameters are valid strings which is what XGBoost is expecting.
"""
],
nthread: [
I think this should also be a configuration value. We have it as a config for EXLA. I can't think of a scenario where you'd want to set this per call to predict/train
Changing to
nthread: [
type: :non_neg_integer,
default: Application.compile_env(:exgboost, :nthread, 0),
doc: """
Number of threads to use for training and prediction. If `0`, then the
number of threads is set to the number of cores. This can be set globally
using the `:exgboost` application environment variable `:nthread`
or on a per booster basis. If set globally, the value will be used for
all boosters unless overridden by a specific booster.
To set the number of threads globally, add the following to your `config.exs`:
```elixir
config :exgboost, nthread: 4
```
"""
]
The reason is to allow a global config while also allowing a per-booster config, which XGBoost supports.
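The intended usage would then look something like the sketch below. The global default is picked up at compile time via `Application.compile_env/3` as shown above; the per-booster override at call time is an assumption for illustration (the exact `train` arity/signature is not confirmed here):

```elixir
# In config.exs: global default, read at compile time by the params schema.
config :exgboost, nthread: 4

# Hypothetical per-booster override at call time.
EXGBoost.train(x, y, nthread: 8)
```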
`:silent`, `:warning`, `:info`, `:debug`
"""
],
validate_parameters: [
I wouldn't make this an option. You should choose one behavior or the other. I prefer that you perform validation by default
The main reason I'm currently allowing this is to let "power users" pass params that perhaps aren't perfectly captured by the validators. It's a fail-safe in case a validation is broken: it would at least allow people to pass the exact string params they want.
I'm thinking in particular of things like `ndcg@n` and `map@n`, where `n` can be assigned as an integer to cut off the top positions in the lists for evaluation.
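For example, the escape hatch would let a user hand a raw metric string straight through. A hypothetical sketch (the option names here are illustrative, not the merged API):

```elixir
# Hypothetical: disable validation and pass a raw XGBoost param string
# like "ndcg@5" that the typed validators may not capture.
EXGBoost.train(x, y,
  validate_parameters: false,
  eval_metric: "ndcg@5"
)
```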
Plus, the default is `true`.
lib/exgboost/parameters.ex
Outdated
doc: ~S'''
Minimum loss reduction required to make a further partition on a leaf node
of the tree. The larger `gamma` is, the more conservative the algorithm will
be. Valid range is [0, $\\infty$].
Make sure you set up the docs to render LaTeX if you do this. Check the Axon `mix.exs` for how.
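For reference, a sketch of the kind of setup meant here, using ExDoc's `before_closing_head_tag` hook to load KaTeX (the CDN URL and version are illustrative; check Axon's actual `mix.exs` for the current approach):

```elixir
# In mix.exs, inside the project's docs configuration:
defp docs do
  [
    before_closing_head_tag: &katex_head/1
  ]
end

# Inject KaTeX assets into the generated HTML docs only.
defp katex_head(:html) do
  """
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.8/dist/katex.min.css">
  <script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.8/dist/katex.min.js"></script>
  """
end

defp katex_head(_), do: ""
```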
lib/exgboost/parameters.ex
Outdated
then the building process will give up further partitioning. In linear regression task,
this simply corresponds to minimum number of instances needed to be in each node.
The larger `min_child_weight` is, the more conservative the algorithm will be.
Valid range is `[0, Nx.Constants.infinity()]`.
Latex or Elixir code?
lib/exgboost/parameters.ex
Outdated
* `:uniform`: each training instance has an equal probability of being selected.
Typically set subsample >= 0.5 for good results.
* `:gradient_based`: the selection probability for each training instance is proportional
to the regularized absolute value of gradients. subsample may be set to as low as 0.1
without loss of model accuracy. Note that this sampling method is only supported when
`tree_method` is set to `gpu_hist`; other tree methods only support `:uniform` sampling.
- * `:uniform`: each training instance has an equal probability of being selected.
-   Typically set subsample >= 0.5 for good results.
- * `:gradient_based`: the selection probability for each training instance is proportional
-   to the regularized absolute value of gradients. subsample may be set to as low as 0.1
-   without loss of model accuracy. Note that this sampling method is only supported when
-   `tree_method` is set to `gpu_hist`; other tree methods only support `:uniform` sampling.
+ * `:uniform` - each training instance has an equal probability of being selected.
+   Typically set subsample >= 0.5 for good results.
+ * `:gradient_based` - the selection probability for each training instance is proportional
+   to the regularized absolute value of gradients. subsample may be set to as low as 0.1
+   without loss of model accuracy. Note that this sampling method is only supported when
+   `tree_method` is set to `gpu_hist`; other tree methods only support `:uniform` sampling.
lib/exgboost/parameters.ex
Outdated
type: :keyword_list,
doc: """
This is a family of parameters for subsampling of columns.
All `colsample_by*` parameters have a range of `(0, 1]`, the default value of `1`, and specify the fraction of columns to be subsampled.
- All `colsample_by*` parameters have a range of `(0, 1]`, the default value of `1`, and specify the fraction of columns to be subsampled.
+ All `colsample_by*` parameters have a range of `(0, 1]`, a default value of `1`, and each specifies the fraction of columns to be subsampled.
lib/exgboost/parameters.ex
Outdated
doc: """ | ||
This is a family of parameters for subsampling of columns. | ||
All `colsample_by*` parameters have a range of `(0, 1]`, the default value of `1`, and specify the fraction of columns to be subsampled. | ||
`colsample_by*` parameters work cumulatively. For instance, the combination |
- `colsample_by*` parameters work cumulatively. For instance, the combination
+ `colsample_by` parameters work cumulatively. For instance, the combination
lib/exgboost/parameters.ex
Outdated
default: 1,
doc: """
L2 regularization term on weights. Increasing this value will make model more conservative.
Valid range is `[0, :infinity]`.
Just a reminder to choose between LaTeX, Elixir code, or atoms.
I think I'll do LaTeX; I definitely need to normalize that notation.
lib/exgboost/parameters.ex
Outdated
Valid range is `[0, :infinity]`.
"""
],
reg_lambda: [
Is this required if it's an alias? Can't we just settle on one?
lib/exgboost/parameters.ex
Outdated
Valid range is `[0, Nx.Constants.infinity()]`.
"""
],
reg_alpha: [
Same as above
The aliases are supported by XGBoost itself, so I thought we could support them too, but we could easily take them out.
lib/exgboost/parameters.ex
Outdated
* `:exact`: Exact greedy algorithm. Enumerates all split candidates.
* `:approx`: Approximate greedy algorithm using sketching and histogram.
* `:hist`: Faster histogram optimized approximate greedy algorithm.
* `:gpu_hist`: GPU implementation of hist algorithm.
Will this work out of the box? Do we need to build a gpu version?
This is not currently working as I haven't added NIFs for CUDA support yet. I can take it out until then
Yeah, I would remove it until then. Make an issue for GPU support and we can track adding this back in there.
lib/exgboost/parameters.ex
Outdated
modular way to construct and to modify the trees. This is an advanced parameter that
is usually set automatically, depending on some other parameters. However, it could be
also set explicitly by a user. The following updaters exist:
* `:grow_colmaker`: non-distributed column-based construction of trees.
Format the line with a hyphen between the atom and the description, like the other entries.
lib/exgboost.ex
Outdated
accepts a keyword list of options that can be used to configure the training process. See the
[XGBoost documentation](https://xgboost.readthedocs.io/en/latest/parameter.html) for the full list of options.

- `Exgbost.train/2` uses the `Exgboost.Training.train/1` function to perform the actual training. `Exgboost.Training.train/1`
+ `Exgbost.train/2` uses the `EXGBoost.Training.train/1` function to perform the actual training. `EXGBoost.Training.train/1`
- `Exgbost.train/2` uses the `EXGBoost.Training.train/1` function to perform the actual training. `EXGBoost.Training.train/1`
+ `EXGBoost.train/2` uses the `EXGBoost.Training.train/1` function to perform the actual training. `EXGBoost.Training.train/1`
@@ -1,13 +1,13 @@
- defmodule Exgboost.ArrayInterface do
+ defmodule EXGBoost.ArrayInterface do
@moduledoc false
@typedoc """
You should probably have this floating closer to the @type
I just left 2 comments, but after that it should be good
Closes #3
Need to add documentation about parameters and many more test cases, but the main implementation is done