
Implement Parameter validation and normalization #7

Merged
merged 10 commits into main from normalize_and_flatten_params on May 26, 2023

Conversation

Owner

@acalejos acalejos commented May 19, 2023

Closes #3

Need to add documentation about parameters and many more test cases, but the main implementation is done

default: false,
doc: """
Whether to use RAPIDS Memory Manager for memory allocation.
This This option is only applicable when XGBoost is built (compiled)
Collaborator

Suggested change
This This option is only applicable when XGBoost is built (compiled)
This option is only applicable when XGBoost is built (compiled)

type: :boolean,
default: false,
doc: """
Whether to use RAPIDS Memory Manager for memory allocation.
Collaborator

If this is only available in certain instances it should probably be a library config set globally and then grabbed from the application env.

Users would do:

config :exgboost, use_rmm: true
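A minimal sketch of the read side, assuming the flag is resolved from the application env at compile time with `false` as the fallback (the `:use_rmm` key mirrors the suggestion above; the lookup itself is illustrative, not this PR's implementation):

```elixir
# Hypothetical: resolve the global :use_rmm flag from the application env,
# defaulting to false when unset.
use_rmm = Application.compile_env(:exgboost, :use_rmm, false)
```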

Comment on lines 27 to 29
* `:gbtree`: tree-based models
* `:gblinear`: linear models
* `:dart`: tree-based models with dropouts
Collaborator

Suggested change
* `:gbtree`: tree-based models
* `:gblinear`: linear models
* `:dart`: tree-based models with dropouts
* `:gbtree` - tree-based models
* `:gblinear` - linear models
* `:dart` - tree-based models with dropouts

I think this looks better

* `:dart`: tree-based models with dropouts
"""
],
verbosity: [
Collaborator

Why is verbosity in both?

@@ -0,0 +1,968 @@
defmodule EXGBoost.Parameters do
@global_params [
Collaborator

Oh if these are set globally perhaps they should be config values

that all parameters are valid strings which is what XGBoost is expecting.
"""
],
nthread: [
Collaborator

I think this should also be a configuration value. We have it as a config for EXLA. I can't think of a scenario where you'd want to set this per call to predict/train

Owner Author

Changing to

nthread: [
      type: :non_neg_integer,
      default: Application.compile_env(:exgboost, :nthread, 0),
      doc: """
      Number of threads to use for training and prediction. If `0`, then the
      number of threads is set to the number of cores.  This can be set globally
      using the `:exgboost` application environment variable `:nthread`
      or on a per booster basis.  If set globally, the value will be used for
      all boosters unless overridden by a specific booster.

      To set the number of threads globally, add the following to your `config.exs`:

      ```elixir
      config :exgboost, nthread: 4
      ```
      """
    ]

The reason is to allow a global config while also allowing per-booster config, which XGBoost supports.
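For example (a hypothetical call shape, assuming options are passed as a keyword list to `EXGBoost.train/2` and that `dmatrix` is already-built training data):

```elixir
# The global default comes from config :exgboost, nthread: 4;
# this call overrides it for a single booster.
EXGBoost.train(dmatrix, nthread: 8)
```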

`:silent`, `:warning`, `:info`, `:debug`
"""
],
validate_parameters: [
Collaborator

I wouldn't make this an option. You should choose one behavior or the other. I prefer that you perform validation by default

Owner Author

So the main reason I'm currently allowing this is to let "power users" use params that perhaps aren't perfectly captured by the validators. I guess it's kind of a fail-safe in case a validation is broken; it would at least allow people to pass the exact string params they want.

I'm thinking in particular of parameters like `ndcg@n` and `map@n`, where `n` can be assigned as an integer to cut off the top positions in the lists for evaluation.
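A hypothetical sketch of that escape hatch (assuming `validate_parameters` and the raw metric string are both accepted as training options):

```elixir
# Hypothetical: skip validation to pass a raw XGBoost parameter string
# the validators don't model, such as the top-n cutoff in ndcg@10.
EXGBoost.train(dmatrix, validate_parameters: false, eval_metric: "ndcg@10")
```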

Owner Author

Plus, the default is `true`.

doc: ~S'''
Minimum loss reduction required to make a further partition on a leaf node
of the tree. The larger `gamma` is, the more conservative the algorithm will
be. Valid range is [0, $\\infty$].
Collaborator

Make sure you set up the docs to render LaTeX if you do this. Check the Axon `mix.exs` for how.
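A minimal sketch of that setup using ExDoc's `before_closing_body_tag` hook (the CDN URLs and version pin are illustrative, not Axon's exact config):

```elixir
# In mix.exs: inject KaTeX into the generated HTML docs so math renders.
defp docs do
  [
    before_closing_body_tag: fn
      :html ->
        """
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.8/dist/katex.min.css">
        <script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.8/dist/katex.min.js"></script>
        <script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.8/dist/contrib/auto-render.min.js"
                onload="renderMathInElement(document.body)"></script>
        """

      _ ->
        ""
    end
  ]
end
```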

then the building process will give up further partitioning. In linear regression task,
this simply corresponds to minimum number of instances needed to be in each node.
The larger `min_child_weight` is, the more conservative the algorithm will be.
Valid range is `[0, Nx.Constants.infinity()]`.
Collaborator

LaTeX or Elixir code?

Comment on lines 143 to 148
* `:uniform`: each training instance has an equal probability of being selected.
Typically set subsample >= 0.5 for good results.
* `:gradient_based`: the selection probability for each training instance is proportional
to the regularized absolute value of gradients. subsample may be set to as low as 0.1
without loss of model accuracy. Note that this sampling method is only supported when
`tree_method` is set to `gpu_hist`; other tree methods only support `:uniform` sampling.
Collaborator

Suggested change
* `:uniform`: each training instance has an equal probability of being selected.
Typically set subsample >= 0.5 for good results.
* `:gradient_based`: the selection probability for each training instance is proportional
to the regularized absolute value of gradients. subsample may be set to as low as 0.1
without loss of model accuracy. Note that this sampling method is only supported when
`tree_method` is set to `gpu_hist`; other tree methods only support `:uniform` sampling.
* `:uniform` - each training instance has an equal probability of being selected.
Typically set subsample >= 0.5 for good results.
* `:gradient_based` - the selection probability for each training instance is proportional
to the regularized absolute value of gradients. subsample may be set to as low as 0.1
without loss of model accuracy. Note that this sampling method is only supported when
`tree_method` is set to `gpu_hist`; other tree methods only support `:uniform` sampling.

type: :keyword_list,
doc: """
This is a family of parameters for subsampling of columns.
All `colsample_by*` parameters have a range of `(0, 1]`, the default value of `1`, and specify the fraction of columns to be subsampled.
Collaborator

Suggested change
All `colsample_by*` parameters have a range of `(0, 1]`, the default value of `1`, and specify the fraction of columns to be subsampled.
All `colsample_by*` parameters have a range of `(0, 1]`, a default value of `1`, and each specifies the fraction of columns to be subsampled.

doc: """
This is a family of parameters for subsampling of columns.
All `colsample_by*` parameters have a range of `(0, 1]`, the default value of `1`, and specify the fraction of columns to be subsampled.
`colsample_by*` parameters work cumulatively. For instance, the combination
Collaborator

Suggested change
`colsample_by*` parameters work cumulatively. For instance, the combination
`colsample_by` parameters work cumulatively. For instance, the combination
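As a concrete illustration of the cumulative behavior described above: `colsample_bytree: 0.5`, `colsample_bylevel: 0.5`, and `colsample_bynode: 0.5` on a dataset with 64 columns leaves 64 × 0.5 × 0.5 × 0.5 = 8 columns to choose from at each split.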

default: 1,
doc: """
L2 regularization term on weights. Increasing this value will make model more conservative.
Valid range is `[0, :infinity]`.
Collaborator

Just a reminder to choose between LaTeX, Elixir code, and atoms.

Owner Author

I think I'll do LaTeX; I definitely need to normalize that notation.

Valid range is `[0, :infinity]`.
"""
],
reg_lambda: [
Collaborator

Is this required if it's an alias? Can't we just settle on one?

Valid range is `[0, Nx.Constants.infinity()]`.
"""
],
reg_alpha: [
Collaborator

Same as above

Owner Author

The aliases are supported by XGBoost itself, so I thought we could support them too, but we could easily take them out.
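A minimal sketch of one way alias normalization could work (hypothetical names and mapping, not necessarily this PR's implementation):

```elixir
# Hypothetical: fold XGBoost's parameter aliases onto one canonical key.
aliases = %{lambda: :reg_lambda, alpha: :reg_alpha, eta: :learning_rate}

normalize = fn params ->
  Enum.map(params, fn {key, value} -> {Map.get(aliases, key, key), value} end)
end

normalize.(lambda: 1.0, eta: 0.3)
#=> [reg_lambda: 1.0, learning_rate: 0.3]
```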

* `:exact`: Exact greedy algorithm. Enumerates all split candidates.
* `:approx`: Approximate greedy algorithm using sketching and histogram.
* `:hist`: Faster histogram optimized approximate greedy algorithm.
* `:gpu_hist`: GPU implementation of hist algorithm.
Collaborator

Will this work out of the box? Do we need to build a GPU version?

Owner Author

This is not currently working, as I haven't added NIFs for CUDA support yet. I can take it out until then.

Collaborator

Yeah, I would remove it until then. Make an issue for GPU support and we can track adding all of this back in there.

Owner Author

#9

modular way to construct and to modify the trees. This is an advanced parameter that
is usually set automatically, depending on some other parameters. However, it could be
also set explicitly by a user. The following updaters exist:
* `:grow_colmaker`: non-distributed column-based construction of trees.
Collaborator

Format these list items with a hyphen between the atom and the description, as in the earlier suggestion.
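A hypothetical usage sketch for the `updater` parameter described above (assuming EXGBoost normalizes a list of updater atoms into the comma-separated string XGBoost expects, e.g. "grow_colmaker,prune"):

```elixir
# Hypothetical: run the column-based grower followed by the pruner.
EXGBoost.train(dmatrix, updater: [:grow_colmaker, :prune])
```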

lib/exgboost.ex Outdated
accepts a keyword list of options that can be used to configure the training process. See the
[XGBoost documentation](https://xgboost.readthedocs.io/en/latest/parameter.html) for the full list of options.

`Exgbost.train/2` uses the `Exgboost.Training.train/1` function to perform the actual training. `Exgboost.Training.train/1`
`Exgbost.train/2` uses the `EXGBoost.Training.train/1` function to perform the actual training. `EXGBoost.Training.train/1`
Collaborator

Suggested change
`Exgbost.train/2` uses the `EXGBoost.Training.train/1` function to perform the actual training. `EXGBoost.Training.train/1`
`EXGBoost.train/2` uses the `EXGBoost.Training.train/1` function to perform the actual training. `EXGBoost.Training.train/1`

@@ -1,13 +1,13 @@
defmodule Exgboost.ArrayInterface do
defmodule EXGBoost.ArrayInterface do
@moduledoc false
@typedoc """
Collaborator

You should probably have this floating closer to the @type
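A sketch of the suggested placement (the type shape shown here is hypothetical):

```elixir
defmodule EXGBoost.ArrayInterface do
  @moduledoc false

  # Keep the @typedoc directly above the @type it documents.
  @typedoc "Hypothetical: a map describing a tensor in array-interface form."
  @type array_interface :: %{typestr: String.t(), shape: tuple(), data: {integer(), boolean()}}
end
```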

Collaborator

@seanmor5 seanmor5 left a comment

I just left 2 comments, but after that it should be good


@acalejos acalejos merged commit 19cb7d4 into main May 26, 2023
@acalejos acalejos deleted the normalize_and_flatten_params branch May 26, 2023 15:31
Development

Successfully merging this pull request may close these issues.

Normalize Booster Objectives