[doc] Brief introduction to base_score. #9882

Merged: 3 commits, Dec 17, 2023
2 changes: 2 additions & 0 deletions doc/parameter.rst
@@ -390,6 +390,8 @@ Specify the learning task and the corresponding learning objective. The objectiv
disable the estimation, specify a real number argument.
- For sufficient number of iterations, changing this value will not have too much effect.

See :doc:`/tutorials/intercept` for more info.

* ``eval_metric`` [default according to objective]

- Evaluation metrics for validation data, a default metric will be assigned according to objective (rmse for regression, and logloss for classification, `mean average precision` for ``rank:map``, etc.)
3 changes: 2 additions & 1 deletion doc/tutorials/custom_metric_obj.rst
@@ -271,7 +271,8 @@ available in XGBoost:
We use ``multi:softmax`` to illustrate the differences of transformed prediction. With
``softprob`` the output prediction array has shape ``(n_samples, n_classes)`` while for
``softmax`` it's ``(n_samples, )``. A demo for multi-class objective function is also
available at :ref:`sphx_glr_python_examples_custom_softmax.py`.
available at :ref:`sphx_glr_python_examples_custom_softmax.py`. Also, see
:doc:`/tutorials/intercept` for some more explanation.


**********************
1 change: 1 addition & 0 deletions doc/tutorials/index.rst
@@ -30,4 +30,5 @@ See `Awesome XGBoost <https://github.com/dmlc/xgboost/tree/master/demo>`_ for mo
input_format
param_tuning
custom_metric_obj
intercept
privacy_preserving
104 changes: 104 additions & 0 deletions doc/tutorials/intercept.rst
@@ -0,0 +1,104 @@
#########
Intercept
#########

.. versionadded:: 2.0.0

Since 2.0.0, XGBoost supports estimating the model intercept (named ``base_score``)
automatically from the targets during training. The behavior can be controlled by
setting ``base_score`` to a constant value. The following snippet disables the automatic
estimation:

.. code-block:: python

import xgboost as xgb

reg = xgb.XGBRegressor()
reg.set_params(base_score=0.5)

In addition, here 0.5 represents the value after applying the inverse link function. See
the end of the document for a description.

In addition to ``base_score``, users can provide a global bias via the data field
``base_margin``, which is a vector or a matrix depending on the task. For multi-output
and multi-class tasks, ``base_margin`` is a matrix of shape ``(n_samples, n_targets)`` or
``(n_samples, n_classes)``, respectively.

.. code-block:: python

import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression()

reg = xgb.XGBRegressor()
reg.fit(X, y)
# Request the raw (untransformed) prediction
m = reg.predict(X, output_margin=True)

reg_1 = xgb.XGBRegressor()
# Feed the prediction into the next model
reg_1.fit(X, y, base_margin=m)
reg_1.predict(X, base_margin=m)
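
For the multi-class case the same pattern applies, except the margin is a matrix with one
column per class. The following is a minimal sketch (the dataset and the number of
estimators are made up for illustration), assuming the scikit-learn interface with the
default ``multi:softprob`` objective:

.. code-block:: python

import xgboost as xgb
from sklearn.datasets import make_classification

# Hypothetical 3-class dataset.
X, y = make_classification(n_samples=256, n_classes=3, n_informative=8)

clf = xgb.XGBClassifier(n_estimators=8)
clf.fit(X, y)
# Raw (margin) prediction has shape (n_samples, n_classes).
m = clf.predict(X, output_margin=True)
assert m.shape == (X.shape[0], 3)

clf_1 = xgb.XGBClassifier(n_estimators=8)
# Matrix-shaped base_margin, one column per class.
clf_1.fit(X, y, base_margin=m)
clf_1.predict(X, base_margin=m)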


The ``base_margin`` specifies the bias for each sample and can be used for stacking an
XGBoost model on top of other models; see
:ref:`sphx_glr_python_examples_boost_from_prediction.py` for a worked example. When
``base_margin`` is specified, it automatically overrides the ``base_score`` parameter. If
you are stacking XGBoost models, the usage is relatively straightforward, with the
previous model providing the raw prediction and the new model using it as its bias. For
more customized inputs, users need to take extra care with the link function. Let
:math:`F` be the model and :math:`g` be the link function. Since ``base_score`` is
overridden when a sample-specific ``base_margin`` is available, we omit it here:

.. math::

g(E[y_i]) = F(x_i)


When base margin :math:`b` is provided, it's added to the raw model output :math:`F`:

.. math::

g(E[y_i]) = F(x_i) + b_i

and the output of the final model is:


.. math::

g^{-1}(F(x_i) + b_i)

Take the gamma deviance objective ``reg:gamma`` as an example; it uses a log link
function, hence:

.. math::

\ln{(E[y_i])} = F(x_i) + b_i \\
E[y_i] = \exp{(F(x_i) + b_i)}

As a result, if you are feeding the output of another model, such as a GLM with a
corresponding objective, make sure the output has not already been transformed by the
inverse link.
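
For illustration, here is a minimal sketch of stacking XGBoost on top of a scikit-learn
``GammaRegressor`` (log link) with the ``reg:gamma`` objective; the linear predictor is
computed manually because ``GammaRegressor.predict`` already applies the inverse link.
The dataset and model choices are assumptions made for the example:

.. code-block:: python

import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.linear_model import GammaRegressor

X, y = make_regression(n_samples=256, n_features=8, random_state=0)
y = y - y.min() + 1.0  # gamma targets must be strictly positive

glm = GammaRegressor().fit(X, y)
# ``glm.predict(X)`` returns exp(eta); we need the raw linear predictor eta.
eta = X @ glm.coef_ + glm.intercept_

reg = xgb.XGBRegressor(objective="reg:gamma")
reg.fit(X, y, base_margin=eta)
reg.predict(X, base_margin=eta)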

The ``base_score`` (intercept) can be accessed through
:py:meth:`~xgboost.Booster.save_config` after estimation. Unlike ``base_margin``, the
returned value represents the intercept after applying the inverse link. Taking logistic
regression with the logit link function as an example, given a ``base_score`` of 0.5,
:math:`g(intercept) = logit(0.5) = 0` is added to the raw model output:

.. math::

E[y_i] = g^{-1}{(F(x_i) + g(intercept))}

and a ``base_score`` of 0.5 is the same as :math:`g^{-1}(0) = 0.5`. This is more
intuitive if you remove the model and consider only the intercept, which is estimated
before the model is fitted:

.. math::

E[y] = g^{-1}(g(intercept)) \\
E[y] = intercept

For some objectives like MAE, there are closed-form solutions, while for others the
intercept is estimated with a one-step Newton method.
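
To inspect the estimated intercept, the saved configuration can be parsed as JSON. The
snippet below is a sketch; the exact location of ``base_score`` inside the configuration
document is an implementation detail and may differ between versions:

.. code-block:: python

import json

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(random_state=0)
clf = xgb.XGBClassifier(n_estimators=4).fit(X, y)

config = json.loads(clf.get_booster().save_config())
# Assumed key path; inspect ``config`` if it is not present in your version.
intercept = float(config["learner"]["learner_model_param"]["base_score"])
# The value is on the probability scale, i.e. after the inverse link (sigmoid).
print(intercept)
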
2 changes: 1 addition & 1 deletion python-package/xgboost/core.py
@@ -785,7 +785,7 @@ def __init__(
so it doesn't make sense to assign weights to individual data points.

base_margin :
Base margin used for boosting from existing model.
Global bias for each instance. See :doc:`/tutorials/intercept` for details.
missing :
Value in the input data which needs to be present as a missing value. If
None, defaults to np.nan.
8 changes: 4 additions & 4 deletions python-package/xgboost/sklearn.py
@@ -1012,7 +1012,7 @@ def fit(
sample_weight :
instance weights
base_margin :
global bias for each instance.
Global bias for each instance. See :doc:`/tutorials/intercept` for details.
eval_set :
A list of (X, y) tuple pairs to use as validation sets, for which
metrics will be computed.
@@ -1152,7 +1152,7 @@ def predict(
When this is True, validate that the Booster's and data's feature_names are
identical. Otherwise, it is assumed that the feature_names are the same.
base_margin :
Margin added to prediction.
Global bias for each instance. See :doc:`/tutorials/intercept` for details.
iteration_range :
Specifies which layer of trees are used in prediction. For example, if a
random forest is trained with 100 rounds. Specifying ``iteration_range=(10,
@@ -1605,7 +1605,7 @@ def predict_proba(
When this is True, validate that the Booster's and data's feature_names are
identical. Otherwise, it is assumed that the feature_names are the same.
base_margin :
Margin added to prediction.
Global bias for each instance. See :doc:`/tutorials/intercept` for details.
iteration_range :
Specifies which layer of trees are used in prediction. For example, if a
random forest is trained with 100 rounds. Specifying `iteration_range=(10,
@@ -1948,7 +1948,7 @@ def fit(
weights to individual data points.

base_margin :
Global bias for each instance.
Global bias for each instance. See :doc:`/tutorials/intercept` for details.
eval_set :
A list of (X, y) tuple pairs to use as validation sets, for which
metrics will be computed.