Use bst_float consistently throughout (#1824)
* Fix various typos

* Add override to functions that are overridden

gcc warns about functions that override a virtual function without
being marked override. This fixes those warnings.

* Use bst_float consistently

Use bst_float for all variables that hold weights, leaf values,
gradients, hessians, gain, loss_chg, predictions, base_margin, and
feature values.

In some cases, where accumulation (for example, summing many values)
can produce larger magnitudes, double is used instead.

This ensures that type conversions are minimal and reduces loss of
precision.
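
The convention described above can be sketched in a few lines. This is an illustrative snippet, not code from the commit; it assumes bst_float is a typedef for float, as declared in include/xgboost/base.h:

#include <vector>

typedef float bst_float;  // assumed: matches include/xgboost/base.h

// Per-instance values (weights, gradients, predictions, ...) stay
// bst_float; only the accumulator widens to double, since a long
// running sum is where precision loss actually bites.
double SumWeights(const std::vector<bst_float>& weights) {
  double total = 0.0;  // accumulate in double
  for (bst_float w : weights) {
    total += w;  // each element remains a 4-byte bst_float
  }
  return total;
}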
AbdealiLoKo authored and tqchen committed Nov 30, 2016
1 parent da2556f commit 6f16f0e
Showing 50 changed files with 392 additions and 389 deletions.
14 changes: 7 additions & 7 deletions CONTRIBUTORS.md
@@ -2,8 +2,8 @@ Contributors of DMLC/XGBoost
============================
XGBoost has been developed and used by a group of active community. Everyone is more than welcomed to is a great way to make the project better and more accessible to more users.

-Comitters
----------
+Committers
+----------
Committers are people who have made substantial contribution to the project and granted write access to the project.
* [Tianqi Chen](https://github.com/tqchen), University of Washington
- Tianqi is a PhD working on large-scale machine learning, he is the creator of the project.
@@ -16,14 +16,14 @@ Committers are people who have made substantial contribution to the project and
* [Yuan Tang](https://github.com/terrytangyuan)
- Yuan is a data scientist in Chicago, US. He contributed mostly in R and Python packages.

-Become a Comitter
------------------
-XGBoost is a opensource project and we are actively looking for new comitters who are willing to help maintaining and lead the project.
+Become a Committer
+------------------
+XGBoost is a opensource project and we are actively looking for new committers who are willing to help maintaining and lead the project.
Committers comes from contributors who:
* Made substantial contribution to the project.
* Willing to spent time on maintaining and lead the project.

-New committers will be proposed by current comitter memembers, with support from more than two of current comitters.
+New committers will be proposed by current committer members, with support from more than two of current committers.

List of Contributors
--------------------
@@ -44,7 +44,7 @@ List of Contributors
* [Giulio](https://github.com/giuliohome)
- Giulio is the creator of windows project of xgboost
* [Jamie Hall](https://github.com/nerdcha)
-- Jamie is the initial creator of xgboost sklearn modue.
+- Jamie is the initial creator of xgboost sklearn module.
* [Yen-Ying Lee](https://github.com/white1033)
* [Masaaki Horikoshi](https://github.com/sinhrks)
- Masaaki is the initial creator of xgboost python plotting module.
2 changes: 1 addition & 1 deletion demo/distributed-training/plot_model.ipynb
@@ -6,7 +6,7 @@
"source": [
"# XGBoost Model Analysis\n",
"\n",
"This notebook can be used to load and anlysis model learnt from all xgboost bindings, including distributed training. "
"This notebook can be used to load and analysis model learnt from all xgboost bindings, including distributed training. "
]
},
{
4 changes: 2 additions & 2 deletions demo/guide-python/custom_objective.py
@@ -27,9 +27,9 @@ def logregobj(preds, dtrain):

# user defined evaluation function, return a pair metric_name, result
# NOTE: when you do customized loss function, the default prediction value is margin
-# this may make buildin evalution metric not function properly
+# this may make builtin evaluation metric not function properly
# for example, we are doing logistic loss, the prediction is score before logistic transformation
-# the buildin evaluation error assumes input is after logistic transformation
+# the builtin evaluation error assumes input is after logistic transformation
# Take this in mind when you use the customization, and maybe you need write customized evaluation function
def evalerror(preds, dtrain):
labels = dtrain.get_label()
2 changes: 1 addition & 1 deletion demo/kaggle-higgs/higgs-numpy.py
@@ -44,7 +44,7 @@
plst = list(param.items())+[('eval_metric', 'ams@0.15')]

watchlist = [ (xgmat,'train') ]
-# boost 120 tres
+# boost 120 trees
num_round = 120
print ('loading data end, start to boost trees')
bst = xgb.train( plst, xgmat, num_round, watchlist );
2 changes: 1 addition & 1 deletion demo/kaggle-higgs/speedtest.py
@@ -42,7 +42,7 @@
plst = param.items()+[('eval_metric', 'ams@0.15')]

watchlist = [ (xgmat,'train') ]
-# boost 10 tres
+# boost 10 trees
num_round = 10
print ('loading data end, start to boost trees')
print ("training GBM from sklearn")
4 changes: 2 additions & 2 deletions demo/kaggle-otto/otto_train_pred.R
@@ -8,7 +8,7 @@ test = test[,-1]

y = train[,ncol(train)]
y = gsub('Class_','',y)
-y = as.integer(y)-1 #xgboost take features in [0,numOfClass)
+y = as.integer(y)-1 # xgboost take features in [0,numOfClass)

x = rbind(train[,-ncol(train)],test)
x = as.matrix(x)
@@ -22,7 +22,7 @@ param <- list("objective" = "multi:softprob",
"num_class" = 9,
"nthread" = 8)

-# Run Cross Valication
+# Run Cross Validation
cv.nround = 50
bst.cv = xgb.cv(param=param, data = x[trind,], label = y,
nfold = 3, nrounds=cv.nround)
2 changes: 1 addition & 1 deletion demo/kaggle-otto/understandingXGBoostModel.Rmd
@@ -16,7 +16,7 @@ Introduction
While XGBoost is known for its fast speed and accurate predictive power, it also comes with various functions to help you understand the model.
The purpose of this RMarkdown document is to demonstrate how easily we can leverage the functions already implemented in **XGBoost R** package. Of course, everything showed below can be applied to the dataset you may have to manipulate at work or wherever!

-First we will prepare the **Otto** dataset and train a model, then we will generate two vizualisations to get a clue of what is important to the model, finally, we will see how we can leverage these information.
+First we will prepare the **Otto** dataset and train a model, then we will generate two visualisations to get a clue of what is important to the model, finally, we will see how we can leverage these information.

Preparation of the data
=======================
4 changes: 2 additions & 2 deletions include/xgboost/base.h
@@ -42,7 +42,7 @@
/*! \brief namespace of xgboost */
namespace xgboost {
/*!
- * \brief unsigned interger type used in boost,
+ * \brief unsigned integer type used in boost,
* used for feature index and row index.
*/
typedef uint32_t bst_uint;
@@ -62,7 +62,7 @@ struct bst_gpair {
};

/*! \brief small eps gap for minimum split decision. */
-const float rt_eps = 1e-6f;
+const bst_float rt_eps = 1e-6f;

/*! \brief define unsigned long for openmp loop */
typedef dmlc::omp_ulong omp_ulong;
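To make the hunks above easier to read in isolation, here is a minimal sketch of the types this header declares. The bst_float typedef and the bst_gpair members are reconstructed rather than shown in the diff, so treat them as assumptions:

#include <cstdint>

namespace xgboost {
typedef uint32_t bst_uint;  // unsigned integer for feature/row index, as above
typedef float bst_float;    // assumed: the float type this commit standardizes on

/*! \brief assumed sketch of the gradient statistics pair declared above */
struct bst_gpair {
  bst_float grad;  // first-order gradient
  bst_float hess;  // second-order gradient (hessian)
  bst_gpair() {}
  bst_gpair(bst_float grad, bst_float hess) : grad(grad), hess(hess) {}
};

const bst_float rt_eps = 1e-6f;  // minimum split-decision gap, per the hunk above
}  // namespace xgboost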
19 changes: 10 additions & 9 deletions include/xgboost/c_api.h
@@ -23,9 +23,10 @@ XGB_EXTERN_C {
#define XGB_DLL XGB_EXTERN_C
#endif

-// manually define unsign long
+// manually define unsigned long
typedef uint64_t bst_ulong; // NOLINT(*)


/*! \brief handle to DMatrix */
typedef void *DMatrixHandle;
/*! \brief handle to Booster */
@@ -86,11 +87,11 @@ XGB_EXTERN_C typedef int XGBCallbackDataIterNext(
* \brief get string message of the last error
*
* all function in this file will return 0 when success
- * and -1 when an error occured,
+ * and -1 when an error occurred,
* XGBGetLastError can be called to retrieve the error
*
- * this function is threadsafe and can be called by different thread
- * \return const char* error inforomation
+ * this function is thread safe and can be called by different thread
+ * \return const char* error information
*/
XGB_DLL const char *XGBGetLastError();

@@ -124,7 +125,7 @@ XGB_DLL int XGDMatrixCreateFromDataIter(
* \param indptr pointer to row headers
* \param indices findex
* \param data fvalue
- * \param nindptr number of rows in the matix + 1
+ * \param nindptr number of rows in the matrix + 1
* \param nelem number of nonzero elements in the matrix
* \param num_col number of columns; when it's set to 0, then guess from data
* \param out created dmatrix
@@ -143,7 +144,7 @@ XGB_DLL int XGDMatrixCreateFromCSREx(const size_t* indptr,
* \param indptr pointer to row headers
* \param indices findex
* \param data fvalue
- * \param nindptr number of rows in the matix + 1
+ * \param nindptr number of rows in the matrix + 1
* \param nelem number of nonzero elements in the matrix
* \param out created dmatrix
* \return 0 when success, -1 when failure happens
@@ -159,7 +160,7 @@ XGB_DLL int XGDMatrixCreateFromCSR(const bst_ulong *indptr,
* \param col_ptr pointer to col headers
* \param indices findex
* \param data fvalue
- * \param nindptr number of rows in the matix + 1
+ * \param nindptr number of rows in the matrix + 1
* \param nelem number of nonzero elements in the matrix
* \param num_row number of rows; when it's set to 0, then guess from data
* \param out created dmatrix
@@ -178,7 +179,7 @@ XGB_DLL int XGDMatrixCreateFromCSCEx(const size_t* col_ptr,
* \param col_ptr pointer to col headers
* \param indices findex
* \param data fvalue
- * \param nindptr number of rows in the matix + 1
+ * \param nindptr number of rows in the matrix + 1
* \param nelem number of nonzero elements in the matrix
* \param out created dmatrix
* \return 0 when success, -1 when failure happens
@@ -201,7 +202,7 @@ XGB_DLL int XGDMatrixCreateFromCSC(const bst_ulong *col_ptr,
XGB_DLL int XGDMatrixCreateFromMat(const float *data,
bst_ulong nrow,
bst_ulong ncol,
-float missing,
+float missing,
DMatrixHandle *out);
/*!
* \brief create a new dmatrix from sliced content of existing matrix
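The error convention documented above (0 on success, -1 on failure, XGBGetLastError for the message) can be exercised with a short sketch like this; XGDMatrixFree is assumed to be the matching cleanup call for the handle:

#include <cstdio>
#include <xgboost/c_api.h>

int main() {
  const float data[] = {1.0f, 2.0f, 3.0f, 4.0f};  // 2 rows x 2 cols, row-major
  DMatrixHandle dmat;
  // Every function in this header returns 0 on success and -1 on failure.
  if (XGDMatrixCreateFromMat(data, 2, 2, /*missing=*/-1.0f, &dmat) != 0) {
    std::fprintf(stderr, "xgboost error: %s\n", XGBGetLastError());
    return 1;
  }
  XGDMatrixFree(dmat);  // assumed counterpart that releases the handle
  return 0;
}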
6 changes: 3 additions & 3 deletions include/xgboost/data.h
@@ -65,7 +65,7 @@ struct MetaInfo {
* \param i Instance index.
* \return The weight.
*/
-  inline float GetWeight(size_t i) const {
+  inline bst_float GetWeight(size_t i) const {
return weights.size() != 0 ? weights[i] : 1.0f;
}
/*!
@@ -253,7 +253,7 @@ class DMatrix {
* \brief check if column access is supported, if not, initialize column access.
* \param enabled whether certain feature should be included in column access.
* \param subsample subsample ratio when generating column access.
- * \param max_row_perbatch auxilary information, maximum row used in each column batch.
+ * \param max_row_perbatch auxiliary information, maximum row used in each column batch.
* this is a hint information that can be ignored by the implementation.
* \return Number of column blocks in the column access.
*/
@@ -304,7 +304,7 @@ class DMatrix {
static DMatrix* Create(std::unique_ptr<DataSource>&& source,
const std::string& cache_prefix = "");
/*!
- * \brief Create a DMatrix by loaidng data from parser.
+ * \brief Create a DMatrix by loading data from parser.
* Parser can later be deleted after the DMatrix i created.
* \param parser The input data parser
* \param cache_prefix The path to prefix of temporary cache file of the DMatrix when used in external memory mode.
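The GetWeight hunk above shows the pattern in miniature: per-instance weights become bst_float, and an empty weight vector means every instance implicitly weighs 1.0f. A self-contained sketch (MetaInfoSketch is a stand-in name, not the real struct):

#include <cstddef>
#include <vector>

typedef float bst_float;  // assumed, as in base.h

struct MetaInfoSketch {
  std::vector<bst_float> weights;  // empty means the data set is unweighted
  inline bst_float GetWeight(std::size_t i) const {
    return weights.size() != 0 ? weights[i] : 1.0f;  // default weight is 1.0f
  }
};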
10 changes: 5 additions & 5 deletions include/xgboost/gbm.h
@@ -78,7 +78,7 @@ class GradientBooster {
* we do not limit number of trees, this parameter is only valid for gbtree, but not for gblinear
*/
virtual void Predict(DMatrix* dmat,
-                       std::vector<float>* out_preds,
+                       std::vector<bst_float>* out_preds,
unsigned ntree_limit = 0) = 0;
/*!
* \brief online prediction function, predict score for one instance at a time
@@ -93,7 +93,7 @@ class GradientBooster {
* \sa Predict
*/
virtual void Predict(const SparseBatch::Inst& inst,
-                       std::vector<float>* out_preds,
+                       std::vector<bst_float>* out_preds,
unsigned ntree_limit = 0,
unsigned root_index = 0) = 0;
/*!
@@ -105,7 +105,7 @@ class GradientBooster {
* we do not limit number of trees, this parameter is only valid for gbtree, but not for gblinear
*/
virtual void PredictLeaf(DMatrix* dmat,
-                           std::vector<float>* out_preds,
+                           std::vector<bst_float>* out_preds,
unsigned ntree_limit = 0) = 0;
/*!
* \brief dump the model in the requested format
@@ -127,7 +127,7 @@ class GradientBooster {
static GradientBooster* Create(
const std::string& name,
const std::vector<std::shared_ptr<DMatrix> >& cache_mats,
-      float base_margin);
+      bst_float base_margin);
};

// implementing configure.
@@ -144,7 +144,7 @@ struct GradientBoosterReg
: public dmlc::FunctionRegEntryBase<
GradientBoosterReg,
std::function<GradientBooster* (const std::vector<std::shared_ptr<DMatrix> > &cached_mats,
-                  float base_margin)> > {
+                  bst_float base_margin)> > {
};

/*!
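These signature changes ripple into every booster implementation, and they meet the commit's other fix halfway: an implementation must now both take std::vector<bst_float>* and be marked override to keep gcc quiet. A hypothetical, self-contained sketch (GradientBoosterSketch and DMatrixStub are stand-ins for the real classes):

#include <cstddef>
#include <vector>

typedef float bst_float;  // assumed, as in base.h

struct DMatrixStub { std::size_t num_row; };  // stand-in for DMatrix

class GradientBoosterSketch {
 public:
  virtual ~GradientBoosterSketch() {}
  // Mirrors the updated interface: predictions are bst_float now.
  virtual void Predict(DMatrixStub* dmat,
                       std::vector<bst_float>* out_preds,
                       unsigned ntree_limit) = 0;
};

class ConstantBooster : public GradientBoosterSketch {
 public:
  // 'override' is what silences the gcc warning from the commit message.
  void Predict(DMatrixStub* dmat,
               std::vector<bst_float>* out_preds,
               unsigned ntree_limit) override {
    (void)ntree_limit;  // a constant model has no trees to limit
    out_preds->assign(dmat->num_row, 0.5f);
  }
};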
6 changes: 3 additions & 3 deletions include/xgboost/learner.h
@@ -106,7 +106,7 @@ class Learner : public rabit::Serializable {
*/
virtual void Predict(DMatrix* data,
bool output_margin,
-                       std::vector<float> *out_preds,
+                       std::vector<bst_float> *out_preds,
unsigned ntree_limit = 0,
bool pred_leaf = false) const = 0;
/*!
@@ -162,7 +162,7 @@ class Learner : public rabit::Serializable {
*/
inline void Predict(const SparseBatch::Inst &inst,
bool output_margin,
-                      std::vector<float> *out_preds,
+                      std::vector<bst_float> *out_preds,
unsigned ntree_limit = 0) const;
/*!
* \brief Create a new instance of learner.
@@ -185,7 +185,7 @@ class Learner : public rabit::Serializable {
// implementation of inline functions.
inline void Learner::Predict(const SparseBatch::Inst& inst,
bool output_margin,
-                             std::vector<float>* out_preds,
+                             std::vector<bst_float>* out_preds,
unsigned ntree_limit) const {
gbm_->Predict(inst, out_preds, ntree_limit);
if (out_preds->size() == 1) {
6 changes: 3 additions & 3 deletions include/xgboost/metric.h
@@ -29,9 +29,9 @@ class Metric {
* the average statistics across all the node,
* this is only supported by some metrics
*/
-  virtual float Eval(const std::vector<float>& preds,
-                     const MetaInfo& info,
-                     bool distributed) const = 0;
+  virtual bst_float Eval(const std::vector<bst_float>& preds,
+                         const MetaInfo& info,
+                         bool distributed) const = 0;
/*! \return name of metric */
virtual const char* Name() const = 0;
/*! \brief virtual destructor */
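Since Eval now consumes and returns bst_float, a custom metric follows the same discipline as the rest of the commit: bst_float at the element level, double for the running sum. A hypothetical RMSE sketch against the signature above (MetaInfo is reduced to a bare label vector and the distributed flag is dropped to keep it self-contained):

#include <cmath>
#include <cstddef>
#include <vector>

typedef float bst_float;  // assumed, as in base.h

struct RmseSketch {
  // Element values are bst_float; the accumulator stays double.
  bst_float Eval(const std::vector<bst_float>& preds,
                 const std::vector<bst_float>& labels) const {
    double sum = 0.0;
    for (std::size_t i = 0; i < preds.size(); ++i) {
      double diff = static_cast<double>(preds[i]) - labels[i];
      sum += diff * diff;
    }
    return static_cast<bst_float>(std::sqrt(sum / preds.size()));
  }
  const char* Name() const { return "rmse-sketch"; }
};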
8 changes: 4 additions & 4 deletions include/xgboost/objective.h
@@ -41,7 +41,7 @@ class ObjFunction {
* \param iteration current iteration number.
* \param out_gpair output of get gradient, saves gradient and second order gradient in
*/
-  virtual void GetGradient(const std::vector<float>& preds,
+  virtual void GetGradient(const std::vector<bst_float>& preds,
const MetaInfo& info,
int iteration,
std::vector<bst_gpair>* out_gpair) = 0;
@@ -52,13 +52,13 @@ class ObjFunction {
* \brief transform prediction values, this is only called when Prediction is called
* \param io_preds prediction values, saves to this vector as well
*/
-  virtual void PredTransform(std::vector<float> *io_preds) {}
+  virtual void PredTransform(std::vector<bst_float> *io_preds) {}
/*!
* \brief transform prediction values, this is only called when Eval is called,
* usually it redirect to PredTransform
* \param io_preds prediction values, saves to this vector as well
*/
-  virtual void EvalTransform(std::vector<float> *io_preds) {
+  virtual void EvalTransform(std::vector<bst_float> *io_preds) {
this->PredTransform(io_preds);
}
/*!
@@ -67,7 +67,7 @@ class ObjFunction {
* used by gradient boosting
* \return transformed value
*/
-  virtual float ProbToMargin(float base_score) const {
+  virtual bst_float ProbToMargin(bst_float base_score) const {
return base_score;
}
/*!
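This interface is the C++ twin of the demo/guide-python/custom_objective.py comments earlier in the diff: GetGradient sees raw margins, PredTransform maps margins to probabilities when predictions are returned, and ProbToMargin goes the other way for base_score. A hypothetical logistic-loss sketch (labels are passed directly instead of through MetaInfo to keep it self-contained):

#include <cmath>
#include <cstddef>
#include <vector>

typedef float bst_float;  // assumed, as in base.h

struct bst_gpair {  // assumed layout of the gradient pair from base.h
  bst_float grad;
  bst_float hess;
};

struct LogisticObjSketch {
  // First- and second-order gradients of logistic loss w.r.t. the margin.
  void GetGradient(const std::vector<bst_float>& preds,
                   const std::vector<bst_float>& labels,
                   std::vector<bst_gpair>* out_gpair) const {
    out_gpair->resize(preds.size());
    for (std::size_t i = 0; i < preds.size(); ++i) {
      bst_float p = 1.0f / (1.0f + std::exp(-preds[i]));  // sigmoid(margin)
      (*out_gpair)[i].grad = p - labels[i];
      (*out_gpair)[i].hess = p * (1.0f - p);
    }
  }
  // Margins become probabilities only when predictions leave the booster.
  void PredTransform(std::vector<bst_float>* io_preds) const {
    for (std::size_t i = 0; i < io_preds->size(); ++i)
      (*io_preds)[i] = 1.0f / (1.0f + std::exp(-(*io_preds)[i]));
  }
  // Inverse of the sigmoid: a probability base_score back to a margin.
  bst_float ProbToMargin(bst_float base_score) const {
    return -std::log(1.0f / base_score - 1.0f);
  }
};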
(Diff truncated: the remaining changed files are not shown here.)
