Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <[email protected]>
az0 and trivialfis authored May 11, 2021
1 parent 2a9979e commit 3e7e426
Showing 100 changed files with 284 additions and 284 deletions.
6 changes: 3 additions & 3 deletions CONTRIBUTORS.md
@@ -43,7 +43,7 @@ Committers are people who have made substantial contribution to the project and

Become a Committer
------------------
XGBoost is a opensource project and we are actively looking for new committers who are willing to help maintaining and lead the project.
XGBoost is a open source project and we are actively looking for new committers who are willing to help maintaining and lead the project.
Committers comes from contributors who:
* Made substantial contribution to the project.
* Willing to spent time on maintaining and lead the project.
@@ -59,7 +59,7 @@ List of Contributors
* [Skipper Seabold](https://github.com/jseabold)
- Skipper is the major contributor to the scikit-learn module of XGBoost.
* [Zygmunt Zając](https://github.com/zygmuntz)
- Zygmunt is the master behind the early stopping feature frequently used by kagglers.
- Zygmunt is the master behind the early stopping feature frequently used by Kagglers.
* [Ajinkya Kale](https://github.com/ajkl)
* [Boliang Chen](https://github.com/cblsjtu)
* [Yangqing Men](https://github.com/yanqingmen)
@@ -91,7 +91,7 @@ List of Contributors
* [Henry Gouk](https://github.com/henrygouk)
* [Pierre de Sahb](https://github.com/pdesahb)
* [liuliang01](https://github.com/liuliang01)
- liuliang01 added support for the qid column for LibSVM input format. This makes ranking task easier in distributed setting.
- liuliang01 added support for the qid column for LIBSVM input format. This makes ranking task easier in distributed setting.
* [Andrew Thia](https://github.com/BlueTea88)
- Andrew Thia implemented feature interaction constraints
* [Wei Tian](https://github.com/weitian)
16 changes: 8 additions & 8 deletions NEWS.md
@@ -1105,7 +1105,7 @@ This release marks a major milestone for the XGBoost project.
* Specify version macro in CMake. (#4730)
* Include dmlc-tracker into XGBoost Python package (#4731)
* [CI] Use long key ID for Ubuntu repository fingerprints. (#4783)
* Remove plugin, cuda related code in automake & autoconf files (#4789)
* Remove plugin, CUDA related code in automake & autoconf files (#4789)
* Skip related tests when scikit-learn is not installed. (#4791)
* Ignore vscode and clion files (#4866)
* Use bundled Google Test by default (#4900)
@@ -1136,7 +1136,7 @@ This release marks a major milestone for the XGBoost project.
### Usability Improvements, Documentation
* Add Random Forest API to Python API doc (#4500)
* Fix Python demo and doc. (#4545)
* Remove doc about not supporting cuda 10.1 (#4578)
* Remove doc about not supporting CUDA 10.1 (#4578)
* Address some sphinx warnings and errors, add doc for building doc. (#4589)
* Add instruction to run formatting checks locally (#4591)
* Fix docstring for `XGBModel.predict()` (#4592)
@@ -1151,7 +1151,7 @@ This release marks a major milestone for the XGBoost project.
* Update XGBoost4J-Spark doc (#4804)
* Regular formatting for evaluation metrics (#4803)
* [jvm-packages] Refine documentation for handling missing values in XGBoost4J-Spark (#4805)
* Monitor for distributed envorinment (#4829). This is useful for identifying performance bottleneck.
* Monitor for distributed environment (#4829). This is useful for identifying performance bottleneck.
* Add check for length of weights and produce a good error message (#4872)
* Fix DMatrix doc (#4884)
* Export C++ headers in CMake installation (#4897)
@@ -1623,7 +1623,7 @@ This release is packed with many new features and bug fixes.
### Known issues
* Quantile sketcher fails to produce any quantile for some edge cases (#2943)
* The `hist` algorithm leaks memory when used with learning rate decay callback (#3579)
* Using custom evaluation funciton together with early stopping causes assertion failure in XGBoost4J-Spark (#3595)
* Using custom evaluation function together with early stopping causes assertion failure in XGBoost4J-Spark (#3595)
* Early stopping doesn't work with `gblinear` learner (#3789)
* Label and weight vectors are not reshared upon the change in number of GPUs (#3794). To get around this issue, delete the `DMatrix` object and re-load.
* The `DMatrix` Python objects are initialized with incorrect values when given array slices (#3841)
@@ -1717,7 +1717,7 @@ This version is only applicable for the Python package. The content is identical
- Add scripts to cross-build and deploy artifacts (#3276, #3307)
- Fix a compilation error for Scala 2.10 (#3332)
* BREAKING CHANGES
- `XGBClassifier.predict_proba()` no longer accepts paramter `output_margin`. The paramater makes no sense for `predict_proba()` because the method is to predict class probabilities, not raw margin scores.
- `XGBClassifier.predict_proba()` no longer accepts parameter `output_margin`. The parameter makes no sense for `predict_proba()` because the method is to predict class probabilities, not raw margin scores.

## v0.71 (2018.04.11)
* This is a minor release, mainly motivated by issues concerning `pip install`, e.g. #2426, #3189, #3118, and #3194.
@@ -1733,7 +1733,7 @@ This version is only applicable for the Python package. The content is identical
- AUC-PR metric for ranking task (#3172)
- Monotonic constraints for 'hist' algorithm (#3085)
* GPU support
- Create an abtract 1D vector class that moves data seamlessly between the main and GPU memory (#2935, #3116, #3068). This eliminates unnecessary PCIe data transfer during training time.
- Create an abstract 1D vector class that moves data seamlessly between the main and GPU memory (#2935, #3116, #3068). This eliminates unnecessary PCIe data transfer during training time.
- Fix minor bugs (#3051, #3217)
- Fix compatibility error for CUDA 9.1 (#3218)
* Python package:
@@ -1761,7 +1761,7 @@ This version is only applicable for the Python package. The content is identical
* Refactored gbm to allow more friendly cache strategy
- Specialized some prediction routine
* Robust `DMatrix` construction from a sparse matrix
* Faster consturction of `DMatrix` from 2D NumPy matrices: elide copies, use of multiple threads
* Faster construction of `DMatrix` from 2D NumPy matrices: elide copies, use of multiple threads
* Automatically remove nan from input data when it is sparse.
- This can solve some of user reported problem of istart != hist.size
* Fix the single-instance prediction function to obtain correct predictions
@@ -1789,7 +1789,7 @@ This version is only applicable for the Python package. The content is identical
- Faster, histogram-based tree algorithm (`tree_method='hist'`) .
- GPU/CUDA accelerated tree algorithms (`tree_method='gpu_hist'` or `'gpu_exact'`), including the GPU-based predictor.
- Monotonic constraints: when other features are fixed, force the prediction to be monotonic increasing with respect to a certain specified feature.
- Faster gradient caculation using AVX SIMD
- Faster gradient calculation using AVX SIMD
- Ability to export models in JSON format
- Support for Tweedie regression
- Additional dropout options for DART: binomial+1, epsilon
2 changes: 1 addition & 1 deletion R-package/R/callbacks.R
@@ -188,7 +188,7 @@ cb.reset.parameters <- function(new_params) {
pnames <- gsub("\\.", "_", names(new_params))
nrounds <- NULL

# run some checks in the begining
# run some checks in the beginning
init <- function(env) {
nrounds <<- env$end_iteration - env$begin_iteration + 1

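For context on the snippet above, a minimal sketch of how cb.reset.parameters() is typically passed to xgb.train(); the agaricus data and the eta schedule here are illustrative assumptions, not part of this commit:

library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Decay the learning rate over the boosting rounds; the callback's init step
# shown above checks the per-parameter vectors against nrounds.
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dtrain, nrounds = 4,
  watchlist = list(train = dtrain),
  callbacks = list(cb.reset.parameters(list(eta = c(0.5, 0.4, 0.3, 0.2))))
)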
6 changes: 3 additions & 3 deletions R-package/R/utils.R
@@ -1,6 +1,6 @@
#
# This file is for the low level reuseable utility functions
# that are not supposed to be visibe to a user.
# This file is for the low level reusable utility functions
# that are not supposed to be visible to a user.
#

#
@@ -284,7 +284,7 @@ xgb.createFolds <- function(y, k = 10)
for (i in seq_along(numInClass)) {
## create a vector of integers from 1:k as many times as possible without
## going over the number of samples in the class. Note that if the number
## of samples in a class is less than k, nothing is producd here.
## of samples in a class is less than k, nothing is produced here.
seqVector <- rep(seq_len(k), numInClass[i] %/% k)
## add enough random integers to get length(seqVector) == numInClass[i]
if (numInClass[i] %% k > 0) seqVector <- c(seqVector, sample.int(k, numInClass[i] %% k))
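xgb.createFolds() is an internal helper; a hedged, user-facing sketch of the same per-class fold balancing is stratified cross-validation through xgb.cv() (the data and parameters below are assumptions for illustration):

library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# stratified = TRUE balances class proportions across the k folds, which is
# what the per-class assignment logic above implements internally.
cv <- xgb.cv(params = list(objective = "binary:logistic", max_depth = 2),
             data = dtrain, nrounds = 5, nfold = 5, stratified = TRUE, verbose = 0)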
10 changes: 5 additions & 5 deletions R-package/R/xgb.DMatrix.R
@@ -1,7 +1,7 @@
#' Construct xgb.DMatrix object
#'
#' Construct xgb.DMatrix object from either a dense matrix, a sparse matrix, or a local file.
#' Supported input file formats are either a libsvm text file or a binary file that was created previously by
#' Supported input file formats are either a LIBSVM text file or a binary file that was created previously by
#' \code{\link{xgb.DMatrix.save}}).
#'
#' @param data a \code{matrix} object (either numeric or integer), a \code{dgCMatrix} object, or a character
@@ -161,9 +161,9 @@ dimnames.xgb.DMatrix <- function(x) {
#' The \code{name} field can be one of the following:
#'
#' \itemize{
#' \item \code{label}: label Xgboost learn from ;
#' \item \code{label}: label XGBoost learn from ;
#' \item \code{weight}: to do a weight rescale ;
#' \item \code{base_margin}: base margin is the base prediction Xgboost will boost from ;
#' \item \code{base_margin}: base margin is the base prediction XGBoost will boost from ;
#' \item \code{nrow}: number of rows of the \code{xgb.DMatrix}.
#'
#' }
@@ -216,9 +216,9 @@ getinfo.xgb.DMatrix <- function(object, name, ...) {
#' The \code{name} field can be one of the following:
#'
#' \itemize{
#' \item \code{label}: label Xgboost learn from ;
#' \item \code{label}: label XGBoost learn from ;
#' \item \code{weight}: to do a weight rescale ;
#' \item \code{base_margin}: base margin is the base prediction Xgboost will boost from ;
#' \item \code{base_margin}: base margin is the base prediction XGBoost will boost from ;
#' \item \code{group}: number of rows in each group (to use with \code{rank:pairwise} objective).
#' }
#'
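A short sketch of the constructor and the getinfo()/setinfo() fields documented above, using the agaricus data shipped with the R package as an assumed example:

library(xgboost)

data(agaricus.train, package = "xgboost")
# Construct from an in-memory sparse matrix; per the documentation above, a
# LIBSVM text file path or a previously saved binary file would also work.
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

setinfo(dtrain, "weight", rep(1, nrow(dtrain)))   # weight rescale
head(getinfo(dtrain, "label"))                    # label XGBoost learns from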
4 changes: 2 additions & 2 deletions R-package/R/xgb.plot.shap.R
@@ -33,7 +33,7 @@
#' @param col_loess a color to use for the loess curves.
#' @param span_loess the \code{span} parameter in \code{\link[stats]{loess}}'s call.
#' @param which whether to do univariate or bivariate plotting. NOTE: only 1D is implemented so far.
#' @param plot whether a plot should be drawn. If FALSE, only a lits of matrices is returned.
#' @param plot whether a plot should be drawn. If FALSE, only a list of matrices is returned.
#' @param ... other parameters passed to \code{plot}.
#'
#' @details
@@ -157,7 +157,7 @@ xgb.plot.shap <- function(data, shap_contrib = NULL, features = NULL, top_n = 1,
plot(x2plot, y, pch = pch, xlab = f, col = col, xlim = x_lim, ylim = y_lim, ylab = ylab, ...)
grid()
if (plot_loess) {
# compress x to 3 digits, and mean-aggredate y
# compress x to 3 digits, and mean-aggregate y
zz <- data.table(x = signif(x, 3), y)[, .(.N, y = mean(y)), x]
if (nrow(zz) <= 5) {
lines(zz$x, zz$y, col = col_loess)
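A brief sketch of the documented call, assuming a small model fitted on the bundled agaricus data (not part of this commit):

library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 3, eta = 0.3, nrounds = 10,
               objective = "binary:logistic", verbose = 0)

# Univariate SHAP dependence plots for the two most important features;
# with plot = FALSE the function would instead return the list of matrices.
xgb.plot.shap(agaricus.train$data, model = bst, top_n = 2, n_col = 2)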
10 changes: 5 additions & 5 deletions R-package/R/xgb.train.R
@@ -26,7 +26,7 @@
#' \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
#' \item \code{lambda} L2 regularization term on weights. Default: 1
#' \item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
#' \item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1
#' \item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through XGBoost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1
#' \item \code{monotone_constraints} A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length equals to the number of features in the training data. \code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.
#' \item \code{interaction_constraints} A list of vectors specifying feature indices of permitted interactions. Each item of the list represents one permitted interaction where specified features are allowed to interact with each other. Feature index values should start from \code{0} (\code{0} references the first column). Leave argument unspecified for no interaction constraints.
#' }
@@ -51,10 +51,10 @@
#' \item \code{binary:logistic} logistic regression for binary classification. Output probability.
#' \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
#' \item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
#' \item \code{count:poisson}: poisson regression for count data, output mean of poisson distribution. \code{max_delta_step} is set to 0.7 by default in poisson regression (used to safeguard optimization).
#' \item \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution. \code{max_delta_step} is set to 0.7 by default in poisson regression (used to safeguard optimization).
#' \item \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function \code{h(t) = h0(t) * HR)}.
#' \item \code{survival:aft}: Accelerated failure time model for censored survival time data. See \href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time} for details.
#' \item \code{aft_loss_distribution}: Probabilty Density Function used by \code{survival:aft} and \code{aft-nloglik} metric.
#' \item \code{aft_loss_distribution}: Probability Density Function used by \code{survival:aft} and \code{aft-nloglik} metric.
#' \item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{num_class - 1}.
#' \item \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
#' \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
@@ -126,11 +126,11 @@
#' Parallelization is automatically enabled if \code{OpenMP} is present.
#' Number of threads can also be manually specified via \code{nthread} parameter.
#'
#' The evaluation metric is chosen automatically by Xgboost (according to the objective)
#' The evaluation metric is chosen automatically by XGBoost (according to the objective)
#' when the \code{eval_metric} parameter is not provided.
#' User may set one or several \code{eval_metric} parameters.
#' Note that when using a customized metric, only this single metric can be used.
#' The following is the list of built-in metrics for which Xgboost provides optimized implementation:
#' The following is the list of built-in metrics for which XGBoost provides optimized implementation:
#' \itemize{
#' \item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
#' \item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
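A minimal sketch of a call using the parameters documented above; eval_metric is optional and would otherwise be chosen automatically from the objective (data and parameter values here are illustrative assumptions):

library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

params <- list(
  objective   = "binary:logistic",
  eta         = 0.3,
  max_depth   = 3,
  eval_metric = "logloss"   # optional; picked automatically when omitted
)
bst <- xgb.train(params = params, data = dtrain, nrounds = 10,
                 watchlist = list(train = dtrain))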
6 changes: 3 additions & 3 deletions R-package/demo/00Index
@@ -1,15 +1,15 @@
basic_walkthrough Basic feature walkthrough
caret_wrapper Use xgboost to train in caret library
custom_objective Cutomize loss function, and evaluation metric
custom_objective Customize loss function, and evaluation metric
boost_from_prediction Boosting from existing prediction
predict_first_ntree Predicting using first n trees
generalized_linear_model Generalized Linear Model
cross_validation Cross validation
create_sparse_matrix Create Sparse Matrix
predict_leaf_indices Predicting the corresponding leaves
early_stopping Early Stop in training
poisson_regression Poisson Regression on count data
tweedie_regression Tweddie Regression
poisson_regression Poisson regression on count data
tweedie_regression Tweedie regression
gpu_accelerated GPU-accelerated tree building algorithms
interaction_constraints Interaction constraints among features

2 changes: 1 addition & 1 deletion R-package/demo/README.md
@@ -2,7 +2,7 @@ XGBoost R Feature Walkthrough
====
* [Basic walkthrough of wrappers](basic_walkthrough.R)
* [Train a xgboost model from caret library](caret_wrapper.R)
* [Cutomize loss function, and evaluation metric](custom_objective.R)
* [Customize loss function, and evaluation metric](custom_objective.R)
* [Boosting from existing prediction](boost_from_prediction.R)
* [Predicting using first n trees](predict_first_ntree.R)
* [Generalized Linear Model](generalized_linear_model.R)
2 changes: 1 addition & 1 deletion R-package/demo/basic_walkthrough.R
@@ -40,7 +40,7 @@ print("Train xgboost with verbose 2, also print information about tree")
bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nrounds = 2,
nthread = 2, objective = "binary:logistic", verbose = 2)

# you can also specify data as file path to a LibSVM format input
# you can also specify data as file path to a LIBSVM format input
# since we do not have this file with us, the following line is just for illustration
# bst <- xgboost(data = 'agaricus.train.svm', max_depth = 2, eta = 1, nrounds = 2,objective = "binary:logistic")

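For comparison with the commented LIBSVM path above, a hedged sketch of the equivalent call on a plain dense matrix (the conversion to dense is only for illustration):

library(xgboost)

data(agaricus.train, package = "xgboost")
dense_x <- as.matrix(agaricus.train$data)   # dense numeric matrix input
bst <- xgboost(data = dense_x, label = agaricus.train$label,
               max_depth = 2, eta = 1, nrounds = 2, nthread = 2,
               objective = "binary:logistic")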
10 changes: 5 additions & 5 deletions R-package/demo/create_sparse_matrix.R
@@ -2,17 +2,17 @@ require(xgboost)
require(Matrix)
require(data.table)
if (!require(vcd)) {
install.packages('vcd') #Available in Cran. Used for its dataset with categorical values.
install.packages('vcd') #Available in CRAN. Used for its dataset with categorical values.
require(vcd)
}
# According to its documentation, Xgboost works only on numbers.
# According to its documentation, XGBoost works only on numbers.
# Sometimes the dataset we have to work on have categorical data.
# A categorical variable is one which have a fixed number of values. By example, if for each observation a variable called "Colour" can have only "red", "blue" or "green" as value, it is a categorical variable.
#
# In R, categorical variable is called Factor.
# Type ?factor in console for more information.
#
# In this demo we will see how to transform a dense dataframe with categorical variables to a sparse matrix before analyzing it in Xgboost.
# In this demo we will see how to transform a dense dataframe with categorical variables to a sparse matrix before analyzing it in XGBoost.
# The method we are going to see is usually called "one hot encoding".

#load Arthritis dataset in memory.
@@ -25,13 +25,13 @@ df <- data.table(Arthritis, keep.rownames = FALSE)
cat("Print the dataset\n")
print(df)

# 2 columns have factor type, one has ordinal type (ordinal variable is a categorical variable with values wich can be ordered, here: None > Some > Marked).
# 2 columns have factor type, one has ordinal type (ordinal variable is a categorical variable with values which can be ordered, here: None > Some > Marked).
cat("Structure of the dataset\n")
str(df)

# Let's add some new categorical features to see if it helps. Of course these feature are highly correlated to the Age feature. Usually it's not a good thing in ML, but Tree algorithms (including boosted trees) are able to select the best features, even in case of highly correlated features.

# For the first feature we create groups of age by rounding the real age. Note that we transform it to factor (categorical data) so the algorithm treat them as independant values.
# For the first feature we create groups of age by rounding the real age. Note that we transform it to factor (categorical data) so the algorithm treat them as independent values.
df[, AgeDiscret := as.factor(round(Age / 10, 0))]

# Here is an even stronger simplification of the real age with an arbitrary split at 30 years old. I choose this value based on nothing. We will see later if simplifying the information based on arbitrary values is a good strategy (I am sure you already have an idea of how well it will work!).
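A hypothetical miniature of the one-hot encoding step this demo builds up to, using sparse.model.matrix() on a made-up factor column (column names and values are assumptions, not from the Arthritis data):

library(Matrix)
library(data.table)

dt <- data.table(Colour = factor(c("red", "blue", "green", "red")),
                 Age    = c(23, 35, 41, 52),
                 Y      = c(1, 0, 0, 1))

# Each factor level becomes its own 0/1 column; "-1" drops the intercept column.
sparse_x <- sparse.model.matrix(Y ~ . - 1, data = dt)
print(sparse_x)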