[R-package] v4.0.0 CRAN submission issues #5987

jameslamb · 2023-07-18T02:23:34Z

Description

The recent submission of {lightgbm} v4.0.0 to CRAN was rejected because of the following issues detected by R CMD check.

* checking examples ... [8s/4s] NOTE
Examples with CPU time > 2.5 times elapsed time
                    user system elapsed ratio
lgb.restore_handle 1.819  0.128   0.165  11.8

* checking S3 generic/method consistency ... NOTE
Mismatches for apparent methods not registered:
merge:
  function(x, y, ...)
merge.eval.string:
  function(env)

format:
  function(x, ...)
format.eval.string:
  function(eval_res, eval_err)
See section 'Registering S3 methods' in the 'Writing R Extensions'
manual.

These need to be fixed for CRAN to accept our submission.

Reproducible example

These check failures happened during the CRAN incoming checks.

Environment info

LightGBM version or commit hash: v4.0.0.

Additional Comments

I will put up a pull request shortly to fix these things. We can figure out later how to catch them in CI so they don't surprise us on CRAN submissions in the future.

The examples timing issue is because I did not complete #5102 as part of this release, sorry 😞 . I said I would almost a year ago (#5102 (comment)), just forgot about it.

The text was updated successfully, but these errors were encountered:

…5102, fixes #5987)

…5987) (#5988)

jameslamb · 2023-07-21T19:45:55Z

Re-opening this. Sadly our v4.0.0.1 submission still did not resolve the issue about using more than 2 threads in examples and tests, and was not accepted to CRAN.

mayer79 · 2023-07-24T14:44:05Z

Regarding the timing: in the worst case, pack the firing examples in a dontrun block:

@examples
#' \dontrun{
#' do some lgb magic
#' }

jameslamb · 2023-07-24T15:19:50Z

Thanks for the suggestion @mayer79 ! But I don't think that's the fix we should pursue.

I don't think we can safely assume it's only the one example that happened to be reported in CRAN's initial checks. The timings are unpredictable and depend on, for example, what else happens to be running on the CRAN check machines at the same time as they're checking {lightgbm}.

In addition...I no longer trust \dontrun or \donttest mechanisms as a way to avoid issues in the CRAN checks.

We used to wrap all examples in \dontrun (see #3270)... until CRAN manual review by a human told us we had to drop that in favor of \donttest (#3338).

... and then with R 4.0, R CMD check started running all the \donttest{} examples anyway 🙃

See https://cran.r-project.org/doc/manuals/r-devel/NEWS.html

*R CMD check --as-cran now runs \donttest examples

We have to find a more permanent solution to ensure that everything multi-threaded in this project uses at most 2 threads in tests and examples. It's frustrating that v4.0.0 of the project is being blocked from CRAN because of an example that takes less than 2 seconds to run but... we can only work on the things we can control.

mayer79 · 2023-07-24T16:59:14Z

Ha, darn. I recently had the same problem: Setting all num_threads in all examples in https://github.com/ModelOriented/shapviz would not bring down those example runtimes. Then I just packed them into {dontrun}.

The issue about merge.* is already solved, thx!

david-cortes · 2023-07-26T17:08:39Z

Regarding the issue with the threads on lgb.restore_handle - given that bug #4705 hasn't been addressed, I guess it could be solved by passing num_threads=0, or by a few calls to RhpcBLASctl (which is already under Suggests).

I also don't think that'd be grounds for rejection (the one about S3 mismatch would though) since, when checks fail due to large thread usage, it typically manifests as a segmentation fault on the solaris checks (or at least it does when passing the numer of threads on individual pragmas, which this library doesn't).

jameslamb · 2023-07-26T18:29:32Z

I also don't think that'd be grounds for rejection

That is not correct. Our second attempted submission was rejected by the automatic CRAN checks, and the only issue was that NOTE about the example timings.

jameslamb · 2023-09-03T06:31:55Z

There has been significant discussion about this recently on the R-pkg-devel mailing list. Many package authors have been facing CRAN's newly-strict insistence on projects only using 2 threads in all places.

(June 14 - present) Best practice for using max 2 cores on CRAN? Rdatatable/data.table#5658
(July 16) "Issue with CPU time > 2.5 times elapsed time" (link)
(Aug 24) "Re-building vignettes had CPU time 9.2 times elapsed time" (link)
(Sep 1) "CRAN Detected Using More Than 2 Cores By Default" (link)

I'm still reading through all of them, so won't attempt to summarize yet... just linking these to make it clear this is not a LightGBM-specific issue.

jameslamb · 2023-09-03T06:37:43Z

Additionally, see this recent comment on an old {data.table} issue where one maintainer of a package on CRAN reported being told that OMP_THREAD_LIMIT environment variable is no longer set on CRAN:

Rdatatable/data.table#3300 (comment)

david-cortes · 2023-09-04T15:38:44Z

Note that those discussions from data.table are around violations of the CRAN policy that are caused by calling functions and methods from the data.table package (which are multi-threaded by default) in examples and tests of packages that have data.table as a dependency.

The issue here is not coming from usage of data.table in the lightgbm package, but from direct usage of OpenMP pragmas with the global config instead of using a local num_threads clause.

jameslamb · 2023-09-04T15:48:59Z

That's not entirely true. That conversation has also continued the r-pkg-devel mailing list conversation about:

CRAN no longer setting OMP_THREAD_LIMIT (Warning on CRAN: cannot form a team with 24 threads, using 2 instead Rdatatable/data.table#3300 (comment))
whether every package using parallelism and hoping to remain on CRAN should have to add something like data.table::setDTthreads() to its public interface (Best practice for using max 2 cores on CRAN? Rdatatable/data.table#5658 (comment))

trivialfis · 2023-09-13T14:09:06Z

XGBoost release is running into similar issues with system time constraints: dmlc/xgboost#9497 (comment)

We actually test this https://github.com/dmlc/xgboost/blob/f90d034a86784f4f07417d1d28a77b4a189acc89/tests/ci_build/test_r_package.py#L109 , but I haven't been able to reproduce the error.

simonpcouch · 2023-09-18T19:36:18Z

Appreciate the tag on the bundle issue! Just wanted to drop our approach to skipping long-running or dependency-heavy examples conditionally here in case it's useful for yall. In some of our packages, we define a helper that tells us whether we ought to run long-running or dependency-heavy examples, used like so. The is_cran_check() definition assumes that we check via devtools::check() rather than via R CMD check, and would need adaptation if that's not yall's workflow.

Definitely not ideal, though somewhat more resilient to CRAN check environments than \dontrun and \donttest.

jameslamb · 2023-10-07T06:31:07Z

I'm actively working on this. Just sharing my notes as I go through it. Before putting up potential fixes, I'm first trying to find a way to reproduce the timing-related warnings.

On my macbook pro (1 Dual-Core Intel Core i5 CPU), I gave Docker for Mac access to 4 CPUs then ran the following from the root of this repo.

docker run \
    --rm \
    --cpus 4 \
    --env MAKEFLAGS=-j2 \
    -v $(pwd):/opt/LightGBM \
    -w /opt/LightGBM \
    -it rocker/verse:4.3.1 \
    /bin/bash

Installed OpenMP in there.

apt-get update
apt-get install -y \
    libomp-dev

And the one Suggests dependency of {lightgbm} not already available in that image.

Rscript \
    --vanilla \
    -e "install.packages(c('RhpcBLASctl'), repos = 'https://cran.r-project.org')"

Then built a CRAN-style source distribution of the R package, and moved it to a new directory (not on the path that's mounted, so reads and writes are faster).

mkdir -p /opt/r-check
sh build-cran-package.sh --no-build-vignettes
cp ./lightgbm_4.1.0.99.tar.gz /opt/r-check/
cd /opt/r-check

Approach 1: R CMD check

Then ran R CMD check with some customizations to force it to print all the timings.

OMP_NUM_THREADS=2 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5 \
R --vanilla CMD check \
    --no-codoc \
    --no-manual \
    --no-tests \
    --no-vignettes \
    --run-dontrun \
    --run-donttest \
    --timings \
    ./lightgbm_4.1.0.99.tar.gz

For a description of what those flags mean, see dmlc/xgboost#9497 (comment).

results with `OMP_NUM_THREADS=1` (click me)

readRDS.lgb.Booster         0.531  0.015   0.405
lgb.plot.interpretation     0.438  0.003   0.410
lgb.interprete              0.285  0.002   0.255
lgb.cv                      0.270  0.005   0.174
lgb.model.dt.tree           0.246  0.005   0.193
lgb.importance              0.211  0.002   0.165
lgb.load                    0.154  0.010   0.105
saveRDS.lgb.Booster         0.145  0.001   0.093
lgb.train                   0.143  0.002   0.107
predict.lgb.Booster         0.133  0.003   0.084
lgb.plot.importance         0.128  0.000   0.097
lgb.save                    0.117  0.008   0.080
lgb.restore_handle          0.111  0.002   0.081
lgb.configure_fast_predict  0.100  0.001   0.064
lgb.get.eval.result         0.100  0.000   0.067
lgb.dump                    0.090  0.000   0.056
set_field                   0.087  0.000   0.080
lgb.Dataset.create.valid    0.053  0.026   0.079
get_field                   0.066  0.003   0.069
lgb.Dataset                 0.053  0.010   0.064
dimnames.lgb.Dataset        0.056  0.003   0.059
slice                       0.055  0.000   0.054
dim                         0.049  0.004   0.053
lgb.Dataset.set.categorical 0.049  0.001   0.049
lgb.Dataset.set.reference   0.048  0.000   0.049
lgb.Dataset.save            0.046  0.001   0.047
lgb.Dataset.construct       0.040  0.004   0.043
lgb.convert_with_rules      0.020  0.000   0.020

results with `OMP_NUM_THREADS=2` (click me)

Examples with CPU (user + system) or elapsed time > 0s
                             user system elapsed
lgb.plot.interpretation     0.442  0.003   0.329
lgb.interprete              0.322  0.002   0.215
lgb.cv                      0.192  0.004   0.124
lgb.importance              0.191  0.001   0.134
lgb.model.dt.tree           0.172  0.001   0.126
lgb.train                   0.138  0.004   0.107
lgb.plot.importance         0.125  0.000   0.090
lgb.load                    0.100  0.008   0.066
saveRDS.lgb.Booster         0.106  0.002   0.063
predict.lgb.Booster         0.104  0.002   0.065
lgb.Dataset.create.valid    0.071  0.033   0.078
dim                         0.096  0.007   0.102
readRDS.lgb.Booster         0.100  0.002   0.064
lgb.restore_handle          0.091  0.004   0.064
lgb.configure_fast_predict  0.088  0.001   0.054
lgb.dump                    0.086  0.003   0.056
lgb.save                    0.087  0.002   0.057
lgb.get.eval.result         0.082  0.004   0.055
lgb.Dataset                 0.071  0.010   0.052
slice                       0.078  0.001   0.048
dimnames.lgb.Dataset        0.074  0.004   0.067
get_field                   0.074  0.003   0.055
set_field                   0.075  0.001   0.050
lgb.Dataset.set.categorical 0.059  0.003   0.040
lgb.Dataset.save            0.059  0.001   0.040
lgb.Dataset.construct       0.057  0.001   0.038
lgb.Dataset.set.reference   0.046  0.001   0.033
lgb.convert_with_rules      0.033  0.002   0.022

results with `OMP_NUM_THREADS=3` (click me)

                             user system elapsed
lgb.plot.interpretation     0.463  0.000   0.331
lgb.interprete              0.356  0.004   0.234
lgb.cv                      0.215  0.002   0.138
lgb.importance              0.209  0.002   0.151
lgb.model.dt.tree           0.179  0.000   0.134
lgb.Dataset.create.valid    0.115  0.032   0.086
lgb.plot.importance         0.125  0.000   0.092
lgb.train                   0.118  0.003   0.089
readRDS.lgb.Booster         0.118  0.003   0.066
lgb.load                    0.109  0.010   0.072
lgb.Dataset                 0.110  0.007   0.052
slice                       0.115  0.000   0.052
saveRDS.lgb.Booster         0.101  0.000   0.062
lgb.configure_fast_predict  0.099  0.001   0.054
predict.lgb.Booster         0.097  0.002   0.058
lgb.get.eval.result         0.096  0.002   0.062
get_field                   0.095  0.001   0.053
set_field                   0.095  0.000   0.048
lgb.dump                    0.090  0.004   0.058
lgb.save                    0.089  0.005   0.059
lgb.restore_handle          0.083  0.002   0.058
lgb.Dataset.save            0.076  0.004   0.040
lgb.Dataset.set.categorical 0.076  0.003   0.040
lgb.Dataset.construct       0.071  0.001   0.038
dimnames.lgb.Dataset        0.065  0.003   0.045
lgb.Dataset.set.reference   0.057  0.002   0.034
dim                         0.044  0.002   0.046
lgb.convert_with_rules      0.043  0.003   0.021

results with `OMP_NUM_THREADS=4` (click me)

* checking examples ... OK
Examples with CPU (user + system) or elapsed time > 0s
                             user system elapsed
lgb.plot.interpretation     0.456  0.004   0.320
lgb.interprete              0.359  0.000   0.223
lgb.cv                      0.213  0.000   0.136
dimnames.lgb.Dataset        0.211  0.000   0.135
lgb.importance              0.196  0.000   0.140
lgb.Dataset.create.valid    0.186  0.002   0.101
lgb.model.dt.tree           0.181  0.001   0.131
readRDS.lgb.Booster         0.160  0.006   0.074
lgb.Dataset                 0.156  0.000   0.056
slice                       0.146  0.005   0.054
lgb.plot.importance         0.139  0.001   0.102
get_field                   0.139  0.000   0.064
dim                         0.138  0.000   0.139
lgb.configure_fast_predict  0.122  0.004   0.067
lgb.train                   0.119  0.002   0.088
set_field                   0.121  0.000   0.060
lgb.load                    0.108  0.005   0.069
saveRDS.lgb.Booster         0.102  0.000   0.063
predict.lgb.Booster         0.100  0.000   0.063
lgb.Dataset.set.categorical 0.097  0.001   0.039
lgb.Dataset.construct       0.095  0.000   0.039
lgb.dump                    0.094  0.000   0.058
lgb.save                    0.088  0.003   0.056
lgb.Dataset.set.reference   0.085  0.004   0.037
lgb.restore_handle          0.088  0.001   0.061
lgb.get.eval.result         0.086  0.000   0.054
lgb.Dataset.save            0.079  0.004   0.036
lgb.convert_with_rules      0.056  0.006   0.024

(CPU time being around 2x elapsed time is expected for many of these examples, since we hardcoded num_threads = 2 into a bunch of the examples in #5988 in an earlier attempt to fix this)

Approach 2: directly running the examples

Following @trivialfis 's excellent work from dmlc/xgboost#9497 (comment), I tried just installing the package and running the examples directly.

output of doing that (click me)

R --vanilla CMD INSTALL ./lightgbm_4.1.0.99.tar.gz

cd /opt/LightGBM

cat << EOF > check-examples.R
library(pkgload)
library(lightgbm)

files <- list.files(
    "./R-package/man"
    , pattern="*.Rd"
    , full.names = TRUE
)

run_example_timeit <- function(f) {
  print(paste("Test", f))
  flush.console()
  t0 <- proc.time()
  capture.output({
       pkgload::run_example(f, run_donttest = TRUE, quiet = TRUE)
   })
  example_timing = proc.time() - t0
  print(example_timing)
}

timings <- lapply(files, run_example_timeit)
EOF

OMP_NUM_THREADS=2 \
Rscript --vanilla ./check-examples.R

That printed the following:

[1] "Test ./R-package/man/agaricus.test.Rd"
   user  system elapsed
  0.018   0.004   0.035
[1] "Test ./R-package/man/agaricus.train.Rd"
   user  system elapsed
  0.002   0.005   0.012
[1] "Test ./R-package/man/bank.Rd"
   user  system elapsed
  0.002   0.002   0.007
[1] "Test ./R-package/man/dim.Rd"
   user  system elapsed
  0.048   0.005   0.056
[1] "Test ./R-package/man/dimnames.lgb.Dataset.Rd"
   user  system elapsed
  0.049   0.005   0.049
[1] "Test ./R-package/man/get_field.Rd"
   user  system elapsed
  0.091   0.002   0.065
[1] "Test ./R-package/man/lgb_shared_dataset_params.Rd"
   user  system elapsed
  0.009   0.001   0.007
[1] "Test ./R-package/man/lgb_shared_params.Rd"
   user  system elapsed
  0.009   0.002   0.007
[1] "Test ./R-package/man/lgb.configure_fast_predict.Rd"
   user  system elapsed
  0.114   0.006   0.088
[1] "Test ./R-package/man/lgb.convert_with_rules.Rd"
   user  system elapsed
  0.042   0.003   0.041
[1] "Test ./R-package/man/lgb.cv.Rd"
   user  system elapsed
  0.183   0.003   0.122
[1] "Test ./R-package/man/lgb.Dataset.construct.Rd"
   user  system elapsed
  0.068   0.001   0.051
[1] "Test ./R-package/man/lgb.Dataset.create.valid.Rd"
   user  system elapsed
  0.090   0.034   0.098
[1] "Test ./R-package/man/lgb.Dataset.Rd"
   user  system elapsed
  0.077   0.006   0.059
[1] "Test ./R-package/man/lgb.Dataset.save.Rd"
   user  system elapsed
  0.069   0.001   0.051
[1] "Test ./R-package/man/lgb.Dataset.set.categorical.Rd"
   user  system elapsed
  0.067   0.003   0.051
[1] "Test ./R-package/man/lgb.Dataset.set.reference.Rd"
   user  system elapsed
  0.056   0.001   0.047
[1] "Test ./R-package/man/lgb.drop_serialized.Rd"
   user  system elapsed
  0.002   0.001   0.007
[1] "Test ./R-package/man/lgb.dump.Rd"
   user  system elapsed
  0.089   0.002   0.073
[1] "Test ./R-package/man/lgb.get.eval.result.Rd"
   user  system elapsed
  0.094   0.002   0.066
[1] "Test ./R-package/man/lgb.importance.Rd"
   user  system elapsed
  0.221   0.003   0.175
[1] "Test ./R-package/man/lgb.interprete.Rd"
   user  system elapsed
  0.337   0.004   0.236
[1] "Test ./R-package/man/lgb.load.Rd"
   user  system elapsed
  0.108   0.009   0.076
[1] "Test ./R-package/man/lgb.make_serializable.Rd"
   user  system elapsed
  0.019   0.002   0.014
[1] "Test ./R-package/man/lgb.model.dt.tree.Rd"
   user  system elapsed
  0.164   0.003   0.132
[1] "Test ./R-package/man/lgb.plot.importance.Rd"
   user  system elapsed
  0.184   0.001   0.145
[1] "Test ./R-package/man/lgb.plot.interpretation.Rd"
   user  system elapsed
  0.405   0.003   0.312
[1] "Test ./R-package/man/lgb.restore_handle.Rd"
   user  system elapsed
  0.103   0.003   0.075
[1] "Test ./R-package/man/lgb.save.Rd"
   user  system elapsed
  0.124   0.002   0.092
[1] "Test ./R-package/man/lgb.train.Rd"
   user  system elapsed
  0.104   0.002   0.074
[1] "Test ./R-package/man/lightgbm.Rd"
   user  system elapsed
  0.011   0.000   0.007
[1] "Test ./R-package/man/predict.lgb.Booster.Rd"
   user  system elapsed
  0.100   0.001   0.072
[1] "Test ./R-package/man/print.lgb.Booster.Rd"
   user  system elapsed
  0.019   0.001   0.013
[1] "Test ./R-package/man/readRDS.lgb.Booster.Rd"
   user  system elapsed
  0.110   0.001   0.081
[1] "Test ./R-package/man/saveRDS.lgb.Booster.Rd"
   user  system elapsed
  0.116   0.004   0.081
[1] "Test ./R-package/man/set_field.Rd"
   user  system elapsed
  0.086   0.000   0.064
[1] "Test ./R-package/man/slice.Rd"
   user  system elapsed
  0.101   0.000   0.066
[1] "Test ./R-package/man/summary.lgb.Booster.Rd"
   user  system elapsed
  0.004   0.002   0.010

Other things I checked

No OpenMP environment variables werre set in the container.

env | grep -i omp
# (empty)

All 4 logical CPUs I was giving to Docker appeared to be visible to the R process.

Rscript \
    --vanilla \
    -e "print(RhpcBLASctl::omp_get_max_threads())"
# [1] 4

Next steps

When I return to this in the next few days, I'll try repeating these approaches on a machine with more physical CPUs. I'm hoping with something like a 12-core or 24-core machine, the sources of unintended parallelism will really show up... and then we'll be able to test fixes.

jameslamb · 2023-10-08T01:44:55Z

I repeated steps similar to the ones above tonight on a VM from AWS with the following specs:

instance type: c5a.4xlarge
vCPUs: 16
RAM: 32GB
disk storage (attached EBS volume): 32GB
operating system: Ubuntu 22.04

And installed the following on it:

R: 4.3.1
clang: 14.0.0
gcc: 11.4.0
libomp-dev: 14.0

setup commands (click me)

sudo apt-get update
sudo apt-get install --no-install-recommends -y \
    software-properties-common

sudo apt-get install --no-install-recommends -y \
    apt-utils \
    build-essential \
    ca-certificates \
    cmake \
    curl \
    git \
    iputils-ping \
    jq \
    libcurl4 \
    libicu-dev \
    libomp-dev \
    libssl-dev \
    libunwind8 \
    locales \
    locales-all \
    netcat \
    unzip \
    zip

export LANG="en_US.UTF-8"
sudo update-locale LANG=${LANG}
export LC_ALL="${LANG}"

# set up R environment
export CRAN_MIRROR="https://cran.rstudio.com"
export MAKEFLAGS=-j8
export R_LIB_PATH=~/Rlib
export R_LIBS=$R_LIB_PATH
export PATH="$R_LIB_PATH/R/bin:$PATH"
export R_APT_REPO="jammy-cran40/"
export R_LINUX_VERSION="4.3.1-1.2204.0"

mkdir -p $R_LIB_PATH

mkdir -p ~/.gnupg
echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf
sudo apt-key adv \
    --homedir ~/.gnupg \
    --keyserver keyserver.ubuntu.com \
    --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

sudo add-apt-repository \
    "deb ${CRAN_MIRROR}/bin/linux/ubuntu ${R_APT_REPO}"

sudo apt-get update
sudo apt-get install \
    --no-install-recommends \
    -y \
        autoconf \
        automake \
        clang \
        devscripts \
        r-base-core=${R_LINUX_VERSION} \
        r-base-dev=${R_LINUX_VERSION} \
        texinfo \
        texlive-latex-extra \
        texlive-latex-recommended \
        texlive-fonts-recommended \
        texlive-fonts-extra \
        tidy \
        qpdf

# install dependencies
Rscript \
    --vanilla \
    -e "install.packages(c('data.table', 'jsonlite', 'knitr', 'Matrix', 'R6', 'RhpcBLASctl', 'rmarkdown', 'testthat'), repos = '${CRAN_MIRROR}', lib = '${R_LIB_PATH}', dependencies = c('Depends', 'Imports', 'LinkingTo'), Ncpus = parallel::detectCores())"

# build LightGBM
mkdir -p ${HOME}/repos
cd ${HOME}/repos
git clone --recursive https://github.com/microsoft/LightGBM.git
cd ./LightGBM
sh build-cran-package.sh --no-build-vignettes

With gcc, I wasn't able to replicate the warnings from CRAN.

OMP_NUM_THREADS=16 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5 \
R --vanilla CMD check \
    --no-codoc \
    --no-manual \
    --no-tests \
    --no-vignettes \
    --run-dontrun \
    --run-donttest \
    --timings \
    ./lightgbm_4.1.0.99.tar.gz

results (click me)

Examples with CPU (user + system) or elapsed time > 0s
                             user system elapsed
lgb.plot.interpretation     0.477  0.000   0.251
lgb.interprete              0.404  0.009   0.176
lgb.Dataset.create.valid    0.284  0.024   0.060
slice                       0.300  0.000   0.045
lgb.Dataset                 0.247  0.005   0.043
readRDS.lgb.Booster         0.190  0.011   0.062
set_field                   0.197  0.000   0.046
lgb.importance              0.170  0.023   0.125
lgb.Dataset.set.categorical 0.191  0.001   0.035
get_field                   0.189  0.001   0.046
lgb.Dataset.save            0.175  0.006   0.035
lgb.cv                      0.168  0.002   0.106
lgb.configure_fast_predict  0.162  0.000   0.051
lgb.model.dt.tree           0.140  0.000   0.109
lgb.Dataset.construct       0.131  0.004   0.035
lgb.Dataset.set.reference   0.124  0.005   0.034
saveRDS.lgb.Booster         0.121  0.000   0.091
lgb.plot.importance         0.120  0.000   0.083
lgb.convert_with_rules      0.112  0.000   0.016
dimnames.lgb.Dataset        0.087  0.015   0.040
lgb.load                    0.097  0.004   0.060
predict.lgb.Booster         0.096  0.000   0.055
lgb.restore_handle          0.078  0.000   0.059
lgb.dump                    0.076  0.000   0.053
lgb.train                   0.075  0.000   0.052
lgb.get.eval.result         0.073  0.000   0.052
lgb.save                    0.073  0.000   0.053
dim                         0.034  0.012   0.046

I noticed that CRAN was using clang 16.0.0 on the check where they rejected {lightgbm} v4.0.0 for example timings

** libs
using C++ compiler: ‘Debian clang version 16.0.6 (3)’

(CRAN install logs).

So I switched to clang...

cat << EOF > ${HOME}/.R/Makevars
CC=clang
CXX=clang++
CXX17=clang++
EOF

... and tried again.

OMP_NUM_THREADS=16 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5 \
R --vanilla CMD check \
    --no-codoc \
    --no-manual \
    --no-tests \
    --no-vignettes \
    --run-dontrun \
    --run-donttest \
    --timings \
    ./lightgbm_4.1.0.99.tar.gz

That didn't reproduce the specific WARNING triggered on CRAN, about the lgb.restore_handle() example... but it did expose several places where CPU time was more than 2.5x elapsed time (the type of thing CRAN is warning about)!!!

Examples with CPU time > 2.5 times elapsed time
                          user system elapsed  ratio
readRDS.lgb.Booster      1.065  0.030   0.068 16.103
lgb.Dataset.create.valid 0.992  0.055   0.066 15.864
saveRDS.lgb.Booster      1.558  0.044   0.101 15.861
lgb.cv                   2.886  0.107   0.221 13.543
lgb.interprete           3.259  0.084   0.306 10.925
lgb.plot.interpretation  3.373  0.074   0.340 10.138

full results (click me)

Examples with CPU (user + system) or elapsed time > 0s
                             user system elapsed
lgb.plot.interpretation     3.373  0.074   0.340
lgb.interprete              3.259  0.084   0.306
lgb.cv                      2.886  0.107   0.221
saveRDS.lgb.Booster         1.558  0.044   0.101
readRDS.lgb.Booster         1.065  0.030   0.068
lgb.Dataset.create.valid    0.992  0.055   0.066
lgb.configure_fast_predict  0.938  0.024   0.060
get_field                   0.764  0.016   0.055
slice                       0.737  0.032   0.048
set_field                   0.695  0.016   0.049
lgb.Dataset                 0.675  0.013   0.045
lgb.Dataset.save            0.574  0.025   0.038
lgb.Dataset.set.categorical 0.572  0.023   0.037
lgb.Dataset.set.reference   0.570  0.024   0.038
lgb.Dataset.construct       0.584  0.008   0.037
dimnames.lgb.Dataset        0.291  0.011   0.059
lgb.convert_with_rules      0.290  0.008   0.018
lgb.importance              0.277  0.007   0.124
lgb.model.dt.tree           0.216  0.000   0.108
lgb.plot.importance         0.161  0.023   0.082
predict.lgb.Booster         0.134  0.005   0.052
lgb.get.eval.result         0.100  0.004   0.052
lgb.save                    0.095  0.008   0.052
lgb.dump                    0.093  0.008   0.051
lgb.train                   0.097  0.004   0.050
lgb.load                    0.083  0.012   0.058
lgb.restore_handle          0.080  0.000   0.059
dim                         0.046  0.000   0.046
Examples with CPU time > 2.5 times elapsed time
                          user system elapsed  ratio
readRDS.lgb.Booster      1.065  0.030   0.068 16.103
lgb.Dataset.create.valid 0.992  0.055   0.066 15.864
saveRDS.lgb.Booster      1.558  0.044   0.101 15.861
lgb.cv                   2.886  0.107   0.221 13.543
lgb.interprete           3.259  0.084   0.306 10.925
lgb.plot.interpretation  3.373  0.074   0.340 10.138

Now that we have a way to reproduce this, I'll start working through changes to address that.

I'm hoping that a mix of the following will be enough:

adding explicit num_threads() directives to every OpenMP pragma
limiting {data.table} parallelism in tests and examples, like this: Best practice for using max 2 cores on CRAN? Rdatatable/data.table#5658 (comment)
ensuring that all OpenMP pragmas read from a LightGBM-controlled value for num_threads instead of the global OMP_NUM_THREADS setting

Getting closer 😁

jameslamb · 2023-10-10T02:20:55Z

Still experimenting with this, just want to add one happy finding... the issues appear to completely be about LightGBM's own use of multi-threading + possibly how it configures {data.table}.

Using data.table::setDTthreads(1) everywhere in R code and setting OMP_NUM_THREADS() in LightGBM to just always be 1, I got timings like the following with clang:

                             user system elapsed
lgb.plot.interpretation     0.246  0.000   0.245
lgb.interprete              0.171  0.000   0.173
lgb.importance              0.121  0.004   0.124
lgb.model.dt.tree           0.112  0.000   0.112
lgb.cv                      0.107  0.000   0.107
saveRDS.lgb.Booster         0.093  0.000   0.093
lgb.plot.importance         0.084  0.000   0.084
lgb.Dataset.create.valid    0.038  0.039   0.079
lgb.load                    0.056  0.012   0.068
readRDS.lgb.Booster         0.062  0.004   0.066
lgb.restore_handle          0.061  0.004   0.065
predict.lgb.Booster         0.060  0.000   0.060
lgb.save                    0.053  0.004   0.057
lgb.dump                    0.053  0.003   0.056
lgb.train                   0.056  0.000   0.056
lgb.get.eval.result         0.047  0.008   0.054
dim                         0.047  0.004   0.056
get_field                   0.047  0.004   0.051
lgb.Dataset                 0.041  0.010   0.052
lgb.configure_fast_predict  0.050  0.000   0.051
set_field                   0.050  0.000   0.050
slice                       0.050  0.000   0.049
dimnames.lgb.Dataset        0.046  0.000   0.046
lgb.Dataset.save            0.042  0.001   0.042
lgb.Dataset.construct       0.040  0.002   0.041
lgb.Dataset.set.categorical 0.038  0.004   0.042
lgb.Dataset.set.reference   0.036  0.000   0.035
lgb.convert_with_rules      0.016  0.000   0.017

That's a very good sign. It means that this issue isn't about some other source of multi-threading from LightGBM's dependencies, such as Eigen: https://eigen.tuxfamily.org/dox/TopicMultiThreading.html.

Will share more updates soon!

jameslamb · 2023-12-06T05:10:31Z

Status update for those subscribed to this issue:

CRAN has threatened to archive {lightgbm} if we don't fix a new warning from clang by December 12: [R-package] Warnings of CRAN Package #6221
I've put up [R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102) #6226 to attempt to fix the multithreading issues that led to CRAN rejecting v4.0.0 ([R-package] v4.0.0 CRAN submission issues #5987)
If that PR is accepted, we'll try to release the current state of master to CRAN as v4.2.0 in the next few days
If not, I'll put up a patch that limits {lightgbm} to only using at most 2 threads everywhere, unconditionally and then try to release the current state of master to CRAN as v4.2.0

@simonpcouch I know the tidymodels team was waiting for for v4.x of {lightgbm} to hit CRAN (rstudio/bundle#55 (comment))... do you have any capacity to help us with reverse-dependency checks?

I'm planning to try checking reverse dependencies using a package built from the branch on #6226 over the next few days, but would welcome any help your team could provide if you have the capacity.

simonpcouch · 2023-12-06T13:25:15Z

We do have capacity to run reverse dependency checks! I'll start up some checks based on #6226 today.

jameslamb · 2023-12-06T13:50:46Z

Thank you so much!

simonpcouch · 2023-12-06T15:04:19Z

I see no new problems with #6226! Checked with revdepcheck's revdep_check() by passing a tarball built from the branch.

Note for CRAN:

## revdepcheck results

We checked 11 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.

 * We saw 0 new problems
 * We failed to check 0 packages

jameslamb · 2023-12-06T15:08:39Z

Thank you SO MUCH for that! It's really helpful.

I thought for sure we'd have issue, especially since a few new packages have taken on {lightgbm} as a dependency since the last time I did revdep checks in anticipation of LightGBM 4.0.

Glad that that's not the case 😁

simonpcouch · 2023-12-11T19:09:22Z

Congrats on the CRAN release!

jameslamb · 2023-12-11T19:19:53Z

thanks again for your help @simonpcouch !

jameslamb · 2023-12-19T17:35:15Z

v4.2.0 of the R package has been accepted to CRAN and has passed all of their main checks: #6191 (comment)

This issue can be closed.

Thank you so much to everyone who helped with this!!!!

jameslamb added the r-package label Jul 18, 2023

jameslamb added a commit that referenced this issue Jul 18, 2023

[R-package] limit number of threads used in tests and examples (fixes #…

934f0d0

…5102, fixes #5987)

jameslamb mentioned this issue Jul 19, 2023

[R-package] limit number of threads used in tests and examples (fixes #5987) #5988

Merged

jameslamb closed this as completed in #5988 Jul 19, 2023

jameslamb added a commit that referenced this issue Jul 19, 2023

[R-package] limit number of threads used in tests and examples (fixes #…

7dcbb8c

…5987) (#5988)

jameslamb reopened this Jul 21, 2023

jameslamb mentioned this issue Jul 21, 2023

Release v4.0.0 #5952

Merged

19 tasks

jmoralez mentioned this issue Jul 26, 2023

[R-package] how to extract cross validation evaluation metrics from lgb.cv()? #5571

Closed

This was referenced Sep 5, 2023

Release v4.1.0 #6076

Merged

[docs] R API docs not included in readthedocs PDF #5973

Closed

jameslamb mentioned this issue Sep 13, 2023

[R] Fix method name. dmlc/xgboost#9577

Merged

jameslamb mentioned this issue Sep 18, 2023

Add bundle support for LightGBM rstudio/bundle#55

Closed

jameslamb mentioned this issue Sep 29, 2023

[ci] [R-package] update to new R-hub container images #6116

Closed

jangorecki mentioned this issue Sep 30, 2023

Best practice for using max 2 cores on CRAN? Rdatatable/data.table#5658

Closed

jameslamb pinned this issue Oct 8, 2023

jameslamb mentioned this issue Oct 8, 2023

factor out uses of omp_get_num_threads() and omp_get_max_threads() outside of OpenMP wrapper #6133

Merged

This was referenced Oct 9, 2023

set explicit number of threads in every OpenMP parallel region #6135

Merged

[R-package] multiclass predictions: set reshape = TRUE by default? #6131

Closed

This was referenced Nov 7, 2023

[R-package] standardize naming of internal functions #6179

Merged

release v4.2.0 #6191

Merged

This was referenced Dec 1, 2023

[R-package] Warnings of CRAN Package #6221

Closed

[R-package] change CRAN maintainer #6224

Merged

[R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102) #6226

Merged

jameslamb closed this as completed Dec 19, 2023

jameslamb unpinned this issue Jan 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[R-package] v4.0.0 CRAN submission issues #5987

[R-package] v4.0.0 CRAN submission issues #5987

jameslamb commented Jul 18, 2023 •

edited

Loading

jameslamb commented Jul 21, 2023

mayer79 commented Jul 24, 2023

jameslamb commented Jul 24, 2023 •

edited

Loading

mayer79 commented Jul 24, 2023

david-cortes commented Jul 26, 2023

jameslamb commented Jul 26, 2023

jameslamb commented Sep 3, 2023

jameslamb commented Sep 3, 2023

david-cortes commented Sep 4, 2023

jameslamb commented Sep 4, 2023 •

edited

Loading

trivialfis commented Sep 13, 2023

simonpcouch commented Sep 18, 2023

jameslamb commented Oct 7, 2023

jameslamb commented Oct 8, 2023 •

edited

Loading

jameslamb commented Oct 10, 2023

jameslamb commented Dec 6, 2023

simonpcouch commented Dec 6, 2023

jameslamb commented Dec 6, 2023

simonpcouch commented Dec 6, 2023

jameslamb commented Dec 6, 2023

simonpcouch commented Dec 11, 2023

jameslamb commented Dec 11, 2023

jameslamb commented Dec 19, 2023

[R-package] v4.0.0 CRAN submission issues #5987

[R-package] v4.0.0 CRAN submission issues #5987

Comments

jameslamb commented Jul 18, 2023 • edited Loading

Description

Reproducible example

Environment info

Additional Comments

jameslamb commented Jul 21, 2023

mayer79 commented Jul 24, 2023

jameslamb commented Jul 24, 2023 • edited Loading

mayer79 commented Jul 24, 2023

david-cortes commented Jul 26, 2023

jameslamb commented Jul 26, 2023

jameslamb commented Sep 3, 2023

jameslamb commented Sep 3, 2023

david-cortes commented Sep 4, 2023

jameslamb commented Sep 4, 2023 • edited Loading

trivialfis commented Sep 13, 2023

simonpcouch commented Sep 18, 2023

jameslamb commented Oct 7, 2023

jameslamb commented Oct 8, 2023 • edited Loading

jameslamb commented Oct 10, 2023

jameslamb commented Dec 6, 2023

simonpcouch commented Dec 6, 2023

jameslamb commented Dec 6, 2023

simonpcouch commented Dec 6, 2023

jameslamb commented Dec 6, 2023

simonpcouch commented Dec 11, 2023

jameslamb commented Dec 11, 2023

jameslamb commented Dec 19, 2023

jameslamb commented Jul 18, 2023 •

edited

Loading

jameslamb commented Jul 24, 2023 •

edited

Loading

jameslamb commented Sep 4, 2023 •

edited

Loading

jameslamb commented Oct 8, 2023 •

edited

Loading