[R] Destruct `dmlc::Error` objects before long jumps #9916

david-cortes · 2023-12-21T11:28:33Z

ref #9810
As a follow-up from #9903

I just noticed that the custom exception class dmlc::Error is subcblassed from std::runtime_error, which is in turn subclassed from std::exception:
https://github.com/dmlc/dmlc-core/blob/ea21135fbb141ae103fb5fc960289b5601b468f2/include/dmlc/logging.h#L34

Since these objects hold dynamically-allocated memory for the error message, they need to be destructed before doing a long jump, which is how R handles errors.

Currently, I think this type of error can only be thrown by the few lines in the R interface which explicitly call LOG(), as otherwise the xgboost C functions signal errors by return codes; and in many cases the conditions leading to these LOG() calls would have been checked beforehand in R code, so I'm not sure if it's even possible to end up in that catch clause, but this should prevent any potential leaks if that happens.

By the way, it would be quite helpful to set up an optional CI job running the R tests and examples with either ASAN or valgrind, as then these types of leaks could be detected more easily.

trivialfis · 2023-12-21T15:26:46Z

We can have asan on CI. Let me do some tests to see the leak in action, I would like to have some hands-on experience with it.

david-cortes · 2023-12-21T17:11:18Z

We can have asan on CI. Let me do some tests to see the leak in action, I would like to have some hands-on experience with it.

Note that you will also need to build R with support for ASAN in order to load a library that uses it.

Easiest way is to use one of the images from https://github.com/wch/r-debug

jameslamb · 2023-12-22T05:50:37Z

Note that you will also need to build R with support for ASAN in order to load a library that uses it.

Easiest way is to use one of the images from https://github.com/wch/r-debug

We've had success with that approach in LightGBM, here's the GitHub Actions config we use:

https://github.com/microsoft/LightGBM/blob/aa774f38f88ceedb3eb6ee19f94d94c8b1199bf9/.github/workflows/r_package.yml#L224

In LightGBM, we saw that exactly replicate errors reported during CRAN's sanitizer checks.

trivialfis · 2023-12-22T07:28:00Z

Thank you for sharing! I have run asan using LD_PRELOAD without a custom build of R. So far, no leak in XGBoost unittests. @david-cortes Do you think it's practical to write some tests for checking memory leaks, tests that would fail under asan environment without your recent fixes?

Details on how I ran the asan tests, modify the R global config to have asan enabled, then preload the asan shared lib before running tests.

~/.R/Makevars
CXXFLAGS=-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/jiamingy/.anaconda/envs/xgboost_dev/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1700316003380/work=/usr/local/src/conda/r-base-4.3.2 -fdebug-prefix-map=/home/jiamingy/.anaconda/envs/xgboost_dev=/usr/local/src/conda-prefix -fsanitize=address -lasan -L/home/jiamingy/.anaconda/envs/xgboost_dev/lib/
CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/jiamingy/.anaconda/envs/xgboost_dev/include -I/home/jiamingy/.anaconda/envs/xgboost_dev/include -Wl,-rpath-link,/home/jiamingy/.anaconda/envs/xgboost_dev/lib -fsanitize=address -lasan -L/home/jiamingy/.anaconda/envs/xgboost_dev/lib/

LD_PRELOAD=/home/jiamingy/.anaconda/envs/xgboost_dev/lib/libasan.so Rscript ./testthat.R

david-cortes · 2023-12-23T11:02:08Z

I ran some small tests and can indeed confirm that ASAN didn't show any leaks, but valgrind does show them as expected:

library(xgboost)
set.seed(123)
x <- matrix(rnorm(500), nrow = 50)
y <- rnorm(400)
dim(y) <- c(50, 4, 2)
dm = xgb.DMatrix(data = x, label = y)

==4062== 
==4062== HEAP SUMMARY:
==4062==     in use at exit: 167,556,925 bytes in 35,791 blocks
==4062==   total heap usage: 76,098 allocs, 40,307 frees, 261,034,029 bytes allocated
==4062== 
==4062== LEAK SUMMARY:
==4062==    definitely lost: 0 bytes in 0 blocks
==4062==    indirectly lost: 0 bytes in 0 blocks
==4062==      possibly lost: 7,734 bytes in 21 blocks
==4062==    still reachable: 167,549,191 bytes in 35,770 blocks
==4062==                       of which reachable via heuristic:
==4062==                         stdstring          : 902 bytes in 1 blocks
==4062==                         newarray           : 4,264 bytes in 1 blocks
==4062==         suppressed: 0 bytes in 0 blocks
==4062== Rerun with --leak-check=full to see details of leaked memory
==4062== 
==4062== For lists of detected and suppressed errors, rerun with: -s
==4062== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

trivialfis · 2023-12-23T18:53:51Z

I ran some small tests and can indeed confirm that ASAN didn't show any leaks, but valgrind does show them as expected:

Thank you for sharing! I will run it, hopefully can get a backtrace.

Do you think it's practical to write some tests for checking memory leaks, tests that would fail under asan environment without your recent fixes (like #9823), Valgrind is quite time-consuming on CI.

david-cortes · 2023-12-23T20:46:53Z

I don't think it'd be practical - we'd have to start adding code with conditional compilation for that specific job that would for example manually throw exceptions like std::bad_alloc everywhere where it's possible, throw R errors wherever possible too, etc.

Plus, there's no guarantee that they'd be catched by ASAN even without the fixes.

avoid leaking exception objects

79173c5

trivialfis approved these changes Dec 26, 2023

View reviewed changes

trivialfis merged commit a197899 into dmlc:master Dec 26, 2023
27 of 28 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[R] Destruct `dmlc::Error` objects before long jumps #9916

[R] Destruct `dmlc::Error` objects before long jumps #9916

david-cortes commented Dec 21, 2023

trivialfis commented Dec 21, 2023

david-cortes commented Dec 21, 2023

jameslamb commented Dec 22, 2023

trivialfis commented Dec 22, 2023 •

edited

Loading

david-cortes commented Dec 23, 2023

trivialfis commented Dec 23, 2023 •

edited

Loading

david-cortes commented Dec 23, 2023

[R] Destruct dmlc::Error objects before long jumps #9916

[R] Destruct dmlc::Error objects before long jumps #9916

Conversation

david-cortes commented Dec 21, 2023

trivialfis commented Dec 21, 2023

david-cortes commented Dec 21, 2023

jameslamb commented Dec 22, 2023

trivialfis commented Dec 22, 2023 • edited Loading

david-cortes commented Dec 23, 2023

trivialfis commented Dec 23, 2023 • edited Loading

david-cortes commented Dec 23, 2023

[R] Destruct `dmlc::Error` objects before long jumps #9916

[R] Destruct `dmlc::Error` objects before long jumps #9916

trivialfis commented Dec 22, 2023 •

edited

Loading

trivialfis commented Dec 23, 2023 •

edited

Loading