Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Destruct dmlc::Error objects before long jumps #9916

Merged
merged 1 commit into from
Dec 26, 2023

Conversation

david-cortes
Copy link
Contributor

ref #9810
As a follow-up from #9903

I just noticed that the custom exception class dmlc::Error is subcblassed from std::runtime_error, which is in turn subclassed from std::exception:
https://github.com/dmlc/dmlc-core/blob/ea21135fbb141ae103fb5fc960289b5601b468f2/include/dmlc/logging.h#L34

Since these objects hold dynamically-allocated memory for the error message, they need to be destructed before doing a long jump, which is how R handles errors.

Currently, I think this type of error can only be thrown by the few lines in the R interface which explicitly call LOG(), as otherwise the xgboost C functions signal errors by return codes; and in many cases the conditions leading to these LOG() calls would have been checked beforehand in R code, so I'm not sure if it's even possible to end up in that catch clause, but this should prevent any potential leaks if that happens.


By the way, it would be quite helpful to set up an optional CI job running the R tests and examples with either ASAN or valgrind, as then these types of leaks could be detected more easily.

@trivialfis
Copy link
Member

We can have asan on CI. Let me do some tests to see the leak in action, I would like to have some hands-on experience with it.

@david-cortes
Copy link
Contributor Author

We can have asan on CI. Let me do some tests to see the leak in action, I would like to have some hands-on experience with it.

Note that you will also need to build R with support for ASAN in order to load a library that uses it.

Easiest way is to use one of the images from https://github.com/wch/r-debug

@jameslamb
Copy link
Contributor

Note that you will also need to build R with support for ASAN in order to load a library that uses it.

Easiest way is to use one of the images from https://github.com/wch/r-debug

We've had success with that approach in LightGBM, here's the GitHub Actions config we use:

https://github.com/microsoft/LightGBM/blob/aa774f38f88ceedb3eb6ee19f94d94c8b1199bf9/.github/workflows/r_package.yml#L224

In LightGBM, we saw that exactly replicate errors reported during CRAN's sanitizer checks.

@trivialfis
Copy link
Member

trivialfis commented Dec 22, 2023

Thank you for sharing! I have run asan using LD_PRELOAD without a custom build of R. So far, no leak in XGBoost unittests. @david-cortes Do you think it's practical to write some tests for checking memory leaks, tests that would fail under asan environment without your recent fixes?

Details on how I ran the asan tests, modify the R global config to have asan enabled, then preload the asan shared lib before running tests.

~/.R/Makevars
CXXFLAGS=-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/jiamingy/.anaconda/envs/xgboost_dev/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1700316003380/work=/usr/local/src/conda/r-base-4.3.2 -fdebug-prefix-map=/home/jiamingy/.anaconda/envs/xgboost_dev=/usr/local/src/conda-prefix -fsanitize=address -lasan -L/home/jiamingy/.anaconda/envs/xgboost_dev/lib/
CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/jiamingy/.anaconda/envs/xgboost_dev/include -I/home/jiamingy/.anaconda/envs/xgboost_dev/include -Wl,-rpath-link,/home/jiamingy/.anaconda/envs/xgboost_dev/lib -fsanitize=address -lasan -L/home/jiamingy/.anaconda/envs/xgboost_dev/lib/
LD_PRELOAD=/home/jiamingy/.anaconda/envs/xgboost_dev/lib/libasan.so Rscript ./testthat.R

@david-cortes
Copy link
Contributor Author

I ran some small tests and can indeed confirm that ASAN didn't show any leaks, but valgrind does show them as expected:

library(xgboost)
set.seed(123)
x <- matrix(rnorm(500), nrow = 50)
y <- rnorm(400)
dim(y) <- c(50, 4, 2)
dm = xgb.DMatrix(data = x, label = y)
==4062== 
==4062== HEAP SUMMARY:
==4062==     in use at exit: 167,556,925 bytes in 35,791 blocks
==4062==   total heap usage: 76,098 allocs, 40,307 frees, 261,034,029 bytes allocated
==4062== 
==4062== LEAK SUMMARY:
==4062==    definitely lost: 0 bytes in 0 blocks
==4062==    indirectly lost: 0 bytes in 0 blocks
==4062==      possibly lost: 7,734 bytes in 21 blocks
==4062==    still reachable: 167,549,191 bytes in 35,770 blocks
==4062==                       of which reachable via heuristic:
==4062==                         stdstring          : 902 bytes in 1 blocks
==4062==                         newarray           : 4,264 bytes in 1 blocks
==4062==         suppressed: 0 bytes in 0 blocks
==4062== Rerun with --leak-check=full to see details of leaked memory
==4062== 
==4062== For lists of detected and suppressed errors, rerun with: -s
==4062== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@trivialfis
Copy link
Member

trivialfis commented Dec 23, 2023

I ran some small tests and can indeed confirm that ASAN didn't show any leaks, but valgrind does show them as expected:

Thank you for sharing! I will run it, hopefully can get a backtrace.

Do you think it's practical to write some tests for checking memory leaks, tests that would fail under asan environment without your recent fixes (like #9823), Valgrind is quite time-consuming on CI.

@david-cortes
Copy link
Contributor Author

I don't think it'd be practical - we'd have to start adding code with conditional compilation for that specific job that would for example manually throw exceptions like std::bad_alloc everywhere where it's possible, throw R errors wherever possible too, etc.

Plus, there's no guarantee that they'd be catched by ASAN even without the fixes.

@trivialfis trivialfis merged commit a197899 into dmlc:master Dec 26, 2023
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants