-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] prevent symbol lookup conflicts (fixes #4045) #4155
Conversation
# like the one documented in | ||
# https://github.com/microsoft/LightGBM/issues/4045#issuecomment-812289182 | ||
test_that("lightgbm routine registration worked", { | ||
testthat::skip_if_not(ON_WINDOWS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this skip_if_not()
is to work around this error I saw on Linux jobs in CI
* checking tests ...
Running ‘testthat.R’
� � ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
══ Failed tests ════════════════════════════════════════════════════════════════
── Failure (test_registration.R:8:5): lightgbm routine registration worked ─────
dll_info[["dynamicLookup"]] is not FALSE
`actual` is NULL
`expected` is a logical vector (FALSE)
── Error (test_registration.R:11:5): lightgbm routine registration worked ──────
Error: no applicable method for 'getDLLRegisteredRoutines' applied to an object of class "NULL"
Backtrace:
█
1. └─base::getDLLRegisteredRoutines(dll_info[["path"]]) test_registration.R:11:4
[ FAIL 2 | WARN 0 | SKIP 3 | PASS 634 ]
Error: Test failures
Execution halted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! The problem with this test isn't actually about operating system. This test will fail with an error like the one above on LightGBM's CMake-based builds and pass on CRAN builds. That's because, by using install.libs.R
to build lib_lightgbm
instead of R's built-in machinery, the routine registration stuff doesn't work.
Given that as of this PR the package will now use literal R symbols (e.g. .Call(LGBM_BoosterGetCurrentIteration_R,...
), I don't think this test is necessary. Any failures in the registration will result in "object not found errors".
I'm going to remove this test, and will provide more explanation in a separate comment for how to make this work with CMake-based builds.
hmmm, seeing a few things in CI that I didn't see locally.
I'm going to remove reviewers and move this back to draft for now |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Can resolving #3016 help with this issue as well? |
Right now, I don't think so. I think the problem is that Line 62 in d517ba1
R CMD check --as-cran expects that the first argument to .Call() is one of the symbols you've registered. Like this: https://github.com/Rdatatable/data.table/blob/ec1259af1bf13fc0c96a1d3f9e84d55d8106a9a4/R/bmerge.R#L181.
I'm experimenting with this today. I think error handling can be moved into the C++ side and then |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
Ok, I think that this PR is ready for review! I know the description has a lot of information and this requires some deep knowledge of how R's build system works, so I'll summarize here:
As a result of the changes in this PR:
|
@@ -59,53 +57,6 @@ lgb.last_error <- function() { | |||
|
|||
} | |||
|
|||
lgb.call <- function(fun_name, ret, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lgb.call()
functions can be removed because once routine registration is used, referencing routines using strings does not work: #4155 (comment).
@@ -13,7 +13,12 @@ Dataset <- R6::R6Class( | |||
if (!lgb.is.null.handle(x = private$handle)) { | |||
|
|||
# Freeing up handle | |||
lgb.call(fun_name = "LGBM_DatasetFree_R", ret = NULL, private$handle) | |||
call_state <- 0L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though error-handling has now been moved into the C++ side (#4163), as of this PR passing in call_state
is still needed. The functions in https://github.com/microsoft/LightGBM/blob/b6c71e5e941b885b1a5169d460c430df9d392fa9/R-package/src/lightgbm_R.h has a return type LGBM_SE
, which is the LightGBM equivalent of R's built-in SEXP
type.
If you change those routines to all be void
(since they modify their inputs by reference and don't technically have to refer to anything), errors are thrown about not being able to convert a null to SEXP
.
So the call_state
thing is a simple way to allocate an LGBM_SE
(an R integer, in this case) that can then be returned.
As part of #3016 we'll be able to remove these unnecessary call_state
things, but I believe they're harmless for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent improvement for the package! I'm approving based on the detailed PR description, linked paragraph of article in comments and green CI checks.
Just two minor comments below.
# CMake-based builds can't currently use R's builtin routine registration, | ||
# so have to update NAMESPACE manually, with a statement like this: | ||
# | ||
# useDynLib(lib_lightgbm, LGBM_GetLastError_R, LGBM_DatasetCreateFromFile_R, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it will be easier to have this content by default in the file, and remove it for CRAN build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm I'm not sure. This looks like more code than having this in the file by default and removing it for CRAN builds, but I thought this R code would be easier to write than the equivalent code with sed
in build-cran-package.sh
. Just because it involves replacing / writing literal .
and literal )
/ (
and I was intimidated by figuring out how to do the escaping correctly in sed
haha.
Do you have a strong preference for this change? Otherwise, I'd prefer to leave this as-is for now and maybe make that change as some maintenance work in the future. Just because I'd like to keep pushing forward on #3016 and it doesn't seem like there is anything wrong about the current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, this was just a proposal to simplify the code. I'm totally fine with the working current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great, thank you very much!
return( | ||
lgb.call.return.str( | ||
fun_name = "LGBM_BoosterSaveModelToString_R" | ||
# Create buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe write a new helper function for calls with buffer allocation? I believe it will improve the code readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was exactly what lgb.call.return.str()
was. But since you cannot pass a string with a function name to .Call()
once registration is used (#4155 (comment)), I couldn't see a good way to keep a shared helper function.
I also think we'll be able to factor this code away completely soon, as part of #3016, by just creating the R character vector on the C++ side and returning it. See #4242 for a first example of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK, I see!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
This PR fixes a critical issue affecting Windows users of the R package. In #4045, several users reported a problem like "if I load certain R libraries before
{lightgbm}
,{lightgbm}
can cause the R session to crash when constructing a dataset".After some investigation (detailed in #4045 (comment)), I believe I found the root cause for this. The routine
R_registerRoutines()
atLightGBM/R-package/src/lightgbm_R.cpp
Line 724 in d517ba1
The key piece of documentation is at https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Converting-a-package-to-use-registration.
This PR adds that declaration to
lightgbm_R.cpp
and adds a unit test to prevent this type of mistake from being re-introduced in the future.Fixes #4045.
Addresses the Windows half of #4034.
I need to test if this would fix #4007. Will update this description once I try those.
Background
R provides an interface for "native routine registration", a mechanism to tell R exactly which symbols it should expect to find in a shared library (
.dll
/.so
). This makes calls to compiled code with.Call()
safer because R looks in a specific place.If that type of registration isn't done, then it can create the possibility for conflicts with other loaded DLLs. From https://cran.r-project.org/doc/manuals/r-release/R-exts.html#dyn_002eload-and-dyn_002eunload:
This note from https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines is relevant as well.
How to test this
You can run the following code to check if registration was successful.