-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java SWIG - how to extract featureNames? #2814
Comments
I really need to use that function, so I made an extension to the lightgbmlib.i SWIG interface as an MVP just to make sure I can extract the feature names in my project until I get a reply from you on how to do it. Disclaimer To avoid touching any existing code at all, I added a single line at the end of lightgbmlib.i In that interface file I create a class FeatureNames + typedef FeatureNamesHandle that manages an array of fixed length strings. I then build a wrapper to the This is how you use it: `
|
ping @imatiach-msft |
@AlbertoEAF yes, some of the SWIG wrappers don't work because SWIG can't handle string arrays automatically, I only ensured the ones that are used by mmlspark work. For this method, we will need to add another wrapper similar to this custom one that I added in lightgbmlib.i: https://github.com/microsoft/LightGBM/blob/master/swig/lightgbmlib.i#L76
|
@AlbertoEAF just read your other comment. I like this style too, I'm fine either way, but we should make sure that LGBM_BoosterGetEvalNamesSWIG is written in a consistent way too then. If you could either re-write the above to be in the same style or rewrite LGBM_BoosterGetEvalNamesSWIG to be in the style above that would be great, I just want to make sure the code is consistent. |
note to self - it looks like this array is not freed in mmlspark after we allocate it in mmlspark when calling LGBM_BoosterGetEvalNamesSWIG: |
Nice! Yeah, I've seen other cases where MMLSpark does not delete every resource it creates, it's not just that one, there are other memory leaks. If you're fine with it I'll clean my code up, address LGBM_BoosterGetEvalNamesSWIG as well and I'll send you a pull request where you're free to tell me if you want me to change anything and I'll follow your suggestions. I think cleaning up the FeatureNames (FixedLengthStringArray class) and passing it around as a first class citizen whenever we need arrays of strings without SWIG messing it up is a good approach, so we can manage memory properly. I'm also going to send the SWIG guys an issue regarding what's the proper wrapping for arrays of strings so we follow their recommended best practices and possibly simplify our wrapping of string arrays with maybe a fancy typemap. I have my main work to do but should have something next week or the other after @imatiach-msft if that's ok for you, unless you plan to address that sooner. |
@AlbertoEAF that sounds terrific, thanks, I'm looking forward to your PR |
Hello @imatiach-msft, If you look at their answer, there's a module to handle char**<->String[] conversion ( However, unless we want to rewrite our C API (I don't), we have the extra constraint of needing to pass previously-allocated memory on the C side. Proposed solution summary Thus, I propose that we stick to the custom FixedLenStringArray class (previously called FeatureNames) which will manage memory resources for arrays of strings and we can pass it around in C++ and Java, but use the Proposed solution specifics We can create wrappers to the original functions by replacing the We'll then add a method to the class which returns the raw char ** ptr:
as long as that array is null-terminated (OK, because the class can handle that logic), and thus, getting the full String[] in Java will be one call, no need to do a for loop to extract the parameters like in the opening comment for this issue. What do you think? Extra questions
|
Ok, I'm submitting a pull request, I'll move the discussion there. |
@AlbertoEAF this looks great, terrific work "unless we want to rewrite our C API (I don't)" "we stick to the custom FixedLenStringArray class..." Extra questions 1.) not sure yet |
Hello,
I'm having issues calling the SWIG wrappers in Java to extract the model feature names.
I get a segmentation fault when calling LGBM_BoosterGetFeatureNames.
In the Java-exposed lightgbmJNI.LGBM_BoosterGetFeatureNames method, the parameters are:
LGBM_BoosterGetFeatureNames(long boosterHandle, long outNumFeaturesPtr, String[] strings)
(https://lightgbm.readthedocs.io/en/latest/C-API.html#c.LGBM_BoosterGetFeatureNames)
The issue is that LGBM_BoosterGetFeatureNames requires an already allocated array of strings on the C side, which I cannot do if I pass a java String[] object. This then results in a memory segfault.
I've seen this comment related to implementing typemaps in SWIG for trickier cases #909 (comment).
I also saw some notices in the swig interface file:
/* Note: there is a bug in the SWIG generated string arrays when creating a new array with strings where the strings are prematurely deallocated */ %array_functions(char *, stringArray)
Am I using the API incorrectly or currently this function cannot be used?
Thank you! :D
The text was updated successfully, but these errors were encountered: