[R-package] test on non-ASCII feature names fails on debian + clang CRAN check #4105

Closed
jameslamb opened this issue Mar 24, 2021 · 12 comments

Comments

@jameslamb
Collaborator

Description

{lightgbm} 3.2.0 was recently released to CRAN (#3872, https://cran.r-project.org/web/packages/lightgbm/index.html). One of the CRAN checks is currently failing while running unit tests.

https://cran.r-project.org/web/checks/check_results_lightgbm.html


  == Failed tests ================================================================
  -- Failure (test_basic.R:1233:5): lgb.train() supports non-ASCII feature names --
  dumped_model[["feature_names"]] not identical to `feature_names`.
  4/4 mismatches
  x[1]: "F_é\u009b¶"
  y[1]: "F_<U+96F6>"
  
  x[2]: "F_äž\u0080"
  y[2]: "F_<U+4E00>"
  
  x[3]: "F_äº\u008c"
  y[3]: "F_<U+4E8C>"
  
  x[4]: "F_äž\u0089"
  y[4]: "F_<U+4E09>"
  
  [ FAIL 1 | WARN 0 | SKIP 3 | PASS 633 ]
  Error: Test failures
  Execution halted
checking PDF version of manual ... OK
checking for non-standard things in the check directory ... OK
DONE
Status: 1 ERROR

Full logs are too large to post, but they'll be available at https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/lightgbm-00check.html for a few days.
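
One possible reading of the mismatch above: the `x` values look like the UTF-8 bytes of the expected names decoded with a single-byte encoding. The sketch below is in Python rather than R so that it does not depend on the session locale. It assumes the single-byte encoding is ISO-8859-15; the thread does not state the check machine's locale, so that choice is a guess based on the `ž` in the output. The helper names are made up for illustration.

```python
# Expected feature names, taken from the <U+XXXX> escapes in the y[...] lines.
FEATURE_NAMES = ["F_\u96f6", "F_\u4e00", "F_\u4e8c", "F_\u4e09"]

# Assumption: the bytes were misread as ISO-8859-15 (not confirmed in the thread).
ASSUMED_WRONG_ENCODING = "iso8859_15"


def misread_utf8(name, wrong_encoding=ASSUMED_WRONG_ENCODING):
    """Encode as UTF-8, then decode the same bytes with a single-byte codec."""
    return name.encode("utf-8").decode(wrong_encoding)


def undo_misread(garbled, wrong_encoding=ASSUMED_WRONG_ENCODING):
    """Reverse misread_utf8: recover the bytes and decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


garbled = [misread_utf8(name) for name in FEATURE_NAMES]
repaired = [undo_misread(text) for text in garbled]

for original, bad in zip(FEATURE_NAMES, garbled):
    # ascii() makes the invisible control characters visible as escapes.
    print(ascii(original), "->", ascii(bad))
```

Under that assumption, the garbled strings match the `x[1]` to `x[4]` values in the log, which would point at an encoding-declaration problem rather than corrupted bytes.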

Reproducible example

N/A - this failure happened on CRAN. I haven't had a chance yet to try it on the equivalent R Hub environment.

Environment info

Details on this test can be found at https://cran.r-project.org/web/checks/check_flavors.html#r-devel-linux-x86_64-debian-clang.

Additional Comments

{lightgbm}'s CRAN package is currently only tested with gcc:

- os: ubuntu-latest
  task: r-package
  compiler: gcc
  r_version: 4.0
  build_type: cran

although the R package built with CMake is tested on Linux with clang:

- os: ubuntu-latest
  task: r-package
  compiler: clang
  r_version: 3.6
  build_type: cmake
- os: ubuntu-latest
  task: r-package
  compiler: clang
  r_version: 4.0
  build_type: cmake

@jameslamb
Collaborator Author

Some additional notes:

  1. I'll open a PR tonight that just adds a testthat::skip_on_cran() to the one problematic test. Then we can avoid being removed from CRAN and can fix the issue on our own timeline.
  2. Based on previous interactions with CRAN, I won't re-submit until they send an email asking us to. @guolinke can you please let me know if you receive any emails from CRAN?
  3. Tonight, I'll also try adding a CI job that uses clang on ubuntu-latest with the cran package to see if this would have caught it. If not, I'll try running on r-devel-linux-x86_64-debian-clang using R Hub.

@StrikerRUS
Collaborator

I also see some weird warnings in the logs:

     [LightGBM] [Warning] Unknown parameter: 0x76d8268>
     [LightGBM] [Warning] Unknown parameter: valids
     [LightGBM] [Warning] Unknown parameter: 0x76d8268>
     [LightGBM] [Warning] Unknown parameter: valids
     [LightGBM] [Warning] Unknown parameter: 0x76d8268>
     [LightGBM] [Warning] Unknown parameter: valids

...

     [LightGBM] [Warning] Unknown parameter: categorical_featurs
     [LightGBM] [Warning] Unknown parameter: categorical_featurs
     [LightGBM] [Warning] Unknown parameter: categorical_featurs
     [LightGBM] [Warning] Unknown parameter: categorical_featurs

...

     [LightGBM] [Warning] Unknown parameter: 2,3
     [LightGBM] [Warning] Unknown parameter: 127,3
     [LightGBM] [Warning] Unknown parameter: 2,3
     [LightGBM] [Warning] Unknown parameter: 2,3

...

     [LightGBM] [Warning] Unknown parameter: 2,3
     [LightGBM] [Warning] Unknown parameter interaction_constraints=cap-shape=bell
     [LightGBM] [Warning] Unknown parameter cap-shape=conical,cap-shape=convex
     [LightGBM] [Warning] Unknown parameter: ,
     [LightGBM] [Warning] Unknown parameter: 2,3

Are they all from that failing test?..

@jameslamb
Collaborator Author

hmmm not sure! Will have to investigate. Those all seem related to places where we take an R vector, concatenate its values, and write them to a string, so they could also be related to encoding issues.

@StrikerRUS
Collaborator

categorical_featurs

This one looks like a typo (categorical_featurEs). It seems to come from here (not the failing test), so I guess we see logs from ALL tests on the CRAN page if at least one of them fails.

, categorical_featurs = 1L
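
As an aside, a misspelling like this can be flagged with a "did you mean" hint instead of only a warning. The following is a rough Python sketch, not how LightGBM handles unknown parameters. The allow-list is a hypothetical stand-in that holds only two names mentioned in this thread, and `suggest_parameter` is an invented helper.

```python
import difflib

# Hypothetical allow-list for illustration only: just names seen in this thread.
KNOWN_NAMES = ("categorical_features", "interaction_constraints")


def suggest_parameter(name, known=KNOWN_NAMES, cutoff=0.8):
    """Return the closest known name for a misspelled one, or None."""
    if name in known:
        return None  # already valid, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for name in ("categorical_featurs", "interaction_constraints", "2,3"):
    print(repr(name), "->", suggest_parameter(name))
```

A value fragment such as `2,3` gets no suggestion, which fits the idea that those warnings have a different cause than the typo.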

@jameslamb
Collaborator Author

ah!!! Good eye.

@jameslamb
Collaborator Author

I wasn't able to add a CI test for this and investigate further, will try to check tomorrow. I've opened #4109 to fix the categorical_features typo. I also kicked off an R Hub build on debian-clang-devel, with the package built from master.

Posting the link here, it should be done in the next few minutes but I'm signing off for the day: https://builder.r-hub.io/status/lightgbm_3.2.0.99.tar.gz-2492c0f28ca84993b53a3778f3dc5138

@jameslamb
Collaborator Author

Well, "good" news...the R Hub check failed with exactly the same error we got on CRAN.


So at least that means that check is probably a reliable way to replicate CRAN, and therefore a good way to test fixes.

@StrikerRUS
Collaborator

@jameslamb

guolinke can you please let me know if you receive any emails from CRAN?

As I remember, he asked us to email him about important things.

If not, I'll try running on r-devel-linux-x86_64-debian-clang using R Hub.

I think we can replicate the relevant parts or use this Dockerfile directly.
https://github.com/r-hub/rhub-linux-builders/blob/master/debian-clang-devel/Dockerfile

@jameslamb
Collaborator Author

Oh right! Will send an email right now.

@jameslamb
Collaborator Author

Guolin emailed me back...CRAN has not asked for a re-submission. So this issue should still be fixed, but it seems that right now we don't need to worry about the time pressure of re-submitting to CRAN.

@jameslamb
Collaborator Author

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023