Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] [ci] Add Windows CI for R package (fixes #2335) #2936

Merged
merged 93 commits into from
Apr 26, 2020

Conversation

jameslamb
Copy link
Collaborator

This PR adds CI support for the R package on Windows. It also bumps up the version of R we test against (on all operating systems) from 3.6.1 to 3.6.3.

As of this PR, 4 R CMD check NOTES remain on Windows. All of them can be addressed in later PRs and none of them were introduced by changes in this PR.

R CMD CHECK NOTES (click for details)

note-1

* checking package dependencies ... NOTE
Packages suggested but not available for checking:
  'ggplot2', 'knitr', 'rmarkdown'

note-2

* checking installed package size ... NOTE
  installed size is  6.9Mb
  sub-directories of 1Mb or more:
    libs   6.3Mb

note-3

* checking compiled code ... NOTE
File 'lightgbm/libs/x64/lib_lightgbm.dll':
  Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'

It is good practice to register native routines and to disable symbol
search.

note-4

* checking compiled code ... NOTE
Note: information on .o files for x64 is not available
File 'C:/projects/lightgbm/lightgbm.Rcheck/lightgbm/libs/x64/lib_lightgbm.dll':
  Found 'abort', possibly from 'abort' (C), 'runtime' (Fortran)
Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs. The detected symbols are linked into the code but
might come from libraries and not actually be called.
  • note-1: not relevant for us, won't actually show up on CRAN and we shouldn't incur the extra time to install those packages. CRAN will have all Suggests dependencies available.
  • note-2: only showing up on Windows. Probably some build files not caught by .Rbuildignore. Can be addressed in a follow-up PR.
  • note-3: addressed by [R-package] adding routine registration in R package (fixes #1910) #2911
  • note-4: Can be addressed in a follow-up PR.

Currently, .appveyor.yml on master only has code for the Python package. It has some code there (like setting up the test-env conda environment) that is identical to code in .vsts-ci.yml. In this PR, I decided to consolidate that code in .ci/test_windows.ps1. That made it easier to do things like if $env:TASK -eq "r-package", and is similar to how we handle Mac and Linux in .ci/test.sh.

The R portion of .ci/test_windows.ps1 borrows heavily from @StrikerRUS 's setup for the RGF package (https://github.com/RGF-team/rgf/blob/master/R-package/.R.appveyor.ps1). Thank you for that!

This PR has so many commits in it because I spent a lot of time getting this working in the Travis, Appveyor, and Azure DevOps setups for my fork.

successful builds from jameslamb/LightGBM fork

Azure DevOps

image

AppVeyor

image

Travis

image

This PR supports the progress on #629 and would give us more confidence in things like #2901 and #987

@jameslamb
Copy link
Collaborator Author

Azure is failing with this error, which I did see one other time during my testing:

image

This is a server for TeX packages. The one time I saw this while I was preparing this PR, it was fixed by just waiting a minute or two and rebuilding, so I'm going to close and reopen to trigger CI.

Everything on Travis is passing except our old friend, the check-docs task.

@jameslamb jameslamb closed this Mar 23, 2020
@jameslamb
Copy link
Collaborator Author

close-reopen to trigger CI

@StrikerRUS
Copy link
Collaborator

@jameslamb Awesome job, thanks!

I see Azure is failing again with the error you've mentioned. As we know that it occurs quite often, maybe we should retry download? For example: https://forums.hak5.org/topic/36911-powershell-retry-download-until-complete/?tab=comments#comment-268464

.ci/test_windows.ps1 Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

Ok I addressed and in e54954a.

Once I got past those, I also had to add some changes for MSVC. I realized we don't really need LIBR_LIB_DIR in FindLibR.cmake...it was just an intermediary used to find R.dll. So I tried changing some things there to get MSVC working.

On my personal Azure DevOps, the R build took 32 minutes to complete (but did complete successfully) and the Appveyor build timed out after an hour. I'm investigating more on my fork.

I think the issue here is that this PR was working for just "compile lib_lightgbm and build the R package that calls it". But now that as of #2901 we're linking to R's shared library, there are additional things I need to fix.

@jameslamb
Copy link
Collaborator Author

I'm currently working in my fork on getting the builds to work again now that we're linking to R.dll. I am seeing this tough-to-understand behavior where, when using MSVC, the build just hangs for an hour and dies: https://ci.appveyor.com/project/jameslamb/lightgbm/builds/31695220/job/ia9pdstbxx0r3anj

I'm not seeing that behavior locally on my laptop running Windows 10, using MSVC bundled with Visual Studio 16 2019. Will let you all know when this is ready for review, it was probably premature to turn it from Draft to Ready :/

@jameslamb
Copy link
Collaborator Author

jameslamb commented Mar 25, 2020

Ok here's where we stand:

  • Appveyor build with MinGW seems to be working, and ran in 14 minute for the R package which seems ok.
  • On AzureDevops, I'm still seeing an issue with downloading MiKTeX. The retry-logic is working but now it's hitting a different error where some step in miktexsetup.exe is trying to hit a server that's timing out. Did you ever face that in RGF @StrikerRUS ?

image

@StrikerRUS
Copy link
Collaborator

@jameslamb

Did you ever face that in RGF?

Never. I remember only one issue related to miktexsetup:

Update Miktex installation: https://miktex.org/announcement/miktex-portable-sfx-discontinued.
RGF-team/rgf#295

Maybe Azure blocks some IPs like Travis blocks apts outside of their whitelist?..
https://github.com/travis-ci/apt-package-safelist

@jameslamb
Copy link
Collaborator Author

@jameslamb

Did you ever face that in RGF?

Never. I remember only one issue related to miktexsetup:

Update Miktex installation: https://miktex.org/announcement/miktex-portable-sfx-discontinued.
RGF-team/rgf#295

Maybe Azure blocks some IPs like Travis blocks apts outside of their whitelist?..
https://github.com/travis-ci/apt-package-safelist

yeah I was thinking it might be something Azure-specific because I've only ever seen it there. That's a good idea!

I have an idea too, going to try it on my fork 😀

@StrikerRUS
Copy link
Collaborator

@jameslamb
If nothing helped, we can re-host installation package at our releases page like we do for AMD-APP-SDK

wget -q https://github.com/microsoft/LightGBM/releases/download/v2.0.12/AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2

build_r.R Outdated
@@ -5,6 +5,9 @@
# Sys.setenv("CXX" = "/usr/local/bin/g++-8")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not strictly related to this PR and is being added in #2957 . It'll disappear from the diff when that gets merged

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that #2957 has been merged, I'll rebase to master before the next commit I push here

@jameslamb
Copy link
Collaborator Author

@StrikerRUS thanks for the great review so far!

@Laurae2 @guolinke could one of you take a look at this? @StrikerRUS and I will keep talking through miscellaneous things in CI scripts, but I don't want to merge this until an R reviewer also approves.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb Can we fix this warning?

Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="C:/projects/lightgbm/RLibrary/R/lib": The system cannot find the file specified

Please check one more round of review below.

@guolinke Printing out build logs with MinGW allowed to spot many compilation warnings. Can you please check them?

.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Show resolved Hide resolved
.vsts-ci.yml Outdated Show resolved Hide resolved
R-package/src/cmake/modules/FindLibR.cmake Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

@jameslamb Can we fix this warning?

Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="C:/projects/lightgbm/RLibrary/R/lib": The system cannot find the file specified

Please check one more round of review below.

@guolinke Printing out build logs with MinGW allowed to spot many compilation warnings. Can you please check them?

On the point of compilation warnings...if we determine that they were not introduced by this PR (I really think they were not), let's please open a separate issue and not delay this PR any further.

The normalizePath() warning is fixed in #2963. The issue is that we use R.home('lib') but that function just constructs a directory. It doesn't guarantee that that directory exists. That code is used to generate LIBR_LIB_DIR, which is only necessary for MSVC builds. I completely removed it in #2963. It can safely be ignored in this PR and will be addressed in the followup MSVC PRs.

@StrikerRUS
Copy link
Collaborator

@jameslamb

On the point of compilation warnings...if we determine that they were not introduced by this PR (I really think they were not), let's please open a separate issue and not delay this PR any further.

I wanted to init the discussion here where we can see these logs live. Of course, they shouldn't prevent merging of this PR in any way.

The normalizePath() warning is fixed in #2963. The issue is that we use R.home('lib') but that function just constructs a directory.

Ah, OK!

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb Thanks a lot for all your hard work! I don't think that the rest minor comments should prevent from merging, but please check them.

.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
R-package/src/cmake/modules/FindLibR.cmake Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

Just merged in master to get all the new tests from #3002, #3004, #3005, #3006, #3007, and #3009.

@Laurae2 @guolinke once this builds, it is just waiting on a review from one of you whenever you have time. Thank you!

@@ -67,6 +67,7 @@ if (!use_precompile) {
# Check if Windows installation (for gcc vs Visual Studio)
if (WINDOWS) {
if (use_mingw) {
print("Trying to build with MinGW")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change in another PR print for message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure! Let's talk about it in a separate issue / PR. We have several other print() statements in install.libs.R and no uses of message() so I'm curious what makes message() preferable here.

} while(!$?);
}

$env:R_WINDOWS_VERSION = "3.6.3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should test with R 4.0 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think we should have a separate issue for testing on R 4.0.0 across all platforms, now that it's officially released. Good call!

I just opened #3024 to capture that work (and added it to #2302 )

set(LIBR_HOME ${LIBR_HOME} CACHE PATH "R home directory")
set(LIBR_EXECUTABLE ${LIBR_EXECUTABLE} CACHE PATH "R executable")
set(LIBR_INCLUDE_DIRS ${LIBR_INCLUDE_DIRS} CACHE PATH "R include directory")
set(LIBR_LIB_DIR ${LIBR_LIB_DIR} CACHE PATH "R shared libraries directory")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another PR, maybe we should change the name and use LIBR_LIB_DIRS instead of LIBR_LIB_DIR (cmake Find result variables should be plural for include_dirs / lib_dirs, usually they are placeholders for include_dir / lib_dir if they are singular directories).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it is a single directory. This isn't "all directories passed to the linker in -L flags", it is "what is the one directory where R.dll / R.so is likely to live".

Anyway, I'm removing LIBR_LIB_DIR completely in #2963 . It's only being used in MSVC installs, and only indirectly as a way to find LIBR_CORE_LIBRARY.

@jameslamb
Copy link
Collaborator Author

Thanks for the helpful reviews! I'm excited to merge this one 😀

@jameslamb jameslamb merged commit 2c18a0f into microsoft:master Apr 26, 2020
jameslamb added a commit to jameslamb/LightGBM that referenced this pull request Apr 26, 2020
@jameslamb jameslamb deleted the ci/r-windows-ci branch April 26, 2020 23:09
@StrikerRUS
Copy link
Collaborator

@guolinke Now as we run R-package tests at Appveyor we'll face corresponding job failures due to CRAN/CTAN network availability issues quite often (for example, refer to the latest master failure). I wonder, is it possible to give maintainers privileges to re-run failed jobs as we have for Travis/Azure?

It seems that it is technically possible now: https://www.appveyor.com/docs/team-setup/.
But it may require to manually transfer the current account (via e-mail request).

@jameslamb
Copy link
Collaborator Author

^ related to #2936 (comment), without that we can find ourselves in this situation where one long-running AppVeyor build is blocking all others from starting:

image

Notice that the build at the bottom has been running for 45 minutes. It is stuck trying to download something from a CTAN mirror. Ideally, I'd like to be able to kill it in the AppVeyor UI. But since I don't have permissions right now all other builds have to wait.

@StrikerRUS
Copy link
Collaborator

Gently ping @guolinke for #2936 (comment)

@guolinke
Copy link
Collaborator

@StrikerRUS @jameslamb
I have sent invitation to you.

@jameslamb
Copy link
Collaborator Author

@guolinke thanks! Although now that we have #3168 , as soon as that is merged AppVeyor should no longer be a problem 😂

That PR and all others are blocked until you can help us address #3000 (comment)

@StrikerRUS
Copy link
Collaborator

@guolinke Thanks a lot! Now I can re-run failed jobs.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants