-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] [ci] mingw gcc often fails on CMake builds #3469
Comments
@StrikerRUS I've opened an issue to document this problem, but I'm not sure we'll be able to reproduce it. It looks like it has not happened very much: https://github.com/microsoft/LightGBM/commits/master (notice how many have ✅ ) |
Those green checks are results of my everyday manual reruns of failed jobs. Most of them happen due to random network problems. So, if the first rerun is successful I even do not look into logs. |
Oh I see. That is going to make it hard to know how often this error happens, then. https://github.com/microsoft/LightGBM/actions?query=branch%3Amaster+workflow%3A%22R+GitHub+Actions%22 also only shows one result per commit |
I am still seeing this a lot. Here's a recent example: https://github.com/microsoft/LightGBM/pull/3510/checks?check_run_id=1337404177 Will look into it when I can. It's so weird....it seems like a problem with rtools, but we host that at https://github.com/microsoft/LightGBM/releases/tag/v2.0.12 so I'm now sure how it could be sometimes broken and sometimes successful. |
@jameslamb That error happens in almost every PR. Maybe the reason is in random PATH order? I think before you find a root cause we can remove unwanted Line 25 in c39afb9
|
It looks like the
I'll open a PR and start testing. |
I also want to add to this thread that it seems this job specifically is the one that often suffers from this issue - os: windows-latest
task: r-package
compiler: MINGW
toolchain: MINGW
r_version: 3.6
build_type: cmake |
I just tested and found that the PATH is identical on successful and failed jobs with this issue. I added this to every job in https://github.com/jameslamb/LightGBM/actions/runs/340974450 $env:PATH -Split ";" PATH on success
PATH on failure
|
I downloaded the So that rules out my theory that maybe the upload only partially succeeded when I uploaded it and we somehow have a not-quite-complete Rtools (I know it's a strange idea, just trying whatever I can think of) I just checked and
(found by adding this to the CI script) Get-FileHash ./Rtools.exe -Algorithm MD5 |
Ok next theory: maybe the environment is changing from job to job and something important like I don't think that's it. The output of environment on successful job
environment on failed job
|
next theory: the wrong The text below shows the output of running failed run
successful run
|
next theory: we're getting the wrong failed run
successful run
So that DOES seem like a problem (we should be getting the It's late here in Chicago so I'm going to stop debugging for tonight. This is the PR I've been using to test, if anyone wants to look at more logs: jameslamb#36 |
Looks like Chocolatey is a real problem for us. I think, normally, we shouldn't get any utility from its' folder, everything should be used from Rtools or other software we install manually (and not via Chocolatey). I believe we can disable Chocolatey for our builds as we do not use it. But I'm not sure how to do that: seems that removing from PATH will not help, real uninstallation is slow... |
That can't be the root of the problem by itself though, right? Since we're getting Now that I've confirmed that the PATH and environment are the same between successful and failed jobs, tonight I'm going to break out my Windows laptop and this things on there. Other theories I have:
|
Yeah, that's true! But I believe that the usage of |
yep I think you're right. I'm investigating stuff in a Windows environment tonight. Will keep trying things. |
ooooo ok I found something interesting! I moved my logging calls down further (after install of MikTeX), and now I see this in failed builds
And this in successful builds
So it seems like maybe installing MiKTeX is doing something that changes the PATH? Still very confusing, but the fact that we're getting a |
I'm going to keep dumping thoughts here... @StrikerRUS if you want to unsubscribe from this issue, I can |
I just tried hard-coding an exact path to WINDOWS_BUILD_TOOLS <- list(
"MinGW" = c(
build_tool = "C:\\Rtools\\mingw_64\\bin\\mingw32-make.exe" still got the same errors! https://github.com/jameslamb/LightGBM/runs/1345827662?check_suite_focus=true |
I MIGHT HAVE FOUND IT! Can't believe I missed it before. I see this at the top of PATH
But looking on my Windows environment, Rtools stores mingw32-make and ld at Which we knew, because it's in the LightGBM R package documentation: https://github.com/microsoft/LightGBM/tree/master/R-package#installing-from-source-with-cmake- Testing now... Ok that did seem to fix the issue with Chocolatey build tools being found: but some builds are still failing https://github.com/jameslamb/LightGBM/runs/1345873666?check_suite_focus=true |
ugh now it's back to failing again: https://github.com/jameslamb/LightGBM/runs/1345978487?check_suite_focus=true |
I remember Appveyor gives a possibility to connect via RDP for free.
No, thanks, everything is fine! I'm OK with skimming all messages. |
WAT?! So weird! https://github.com/microsoft/LightGBM/runs/1398022470
|
WHAT IS HAPPENING 😫 I'm so confused. It is strange that the paths switched from forward slashes to backslashes, right?
I don't understand what is happening 😕 |
After rerun another error (we have already seen it) is shown:
|
UGH |
New error:
|
One another popular error report about conflict installations:
|
@StrikerRUS just confirming...was that new error also only seen on the MINGW + R 3.6 + CMake build? |
I think so... Sorry, I didn't pay much attention, so maybe it was R4 + CMake, not sure. |
I used this awesome hack to connect to GitHub Actions runner via RDP and found that LightGBM/.ci/test_r_package_windows.ps1 Lines 94 to 96 in 6356e65
|
Maybe we can containerize our Windows builds for R jobs with the aim to minimize weird build errors caused by overpopulated by various packages environment? |
WHOA that's amazing that you can do that! Yes I definitely think that could be related. I also didn't know that there were any free Windows containers. Can that only be used on Docker for Windows? |
Seems to be so:
|
I'm going to pick this up tonight, thanks for the excellent tips @StrikerRUS ! This RDP hack gives me a whole set of new ideas to try. |
This comment shouldn't affect merging of #4053. I think a better long-term workaround for strange environment problems will be using Windows containers. It looks like it is not difficult: |
* [ci] add CMake + R 3.6 test back (fixes #3469) * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update .ci/test_r_package_windows.ps1 * -Wait and remove rtools40 * empty commit Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* [docs]Add alt text on images * Update docs/GPU-Windows.rst Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update docs/GPU-Windows.rst Co-authored-by: James Lamb <jaylamb20@gmail.com> * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * Merge main branch commit updates (#1) * [docs] Add alt text to image in Parameters-Tuning.rst (#4035) * [docs] Add alt text to image in Parameters-Tuning.rst Add alt text to Leaf-wise growth image, as part of #4028 * Update docs/Parameters-Tuning.rst Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: James Lamb <jaylamb20@gmail.com> * [ci] [R-package] upgrade to R 4.0.4 in CI (#4042) * [docs] update description of deterministic parameter (#4027) * update description of deterministic parameter to require using with force_row_wise or force_col_wise * Update include/LightGBM/config.h Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update docs Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * [dask] Include support for init_score (#3950) * include support for init_score * use dataframe from init_score and test difference with and without init_score in local model * revert refactoring * initial docs. test between distributed models with and without init_score * remove ranker from tests * test value for root node and change docs * comma * re-include parametrize * fix incorrect merge * use single init_score and the booster_ attribute * use np.float64 instead of float * [ci] ignore untitle Jupyter notebooks in .gitignore (#4047) * [ci] prevent getting incompatible dask and distributed versions (#4054) * [ci] prevent getting incompatible dask and distributed versions * Update .ci/test.sh Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * empty commit Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * [ci] fix R CMD CHECK note about example timings (fixes #4049) (#4055) * [ci] fix R CMD CHECK note about example timings (fixes #4049) * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * empty commit Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * [ci] add CMake + R 3.6 test back (fixes #3469) (#4053) * [ci] add CMake + R 3.6 test back (fixes #3469) * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update .ci/test_r_package_windows.ps1 * -Wait and remove rtools40 * empty commit Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * [dask] include multiclass-classification task in tests (#4048) * include multiclass-classification task and task_to_model_factory dicts * define centers coordinates. flatten init_scores within each partition for multiclass-classification * include issue comment and fix linting error * Update index.rst (#4029) Add alt text to logo image Co-authored-by: James Lamb <jaylamb20@gmail.com> * [dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059) * [dask] raise more informative error for duplicates in 'machines' * uncomment * avoid test failure * Revert "avoid test failure" This reverts commit 9442bdf. * [dask] add tutorial documentation (fixes #3814, fixes #3838) (#4030) * [dask] add tutorial documentation (fixes #3814, fixes #3838) * add notes on saving the model * quick start examples * add examples * fix timeouts in examples * remove notebook * fill out prediction section * table of contents * add line back * linting * isort * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * move examples under python-guide * remove unused pickle import Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * set 'pending' commit status for R Solaris optional workflow (#4061) * [docs] add Yu Shi to repo maintainers (#4060) * Update FAQ.rst * Update CODEOWNERS * set is_linear_ to false when it is absent from the model file (fix #3778) (#4056) * Add CMake option to enable sanitizers and build gtest (#3555) * Add CMake option to enable sanitizer * Set up gtest * Address reviewer's feedback * Address reviewer's feedback * Update CMakeLists.txt Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * added type hint (#4070) * [ci] run Dask examples on CI (#4064) * Update Parallel-Learning-Guide.rst * Update test.sh * fix path * address review comments * [python-package] add type hints on Booster.set_network() (#4068) * [python-package] add type hints on Booster.set_network() * change behavior * [python-package] Some mypy fixes (#3916) * Some mypy fixes * address James' comments * Re-introduce pass in empty classes * Update compat.py Remove extra lines * [dask] [ci] fix flaky network-setup test (#4071) * [tests][dask] simplify code in Dask tests (#4075) * simplify Dask tests code * enable CI * disable CI * Revert "[ci] prevent getting incompatible dask and distributed versions (#4054)" (#4076) This reverts commit 4e9c976. * Fix parsing of non-finite values (#3942) * Fix index out-of-range exception generated by BaggingHelper on small datasets. Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero. * Update goss.hpp * Update goss.hpp * Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as of array of pointers to rows (as opposed to existing method LGBM_BoosterPredictForMat which requires data given as contiguous array) * Fix incorrect upstream merge * Add link to LightGBM.NET * Fix indenting to 2 spaces * Dummy edit to trigger CI * Dummy edit to trigger CI * remove duplicate functions from merge * Fix parsing of non-finite values. Current implementation silently returns zero when input string is "inf", "-inf", or "nan" when compiled with VS2017, so instead just explicitly check for these values and fail if there is no match. No attempt to optimise string allocations in this implementation since it is usually rarely invoked. * Dummy commit to trigger CI * Also handle -nan in double parsing method * Update include/LightGBM/utils/common.h Remove trailing whitespace to pass linting tests Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com> Co-authored-by: Guolin Ke <guolin.ke@outlook.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * [dask] remove unused imports from typing (#4079) * Range check for DCG position discount lookup (#4069) * Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data. * Change debug logging location so that we can print the data file name as well. * Revert "Change debug logging location so that we can print the data file name as well." This reverts commit 3981b34. * Add data file name to debug logging. * Move log line to a place where it is output even when query IDs are read from a separate file. * Also add the out-of-range check to rank metrics. * Perform check after number of queries is initialized. * Update * [ci] upgrade R CI scripts to work on Ubuntu 20.04 (#4084) * [ci] install additional LaTeX packages in R CI jobs * update autoconf version * bump upper limit on package size to 100 * [SWIG] Add streaming data support + cpp tests (#3997) * [feature] Add ChunkedArray to SWIG * Add ChunkedArray * Add ChunkedArray_API_extensions.i * Add SWIG class wrappers * Address some review comments * Fix linting issues * Move test to tests/test_ChunkedArray_manually.cpp * Add test note * Move ChunkedArray to include/LightGBM/utils/ * Declare more explicit types of ChunkedArray in the SWIG API. * Port ChunkedArray tests to googletest * Please C++ linter * Address StrikerRUS' review comments * Update SWIG doc & disable ChunkedArray<int64_t> * Use CHECK_EQ instead of assert * Change include order (linting) * Rename ChunkedArray -> chunked_array files * Change header guards * Address last comments from StrikerRUS * store all CMake files in one place (#4087) * v3.2.0 release (#3872) * Update VERSION.txt * update appveyor.yml and configure * fix Appveyor builds Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: StrikerRUS <nekit94-12@hotmail.com> * [ci] Bump version for development (#4094) * Update .appveyor.yml * Update cran-comments.md * Update VERSION.txt * update configure Co-authored-by: James Lamb <jaylamb20@gmail.com> * [ci] fix flaky Azure Pipelines jobs (#4095) * Update test.sh * Update setup.sh * Update .vsts-ci.yml * Update test.sh * Update setup.sh * Update .vsts-ci.yml * Update setup.sh * Update setup.sh Co-authored-by: Subham Agrawal <34346812+subhamagrawal7@users.noreply.github.com> Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: jmoralez <jmoralz92@gmail.com> Co-authored-by: marcelonieva7 <72712805+marcelonieva7@users.noreply.github.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Deddy Jobson <dedjob@hotmail.com> Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com> Co-authored-by: mjmckp <mjmckp@users.noreply.github.com> Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com> Co-authored-by: Guolin Ke <guolin.ke@outlook.com> Co-authored-by: ashok-ponnuswami-msft <57648631+ashok-ponnuswami-msft@users.noreply.github.com> Co-authored-by: StrikerRUS <nekit94-12@hotmail.com> * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: Subham Agrawal <34346812+subhamagrawal7@users.noreply.github.com> Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: jmoralez <jmoralz92@gmail.com> Co-authored-by: marcelonieva7 <72712805+marcelonieva7@users.noreply.github.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Deddy Jobson <dedjob@hotmail.com> Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com> Co-authored-by: mjmckp <mjmckp@users.noreply.github.com> Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com> Co-authored-by: Guolin Ke <guolin.ke@outlook.com> Co-authored-by: ashok-ponnuswami-msft <57648631+ashok-ponnuswami-msft@users.noreply.github.com> Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
See https://github.com/microsoft/LightGBM/runs/1264985038?check_suite_focus=true, from #3443 (comment)
full logs
The text was updated successfully, but these errors were encountered: