Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build integrated Python package library #3144

Merged
merged 31 commits into from
Sep 21, 2020

Conversation

tpboudreau
Copy link
Contributor

@tpboudreau tpboudreau commented Jun 3, 2020

PLEASE DO NOT MERGE WITHOUT MODIFICATION -- SEE BELOW.

This PR introduces a new build option that builds lib_lightgbm.{so,dll} with the GPU code enabled, but without any runtime dependency on an OpenCL ICD loader or Boost libraries. It does this by integrating the Khronos ICD Loader and Boost filesystem and system libraries into the LightGMB shared library at build time (akin to static linking). This build strategy is useful for building the library for inclusion in a Python wheel targeted for Windows, as it allows the package to run on both GPU and non-GPU (OpenCL) equipped machines.

There are at least two issues that must be resolved before merging:

  1. the static build technique also works on Linux, but will require a small upstream change in the build options for the Khronos ICD library. For now, I'm referencing my fork with that change in the Khronos build script. Depending on community feedback (that is, whether this build option would be useful on Linux too) I'll either pursue the upstream change in the Khronos library or pull out the Linux build code from this PR.

  2. Ideally, when building a dll for inclusion in a Windows wheel, the library name would contain a unique stamp or hash of some kind as a failsafe to avoid conflicts with other copies of the library that might be present on the runtime system (see for example this discussion: https://discuss.python.org/t/packaging-dlls-on-windows/1401). The current dll build options don't do this. I've included logic in the new build path to generate such a stamp, but I've commented out the lines of the script that rename the final library until I hear back from the community on whether this approach is desirable. (It would require some changes to setup.py and perhaps other files, and maybe it belongs in a different PR.)

To run this build you can do: "python setup.py install --opencl-python-package".

Alternatively, to build directly from a git-bash command line on Windows run:

mkdir build && cd build
cmake .. -D__OPENCL_PYTHON_PACKAGE=ON -Wno-dev
cmake --build . --target _lightgbm --config "Release"

You can also build with the equivalent MSVC GUI actions.

On Linux, build with:

mkdir build && cd build
cmake -D__OPENCL_PYTHON_PACKAGE=ON -Wno-dev -DCMAKE_BUILD_TYPE=Release ..
make

(As mentioned above, the Linux build won't run to completion until we resolve the Khronos upstream change issue.) [EDIT: the Linux build should work now, but the upstream change issue still must be resolved before merging.]

@ghost
Copy link

ghost commented Jun 3, 2020

CLA assistant check
All CLA requirements met.

@itamarst itamarst mentioned this pull request Jun 3, 2020
7 tasks
set(USE_TIMETAG OFF CACHE BOOL "")
set(USE_DEBUG OFF CACHE BOOL "")
set(BUILD_FOR_R OFF CACHE BOOL "")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have other comments to offer yet, but can you please be sure that CMake cache files are not checked into the repo?

Maybe add a *Cache.txt entry to .gitignore, I don't think that would conflict with anything we have intentionally checked in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the early feedback.

The file was meant as a cmake preload file, but since it's likely to cause further confusion, I've moved its contents into the main CMakeLists.txt file and removed the offending file.

@tpboudreau tpboudreau changed the title [WIP] Build integrated Python package library Build integrated Python package library Jun 9, 2020
@tpboudreau
Copy link
Contributor Author

I don't believe the failing test (doc build) is related to my patch.

@tpboudreau
Copy link
Contributor Author

Just checking in to see if there are any comments on this PR. I don't believe the test failures are related to my changes (two are R-package issues and one seems like a transient download error). Thanks.

@guolinke
Copy link
Collaborator

guolinke commented Jul 7, 2020

@StrikerRUS can you help to review this PR?

@StrikerRUS
Copy link
Collaborator

@guolinke

can you help to review this PR?

Sorry, I'm not sure that I'm able to provide a thoughtful review for this PR as I'm not familiar enough with OpenCl. I believe that @huanzhang12 should be the main reviewer for this PR. Of course, after his review I'll review the Python part.

@itamarst
Copy link
Contributor

@huanzhang12 any chance you could review this? Thank you!

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpboudreau Thanks a lot for your amazing work! Please take a look at one more round of review when have time.

CMakeIntegratedOpenCL.cmake Outdated Show resolved Hide resolved
CMakeIntegratedOpenCL.cmake Outdated Show resolved Hide resolved
CMakeIntegratedOpenCL.cmake Show resolved Hide resolved
endif()
list(APPEND INTEGRATED_OPENCL_INCLUDES ${OPENCL_ICD_LOADER_HEADERS_DIR})
list(APPEND INTEGRATED_OPENCL_LIBRARIES ${opencl-icd-loader_BINARY_DIR}/Release/OpenCL.lib cfgmgr32.lib runtimeobject.lib)
list(APPEND INTEGRATED_OPENCL_DEFINITIONS CL_TARGET_OPENCL_VERSION=120)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please comment on this from compatibility side? Why 1.2?

Copy link
Contributor Author

@tpboudreau tpboudreau Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be the latest earliest stable version that contains all the required api's (and 2.x seems to have been a long experiment: https://www.extremetech.com/computing/309842-opencl-3-0-kicks-off-with-a-huge-step-backwards).

But the library compiles without specifying the version (with a few extra messages), so we can remove this if you prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! I'm afraid I have too little knowledge about OpenCL to take a responsibility for this change. I'd better leave it for your decision or more experienced in OpenCL maintainer.

I just want to make it possible that as many as possible users will be able to simply download our wheel and run LightGBM with GPU support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, including the 1.2 guard when you compile the headers prevents you from inadvertently including a call to a 2.x function, which would limit the library to newer implementations. OpenCL versions are backward compatible, so a 2.x implementation can run a 1.2 application/library. (See item 3 here for example: http://developer.amd.com/wordpress/media/2013/12/AMD_APP_SDK_FAQ1.pdf)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Thanks again!

CMakeIntegratedOpenCL.cmake Outdated Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
set(BOOST_BOOTSTRAP "${BOOST_BASE}/source/bootstrap.bat")
set(BOOST_BUILD "${BOOST_BASE}/source/b2.exe")
set(BOOST_FLAGS "")
list(APPEND BOOST_SUBMODULES "libs/*" "tools/*")
Copy link
Collaborator

@StrikerRUS StrikerRUS Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can workaround 1.5Gb+ size problem by specifying only required submodules here.

IIRC the issue is that all boost libraries were being cloned even though only a handful were needed and I couldn't figure out a way to get around this.
#3144 (comment)

Starting from chrono, filesystem, headers, system, tools/* by trials and errors I came up with the following list which allows to compile successfully, but I bet you know better which submodules are needed. Also, I believe we can reduce number of submodules under the tools folder.

Suggested change
list(APPEND BOOST_SUBMODULES "libs/*" "tools/*")
list(APPEND BOOST_SUBMODULES "libs/algorithm" "libs/align" "libs/any" "libs/array" "libs/assert" "libs/bind" "libs/chrono" "libs/compute" "libs/concept_check" "libs/config" "libs/container" "libs/container_hash" "libs/core" "libs/detail" "libs/filesystem" "libs/foreach" "libs/format" "libs/function" "libs/function_types" "libs/fusion" "libs/headers" "libs/integer" "libs/io" "libs/iterator" "libs/lexical_cast" "libs/math" "libs/move" "libs/mpl" "libs/multi_index" "libs/numeric/conversion" "libs/optional" "libs/predef" "libs/preprocessor" "libs/property_tree" "libs/range" "libs/ratio" "libs/serialization" "libs/smart_ptr" "libs/static_assert" "libs/system" "libs/throw_exception" "libs/tuple" "libs/typeof" "libs/type_index" "libs/type_traits" "libs/utility" "libs/uuid" "libs/winapi" "tools/boost_install" "tools/build")

I know, this list is looking super scary, but explicitly listing submodules it is possible to decrease build time and downloading size significantly (1.5Gb+ -> 700Mb)

Copy link
Collaborator

@StrikerRUS StrikerRUS Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe we can reduce number of submodules under the tools folder.

OK, I've experimented more and it looks like we do not need submodules from tools/ at all. I'm updating my yesterday original comment with new suggestion.

UPD: Was wrong. tools/boost_install and tools/build are needed.

Copy link
Contributor Author

@tpboudreau tpboudreau Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StrikerRUS , thank you for doing this exploration.

I'm not opposed to this change, and I've committed the suggested patch, but I think there are some considerations the maintainers should be aware of.

One of the challenges in this patch was coaxing three different build systems (Boost's B2, cmake, and MSVC/make) to cooperate. And as you know I was originally targeting more than just the Windows platform. I found that the Boost dependency list generated at build time differed significantly by platform -- on Linux (generally) only the few expected dependent submodules were built, on Windows (generally), many more submodules were fetched. This was a bit surprising, but I didn't spend a lot of time chasing down why, because it seemed to me that downloading all submodules was always safe, and that even if I figured it all out for each platform, it would lead to complicated platform checking code that might turn out to be fragile in the long run (meaning a risk of not working anymore if the Boost libs or any of the three build systems changed or started interacting differently).

Of course I realize that there may be costs associated with fetching and processing code unecessarily in the CI pipeline, and also that non-Windows builds may never be needed, so I'm OK with the suggested change. I just wanted to alert you all that you may have to rethink this targeted download approach if other platforms are added or the build becomes unstable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand! Please correct me: as long as we fix a particular commit in the boost repository and building the library only for Windows users, it is safe to go with my list. In future we can update this list to support building on Linux machines or add conditioning if(WIN32) ... else() ... or fall back to downloading all submodules for all OSes.

Despite that we are using only free CI services (BTW, it will be not true anymore after introducing CUDA support #3160), we build nightly artifacts for each commit in master and in general I believe it is good practice to download things only that are needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but just my curiosity. Don't you know were there any changes in macOS related to OpenCL support? Refer to #1523 (comment) and #667. I believe some macOS users with eGPUs will benefit a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I'm not too familiar with the details of OpenCL on MacOS. But the #667 issue you linked to seems (at first glance) like it could be resolved by an approach similar to this PR. I'll look into it further when I get a bit of time.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpboudreau Excellent contribution! I hope it will greatly help to make GPU installation easier for users. Just one super minor comment below.

CMakeLists.txt Outdated Show resolved Hide resolved
@tpboudreau
Copy link
Contributor Author

Looks green after the latest sync to master.

Thanks @StrikerRUS and @jameslamb for your attention and contributions to this PR.

@StrikerRUS
Copy link
Collaborator

@tpboudreau Could you please resolve two conflicts and I think we can merge this PR.

@tpboudreau
Copy link
Contributor Author

I think this is ready for merging. I believe the failing test is a resource availability issue unrelated to this patch -- please let me know if you think otherwise.

@StrikerRUS
Copy link
Collaborator

Hey @tpboudreau @itamarst ! Is there any news about integration of OpenCL on non-Windows platforms?

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants