Move ResourceInformation abstract base class to FWCore/AbstractServices, and few additional improvements #47280

Merged
4 commits merged on Feb 11, 2025

@makortel makortel commented Feb 6, 2025

PR description:

This PR is part of #30044.

It primarily moves the ResourceInformation abstract base class to a new FWCore/AbstractServices package so that, in a subsequent PR, ResourceInformation can return a HardwareResourcesDescription object (added in #47175, located in DataFormats/Provenance). We can't make FWCore/Utilities depend on DataFormats/Provenance, and moving a service base class to DataFormats/Provenance didn't feel good either. We will later move other abstract base classes from FWCore/Utilities to FWCore/AbstractServices.

In addition, to prepare for HardwareResourcesDescription, this PR replaces the enum for "available accelerator types" with string(s) for "selected accelerators", which are exactly what comes from process.options.accelerators.
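
As a rough illustration of the enum-to-strings change (a sketch only, not the code in this PR): the class `SelectedAccelerators` and its member names below are made up for the example, and the accelerator names in the comments are assumed to be the usual values of process.options.accelerators.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in, not a CMSSW class: instead of mapping the
// configuration onto a fixed enum, keep the selected accelerators as
// the strings that came from the configuration.
class SelectedAccelerators {
public:
  explicit SelectedAccelerators(std::vector<std::string> names) : names_(std::move(names)) {}

  // The names exactly as they were selected (for example "cpu" or "gpu-nvidia").
  std::vector<std::string> const& names() const { return names_; }

  // True if the given accelerator name was selected.
  bool contains(std::string const& name) const {
    return std::find(names_.begin(), names_.end(), name) != names_.end();
  }

private:
  std::vector<std::string> names_;
};
```

With strings, a new accelerator type needs no change to a shared enum, at the cost of the compiler no longer catching a misspelled name.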

In addition (mostly just to take advantage of the PR already touching these packages), this PR adds a hasGpuNvidia() member function to ResourceInformation that can be used (e.g. in the TensorFlow or PyTorch packages) to check whether the job should use NVIDIA GPUs in a way that does not directly depend on CUDA. The last commit then changes the code in TensorFlow to use this function.
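
A minimal sketch of the idea, under stated assumptions: `ResourceInformationSketch`, `ResourceInformationFromConfig` and `chooseBackend` are hypothetical names for this example, the real ResourceInformation has more members, and the sketch assumes hasGpuNvidia() reduces to "gpu-nvidia" being among the selected accelerators, which may not be how the actual services decide.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified interface; no CUDA header is needed to use it.
class ResourceInformationSketch {
public:
  virtual ~ResourceInformationSketch() = default;
  virtual std::vector<std::string> const& selectedAccelerators() const = 0;
  virtual bool hasGpuNvidia() const = 0;
};

// Hypothetical concrete service filled from the configured accelerator names.
class ResourceInformationFromConfig : public ResourceInformationSketch {
public:
  explicit ResourceInformationFromConfig(std::vector<std::string> accelerators)
      : accelerators_(std::move(accelerators)) {}

  std::vector<std::string> const& selectedAccelerators() const override { return accelerators_; }

  // Assumption for this sketch: an NVIDIA GPU is usable iff "gpu-nvidia" was selected.
  bool hasGpuNvidia() const override {
    return std::find(accelerators_.begin(), accelerators_.end(), "gpu-nvidia") != accelerators_.end();
  }

private:
  std::vector<std::string> accelerators_;
};

// Client code (think of an ML inference wrapper) picks a backend through
// the abstract interface only, without calling into CUDA itself.
std::string chooseBackend(ResourceInformationSketch const& ri) {
  return ri.hasGpuNvidia() ? "cuda" : "cpu";
}
```

The client depends only on the abstract interface, so a package like the ML wrappers would not need a direct CUDA dependency just to ask this question.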

Resolves cms-sw/framework-team#1198
Resolves cms-sw/framework-team#1200

PR validation:

Code compiles

If this PR is a backport, please specify the original PR and why you need to backport it. If this PR will be backported, please specify the release cycle the backport is meant for:

To be eventually backported to 15_0_X.

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2025

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2025

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47280/43591

  • There are other open Pull requests which might conflict with changes you have proposed:

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47280/43593

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2025

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/AbstractServices (****)
  • FWCore/Framework (core)
  • FWCore/Services (core)
  • FWCore/Utilities (core)
  • HeterogeneousCore/CUDAServices (heterogeneous)
  • HeterogeneousCore/ROCmServices (heterogeneous)
  • PhysicsTools/PyTorch (ml)
  • PhysicsTools/TensorFlow (ml)

The following packages do not have a category, yet:

FWCore/AbstractServices
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@Dr15Jones, @cmsbuild, @fwyzard, @makortel, @smuzaffar, @valsdav, @y19y19 can you please review it and eventually sign? Thanks.
@felicepantaleo, @fwyzard, @missirol, @riga, @rovere, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Feb 6, 2025

enable gpu

@makortel
Contributor Author

makortel commented Feb 6, 2025

@cmsbuild, please test

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2025

-1

Failed Tests: Build
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfad9e/44235/summary.html
COMMIT: d5d136b
CMSSW: CMSSW_15_0_X_2025-02-05-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47280/44235/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found a compilation error when building:

>> Compiling  src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -D__HIP_PLATFORM_HCC__ -D__HIP_PLATFORM_AMD__ -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_15_0_X_2025-02-05-2300' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_0_X_2025-02-05-2300' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-02-05-2300/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/pcre/8.43-2d141998cfe5424b8f7aff48035cc2da/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/boost/1.80.0-a2c84315bd72151dcb3b6e3fe5018437/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/python3/3.9.14-ccc34bac15aa449b4c76ba24d02d2fd7/include/python3.9 -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/rocm/6.2.4-0a366585c16e3116ddeaba7741c05c93/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/lcg/root/6.32.09-47cefdd6f737afcf5535fa577c928c47/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/tbb/v2021.9.0-63e1493f6c63f7899f38cf6d13a1d19f/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/catch2/2.13.6-17102db92de47c6a473c6e67627c548a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/fmt/10.2.1-e35fd1db5eb3abc8ac0452e8ee427196/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/py3-pybind11/2.13.6-16793d3657f4e0749f8f9007c2eabf31/lib/python3.9/site-packages/pybind11/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/tinyxml2/6.2.0-f99ae2781d074227d47e8a3e7c8ec87e/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf 
-flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/HeterogeneousCore/ROCmServices/test/testROCmService/testROCmService.cpp.d src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp -o tmp/el8_amd64_gcc12/src/HeterogeneousCore/ROCmServices/test/testROCmService/testROCmService.cpp.o
>> Compiling  src/HeterogeneousCore/ROCmServices/test/test_main.cpp
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -D__HIP_PLATFORM_HCC__ -D__HIP_PLATFORM_AMD__ -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_15_0_X_2025-02-05-2300' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_0_X_2025-02-05-2300' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-02-05-2300/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/pcre/8.43-2d141998cfe5424b8f7aff48035cc2da/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/boost/1.80.0-a2c84315bd72151dcb3b6e3fe5018437/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/python3/3.9.14-ccc34bac15aa449b4c76ba24d02d2fd7/include/python3.9 -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/rocm/6.2.4-0a366585c16e3116ddeaba7741c05c93/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/lcg/root/6.32.09-47cefdd6f737afcf5535fa577c928c47/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/tbb/v2021.9.0-63e1493f6c63f7899f38cf6d13a1d19f/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/catch2/2.13.6-17102db92de47c6a473c6e67627c548a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/fmt/10.2.1-e35fd1db5eb3abc8ac0452e8ee427196/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/py3-pybind11/2.13.6-16793d3657f4e0749f8f9007c2eabf31/lib/python3.9/site-packages/pybind11/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02875/el8_amd64_gcc12/external/tinyxml2/6.2.0-f99ae2781d074227d47e8a3e7c8ec87e/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf 
-flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/HeterogeneousCore/ROCmServices/test/testROCmService/test_main.cpp.d src/HeterogeneousCore/ROCmServices/test/test_main.cpp -o tmp/el8_amd64_gcc12/src/HeterogeneousCore/ROCmServices/test/testROCmService/test_main.cpp.o
In file included from src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp:20:
poison/FWCore/Utilities/interface/ResourceInformation.h:1:2: error: #error THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.
    1 | #error THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.
      |  ^~~~~
src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp: In function 'void ____C_A_T_C_H____T_E_S_T____0()':
src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp:134:25: error: 'ResourceInformation' is not a member of 'edm'
  134 |       edm::Service<edm::ResourceInformation> ri;


@cmsbuild
Contributor

cmsbuild commented Feb 7, 2025

Pull request #47280 was updated. @Dr15Jones, @cmsbuild, @fwyzard, @makortel, @smuzaffar, @valsdav, @y19y19 can you please check and sign again.

@cmsbuild
Contributor

cmsbuild commented Feb 7, 2025

+1

Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfad9e/44262/summary.html
COMMIT: 9b9a39b
CMSSW: CMSSW_15_0_X_2025-02-06-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47280/44262/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4016960
  • DQMHistoTests: Total failures: 85
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4016855
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53071
  • DQMHistoTests: Total failures: 872
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52199
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor Author

makortel commented Feb 7, 2025

CPU differences are related to #47071. GPU differences seem compatible with the non-reproducibilities in the pixel code.

@makortel
Contributor Author

makortel commented Feb 7, 2025

+core

@makortel
Contributor Author

makortel commented Feb 7, 2025

+heterogeneous

@makortel
Contributor Author

@cms-sw/ml-l2 Could you review and sign, please? Thanks!

@valsdav
Contributor

valsdav commented Feb 10, 2025

+ml

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @mandrenguyen, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1
