dcgm: 3.3.5 -> 3.3.9; cudaPackages_10{,_0,_1,_2}: drop #357655

Merged: 8 commits merged into NixOS:master on Nov 22, 2024

@emilazy emilazy commented Nov 20, 2024

This updates DCGM to a newer version that has dropped CUDA 10 support upstream, fixing the build and cleaning it up a bit in the process. It is also a scheduled visit from the reaper, now that 24.11 has branched off and this is the last thing standing in the way of my unsupported‐compiler‐removing rampage. See the commit messages for details. CUDA 11 lives for now, mostly pending action on #345658, though it’s not urgent, as there are other blockers to removing the slightly newer GCCs that I have to deal with first.

I don’t use DCGM, so cc @de11n for testing. It passes tests that we weren’t running before, at least.

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review


x86_64-linux

⏩ 1 package blacklisted:
  • nixos-install-tools
✅ 12 packages built:
  • caffe
  • caffe.bin
  • cudaPackages.cudatoolkit-legacy-runfile
  • cudaPackages.cudatoolkit-legacy-runfile.doc
  • cudaPackages.cudatoolkit-legacy-runfile.lib
  • cudaPackages_11.cudatoolkit-legacy-runfile
  • cudaPackages_11.cudatoolkit-legacy-runfile.doc
  • cudaPackages_11.cudatoolkit-legacy-runfile.lib
  • dcgm
  • prometheus-dcgm-exporter
  • python311Packages.caffe
  • python311Packages.caffe.bin

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@emilazy emilazy requested a review from a team November 20, 2024 20:31
@github-actions github-actions bot added the 6.topic: python, 6.topic: nixos, 8.has: documentation, 8.has: changelog, and 6.topic: cuda labels Nov 20, 2024
@nix-owners nix-owners bot requested a review from natsukium November 20, 2024 20:32
@emilazy emilazy changed the title from "cudaPackages_10{,_0,_1,_2}: drop" to "dcgm: 3.3.5 -> 3.3.9; cudaPackages_10{,_0,_1,_2}: drop" Nov 20, 2024

@de11n de11n left a comment

Wow, thank you for cleaning up and fixing DCGM. I had poked at it a while ago and didn't have time to push it forward. This is a great improvement.

Comment on lines 62 to 63
# gcc11 is required by DCGM's very particular build system
# C.f. https://github.com/NVIDIA/DCGM/blob/7e1012302679e4bb7496483b32dcffb56e528c92/dcgmbuild/build.sh#L22

I guess gcc11 is no longer strictly required?

emilazy (Member, Author) replied:

Ah, yes, I just forgot to drop this comment; it’s building with GCC 13 now. Fixed :)

@ofborg ofborg bot added the 2.status: merge conflict label Nov 21, 2024
Fixes the build and matches upstream in dropping CUDA 10.

Diff: NVIDIA/DCGM@refs/tags/v3.3.5...v3.3.9

Static CUDA seems to be broken anyway, the upstream build system is awkward and uncooperative, and it’s simpler to just patch it to use dynamic libraries.

Just a few missing includes, really nothing too bad at all.

It’s been marked as broken for over a year and requires CUDA 10. Even the non‐CUDA variant of the package refused to evaluate without enabling broken packages due to `cudnn`, so I’m not sure anyone is using this package at all…

emilazy commented Nov 21, 2024

Rebased for conflicts.

@ConnorBaker ConnorBaker left a comment

Thank you for taking this on!

@wegank wegank added the 12.approvals: 2 label Nov 22, 2024
@emilazy emilazy merged commit 811c0af into NixOS:master Nov 22, 2024
18 of 19 checks passed
@emilazy emilazy deleted the push-zyuzyylslsrp branch November 22, 2024 18:00
Labels
  • 2.status: merge conflict
  • 6.topic: cuda
  • 6.topic: nixos
  • 6.topic: python
  • 8.has: changelog
  • 8.has: documentation
  • 12.approvals: 2
Projects
Status: Done

4 participants