Install nvidia drivers and container toolkit #1

wants to merge 37 commits into
base: main


@ruffsl ruffsl commented Dec 11, 2024

This patches the existing Packer template to build GPU-enabled AWS AMIs for RunsOn runners, specifically just for Ubuntu 24.04, the latest LTS release. This is done simply by adding two scripts that install the NVIDIA drivers and the NVIDIA Container Toolkit, both of which are necessary for GPU acceleration in containerized GitHub Actions jobs run via RunsOn.

Additionally, this customizes the CI/CD infrastructure for our own use case and AWS regions of interest. This simplifies maintaining our custom AMIs while we wait for such options to be upstreamed into RunsOn. The test jobs are also altered to include sanity checks that verify the NVIDIA software is installed on a GPU-enabled instance.
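As a rough illustration of the kind of sanity check described above (not the test job from this PR), a minimal Python sketch could confirm that the expected NVIDIA command-line tools are on `PATH`. The tool list is an assumption: `nvidia-smi` ships with the NVIDIA driver and `nvidia-ctk` with the NVIDIA Container Toolkit. The helper name `missing_tools` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed tool names: `nvidia-smi` is installed with the NVIDIA driver,
# `nvidia-ctk` with the NVIDIA Container Toolkit.
REQUIRED_TOOLS = ("nvidia-smi", "nvidia-ctk")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]
```

A test job could call `missing_tools()` and fail the build if the returned list is non-empty. Note that this only checks that the binaries exist; it does not prove that a GPU is visible to the driver or to containers.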

TODO:

@ruffsl ruffsl deployed to ubuntu24-full-x64 December 11, 2024 20:39 — with GitHub Actions Active

ruffsl commented Dec 11, 2024

Example of successfully dispatched workflow building and testing a GPU enabled AMI for RunsOn:

@@ -7,14 +7,17 @@ on:
type: string
required: true
description: 'Distribution(s) to build'
- default: '["ubuntu22-full-x64", "ubuntu22-full-arm64", "ubuntu24-full-x64", "ubuntu24-full-arm64"]'
+ # default: '["ubuntu22-full-x64", "ubuntu22-full-arm64", "ubuntu24-full-x64", "ubuntu24-full-arm64"]'
+ default: '["ubuntu24-full-x64"]'
# schedule:
# - cron: '0 8 */15 * *'

@biofotis or @jamescnewman, if you're good with this patch, then I'd recommend re-enabling the schedule event before merging into the default branch of our fork, so that our GPU AMIs are kept up to date until we can swap back to upstream RunsOn AMIs with GPU support. If you make the commit that changes this line, you'll also be the one notified about scheduled workflow runs, which makes it easier to keep tabs on them.

Notifications for scheduled workflows are sent to the user who last modified the cron syntax in the workflow file. For more information, see Notifications for workflow runs.
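For reference, here is a small sketch of what the commented-out expression `0 8 */15 * *` would mean once re-enabled: 08:00 UTC (GitHub Actions evaluates schedules in UTC) on every 15th day of the month, counting from day 1. The helper name `cron_days_of_month` is hypothetical and only models the `*/step` day-of-month field, not a full cron parser.

```python
import calendar
from typing import List


def cron_days_of_month(step: int, year: int, month: int) -> List[int]:
    """Days matched by a `*/step` day-of-month cron field in a given month.

    The day-of-month field ranges over 1..31, so `*/step` matches
    1, 1 + step, 1 + 2 * step, ... limited to days that exist in the month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return [day for day in range(1, 32, step) if day <= days_in_month]


print(cron_days_of_month(15, 2024, 12))  # [1, 16, 31]
print(cron_days_of_month(15, 2025, 2))   # [1, 16]
```

So the AMI build would run two or three times a month, with back-to-back runs on the 31st and the 1st in months that have 31 days.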


@jamescnewman jamescnewman left a comment


We should discuss offline in Slack before finalising and update this PR with the result.

3 participants