CI: Run Nvidia workflow on UC22 #217

Merged 11 commits on Jan 25, 2025
26 changes: 20 additions & 6 deletions .github/workflows/nvidia-test.yml
@@ -49,19 +49,33 @@ jobs:

- name: Create Testflinger job queue
run: |
envsubst '$JOB_QUEUE' \
< $TESTFLINGER_DIR/nvidia-job.yaml \
> $TESTFLINGER_DIR/nvidia-job.temp

targetDistros=("noble" "core22-latest")

for DISTRO in ${targetDistros[@]}; do
export DISTRO

envsubst '$JOB_QUEUE $DISTRO' \
< $TESTFLINGER_DIR/nvidia-job.yaml \
> $TESTFLINGER_DIR/nvidia-job-"$DISTRO".temp

mv $TESTFLINGER_DIR/nvidia-job-"$DISTRO".temp $TESTFLINGER_DIR/nvidia-job-"$DISTRO".yaml
done

envsubst '$SNAP_CHANNEL' \
< $TESTFLINGER_DIR/scripts/setup.sh \
> $TESTFLINGER_DIR/scripts/setup.temp

mv $TESTFLINGER_DIR/nvidia-job.temp $TESTFLINGER_DIR/nvidia-job.yaml
mv $TESTFLINGER_DIR/scripts/setup.temp $TESTFLINGER_DIR/scripts/setup.sh

- name: Submit Testflinger job
- name: Submit Testflinger job for Ubuntu 24.04 (Noble)
uses: canonical/testflinger/.github/actions/submit@main
with:
poll: true
job-path: ${{ env.TESTFLINGER_DIR }}/nvidia-job-noble.yaml

- name: Submit Testflinger job for Ubuntu Core 22
uses: canonical/testflinger/.github/actions/submit@main
with:
poll: true
job-path: ${{ env.TESTFLINGER_DIR }}/nvidia-job.yaml
job-path: ${{ env.TESTFLINGER_DIR }}/nvidia-job-core22-latest.yaml
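
The loop in this workflow renders one job file per distro from a single template, restricting substitution to the variables listed in the `envsubst` argument. A minimal sketch of the same idea follows. It is not the PR's code: `render_job` and `render_all` are hypothetical helpers, and `sed` stands in for `envsubst` so the sketch runs without gettext installed. Values containing `|`, `&`, or backslashes would need escaping, which this sketch does not attempt.

```bash
# Sketch only: render_job and render_all are hypothetical helpers.
# render_job replaces the literal placeholders $JOB_QUEUE and $DISTRO in a
# template, mimicking `envsubst '$JOB_QUEUE $DISTRO'` with sed.
render_job() {
  # $1: template path, $2: job queue, $3: distro
  sed -e "s|[\$]JOB_QUEUE|$2|g" -e "s|[\$]DISTRO|$3|g" "$1"
}

# Render one job file per distro into an output directory.
render_all() {
  # $1: template path, $2: output directory, $3: job queue, $4...: distros
  template=$1; outdir=$2; queue=$3
  shift 3
  for distro in "$@"; do
    render_job "$template" "$queue" "$distro" > "$outdir/nvidia-job-$distro.yaml"
  done
}
```

Called as `render_all nvidia-job.yaml . docker-nvidia noble core22-latest`, this would write `nvidia-job-noble.yaml` and `nvidia-job-core22-latest.yaml`, matching the file names the two submit steps expect.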
11 changes: 7 additions & 4 deletions .github/workflows/testflinger/README.md
@@ -6,14 +6,17 @@ The tests run on devices within Canonical's test farm.
## Run locally
Running the tests locally is only possible if your machine has access to the Testflinger server.

Export the following variables:
Export the needed variables, for example:
```bash
export JOB_QUEUE=<queue> SNAP_CHANNEL=<channel>
export JOB_QUEUE=docker-nvidia SNAP_CHANNEL=latest/edge DISTRO=noble
```
Tested distros:
- `noble`
- `core22-latest`

Then, modify the files:
```bash
envsubst '$JOB_QUEUE' < nvidia-job.yaml > temp-job.yaml
envsubst '$JOB_QUEUE $DISTRO' < nvidia-job.yaml > temp-job.yaml

envsubst '$SNAP_CHANNEL' < scripts/setup.sh > scripts/temp-setup.sh

@@ -25,4 +28,4 @@ sed -i "s|.github/workflows/testflinger/||" temp-job.yaml
Finally, submit the job:
```bash
testflinger submit --poll temp-job.yaml
```
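
The README steps depend on three exported variables, and a missing one would silently leave a placeholder or an empty value in the rendered files. A small guard run before `envsubst` could catch that early. This is a sketch under assumptions: `require_vars` and `check_distro` are hypothetical helpers, not part of the repository, and the accepted distro names are only the two the README lists as tested.

```bash
# Sketch only: require_vars and check_distro are hypothetical helpers.
# Fail early if any of the named environment variables is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      return 1
    fi
  done
}

# Accept only the distros the README lists as tested.
check_distro() {
  case "$1" in
    noble|core22-latest) return 0 ;;
    *) echo "Untested distro: $1" >&2; return 1 ;;
  esac
}
```

A possible use is `require_vars JOB_QUEUE SNAP_CHANNEL DISTRO && check_distro "$DISTRO"` ahead of the templating commands.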
20 changes: 13 additions & 7 deletions .github/workflows/testflinger/nvidia-job.yaml
@@ -4,7 +4,7 @@ job_queue: $JOB_QUEUE
global_timeout: 3600
output_timeout: 1800
provision_data:
distro: "noble"
distro: $DISTRO
test_data:

# Copy files from the GH runner to the Testflinger Agent
@@ -22,15 +22,21 @@ test_data:

SCRIPTS=./attachments/test/scripts

echo "Testing: DEVICE_IP = $DEVICE_IP"
# Setup the environment on the target device
# On Ubuntu Core, kernel, core, snapd snaps get refreshed right after first boot,
# causing unexpected errors and triggering a reboot
while ! ssh ubuntu@$DEVICE_IP "$(< $SCRIPTS/check-snap-changes.sh)"; do
echo "Wait for ssh server and/or snap changes..."
sleep 30
done

ssh ubuntu@$DEVICE_IP "$(< $SCRIPTS/setup.sh)"

# Reboot the device in background to avoid breaking the SSH connection prematurely
ssh ubuntu@$DEVICE_IP "(sleep 3 && sudo reboot) &"

echo "Wait for the device to boot and start its SSH server"
$SCRIPTS/wait_for_port.sh $DEVICE_IP 22

# Run the tests
while ! ssh ubuntu@$DEVICE_IP "sudo docker version"; do
echo "Wait for ssh server and/or Docker daemon..."
sleep 30
done

ssh ubuntu@$DEVICE_IP "$(< $SCRIPTS/test.sh)"
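
The `while ! ssh ...` loops in this job poll every 30 seconds and have no attempt limit of their own, so a device that never becomes ready would presumably keep the job waiting until an outer timeout such as `global_timeout` ends it. A bounded variant could look like the sketch below. `retry_until` is a hypothetical helper, not something the PR or Testflinger provides.

```bash
# Sketch only: retry_until is a hypothetical helper, not part of the PR.
# Run a command until it succeeds, giving up after a fixed number of attempts.
retry_until() {
  # $1: maximum attempts, $2: delay in seconds, rest: command to run
  max_attempts=$1; delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s..."
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For instance, `retry_until 20 30 ssh ubuntu@$DEVICE_IP "sudo docker version"` would wait at most about ten minutes before failing with a clear message instead of running into the job-level timeout.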
32 changes: 32 additions & 0 deletions .github/workflows/testflinger/scripts/check-snap-changes.sh
@@ -0,0 +1,32 @@
#!/usr/bin/env bash

# This script is adapted from
# https://github.com/canonical/hwcert-jenkins-tools/blob/c5cf512d968100db90998abe61c474de0be681ca/scriptlets/check_for_snap_changes

echo "Get snap changes"

# list the snap changes on the device and store the output in a temp file
OUTPUT=$(mktemp)
snap changes > $OUTPUT

RESULT=$?
if [ ! "$RESULT" -eq 0 ]; then exit $RESULT; fi

# tail -n +2: remove the header
# awk 'NF {print $2}': print the second column on non-empty lines (i.e. the status)
# grep -q -E "...": succeed when changes are still ongoing or pending
cat $OUTPUT | \
  tail -n +2 | \
  awk 'NF {print $2}' | \
  grep -q -E "\b(Doing|Undoing|Wait|Do|Undo)\b"

if [ "$?" -eq 0 ]; then
  # changes are still ongoing or pending: display output as a diagnostic
  cat "$OUTPUT" | grep -E "\b(Doing|Undoing|Wait|Do|Undo)\b"
  rm "$OUTPUT"

  exit 1
fi

echo "No ongoing or pending snap changes"
rm "$OUTPUT"
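
The status check in this script can be tried without a device by feeding it sample text. The sketch below condenses the `tail`, `awk`, and `grep` pipeline into one `awk` call that reads `snap changes`-style output from stdin. `has_pending_changes` is a hypothetical name, and note that its exit status is the reverse of the script's: it succeeds when a pending change is found.

```bash
# Sketch only: has_pending_changes is a hypothetical helper.
# Reads `snap changes`-style output on stdin, prints rows whose status column
# marks an ongoing or pending change, and succeeds if at least one was found.
has_pending_changes() {
  awk 'NR > 1 && NF && $2 ~ /^(Doing|Undoing|Wait|Do|Undo)$/ { print; found = 1 }
       END { exit !found }'
}
```

On a device it could be used as `snap changes | has_pending_changes`. Because the match is limited to the second column, a finished change whose summary happens to contain a word such as "Wait" is not reported, which differs slightly from the whole-line `grep` the script uses for its diagnostic output.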
2 changes: 1 addition & 1 deletion .github/workflows/testflinger/scripts/setup.sh
@@ -19,7 +19,7 @@ install_docker() {
sudo snap install docker --channel="$DOCKER_SNAP_CHANNEL"

# check the installation
docker --version || exit 1
sudo docker --version || exit 1
}

setup_classic() {
2 changes: 1 addition & 1 deletion .github/workflows/testflinger/scripts/test.sh
@@ -8,7 +8,7 @@ smi_test() {
if [[ $ID == "ubuntu" ]]; then
sudo docker run --rm --runtime=nvidia --gpus all --env PATH="${PATH}:/var/lib/snapd/hostfs/usr/bin" ubuntu nvidia-smi || exit 1
elif [[ $ID == "ubuntu-core" ]]; then
sudo docker run --rm --runtime nvidia --gpus all -it ubuntu bash -c "/snap/docker/*/graphics/bin/nvidia-smi" || exit 1
sudo docker run --rm --runtime nvidia --gpus all ubuntu bash -c "/snap/docker/*/graphics/bin/nvidia-smi" || exit 1
else
echo "Unexpected operating system ID: $ID"
exit 1
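
The Ubuntu Core branch of this hunk runs the container without `-it`. The PR does not state the reason here; one likely explanation is that `docker run -t` refuses to start when stdin is not a terminal, which is the case when a script is passed to `ssh` as a command. If interactive use should still get a TTY, the flags could be chosen at run time, as in this sketch. `docker_tty_flags` is a hypothetical helper, not part of the PR.

```bash
# Sketch only: docker_tty_flags is a hypothetical helper, not part of the PR.
# Print "-it" only when both stdin and stdout are terminals.
docker_tty_flags() {
  if [ -t 0 ] && [ -t 1 ]; then
    echo "-it"
  fi
}
```

Because command substitution replaces stdout with a pipe, a caller would need to evaluate the check in the current shell rather than through `$(...)`, for example by inlining the `[ -t 0 ] && [ -t 1 ]` test before building the `docker run` command line.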
11 changes: 0 additions & 11 deletions .github/workflows/testflinger/scripts/wait_for_port.sh

This file was deleted.