This repository has been archived by the owner on Aug 9, 2024. It is now read-only.

James longdev
* Improve the way slurmd sends slurmctld its node-config
* Add an action, `node-config`, to get and set unit-level node configuration
* Add `partition-config` charm configuration that allows an operator to set
partition configuration
* Consolidate the yaml files into `charmcraft.yaml`
* Remove unused code
* Remove slurm-ops-manager
* Replace nhc resource with nhc in build process in charmcraft.yaml
* Remove dependencies on slurmdbd
* rename interface slurmd -> slurmctld
* update readme
* remove fluentbit
* replace machine.py with slurmd -C
jamesbeedy committed Jun 27, 2024
1 parent 39bbea5 commit 2916b7d
Showing 31 changed files with 2,746 additions and 1,206 deletions.
20 changes: 16 additions & 4 deletions .github/workflows/ci.yaml
@@ -23,7 +23,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: woke
         uses: get-woke/woke-action@v0
         with:
@@ -35,18 +35,29 @@ jobs:
     runs-on: ubuntu-22.04
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Install dependencies
         run: python3 -m pip install tox
       - name: Run linters
         run: tox -e lint

+  type:
+    name: Type check with pyright
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Install dependencies
+        run: python3 -m pip install tox
+      - name: Run pyright
+        run: tox -e type
+
   unit-test:
     name: Unit tests
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Install dependencies
         run: python3 -m pip install tox
       - name: Run tests
@@ -63,10 +74,11 @@ jobs:
     needs:
       - inclusive-naming-check
       - lint
+      - type
       - unit-test
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Setup operator environment
         uses: charmed-kubernetes/actions-operator@main
         with:
4 changes: 3 additions & 1 deletion .gitignore
@@ -7,4 +7,6 @@ __pycache__/
*.py[cod]
.idea
.vscode/
version

# Disable woke checking for nhc.conf.tmpl
src/templates/nhc.conf.tmpl
34 changes: 27 additions & 7 deletions README.md
@@ -24,13 +24,33 @@ This operator should be used with Juju 3.x or greater.
```shell
$ juju deploy slurmctld --channel edge
$ juju deploy slurmd --channel edge
-$ juju deploy slurmdbd --channel edge
-$ juju deploy mysql --channel 8.0/edge
-$ juju deploy mysql-router slurmdbd-mysql-router --channel dpe/edge
-$ juju integrate slurmctld:slurmd slurmd:slurmd
-$ juju integrate slurmdbd-mysql-router:backend-database mysql:database
-$ juju integrate slurmdbd:database slurmdbd-mysql-router:database
-$ juju integrate slurmctld:slurmdbd slurmdbd:slurmdbd
+$ juju integrate slurmctld:slurmd slurmd:slurmctld
```

### Operations
This charm hardens and simplifies operations by codifying common administration operations as charm actions.

#### Partition Configuration
Specify partition parameters using the charm configuration, `partition-config`.

##### Use the `partition-config` to set custom partition parameters.
```bash
$ juju config slurmd partition-config="State=INACTIVE"
```
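
The `partition-config` value is a single line of space-separated `key=value` pairs. As a rough illustration of how such a string could be validated and turned into a partition line, here is a minimal Python sketch. The function names and the partition name `compute` are hypothetical and are not taken from the charm's source; the charm's actual handling may differ.

```python
def parse_partition_config(raw: str) -> dict[str, str]:
    """Split a space-separated `key=value` string into an ordered dict.

    Hypothetical helper for illustration; not the charm's implementation.
    """
    params: dict[str, str] = {}
    for token in raw.split():
        key, sep, value = token.partition("=")
        # Reject tokens such as "State" or "=INACTIVE" that are not key=value.
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    return params


def render_partition_line(name: str, raw: str) -> str:
    """Render a partition line from a partition name and a config string."""
    fields = [f"PartitionName={name}"]
    fields += [f"{key}={value}" for key, value in parse_partition_config(raw).items()]
    return " ".join(fields)


if __name__ == "__main__":
    # "compute" is an assumed partition name used only for this example.
    print(render_partition_line("compute", "State=INACTIVE"))
```

Validating each token before rendering means a malformed value is rejected early instead of ending up in the generated configuration.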

#### Node Configuration Parameters
You can get and set the node configuration using the `node-config` action.

##### Use the `node-config` action to get the node configuration for the unit.
```bash
$ juju run --quiet slurmd/0 node-config --format json | jq ".[].results.node.config"
"NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN RealMemory=64012 CPUs=12 ThreadsPerCore=2 CoresPerSocket=6 SocketsPerBoard=1"
```

##### Use the `node-config` action to set a custom weight value for the node.
```bash
$ juju run --quiet slurmd/0 node-config parameters="Weight=5000" --format json | jq ".[].results.node.config"
"NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN RealMemory=64012 CPUs=12 ThreadsPerCore=2 CoresPerSocket=6 SocketsPerBoard=1 Weight=5000"
```
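
The two outputs above differ only by the appended `Weight=5000`, which suggests the custom parameters are merged into the detected node configuration. One way that merge could work is sketched below in Python. The function names are hypothetical and the charm may do this differently; this is only a sketch under the assumption that a custom parameter overrides a detected key of the same name.

```python
def _to_pairs(line: str) -> list[tuple[str, str]]:
    """Split a space-separated `key=value` line into (key, value) pairs."""
    pairs = []
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        pairs.append((key, value))
    return pairs


def merge_node_config(current: str, overrides: str) -> str:
    """Merge override parameters into a node configuration line.

    Existing keys keep their position and take the new value; new keys are
    appended. Hypothetical helper, not the charm's implementation.
    """
    merged = dict(_to_pairs(current))
    merged.update(_to_pairs(overrides))
    return " ".join(f"{key}={value}" for key, value in merged.items())


if __name__ == "__main__":
    detected = "NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN"
    print(merge_node_config(detected, "Weight=5000"))
```

Because Python dictionaries preserve insertion order, the detected parameters stay in their original order and only new keys are added at the end.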

## Project & Community
15 changes: 0 additions & 15 deletions actions.yaml

This file was deleted.

108 changes: 88 additions & 20 deletions charmcraft.yaml
@@ -1,7 +1,29 @@
# Copyright 2020 Omnivector, LLC
# See LICENSE file for licensing details.

name: slurmd
type: charm

summary: |
Slurmd, the compute node daemon of Slurm.
description: |
This charm provides slurmd, munged, and the bindings to other utilities
that make lifecycle operations a breeze.
slurmd is the compute node daemon of SLURM. It monitors all tasks running
on the compute node, accepts work (tasks), launches tasks, and kills
running tasks upon request.
links:
contact: https://matrix.to/#/#hpc:ubuntu.com

issues:
- https://github.com/charmed-hpc/slurmd-operator/issues

source:
- https://github.com/charmed-hpc/slurmd-operator

assumes:
- juju

bases:
- build-on:
- name: ubuntu
@@ -10,25 +32,71 @@ bases:
- name: ubuntu
channel: "22.04"
architectures: [amd64]
- name: centos
channel: "7"
architectures: [amd64]

parts:
charm:
build-packages: [git]
charm-python-packages: [setuptools]

# Create a version file and pack it into the charm. This is dynamically generated
# as part of the build process for a charm to ensure that the git revision of the
# charm is always recorded in this version file.
version-file:
plugin: nil
build-packages:
- git
- wget
override-build: |
VERSION=$(git -C $CRAFT_PART_SRC/../../charm/src describe --dirty --always)
echo "Setting version to $VERSION"
echo $VERSION > $CRAFT_PART_INSTALL/version
stage:
- version
wget https://github.com/mej/nhc/releases/download/1.4.3/lbnl-nhc-1.4.3.tar.gz
craftctl default
provides:
slurmctld:
interface: slurmd
limit: 1

config:
options:
partition-config:
type: string
default: ""
description: >
Additional partition configuration parameters, specified as space-separated `key=value`
pairs on a single line. Find a list of all possible partition configuration parameters
[here](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).
Example usage:
```bash
$ juju config slurmd partition-config="DefaultTime=45:00 MaxTime=1:00:00"
```

nhc-conf:
default: ""
type: string
description: >
Multiline string.
These lines are appended to the `nhc.conf` maintained by the charm.
Example usage:
```bash
$ juju config slurmd nhc-conf="$(cat extra-nhc.conf)"
```
actions:
node-configured:
description: Remove a node from DownNodes when the reason is `New node`.

node-config:
description: >
Set or return node configuration parameters.
To get the current node configuration for this unit:
```bash
$ juju run slurmd/0 node-config
```
To set node level configuration parameters for the unit `slurmd/0`:
```bash
$ juju run slurmd/0 node-config parameters="Weight=200 Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_consume:4G"
```
params:
parameters:
type: string
description: >
Node configuration parameter as defined [here](https://slurm.schedmd.com/slurm.conf.html#SECTION_NODE-CONFIGURATION).
show-nhc-config:
description: Display `nhc.conf`.
40 changes: 0 additions & 40 deletions config.yaml

This file was deleted.

41 changes: 4 additions & 37 deletions dispatch
@@ -1,44 +1,11 @@
#!/bin/bash
# This hook installs the dependencies needed to run the charm,
# creates the dispatch executable, regenerates the symlinks for start and
# upgrade-charm, and kicks off the operator framework.

set -e

# Source the os-release information into the env
. /etc/os-release

if ! [[ -f '.installed' ]]
then
if [[ $ID == 'centos' ]]
then
# Install dependencies and build custom python
yum -y install epel-release
yum -y install wget gcc make tar bzip2-devel zlib-devel xz-devel openssl-devel libffi-devel sqlite-devel ncurses-devel

export PYTHON_VERSION=3.8.16
wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz -P /tmp
tar xvf /tmp/Python-${PYTHON_VERSION}.tar.xz -C /tmp
cd /tmp/Python-${PYTHON_VERSION}
./configure --enable-optimizations
make -C /tmp/Python-${PYTHON_VERSION} -j $(nproc) altinstall
cd $OLDPWD
rm -rf /tmp/Python*

elif [[ $ID == 'ubuntu' ]]
then
# Necessary to compile and install NHC
apt-get install --assume-yes make
fi
touch .installed
fi

# set the correct python bin path
if [[ $ID == "centos" ]]
then
PYTHON_BIN="/usr/bin/env python3.8"
else
PYTHON_BIN="/usr/bin/env python3"
# Necessary to compile and install NHC
apt-get install --assume-yes make
touch .installed
fi

JUJU_DISPATCH_PATH="${JUJU_DISPATCH_PATH:-$0}" PYTHONPATH=lib:venv $PYTHON_BIN ./src/charm.py
JUJU_DISPATCH_PATH="${JUJU_DISPATCH_PATH:-$0}" PYTHONPATH=lib:venv /usr/bin/env python3 ./src/charm.py