Running node-configured action on slurmd can cause Uncaught SlurmOpsError in charm code: command ['systemctl', 'reload-or-restart', 'slurmd'] failed
#63
Labels: bug, needs triage
Bug Description
On a minimally deployed Charmed HPC cluster with a single slurmctld unit and a single slurmd unit, running
juju run slurmd/0 node-configured
can result in the error:
Uncaught SlurmOpsError in charm code: command ['systemctl', 'reload-or-restart', 'slurmd'] failed
See attached video.
error.webm
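Since the action does not fail on every invocation, it may need to be run several times before the error shows up. The following is a minimal sketch of a retry loop for that purpose. The helper name run_until_failure and the attempt limit are hypothetical and not part of Juju or the charms, and the sketch assumes that juju run exits with a non-zero status when the action fails.

```shell
# Hypothetical helper (not part of Juju or the Slurm charms): rerun a command
# until it exits non-zero, giving up after max_attempts tries.
# Returns 0 if a failure was observed, 1 if every attempt succeeded.
run_until_failure() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the remaining arguments as the command under test.
        if ! "$@"; then
            echo "failed on attempt $attempt"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure after $max_attempts attempts"
    return 1
}

# Example against a live model (assumes the slurmd/0 unit exists):
#   run_until_failure 50 juju run slurmd/0 node-configured
```

The attempt cap keeps the loop from running indefinitely if the error does not occur on a given deployment.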
To Reproduce
Running on a bootstrapped AWS controller with an empty model:
juju deploy --channel latest/edge --base [email protected] slurmctld
juju deploy --channel latest/edge --base [email protected] slurmd --constraints="instance-type=g4dn.xlarge"
juju integrate slurmctld:slurmd slurmd:slurmctld
juju run slurmd/0 node-configured
Repeat the last command until the error occurs.
Environment
Deploying latest/edge on AWS. The slurmd instance runs on a GPU-enabled g4dn.xlarge node.
Relevant log output
Additional context
No response