# Copyright 2020-2024 Omnivector Solutions, LLC
# See LICENSE file for licensing details.

name: slurmctld
summary: |
  Slurmctld, the central management daemon of Slurm.
description: |
  This charm provides slurmctld, munged, and the bindings to other utilities
  that make lifecycle operations a breeze.

  slurmctld is the central management daemon of SLURM. It monitors all other
  SLURM daemons and resources, accepts work (jobs), and allocates resources
  to those jobs. Given the critical functionality of slurmctld, there may be
  a backup server to assume these functions in the event that the primary
  server fails.

links:
  contact: https://matrix.to/#/#hpc:ubuntu.com
  issues:
    - https://github.com/charmed-hpc/slurm-charms/issues
  source:
    - https://github.com/charmed-hpc/slurm-charms

requires:
  slurmd:
    interface: slurmd
  slurmdbd:
    interface: slurmdbd
  slurmrestd:
    interface: slurmrestd
  login-node:
    interface: sackd

provides:
  cos-agent:
    interface: cos_agent
    limit: 1

peers:
  slurmctld-peer:
    interface: slurmctld-peer
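
# Integration sketch (not part of the charm metadata; the application names
# "slurmd" and "slurmdbd" are assumptions about the deployment): the endpoints
# above are typically joined with `juju integrate`, letting Juju infer the
# matching endpoints when unambiguous, for example:
#   $ juju integrate slurmctld slurmd
#   $ juju integrate slurmctld slurmdbd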
assumes:
  - juju

type: charm
base: [email protected]
platforms:
  amd64:

parts:
  charm:
    charm-binary-python-packages:
      - cryptography ~= 44.0.0
      - pydantic
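
# Build sketch (the artifact name pattern is an assumption; charmcraft names
# the file after the platform): the charm described by this file is typically
# packed from the project root and deployed locally with:
#   $ charmcraft pack
#   $ juju deploy ./slurmctld_*.charm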
config:
  options:
    cluster-name:
      type: string
      default: "osd-cluster"
      description: |
        Name to be recorded in database for jobs from this cluster.
        This is important if a single database is used to record information from
        multiple Slurm-managed clusters.
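    # Example (cluster name is illustrative): the option above can be changed
    # at runtime with `juju config`, e.g.:
    #   $ juju config slurmctld cluster-name="my-cluster"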
    default-partition:
      type: string
      default: ""
      description: |
        Default Slurm partition. This is only used if defined, and must match an
        existing partition.
    slurm-conf-parameters:
      type: string
      default: ""
      description: |
        User-supplied Slurm configuration as a multiline string.

        Example usage:
        $ juju config slurmctld slurm-conf-parameters="$(cat additional.conf)"
    cgroup-parameters:
      type: string
      default: ""
      description: |
        User-supplied configuration for `cgroup.conf`.
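    # Example (file name is illustrative): cgroup settings can be supplied from
    # a local file, mirroring the slurm-conf-parameters example above:
    #   $ juju config slurmctld cgroup-parameters="$(cat cgroup.conf)"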
    health-check-params:
      default: ""
      type: string
      description: |
        Extra parameters for the NHC command.

        This option can be used to customize how NHC is called. For example, to
        send an email to an admin when NHC detects an error, set this value to
        `-M [email protected]`.
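    # Example (address taken from the description above): have NHC notify an
    # admin when it detects an error:
    #   $ juju config slurmctld health-check-params="-M [email protected]"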
    health-check-interval:
      default: 600
      type: int
      description: Interval in seconds between executions of the Health Check.
    health-check-state:
      default: "ANY,CYCLE"
      type: string
      description: Only run the Health Check on nodes in this state.
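    # Example (values are illustrative): run the health check every 5 minutes
    # on idle nodes only:
    #   $ juju config slurmctld health-check-interval=300 health-check-state="IDLE"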
actions:
  show-current-config:
    description: |
      Display the currently used `slurm.conf`.

      Example usage:

      ```bash
      juju run slurmctld/leader show-current-config \
        --quiet --format=json | jq .[].results.slurm.conf | xargs -I % -0 python3 -c 'print(%)'
      ```
  drain:
    description: |
      Drain specified nodes.

      Example usage:
      $ juju run slurmctld/leader drain nodename="node-[1,2]" reason="Updating kernel"
    params:
      nodename:
        type: string
        description: The nodes to drain, using the Slurm format, e.g. `"node-[1,2]"`.
      reason:
        type: string
        description: Reason to drain the nodes.
    required:
      - nodename
      - reason
  resume:
    description: |
      Resume specified nodes.

      Note: Newly added nodes will remain in the `down` state until configured
      with the `node-configured` action.

      Example usage:
      $ juju run slurmctld/leader resume nodename="node-[1,2]"
    params:
      nodename:
        type: string
        description: |
          The nodes to resume, using the Slurm format, e.g. `"node-[1,2]"`.
    required:
      - nodename
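
# Verification sketch (unit name is illustrative): after draining or resuming,
# node state can be inspected with Slurm's own tooling on the controller unit:
#   $ juju exec --unit slurmctld/0 -- sinfo --Node --long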