We are experiencing an issue where the "Enable core oversubscription" checkbox is not enabled in the JupyterHub spawner form, even though the SLURM configuration seems to support oversubscription. Here's the breakdown:
Environment Details:
Magic Castle Version: Provide the exact version (if known).
JupyterHub Config:
spawner_class: slurmformspawner.SlurmFormSpawner
SLURM Config (Relevant Snippet from /etc/slurm/slurm.conf):
PartitionName=DEFAULT OverSubscribe=YES
PartitionName=cpubase_bycore_b1 OverSubscribe=YES:4
AllowAccounts=ALL AllowGroups=ALL
JupyterHub Config (Relevant Snippet from /etc/jupyterhub/jupyterhub_config.json):
"spawner_class": "slurmformspawner.SlurmFormSpawner"
"submit_template_path": "/etc/jupyterhub/submit.sh"
Expected Behavior:
The "Enable core oversubscription" option should be selectable in the JupyterHub form when creating new server instances, based on the SLURM configuration allowing oversubscription.
Observed Behavior:
The "Enable core oversubscription" option remains disabled in the form (screenshot attached).
Troubleshooting Steps Taken:
SLURM Configuration: Verified that oversubscription is enabled in SLURM for the relevant partitions:
Checked /etc/slurm/slurm.conf, confirming that OverSubscribe=YES is set for the DEFAULT partition and OverSubscribe=YES:4 for the cpubase_bycore_b1 partition.
Used the command scontrol show partition to ensure that the partition configuration reflects the OverSubscribe=YES setting.
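For anyone reproducing the check above, here is a minimal sketch of how the OverSubscribe value per partition could be pulled out of `scontrol show partition` output. The sample text and the helper name `oversubscribe_by_partition` are assumptions made for illustration; the sample is not output captured from this cluster.

```python
# Hypothetical sample shaped like `scontrol show partition` output.
# It is NOT taken from the cluster described in this issue.
SAMPLE = """\
PartitionName=cpubase_bycore_b1
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   RootOnly=NO ReqResv=NO OverSubscribe=YES:4

PartitionName=other_partition
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   RootOnly=NO ReqResv=NO OverSubscribe=NO
"""


def oversubscribe_by_partition(scontrol_output: str) -> dict:
    """Map each partition name to the OverSubscribe value reported for it."""
    result = {}
    current = None
    # The output is a stream of whitespace-separated Key=Value tokens.
    for token in scontrol_output.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a Key=Value token
        if key == "PartitionName":
            current = value
        elif key == "OverSubscribe" and current is not None:
            result[current] = value
    return result


for name, setting in oversubscribe_by_partition(SAMPLE).items():
    print(f"{name}: OverSubscribe={setting}")
```

On a real cluster the text would come from running `scontrol show partition` (for example through `subprocess.run([...], capture_output=True, text=True)`) rather than from the sample string.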
JupyterHub Configuration:
Verified the JupyterHub spawner class (slurmformspawner.SlurmFormSpawner) in the jupyterhub_config.json.
Restarted the JupyterHub service to ensure any configuration changes are applied.
SLURM Jobs:
Attempted to manually submit jobs with oversubscription using srun and salloc commands, which failed due to account/partition combination issues. Verified account associations with sacctmgr and attempted to set the default partition for the user (def-sponsor00).
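To narrow down the account/partition failure, the sketch below checks whether a given account and partition pair appears in the association list. It assumes the input is pipe-delimited text with Account and Partition as the first two columns, as produced by something like `sacctmgr -nP show associations format=Account,Partition`, and it assumes that an association with an empty partition field applies to every partition. The helper name `association_allows`, the sample rows, and the use of def-sponsor00 as an account name are illustrative assumptions.

```python
def association_allows(sacctmgr_output: str, account: str, partition: str) -> bool:
    """Return True if some association row matches the account and partition.

    Assumes rows of the form `Account|Partition`; an empty partition field
    is treated as "valid on any partition".
    """
    for line in sacctmgr_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        row_account, row_partition = fields[0], fields[1]
        if row_account == account and row_partition in ("", partition):
            return True
    return False


# Hypothetical rows, not captured from the cluster in this issue.
sample_rows = "def-sponsor00|\nother-account|some_partition\n"

print(association_allows(sample_rows, "def-sponsor00", "cpubase_bycore_b1"))
print(association_allows(sample_rows, "other-account", "cpubase_bycore_b1"))
```

If the pair is missing, a manual `srun` or `salloc` against that partition would be expected to be rejected before oversubscription is considered at all, so this check is worth doing separately from the spawner form question.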
Puppet Configuration:
Ensured that changes to the SLURM and JupyterHub configurations are persistent by disabling Puppet for those files temporarily and monitoring the log files. Ensured that no unintended changes are being reverted by Puppet.
Attachments:
Screenshots of the JupyterHub spawner form.
Additional Notes:
We suspect there may be an issue with either the JupyterHub form configuration or how SLURM interacts with the spawner form. Guidance on enabling oversubscription in the JupyterHub interface or debugging the form behavior would be appreciated.
module "azure" {
  source         = "./azure"
  config_git_url = "https://github.com/ComputeCanada/puppet-magic_castle.git"
  config_version = "13.5.0"
  image = {
    publisher = "almalinux",
    offer     = "almalinux-x86_64",
    sku       = "9-gen2",
    version   = "9.4.2024050902"
  }
  instances = {
    mgmt  = { type = "Standard_B2ms", count = 1, tags = ["mgmt", "puppet", "nfs"] },
    login = { type = "Standard_B2s", count = 1, tags = ["login", "public", "proxy"] },
    node  = { type = "Standard_B2s", count = 4, tags = ["node"] },