Enable Oversubscription Option Not Available in JupyterHub Spawner #324

Open
OscarDiez opened this issue Sep 7, 2024 · 0 comments
OscarDiez commented Sep 7, 2024

We are experiencing an issue where the "Enable core oversubscription" checkbox is not enabled in the JupyterHub spawner form, even though the SLURM configuration seems to support oversubscription. Here's the breakdown:

Environment Details:

Magic Castle Version: 13.5.0 (the config_version pinned in the Terraform module below).
JupyterHub Config:
spawner_class: slurmformspawner.SlurmFormSpawner
SLURM Config (Relevant Snippet from /etc/slurm/slurm.conf):
PartitionName=DEFAULT OverSubscribe=YES
PartitionName=cpubase_bycore_b1 OverSubscribe=YES:4
AllowAccounts=ALL AllowGroups=ALL
JupyterHub Config (Relevant Snippet from /etc/jupyterhub/jupyterhub_config.json):
"spawner_class": "slurmformspawner.SlurmFormSpawner"
"submit_template_path": "/etc/jupyterhub/submit.sh"
Expected Behavior:

The "Enable core oversubscription" option should be selectable in the JupyterHub form when creating new server instances, based on the SLURM configuration allowing oversubscription.
Observed Behavior:

The "Enable core oversubscription" option remains disabled in the form (screenshot attached).
Troubleshooting Steps Taken:

SLURM Configuration: Verified that oversubscription is enabled in SLURM for the relevant partitions:
Checked /etc/slurm/slurm.conf, confirming that OverSubscribe is enabled for both the DEFAULT partition (YES) and the cpubase_bycore_b1 partition (YES:4).
Used the command scontrol show partition to ensure that the partition configuration reflects the OverSubscribe=YES setting.
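The slurm.conf check above can be sketched as a small script. This is only an illustration, assuming the partition lines look like the snippet quoted in this issue; the function name `parse_oversubscribe` is made up for the example, and it assumes keys are spelled exactly as shown (a real slurm.conf is case-insensitive and may use line continuations, which this sketch ignores).

```python
def parse_oversubscribe(conf_text):
    """Map each partition name to its effective OverSubscribe value.

    A PartitionName=DEFAULT line sets the default for partitions that
    follow it; partitions without an explicit value inherit that default.
    """
    default = "NO"  # Slurm's default when OverSubscribe is not set
    result = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("PartitionName="):
            continue
        # Split "Key=Value" tokens into a dict for this partition line.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        name = fields["PartitionName"]
        if name == "DEFAULT":
            default = fields.get("OverSubscribe", default)
        else:
            result[name] = fields.get("OverSubscribe", default)
    return result


# Partition lines copied from the slurm.conf snippet in this issue.
sample = """\
PartitionName=DEFAULT OverSubscribe=YES
PartitionName=cpubase_bycore_b1 OverSubscribe=YES:4
"""
print(parse_oversubscribe(sample))
```

Running it against the real /etc/slurm/slurm.conf (read with `open(...).read()`) would list every partition with the value it ends up with, which makes it easy to spot a partition that does not inherit the DEFAULT setting.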

JupyterHub Configuration:
Verified the JupyterHub spawner class (slurmformspawner.SlurmFormSpawner) in the jupyterhub_config.json.
Restarted the JupyterHub service to ensure any configuration changes are applied.

SLURM Jobs:
Attempted to manually submit jobs with oversubscription using srun and salloc commands, which failed due to account/partition combination issues. Verified account associations with sacctmgr and attempted to set the default partition for the user (def-sponsor00).
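For reference, the kind of manual test described above can be written as a small helper that assembles the srun command line. This is a sketch, not the exact command that was run: the helper name `build_srun_command` is hypothetical, and using def-sponsor00 as the account and cpubase_bycore_b1 as the partition is an assumption based on the names mentioned in this issue.

```python
import shlex


def build_srun_command(account, partition, command=("hostname",), oversubscribe=True):
    """Build an srun argument list for a quick oversubscription test."""
    argv = ["srun", f"--account={account}", f"--partition={partition}"]
    if oversubscribe:
        # Ask Slurm to let the job share its allocated resources.
        argv.append("--oversubscribe")
    argv.extend(command)
    return argv


# Assumed account and partition names, taken from this issue.
argv = build_srun_command("def-sponsor00", "cpubase_bycore_b1")
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` on the login node; an invalid account/partition combination should then fail with the same error seen in the manual attempts.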

Puppet Configuration:
Ensured that changes to the SLURM and JupyterHub configurations are persistent by disabling Puppet for those files temporarily and monitoring the log files. Ensured that no unintended changes are being reverted by Puppet.

Attachments:
Screenshots of the JupyterHub spawner form.

Additional Notes:
We suspect there may be an issue with either the JupyterHub form configuration or how SLURM interacts with the spawner form. Guidance on enabling oversubscription in the JupyterHub interface or debugging the form behavior would be appreciated.

module "azure" {
  source = "./azure"
  config_git_url = "https://github.com/ComputeCanada/puppet-magic_castle.git"
  config_version = "13.5.0"

  image = {
    publisher = "almalinux",
    offer = "almalinux-x86_64",
    sku = "9-gen2",
    version = "9.4.2024050902"
  }

  instances = {
    mgmt = { type = "Standard_B2ms", count = 1, tags = ["mgmt", "puppet", "nfs"] },
    login = { type = "Standard_B2s", count = 1, tags = ["login", "public", "proxy"] },
    node = { type = "Standard_B2s", count = 4, tags = ["node"] },

[Screenshot of the JupyterHub spawner form]
