cloudwatch command11ExternalLoginNodeDeconfigure is incorrect. #300

Open
gwolski opened this issue Jan 18, 2025 · 0 comments
gwolski commented Jan 18, 2025

After configuring a cluster, CloudFormation Outputs provides command11ExternalLoginNodeDeconfigure. Here is an entry from one of my stacks:

sudo /opt/slurm/tsi3/config/bin/external_login_nodes_deconfigure.sh && sudo umount /opt/slurm/tsi3

Unfortunately, the actual name of that first shell script on disk is:

/opt/slurm/tsi3/config/bin/external_login_node_deconfigure.sh

The output says "nodes" (plural), but the file name uses "node" (singular).
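
Until the output is corrected, a small wrapper could pick whichever spelling exists on disk. This is only a sketch: `find_deconfigure_script` is a hypothetical helper name, not something shipped with the project, and the two file names are simply the ones quoted above.

```shell
# Hypothetical helper: print the path of the deconfigure script that actually
# exists under the given bin directory, trying the singular name first.
find_deconfigure_script() {
    fds_bin_dir="$1"
    for fds_name in external_login_node_deconfigure.sh external_login_nodes_deconfigure.sh; do
        if [ -f "$fds_bin_dir/$fds_name" ]; then
            printf '%s\n' "$fds_bin_dir/$fds_name"
            return 0
        fi
    done
    # Neither spelling was found; report on stderr and fail.
    echo "no deconfigure script found in $fds_bin_dir" >&2
    return 1
}

# Example usage (directory taken from the stack output above):
#   script=$(find_deconfigure_script /opt/slurm/tsi3/config/bin) && sudo "$script"
```

Trying the singular name first matches what is on disk today, while the fallback keeps the wrapper working if the script is ever renamed to match the output.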

In addition, the Ansible playbook already seems to do the umount itself:

TASK [ParallelClusterExternalLoginNodeDeconfigure : Unmount /opt/slurm/tsi3] *******************************************************************************************************************************************
changed: [local]

TASK [ParallelClusterExternalLoginNodeDeconfigure : Show umount results] ***********************************************************************************************************************************************
ok: [local] => 
  msg: |-
    umount_results: {'changed': True, 'stdout': 'Mount point is hung. Source has already been deleted.\n/opt/slurm/tsi3 is not a mountpoint\n/opt/slurm/tsi3 already unmounted.', 'stderr': "+ timeout 1s /opt/slurm/tsi3\ntimeout: failed to run command ‘/opt/slurm/tsi3’: Permission denied\n+ echo 'Mount point is hung. Source has already been deleted.'\n+ umount -lf /opt/slurm/tsi3\n+ mountpoint /opt/slurm/tsi3\n+ echo '/opt/slurm/tsi3 already unmounted.'\n+ exit 0", 'rc': 0, 'cmd': 'set -ex\n\n# Handle case where cluster was already deleted so the mountpoint is hung\nif ! timeout 1s /opt/slurm/tsi3; then\n    echo "Mount point is hung. Source has already been deleted."\n    umount -lf /opt/slurm/tsi3\nfi\nif ! mountpoint /opt/slurm/tsi3; then\n    echo "/opt/slurm/tsi3 already unmounted."\n    exit 0\nfi\numount /opt/slurm/tsi3 || lsof /opt/slurm/tsi3\n', 'start': '2025-01-18 12:20:57.957280', 'end': '2025-01-18 12:20:59.566132', 'delta': '0:00:01.608852', 'msg': '', 'stdout_lines': ['Mount point is hung. Source has already been deleted.', '/opt/slurm/tsi3 is not a mountpoint', '/opt/slurm/tsi3 already unmounted.'], 'stderr_lines': ['+ timeout 1s /opt/slurm/tsi3', 'timeout: failed to run command ‘/opt/slurm/tsi3’: Permission denied', "+ echo 'Mount point is hung. Source has already been deleted.'", '+ umount -lf /opt/slurm/tsi3', '+ mountpoint /opt/slurm/tsi3', "+ echo '/opt/slurm/tsi3 already unmounted.'", '+ exit 0'], 'failed': False}

Yet the CloudFormation output command11 also runs sudo umount on the same mount point, so that part of the command fails.
I am also not sure how to interpret all the information Ansible prints in the "Show umount results" section.
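
For what it is worth, the `stderr_lines` trace in that output can be read step by step. `timeout 1s /opt/slurm/tsi3` tries to execute the directory itself as a command, which fails with "Permission denied". That sends the script into the "hung" branch, where it runs `umount -lf`. After that, `mountpoint` reports the path is not a mount point and the script exits 0. By the time the trailing `sudo umount` from command11 runs, there is nothing left to unmount. A minimal sketch of a guarded unmount that would tolerate this follows. `unmount_if_mounted` and `UMOUNT_CMD` are hypothetical names, and the guard is an assumption about what the command could do, not what the project does.

```shell
# Command used to unmount; overridable so the function can be dry-run.
UMOUNT_CMD="${UMOUNT_CMD:-sudo umount}"

# Hypothetical helper: unmount only when the path is still a mount point,
# so running it after the playbook has already unmounted it still succeeds.
unmount_if_mounted() {
    uim_mp="$1"
    if mountpoint -q "$uim_mp"; then
        # Left unquoted on purpose so "sudo umount" splits into two words.
        $UMOUNT_CMD "$uim_mp"
    else
        echo "$uim_mp already unmounted."
    fi
}

# Example usage (mount point taken from the stack output above):
#   unmount_if_mounted /opt/slurm/tsi3
```

With a guard like this, chaining the deconfigure script and the unmount with `&&` would no longer fail just because the playbook got there first.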

It would be nice to document a bit more what this deconfigure step does. For some reason it copies the config to /tmp/_config and then deletes it. Why?
