In the text we recommend waiting for up to 10 minutes, but in the script we wait for only 60 seconds.
Could we reach out to Xavier, for example, to ask how he deals with this in his end-to-end scripts?
Maybe there is a way to probe the VM every 30 seconds or so?
I am hoping for a better solution than 'wait for an indeterminate amount of time'.
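To make the "probe every 30 seconds" idea concrete, here is a minimal sketch of a polling loop with an overall timeout. The function name `wait_until_ready`, the variables `VM_USER` and `VM_HOST`, and the choice of `nvidia-smi` over SSH as the readiness probe are all assumptions for illustration, not something our current script does.

```shell
# Sketch only: poll a probe command until it succeeds or a deadline passes.
# Usage: wait_until_ready TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_ready() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + $timeout ))
  while true; do
    if "$@"; then
      return 0    # the probe succeeded
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1    # timed out without a successful probe
    fi
    sleep "$interval"
  done
}

# Hypothetical usage (VM_USER and VM_HOST are placeholders):
# probe every 30 seconds for up to 10 minutes.
# wait_until_ready 600 30 ssh -o ConnectTimeout=10 "$VM_USER@$VM_HOST" nvidia-smi
```

Because the probe is just a command passed as arguments, it can be swapped for whatever check turns out to be reliable. A failed SSH connection while the VM is rebooting simply counts as a failed probe and the loop keeps going.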
Blurb for John:
"The GPU driver installer doesn't work properly and keeps rebooting the machine after it claims it is finished. This means there is no robust way to check when the VM is actually ready to use, so we can't build a reliable end-to-end solution. The options are either to get MS to fix their system, or to break the script into two parts and explain why we can't do end-to-end scripting."
Info sent to Phil from MS support. We don't think this is very good at all!
I got an update from my PG team: the behaviour you see is expected. The provisioning state is returned as “success” because of the short time window in which installation must actually succeed and the constraints on working around reboots. The installation can have up to 3 steps (2 reboots) depending on the requirements of the VM. Giving adequate time for the installation to finish post-reboots, or tailing the log file for success, is currently the only way this complicated multi-step installation can be handled with VM extensions.
manasa@Azure:~$ az vm extension list --resource-group manasa-cyclecloud --vm-name nctest4 -o table
Name                  ProvisioningState    Publisher             Version    AutoUpgradeMinorVersion
--------------------  -------------------  --------------------  ---------  -------------------------
NvidiaGpuDriverLinux  Succeeded            Microsoft.HpcCompute  1.3        True
Please let us know if you have any further questions!
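For reference, the "tail the log file for success" suggestion could be sketched roughly as below, run on the VM itself (for example over SSH). The support reply gives neither the log path nor the exact success message, so both are passed in as arguments here. The function names are hypothetical too.

```shell
# Sketch only: LOG_FILE and PATTERN are placeholders, since the real log
# location and success message are not given in the support reply.

# Succeeds if LOG_FILE is readable and contains a line matching PATTERN.
log_reports_success() {
  local log_file="$1" pattern="$2"
  [ -r "$log_file" ] && grep -q -- "$pattern" "$log_file"
}

# Poll the log until it reports success or TIMEOUT seconds have passed.
# Usage: wait_for_log_success LOG_FILE PATTERN TIMEOUT INTERVAL
wait_for_log_success() {
  local log_file="$1" pattern="$2" timeout="$3" interval="$4"
  local deadline=$(( $(date +%s) + $timeout ))
  until log_reports_success "$log_file" "$pattern"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1    # no success marker seen before the deadline
    fi
    sleep "$interval"
  done
  return 0
}
```

Since the log file lives on disk, this check should survive the intermediate reboots, provided the installer appends to the same file. That is also an assumption we would need to confirm.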