I was able to train the Mixtral code successfully on a single node with 8 GPUs by reducing the model size, but when I switch to multiple nodes, the loss per iteration no longer decreases the way it does in the single-node setup. Is there something wrong with my multi-node configuration?
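A flat loss on multiple nodes often means the ranks on different nodes are not actually synchronizing gradients, so each node effectively trains in isolation. Before digging into the training code, it can help to confirm that every rank joins the same process group and can complete a collective. Below is a minimal sanity-check sketch, assuming a PyTorch/NCCL setup launched with `torchrun`; the script name, node counts, and addresses are placeholders, not values taken from this issue.

```python
# dist_sanity_check.py -- hypothetical diagnostic script, not part of the Mixtral code.
# Launch it with the same torchrun arguments as the real training job, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 --node_rank=<0|1> \
#            --master_addr=<node0-ip> --master_port=29500 dist_sanity_check.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK
    # in the environment for every worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Every rank contributes its own rank id; after the all_reduce, each rank
    # should hold sum(range(world_size)). A hang or a wrong value here points
    # at the rendezvous/network configuration rather than the training code.
    t = torch.tensor([float(rank)], device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    expected = world_size * (world_size - 1) / 2
    print(f"rank {rank}/{world_size}: all_reduce={t.item():.0f} expected={expected:.0f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If this check passes on all nodes but the loss still stays flat, the next things worth comparing against the single-node run are the effective global batch size and the parallelism degrees (data/tensor/pipeline/expert), since a multi-node launch that silently changes these can change the loss curve.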