I was able to train the Mixtral code successfully on a single node with 8 GPUs by reducing the model size, but when I switch to multiple nodes, the loss per iteration no longer decreases the way it does in the single-node setup. Is there something wrong with my multi-node configuration?
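A flat loss on multiple nodes often means the ranks on different nodes are not actually synchronizing gradients, so each node effectively trains in isolation. Before digging into the training code, it can help to confirm that every rank joins the same process group and can complete a collective. Below is a minimal sanity-check sketch, assuming a PyTorch/NCCL setup launched with `torchrun`; the script name, node counts, and addresses are placeholders, not values taken from this issue.

```python
# dist_sanity_check.py -- hypothetical diagnostic script, not part of the Mixtral code.
# Launch it with the same torchrun arguments as the real training job, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 --node_rank=<0|1> \
#            --master_addr=<node0-ip> --master_port=29500 dist_sanity_check.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK
    # in the environment for every worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Every rank contributes its own rank id; after the all_reduce, each rank
    # should hold sum(range(world_size)). A hang or a wrong value here points
    # at the rendezvous/network configuration rather than the training code.
    t = torch.tensor([float(rank)], device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    expected = world_size * (world_size - 1) / 2
    print(f"rank {rank}/{world_size}: all_reduce={t.item():.0f} expected={expected:.0f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If this check passes on all nodes but the loss still stays flat, the next things worth comparing against the single-node run are the effective global batch size and the parallelism degrees (data/tensor/pipeline/expert), since a multi-node launch that silently changes these can change the loss curve.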