There is code for using the SLURM scheduler as a tracker for distributed training, but it was removed as an option from `submit.py` some time ago.

Lately I've been training XGBoost on an MPI cluster. While I haven't been able to get the `mpi` tracker to work, re-instating the SLURM tracker does work, after I made some changes to the command being invoked.

So would the community consider adding SLURM back as an option, or is it meant to be superseded by the `mpi` tracker now? If the latter, has anyone successfully trained XGBoost with the MPI tracker recently?
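For concreteness, here is a minimal sketch of what a SLURM-based launch might look like once the tracker option is re-instated. The `--cluster=slurm` flag, the `train_xgboost.py` script name, and the exact worker-count wiring are assumptions for illustration, not the actual commands from my setup:

```shell
#!/bin/bash
#SBATCH --ntasks=4            # one XGBoost worker per task
#SBATCH --cpus-per-task=8

# Hypothetical sketch: dmlc-submit starts the rabit tracker and,
# with a re-instated slurm cluster type, would use srun to launch
# the workers inside the current allocation.
dmlc-submit --cluster=slurm \
    --num-workers="$SLURM_NTASKS" \
    python train_xgboost.py
```

The point of a SLURM cluster type is that the job already holds an allocation, so the tracker only needs to `srun` the workers rather than negotiate resources itself, which is where the command changes I mentioned came in.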