
Commit ac656ed

add support for CPU and MPS
Do not use distributed training when it is not available; instead use CPU or MPS. This entails a few changes:

- --device is now a valid flag to the library, since `ilab` can pass CPU, MPS, or default to CUDA.
- When using CPU or MPS, do not initialize DeepSpeed; instead put the model on the device and, inside of `train`, initialize the `Adafactor` optimizer, which is more efficient than an Adam-based one.
- Add logic so that distributed torch is used only if torch.cuda.is_available() and torch.distributed.is_initialized(); we don't use distributed torch on consumer systems.
- The train loop needs some custom step and loss logic for a LlamaForCausalLM model; add that in.
- When using CPU or MPS we are always world_size == 1 and local_rank == 0.

Signed-off-by: Charlie Doern <[email protected]>
1 parent 0de1e36 commit ac656ed
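
The hunk below only touches the public wrapper; the single-device behavior described in the commit message lives in the training entrypoint. A rough, hedged sketch of that path (device fallback, skipping DeepSpeed and torch.distributed, Adafactor instead of an Adam-based optimizer, fixed world_size and local_rank) might look like the following; the helper names resolve_device, setup_single_device, and train_step are illustrative, not the library's actual API.

# Illustrative sketch only; not the commit's actual training code.
# Only the torch/transformers calls are real APIs; the function names are made up.
import torch
from transformers import Adafactor, AutoModelForCausalLM


def resolve_device(requested: str = "cuda") -> torch.device:
    """Fall back from CUDA to MPS to CPU, mirroring the --device flag."""
    if requested == "cuda" and torch.cuda.is_available():
        return torch.device("cuda")
    if requested == "mps" and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def using_distributed() -> bool:
    """Use distributed torch only when CUDA is present and a process group exists."""
    return torch.cuda.is_available() and torch.distributed.is_initialized()


def setup_single_device(model_name: str, device: torch.device):
    """CPU/MPS path: no DeepSpeed; move the model to the device, use Adafactor."""
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    # Adafactor keeps optimizer state much smaller than Adam-based optimizers.
    optimizer = Adafactor(
        model.parameters(), lr=1e-5, scale_parameter=False, relative_step=False
    )
    world_size, local_rank = 1, 0  # single process on consumer hardware
    return model, optimizer, world_size, local_rank


def train_step(model, optimizer, batch, device):
    """Per-batch step for a LlamaForCausalLM-style model that returns .loss."""
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss  # expects input_ids, attention_mask, labels
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

A CUDA run would continue down the existing DeepSpeed/torchrun path; only the fallback branch above is new behavior.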

5 files changed (+206 / -95 lines)

src/instructlab/training/__init__.py (+2 / -2)
@@ -22,9 +22,9 @@
 
 
 # defer import of main_ds
-def run_training(torch_args: TorchrunArgs, train_args: TrainingArgs) -> None:
+def run_training(torch_args: TorchrunArgs, train_args: TrainingArgs, device: str) -> None:
     """Wrapper around the main training job that calls torchrun."""
     # Local
     from .main_ds import run_training
 
-    return run_training(torch_args=torch_args, train_args=train_args)
+    return run_training(torch_args=torch_args, train_args=train_args, device=device)
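
For callers, the visible change is the extra device argument on the wrapper. A hedged usage sketch follows; the TorchrunArgs and TrainingArgs field names and values are placeholders and may not match the library's dataclasses exactly.

# Hypothetical call site (roughly what `ilab` might do); all argument values
# below are placeholders, not the library's defaults or required fields.
from instructlab.training import TorchrunArgs, TrainingArgs, run_training

torch_args = TorchrunArgs(
    nnodes=1,
    nproc_per_node=1,  # CPU/MPS always runs a single process (world_size == 1)
    node_rank=0,
    rdzv_id=0,
    rdzv_endpoint="",
)
train_args = TrainingArgs(
    model_path="path/to/model",
    data_path="path/to/train.jsonl",
    ckpt_output_dir="checkpoints",
    data_output_dir="data-out",
    max_seq_len=2048,
    max_batch_len=5000,
    num_epochs=1,
    effective_batch_size=8,
    save_samples=0,
    learning_rate=1e-5,
    warmup_steps=10,
    is_padding_free=False,
)

# New in this commit: "cpu" or "mps" takes the single-device path;
# "cuda" keeps the existing distributed/DeepSpeed path.
run_training(torch_args=torch_args, train_args=train_args, device="cpu")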
