Hi,

I tried to run the `fine_tune.py` script on my lab's server, which is a normal 4-GPU Ubuntu workstation without SLURM support. When I ran it without a distributed training setup, everything was okay. Then I tried to switch to a multi-GPU setting and couldn't get it to work. I have tried the following approaches, and none of them worked:
1. `accelerate config` followed by `accelerate launch fine_tune.py --py_args`, which gave me the following error while initializing the accelerator object:

   ```
   ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
   ```

2. `torchrun fine_tune.py --py_args`, which gave me the same error as method 1 (see the debugging sketch after this list).
3. Writing another shell script that calls the `fine_tune_pascal.sh` script 4 times, passing in a different `$SLURM_ARRAY_TASK_ID` each time. This does not seem to be the correct approach, since every process claimed to be the main process, and I suspect they were all just doing duplicate work.
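To narrow down methods 1 and 2, here is a tiny probe script I can run under each launcher to see which rendezvous variables are actually exported (a minimal sketch; the variable names are the standard `torch.distributed` env:// ones, and `probe_env.py` is just a hypothetical file name, not part of this repo):

```python
# probe_env.py -- hypothetical debugging helper.
# Run as `accelerate launch probe_env.py` or `torchrun --nproc_per_node=4 probe_env.py`
# to check whether the launcher exports the env:// rendezvous variables.
import os

for var in ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
    # A plain `python probe_env.py` run should print None for all of these;
    # a correctly configured multi-process launch should set all five per process.
    print(f"{var} = {os.environ.get(var)}")
```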
Could you help me out with this? I'm pretty sure my accelerate setup is okay, since I'm able to run their official toy example. Could the problem be that the code inside the `if __name__ == "__main__":` block is not wrapped in a `main()` function, as instructed by Hugging Face Accelerate? Should I wrap it myself?
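For reference, the structure the Accelerate docs describe is roughly the following (a sketch of the pattern only, not the actual contents of `fine_tune.py`):

```python
from accelerate import Accelerator

def main():
    # Accelerator() picks up the RANK / LOCAL_RANK / WORLD_SIZE variables
    # that `accelerate launch` or `torchrun` sets for each process.
    accelerator = Accelerator()
    # ... the training code currently sitting under the
    # `if __name__ == "__main__":` block in fine_tune.py would move here ...

if __name__ == "__main__":
    main()
```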