train.py: error: unrecognized arguments: --local-rank=0 #134

Open
davidvct opened this issue Jan 23, 2024 · 6 comments

Comments


davidvct commented Jan 23, 2024

I encounter this error when trying to train on the GoPro dataset with the following command:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 train.py -opt options/train/GoPro/NAFNet-width32.yml --launcher pytorch

I searched train.py and found no --local-rank argument there.

How can I fix this?

@txy00001

Add this in train.py:
[image]


sentinel8b commented Apr 24, 2024

Change

parser.add_argument('--local_rank', type=int, default=0)

To

parser.add_argument('--local-rank', type=int, default=0)

And I didn't add

os.environ['RANK'] = str(0)
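
A minimal sketch of a variant of this change that accepts both spellings, on the assumption that the script may be started by launcher versions passing either --local_rank or --local-rank. The build_parser helper is hypothetical and is not the parser from train.py; only the standard argparse API is used.

```python
import argparse


def build_parser():
    """Hypothetical helper; not the actual parser from train.py."""
    parser = argparse.ArgumentParser()
    # Register both spellings as aliases of one option and pin the
    # attribute name with dest, so either form fills args.local_rank.
    parser.add_argument('--local_rank', '--local-rank',
                        dest='local_rank', type=int, default=0)
    return parser


# Either spelling is accepted and lands in the same attribute.
args = build_parser().parse_args(['--local-rank=1'])
print(args.local_rank)
```

Registering both option strings keeps the script usable whichever form the launcher passes, at the cost of one extra alias.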


rp7sv commented May 10, 2024

(Quoting sentinel8b's suggestion above: change '--local_rank' to '--local-rank' in parser.add_argument, without adding os.environ['RANK'] = str(0).)

Thanks. When I tried to use torchrun, it reported "can not open python: no such file", but after following your change it works!

@tobymuller233

(Quoting sentinel8b's suggestion above: change '--local_rank' to '--local-rank' in parser.add_argument, without adding os.environ['RANK'] = str(0).)

It seems that "local-rank", with a "-" in the middle instead of "_", doesn't follow Python's naming rules for identifiers.
I'm trying to debug a multi-GPU program in VSCode and configured launch.json as follows:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug Distributed Training (GPU 0)",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/train.py",
            "console": "integratedTerminal",
            "args": [
                "~/stu_motion/scrfd/configs/scrfd/scrfd_1g.py",
                "--launcher",
                "pytorch"
            ],
            "env": {
                "PYTHONPATH": "${workspaceFolder}/..:${env:PYTHONPATH}",
                "MASTER_ADDR": "127.0.0.1",
                "MASTER_PORT": "29500",
                "WORLD_SIZE": "2",
                "RANK": "0"
            },
            "pythonArgs": [
                "-m",
                "torch.distributed.launch",
                "--nproc_per_node=2",
                "--master_port=29500"
            ]
        },
        {
            "name": "Debug Distributed Training (GPU 1)",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/train.py",
            "console": "integratedTerminal",
            "args": [
                "~/stu_motion/scrfd/configs/scrfd/scrfd_1g.py",
                "--launcher",
                "pytorch"
            ],
            "env": {
                "PYTHONPATH": "${workspaceFolder}/..:${env:PYTHONPATH}",
                "MASTER_ADDR": "127.0.0.1",
                "MASTER_PORT": "29500",
                "WORLD_SIZE": "2",
                "RANK": "1"
            },
            "pythonArgs": [
                "-m",
                "torch.distributed.launch",
                "--nproc_per_node=2",
                "--master_port=29500"
            ]
        }
    ]
}
I'm not sure whether the hyphen is actually the cause, but the program failed to run correctly.
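
On the naming concern: argparse derives the attribute name from the long option string and replaces hyphens with underscores, so an option declared as --local-rank is still read as args.local_rank. A short sketch using only the standard library (the parser here is illustrative, not the one in train.py):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--local-rank', type=int, default=0)

# The launcher-style argument is parsed into the attribute local_rank,
# because argparse converts '-' to '_' when deriving the dest name.
args = parser.parse_args(['--local-rank=1'])
print(args.local_rank)
```

So the hyphen in the option string does not by itself produce an invalid Python identifier.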

@dr-smgad

Hi,

I hit this issue while debugging from VSCode with "module": "torch.distributed.launch" set in my launch.json. I got the unrecognized arguments: --local-rank=0 error because my Python file did not declare that argument. It turned out that you need to pass "--use-env" as the first entry of the launch.json args, i.e. "args": ["--use-env", <continue other args here>]. torch.distributed.launch then stops appending this argument automatically and exposes the local rank through the LOCAL_RANK environment variable instead.

I hope it helps.
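
When the launcher is run with --use-env (or when torchrun is used), the local rank arrives as the LOCAL_RANK environment variable rather than as a command-line argument. A minimal sketch of reading it, assuming a fallback of 0 for non-distributed runs; get_local_rank is a hypothetical helper name, not something from train.py:

```python
import os


def get_local_rank(default=0):
    """Hypothetical helper: read the local rank set by the launcher."""
    # LOCAL_RANK is set by torchrun and by torch.distributed.launch
    # when --use-env is given; fall back to `default` otherwise.
    return int(os.environ.get('LOCAL_RANK', default))
```

A script that reads the rank this way no longer needs to declare a --local-rank argument at all.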

@Yolo1-gguo

usage: train.py [-h] -opt OPT [--launcher {none,pytorch,slurm}] [--local-rank LOCAL_RANK]
[--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
train.py: error: unrecognized arguments: --local_rank=0 pytorch

How should I solve this?
