torch.distributed.elastic.multiprocessing.errors.ChildFailedError #173
My environment configuration:
@MyGitHub-G From the screenshot, the error looks like it comes from not specifying the `work-dir` argument.
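I don't think the missing `work-dir` argument is the problem: if it isn't specified, the code creates a default directory. The problem seems to be in how the program runs. If I run `train.py` directly instead of through the `sh` script, it crashes with a core dump.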
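For reference, the default-directory behaviour mentioned above can be sketched as follows. This is a minimal sketch assuming the MMDetection-style convention, where the work directory falls back to `./work_dirs/<config file name without extension>`; the helper name `default_work_dir` is hypothetical, and this repository's `train.py` may derive the path differently.

```python
import os.path as osp


def default_work_dir(config_path, work_dir=None):
    """Hypothetical helper: return the explicit work_dir if given,
    otherwise ./work_dirs/<config file name without extension>
    (assumed MMDetection-style fallback)."""
    if work_dir is not None:
        return work_dir
    # Strip the directory and the .py extension from the config path.
    config_name = osp.splitext(osp.basename(config_path))[0]
    return osp.join('./work_dirs', config_name)


print(default_work_dir(
    'projects/configs/co_dino_vit/co_dino_5scale_vit_large_coco.py'))
```

Under this assumption, omitting `--work-dir` would only change where logs and checkpoints are written, which is consistent with the point that it should not by itself cause the crash.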
Did you manage to solve it? In my case, training finishes one epoch and then the error occurs as soon as evaluation starts.
Hello, when I run `sh tools/dist_train.sh projects/configs/co_dino_vit/co_dino_5scale_vit_large_coco.py 1`, I get the error below. I've seen others report this error before; how can it be solved? Thanks.
![image](https://private-user-images.githubusercontent.com/71758483/373106679-8d4855df-16ff-4c5a-ba65-1eff1779290c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxNjUyNjQsIm5iZiI6MTczOTE2NDk2NCwicGF0aCI6Ii83MTc1ODQ4My8zNzMxMDY2NzktOGQ0ODU1ZGYtMTZmZi00YzVhLWJhNjUtMWVmZjE3NzkyOTBjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDA1MjI0NFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE3YjE0MDRmOGNiYjUwMGUxODVjOGVhNzA2ZjUwMGExMTlmYTFhNjhlMDgwMDNlYWI2NmVkMjNmN2I0ZTVhMjQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.4UQVvDlZWWH8uuFFAO7l5Umz4_rzBJzSHBzxKslbnhw)