!sh scripts/run_car.sh throws errors !!!! #49

dellshan · 2023-03-05T04:37:23Z

/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
WARNING:torch.distributed.run:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

/usr/local/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/usr/local/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/usr/local/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/usr/local/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
warn(f"Failed to load image Python extension: {e}")
/usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
warn(f"Failed to load image Python extension: {e}")
/usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
warn(f"Failed to load image Python extension: {e}")
/usr/local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
warn(f"Failed to load image Python extension: {e}")
Load config from yml file: configs/car.yml
Loading configs from configs/car.yml
Load config from yml file: configs/car.yml
Load config from yml file: configs/car.yml
Loading configs from configs/car.yml
Load config from yml file: configs/car.yml
Loading configs from configs/car.yml

Loading configs from configs/car.yml
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True}
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True}
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True}
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True}
Setting up Perceptual loss...
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Setting up Perceptual loss...
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Setting up Perceptual loss...
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Setting up Perceptual loss...
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
Traceback (most recent call last):
File "run.py", line 31, in <module>
trainer = Trainer(cfgs, GAN2Shape)
File "/content/GAN2Shape/gan2shape/trainer.py", line 23, in __init__
self.model = model(cfgs)
File "/content/GAN2Shape/gan2shape/model.py", line 88, in __init__
self.PerceptualLoss = PerceptualLoss(
File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/__init__.py", line 21, in __init__
self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids)
File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize
self.net.load_state_dict(torch.load(model_path, **kw), strict=False)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1012, in _legacy_load
result = unpickler.load()
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 958, in persistent_load
wrap_storage=restore_location(obj, location),
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1055, in restore_location
return default_restore_location(storage, str(map_location))
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 215, in default_restore_location
result = fn(storage, location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 173, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
Traceback (most recent call last):
File "run.py", line 31, in <module>
trainer = Trainer(cfgs, GAN2Shape)
File "/content/GAN2Shape/gan2shape/trainer.py", line 23, in __init__
self.model = model(cfgs)
File "/content/GAN2Shape/gan2shape/model.py", line 88, in __init__
self.PerceptualLoss = PerceptualLoss(
File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/__init__.py", line 21, in __init__
self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids)
File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize
self.net.load_state_dict(torch.load(model_path, **kw), strict=False)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1012, in _legacy_load
result = unpickler.load()
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 958, in persistent_load
wrap_storage=restore_location(obj, location),
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1055, in restore_location
return default_restore_location(storage, str(map_location))
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 215, in default_restore_location
result = fn(storage, location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 173, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
Loading model from: /content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
Traceback (most recent call last):
File "run.py", line 31, in <module>
trainer = Trainer(cfgs, GAN2Shape)
File "/content/GAN2Shape/gan2shape/trainer.py", line 23, in __init__
self.model = model(cfgs)
File "/content/GAN2Shape/gan2shape/model.py", line 88, in __init__
self.PerceptualLoss = PerceptualLoss(
File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/__init__.py", line 21, in __init__
self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids)
File "/content/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize
self.net.load_state_dict(torch.load(model_path, **kw), strict=False)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1012, in _legacy_load
result = unpickler.load()
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 958, in persistent_load
wrap_storage=restore_location(obj, location),
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 1055, in restore_location
return default_restore_location(storage, str(map_location))
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 215, in default_restore_location
result = fn(storage, location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 173, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 3 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 42184 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 42185) of binary: /usr/local/bin/python
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/usr/local/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/usr/local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

run.py FAILED

Failures:
[1]:
time : 2023-03-05_04:31:36
host : 1cdb853b957d
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 42186)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-03-05_04:31:36
host : 1cdb853b957d
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 42187)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-03-05_04:31:36
host : 1cdb853b957d
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 42185)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
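
Reading the log above: four processes were started (local ranks 0 to 3), while `torch.cuda.device_count()` reports 1, and ranks 1, 2 and 3 each fail when `torch.load` tries to place tensors on a CUDA device that does not exist. The error message itself points at `map_location`. Below is a minimal sketch of that idea, not the repository's code: `resolve_device` and `load_state_dict_safely` are hypothetical helper names. It loads the weights onto the CPU first and fails early with a clearer message when a rank has no GPU of its own.

```python
def resolve_device(local_rank: int, device_count: int) -> str:
    """Return the device string for this rank, or fail early with a clear message.

    Hypothetical helper, not part of GAN2Shape or PyTorch.
    """
    if device_count == 0:
        # No GPU visible at all: fall back to the CPU.
        return "cpu"
    if local_rank >= device_count:
        raise RuntimeError(
            f"local_rank {local_rank} has no GPU: only {device_count} visible. "
            "Launch at most one process per GPU."
        )
    return f"cuda:{local_rank}"


def load_state_dict_safely(path: str, local_rank: int):
    """Load a checkpoint onto the CPU and report which device this rank should use."""
    import torch  # imported lazily so resolve_device works without torch installed

    device = resolve_device(local_rank, torch.cuda.device_count())
    # map_location="cpu" avoids restoring tensors onto the device they were saved from.
    state = torch.load(path, map_location="cpu")
    return state, device
```

A caller would then do something like `net.load_state_dict(state, strict=False)` followed by `net.to(device)`. On a single-GPU machine such as the one in this log, the other half of the fix is presumably to launch only one process, for example by lowering the `--nproc_per_node` value passed to the launcher; whether `scripts/run_car.sh` sets that flag is an assumption, since the script is not shown here.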
