
Error when launching training within the notebook nuplan_framework.ipynb #333

Closed
CBeaune opened this issue Jun 26, 2023 · 4 comments · May be fixed by #407

CBeaune commented Jun 26, 2023

Hello,
I'm a newcomer to the nuPlan framework, so I apologize if this is a simple issue, but I did not find any closed issues covering it.
I'm trying to run the tutorials, starting with nuplan_framework.ipynb, but I get the following error when launching the training within the notebook. It seems related to loading the pretrained model:

AttributeError: Error instantiating 'nuplan.planning.training.modeling.models.raster_model.RasterModel' : module 'torch' has no attribute 'frombuffer'

It seems that the torch attribute is not reachable. Is this an issue with the torch version?
Mine is '1.9.0+cu111', as specified in the devkit's requirements_torch.txt.
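
A quick sanity check (just a hasattr probe; as far as I can tell, frombuffer was only added in torch 1.10, so 1.9.x predates it):

import torch
print(torch.__version__)             # expect 1.9.0+cu111 here
print(hasattr(torch, "frombuffer"))  # prints False on 1.9.x, matching the error above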

Any help is welcome!
Thanks

@ZhaoYangbjtu

I have the same problem when I use nuplan-devkit-v1.2-release. All the installation steps succeed. The stack trace is below; it seems that another library, safetensors, needs a newer version of torch. Is there anything wrong inside requirements.txt or requirements_torch.txt?


AttributeError Traceback (most recent call last)
File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:62, in _call_target(target, *args, **kwargs)
60 v._set_parent(None)
---> 62 return target(*args, **kwargs)
63 except Exception as e:

File ~/workspace/nuplan-devkit/nuplan/planning/training/modeling/models/raster_model.py:58, in RasterModel.__init__(self, feature_builders, target_builders, model_name, pretrained, num_input_channels, num_features_per_pose, future_trajectory_sampling)
57 num_output_features = future_trajectory_sampling.num_poses * num_features_per_pose
---> 58 self._model = timm.create_model(model_name, pretrained=pretrained, num_classes=0, in_chans=num_input_channels)
59 mlp = torch.nn.Linear(in_features=self._model.num_features, out_features=num_output_features)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_factory.py:114, in create_model(model_name, pretrained, pretrained_cfg, pretrained_cfg_overlay, checkpoint_path, scriptable, exportable, no_jit, **kwargs)
113 with set_layer_config(scriptable=scriptable, exportable=exportable, no_jit=no_jit):
--> 114 model = create_fn(
115 pretrained=pretrained,
116 pretrained_cfg=pretrained_cfg,
117 pretrained_cfg_overlay=pretrained_cfg_overlay,
118 **kwargs,
119 )
121 if checkpoint_path:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:1276, in resnet50(pretrained, **kwargs)
1275 model_args = dict(block=Bottleneck, layers=[3, 4, 6, 3], **kwargs)
-> 1276 return _create_resnet('resnet50', pretrained, **dict(model_args, **kwargs))

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:547, in _create_resnet(variant, pretrained, **kwargs)
546 def _create_resnet(variant, pretrained=False, **kwargs):
--> 547 return build_model_with_cfg(ResNet, variant, pretrained, **kwargs)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:393, in build_model_with_cfg(model_cls, variant, pretrained, pretrained_cfg, pretrained_cfg_overlay, model_cfg, feature_cfg, pretrained_strict, pretrained_filter_fn, kwargs_filter, **kwargs)
392 if pretrained:
--> 393 load_pretrained(
394 model,
395 pretrained_cfg=pretrained_cfg,
396 num_classes=num_classes_pretrained,
397 in_chans=kwargs.get('in_chans', 3),
398 filter_fn=pretrained_filter_fn,
399 strict=pretrained_strict,
400 )
402 # Wrap the model in a feature extraction module if enabled

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:186, in load_pretrained(model, pretrained_cfg, num_classes, in_chans, filter_fn, strict)
185 else:
--> 186 state_dict = load_state_dict_from_hf(pretrained_loc)
187 else:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_hub.py:183, in load_state_dict_from_hf(model_id, filename)
180 _logger.info(
181 f"[{model_id}] Safe alternative available for '{filename}' "
182 f"(as '{safe_filename}'). Loading weights using safetensors.")
--> 183 return safetensors.torch.load_file(cached_safe_file, device="cpu")
184 except EntryNotFoundError:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/safetensors/torch.py:261, in load_file(filename, device)
260 for k in f.keys():
--> 261 result[k] = f.get_tensor(k)
262 return result

AttributeError: module 'torch' has no attribute 'frombuffer'

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
Cell In[6], line 4
1 from nuplan.planning.script.run_training import main as main_train
3 # Run the training loop, optionally inspect training artifacts through tensorboard (above cell)
----> 4 main_train(cfg)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/main.py:44, in main.<locals>.main_decorator.<locals>.decorated_main(cfg_passthrough)
41 @functools.wraps(task_function)
42 def decorated_main(cfg_passthrough: Optional[DictConfig] = None) -> Any:
43 if cfg_passthrough is not None:
---> 44 return task_function(cfg_passthrough)
45 else:
46 args = get_args_parser()

File ~/workspace/nuplan-devkit/nuplan/planning/script/run_training.py:59, in main(cfg)
56 if cfg.py_func == 'train':
57 # Build training engine
58 with ProfilerContextManager(cfg.output_dir, cfg.enable_profiling, "build_training_engine"):
---> 59 engine = build_training_engine(cfg, worker)
61 # Run training
62 logger.info('Starting training...')

File ~/workspace/nuplan-devkit/nuplan/planning/training/experiments/training.py:44, in build_training_engine(cfg, worker)
41 logger.info('Building training engine...')
43 # Create model
---> 44 torch_module_wrapper = build_torch_module_wrapper(cfg.model)
46 # Build the datamodule
47 datamodule = build_lightning_datamodule(cfg, worker, torch_module_wrapper)

File ~/workspace/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:19, in build_torch_module_wrapper(cfg)
13 """
14 Builds the NN module.
15 :param cfg: DictConfig. Configuration that is used to run the experiment.
16 :return: Instance of TorchModuleWrapper.
17 """
18 logger.info('Building TorchModuleWrapper...')
---> 19 model = instantiate(cfg)
20 validate_type(model, TorchModuleWrapper)
21 logger.info('Building TorchModuleWrapper...DONE!')

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:180, in instantiate(config, *args, **kwargs)
177 recursive = config.pop(_Keys.RECURSIVE, True)
178 convert = config.pop(_Keys.CONVERT, ConvertMode.NONE)
--> 180 return instantiate_node(config, *args, recursive=recursive, convert=convert)
181 else:
182 raise InstantiationException(
183 "Top level config has to be OmegaConf DictConfig, plain dict, or a Structured Config class or instance"
184 )

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:249, in instantiate_node(node, convert, recursive, *args)
245 value = instantiate_node(
246 value, convert=convert, recursive=recursive
247 )
248 kwargs[key] = _convert_node(value, convert)
--> 249 return _call_target(target, *args, **kwargs)
250 else:
251 # If ALL or PARTIAL non structured, instantiate in dict and resolve interpolations eagerly.
252 if convert == ConvertMode.ALL or (
253 convert == ConvertMode.PARTIAL and node._metadata.object_type is None
254 ):

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:64, in _call_target(target, *args, **kwargs)
62 return target(*args, **kwargs)
63 except Exception as e:
---> 64 raise type(e)(
65 f"Error instantiating '{_convert_target_to_string(target)}' : {e}"
66 ).with_traceback(sys.exc_info()[2])

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:62, in _call_target(target, *args, **kwargs)
59 if OmegaConf.is_config(v):
60 v._set_parent(None)
---> 62 return target(*args, **kwargs)
63 except Exception as e:
64 raise type(e)(
65 f"Error instantiating '{_convert_target_to_string(target)}' : {e}"
66 ).with_traceback(sys.exc_info()[2])

File ~/workspace/nuplan-devkit/nuplan/planning/training/modeling/models/raster_model.py:58, in RasterModel.__init__(self, feature_builders, target_builders, model_name, pretrained, num_input_channels, num_features_per_pose, future_trajectory_sampling)
51 super().__init__(
52 feature_builders=feature_builders,
53 target_builders=target_builders,
54 future_trajectory_sampling=future_trajectory_sampling,
55 )
57 num_output_features = future_trajectory_sampling.num_poses * num_features_per_pose
---> 58 self._model = timm.create_model(model_name, pretrained=pretrained, num_classes=0, in_chans=num_input_channels)
59 mlp = torch.nn.Linear(in_features=self._model.num_features, out_features=num_output_features)
61 if hasattr(self._model, 'classifier'):

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_factory.py:114, in create_model(model_name, pretrained, pretrained_cfg, pretrained_cfg_overlay, checkpoint_path, scriptable, exportable, no_jit, **kwargs)
112 create_fn = model_entrypoint(model_name)
113 with set_layer_config(scriptable=scriptable, exportable=exportable, no_jit=no_jit):
--> 114 model = create_fn(
115 pretrained=pretrained,
116 pretrained_cfg=pretrained_cfg,
117 pretrained_cfg_overlay=pretrained_cfg_overlay,
118 **kwargs,
119 )
121 if checkpoint_path:
122 load_checkpoint(model, checkpoint_path)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:1276, in resnet50(pretrained, **kwargs)
1273 """Constructs a ResNet-50 model.
1274 """
1275 model_args = dict(block=Bottleneck, layers=[3, 4, 6, 3], **kwargs)
-> 1276 return _create_resnet('resnet50', pretrained, **dict(model_args, **kwargs))

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:547, in _create_resnet(variant, pretrained, **kwargs)
546 def _create_resnet(variant, pretrained=False, **kwargs):
--> 547 return build_model_with_cfg(ResNet, variant, pretrained, **kwargs)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:393, in build_model_with_cfg(model_cls, variant, pretrained, pretrained_cfg, pretrained_cfg_overlay, model_cfg, feature_cfg, pretrained_strict, pretrained_filter_fn, kwargs_filter, **kwargs)
391 num_classes_pretrained = 0 if features else getattr(model, 'num_classes', kwargs.get('num_classes', 1000))
392 if pretrained:
--> 393 load_pretrained(
394 model,
395 pretrained_cfg=pretrained_cfg,
396 num_classes=num_classes_pretrained,
397 in_chans=kwargs.get('in_chans', 3),
398 filter_fn=pretrained_filter_fn,
399 strict=pretrained_strict,
400 )
402 # Wrap the model in a feature extraction module if enabled
403 if features:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:186, in load_pretrained(model, pretrained_cfg, num_classes, in_chans, filter_fn, strict)
184 state_dict = load_state_dict_from_hf(*pretrained_loc)
185 else:
--> 186 state_dict = load_state_dict_from_hf(pretrained_loc)
187 else:
188 model_name = pretrained_cfg.get('architecture', 'this model')

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_hub.py:183, in load_state_dict_from_hf(model_id, filename)
179 cached_safe_file = hf_hub_download(repo_id=hf_model_id, filename=safe_filename, revision=hf_revision)
180 _logger.info(
181 f"[{model_id}] Safe alternative available for '{filename}' "
182 f"(as '{safe_filename}'). Loading weights using safetensors.")
--> 183 return safetensors.torch.load_file(cached_safe_file, device="cpu")
184 except EntryNotFoundError:
185 pass

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/safetensors/torch.py:261, in load_file(filename, device)
259 with safe_open(filename, framework="pt", device=device) as f:
260 for k in f.keys():
--> 261 result[k] = f.get_tensor(k)
262 return result

AttributeError: Error instantiating 'nuplan.planning.training.modeling.models.raster_model.RasterModel' : module 'torch' has no attribute 'frombuffer'
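
In case it helps to compare environments, here is a minimal snippet (plain importlib.metadata, nothing nuplan-specific) to print which versions actually got installed:

from importlib.metadata import version
for pkg in ("torch", "timm", "safetensors"):
    print(pkg, version(pkg))  # shows which torch/timm/safetensors combination the env resolved to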

michael-motional self-assigned this Jun 26, 2023

michael-motional commented Jun 26, 2023

Could you try locking timm==0.6.7 here and rebuilding the environment from scratch? We use PyTorch 1.9, which I think isn't compatible with recent (or maybe any) versions of safetensors.
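
For anyone else hitting this: the change is just pinning timm==0.6.7 in whichever requirements file lists timm, then rebuilding the environment. A minimal check (assuming both packages import cleanly) that the downgrade took effect:

import timm
import torch
print(timm.__version__)   # should report 0.6.7 after the pin
print(torch.__version__)  # should still report 1.9.0+cu111 from requirements_torch.txt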


CBeaune commented Jun 27, 2023

It worked after changing the required timm version!
I had an issue with the training loss becoming NaN after the first epoch, but resolved it as mentioned in #91.
Thanks for the help!

CBeaune closed this as completed Jun 27, 2023
@michael-motional

Nice, we'll address this whenever the next release is made.
