
Error when launching training within the notebook nuplan_framework.ipynb #333

Closed
CBeaune opened this issue Jun 26, 2023 · 4 comments · May be fixed by #407

CBeaune commented Jun 26, 2023

Hello,
I'm a newcomer to the nuPlan framework, so I apologize if this is a simple issue, but I did not find any closed issues covering it.
I'm trying to run the tutorials, starting with nuplan_framework.ipynb, but I get the following error when launching the training within the notebook. It seems related to loading the pretrained model:

AttributeError: Error instantiating 'nuplan.planning.training.modeling.models.raster_model.RasterModel' : module 'torch' has no attribute 'frombuffer'

It seems that the torch attribute is not reachable. Is this an issue with the torch version?
Mine is '1.9.0+cu111', as specified in the devkit's requirements_torch.txt.
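
A quick sanity check (just a hasattr probe; as far as I can tell, frombuffer was only added in torch 1.10, so 1.9.x predates it):

import torch
print(torch.__version__)             # expect 1.9.0+cu111 here
print(hasattr(torch, "frombuffer"))  # prints False on 1.9.x, matching the error above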

Any help is welcome!
Thanks

@ZhaoYangbjtu

I have the same problem when I use nuplan-devkit-v1.2-release. All the installation steps succeed. The stack trace is below; it seems that another library, safetensors, needs a newer version of torch. Is there anything wrong inside requirements.txt or requirements_torch.txt?


AttributeError Traceback (most recent call last)
File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:62, in _call_target(target, *args, **kwargs)
60 v._set_parent(None)
---> 62 return target(*args, **kwargs)
63 except Exception as e:

File ~/workspace/nuplan-devkit/nuplan/planning/training/modeling/models/raster_model.py:58, in RasterModel.__init__(self, feature_builders, target_builders, model_name, pretrained, num_input_channels, num_features_per_pose, future_trajectory_sampling)
57 num_output_features = future_trajectory_sampling.num_poses * num_features_per_pose
---> 58 self._model = timm.create_model(model_name, pretrained=pretrained, num_classes=0, in_chans=num_input_channels)
59 mlp = torch.nn.Linear(in_features=self._model.num_features, out_features=num_output_features)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_factory.py:114, in create_model(model_name, pretrained, pretrained_cfg, pretrained_cfg_overlay, checkpoint_path, scriptable, exportable, no_jit, **kwargs)
113 with set_layer_config(scriptable=scriptable, exportable=exportable, no_jit=no_jit):
--> 114 model = create_fn(
115 pretrained=pretrained,
116 pretrained_cfg=pretrained_cfg,
117 pretrained_cfg_overlay=pretrained_cfg_overlay,
118 **kwargs,
119 )
121 if checkpoint_path:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:1276, in resnet50(pretrained, **kwargs)
1275 model_args = dict(block=Bottleneck, layers=[3, 4, 6, 3], **kwargs)
-> 1276 return _create_resnet('resnet50', pretrained, **dict(model_args, **kwargs))

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:547, in _create_resnet(variant, pretrained, **kwargs)
546 def _create_resnet(variant, pretrained=False, **kwargs):
--> 547 return build_model_with_cfg(ResNet, variant, pretrained, **kwargs)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:393, in build_model_with_cfg(model_cls, variant, pretrained, pretrained_cfg, pretrained_cfg_overlay, model_cfg, feature_cfg, pretrained_strict, pretrained_filter_fn, kwargs_filter, **kwargs)
392 if pretrained:
--> 393 load_pretrained(
394 model,
395 pretrained_cfg=pretrained_cfg,
396 num_classes=num_classes_pretrained,
397 in_chans=kwargs.get('in_chans', 3),
398 filter_fn=pretrained_filter_fn,
399 strict=pretrained_strict,
400 )
402 # Wrap the model in a feature extraction module if enabled

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:186, in load_pretrained(model, pretrained_cfg, num_classes, in_chans, filter_fn, strict)
185 else:
--> 186 state_dict = load_state_dict_from_hf(pretrained_loc)
187 else:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_hub.py:183, in load_state_dict_from_hf(model_id, filename)
180 _logger.info(
181 f"[{model_id}] Safe alternative available for '{filename}' "
182 f"(as '{safe_filename}'). Loading weights using safetensors.")
--> 183 return safetensors.torch.load_file(cached_safe_file, device="cpu")
184 except EntryNotFoundError:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/safetensors/torch.py:261, in load_file(filename, device)
260 for k in f.keys():
--> 261 result[k] = f.get_tensor(k)
262 return result

AttributeError: module 'torch' has no attribute 'frombuffer'

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
Cell In[6], line 4
1 from nuplan.planning.script.run_training import main as main_train
3 # Run the training loop, optionally inspect training artifacts through tensorboard (above cell)
----> 4 main_train(cfg)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/main.py:44, in main.<locals>.main_decorator.<locals>.decorated_main(cfg_passthrough)
41 @functools.wraps(task_function)
42 def decorated_main(cfg_passthrough: Optional[DictConfig] = None) -> Any:
43 if cfg_passthrough is not None:
---> 44 return task_function(cfg_passthrough)
45 else:
46 args = get_args_parser()

File ~/workspace/nuplan-devkit/nuplan/planning/script/run_training.py:59, in main(cfg)
56 if cfg.py_func == 'train':
57 # Build training engine
58 with ProfilerContextManager(cfg.output_dir, cfg.enable_profiling, "build_training_engine"):
---> 59 engine = build_training_engine(cfg, worker)
61 # Run training
62 logger.info('Starting training...')

File ~/workspace/nuplan-devkit/nuplan/planning/training/experiments/training.py:44, in build_training_engine(cfg, worker)
41 logger.info('Building training engine...')
43 # Create model
---> 44 torch_module_wrapper = build_torch_module_wrapper(cfg.model)
46 # Build the datamodule
47 datamodule = build_lightning_datamodule(cfg, worker, torch_module_wrapper)

File ~/workspace/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:19, in build_torch_module_wrapper(cfg)
13 """
14 Builds the NN module.
15 :param cfg: DictConfig. Configuration that is used to run the experiment.
16 :return: Instance of TorchModuleWrapper.
17 """
18 logger.info('Building TorchModuleWrapper...')
---> 19 model = instantiate(cfg)
20 validate_type(model, TorchModuleWrapper)
21 logger.info('Building TorchModuleWrapper...DONE!')

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:180, in instantiate(config, *args, **kwargs)
177 recursive = config.pop(_Keys.RECURSIVE, True)
178 convert = config.pop(_Keys.CONVERT, ConvertMode.NONE)
--> 180 return instantiate_node(config, *args, recursive=recursive, convert=convert)
181 else:
182 raise InstantiationException(
183 "Top level config has to be OmegaConf DictConfig, plain dict, or a Structured Config class or instance"
184 )

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:249, in instantiate_node(node, convert, recursive, *args)
245 value = instantiate_node(
246 value, convert=convert, recursive=recursive
247 )
248 kwargs[key] = _convert_node(value, convert)
--> 249 return _call_target(target, *args, **kwargs)
250 else:
251 # If ALL or PARTIAL non structured, instantiate in dict and resolve interpolations eagerly.
252 if convert == ConvertMode.ALL or (
253 convert == ConvertMode.PARTIAL and node._metadata.object_type is None
254 ):

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:64, in _call_target(target, *args, **kwargs)
62 return target(*args, **kwargs)
63 except Exception as e:
---> 64 raise type(e)(
65 f"Error instantiating '{_convert_target_to_string(target)}' : {e}"
66 ).with_traceback(sys.exc_info()[2])

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py:62, in _call_target(target, *args, **kwargs)
59 if OmegaConf.is_config(v):
60 v._set_parent(None)
---> 62 return target(*args, **kwargs)
63 except Exception as e:
64 raise type(e)(
65 f"Error instantiating '{_convert_target_to_string(target)}' : {e}"
66 ).with_traceback(sys.exc_info()[2])

File ~/workspace/nuplan-devkit/nuplan/planning/training/modeling/models/raster_model.py:58, in RasterModel.__init__(self, feature_builders, target_builders, model_name, pretrained, num_input_channels, num_features_per_pose, future_trajectory_sampling)
51 super().__init__(
52 feature_builders=feature_builders,
53 target_builders=target_builders,
54 future_trajectory_sampling=future_trajectory_sampling,
55 )
57 num_output_features = future_trajectory_sampling.num_poses * num_features_per_pose
---> 58 self._model = timm.create_model(model_name, pretrained=pretrained, num_classes=0, in_chans=num_input_channels)
59 mlp = torch.nn.Linear(in_features=self._model.num_features, out_features=num_output_features)
61 if hasattr(self._model, 'classifier'):

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_factory.py:114, in create_model(model_name, pretrained, pretrained_cfg, pretrained_cfg_overlay, checkpoint_path, scriptable, exportable, no_jit, **kwargs)
112 create_fn = model_entrypoint(model_name)
113 with set_layer_config(scriptable=scriptable, exportable=exportable, no_jit=no_jit):
--> 114 model = create_fn(
115 pretrained=pretrained,
116 pretrained_cfg=pretrained_cfg,
117 pretrained_cfg_overlay=pretrained_cfg_overlay,
118 **kwargs,
119 )
121 if checkpoint_path:
122 load_checkpoint(model, checkpoint_path)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:1276, in resnet50(pretrained, **kwargs)
1273 """Constructs a ResNet-50 model.
1274 """
1275 model_args = dict(block=Bottleneck, layers=[3, 4, 6, 3], **kwargs)
-> 1276 return _create_resnet('resnet50', pretrained, **dict(model_args, **kwargs))

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/resnet.py:547, in _create_resnet(variant, pretrained, **kwargs)
546 def _create_resnet(variant, pretrained=False, **kwargs):
--> 547 return build_model_with_cfg(ResNet, variant, pretrained, **kwargs)

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:393, in build_model_with_cfg(model_cls, variant, pretrained, pretrained_cfg, pretrained_cfg_overlay, model_cfg, feature_cfg, pretrained_strict, pretrained_filter_fn, kwargs_filter, **kwargs)
391 num_classes_pretrained = 0 if features else getattr(model, 'num_classes', kwargs.get('num_classes', 1000))
392 if pretrained:
--> 393 load_pretrained(
394 model,
395 pretrained_cfg=pretrained_cfg,
396 num_classes=num_classes_pretrained,
397 in_chans=kwargs.get('in_chans', 3),
398 filter_fn=pretrained_filter_fn,
399 strict=pretrained_strict,
400 )
402 # Wrap the model in a feature extraction module if enabled
403 if features:

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_builder.py:186, in load_pretrained(model, pretrained_cfg, num_classes, in_chans, filter_fn, strict)
184 state_dict = load_state_dict_from_hf(*pretrained_loc)
185 else:
--> 186 state_dict = load_state_dict_from_hf(pretrained_loc)
187 else:
188 model_name = pretrained_cfg.get('architecture', 'this model')

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/timm/models/_hub.py:183, in load_state_dict_from_hf(model_id, filename)
179 cached_safe_file = hf_hub_download(repo_id=hf_model_id, filename=safe_filename, revision=hf_revision)
180 _logger.info(
181 f"[{model_id}] Safe alternative available for '{filename}' "
182 f"(as '{safe_filename}'). Loading weights using safetensors.")
--> 183 return safetensors.torch.load_file(cached_safe_file, device="cpu")
184 except EntryNotFoundError:
185 pass

File ~/miniconda3/envs/nuplan/lib/python3.9/site-packages/safetensors/torch.py:261, in load_file(filename, device)
259 with safe_open(filename, framework="pt", device=device) as f:
260 for k in f.keys():
--> 261 result[k] = f.get_tensor(k)
262 return result

AttributeError: Error instantiating 'nuplan.planning.training.modeling.models.raster_model.RasterModel' : module 'torch' has no attribute 'frombuffer'
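
In case it helps to compare environments, here is a minimal snippet (plain importlib.metadata, nothing nuplan-specific) to print which versions actually got installed:

from importlib.metadata import version
for pkg in ("torch", "timm", "safetensors"):
    print(pkg, version(pkg))  # shows which torch/timm/safetensors combination the env resolved to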

michael-motional self-assigned this Jun 26, 2023

michael-motional commented Jun 26, 2023

Could you try locking timm==0.6.7 here and rebuilding the environment from scratch? We use PyTorch 1.9, which I think isn't compatible with recent (or maybe any) versions of safetensors.
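
For anyone else hitting this: the change is just pinning timm==0.6.7 in whichever requirements file lists timm, then rebuilding the environment. A minimal check (assuming both packages import cleanly) that the downgrade took effect:

import timm
import torch
print(timm.__version__)   # should report 0.6.7 after the pin
print(torch.__version__)  # should still report 1.9.0+cu111 from requirements_torch.txt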


CBeaune commented Jun 27, 2023

It worked after changing the required timm version!
I had an issue with the training loss becoming NaN after the first epoch, but resolved it as mentioned in #91.
Thanks for the help!

CBeaune closed this as completed Jun 27, 2023
@michael-motional

Nice, we'll address this whenever the next release is made.
