Running "Neural Network DMD on Slow Manifold" error #36

lk1983823 · 2023-09-14T08:24:04Z

I have lightning 2.0.5 and pykoopman 1.0.4 installed.
When I run "dlk_regressor.fit(traj_list)" in the example notebook "tutorial_koopman_nndmd_examples.ipynb", it fails with the following output:

INFO: GPU available: True (cuda), used: True
[rank_zero.py:48 -                _info() ] GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
[rank_zero.py:48 -                _info() ] TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
[rank_zero.py:48 -                _info() ] IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
[rank_zero.py:48 -                _info() ] HPU available: False, using: 0 HPUs
INFO: Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
[distributed.py:245 - _init_dist_connection() ] Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
INFO: Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[distributed.py:245 - _init_dist_connection() ] Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
INFO: ----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------

[rank_zero.py:48 -                _info() ] ----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------

2023-09-14 16:20:30.863437: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-14 16:20:31.016351: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-14 16:20:32.029631: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-09-14 16:20:32.029742: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-09-14 16:20:32.029751: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
---------------------------------------------------------------------------
ProcessRaisedException                    Traceback (most recent call last)
Cell In[6], line 1
----> 1 dlk_regressor.fit(traj_list)

File ~/anaconda3/envs/dpc/lib/python3.10/site-packages/pykoopman/regression/_nndmd.py:1187, in NNDMD.fit(self, x, y, dt)
   1184     raise ValueError("check `x` and `y` for `self.fit`")
   1186 # trainer starts to train
-> 1187 self.trainer.fit(self._regressor, self.dm)
   1189 # compute Koopman operator information
   1190 self._state_matrix_ = (
   1191     self._regressor._koopman_propagator.get_discrete_time_Koopman_Operator()
   1192     .detach()
   1193     .numpy()
   1194 )

File ~/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:529, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    527 model = _maybe_unwrap_optimized(model)
    528 self.strategy._lightning_module = model
--> 529 call._call_and_handle_interrupt(
    530     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    531 )

File ~/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:41, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     39 try:
     40     if trainer.strategy.launcher is not None:
---> 41         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
     42     return trainer_fn(*args, **kwargs)
     44 except _TunerExitException:

File ~/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py:124, in _MultiProcessingLauncher.launch(self, function, trainer, *args, **kwargs)
    116 process_context = mp.start_processes(
    117     self._wrapping_function,
    118     args=process_args,
   (...)
    121     join=False,  # we will join ourselves to get the process references
    122 )
    123 self.procs = process_context.processes
--> 124 while not process_context.join():
    125     pass
    127 worker_output = return_queue.get()

File ~/anaconda3/envs/dpc/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:160, in ProcessContext.join(self, timeout)
    158 msg = "\n\n-- Process %d terminated with the following error:\n" % error_index
    159 msg += original_trace
--> 160 raise ProcessRaisedException(msg, error_index, failed_process.pid)

ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py", line 147, in _wrapping_function
    results = function(*args, **kwargs)
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 568, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 934, in _run
    call._call_setup_hook(self)  # allow user to setup lightning_module in accelerator environment
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 83, in _call_setup_hook
    _call_lightning_datamodule_hook(trainer, "setup", stage=fn)
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 164, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/lk/anaconda3/envs/dpc/lib/python3.10/site-packages/pykoopman/regression/_nndmd.py", line 902, in setup
    self._tr_x, self._tr_yseq, self._tr_ys, self.normalization
AttributeError: 'SeqDataModule' object has no attribute '_tr_x'

Can anyone help?


pswpswpsw · 2023-09-22T00:38:22Z

I just created a fresh conda env, and the Jupyter notebook runs there without any errors.

So you need to set up the conda env carefully:

conda create --name pyk python=3.10
conda activate pyk
python -m pip install -r requirements-dev.txt

If the install finishes without errors, the environment should be good to go. Depending on your OS and PyTorch build you may not get the GPU version of PyTorch, but the code will still run.
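
To compare a fresh environment against the one from the original report, it can help to print the installed versions before running the notebook. The snippet below is a minimal sketch, not part of pykoopman: the helper name `report_environment` is hypothetical, and the package names are assumed to match the pip distribution names. It also prints the number of visible CUDA devices, because the log in the question shows a 2-process NCCL launch with the error raised in process 1. Whether the device count is related to the error is an assumption and has not been confirmed in this thread.

```python
import sys
from importlib import metadata


def report_environment(packages=("pykoopman", "lightning", "torch")):
    """Return a dict of installed versions (None if a package is missing).

    Hypothetical helper for illustration; not part of pykoopman.
    """
    report = {"python": ".".join(str(p) for p in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    # The GPU count is only available when torch can be imported.
    try:
        import torch

        if torch.cuda.is_available():
            report["cuda_devices"] = torch.cuda.device_count()
        else:
            report["cuda_devices"] = 0
    except ImportError:
        report["cuda_devices"] = None
    return report


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

Running this in both environments and comparing the output line by line should show any version differences.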

pswpswpsw closed this as completed Oct 30, 2023
