When I use deepmd_v2.2.7 to fine-tune a DPA large model, it reports the following error:
OMP: Info #254: KMP_AFFINITY: pid 53485 tid 54101 thread 18 bound to OS proc set 18
OMP: Info #254: KMP_AFFINITY: pid 53485 tid 54100 thread 17 bound to OS proc set 17
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
Traceback (most recent call last):
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/bin/dp", line 10, in <module>
sys.exit(main())
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd_cli/main.py", line 63in main
deepmd_main(args)
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/entrypoints/main.py",ne 74, in main
train_dp(**dict_args)
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/entrypoints/train.py"ine 168, in train
_do_work(jdata, run_opt, is_compress)
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/entrypoints/train.py"ine 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/train/trainer.py", li290, in build
self._init_from_pretrained_model(
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/train/trainer.py", li1137, in _init_from_pretrained_model
self._change_energy_bias(
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/train/trainer.py", li1145, in _change_energy_bias
self.model.change_energy_bias(
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/model/ener.py", line , in change_energy_bias
self.fitting.change_energy_bias(
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/deepmd/fit/ener.py", line 85in change_energy_bias
delta_bias = np.linalg.lstsq(type_numbs, bias_diff, rcond=None)[0]
File "<__array_function__ internals>", line 180, in lstsq
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/numpy/linalg/linalg.py", lin292, in lstsq
x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
File "/export/home/liluotonggpu2/anaconda3/envs/dpmd/lib/python3.10/site-packages/numpy/linalg/linalg.py", lin00, in _raise_linalgerror_lstsq
raise LinAlgError("SVD did not converge in Linear Least Squares")
numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares
Answered by njzjz, Jan 4, 2024
Do you get the same error if you train from scratch?
Perhaps you can use this model to evaluate your data, and see if there is anything strange.
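One thing worth checking while doing that is whether any frame carries NaN or infinite values, since `np.linalg.lstsq` can fail with "SVD did not converge" when its inputs are not finite. That is only one possible cause and is not confirmed in this thread. The sketch below is a minimal check under that assumption: `find_nonfinite_rows` is a hypothetical helper, not part of deepmd, and the two arrays are synthetic stand-ins for the `type_numbs` and `bias_diff` arguments named in the traceback.

```python
import numpy as np


def find_nonfinite_rows(arr):
    """Return indices of rows (frames) that contain NaN or +/-inf.

    Hypothetical helper for illustration; not a deepmd function.
    """
    arr = np.atleast_1d(np.asarray(arr, dtype=float))
    if arr.shape[0] == 0:
        # No frames at all: nothing to flag.
        return np.array([], dtype=int)
    # Flatten everything after the first axis so each row is one frame.
    flat = arr.reshape(arr.shape[0], -1)
    return np.flatnonzero(~np.isfinite(flat).all(axis=1))


# Synthetic stand-ins (assumed values) for the two lstsq inputs.
type_numbs = np.array([[2.0, 1.0], [4.0, 2.0], [1.0, 3.0]])
bias_diff = np.array([[-0.5], [np.nan], [0.25]])

bad_a = find_nonfinite_rows(type_numbs)
bad_b = find_nonfinite_rows(bias_diff)
print("non-finite rows in type_numbs:", bad_a.tolist())
print("non-finite rows in bias_diff:", bad_b.tolist())
```

The same helper can be applied to the energies you obtain when evaluating your data with the pretrained model, or to the label arrays in your training set, to locate the offending frames. If every row is finite, the cause lies elsewhere.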