The problem seems to occur in the setup_hparams() function, specifically at line 27: hparam_sets = [HPARAMS_REGISTRY[x.strip()] for x in hparam_set_names if x] + [kwargs].
I tried to fix this line, but unfortunately I don't really understand what it's trying to achieve.
I did, however, manage to bypass the problem by simply commenting out some lines in this function. So far this doesn't seem to cause many problems; I had no trouble starting training.
The function now looks as follows:
    def setup_hparams(hparam_set_names, kwargs):
        H = Hyperparams()
        if not isinstance(hparam_set_names, tuple):
            hparam_set_names = hparam_set_names.split(",")
        # Commented out: this registry lookup is where the error was raised.
        # hparam_sets = [HPARAMS_REGISTRY[x.strip()] for x in hparam_set_names if x] + [kwargs]
        for k, v in DEFAULTS.items():
            H.update(v)
        # Commented out: validation and merging of the named hparam sets.
        # for hps in hparam_sets:
        #     for k in hps:
        #         if k not in H:
        #             raise ValueError(f"{k} not in default args")
        #     H.update(**hps)
        # Only the keyword overrides are applied on top of the defaults.
        H.update(**kwargs)
        return H
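For anyone who wants to restore the original behaviour instead of commenting it out, here is a minimal sketch of what I think the line is meant to do: look up each named hparam set in a registry, then merge those sets and the keyword overrides on top of the defaults. Everything below is an assumption for illustration. The Hyperparams class, the contents of DEFAULTS and HPARAMS_REGISTRY, and the set name "small" are hypothetical stand-ins, not the project's real definitions. The only change from the original logic is an explicit check that turns a bare KeyError into a message naming the unknown set.

```python
# Hypothetical stand-ins: the project's real Hyperparams, DEFAULTS and
# HPARAMS_REGISTRY may look quite different from these.
class Hyperparams(dict):
    """Dict that also allows attribute-style reads (assumed behaviour)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


DEFAULTS = {"train": {"n_batch": 16, "save_interval": 1000}}
HPARAMS_REGISTRY = {"small": {"n_batch": 4}}


def setup_hparams(hparam_set_names, kwargs):
    H = Hyperparams()
    if not isinstance(hparam_set_names, tuple):
        hparam_set_names = hparam_set_names.split(",")
    names = [x.strip() for x in hparam_set_names if x]

    # Fail with a readable message instead of a bare KeyError when a
    # name (for example a file path passed by mistake) is not registered.
    unknown = [n for n in names if n not in HPARAMS_REGISTRY]
    if unknown:
        raise ValueError(
            f"unknown hparam set(s) {unknown}; known sets: {sorted(HPARAMS_REGISTRY)}"
        )

    hparam_sets = [HPARAMS_REGISTRY[n] for n in names] + [kwargs]

    # Start from the defaults, then apply each set in order.
    for v in DEFAULTS.values():
        H.update(v)
    for hps in hparam_sets:
        for k in hps:
            if k not in H:
                raise ValueError(f"{k} not in default args")
        H.update(**hps)
    return H
```

Under these assumptions, setup_hparams("small", {"save_interval": 500}) returns the defaults with n_batch and save_interval overridden, while passing a checkpoint path as the set name raises the ValueError above. That would be consistent with a path ending up where a registered set name is expected, though I have not confirmed that this is what happens in train.py.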
On top of this, I also needed to change some of the command-line arguments to make it work.
The command now looks like this: python3 train.py --new -af par/arch.basic.json -tf par/train.basic.json -nb 4 -si 1000 -vqn 1000 --ckpt_template $run_dir/model%.ckpt $run_dir/model%.ckpt $run_dir/model%.ckpt -hw 'GPU'
Note: I changed the hardware to GPU because I don't use a TPU, changed new to --new, and added the --ckpt_template flag before the %.ckpt location.
I'd recommend checking whether the checkpoints still get stored properly during training, though. I didn't get that far because I ran into CUDA problems, which is my own little problem hahaha. But since this hotfix changes how the checkpoint location is set up, it's worth verifying.
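One way to do that check is a small script that lists the files matching the checkpoint template after a training run. This is only a sketch under an assumption I have not verified: that the trainer replaces the % in the template with something like a step number. The helper name find_checkpoints is made up for this example and is not part of the repository.

```python
import glob


def find_checkpoints(ckpt_template):
    """Return existing files that match the template, sorted by name.

    Assumption: the trainer substitutes '%' with a step number, so '%'
    is treated as a wildcard here.
    """
    # Escape any glob metacharacters in the path first, then turn the
    # '%' placeholder into a '*' wildcard.
    pattern = glob.escape(ckpt_template).replace("%", "*")
    return sorted(glob.glob(pattern))
```

Calling find_checkpoints with the same template that was passed to --ckpt_template (with $run_dir already expanded) should return a non-empty list once at least one save interval has passed. An empty list would suggest the checkpoints are being written somewhere else or not at all.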
Let me know if you run into any problems when using this hotfix!
Hi @hrbigelow
I am working on CPU+GPU. I got an error while executing this command:
python train.py new -af par/arch.basic.json -tf par/train.basic.json -nb 4 -si 1000 -vqn 1000 $run_dir/model%.ckpt $run_dir/librispeech.dev-clean.dat $run_dir/data_slices.dat
KeyError: '/root/data_mtbox_d/VQ_VAE/VQ_VAE_Speech/ae_wavenet/model%.ckpt'
Can you help me resolve it?