Why does it show “No matching checkpoint file found”? #125
Comments
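If you have never trained before, this checkpoint file will not exist; it stores the parameters from the previous training run.
I've fixed it, thanks!
Can yours train now?
Yes, it works now.
Why do I get NaN as soon as I start training? Did you download all of the datasets?
I didn't use all of the datasets; they are too large, so I only used some of them.
OK, thanks.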
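Hello, could you share yours? I'd like to try it; mine keeps running into problems. Also, what hardware did you train on?
Hello, could you send me your modified training code? My training part still won't run. Thanks. My email is [email protected]
Hello, could you explain how you changed it?
Hello, may I ask how you changed it? Thanks.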
When I run "run_training.py", it shows "No matching checkpoint file found":
Restarting training from last epoch ...
No matching checkpoint file found
Training crashed at epoch 1
Traceback for the error!
Traceback (most recent call last):
File "/ai/lu/TransT-main/ltr/../ltr/trainers/base_trainer.py", line 70, in train
self.train_epoch()
File "/ai/lu/TransT-main/ltr/../ltr/trainers/ltr_trainer.py", line 79, in train_epoch
self.cycle_dataset(loader)
File "/ai/lu/TransT-main/ltr/../ltr/trainers/ltr_trainer.py", line 52, in cycle_dataset
for i, data in enumerate(loader, 1):
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/anaconda3/envs/transt/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/ai/lu/TransT-main/ltr/../ltr/data/sampler.py", line 92, in __getitem__
dataset = random.choices(self.datasets, self.p_datasets)[0]
File "/root/anaconda3/envs/transt/lib/python3.7/random.py", line 361, in choices
raise ValueError('The number of weights does not match the population')
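
The last frames of the traceback show that the crash comes from `random.choices(self.datasets, self.p_datasets)`, not from the missing checkpoint. `random.choices` raises this `ValueError` whenever the weight list and the population have different lengths. One plausible cause (an assumption, not confirmed in this thread) is that some datasets were removed from the training settings while the list of sampling weights was left unchanged. The sketch below reproduces the error with the standard library only; the dataset names and the helper `check_sampler_weights` are hypothetical stand-ins, not part of TransT.

```python
import random


def check_sampler_weights(datasets, p_datasets):
    """Hypothetical helper: fail early if datasets and weights cannot be paired."""
    if len(datasets) != len(p_datasets):
        raise ValueError(
            f"{len(datasets)} datasets but {len(p_datasets)} sampling weights"
        )


# Hypothetical stand-ins for dataset objects: only two datasets are kept ...
datasets = ["dataset_a", "dataset_b"]
# ... but four sampling weights are still listed.
p_datasets = [1, 1, 1, 1]

# Reproduce the error from the traceback.
try:
    random.choices(datasets, p_datasets)
except ValueError as err:
    message = str(err)
    print(message)

# Keep exactly one weight per remaining dataset, then sample again.
p_datasets = p_datasets[:len(datasets)]
check_sampler_weights(datasets, p_datasets)
picked = random.choices(datasets, p_datasets)[0]
print(picked)
```

If this is the cause, making the two lists the same length wherever the training sampler is constructed should let the data loader produce batches again.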