Datasets #4

Open
lyc728 opened this issue Jan 6, 2022 · 9 comments

Comments

@lyc728

lyc728 commented Jan 6, 2022

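Hello, can the LMDB data loader only load a single file? My dataset is scattered across multiple folders and is hard to merge. Is there a good way to handle this? Thanks!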

@JingyeChen
Member

Hello, you can use this API:
torch.utils.data.ConcatDataset([dataset1, dataset2, ...])
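
As a minimal sketch of how this could look, the example below concatenates two small in-memory datasets. `ToyFolderDataset` and its sample tuples are hypothetical stand-ins for one LMDB-backed dataset per folder; only `ConcatDataset` and `Dataset` come from PyTorch.

```python
from torch.utils.data import ConcatDataset, Dataset


class ToyFolderDataset(Dataset):
    """Hypothetical stand-in for a dataset built from one LMDB folder."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


# One dataset per folder (the contents here are made up for illustration).
folder_a = ToyFolderDataset([("img_a0", "你好"), ("img_a1", "谢谢")])
folder_b = ToyFolderDataset([("img_b0", "数据")])

# ConcatDataset presents them as a single dataset with contiguous indices.
combined = ConcatDataset([folder_a, folder_b])
```

The combined object can then be passed to a `DataLoader` in the same way as a single dataset, so the folders do not need to be merged on disk.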

@lyc728
Author

lyc728 commented Jan 6, 2022

text_input[i][j + 1] = alp2num[label[i][j]]

KeyError: '₂'
What is causing this error?

@hyangyu
Member

hyangyu commented Jan 6, 2022

text_input[i][j + 1] = alp2num[label[i][j]]

KeyError: '₂'
What is causing this error?

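Hello, are you using additional data? If you are testing on an additional dataset, then this character from your test data is not in the alphabet we compiled for our benchmark.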
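
One way to check this before training or testing is to scan the labels for characters missing from the alphabet mapping. The sketch below assumes `alp2num` is a plain dict from character to index, as the traceback suggests; `find_unknown_chars` and the sample labels are hypothetical.

```python
def find_unknown_chars(labels, alp2num):
    """Return the sorted characters that appear in labels but not in alp2num."""
    unknown = set()
    for label in labels:
        unknown.update(ch for ch in label if ch not in alp2num)
    return sorted(unknown)


# Toy alphabet and labels, made up for illustration.
alp2num = {"H": 1, "O": 2}
labels = ["HO", "H₂O"]

missing = find_unknown_chars(labels, alp2num)
print(missing)
```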

@lyc728
Author

lyc728 commented Jan 7, 2022

Is there a way to simply skip the image that this character belongs to?

@JingyeChen
Member

Yes, you can. Please make the change in the lmdbDataset class. Thank you.
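
The thread does not show the change itself, so the following is only a sketch of the filtering logic under stated assumptions: samples are `(image, label)` pairs and `alp2num` is a dict keyed by character. The helper names are hypothetical, and where such a check would be called inside `lmdbDataset` depends on the repository's code.

```python
def is_supported(label, alp2num):
    """True if every character of the label has an entry in alp2num."""
    return all(ch in alp2num for ch in label)


def filter_supported(samples, alp2num):
    """Keep only (image, label) pairs whose label is fully covered."""
    return [(image, label) for image, label in samples
            if is_supported(label, alp2num)]


# Toy data, made up for illustration.
alp2num = {"H": 1, "O": 2}
samples = [("img0", "HO"), ("img1", "H₂O")]

kept = filter_supported(samples, alp2num)
```

Filtering once up front keeps the dataset length consistent with the number of usable samples, which avoids having to handle skipped items inside `__getitem__`.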

@lyc728
Copy link
Author

lyc728 commented Jan 10, 2022

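Hello, why does training one epoch take 21 minutes while validation takes 1 hour 44 minutes? Is the extra time caused by writing out the incorrectly recognized text lines? Can this be optimized?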

@hyangyu
Member

hyangyu commented Jan 10, 2022

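Hello, why does training one epoch take 21 minutes while validation takes 1 hour 44 minutes? Is the extra time caused by writing out the incorrectly recognized text lines? Can this be optimized?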

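Training runs in parallel, but testing runs serially. In addition, the average text length in the Chinese datasets is relatively long. That is why the validation stage takes longer.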
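
The thread does not propose a fix. One general workaround, offered here purely as an assumption and not as the maintainers' method, is to validate on a fixed random subset during training and run the full validation set only occasionally. The sketch below picks reproducible indices; `pick_validation_indices` and the sizes are hypothetical.

```python
import random


def pick_validation_indices(dataset_size, max_samples, seed=0):
    """Return a sorted, reproducible sample of indices into the dataset."""
    if max_samples >= dataset_size:
        return list(range(dataset_size))
    rng = random.Random(seed)  # fixed seed so every epoch uses the same subset
    return sorted(rng.sample(range(dataset_size), max_samples))


# Example sizes, made up for illustration.
indices = pick_validation_indices(dataset_size=10000, max_samples=500)
```

The resulting list could be passed to `torch.utils.data.Subset(dataset, indices)` to build the smaller validation set. Accuracy measured on a subset is only an estimate of the full-set figure.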

@lyc728
Author

lyc728 commented Jan 10, 2022

At this rate it feels like training will not finish even in a week. This really needs to be optimized.

@lyc728
Author

lyc728 commented Jan 10, 2022

Hello, you mentioned that edit distance is reported in addition to accuracy, but I cannot find this in the code.
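
For reference, edit distance between a prediction and its ground truth can be computed as below. This is a generic Levenshtein sketch, not the repository's code. The normalization in `normalized_similarity` (one minus the distance divided by the longer length) is one common convention and is an assumption here.

```python
def edit_distance(pred, target):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(target) + 1))
    for i, p in enumerate(pred, 1):
        curr = [i]
        for j, t in enumerate(target, 1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1,          # delete p
                            curr[j - 1] + 1,      # insert t
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_similarity(pred, target):
    """1 - distance / max length (assumed convention); 1.0 if both are empty."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, target) / longest
```

Python strings are indexed by Unicode code point, so Chinese labels are compared character by character, not byte by byte.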

3 participants