On the distinct-1/2 computation: how many sentences/conversations were used? #135

Open
ZenzenDatabase opened this issue May 6, 2022 · 1 comment

ZenzenDatabase commented May 6, 2022

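Hello, I have a question about how distinct-1/2 was computed: how many sentences and how many tokens were used? Was it the entire test set, or only the top 10, 50, or 200 samples? We would like to know how much data the denominator is based on. We are running a controlled comparison experiment and need this detail. Thank you.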

Collaborator

sserdoubleh commented May 7, 2022

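There are 200 topics, and one self-chat is run for each.
In each multi-turn self-chat, apart from the opening topic, the model generates 9 utterances; distinct is computed over these 200 * 9 utterances.

For distinct-1 and distinct-2, the denominators are the numbers of unigrams and bigrams, respectively, contained in the 200 * 9 utterances.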
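
For anyone reproducing the numbers, here is a minimal sketch of a corpus-level distinct-n as described in this reply: unique n-grams divided by the total n-gram count over the pooled utterances. It is not the project's actual evaluation script. The function names `ngrams` and `distinct_n` are hypothetical, the utterances are assumed to be already tokenized, and n-grams are assumed not to span utterance boundaries; the thread does not specify either of the last two points.

```python
from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams inside a single utterance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(utterances: Iterable[Sequence[str]], n: int) -> float:
    """Corpus-level distinct-n: unique n-grams / total n-grams.

    Assumption: n-grams are counted within each utterance and then
    pooled over all utterances (e.g. the 200 * 9 generated ones).
    """
    unique = set()
    total = 0
    for tokens in utterances:
        grams = ngrams(tokens, n)
        unique.update(grams)
        total += len(grams)  # this running sum is the denominator
    return len(unique) / total if total else 0.0


# Toy data standing in for the pooled model utterances (illustrative only).
utterances = [
    "i like tea".split(),
    "i like coffee".split(),
]
print(distinct_n(utterances, 1))  # 4 unique unigrams out of 6
print(distinct_n(utterances, 2))  # 3 unique bigrams out of 4
```

With the setup described above, `utterances` would hold the 1800 model-generated utterances, excluding the opening topic of each self-chat. The tokenizer matters for comparability: a different tokenization changes both the numerator and the denominator.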
