
some issues #9

Open
sanwei111 opened this issue Mar 2, 2021 · 1 comment

Comments

@sanwei111

Hello, author. I have a few questions and hope you can kindly advise.
1. The pre-train section starts with a pile of hyperparameters (which file are those hyperparameters defined in?), and its last line appears to be the training command, followed by a bunch of arguments. What exact command should I enter to start training?
2. My server has only one GPU. To run your code, do I need to change some configuration? If so, which parameters exactly?
3. Which file sets the dataset path? I could not find it.
4. "we use the English Wikipedia corpus and BookCorpus (Zhu et al., 2015) for pre-training. By concatenating these two datasets, we obtain a corpus with roughly 16GB in size. We set the vocabulary size (sub-word tokens) as 32,768. We use the GLUE (General Language Understanding Evaluation) dataset (Wang et al., 2018) as the downstream tasks to evaluate the performance of the pre-trained models". This is from the paper. Why concatenate the two datasets, and how exactly is the concatenation done?

@guolinke
Owner

@sanwei111 you may need to learn more about the Linux/Unix shell to run the commands.
You need to crawl/download the data yourself.
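
For example, a minimal sketch of fetching the English Wikipedia corpus from the shell. The dump URL is the standard Wikimedia mirror and `wikiextractor` is a third-party tool; neither is part of this repo, so treat this as one possible way to get the data:

```bash
# Download the latest English Wikipedia dump (compressed XML).
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

# Extract plain text from the dump using the wikiextractor tool
# (install with: pip install wikiextractor).
python -m wikiextractor.WikiExtractor enwiki-latest-pages-articles.xml.bz2 -o extracted
```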

Concatenating the two datasets is a standard step in BERT pre-training. You can use the `cat` command on Linux.
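
As a minimal sketch, assuming both corpora have already been cleaned into plain-text files (the file names below are placeholders, not files shipped with this repo):

```bash
# Concatenate the two plain-text corpora into a single pre-training corpus.
cat wikipedia.txt bookcorpus.txt > corpus.txt

# Sanity check: the paper reports the combined corpus is roughly 16GB.
du -h corpus.txt
```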
