
some issues #9

Open
sanwei111 opened this issue Mar 2, 2021 · 1 comment

Comments

@sanwei111

Hello, author. I have a few questions and hope you can kindly advise.
1. The pre-train section starts with a pile of hyperparameters (which file are those hyperparameters defined in?), and its last line appears to be the training command, followed by a bunch of arguments. What exact command should I enter to start training?
2. My server has only one GPU. To run your code, do I need to change some configuration? If so, which parameters exactly?
3. Which file sets the dataset path? I could not find it.
4. "we use the English Wikipedia corpus and BookCorpus (Zhu et al., 2015) for pre-training. By concatenating these two datasets, we obtain a corpus with roughly 16GB in size. We set the vocabulary size (sub-word tokens) as 32,768. We use the GLUE (General Language Understanding Evaluation) dataset (Wang et al., 2018) as the downstream tasks to evaluate the performance of the pre-trained models". This is from the paper. Why concatenate the two datasets, and how exactly is the concatenation done?

@guolinke
Owner

@sanwei111 you may need to learn more about the Linux/Unix shell to run the commands.
You need to crawl/download the data yourself.
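
For example, a minimal sketch of fetching the English Wikipedia corpus from the shell. The dump URL is the standard Wikimedia mirror and `wikiextractor` is a third-party tool; neither is part of this repo, so treat this as one possible way to get the data:

```bash
# Download the latest English Wikipedia dump (compressed XML).
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

# Extract plain text from the dump using the wikiextractor tool
# (install with: pip install wikiextractor).
python -m wikiextractor.WikiExtractor enwiki-latest-pages-articles.xml.bz2 -o extracted
```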

Concatenating the two datasets is a standard step in BERT pre-training. You can use the `cat` command on Linux.
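
As a minimal sketch, assuming both corpora have already been cleaned into plain-text files (the file names below are placeholders, not files shipped with this repo):

```bash
# Concatenate the two plain-text corpora into a single pre-training corpus.
cat wikipedia.txt bookcorpus.txt > corpus.txt

# Sanity check: the paper reports the combined corpus is roughly 16GB.
du -h corpus.txt
```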
