About the long text input in Scholar-XL dataset #1

DavidLeexxxx · 2023-05-27T13:35:29Z

Thanks for this excellent work in this specific domain. When I ran the CNN-base method on the Scholar-XL dataset, which is a dataset for NER tasks, I ran into token limit errors. As far as I know, BERT only accepts inputs of at most 512 tokens, so I believe the error probably occurs because some inputs are longer than 512 tokens. This "out of token length" problem is also mentioned at https://www.aminer.cn/scholar-profiling. However, I couldn't find a solution or a hyperparameter setting such as "max-length" for this problem in the source code (train_CNN.py). Could you please update the code, or explain how to handle this problem, so that the baseline results mentioned in this repo can be reproduced?
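For reference, one generic workaround I could apply on my side is to split each over-long token sequence into overlapping windows that fit within the 512-token limit. The sketch below is only my assumption of how that could look, not what train_CNN.py does: the function name `split_into_windows` and the `stride` value are hypothetical, and it assumes the input is already tokenized into subword tokens with one NER label per token.

```python
from typing import List, Tuple

MAX_LEN = 512  # BERT's maximum input length, including [CLS] and [SEP]


def split_into_windows(
    tokens: List[str],
    labels: List[str],
    max_len: int = MAX_LEN,
    stride: int = 128,
) -> List[Tuple[List[str], List[str]]]:
    """Split a token/label sequence into overlapping windows.

    Each window holds at most max_len - 2 tokens, leaving room for the
    [CLS] and [SEP] tokens. Consecutive windows share `stride` tokens so
    that an entity cut at a window edge appears whole in the next window.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    body = max_len - 2  # usable positions per window
    if not 0 <= stride < body:
        raise ValueError("stride must satisfy 0 <= stride < max_len - 2")

    windows = []
    start = 0
    while True:
        end = min(start + body, len(tokens))
        windows.append((tokens[start:end], labels[start:end]))
        if end == len(tokens):
            break
        # Step back by `stride` tokens to create the overlap.
        start = end - stride
    return windows


# Usage with a hypothetical 1000-token input labelled entirely as "O".
example_tokens = [f"t{i}" for i in range(1000)]
example_labels = ["O"] * 1000
example_windows = split_into_windows(example_tokens, example_labels)
```

Predictions in the overlapping regions would still have to be merged afterwards, and I don't know whether the reported baselines used this approach, simple truncation, or something else.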
