Hi! I'm working on a long document QA problem and looked into the MultiFieldQA-en dataset recently.
I downloaded the dataset using the following code snippet:
from datasets import load_dataset
dataset = load_dataset("THUDM/LongBench", "multifieldqa_en")
While examining the content, I noticed that out of 150 entries, 2 are in Chinese rather than English: .
Can you please take a look? Thank you!
Hi! These samples are classified as English because they contain more English characters (a-zA-Z) than Chinese characters.
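The counting rule in the reply above can be sketched in a few lines of Python. This is a minimal illustration, not the LongBench code. The helper name `classify_language` is hypothetical. Defining "Chinese characters" as the CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption, and so is labeling a tie as Chinese.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00 to U+9FFF);
# the maintainers' exact character set is not stated in the thread.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
LATIN_RE = re.compile(r"[a-zA-Z]")


def classify_language(text: str) -> str:
    """Hypothetical helper: return 'en' if the text has strictly more a-zA-Z
    characters than Chinese characters, otherwise 'zh' (ties count as 'zh')."""
    n_latin = len(LATIN_RE.findall(text))
    n_cjk = len(CJK_RE.findall(text))
    return "en" if n_latin > n_cjk else "zh"


if __name__ == "__main__":
    print(classify_language("Hello world"))
    print(classify_language("你好世界"))
    # Mostly Chinese wording, but the identifier has 11 Latin letters vs 4 CJK characters.
    print(classify_language("使用 load_dataset 函数"))
```

A single Chinese character often stands for a whole word or syllable, while an English word spans several letters. A document that is mostly Chinese but contains English terms, code, or URLs can therefore still pass this test. That could explain how the two entries ended up in MultiFieldQA-en.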