Chinese Examples in MultiFieldQA-en #64

Open
wendywangwwt opened this issue May 5, 2024 · 1 comment

@wendywangwwt

Hi! I'm working on a long-document QA problem and recently looked into the MultiFieldQA-en dataset.

I downloaded the dataset using the following code snippet:

from datasets import load_dataset

# Load the English MultiFieldQA subset of LongBench
dataset = load_dataset("THUDM/LongBench", "multifieldqa_en")

While examining the content, I noticed that out of 150 entries, 2 are in Chinese rather than English:
[Screenshot 2024-05-05 at 4 27 36 PM]
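
One way such entries could be surfaced is to scan each record's text for CJK characters. The sketch below is only illustrative: it assumes each record is a dict-like row with a `context` field, treats the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as "Chinese", and uses a hypothetical helper name `find_cjk_entries` that is not part of the `datasets` library.

```python
import re

# Basic CJK Unified Ideographs block; an assumption that covers most Chinese text.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")

def find_cjk_entries(records, field="context"):
    """Return indices of records whose `field` contains at least one CJK character."""
    return [i for i, rec in enumerate(records) if CJK_RE.search(rec.get(field, ""))]

# Toy records standing in for dataset rows (not real LongBench data).
sample = [
    {"context": "The quick brown fox."},
    {"context": "这是一段中文文本 with some English."},
]
print(find_cjk_entries(sample))
```

On the real dataset, the same helper could presumably be applied to the loaded split (for example `dataset["test"]`, assuming that is the split name), since iterating a split yields one dict per row.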

Can you please take a look? Thank you!

@bys0318
Member

bys0318 commented May 9, 2024

Hi! They are classified as English samples as they contain more English characters (a-zA-Z) than Chinese characters.
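
A minimal sketch of the kind of rule described in this reply, not the actual LongBench labeling code, which may differ. It assumes "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that ties fall to Chinese; `classify_language` is a hypothetical name.

```python
import re

EN_RE = re.compile(r"[a-zA-Z]")
ZH_RE = re.compile(r"[\u4e00-\u9fff]")  # assumed definition of "Chinese character"

def classify_language(text):
    """Label text 'en' if it has more ASCII letters than CJK characters, else 'zh'."""
    n_en = len(EN_RE.findall(text))
    n_zh = len(ZH_RE.findall(text))
    # Tie-breaking toward 'zh' is an assumption made for this sketch.
    return "en" if n_en > n_zh else "zh"

# Mixed text where English letters outnumber Chinese characters.
print(classify_language("模型 transformer architecture"))  # en
# Text written entirely in Chinese.
print(classify_language("这是一个完全用中文写的句子"))  # zh
```

Under a character-count rule like this, one English word contributes several letters while one Chinese character counts once, so a document with a fair amount of embedded English can end up labeled English even when a reader would consider it Chinese.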
