Chinese Examples in MultiFieldQA-en #64

Open

wendywangwwt opened this issue May 5, 2024 · 1 comment

wendywangwwt commented May 5, 2024

Hi! I'm working on a long document QA problem and looked into the MultiFieldQA-en dataset recently.

I downloaded the dataset using the following code snippet:

from datasets import load_dataset

# Load the MultiFieldQA-en subset of LongBench
dataset = load_dataset("THUDM/LongBench", "multifieldqa_en")

While examining the content, I noticed that 2 of the 150 entries are in Chinese rather than English.

Can you please take a look? Thank you!
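
For anyone who wants to reproduce the check, here is a minimal sketch of one way to flag such entries. It is not the method used in the original report: the helper names, the `context` field name, the 0.3 threshold, and the choice of the CJK Unified Ideographs block (U+4E00 to U+9FFF) as "Chinese characters" are all assumptions made for illustration.

```python
def chinese_share(text: str) -> float:
    """Fraction of non-whitespace characters that fall in U+4E00..U+9FFF.

    Using this Unicode block as the definition of "Chinese character"
    is an assumption, not something stated in the thread.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for ch in chars if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(chars)


def flag_chinese_entries(entries, field="context", threshold=0.3):
    """Return indices of entries whose `field` text is mostly or partly Chinese.

    `field` and `threshold` are hypothetical defaults chosen for this sketch.
    """
    return [
        i
        for i, entry in enumerate(entries)
        if chinese_share(entry.get(field, "")) > threshold
    ]


# Toy data standing in for dataset rows (each row is a dict of fields).
sample = [
    {"context": "This passage is written in English."},
    {"context": "这是一段中文文本。"},
]
flagged = flag_chinese_entries(sample)
```

With the real data, the same function could presumably be called on the loaded split, for example `flag_chinese_entries(dataset["test"])`, assuming the split is named `test` and each row exposes a `context` string.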


Member

bys0318 commented May 9, 2024

Hi! They are classified as English samples as they contain more English characters (a-zA-Z) than Chinese characters.
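
The rule described above can be sketched roughly as follows. This is an illustration of the stated heuristic, not the actual LongBench labeling code: the function name, the U+4E00 to U+9FFF range used for Chinese characters, and the handling of ties are assumptions.

```python
import re

# English characters as described in the comment above: a-z and A-Z.
_ENGLISH = re.compile(r"[a-zA-Z]")
# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def classify_language(text: str) -> str:
    """Label a sample "en" if it has more English letters than Chinese characters.

    Ties (including empty text) fall back to "zh" here; the thread does not
    say how ties are handled, so this choice is arbitrary.
    """
    n_en = len(_ENGLISH.findall(text))
    n_zh = len(_CHINESE.findall(text))
    return "en" if n_en > n_zh else "zh"


mixed = classify_language("Hello 你好")       # 5 English letters vs 2 Chinese characters
mostly_zh = classify_language("你好世界 ok")  # 2 English letters vs 4 Chinese characters
```

Under a count-based rule like this, each English letter and each Chinese character carries equal weight, so a document whose prose is Chinese but which contains many Latin-script names, URLs, or code identifiers can still be labeled English.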
