Error when loading the dataset #16

Open
linjie7674 opened this issue Dec 19, 2024 · 0 comments

When loading the data, I ran into the following error:

Generating train split: 112142 examples [00:00, 160735.47 examples/s]
Traceback (most recent call last):
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 1989, in _prepare_split_single
    writer.write_table(table)
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/arrow_writer.py", line 574, in write_table
    pa_table = table_cast(pa_table, self._schema)
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/table.py", line 2322, in table_cast
    return cast_table_to_schema(table, schema)
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/table.py", line 2276, in cast_table_to_schema
    raise CastError(
datasets.table.CastError: Couldn't cast
role: string
question: string
generated: list<item: string>
  child 0, item: string
type: string
to
{'role': Value(dtype='string', id=None), 'question': Value(dtype='string', id=None), 'generated': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)}
because column names don't match

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lj/code/mine/role_play_llm/test.py", line 42, in <module>
    test_role_llm_dataset()
  File "/home/lj/code/mine/role_play_llm/test.py", line 35, in test_role_llm_dataset
    dst = load_dataset("/home/lj/datasets/rolebench")
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/load.py", line 2549, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 1860, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/lj/software/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 1991, in _prepare_split_single
    raise DatasetGenerationCastError.from_cast_error(
datasets.exceptions.DatasetGenerationCastError: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 1 new columns ({'type'})

This happened while the json dataset builder was generating data using

/home/lj/datasets/rolebench/rolebench-eng/instruction-generalization/role_specific/train.jsonl

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)
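
The message above says that some data files carry an extra `type` column that the others lack. One way to see which files disagree is to group the `.jsonl` files under the dataset directory by their set of top-level keys. This is only a minimal sketch using the standard library: the helper name `group_jsonl_by_columns` is hypothetical, and it assumes every non-empty line is a JSON object.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_jsonl_by_columns(root):
    """Map each distinct set of top-level keys to the .jsonl files that use it.

    Hypothetical helper; assumes each non-empty line is a JSON object.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.jsonl")):
        keys = set()
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    keys.update(json.loads(line).keys())
        # Files with identical key sets end up in the same group.
        groups[frozenset(keys)].append(str(path))
    return dict(groups)


# Example usage (path taken from the traceback; adjust to your local copy):
# for columns, files in group_jsonl_by_columns("/home/lj/datasets/rolebench").items():
#     print(sorted(columns), files)
```

If more than one group comes back, each group can probably be loaded on its own with `load_dataset("json", data_files=[...])` by passing only that group's files, or the groups can be declared as separate configurations as the error message suggests.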