I ran into this error when running prepare_aliccp() on downloaded Ali-CCP datasets.
Traceback (most recent call last):
  File "/share/suh-scrap/zh338/aliccp/preprocess.py", line 13, in <module>
    prepare_aliccp(DATA_DIR, convert_train=False, convert_test=True)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/merlin/datasets/ecommerce/aliccp/dataset.py", line 164, in prepare_aliccp
    _convert_data(
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/merlin/datasets/ecommerce/aliccp/dataset.py", line 449, in _convert_data
    merlin.io.Dataset(tmp_files, dtypes=dtypes).to_parquet(out_dir)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/merlin/io/dataset.py", line 380, in __init__
    self.infer_schema()
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/merlin/io/dataset.py", line 1240, in infer_schema
    dtypes = self.sample_dtypes(n=n, annotate_lists=True)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/merlin/io/dataset.py", line 1264, in sample_dtypes
    _real_meta = _set_dtypes(_real_meta, self.dtypes)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/merlin/io/dataset.py", line 1301, in _set_dtypes
    chunk[col] = chunk[col].astype(dtype)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/home/zh338/.conda/envs/merlin-env/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
    return arr.astype(dtype, copy=True)
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
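Judging only by the last frames of the traceback, the failure looks like a pandas column that contains None being cast to an integer dtype. The snippet below is a minimal sketch of that situation in plain pandas, not of what prepare_aliccp() does internally. The column contents, the cast_to_int helper, and the fill value of 0 are all made up for illustration, and I don't know which Ali-CCP column actually holds the missing values.

```python
import pandas as pd

# Hypothetical column: object dtype with a missing value stored as None.
col = pd.Series([1, None, 3], dtype=object)


def cast_to_int(series, fill_value=0):
    """Cast to int64, replacing missing entries with fill_value.

    fill_value is an assumed sentinel; pick whatever makes sense for the data.
    """
    # Coerce to a numeric dtype first so None becomes NaN, then fill and cast.
    numeric = pd.to_numeric(series, errors="coerce")
    return numeric.fillna(fill_value).astype("int64")


# A direct cast fails because None cannot be converted to an integer.
try:
    col.astype("int64")
    raised = False
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# Filling the missing entries before the cast avoids the error.
cleaned = cast_to_int(col)
print(cleaned.tolist())
```

If the same pattern applies here, the missing values would have to be filled (or the column given a nullable dtype) before the integer cast, but whether that can be done without patching the installed package is exactly what I'm unsure about.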
I saw another issue (#507) that describes a similar problem, but it doesn't mention a solution or workaround. Is there a workaround to avoid this error?
Thanks!