numpy.random.mtrand.RandomState.shuffle ValueError: array is read-only #48

Closed
jmrichardson opened this issue Aug 1, 2024 · 4 comments

@jmrichardson

Hi,

I am testing GrootCV and got the following error:

 Cross Validation:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Anaconda3\envs\mld\lib\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-2bb4632a1898>", line 1, in <module>
    feat_selector.fit(X, y, sample_weight=None)
  File "D:\Anaconda3\envs\mld\lib\site-packages\arfs\feature_selection\allrelevant.py", line 2076, in fit
    self.selected_features_, self.cv_df, self.sha_cutoff = _reduce_vars_lgb_cv(
  File "D:\Anaconda3\envs\mld\lib\site-packages\arfs\feature_selection\allrelevant.py", line 2306, in _reduce_vars_lgb_cv
    new_x_tr, shadow_names = _create_shadow(X_train)
  File "D:\Anaconda3\envs\mld\lib\site-packages\arfs\feature_selection\allrelevant.py", line 1695, in _create_shadow
    np.random.shuffle(X_shadow[c].values)
  File "numpy\\random\\mtrand.pyx", line 4594, in numpy.random.mtrand.RandomState.shuffle
ValueError: array is read-only

Here is the respective code:

ts_cv = TimeSeriesSplit(
    n_splits=3,
    gap=5,
)

feat_selector = GrootCV(
    objective="mse",
    n_folds=3,
    folds=ts_cv,
    n_iter=2,
    silent=True,
    fastshap=False,
    n_jobs=4,
)
feat_selector.fit(X, y, sample_weight=None)

In allrelevant.py, line 1696, I changed

np.random.shuffle(X_shadow[c].values)

to

X_shadow[c] = np.random.permutation(X_shadow[c].values)

It seems to work now. Hoping you could have a look.

Thanks!
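
For readers following along, here is a minimal sketch of the failure and the workaround described above, using plain numpy only. The read-only array is created artificially with `setflags` (an assumption made for illustration; the thread does not establish how the array became read-only in the reporter's environment), and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a column buffer that has been marked read-only.
values = np.arange(10)
values.setflags(write=False)

# np.random.shuffle works in place, so it cannot operate on a read-only array.
try:
    np.random.shuffle(values)
    shuffle_error = None
except ValueError as exc:
    shuffle_error = exc

# np.random.permutation shuffles a copy and returns it, so it succeeds.
permuted = np.random.permutation(values)

print("shuffle raised:", type(shuffle_error).__name__)
print("permutation result is writable:", permuted.flags.writeable)
```

The permuted array holds the same elements in a new buffer, which is why assigning it back to the column avoids the error.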

@ThomasBury ThomasBury self-assigned this Aug 2, 2024
@ThomasBury ThomasBury added the bug Something isn't working label Aug 2, 2024
@ThomasBury
Owner

Hello @jmrichardson, could you print out the version of numpy and arfs you are using?

import numpy as np
import arfs
print(f"numpy {np.__version__} and ARFS {arfs.__version__}")

As the error says, the array is read-only. It might be due to how you instantiate X and y. A simple solution is to copy your array or change the numpy flag. Everything should be fine if you use a pandas DataFrame.
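
The two suggestions above (copying the array, or changing the numpy flag) could look roughly like this on a bare numpy array. This is a sketch under the assumption that the array owns its memory; the `frozen` array is a hypothetical example, not the data from this issue.

```python
import numpy as np

# Hypothetical read-only array standing in for the user's data.
frozen = np.arange(5)
frozen.setflags(write=False)

# Option 1: take an independent copy; a fresh copy is writable.
writable_copy = frozen.copy()
np.random.shuffle(writable_copy)

# Option 2: flip the writeable flag in place. numpy only allows this when the
# array owns its memory (or its base is writable); otherwise setflags raises.
frozen.setflags(write=True)
np.random.shuffle(frozen)
```

Option 1 is the safer choice when the array is a view onto a buffer owned by something else, since numpy refuses to re-enable writing on such views.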

Are you able to run the time series tutorial?
It runs fine with numpy 1.26.4 and numpy 2.0.1, with ARFS 2.3.0.

I prefer not to change shuffle to permutation, as permutation creates a copy of the numpy array; the problem can instead be solved upstream, where X, y and w are instantiated.
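
To illustrate the trade-off mentioned here, the following sketch contrasts the two calls on a small hypothetical array: `np.random.shuffle` reorders the existing buffer and returns `None`, while `np.random.permutation` leaves its input untouched and allocates a new array.

```python
import numpy as np

original = np.arange(6)
untouched = original.copy()

# In place: the same buffer is reordered and nothing is returned.
result = np.random.shuffle(original)

# Out of place: the input keeps its order and a new array is allocated.
shuffled_copy = np.random.permutation(untouched)

print("shuffle returned:", result)
print("permutation shares memory with its input:",
      np.shares_memory(shuffled_copy, untouched))
```

For wide data frames the extra allocation per column is the cost being avoided by keeping `shuffle`.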

Let me know if that works, thanks for reaching out

@jmrichardson
Author

Hi @ThomasBury ,

Thank you for the fast reply!

import arfs
print(f"numpy {np.__version__} and ARFS {arfs.__version__}")
numpy 1.26.4 and ARFS 2.3.0

It fails on the tutorial. I just pasted the tutorial below in my python terminal and got the same error:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_validate
from sklearn.model_selection import TimeSeriesSplit
from arfs.benchmark import highlight_tick
from arfs.feature_selection.allrelevant import GrootCV
bike_sharing = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True)
df = bike_sharing.frame
y = df["count"] #/ df["count"].max()
X = df.drop("count", axis="columns")
X["weather"] = (
    X["weather"]
    .astype(object)
    .replace(to_replace="heavy_rain", value="rain")
    .astype("category")
)
ts_cv = TimeSeriesSplit(
    n_splits=5,
    gap=48,
    max_train_size=10000,
    test_size=1000,
)
feat_selector = GrootCV(
    objective="poisson",
    cutoff=1,
    n_folds=5,
    folds=ts_cv,
    n_iter=5,
    silent=True,
    fastshap=False,
    n_jobs=0,
)
feat_selector.fit(X, y, sample_weight=None)
Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:27:34) [MSC v.1937 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.20.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.20.0
Cross Validation:   0%|          | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Anaconda3\envs\mld\lib\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-9ec4508f8aff>", line 39, in <module>
    feat_selector.fit(X, y, sample_weight=None)
  File "D:\Anaconda3\envs\mld\lib\site-packages\arfs\feature_selection\allrelevant.py", line 2077, in fit
    self.selected_features_, self.cv_df, self.sha_cutoff = _reduce_vars_lgb_cv(
  File "D:\Anaconda3\envs\mld\lib\site-packages\arfs\feature_selection\allrelevant.py", line 2307, in _reduce_vars_lgb_cv
    new_x_tr, shadow_names = _create_shadow(X_train)
  File "D:\Anaconda3\envs\mld\lib\site-packages\arfs\feature_selection\allrelevant.py", line 1696, in _create_shadow
    np.random.shuffle(X_shadow[c].values)
  File "numpy\\random\\mtrand.pyx", line 4594, in numpy.random.mtrand.RandomState.shuffle
ValueError: array is read-only

My X and y are a pandas DataFrame and Series, respectively. I've added .copy() to both X and y and got the same error:

feat_selector.fit(X.copy(), y.copy(), sample_weight=None)

I'm not sure what differs between our environments that could cause the issue.
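
One way to compare environments is to check whether the arrays behind each column are writable at all. The helper below is a hypothetical diagnostic, not part of ARFS: `writable_report` is an illustrative name, and it only inspects the `flags.writeable` attribute that numpy arrays expose. Columns backed by non-numpy arrays (for example categoricals) are reported as `None`. As one possible cause, which this thread does not confirm, pandas' Copy-on-Write mode is known to hand out read-only arrays from `.values`.

```python
import numpy as np
import pandas as pd


def writable_report(df):
    """Map each column name to True/False (numpy-backed) or None (no flags)."""
    report = {}
    for col in df.columns:
        values = df[col].values
        # numpy arrays expose .flags.writeable; extension arrays may not.
        flags = getattr(values, "flags", None)
        writeable = getattr(flags, "writeable", None)
        report[col] = None if writeable is None else bool(writeable)
    return report


# Small hypothetical frame with one numeric and one categorical column.
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": pd.Categorical(["x", "y", "x"])})
report = writable_report(demo)
print(f"numpy {np.__version__}, pandas {pd.__version__}")
print(report)
```

Running this in both the failing and the working environment would show whether the columns themselves are read-only before `fit` is called.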

@ThomasBury
Owner

Alright, we can try two things:

  • run in a fresh env:
    • conda create -n arfs python jupyter ipykernel
    • conda activate arfs
    • pip install arfs -U

Then run the tutorial using this python kernel.

If it still fails, try to change the numpy flag (see the link in my previous message)

If neither works, I'll need to investigate further. I just tested on two different laptops with fresh environments and it works fine (Linux and Windows, numpy 1.26 and 2.0.1).

🤞

@jmrichardson
Author

Hi, creating a new environment did work. I tested both numpy 1.26 and 2.0.1 on my Windows PC with no issue. There must be something else in my other environment that is conflicting. No worries, I will just create a fork and make the changes I need, and hopefully have more time later to pinpoint the issue. Thanks for your help :)
