Reproducibility issue #43

Open
janezlapajne opened this issue Oct 11, 2023 · 2 comments

@janezlapajne

Hello,

I noticed that results are not reproducible when using the library: the sklearn drop-in replacement classes produce slightly different results each time they are run.

For example, the following calculates (or selects) different features on each run:

from autofeat import AutoFeatClassifier

features_engineer = AutoFeatClassifier()
features_engineer.fit_transform(data_train.data, data_train.target.value)

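I temporarily worked around this by seeding the global random number generators of Python and NumPy before fitting, so that the outputs produced by AutoFeatClassifier stay constant across runs:

import random
import numpy as np

random.seed(seed)
np.random.seed(seed)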
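For anyone who wants to check the seeding workaround in isolation, here is a minimal sketch. `set_global_seeds` and `draw_sample` are hypothetical names, not part of autofeat; `draw_sample` only stands in for a step that reads the global random state. NumPy is treated as optional so the snippet runs without it.

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None


def set_global_seeds(seed: int) -> None:
    """Seed the global RNGs of `random` and, if installed, NumPy."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)


def draw_sample() -> list:
    """Hypothetical stand-in for a step that relies on global random state."""
    values = [random.random() for _ in range(3)]
    if np is not None:
        values.extend(np.random.rand(3).tolist())
    return values


set_global_seeds(0)
first = draw_sample()
set_global_seeds(0)
second = draw_sample()
print(first == second)  # True: same seed, same sequence
```

This only helps for code that draws from these global generators in the same process. Anything that creates its own generator, or runs in a separate worker process, is not covered by it.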

However, with the following, the seeding trick above did not help, and the selected features still change between runs:

from autofeat import FeatureSelector

selector = FeatureSelector(verbose=self.verbose, problem_type="classification", featsel_runs=5)
selector.fit_transform(df_indices, target)

Is there an easy fix for this? Randomness must be introduced somewhere in the source.

@janezlapajne

I also just noticed that if the number of cores used (n_jobs) is greater than 1, the results are not reproducible in the first scenario either. That is, the results are reproducible with n_jobs == 1 and stochastic with n_jobs == -1.
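One plausible explanation, not confirmed in this thread, is that parallel workers do not draw from the seeded global state in a fixed order, so seeding once in the parent is not enough. A common remedy is to pass each task its own seed and use a local generator. The sketch below illustrates that pattern with the standard library only; `noisy_score` and `run_all` are hypothetical names, a thread pool stands in for whatever backend n_jobs uses, and none of this is autofeat's actual code.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def noisy_score(task_id: int, seed: int) -> float:
    """Hypothetical stand-in for one randomized selection run."""
    rng = random.Random(seed)  # local generator, independent of global state
    return task_id + rng.random()


def run_all(base_seed: int, n_tasks: int = 5, n_workers: int = 4) -> List[float]:
    # Derive one deterministic seed per task from the base seed.
    seeds = [base_seed + i for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in submission order, whichever task finishes first.
        return list(pool.map(noisy_score, range(n_tasks), seeds))


first = run_all(base_seed=42)
second = run_all(base_seed=42)
print(first == second)  # True: results do not depend on scheduling
```

Because each task owns its generator, the output is the same for any number of workers, which is the property that is lost when n_jobs is greater than 1.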

Cheers.

@jtimko16

Hey @janezlapajne, thanks for pointing this out. A great observation!

I found several reproducibility issues in the code that I managed to fix. However, there is still some remaining randomness that is unresolved.

For reference, here is the PR, in case you want to review it or contribute further:
#45

Cheers,
J.
