Reproducibility issue #43

janezlapajne · 2023-10-11T09:50:48Z

Hello,

I noticed that results produced with the library are not reproducible: the sklearn drop-in replacement classes give slightly different results on each run.

For example, when using:

from autofeat import AutoFeatClassifier

features_engineer = AutoFeatClassifier()
features_engineer.fit_transform(data_train.data, data_train.target.values)

it calculates (or selects) different features each time.
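A quick way to confirm the problem is to run the same pipeline several times and compare the outputs. Below is a minimal, library-agnostic sketch. `runs_are_identical` is a hypothetical helper, not part of autofeat, and `make_output` stands for any zero-argument function that fits a fresh model and returns something comparable (for example, the names of the generated features).

```python
import random
from typing import Any, Callable


def runs_are_identical(make_output: Callable[[], Any], n_runs: int = 3) -> bool:
    """Call make_output() n_runs times and report whether every result equals the first.

    Hypothetical helper: make_output should build and fit a *fresh* model on
    each call and return a comparable value (e.g. a list of feature names).
    """
    first = make_output()
    return all(make_output() == first for _ in range(n_runs - 1))


# A deterministic function passes the check.
print(runs_are_identical(lambda: sorted(["x1", "x2**2", "log(x3)"])))  # True

# A function drawing unseeded random numbers will almost surely fail it.
print(runs_are_identical(lambda: random.sample(range(1000), 5)))
```

Wrapping the whole fit inside `make_output` matters: reusing an already fitted object would only test the transform step, not the feature selection.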

I temporarily worked around this by seeding the global random number generators:

import random
import numpy as np
random.seed(seed)
np.random.seed(seed)

so that the output of AutoFeatClassifier stays constant across runs.
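The workaround above can be wrapped in a small helper. This is a sketch assuming the randomness comes only from Python's `random` module and NumPy's legacy global generator (`np.random.*`), which are exactly what `random.seed` and `np.random.seed` control; `seed_everything` and `noisy_pipeline` are hypothetical names.

```python
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed the two global generators that random.seed / np.random.seed control."""
    random.seed(seed)
    np.random.seed(seed)


def noisy_pipeline() -> tuple:
    """Hypothetical stand-in for a fit that draws from both global generators."""
    picked = random.sample(["a", "b", "c", "d", "e"], 3)
    noise = np.random.normal(size=3).round(6).tolist()
    return picked, noise


seed_everything(42)
first = noisy_pipeline()
seed_everything(42)
second = noisy_pipeline()
print(first == second)  # True: same seed, same draws
```

This only covers code that draws from these two global generators. Code that creates its own generator, or runs in separate worker processes, is not affected by these calls.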

However, when I tried using the following:

from autofeat import FeatureSelector

selector = FeatureSelector(verbose=self.verbose, problem_type="classification", featsel_runs=5)
selector.fit_transform(df_indices, target)

the seed-setting trick above did not help: the selected features still change between runs.

Is there an easy fix for this? Randomness must be introduced somewhere in the source.
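One way randomness can slip past `np.random.seed` (an illustration, not a confirmed diagnosis of autofeat's code) is a generator created independently of the global state, such as NumPy's `np.random.default_rng()` called without a seed, which draws fresh entropy from the operating system each time. Passing an explicit seed makes such a generator repeatable again:

```python
import numpy as np

np.random.seed(0)  # seeds only the legacy global RandomState

# An unseeded Generator ignores the global seed and differs on every call.
a = np.random.default_rng().integers(0, 2**62, size=4)
b = np.random.default_rng().integers(0, 2**62, size=4)
print(np.array_equal(a, b))  # almost surely False

# An explicitly seeded Generator is repeatable.
c = np.random.default_rng(123).integers(0, 2**62, size=4)
d = np.random.default_rng(123).integers(0, 2**62, size=4)
print(np.array_equal(c, d))  # True
```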


janezlapajne · 2023-10-14T18:43:00Z

I also just noticed that if the number of cores used (n_jobs) is greater than 1, the results of the first scenario are not reproducible either. So the results are reproducible with n_jobs=1 and stochastic with n_jobs=-1.
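A likely mechanism for the n_jobs dependence (assumed, not verified in the autofeat source) is that parallel workers do not share the parent's seeded global generator, and tasks may be scheduled differently on each run. A common remedy is to derive one independent seed per task from a single master seed, so a task's result no longer depends on which worker runs it. Below is a minimal sketch using NumPy's `SeedSequence.spawn`, with a thread pool standing in for the parallel backend; `run_task` and `run_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_task(seed_seq: np.random.SeedSequence) -> float:
    """Hypothetical unit of parallel work (e.g. one feature-selection run)."""
    rng = np.random.default_rng(seed_seq)  # per-task generator, no shared state
    return float(rng.normal(size=100).mean())


def run_all(master_seed: int, n_tasks: int, n_jobs: int) -> list:
    """Run n_tasks tasks; n_jobs here must be a positive worker count."""
    # One child seed per task, derived deterministically from the master seed.
    child_seeds = np.random.SeedSequence(master_seed).spawn(n_tasks)
    if n_jobs == 1:
        return [run_task(s) for s in child_seeds]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in task order, whatever order they finish in.
        return list(pool.map(run_task, child_seeds))


print(run_all(42, n_tasks=5, n_jobs=1) == run_all(42, n_tasks=5, n_jobs=4))  # True
```

Because each task owns its generator, the serial and parallel runs produce the same list; with a single shared generator, the draws each task receives would depend on scheduling.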

Cheers.

jtimko16 · 2024-07-25T21:07:12Z

Hey @janezlapajne, thanks for pointing this out. A great observation!

I found several reproducibility issues in the code and managed to fix them. However, some randomness remains unresolved.
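One common source of residual run-to-run variation that no RNG seed controls (offered as a possibility, not a confirmed cause in autofeat) is iteration order over sets of strings, such as feature names. Python randomizes string hashing per process unless `PYTHONHASHSEED` is set before the interpreter starts, so set order can change between runs even with every generator seeded. The sketch below runs a snippet in fresh interpreters to show that fixing the hash seed, or sorting before iterating, makes the order stable; `order_in_fresh_process` is a hypothetical helper.

```python
import os
import subprocess
import sys

# Feature-name-like strings whose set iteration order depends on string hashing.
SNIPPET = "print(list({'x1', 'x2', 'log(x1)', 'x1*x2', 'sqrt(x2)', 'x1**2'}))"


def order_in_fresh_process(hash_seed: str, code: str = SNIPPET) -> str:
    """Run `code` in a new interpreter with the given PYTHONHASHSEED and return its output."""
    env = {**os.environ, "PYTHONHASHSEED": hash_seed}
    result = subprocess.run(
        [sys.executable, "-c", code], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# With a fixed hash seed, the set order is the same in every fresh process.
print(order_in_fresh_process("0") == order_in_fresh_process("0"))  # True

# Sorting removes the dependence on hashing altogether.
sorted_snippet = SNIPPET.replace("print(list(", "print(sorted(")
print(order_in_fresh_process("1", sorted_snippet) == order_in_fresh_process("2", sorted_snippet))  # True
```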

For reference, here is the PR (in case you want to review it or contribute further):
#45

Cheers,
J.

janezlapajne mentioned this issue on Jul 26, 2024 in draft PR #45, "#43 - Reproducibility".
