Hi!
While trying to use AutoFeatClassifier with units, I ran into a validation error caused by an infinite value.
Presumably one of the generated features (likely one produced by the Pi Theorem) contains an infinite value, which breaks the StandardScaler used while filtering out correlated features.
This is how I am calling the classifier, fitting training data that comes as a NumPy ndarray:
These are the features logged for the Pi Theorem; all of them involve divisions (which could lead to a division-by-zero issue):
...
[AutoFeat] Applying the Pi Theorem
[AutoFeat] Pi Theorem 1: x002 / x001
[AutoFeat] Pi Theorem 2: x006 / x000
[AutoFeat] Pi Theorem 3: x010 / x005
[AutoFeat] Pi Theorem 4: x003 / x001
[AutoFeat] Pi Theorem 5: x013 / x001
[AutoFeat] Pi Theorem 6: x014 / x005
[AutoFeat] Pi Theorem 7: x000 * x005 * x012 / x015
[AutoFeat] Pi Theorem 8: x016 / x000
[AutoFeat] Pi Theorem 9: x017 / x012
...
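Each Pi Theorem feature is a ratio of input columns, so a zero anywhere in a denominator column silently becomes ±inf under NumPy semantics rather than raising an error. A minimal sketch with made-up values (x001/x002 are just stand-ins for the logged columns, not my real data):

```python
import numpy as np

# Stand-ins for two original columns; x001 contains a zero.
x002 = np.array([1.0, 2.0, 3.0])
x001 = np.array([2.0, 0.0, 4.0])

# "Pi Theorem 1: x002 / x001" -- NumPy turns the division by zero
# into inf instead of raising, so the bad value propagates silently.
with np.errstate(divide="ignore"):
    pi_1 = x002 / x001

print(pi_1)                     # the second entry is inf
print(np.isfinite(pi_1).all())  # False -> sklearn's check_array rejects this
```

Any later scaler that validates its input with a finiteness check will then fail exactly as in the traceback below.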
The full log output by a failing run is the following:
[AutoFeat] Applying the Pi Theorem
[AutoFeat] Pi Theorem 1: x002 / x001
[AutoFeat] Pi Theorem 2: x006 / x000
[AutoFeat] Pi Theorem 3: x007 / x005
[AutoFeat] Pi Theorem 4: x003 / x001
[AutoFeat] Pi Theorem 5: x009 / x001
[AutoFeat] Pi Theorem 6: x010 / x005
[AutoFeat] Pi Theorem 7: x000 * x005 * x008 / x011
[AutoFeat] Pi Theorem 8: x012 / x000
[AutoFeat] Pi Theorem 9: x013 / x008
[AutoFeat] The 3 step feature engineering process could generate up to 118923 features.
[AutoFeat] With 121 data points this new feature matrix would use about 0.06 gb of space.
[feateng] Step 1: transformation of original features
[feateng] Generated 40 transformed features from 14 original features - done.
[feateng] Step 2: first combination of features
[feateng] Generated 1524 feature combinations from 1431 original feature tuples - done.
[feateng] Step 3: transformation of new features
[feateng] Generated 4564 transformed features from 1524 original features - done.
[feateng] Generated altogether 6233 new features in 3 steps
[feateng] Removing correlated features, as well as additions at the highest level
And after that, the error is reported with the following stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-323-53dcdfc1b68e> in <module>
32 # categorical_cols = []
33 auto = AutoFeatClassifier(categorical_cols=categorical_cols, units=units, verbose=1, feateng_steps=3, featsel_runs=5, n_jobs=5, apply_pi_theorem=True)
---> 34 X_train_new = auto.fit_transform(X_train_sampled, y_train_sampled)
35 X_test_new = auto.transform(X_test.to_numpy())
36 pretty_names = feature_names(auto, USEFUL_ACTUALS)
~/.pyenv/versions/features/lib/python3.7/site-packages/autofeat/autofeat.py in fit_transform(self, X, y)
299 # generate features
300 df_subs, self.feature_formulas_ = engineer_features(df_subs, self.feateng_cols_, _parse_units(self.units, verbose=self.verbose),
--> 301 self.feateng_steps, self.transformations, self.verbose)
302 # select predictive features
303 if self.featsel_runs <= 0:
~/.pyenv/versions/features/lib/python3.7/site-packages/autofeat/feateng.py in engineer_features(df_org, start_features, units, max_steps, transformations, verbose)
354 if cols:
355 # check for correlated features again; this time with the start features
--> 356 corrs = dict(zip(cols, np.max(np.abs(np.dot(StandardScaler().fit_transform(df[cols]).T, StandardScaler().fit_transform(df_org))/df_org.shape[0]), axis=1)))
357 cols = [c for c in cols if corrs[c] < 0.9]
358 cols = list(df_org.columns) + cols
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
688 if y is None:
689 # fit method of arity 1 (unsupervised transformation)
--> 690 return self.fit(X, **fit_params).transform(X)
691 else:
692 # fit method of arity 2 (supervised transformation)
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/preprocessing/_data.py in fit(self, X, y)
665 # Reset internal state before fitting
666 self._reset()
--> 667 return self.partial_fit(X, y)
668
669 def partial_fit(self, X, y=None):
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y)
696 X = self._validate_data(X, accept_sparse=('csr', 'csc'),
697 estimator=self, dtype=FLOAT_DTYPES,
--> 698 force_all_finite='allow-nan')
699
700 # Even in the case of `with_mean=False`, we update the mean anyway
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
418 f"requires y to be passed, but the target y is None."
419 )
--> 420 X = check_array(X, **check_params)
421 out = X
422 else:
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
643 if force_all_finite:
644 _assert_all_finite(array,
--> 645 allow_nan=force_all_finite == 'allow-nan')
646
647 if ensure_min_samples > 0:
~/.pyenv/versions/features/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
97 msg_err.format
98 (type_err,
---> 99 msg_dtype if msg_dtype is not None else X.dtype)
100 )
101 # for object dtype data, we only check for NaNs (GH-13254)
ValueError: Input contains infinity or a value too large for dtype('float64').
I tried removing all constant features from the original dataset, so that every original feature has std() > 0.
It looks like one of the generated features involves a division by zero, which produces an infinite value deep inside the generated feature matrix.
Maybe there should be some handling for this, either dropping the offending feature or replacing the infinities with NaN, which the scalers already know how to ignore?
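As a hypothetical sketch of that workaround (not autofeat's actual code): replacing ±inf with NaN before scaling lets NaN-aware statistics skip those entries, which mirrors what StandardScaler does when its validation runs with force_all_finite='allow-nan':

```python
import numpy as np

def sanitize_and_scale(X):
    """Replace +/-inf with NaN, then standardize using NaN-aware
    statistics (analogous to StandardScaler's NaN handling)."""
    X = np.where(np.isfinite(X), X, np.nan)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (X - mean) / std

# A feature column where a division by zero produced inf:
X = np.array([[0.5], [np.inf], [0.75], [1.0]])
X_scaled = sanitize_and_scale(X)
print(X_scaled)  # finite values are scaled; the inf entry becomes NaN
```

With this, the infinite entry no longer trips check_array, and the correlation filtering could either ignore it or drop the feature if too many values are missing.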