Feature selection is done before the train-test split, which can cause data leakage.
Link to discussion: https://stackoverflow.com/questions/56308116/should-feature-selection-be-done-before-train-test-split-or-after
--
Even though feature selection here considers only the correlation between features (the target is not taken into account), computing the correlation coefficient requires the mean of each entire column. If that is done on the full dataset, test-set information leaks, unintentionally in this case, into the training data. In my opinion this needs to be corrected.
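One possible way to correct this, as a minimal sketch rather than the project's actual code: split first, compute the correlation matrix on the training rows only, and then apply the same column choice to the test rows. The helper name `select_uncorrelated`, the 0.9 threshold, the 80/20 split, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def select_uncorrelated(X_train, threshold=0.9):
    """Return indices of columns to keep, using training rows only.

    Hypothetical helper: a column is dropped if its absolute Pearson
    correlation with an already-kept column exceeds `threshold`.
    """
    corr = np.abs(np.corrcoef(X_train, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

# Synthetic data (assumption): column 3 is a near copy of column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)

# 1) Split first (illustrative 80/20 split).
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:160], idx[160:]
X_train, X_test = X[train_idx], X[test_idx]

# 2) Select features from the training rows only, so the column means
#    used by the correlation coefficient never see the test rows.
keep = select_uncorrelated(X_train)

# 3) Apply the same column choice to both sets.
X_train_sel = X_train[:, keep]
X_test_sel = X_test[:, keep]
```

With this ordering the test rows play no part in deciding which columns survive; they are only filtered by a decision already made on the training data.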