The main problem with the standard pipeline logic is that fit_transform, which is applied to every pre-processor in the pipeline, first calls fit and then transform. As a result, transform applies the same logic to the training data as to any other data that later passes through the pipeline in a plain transform call.
Some pre-processors should only act in the transform step that is coupled to the fit step, i.e., only in fit_transform and not in an ordinary transform. One solution is to use three different methods: fit, transform_fitted_data, and transform.
A classic example is SMOTE, which should do the following in each phase:
fit: Memorizes the data
transform_fitted_data: Applies upsampling based on the given data
transform: Does nothing
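The three-method idea could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing library API: `SketchPipeline` and `NaiveOversampler` are hypothetical names, the oversampler simply duplicates minority rows instead of performing real SMOTE interpolation, and `transform_fitted_data` is assumed to return both the features and the labels because resampling changes both.

```python
import random


class NaiveOversampler:
    """Toy stand-in for SMOTE: duplicates minority rows (no interpolation)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def fit(self, X, y):
        # Memorize the training data.
        self.X_, self.y_ = list(X), list(y)
        return self

    def transform_fitted_data(self, X, y):
        # Upsample every class to the size of the largest class.
        by_class = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row)
        target = max(len(rows) for rows in by_class.values())
        X_out, y_out = list(X), list(y)
        for label, rows in by_class.items():
            for _ in range(target - len(rows)):
                X_out.append(self.rng.choice(rows))
                y_out.append(label)
        return X_out, y_out

    def transform(self, X):
        # Data seen after training passes through unchanged.
        return X


class SketchPipeline:
    """Hypothetical pipeline that prefers transform_fitted_data while fitting."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X, y):
        for step in self.steps:
            step.fit(X, y)
            if hasattr(step, "transform_fitted_data"):
                X, y = step.transform_fitted_data(X, y)
            else:
                X = step.transform(X)
        return X, y

    def transform(self, X):
        for step in self.steps:
            X = step.transform(X)
        return X


X = [[0.0], [1.0], [2.0], [10.0]]
y = [0, 0, 0, 1]
pipe = SketchPipeline([NaiveOversampler(seed=0)])
X_train, y_train = pipe.fit_transform(X, y)  # minority class is upsampled
X_new = pipe.transform([[5.0]])              # later data is left untouched
```

In this sketch, steps that do not define `transform_fitted_data` fall back to their ordinary `transform`, so existing pre-processors would keep working unchanged.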
Alternatively, one could extend the signature of the transform function with an optional parameter fitted_data: bool. The pipeline can then set this parameter to true when fit_transform is used. If the parameter is absent, no distinction should be made between the fitted data and other data.
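A rough sketch of this alternative is shown here, again under assumptions. `FlagPipeline`, `call_transform`, `Doubler`, and `RepeatOnFit` are hypothetical names; "absent" is interpreted as "the step's transform does not declare a `fitted_data` parameter", which the pipeline checks with `inspect.signature`; labels are left out to keep the example short.

```python
import inspect


def call_transform(step, X, fitted_data):
    """Pass fitted_data only to steps whose transform declares it."""
    params = inspect.signature(step.transform).parameters
    if "fitted_data" in params:
        return step.transform(X, fitted_data=fitted_data)
    return step.transform(X)


class Doubler:
    """Ordinary step: no fitted_data parameter, same logic for all data."""

    def fit(self, X):
        return self

    def transform(self, X):
        return [2 * v for v in X]


class RepeatOnFit:
    """Toy resampler: repeats rows only for the data it was fitted on."""

    def fit(self, X):
        return self

    def transform(self, X, fitted_data=False):
        return X + X if fitted_data else X


class FlagPipeline:
    """Hypothetical pipeline that sets fitted_data=True inside fit_transform."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X):
        for step in self.steps:
            X = call_transform(step.fit(X), X, fitted_data=True)
        return X

    def transform(self, X):
        for step in self.steps:
            X = call_transform(step, X, fitted_data=False)
        return X


pipe = FlagPipeline([Doubler(), RepeatOnFit()])
train_out = pipe.fit_transform([1, 2])  # resampling is applied
later_out = pipe.transform([1, 2])      # resampling is skipped
```

Compared with a separate transform_fitted_data method, this variant keeps a single transform entry point, at the cost of the pipeline having to inspect each step's signature.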