The scikit_ext package contains various scikit-learn extensions, built entirely on top of sklearn base classes. The package is separated into two modules: estimators and scorers. Full documentation can be found here.
Package Index on PyPI
To install:
pip install scikit-ext
MultiGridSearchCV: Extension to native sklearn GridSearchCV for multiple estimators and param_grids. Accepts a list of estimators and a list of param_grids, fits a GridSearchCV model for each estimator/param_grid pair, and chooses the best fitted GridSearchCV model. Inherits from sklearn's BaseSearchCV class, so its attributes and methods are similar to those of GridSearchCV.
PrunedPipeline: Extension to native sklearn Pipeline intended for text learning pipelines with a vectorization step and a feature selection step. Instead of remembering all vectorizer vocabulary elements and selecting the appropriate features at prediction time, the extension prunes the vocabulary after fitting so that it only includes elements that survive the feature selection filter applied later in the pipeline. This reduces memory use and improves prediction latency. Predictions are identical to those made with a trained Pipeline model. Inherits from sklearn's Pipeline class, so its attributes and methods are similar to those of Pipeline.
ZoomGridSearchCV: Extension to native sklearn GridSearchCV. Fits multiple GridSearchCV models, updating the param_grid after each iteration. The update looks at the successful parameter values for each grid key and creates a new list of values with a finer resolution, centered on the best performing value from the previous fit. This allows the standard grid search process to start with a small number of widely spaced values for each parameter and zoom in as the better performing corner of the hyperparameter search space becomes clear.
IterRandomEstimator: Meta-estimator intended primarily for unsupervised estimators whose fitted model can depend heavily on an arbitrary random initialization state. It is best used for problems where a fit_predict method is intended, so that the only data used for prediction is the same data on which the model was fitted.
OptimizedEnsemble: An optimized ensemble class. Finds the optimal n_estimators parameter for the given ensemble estimator, according to the specified input parameters.
OneVsRestAdjClassifier: One-vs-rest multiclass strategy. The adjusted version is a custom extension which overrides the inherited predict_proba method with a more flexible method allowing custom normalization of the predicted probabilities. Any norm argument that can be passed directly to sklearn.preprocessing.normalize is allowed. Additionally, norm=None skips the normalization step altogether. To mimic the inherited OneVsRestClassifier behavior, which rescales each row of probabilities to sum to 1, set norm='l1'. All other methods are inherited from OneVsRestClassifier.
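As a rough illustration of the idea behind MultiGridSearchCV, the sketch below uses only plain scikit-learn: it fits one GridSearchCV per estimator/param_grid pair and keeps the search with the highest cross-validated score. It does not use the scikit_ext API, and the dataset, estimators, and grid values are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical candidates: one param_grid per estimator, paired by position.
estimators = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]
param_grids = [
    {"C": [0.1, 1.0, 10.0]},
    {"max_depth": [2, 4, 8]},
]

# Fit one GridSearchCV per estimator/param_grid pair (fit returns the search itself).
searches = [
    GridSearchCV(estimator, param_grid, cv=3).fit(X, y)
    for estimator, param_grid in zip(estimators, param_grids)
]

# Keep the search with the highest mean cross-validated score.
best_search = max(searches, key=lambda search: search.best_score_)
print(type(best_search.best_estimator_).__name__, best_search.best_params_)
```

The selected object is an ordinary fitted GridSearchCV, so best_estimator_, best_params_, and predict behave as usual.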
TimeScorer: Score using the estimated prediction latency of the estimator.
MemoryScorer: Score using the estimated memory of the pickled estimator object.
CombinedScorer: Score combining multiple scorers by averaging their scores.
cluster_distribution_score: Scoring function which scores the resulting cluster distribution across classes. A more even distribution yields a higher score.
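To show how a latency-based score can plug into scikit-learn's model selection tools, here is a minimal sketch of a scorer callable with the standard (estimator, X, y) signature. The function name latency_scorer and the choice to return negative seconds per row are assumptions made for this example; this is not the scikit_ext TimeScorer implementation.

```python
import time

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def latency_scorer(estimator, X, y=None):
    """Hypothetical scorer: negative seconds per predicted row.

    scikit-learn treats higher scores as better, so the elapsed time is
    negated to make faster estimators score higher.
    """
    start = time.perf_counter()
    estimator.predict(X)
    elapsed = time.perf_counter() - start
    return -elapsed / len(X)


X, y = load_iris(return_X_y=True)

# Any callable with the (estimator, X, y) signature can be passed as `scoring`.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=3, scoring=latency_scorer
)
print(scores)
```

The same callable can be passed as the scoring argument of GridSearchCV. Timing a single predict call is noisy, so a practical version would likely average over several repetitions.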
Evan Harris
This project is licensed under the MIT License - see the LICENSE file for details