feat: feature selector #17

Merged on Jun 8, 2024 (4 commits)
13 changes: 12 additions & 1 deletion README.md
@@ -20,7 +20,7 @@ How can you use it?
smith forge
```

🚧 As a TUI (terminal user interface): Working in progress!
🚧 As a TUI (terminal user interface): Work in progress!

All these tools will prompt a series of questions regarding the estimator you want to create, and then it will generate the boilerplate code for you.

@@ -66,6 +66,17 @@ and it should be compatible with scikit-learn Pipeline, GridSearchCV, etc.
Scikit-learn documentation on how to
[develop estimators](https://scikit-learn.org/dev/developers/develop.html#developing-scikit-learn-estimators).

## Supported estimators

The following types of scikit-learn estimators are supported:

- Classifier
- Regressor
- Transformer
- Feature Selector
- Outlier Detector
- Clusterer

## Installation

sklearn-smithy is available on [pypi](https://pypi.org/project/sklearn-smithy), so you can install it directly from there:
13 changes: 12 additions & 1 deletion docs/index.md
@@ -13,10 +13,21 @@ How can you use it?
smith forge
```

- [ ] As a TUI (terminal user interface): [Working in progress](https://github.com/FBruzzesi/sklearn-smithy/issues/1)!
- [ ] As a TUI (terminal user interface): [Work in progress](https://github.com/FBruzzesi/sklearn-smithy/issues/1)!

All these tools will prompt a series of questions regarding the estimator you want to create, and then it will generate the boilerplate code for you.

## Supported estimators

The following types of scikit-learn estimators are supported:

- Classifier
- Regressor
- Transformer
- Feature Selector
- Outlier Detector
- Clusterer

## Origin story

The idea for this tool originated from [scikit-lego #660](https://github.com/koaning/scikit-lego/pull/660){:target="_blank"}, which I cannot better explain than quoting the PR description itself:
4 changes: 2 additions & 2 deletions docs/user-guide.md
@@ -28,14 +28,14 @@ Let's see an example of how to use `smith forge` command:
```console
$ <font color="#4E9A06">smith</font> forge
# 🐍 How would you like to name the estimator?:$ MightyClassifier
# 🎯 Which kind of estimator is it? (classifier, outlier, regressor, transformer, cluster):$ classifier
# 🎯 Which kind of estimator is it? (classifier, outlier, regressor, transformer, cluster, feature-selector):$ classifier
# 📜 Please list the required parameters (comma-separated) []:$ alpha,beta
# 📑 Please list the optional parameters (comma-separated) []:$ mu,sigma
# 📶 Does the `.fit()` method support `sample_weight`? [y/N]:$ y
# 📏 Is the estimator linear? [y/N]:$ N
# 🎲 Should the estimator implement a `predict_proba` method? [y/N]:$ N
# ❓ Should the estimator implement a `decision_function` method? [y/N]:$ y
# 🧪 We are almost there... Is there any tag you want to add? (comma-separated) []:$ binary_only
# 🧪 We are almost there... Is there any tag you want to add? (comma-separated) []:$ binary_only,non_deterministic
# 📂 Where would you like to save the class? [mightyclassifier.py]:$ path/to/file.py
<span style="color: green; font-weight: bold;">Template forged at path/to/file.py </span>
```
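Several prompts above take comma-separated answers (`alpha,beta`, `mu,sigma`, `binary_only,non_deterministic`). As a minimal sketch, assuming answers are split on commas and stripped of surrounding whitespace, the hypothetical helper `parse_csv_answer` (not part of sklearn-smithy) shows how such an answer could become a list of names. Turning each tag into a `{tag: True}` entry of a `_more_tags`-style dictionary is likewise an assumption for illustration.

```python
def parse_csv_answer(answer: str) -> list[str]:
    """Split a comma-separated prompt answer into a list of clean names.

    Hypothetical helper for illustration; not part of sklearn-smithy's API.
    """
    # Strip whitespace around each item and drop empty entries (e.g. "a,,b" or "").
    return [item.strip() for item in answer.split(",") if item.strip()]


required = parse_csv_answer("alpha,beta")
tags = parse_csv_answer("binary_only,non_deterministic")

# Assumed `_more_tags`-style mapping: every listed tag switched on.
more_tags = {tag: True for tag in tags}

print(required)   # ['alpha', 'beta']
print(more_tags)  # {'binary_only': True, 'non_deterministic': True}
```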
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "sklearn-smithy"
version = "0.0.9"
version = "0.0.10"
description = "Toolkit to forge scikit-learn compatible estimators."
requires-python = ">=3.10"

1 change: 1 addition & 0 deletions sksmithy/_models.py
@@ -13,6 +13,7 @@ class EstimatorType(str, Enum):
RegressorMixin = "regressor"
TransformerMixin = "transformer"
ClusterMixin = "cluster"
SelectorMixin = "feature-selector"
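With `SelectorMixin = "feature-selector"` added, the enum values line up with the choices listed in the `smith forge` prompt. A minimal, self-contained sketch of the enum follows; the member names `ClassifierMixin` and `OutlierMixin` are assumptions, since this hunk only shows the regressor, transformer, cluster and selector members.

```python
from enum import Enum


class EstimatorType(str, Enum):
    # ClassifierMixin / OutlierMixin names are assumed; the values match the CLI prompt.
    ClassifierMixin = "classifier"
    OutlierMixin = "outlier"
    RegressorMixin = "regressor"
    TransformerMixin = "transformer"
    ClusterMixin = "cluster"
    SelectorMixin = "feature-selector"


# The prompt choices are simply the member values, in definition order.
choices = ", ".join(e.value for e in EstimatorType)
print(choices)  # classifier, outlier, regressor, transformer, cluster, feature-selector

# Looking up by value returns the member.
assert EstimatorType("feature-selector") is EstimatorType.SelectorMixin

# Because the enum also subclasses str, a member compares equal to its plain value.
assert EstimatorType.SelectorMixin == "feature-selector"
```

Subclassing `str` means a member can be compared directly against string literals such as `'feature-selector'`.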


class TagType(str, Enum):
2 changes: 1 addition & 1 deletion sksmithy/app.py
@@ -124,7 +124,7 @@
estimator = st.selectbox(
label=PROMPT_ESTIMATOR,
options=tuple(e.value for e in EstimatorType),
format_func=lambda x: x.capitalize(),
format_func=lambda v: " ".join(x.capitalize() for x in v.split("-")),
index=None,
key="estimator",
)
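The new `format_func` only changes the label shown in the Streamlit select box; the selected value is still the raw option, e.g. `"feature-selector"`. The sketch below restates the old and new lambdas as plain functions (the names `old_label` and `new_label` are hypothetical) to show why `.capitalize()` alone was not enough for a hyphenated value:

```python
def old_label(value: str) -> str:
    # Previous format_func: uppercases the first character, lowercases the rest.
    return value.capitalize()


def new_label(value: str) -> str:
    # New format_func: split on "-" and capitalize each word separately.
    return " ".join(x.capitalize() for x in value.split("-"))


for value in ("classifier", "feature-selector"):
    print(f"{value!r}: {old_label(value)!r} -> {new_label(value)!r}")
# 'classifier': 'Classifier' -> 'Classifier'
# 'feature-selector': 'Feature-selector' -> 'Feature Selector'
```

Single-word values render exactly as before, so the change only affects the new hyphenated option.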
30 changes: 27 additions & 3 deletions sksmithy/template.py.jinja
@@ -1,4 +1,4 @@
{%- if estimator_type=='classifier' %}
{%- if estimator_type in ('classifier', 'feature-selector') %}
import numpy as np
{% endif -%}
{%- if estimator_type == 'classifier' and linear %}
@@ -7,6 +7,9 @@
{% elif estimator_type == 'regressor' and linear%}
from sklearn.base import {{ mixin }}
from sklearn.linear_model._base import LinearModel
{% elif estimator_type == 'feature-selector'%}
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin
{% else %}
from sklearn.base import BaseEstimator, {{ mixin }}
{% endif -%}
@@ -56,7 +59,7 @@
{% endfor -%}
{% endif %}

def fit(self, X, y{% if estimator_type == 'transformer' %}=None{% endif %}{% if sample_weight %}, sample_weight=None{% endif %}):
def fit(self, X, y{% if estimator_type in ('transformer', 'feature-selector') %}=None{% endif %}{% if sample_weight %}, sample_weight=None{% endif %}):
"""
Fit {{name}} estimator.

@@ -82,7 +85,7 @@
self : {{name}}
Fitted {{name}} estimator.
"""
{%- if estimator_type == 'transformer' %}
{%- if estimator_type in ('transformer', 'feature-selector') %}
X = check_array(X, ...) #TODO: Fill in `check_array` arguments
{% else %}
X, y = check_X_y(X, y, ...) #TODO: Fill in `check_X_y` arguments
@@ -105,6 +108,13 @@
{% if 'max_iter' in parameters -%}self.n_iter_ = ...{%- endif %}
{% if estimator_type=='outlier' -%}self.offset_ = ...{%- endif %}
{% if estimator_type=='cluster' -%}self.labels_ = ...{%- endif %}
{% if estimator_type=='feature-selector'%}
self.selected_features_ = ... # TODO: Indexes of selected features
self.support_ = np.isin(
np.arange(0, self.n_features_in_), # all_features
self.selected_features_
)
{%- endif %}

return self
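The `support_` expression in `fit` turns the indices stored in `selected_features_` into a boolean mask over all `n_features_in_` input features. A standalone NumPy sketch with made-up values (five features, indices `0` and `3` selected by some hypothetical rule):

```python
import numpy as np

n_features_in_ = 5
selected_features_ = np.array([0, 3])  # illustrative indices, not from a real selector

# Same expression as the template: True where a feature index is among the selected ones.
support_ = np.isin(np.arange(0, n_features_in_), selected_features_)
print(support_)  # [ True False False  True False]

# The mask and the index array carry the same information.
assert np.array_equal(np.flatnonzero(support_), selected_features_)
```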

@@ -255,6 +265,20 @@
return X_ts
{%- endif %}

{% if estimator_type=='feature-selector' -%}
def _get_support_mask(self, X):
"""Get the boolean mask indicating which features are selected.

Returns
-------
support : boolean array of shape [# input features]
An element is True iff its corresponding feature is selected for retention.
"""

check_is_fitted(self)
return self.support_
{%- endif %}
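To see how the pieces fit together, here is a minimal sketch of a selector shaped like the one this template generates. It assumes a hypothetical `TopVarianceSelector` that keeps the `k` highest-variance features; the selection rule is illustrative only, since the template leaves it as a TODO. One detail differs on purpose: scikit-learn's `SelectorMixin.get_support()` calls `self._get_support_mask()` with no arguments, so the sketch defines `_get_support_mask(self)` without an `X` parameter.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin
from sklearn.utils.validation import check_array, check_is_fitted


class TopVarianceSelector(SelectorMixin, BaseEstimator):
    """Hypothetical selector keeping the `k` highest-variance features."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y=None):
        # y is optional, as for transformers: this rule ignores the target.
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        variances = X.var(axis=0)
        # Indices of the k largest variances, sorted ascending.
        self.selected_features_ = np.sort(np.argsort(variances)[::-1][: self.k])
        self.support_ = np.isin(np.arange(0, self.n_features_in_), self.selected_features_)
        return self

    def _get_support_mask(self):
        # SelectorMixin.get_support() and transform() call this without arguments.
        check_is_fitted(self)
        return self.support_


X = np.array([[0.0, 1.0, 10.0], [0.0, 2.0, 20.0], [0.0, 3.0, 30.0]])
selector = TopVarianceSelector(k=2).fit(X)
print(selector.get_support())  # [False  True  True]
print(selector.transform(X))   # keeps columns 1 and 2
```

Listing `SelectorMixin` before `BaseEstimator`, as the template does, is what supplies `transform`, `get_support(indices=...)` and `inverse_transform` on top of the boolean mask.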

{% if tags %}
def _more_tags(self):
return {
24 changes: 23 additions & 1 deletion tests/test_render.py
@@ -65,7 +65,7 @@ def test_common_estimator(name: str, estimator: EstimatorType, sample_weight: bo
assert ("sample_weight = _check_sample_weight(sample_weight)" in result) == sample_weight

match estimator:
case EstimatorType.TransformerMixin:
case EstimatorType.TransformerMixin | EstimatorType.SelectorMixin:
assert "X = check_array(X, ...)" in result
assert ("def fit(self, X, y=None, sample_weight=None)" in result) == (sample_weight)
assert ("def fit(self, X, y=None)" in result) == (not sample_weight)
@@ -191,6 +191,28 @@ def test_transformer(name: str) -> None:
assert "def predict(self, X)" not in result


def test_feature_selector(name: str) -> None:
"""Tests feature selector specific rendering."""
estimator_type = EstimatorType.SelectorMixin

result = render_template(
name=name,
estimator_type=estimator_type,
required=[],
optional=[],
sample_weight=False,
linear=False,
predict_proba=False,
decision_function=False,
tags=None,
)
# Feature selector specific
assert "class MightyEstimator(SelectorMixin, BaseEstimator)" in result
assert "def _get_support_mask(self, X)" in result
assert "self.support_" in result
assert "def predict(self, X)" not in result
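`test_common_estimator` now checks that feature selectors get `def fit(self, X, y=None)` just like transformers. A minimal sketch, assuming only that `jinja2` is installed, renders a trimmed, hypothetical excerpt of that signature line in isolation so the `in (...)` condition can be checked directly:

```python
from jinja2 import Template

# Trimmed, hypothetical excerpt of the `fit` signature from template.py.jinja.
FIT_SIGNATURE = Template(
    "def fit(self, X, y{% if estimator_type in ('transformer', 'feature-selector') %}=None{% endif %}"
    "{% if sample_weight %}, sample_weight=None{% endif %}):"
)

for estimator_type in ("classifier", "transformer", "feature-selector"):
    print(FIT_SIGNATURE.render(estimator_type=estimator_type, sample_weight=False))
# def fit(self, X, y):
# def fit(self, X, y=None):
# def fit(self, X, y=None):
```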


def test_cluster(name: str) -> None:
"""Tests cluster specific rendering."""
estimator_type = EstimatorType.ClusterMixin