
Commit

Improve documentation
ClaudioSalvatoreArcidiacono committed Jul 26, 2023
1 parent d4ff5e3 commit 0982404
Showing 5 changed files with 36 additions and 3 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -1,13 +1,12 @@
[![pytest](https://github.com/ClaudioSalvatoreArcidiacono/felimination/workflows/Tests/badge.svg)](https://github.com/ClaudioSalvatoreArcidiacono/felimination/actions?query=workflow%3A%22Tests%22)
[![PyPI](https://img.shields.io/pypi/v/felimination)](#)
[![documentation](https://img.shields.io/badge/docs-mkdocs%20material-blue.svg?style=flat)](https://claudiosalvatorearcidiacono.github.io/felimination/)


# felimination

This library contains some useful scikit-learn compatible classes for feature selection.

## [Check out documentation here](https://claudiosalvatorearcidiacono.github.io/felimination/)

## Features

- [Recursive Feature Elimination with Cross Validation using Permutation Importance](reference/RFE.md#felimination.rfe.PermutationImportanceRFECV)
2 changes: 2 additions & 0 deletions docs/reference/RFE.md
@@ -1 +1,3 @@
::: felimination.rfe
    options:
      inherited_members: true
3 changes: 3 additions & 0 deletions docs/reference/drift.md
@@ -0,0 +1,3 @@
::: felimination.drift
    options:
      inherited_members: true
29 changes: 28 additions & 1 deletion felimination/drift.py
@@ -1,4 +1,31 @@
"""Module with tools to perform drift-based feature selection.
"""The idea behind this module comes from the conjunction of two concepts:
- [1] [Classifier Two-Sample Test](https://arxiv.org/abs/1610.06545)
- [2] [Recursive Feature Elimination](\
https://scikit-learn.org/stable/modules/generated/\
sklearn.feature_selection.RFE.html)

In [1], classifier performance is used to determine how similar two samples are. More
specifically, imagine having two samples: `reference` and `test`. To assess
whether `reference` and `test` were drawn from the same distribution, we can
train a classifier to predict which sample each instance belongs to. If the
model easily distinguishes instances from the two samples, then the two samples
were probably drawn from two different distributions. Conversely, if the
classifier struggles to distinguish them, then it is likely that the samples
were drawn from the same distribution.

In the context of drift detection, the classifier two-sample test can be used to
assess whether drift has occurred between the reference and the test set, and to
what degree.
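
The classifier two-sample test described above can be illustrated with a minimal sketch. This is not part of the commit and does not use felimination's classes: the helper name `c2st_auc`, the choice of logistic regression, and the use of cross-validated ROC AUC as the similarity score are all assumptions made for illustration. An AUC near 0.5 suggests the classifier cannot tell the samples apart; an AUC near 1.0 suggests drift.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def c2st_auc(reference, test):
    """Cross-validated ROC AUC of a classifier separating `reference` from `test`.

    Hypothetical helper for illustration; not part of felimination.
    """
    # Stack both samples and label each row by the sample it came from.
    X = np.vstack([reference, test])
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(test))])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 3))

# Same distribution as `reference`: the classifier should be close to chance.
same = rng.normal(0.0, 1.0, size=(500, 3))

# First feature shifted by two standard deviations: clear drift.
shifted = rng.normal(0.0, 1.0, size=(500, 3))
shifted[:, 0] += 2.0

auc_same = c2st_auc(reference, same)
auc_shifted = c2st_auc(reference, shifted)
print(f"AUC without drift: {auc_same:.2f}")
print(f"AUC with drift:    {auc_shifted:.2f}")
```

The gap between the two scores is what gives the "to what degree" reading: the further the AUC is from 0.5, the easier the two samples are to separate.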

The classes of this module take this idea one step further and attempt
to reduce the drift using recursive feature elimination. After a classifier
is trained to distinguish between `reference` and `test`, the feature
importance of the classifier is used to determine which features contribute
the most to distinguishing between the two sets. The most important features
are then eliminated and the procedure is repeated until the classifier can
no longer distinguish between the two samples, or until a given number
of features has been removed.
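
The elimination loop described above could look roughly like the following sketch. It is an illustration under stated assumptions rather than the module's implementation: the function name `eliminate_drifting_features`, the `auc_threshold` and `min_features` parameters, the removal of one feature per iteration, and the use of absolute logistic-regression coefficients as the importance measure (reasonable here only because all features share the same scale) are hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def eliminate_drifting_features(reference, test, auc_threshold=0.6, min_features=1):
    """Drop the most drift-revealing feature until the samples look alike.

    Hypothetical helper for illustration; not part of felimination.
    Returns the indices of the kept features and of the removed features.
    """
    remaining = list(range(reference.shape[1]))
    removed = []
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(test))])

    while len(remaining) > min_features:
        X = np.vstack([reference[:, remaining], test[:, remaining]])
        clf = LogisticRegression(max_iter=1000)

        # Stop once the classifier can no longer separate the two samples.
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        if auc < auc_threshold:
            break

        # Assumption: |coefficient| is used as the feature importance.
        clf.fit(X, y)
        most_important = int(np.argmax(np.abs(clf.coef_[0])))
        removed.append(remaining.pop(most_important))

    return remaining, removed


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 5))
test = rng.normal(0.0, 1.0, size=(500, 5))
test[:, 0] += 2.0  # strong drift on feature 0
test[:, 1] += 1.0  # milder drift on feature 1

remaining, removed = eliminate_drifting_features(reference, test)
print("removed features:", removed)
print("remaining features:", remaining)
```

In this synthetic setup only the first two features drift, so the loop is expected to stop once both have been dropped, well before the `min_features` limit is reached.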
This module contains the following classes:
- `SampleSimilarityDriftRFE`: base class for drift-based sample similarity
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -29,6 +29,8 @@ plugins:
        python:
          options:
            docstring_style: numpy
          import:
            - https://scikit-learn.org/stable/objects.inv

markdown_extensions:
- pymdownx.highlight: