Commit 76142ba (parent e36a562)

GPU support - Installation extras of CuPy and Implementation of Cuda kernels (#41)

* GPU support - Installation extras of CuPy
* cupy dependencies
* bump hisel version
* Cuda kernels
* Examples and profilers
* Error 137 in tests
* Error 137 in tests
* Files for README

Showing 17 changed files with 1,287 additions and 381 deletions.

@@ -1,25 +1,126 @@
# HISEL
## Feature selection tool based on Hilbert-Schmidt Independence Criterion

Feature selection is the machine learning task of selecting, from a data set, the features that are relevant for the prediction of a given target.
The `hisel` package provides feature selection methods based on the Hilbert-Schmidt Independence Criterion (HSIC).
In particular, it provides an implementation of the HSIC Lasso algorithm of
[Yamada, M. et al. (2012)](https://arxiv.org/abs/1202.0515).
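
To make the criterion concrete, here is a minimal NumPy sketch of the standard biased empirical HSIC estimator, `HSIC = tr(K H L H) / (n - 1)^2`, where `K` and `L` are Gaussian Gram matrices of the two samples and `H = I - (1/n) 1 1^T` is the centring matrix. The kernel width `sigma=1.0`, the standardisation of the inputs and the function names are assumptions made for this illustration; this is not `hisel`'s own code.

```python
import numpy as np


def gaussian_gram(v: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gram matrix K[i, j] = exp(-(v[i] - v[j])**2 / (2 * sigma**2)) of a 1-D sample."""
    v = np.asarray(v, dtype=float)
    v = (v - v.mean()) / (v.std() + 1e-12)  # standardise so one sigma suits every input
    return np.exp(-((v[:, None] - v[None, :]) ** 2) / (2.0 * sigma ** 2))


def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)**2."""
    n = len(x)
    h = np.eye(n) - 1.0 / n  # centring matrix H = I - (1/n) 1 1^T
    k_x = gaussian_gram(x, sigma)
    k_y = gaussian_gram(y, sigma)
    return float(np.trace(k_x @ h @ k_y @ h) / (n - 1) ** 2)


rng = np.random.default_rng(0)
x = rng.normal(size=300)
print(hsic(x, x ** 2))                # nonlinear dependence: clearly positive
print(hsic(x, rng.normal(size=300)))  # independent sample: close to zero
```

For symmetric `x`, the Pearson correlation between `x` and `x ** 2` is zero, yet HSIC still detects the dependence; this is what lets HSIC-based selection find features that are relevant only through nonlinear relationships.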

## Why is `hisel` cool?

#### `hisel` is accurate
HSIC Lasso is an excellent algorithm for feature selection.
This makes `hisel` an accurate tool in your machine learning modelling.
Moreover, `hisel` implements clever routines that address common causes of poor accuracy in other feature selection methods.

Examples where `hisel` outperforms the methods in
[sklearn.feature\_selection](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection)
are given in the notebooks `ensemble-example.ipynb` and `nonlinear-transform.ipynb`.
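
HSIC Lasso, as formulated by Yamada et al. (2012), builds the centred Gram matrix `K_k` of every feature and `L` of the target, vectorises them, and fits a non-negative Lasso: minimise `0.5 * ||vec(L) - sum_k alpha_k vec(K_k)||^2 + lam * sum_k alpha_k` subject to `alpha_k >= 0`; features with a positive `alpha_k` are selected. The sketch below is a small, unoptimised NumPy illustration of that formulation. The unit Frobenius normalisation of each Gram matrix, the Gaussian kernel width, the value of `lam` and the plain coordinate-descent solver are choices made for this example, not a description of how `hisel` implements the algorithm.

```python
import numpy as np


def centred_gram(v: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Centred Gaussian Gram matrix H K H of a 1-D sample, scaled to unit Frobenius norm."""
    v = np.asarray(v, dtype=float)
    v = (v - v.mean()) / (v.std() + 1e-12)
    k = np.exp(-((v[:, None] - v[None, :]) ** 2) / (2.0 * sigma ** 2))
    n = len(v)
    h = np.eye(n) - 1.0 / n  # centring matrix
    kc = h @ k @ h
    return kc / np.linalg.norm(kc)  # matrix norm defaults to Frobenius


def hsic_lasso(x: np.ndarray, y: np.ndarray, lam: float = 1e-3, n_sweeps: int = 200) -> np.ndarray:
    """Non-negative Lasso of vec(L) on the columns vec(K_k), solved by coordinate descent."""
    n, d = x.shape
    a = np.column_stack([centred_gram(x[:, j]).ravel() for j in range(d)])  # (n*n, d)
    b = centred_gram(y).ravel()  # (n*n,)
    g = a.T @ a  # (d, d) inner products <K_j, K_k>
    c = a.T @ b  # (d,) inner products <K_k, L>, i.e. normalised HSIC scores
    alpha = np.zeros(d)
    for _ in range(n_sweeps):
        for k in range(d):
            # fit of feature k to the part of vec(L) not explained by the other features
            partial = c[k] - (g[k] @ alpha - g[k, k] * alpha[k])
            alpha[k] = max(0.0, (partial - lam) / g[k, k])  # soft-threshold, clip at 0
    return alpha


rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6))
target = 2 * np.sin(features[:, 0]) + features[:, 1] ** 2 + 0.1 * rng.normal(size=200)
print(np.round(hsic_lasso(features, target), 3))
```

The regression has `n * n` rows, one per entry of a Gram matrix, so the cost of building the Gram matrices grows quickly with the number of samples.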

#### `hisel` is fast
A crucial step in the HSIC Lasso algorithm is the computation of the Gram (kernel) matrices, one for each feature and one for the target.
`hisel` implements these computations in a highly vectorised and performant way.
Moreover, `hisel` allows you to accelerate these computations using a GPU.
The image below shows the average run time of the Gram matrix computations
via `hisel` on CPU, via `hisel` on GPU, and via [pyHSICLasso](https://pypi.org/project/pyHSICLasso/).

![gramtimes](gramtimes.png)
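
As an illustration of what vectorising this step can look like, the sketch below computes the Gaussian Gram matrices of all features at once with array broadcasting instead of Python loops, and runs the same array code on a GPU through CuPy (whose API mirrors much of NumPy's) when CuPy and a CUDA device are available. The Gaussian kernel, `sigma=1.0` and the helper names are assumptions for the example, not `hisel`'s API. The batched result holds `d * n * n` numbers, so memory quickly becomes the limiting factor for large `n`.

```python
import numpy as np

try:  # CuPy is optional; fall back to NumPy when it is not installed
    import cupy as cp
except ImportError:
    cp = None


def array_module():
    """Return cupy if it is installed and a CUDA device is visible, else numpy."""
    if cp is not None:
        try:
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp
        except Exception:  # e.g. CuPy installed but no CUDA driver present
            pass
    return np


def batched_gaussian_grams(x, sigma=1.0, xp=np):
    """Gram matrices of every column of x at once.

    x has shape (n, d); the result has shape (d, n, n) with
    result[k, i, j] = exp(-(x[i, k] - x[j, k])**2 / (2 * sigma**2)).
    """
    x = xp.asarray(x, dtype=xp.float64)
    diffs = x.T[:, :, None] - x.T[:, None, :]  # (d, n, 1) - (d, 1, n) -> (d, n, n)
    return xp.exp(-diffs ** 2 / (2.0 * sigma ** 2))


def looped_gaussian_grams(x, sigma=1.0):
    """Reference version with explicit Python loops, for comparison only (slow)."""
    n, d = x.shape
    out = np.empty((d, n, n))
    for k in range(d):
        for i in range(n):
            for j in range(n):
                out[k, i, j] = np.exp(-(x[i, k] - x[j, k]) ** 2 / (2.0 * sigma ** 2))
    return out


xp = array_module()
x = np.random.default_rng(0).normal(size=(50, 4))
grams = batched_gaussian_grams(x, xp=xp)
grams_host = cp.asnumpy(grams) if xp is not np else grams  # copy back to host memory
print(xp.__name__, grams_host.shape)
```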

#### `hisel` has a friendly user interface

Getting started with `hisel` is as straightforward as the following code snippet:
```
>>> import pandas as pd
>>> import hisel
>>> df = pd.read_csv('mydata.csv')
>>> xdf = df.iloc[:, :-1]
>>> yser = df.iloc[:, -1]
>>> hisel.feature_selection.select_features(xdf, yser)
['d2', 'd7', 'c3', 'c10', 'c12', 'c24', 'c22', 'c21', 'c5']
```
If you only need the basics, you can stop here.
To learn how to tune the hyper-parameters used by `hisel`,
or how to gain finer control over `hisel`'s selection,
browse the examples in
[examples/](https://github.com/transferwise/hisel/tree/trunk/examples)
and in
[notebooks](https://github.com/transferwise/hisel/tree/trunk/notebooks).
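
The getting-started snippet reads `mydata.csv` and treats every column except the last as a feature and the last column as the target. If you want a file to try it on, the sketch below writes a hypothetical toy data set in that layout: five continuous columns `c0` to `c4`, three discrete columns `d0` to `d2`, and a continuous target `y` that depends only on `c0`, `c1` and `d0`. The column names, the generating formula and the helper `make_toy_dataset` are made up for illustration.

```python
import numpy as np
import pandas as pd


def make_toy_dataset(n_samples: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical toy data: the target depends nonlinearly on c0, c1 and d0 only."""
    rng = np.random.default_rng(seed)
    # continuous features c0..c4
    df = pd.DataFrame({f"c{j}": rng.normal(size=n_samples) for j in range(5)})
    # discrete features d0..d2 taking values 0, 1, 2, 3
    for j in range(3):
        df[f"d{j}"] = rng.integers(0, 4, size=n_samples)
    # the target goes last, matching df.iloc[:, -1] in the snippet
    df["y"] = (
        np.sin(df["c0"])
        + df["c1"] ** 2
        + (df["d0"] == 2).astype(float)
        + 0.1 * rng.normal(size=n_samples)
    )
    return df


if __name__ == "__main__":
    make_toy_dataset().to_csv("mydata.csv", index=False)
```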

## Installation

### Install via `pip`

The package `hisel` is available from `arti`, so you can install it via `pip`.
While on the Wise-VPN, in the environment where you intend to use `hisel`, run
```
pip install hisel --index-url=https://arti.tw.ee/artifactory/api/pypi/pypi-virtual/simple
```

### Install from source

#### Basic installation
Check out the repo and navigate to the root directory. Then run
```
poetry install
```

#### Installation with GPU support
You need to have the CUDA toolkit installed, and you need to know its version.
One way to find it is to run
```
nvidia-smi
```
and read the CUDA version from the top-right corner of the printed table.
Note that `nvidia-smi` reports the highest CUDA version supported by your driver, while `nvcc --version` reports the version of the installed toolkit itself.
Once you know your version of CUDA, run
```
poetry install -E cudaXXX
```
where `cudaXXX` is one of the following:
- `cuda102` if you have version 10.2;
- `cuda110` if you have version 11.0;
- `cuda111` if you have version 11.1;
- `cuda11x` if you have a version from 11.2 to 11.8;
- `cuda12x` if you have version 12.x.

This aligns with the [installation guide of CuPy](https://docs.cupy.dev/en/stable/install.html#installing-cupy).
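
Choosing the extra can also be scripted. The sketch below encodes the version-to-extra list as a small function and, as a convenience, tries to parse the `CUDA Version: X.Y` field from the `nvidia-smi` header. The helper names are hypothetical, and the parsing assumes the usual `nvidia-smi` header layout; keep in mind that this field is the driver's highest supported CUDA version, which can differ from the installed toolkit.

```python
import re
import shutil
import subprocess
from typing import Optional


def cuda_extra(version: str) -> str:
    """Map a CUDA version such as '11.4' to the matching poetry extra."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (major, minor) == (10, 2):
        return "cuda102"
    if (major, minor) == (11, 0):
        return "cuda110"
    if (major, minor) == (11, 1):
        return "cuda111"
    if major == 11 and 2 <= minor <= 8:
        return "cuda11x"
    if major == 12:
        return "cuda12x"
    raise ValueError(f"no extra is listed for CUDA {version}")


def cuda_version_from_nvidia_smi() -> Optional[str]:
    """Parse 'CUDA Version: X.Y' (the driver-supported version) from nvidia-smi output."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", result.stdout)
    return match.group(1) if match else None


if __name__ == "__main__":
    found = cuda_version_from_nvidia_smi()
    if found is None:
        print("Could not read a CUDA version from nvidia-smi")
    else:
        print(f"poetry install -E {cuda_extra(found)}")
```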

@@ -0,0 +1,35 @@
import pandas as pd
import hisel


def main():
    # Minimal example of `hisel` usage with explicit specification of parameters
    df = pd.read_csv('mydata.csv')
    xdf = df.iloc[:, :-1]  # features: every column except the last
    yser = df.iloc[:, -1]  # target: the last column
    categorical_search_parameters = hisel.feature_selection.SearchParameters(
        num_permutations=1,
        im_ratio=.03,
        max_iter=2,
        parallel=True,
        random_state=None,
    )
    hsiclasso_parameters = hisel.feature_selection.HSICLassoParameters(
        mi_threshold=.00001,
        hsic_threshold=0.005,
        batch_size=5000,
        minibatch_size=500,
        number_of_epochs=3,
        use_preselection=True,
        device=hisel.kernels.Device.CPU  # if CUDA is available, you can pass GPU instead
    )
    results = hisel.feature_selection.select_features(
        xdf, yser, hsiclasso_parameters, categorical_search_parameters)
    print('\n\n##########################################################')
    print(
        f'The following features are relevant for the prediction of {yser.name}:')
    print(f'{results.selected_features}')


if __name__ == '__main__':
    main()