A framework for stable online feature selection.

haugjo/fires

FIRES

This repository contains an implementation of the FIRES framework that is introduced in

Johannes Haug, Martin Pawelczyk, Klaus Broelemann, and Gjergji Kasneci. 2020. Leveraging Model Inherent Variable Importance for Stable Online Feature Selection. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). Association for Computing Machinery, New York, NY, USA, 1478–1502. DOI:https://doi.org/10.1145/3394486.3403200

Please cite our KDD '20 paper when using this implementation. Archived versions of the paper can be found at ACM and arXiv.

Note: FIRES is now also included in the float evaluation framework.

Apply FIRES to Your Project

The FIRES implementation provided here uses a Probit base model to select features in binary classification settings. It can be used as follows:

import numpy as np
from skmultiflow.data import FileStream
from skmultiflow.neural_networks import PerceptronMask
from fires import FIRES
from sklearn.metrics import accuracy_score

# Load data as scikit-multiflow FileStream
# NOTE: FIRES accepts only numeric values. Please one-hot-encode or factorize string/char variables.
# Additionally, we suggest normalizing all features, e.g. with scikit-learn's MinMaxScaler()
stream = FileStream('yourData.csv', target_idx=0)
stream.prepare_for_use()

# Initial fit of the predictive model
predictor = PerceptronMask()
x, y = stream.next_sample(batch_size=100)
predictor.partial_fit(x, y, stream.target_values)

# Initialize FIRES
fires_model = FIRES(n_total_ftr=stream.n_features,          # Total no. of features
                    target_values=stream.target_values,     # Unique target values (class labels)
                    mu_init=0,                              # Initial importance parameter
                    sigma_init=1,                           # Initial uncertainty parameter
                    penalty_s=0.01,                         # Penalty factor for the uncertainty (corresponds to gamma_s in the paper)
                    penalty_r=0.01,                         # Penalty factor for the regularization (corresponds to gamma_r in the paper)
                    epochs=1,                               # No. of times each batch of observations is used to update the parameters
                    lr_mu=0.01,                             # Learning rate for the gradient update of the importance
                    lr_sigma=0.01,                          # Learning rate for the gradient update of the uncertainty
                    scale_weights=True,                     # If True, scale feature weights into the range [0,1]
                    model='probit')                         # Name of the base model to compute the likelihood

# Prequential evaluation
n_selected_ftr = 10

while stream.has_more_samples():
    # Load a new batch of observations
    x, y = stream.next_sample(batch_size=10)

    # Select features
    ftr_weights = fires_model.weigh_features(x, y)  # Get feature weights with FIRES
    ftr_selection = np.argsort(ftr_weights)[::-1][:n_selected_ftr]

    # Truncate x (retain only selected features, 'remove' all others, e.g. by replacing them with 0)
    x_reduced = np.zeros(x.shape)
    x_reduced[:, ftr_selection] = x[:, ftr_selection]

    # Test
    y_pred = predictor.predict(x_reduced)
    print(accuracy_score(y, y_pred))

    # Train
    predictor.partial_fit(x_reduced, y)

# Restart the FileStream
stream.restart()
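
The two data-handling steps in the example above (making all features numeric and scaled, then keeping only the top-weighted features) can be tried in isolation, without a data stream. The following is a minimal sketch using only NumPy. The helper names `factorize`, `min_max_scale` and `keep_top_k` and the toy data are assumptions made for illustration and are not part of this repository; the weights are placeholders for the output of `fires_model.weigh_features(x, y)`.

```python
import numpy as np


def factorize(column):
    """Map arbitrary labels (e.g. strings) to integer codes 0..k-1 (hypothetical helper)."""
    _, codes = np.unique(column, return_inverse=True)
    return codes.astype(float)


def min_max_scale(x):
    """Scale every column of x into [0, 1]; constant columns become 0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (x - col_min) / col_range


def keep_top_k(x, ftr_weights, k):
    """Zero out all columns of x except the k with the largest weights (hypothetical helper)."""
    selected = np.argsort(ftr_weights)[::-1][:k]
    x_reduced = np.zeros(x.shape)
    x_reduced[:, selected] = x[:, selected]
    return x_reduced, selected


# Toy data, made up for illustration: one string column and two numeric columns
colour = np.array(["red", "blue", "red", "green"])
numeric = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [5.0, 50.0]])

# Factorize the string column, then scale all features into [0, 1]
x = min_max_scale(np.column_stack([factorize(colour), numeric]))

# Placeholder weights; in the example above they come from fires_model.weigh_features(x, y)
ftr_weights = np.array([0.2, 0.1, 0.9])
x_reduced, selected = keep_top_k(x, ftr_weights, k=2)
```

Replacing unselected features with zeros, rather than dropping the columns, keeps the input dimension fixed, so the same predictor can keep training while the selected feature set changes between batches.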

Use Your Own Predictive Model

To use FIRES with your own base model, replace the placeholders marked ### ADD YOUR OWN MODEL HERE ### in fires.py with your model's code.

If you have developed a new instantiation of FIRES that is worth sharing with others, feel free to submit a pull request.

Related Works

The FIRES framework has been used in

Haug, Johannes, and Gjergji Kasneci. "Learning Parameter Distributions to Detect Concept Drift in Data Streams." 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021. (code)
