- [BUG] Fix errors generated when updating dependencies with different naming for arguments
- [BUG] replace np.Inf with np.inf for compatibility (the np.Inf alias was removed in NumPy 2.0)
- [BUG] corrected the column names for the GrootCV scheme, setting the shadow var in last position to guarantee the real names are used
- [ENHANCEMENT] support user defined cross-validation scheme for time series applications for GrootCV
- [BUG] fix the calculation of the SHAP feature importance for multi-class
- [ENHANCEMENT] Update pandas aggregation to get rid of the future deprecation warnings
- [BUG] fix the calculation of the SHAP feature importance for multi-class
- [ENHANCEMENT] return the feature for the importance
- [BUG] add axis=1 to compute the max on the right dimension in _reduce_vars_sklearn
- [BUG] remove merge causing duplication of the feature importance in _reduce_vars_sklearn
- [BUG] change the default of the weighted correlation for consistency with existing doc
- [ENHANCEMENT] speed up the correlation feature selector
- [BUG] add copy() to prevent modifying the input pandas DataFrame when fitting the mrmr selector
- [BUG] fix the collinearity feature elimination
- [BUG] fix the feature importance if fasttreeshap not installed
- [REFACTORING] refactor the association module for removing redundancy and faster computation
- [BUG] fix the hardcoded threshold in collinearity elimination, closes #33
- [BUG] fix a bug in computing the association matrix when a single column of a specific dtype is passed in the sub_matrix (nom-nom, num-num) calculators.
- Refactor TreeDiscretizer
- Add a mechanism to the TreeDiscretizer that restricts the length of combined strings for categorical columns, preventing excessively lengthy entries.
- implement link functions for the lasso feature selection, e.g. a log link to ensure positivity
- downgrade the lightgbm version to 3.3.1 for compatibility reasons (with optuna for instance)
- Fix: strictly greater than threshold rather than geq in the base threshold transformer
- Update: due to a change in the lightgbm train API (v4), update the code for GBM
- Documentation: fix the format of some docstrings and remove old sphinx generated files
- Fix: remove unnecessary `__all__` in the preprocessing module and improve the consistency of the module docstrings
- Fix: when L1 == 0 in fit_regularized, statsmodels returns the regularized wrapper without refitting, which breaks the class (statistics not available)
- Build: remove explicit dependencies on holoviews and panel
- Add fasttreeshap implementation as an option to compute shap importance (fasttreeshap does not work with XGBoost though)
- New feature: lasso feature selection, especially useful for models without interactions (LM, GLM, GAM)
- New feature: pass lightgbm parameters to GrootCV
- Bug: fix sample weight shape in mrMR
- Documentation: update and upgrade the tutorial notebooks
- Update the required Python version to >= 3.9
- Switch to tqdm.auto for better rendering in notebooks for the variable importance selector
- User defined n_jobs for association matrix computation
- Corrected an issue in Leshy that occurred when using categorical variables. The use of NumPy functions and methods instead of Pandas ones resulted in the modification of original data types.
- Patch preventing zero division in the conditional entropy calculation
- Return self in mrmr, fixing error when in scikit-learn pipeline
- Patch classes where an old unused argument was causing an error
- Distribute a toy regression dataset, built by modifying the Boston dataset with added noise and made-up columns
- Fix pkg data distribution
- Parallelization of functions applied on pandas data frame
- Faster and more modular association measures
- Removing dependencies (e.g. dython)
- Better static and interactive visualization
- Sklearn selectors rather than a big class
- Discretization of continuous and categorical predictors
- Minimal redundancy maximal relevance feature selection added (a subset of all relevant predictors), based on Uber's MRmr flavor
- architecture closer to the scikit-learn one
- Fix bug when computing the SHAP importance for a classifier in GrootCV
- Add a defensive check for the case where no categorical column is found in the subsample of the dataset
- Re-run the notebooks with the new version
- Fix clustering when plotting only strongly correlated predictors
- Remove palettable dependencies for plotting
- Add default colormap but implement the user defined option
- Enable clustering before plotting the correlation/association matrix, optional
- Decrease the font size of the labels of the correlation matrix
- Update requirements
- Upgrade documentation
- Fix a typo affecting the distribution of the dataset and pin the dependencies
- Update the syntax for computing associations using the latest version of dython
- Fix the Boruta_py feature counts, now adds up to n_features
- Fix the boxplot colours: when there were only rejected and accepted features (no tentative ones), the background color was the tentative color
- Numpy docstring style
- Implement the new lightGBM callbacks. The new lgbm version (>3.3.0) implements the early stopping using a callback rather than an argument
- Fix a bug for computing the shap importance when the estimator is lightGBM and the task is classification
- Add ranking and absolute ranking attributes for all the classes
- Fix future pandas TypeError when computing numerical values on a dataframe containing non-numerical columns
- Add housing data to the distribution
- Add "extreme" sampling methods
- Re-run the NBs
- reindex to keep the original columns order
- Update syntax to stick to the new argument names in Dython
- Check if no feature selected, warn rather than throw error
- Fix a bug when removing collinear columns
- Prefilters now support the filtering of continuous and nominal (categorical) collinear variables
- improve the plot_y_vs_X function
- remove gc.collect()
- fix readme (typos)
- move utilities in utils sub-package
- make unit tests lighter
- fix bug when using catboost, clone estimator (avoid error and be sure to use a non-fitted estimator)
- change the default for categorical encoding in pre-filters (pd.cat to integers as default)
- fix the unit tests with new defaults and names
- change argument names in pre-filters
- remove old attribute names in unit-tests
- Fix lightGBM warnings
- Typo in repr
- Provide load_data utility
- Enhance jupyter NB examples
- highlighting synthetic random predictors
- Benchmark using sklearn permutation importance
- Harmonization of the attributes and parameters
- Fix categoricals handling
- set the optimal number of features (according to "The Elements of Statistical Learning") when using lightGBM random forest boosting
- provide random forest estimators (lightgbm implementation)
- Adding examples and expanding documentation
- fix bug: relative import removed
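
The threshold fix listed above (strictly greater than rather than greater than or equal) changes which features survive when a score sits exactly on the threshold. The sketch below illustrates only that difference. The function name `select_above_threshold`, its `strict` flag, and the scores are hypothetical and are not part of the package's API.

```python
def select_above_threshold(scores, threshold, strict=True):
    """Return the names whose score passes the threshold.

    Hypothetical helper for illustration only; not the package's API.
    strict=True keeps values > threshold, strict=False keeps values >= threshold.
    """
    if strict:
        return [name for name, value in scores.items() if value > threshold]
    return [name for name, value in scores.items() if value >= threshold]


# Made-up scores; "b" sits exactly on the threshold.
scores = {"a": 0.30, "b": 0.10, "c": 0.05}

strict_selection = select_above_threshold(scores, threshold=0.10)
inclusive_selection = select_above_threshold(scores, threshold=0.10, strict=False)
```

With the strict comparison, the feature lying exactly on the threshold is dropped, whereas the inclusive comparison keeps it.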