[ENH] Improving Flexibility and Usability for Visualization, Results Output, and Statistical Adjustments #137
Labels: enhancement (New feature or request)
Comments
Hi Wei,

All of those features sound great. I don't have much time on my hands these days with work, but if you want to implement them and make a pull request, that would be fantastic.

Thanks,
Eoghan.
On Wed 4 Dec 2024, 00:19 Wei Xuxu <***@***.***> wrote:
Background

First, I want to acknowledge that borutashap provides a more flexible and detailed feature selection process than boruta_py, especially through its plot and results_to_csv methods, which help users understand and interpret results. However, I believe there is room to improve usability and customization further, especially for users working in interactive or exploratory environments.

Notably, while borutashap currently applies a Bonferroni correction for multiple hypothesis testing, the boruta_py library also supports the Benjamini-Hochberg FDR (False Discovery Rate) correction. Offering this as an option in borutashap would make it more useful and flexible.
Proposed Enhancements

1. Improving the plot Method

Current Issue:
- The plot lacks a legend and explanatory text, which makes it less intuitive for users unfamiliar with the visual representation.
- The plot method does not return matplotlib objects (e.g., a Figure or Axes), which limits users' ability to customize the resulting figure or embed it in larger visualizations.

Suggested Improvements:
- Add a legend and textual descriptions to the plot so users understand what each visual element represents.
- Modify the plot method to:
  - Return the matplotlib.figure.Figure and/or matplotlib.axes.Axes objects.
  - Accept an optional ax parameter so users can draw the plot on an Axes object they provide, allowing seamless integration into custom visualizations.

```python
fig, ax = boruta_shap.plot(ax=None)  # Proposed: return the figure and axes for further customization
```
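A minimal sketch of the proposed `plot` behaviour, assuming the per-feature importance history and the final decisions are available as plain dictionaries. The `importance_history` and `decisions` arguments and the colour scheme are hypothetical, not borutashap's actual internals. It follows the common matplotlib convention: draw on the caller's `Axes` if one is given, otherwise create a new figure, and return both objects.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical colour per decision (Boruta's accepted / tentative / rejected outcomes).
DECISION_COLOURS = {"Accepted": "tab:green", "Tentative": "tab:orange", "Rejected": "tab:red"}


def plot(importance_history, decisions, ax=None):
    """Hypothetical plot(): box plot of importances per feature, coloured by decision.

    importance_history: dict mapping feature name -> list of importance values
    decisions: dict mapping feature name -> "Accepted" | "Tentative" | "Rejected"
    ax: optional matplotlib Axes to draw on; a new figure is created if None
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 4))
    else:
        fig = ax.figure  # reuse the figure that owns the caller's Axes

    features = list(importance_history)
    boxes = ax.boxplot([importance_history[f] for f in features], patch_artist=True)
    for patch, feature in zip(boxes["boxes"], features):
        patch.set_facecolor(DECISION_COLOURS[decisions[feature]])

    # Boxes are drawn at positions 1..n by default, so label those ticks.
    ax.set_xticks(range(1, len(features) + 1))
    ax.set_xticklabels(features, rotation=90)
    ax.set_ylabel("Importance")
    # Explicit legend so readers know what each colour means.
    ax.legend(handles=[Patch(color=c, label=d) for d, c in DECISION_COLOURS.items()],
              title="Decision")
    return fig, ax
```

Passing `ax=` lets a caller place the chart in one panel of a larger figure, e.g. `fig, (left, right) = plt.subplots(1, 2)` followed by `plot(history, decisions, ax=right)`.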
2. Enhancing Results Output with a leaderboard Method

Current Issue:
- The results_to_csv method writes results directly to a file, which is inconvenient for users in interactive environments who need to inspect the results quickly or customize them before saving.

Suggested Improvements:
- Add a method (e.g., leaderboard) that returns the results as a pandas.DataFrame, so users can inspect, modify, or further process them programmatically.
- Have results_to_csv build on this leaderboard method, ensuring consistency and code reuse.

```python
results_df = boruta_shap.leaderboard()  # Proposed: return a DataFrame with the results
results_df.to_csv("output.csv")  # Users decide if and when to save
```
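A minimal sketch of how `leaderboard` and `results_to_csv` could share one code path. The class name, the `importance_history` and `decisions` attributes, and the column names are hypothetical placeholders, not borutashap's internals.

```python
import pandas as pd


class BorutaShapResultsSketch:
    """Hypothetical stand-in for the results-handling part of BorutaShap."""

    def __init__(self, importance_history, decisions):
        # importance_history: feature name -> list of per-trial importance values
        # decisions: feature name -> "Accepted" | "Tentative" | "Rejected"
        self.importance_history = importance_history
        self.decisions = decisions

    def leaderboard(self):
        """Return one row per feature, sorted by mean importance (highest first)."""
        rows = [
            {
                "feature": name,
                "mean_importance": sum(values) / len(values),
                "decision": self.decisions[name],
            }
            for name, values in self.importance_history.items()
        ]
        df = pd.DataFrame(rows)
        return df.sort_values("mean_importance", ascending=False).reset_index(drop=True)

    def results_to_csv(self, file_path):
        # Reuse leaderboard() so the saved file and the returned DataFrame cannot diverge.
        self.leaderboard().to_csv(file_path, index=False)
```

Keeping all formatting in `leaderboard()` means `results_to_csv` becomes a one-line convenience wrapper, and any future column (for example adjusted p-values) appears in both outputs automatically.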
3. Providing More Statistical Adjustment Options

Current Issue:
borutashap currently implements Bonferroni correction for multiple hypothesis testing. While robust, this method is highly conservative and may produce false negatives (relevant features that fail to be confirmed), especially when the number of features is large.
In comparison, boruta_py offers the Benjamini-Hochberg FDR correction, which is less conservative and often better suited to feature selection tasks.
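To make "highly conservative" concrete: with m features tested at level α, Bonferroni rejects a null hypothesis only when p ≤ α/m, whereas Benjamini-Hochberg sorts the p-values and rejects the k smallest, where k is the largest rank with p_(k) ≤ kα/m. The pure-Python sketch below compares the two decision rules on made-up p-values (illustrative only, not BorutaShap output).

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    # Largest 1-based rank k with p_(k) <= k * alpha / m; stays 0 if none qualifies.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:  # reject the k smallest p-values
        reject[i] = True
    return reject


# Illustrative p-values for 10 features (made up for this example).
pvals = [0.001, 0.004, 0.006, 0.012, 0.02, 0.028, 0.2, 0.4, 0.6, 0.9]
print(sum(bonferroni_reject(pvals)), sum(bh_reject(pvals)))  # prints: 2 6
```

On the same input, Bonferroni's fixed threshold of 0.005 keeps only two features, while BH's rank-dependent thresholds (0.005, 0.010, ..., 0.030) keep six, at the cost of controlling the expected proportion of false discoveries rather than the probability of any false discovery.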
Suggested Improvements:
- Add an optional parameter to the fit method that lets users choose among p-value adjustment methods (e.g., Bonferroni, FDR).
- Include both raw and adjusted p-values in the output, giving users the flexibility to apply their own thresholds or build custom visualizations.

```python
boruta_shap.fit(X=X, y=y, p_adjust_method='fdr_bh')  # Proposed: let the user select the p-value correction method
```
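A sketch of what the suggested `p_adjust_method` option could compute: Bonferroni or Benjamini-Hochberg adjusted p-values, plus a table that keeps raw and adjusted values side by side. The function names and output columns are hypothetical; the method strings `'bonferroni'` and `'fdr_bh'` mirror the names used by statsmodels' `multipletests`, which could be used instead of a hand-written version.

```python
import pandas as pd


def adjust_pvalues(pvals, method="bonferroni"):
    """Return adjusted p-values; method names follow statsmodels' multipletests."""
    m = len(pvals)
    if method == "bonferroni":
        return [min(1.0, p * m) for p in pvals]
    if method == "fdr_bh":
        order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
        adjusted = [0.0] * m
        running_min = 1.0  # also caps adjusted values at 1
        for rank_from_top, i in enumerate(order):
            rank = m - rank_from_top  # 1-based rank in ascending order
            # BH adjusted p for rank j: min over k >= j of p_(k) * m / k.
            running_min = min(running_min, pvals[i] * m / rank)
            adjusted[i] = running_min
        return adjusted
    raise ValueError(f"unknown p_adjust_method: {method!r}")


def pvalue_table(features, pvals, method="fdr_bh", alpha=0.05):
    """Hypothetical output format: raw and adjusted p-values side by side."""
    adjusted = adjust_pvalues(pvals, method)
    return pd.DataFrame({
        "feature": features,
        "p_raw": pvals,
        "p_adjusted": adjusted,
        "significant": [p <= alpha for p in adjusted],
    })
```

Comparing an adjusted p-value to α gives the same decision as the corresponding step-up rule, so exposing both columns lets users re-threshold without rerunning `fit`.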
4. Enhancing the Docstring

Current Issue:
The BorutaShap class docstring does not explicitly tell users which methods (plot, results_to_csv, etc.) are available for accessing or visualizing results.

Suggested Improvements:
Add clear descriptions and usage examples to the class-level docstring that show:
- How to visualize results using plot.
- How to save results using results_to_csv.
- How to access results programmatically using the proposed leaderboard method.

```python
"""BorutaShap is a wrapper feature selection method built on the foundations of both the SHAP and Boruta algorithms.

...

Methods:
--------
plot(): Visualize feature selection results. Optionally returns the matplotlib Figure and Axes for further customization.
leaderboard(): Return feature selection results as a pandas `DataFrame` for inspection and further processing.
results_to_csv(file_path): Save feature selection results (`leaderboard`) to a CSV file.
"""
```
Benefits

1. Enhanced Usability:
Returning matplotlib objects lets users integrate borutashap plots into their workflows and modify visual elements as needed. Returning results as a DataFrame enables quick exploration and downstream analysis in interactive environments.

2. Greater Flexibility:
Supporting alternative p-value correction methods (e.g., FDR) gives users more control over the trade-off between Type I and Type II errors. Including both raw and adjusted p-values lets advanced users tailor their analysis to specific needs.

3. Improved Documentation:
Describing the methods and including usage examples in the docstring helps users discover and use the available features, reducing potential confusion.
Additional Notes

@Ekeany <https://github.com/Ekeany> I'd be happy to discuss these suggestions further or contribute code if needed. Thank you for maintaining such a fantastic tool for feature selection!