
[ENH] Improving Flexibility and Usability for Visualization, Results Output, and Statistical Adjustments #137

Open
xuxu-wei opened this issue Dec 4, 2024 · 1 comment
Labels
enhancement New feature or request


xuxu-wei commented Dec 4, 2024

Background

First, I want to acknowledge that borutashap provides a more flexible and detailed feature selection process compared to boruta_py, especially with its plot and results_to_csv methods. These tools are useful for understanding and interpreting results. However, I believe there are opportunities to enhance usability and customization further, especially for users in interactive or exploratory environments.

Notably, while borutashap currently employs Bonferroni correction for multiple hypothesis testing, the boruta_py library includes support for the Benjamini-Hochberg FDR (False Discovery Rate) correction. Introducing this as an option in borutashap could further improve its utility and flexibility.

Proposed Enhancements

1. Improving the plot Method

  • Current Issue:

    • The plot lacks legends and explanations, making it less intuitive for users unfamiliar with the visual representation.
    • The plot method does not return matplotlib objects (e.g., the matplotlib.figure.Figure or matplotlib.axes.Axes it draws on), limiting users' ability to customize the resulting figure or embed it into larger visualizations.
  • Suggested Improvements:

    • Add legends and textual descriptions to the plot to help users understand what the visual elements represent.
    • Modify the plot method to:
      • Return the matplotlib.figure.Figure and/or matplotlib.axes.Axes objects.
      • Accept an optional ax parameter to allow users to draw the plot on a provided axes object, enabling seamless integration into custom visualizations.
      fig, ax = boruta_shap.plot(ax=None)  # Returns the figure and axes for further customization
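As a rough illustration of the ax/return-value pattern above (this is not BorutaShap's current plot code), a standalone plotting function could look like the sketch below. The function name plot_importance, the history/decisions input format and the colour scheme are all assumptions made for this example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical colour per decision category (not BorutaShap's actual palette).
DECISION_COLORS = {
    "Accepted": "tab:green",
    "Tentative": "tab:orange",
    "Rejected": "tab:red",
    "Shadow": "tab:gray",
}


def plot_importance(history, decisions, ax=None):
    """Draw one box per feature, add a legend, and return (fig, ax).

    history   -- hypothetical input: feature name -> importance value per trial
    decisions -- hypothetical input: feature name -> key of DECISION_COLORS
    ax        -- optional existing Axes; a new figure is created when None
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 4))
    else:
        fig = ax.figure  # draw into the caller's figure instead of creating one

    names = list(history)
    boxes = ax.boxplot([history[n] for n in names], patch_artist=True)
    for patch, name in zip(boxes["boxes"], names):
        patch.set_facecolor(DECISION_COLORS[decisions[name]])

    ax.set_xticks(range(1, len(names) + 1))  # boxplot places boxes at x = 1..n
    ax.set_xticklabels(names, rotation=90)
    ax.set_ylabel("Importance per trial")

    # Legend entries only for the categories that actually appear.
    present = set(decisions.values())
    used = [d for d in DECISION_COLORS if d in present]
    ax.legend(
        handles=[Patch(facecolor=DECISION_COLORS[d], label=d) for d in used],
        title="Decision",
    )
    return fig, ax
```

A caller could then pass ax=axes[0] from plt.subplots(1, 2) to embed the plot in a multi-panel figure, or call ax.set_title(...) on the returned axes; the legend answers the "what do the colours mean" question directly on the figure.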

2. Enhancing Results Output with a leaderboard Method

  • Current Issue:

    • The results_to_csv method directly outputs results to a file, which is less convenient for users working in interactive environments where they need to quickly inspect the results or customize them before saving.
  • Suggested Improvements:

    • Add a method (e.g., leaderboard) that returns the results as a pandas.DataFrame. This allows users to inspect, modify, or further process the results programmatically.
    • The results_to_csv method can then build upon this leaderboard functionality, ensuring consistency and code reuse.
    results_df = boruta_shap.leaderboard()  # Returns a DataFrame with results
    results_df.to_csv("output.csv")        # Users can decide if/when to save
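A minimal sketch of how results_to_csv could delegate to a leaderboard method, so that the file on disk and the DataFrame can never disagree. The class FeatureSelectionResults and its record fields (feature, mean_importance, hits, decision) are assumptions for illustration, not BorutaShap's internals.

```python
import pandas as pd


class FeatureSelectionResults:
    """Hypothetical container showing how leaderboard() and results_to_csv()
    could share one code path. The record fields are assumptions, not the
    attributes BorutaShap actually stores.
    """

    def __init__(self, records):
        # records: one dict per feature (name, mean importance, hits, decision)
        self._records = list(records)

    def leaderboard(self, sort_by="mean_importance"):
        """Return a fresh DataFrame (one row per feature), best features first."""
        df = pd.DataFrame(self._records).set_index("feature")
        return df.sort_values(sort_by, ascending=False)

    def results_to_csv(self, file_path="feature_importance.csv"):
        """Write the leaderboard to CSV; passing None returns the CSV text instead."""
        return self.leaderboard().to_csv(file_path)
```

Because leaderboard builds a new DataFrame on every call, users can sort, filter or rename columns on the result without altering the fitted object; and since pandas' to_csv(None) returns a string, results_to_csv stays easy to test.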

3. Providing More Statistical Adjustment Options

  • Current Issue:
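    Currently, borutashap applies the Bonferroni correction for multiple hypothesis testing. Bonferroni controls the family-wise error rate by testing each of the m features at α/m, so it is highly conservative: as the number of features grows, the per-feature threshold shrinks and genuinely relevant features may be rejected (false negatives).
    In comparison, boruta_py offers the Benjamini-Hochberg FDR correction, which instead controls the expected proportion of false discoveries among the selected features; it is less conservative and often better suited to feature selection tasks.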

  • Suggested Improvements:

    • Add an optional parameter to the fit method, allowing users to choose among multiple p-value adjustment methods (e.g., Bonferroni, FDR).
    • Include both raw p-values and adjusted p-values in the output, giving users the flexibility to apply their own thresholds or perform custom visualizations.
    boruta_shap.fit(X, y, p_adjust_method='fdr_bh')  # Allow the user to select the p-value correction method
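To make the difference concrete, here is a small pure-Python sketch of both corrections behind a single method switch. The helper adjust_pvalues and its option names 'bonferroni' and 'fdr_bh' are hypothetical (they simply mirror the p_adjust_method parameter proposed above); inside BorutaShap the raw p-values would come from its existing per-feature tests. If an extra dependency is acceptable, statsmodels.stats.multitest.multipletests already implements both methods under these same names.

```python
from typing import Callable, Dict, List, Sequence


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Bonferroni: multiply each p-value by the number of tests m, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) step-up adjustment, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone:
    # adj(p_(k)) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical option names, mirroring the 'fdr_bh' value proposed above.
P_ADJUST_METHODS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "bonferroni": bonferroni,
    "fdr_bh": benjamini_hochberg,
}


def adjust_pvalues(raw: Sequence[float], method: str = "bonferroni") -> List[float]:
    """Return adjusted p-values; callers keep `raw` too, so both can be reported."""
    if method not in P_ADJUST_METHODS:
        raise ValueError(f"unknown p_adjust_method: {method!r}")
    return P_ADJUST_METHODS[method](list(raw))


example_pvals = [0.005, 0.01, 0.02, 0.04]
for name in P_ADJUST_METHODS:
    adj = adjust_pvalues(example_pvals, name)
    print(name, [round(p, 4) for p in adj], sum(p < 0.05 for p in adj), "selected")
```

With these four example p-values, Bonferroni leaves two features below 0.05 (adjusted 0.02, 0.04, 0.08, 0.16) while Benjamini-Hochberg leaves all four (0.02, 0.02, about 0.0267, 0.04), which is the conservativeness gap described above. Reporting adj next to the raw values covers the request to include both in the output.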

4. Enhancing the Docstring

  • Current Issue:
    The BorutaShap object’s docstring does not explicitly inform users about available methods (plot, results_to_csv, etc.) for accessing or visualizing results.

  • Suggested Improvements:
    Add clear descriptions and usage examples to the class-level docstring to highlight:

    • How to visualize results using plot.
    • How to save results using results_to_csv.
    • How to programmatically access results using the proposed leaderboard method.
"""
BorutaShap is a wrapper feature selection method built on the foundations of both the SHAP and Boruta algorithms.

...

Methods:
--------
plot(ax=None):
    Visualize feature selection results, optionally on a user-supplied matplotlib Axes, and return the Figure and Axes for further customization.

leaderboard():
    Return feature selection results as a pandas `DataFrame` for inspection and further processing.

results_to_csv(file_path):
    Save the feature selection results (the same table returned by `leaderboard()`) to a CSV file.
"""

Benefits

  1. Enhanced Usability:
    Returning matplotlib objects allows users to seamlessly integrate borutashap plots into their workflows and modify visual elements as needed.
    Providing a DataFrame of results enables quick exploration and downstream analysis in interactive environments.

  2. Greater Flexibility:
    Supporting alternative p-value correction methods (e.g., FDR) offers more control over the trade-off between Type I and Type II errors.
    Including raw and adjusted p-values empowers advanced users to tailor their analysis to specific needs.

  3. Improved Documentation:
    Including method descriptions and usage examples in the docstring ensures users can easily discover and use the provided features, reducing potential confusion.

Additional Notes

@Ekeany I’d be happy to discuss these suggestions further or contribute code if needed. Thank you for maintaining such a fantastic tool for feature selection!

xuxu-wei added the enhancement (New feature or request) label Dec 4, 2024
Ekeany (Owner) commented Dec 7, 2024 via email
