[ENH] Improving Flexibility and Usability for Visualization, Results Output, and Statistical Adjustments #137

xuxu-wei · 2024-12-04T00:15:29Z

Background

First, I want to acknowledge that borutashap provides a more flexible and detailed feature selection process compared to boruta_py, especially with its plot and results_to_csv methods. These tools are useful for understanding and interpreting results. However, I believe there are opportunities to enhance usability and customization further, especially for users in interactive or exploratory environments.

Notably, while borutashap currently employs Bonferroni correction for multiple hypothesis testing, the boruta_py library includes support for the Benjamini-Hochberg FDR (False Discovery Rate) correction. Introducing this as an option in borutashap could further improve its utility and flexibility.

Proposed Enhancements

1. Improving the plot Method

Current Issue:
- The plot lacks legends and explanations, making it less intuitive for users unfamiliar with the visual representation.
- The plot method does not return matplotlib objects (e.g., the Figure or Axes it draws on), limiting users' ability to customize the resulting figure or embed it into larger visualizations.
Suggested Improvements:
- Add legends and textual descriptions to the plot to help users understand what the visual elements represent.
- Modify the plot method to:
  - Return the matplotlib.figure.Figure and/or matplotlib.axes.Axes objects.
  - Accept an optional ax parameter to allow users to draw the plot on a provided axes object, enabling seamless integration into custom visualizations.
```
fig, ax = boruta_shap.plot(ax=None)  # Returns the figure and axes for further customization
```
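To make the `ax` pattern concrete, here is a minimal sketch of how such a method might behave. It is not BorutaShap's actual implementation: the function name `plot_importances`, its arguments, and the colour scheme are hypothetical stand-ins, and only the matplotlib calls are real.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical colour scheme; the real plot may use different colours.
DECISION_COLORS = {"accepted": "green", "tentative": "gold", "rejected": "red"}


def plot_importances(importances, decisions, ax=None):
    """Draw one bar per feature and return (fig, ax).

    importances: dict mapping feature name -> importance score (illustrative)
    decisions:   dict mapping feature name -> a key of DECISION_COLORS
    ax:          optional existing Axes to draw on
    """
    if ax is None:
        fig, ax = plt.subplots()  # no axes supplied: create a new figure
    else:
        fig = ax.figure  # reuse the caller's figure

    names = list(importances)
    colors = [DECISION_COLORS[decisions[name]] for name in names]
    ax.bar(names, [importances[name] for name in names], color=colors)
    ax.set_ylabel("importance")

    # Legend explaining what each bar colour means.
    handles = [Patch(color=c, label=d) for d, c in DECISION_COLORS.items()]
    ax.legend(handles=handles, title="decision")
    return fig, ax


# Usage: embed the plot in the right-hand panel of a larger figure.
outer_fig, (left, right) = plt.subplots(1, 2)
fig, ax = plot_importances(
    {"age": 0.8, "noise": 0.1},
    {"age": "accepted", "noise": "rejected"},
    ax=right,
)
ax.set_title("Feature importances")  # further customization by the caller
```

Returning both objects regardless of whether `ax` was supplied keeps the call signature uniform, which is why the sketch derives `fig` from `ax.figure` in the second branch.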

2. Enhancing Results Output with a leaderboard Method

Current Issue:
- The results_to_csv method directly outputs results to a file, which is less convenient for users working in interactive environments where they need to quickly inspect the results or customize them before saving.
Suggested Improvements:
- Add a method (e.g., leaderboard) that returns the results as a pandas.DataFrame. This allows users to inspect, modify, or further process the results programmatically.
- The results_to_csv method can then build upon this leaderboard functionality, ensuring consistency and code reuse.
```
results_df = boruta_shap.leaderboard()  # Returns a DataFrame with results
results_df.to_csv("output.csv")        # Users can decide if/when to save
```
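The relationship between the two methods could be sketched as follows. The class `ResultsSketch`, its constructor, and the column names are assumptions made for illustration and do not reflect how BorutaShap stores its results internally; the point is only that `results_to_csv` delegates to `leaderboard`.

```python
import io

import pandas as pd


class ResultsSketch:
    """Hypothetical stand-in for the result-reporting part of the selector."""

    def __init__(self, importances, decisions):
        # Illustrative storage: feature -> importance, feature -> decision.
        self._importances = importances
        self._decisions = decisions

    def leaderboard(self):
        """Return results as a DataFrame sorted by importance, highest first."""
        df = pd.DataFrame(
            {
                "feature": list(self._importances),
                "importance": list(self._importances.values()),
                "decision": [self._decisions[f] for f in self._importances],
            }
        )
        return df.sort_values("importance", ascending=False).reset_index(drop=True)

    def results_to_csv(self, path_or_buffer):
        """Write exactly what leaderboard() returns, so the two stay consistent."""
        self.leaderboard().to_csv(path_or_buffer, index=False)


# Usage: inspect in memory first, then save only if wanted.
selector = ResultsSketch(
    {"noise": 0.1, "age": 0.8},
    {"noise": "rejected", "age": "accepted"},
)
results_df = selector.leaderboard()
buffer = io.StringIO()  # stands in for a file path
selector.results_to_csv(buffer)
```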

3. Providing More Statistical Adjustment Options

Current Issue:
Currently, borutashap implements Bonferroni correction for multiple hypothesis testing. While this is a robust method, it is highly conservative and may result in false negatives, especially when the number of features is large.
In comparison, boruta_py offers the Benjamini-Hochberg FDR correction, which is less conservative and often more suitable for feature selection tasks.
Suggested Improvements:
- Add an optional parameter to the fit method, allowing users to choose among multiple p-value adjustment methods (e.g., Bonferroni, FDR).
- Include both raw p-values and adjusted p-values in the output, giving users the flexibility to apply their own thresholds or perform custom visualizations.
```
boruta_shap.fit(p_adjust_method='fdr_bh')  # Allow user to select p-value correction method
```
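To illustrate why the choice of correction matters, the sketch below computes Bonferroni and Benjamini-Hochberg adjusted p-values in plain Python. The function names and the toy p-values are made up for this example; a real implementation might instead delegate to `statsmodels.stats.multitest.multipletests`, whose method names (`'bonferroni'`, `'fdr_bh'`) the proposed parameter value mirrors.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for five features (made up for illustration).
raw = [0.001, 0.008, 0.02, 0.03, 0.6]
alpha = 0.05

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

kept_bonf = [p <= alpha for p in bonf]
kept_bh = [p <= alpha for p in bh]
```

On this toy input Bonferroni keeps two features at `alpha = 0.05` while Benjamini-Hochberg keeps four, which is the less conservative behaviour described above. Returning both `raw` and the adjusted lists side by side would let users apply their own thresholds.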

4. Enhancing the Docstring

Current Issue:
The BorutaShap object’s docstring does not explicitly inform users about available methods (plot, results_to_csv, etc.) for accessing or visualizing results.
Suggested Improvements:
Add clear descriptions and usage examples to the class-level docstring to highlight:
- How to visualize results using plot.
- How to save results using results_to_csv.
- How to programmatically access results using the proposed leaderboard method.

"""
BorutaShap is a wrapper feature selection method built on the foundations of both the SHAP and Boruta algorithms.

...

Methods:
--------
plot():
    Visualize feature selection results. Optionally, returns the matplotlib Figure and Axes for further customization.

leaderboard():
    Return feature selection results as a pandas `DataFrame` for inspection and further processing.

results_to_csv(file_path):
    Save feature selection results (`leaderboard`) to a CSV file.
"""

Benefits

1. Enhanced Usability:
   - Returning matplotlib objects allows users to seamlessly integrate borutashap plots into their workflows and modify visual elements as needed.
   - Providing a DataFrame of results enables quick exploration and downstream analysis in interactive environments.
2. Greater Flexibility:
   - Supporting alternative p-value correction methods (e.g., FDR) offers more control over the trade-off between Type I and Type II errors.
   - Including raw and adjusted p-values empowers advanced users to tailor their analysis to specific needs.
3. Improved Documentation:
   - Including method descriptions and usage examples in the docstring ensures users can easily discover and use the provided features, reducing potential confusion.

Additional Notes

@Ekeany I’d be happy to discuss these suggestions further or contribute code if needed. Thank you for maintaining such a fantastic tool for feature selection!


Ekeany · 2024-12-07T13:53:56Z

Hi Wei, all of those features sound great. I don't have much time on my hands these days with work, but if you want to implement them and make a pull request, that would be fantastic. Thanks, Eoghan.


xuxu-wei added the "enhancement" (New feature or request) label on Dec 4, 2024
