Is greedy selection overfitting to specific data splits? #1

freezevector opened this issue on Mar 3, 2025

Hi, thanks for sharing your code; it's really interesting work!

I have a question about the greedy selection strategy implemented in the code, particularly in greedy_selection.py:L138.

It appears that the current implementation records only the best result from each run of the greedy selection, rather than aggregating results across all 10 cross-validation folds. I might be missing something, but in cross-validation one usually aggregates the performance metric (here, TPR@1%FPR) over all folds to obtain a more reliable estimate of generalization. In your implementation, a fold whose TPR@1%FPR is zero appears to be excluded from the final results entirely.
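For concreteness, here is a minimal sketch of the aggregation I would expect (written against scikit-learn, not your actual code; `model` stands for any classifier exposing `predict_proba`), where every fold's TPR@1%FPR, including zero-valued folds, contributes to the reported result:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_curve

def tpr_at_fpr(y_true, y_score, target_fpr=0.01):
    """Interpolate the TPR at a fixed FPR point on the ROC curve."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return float(np.interp(target_fpr, fpr, tpr))

def cross_validated_tpr(model, X, y, n_splits=10, seed=0):
    """Mean and std of TPR@1%FPR over all folds, keeping zero-valued folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    per_fold = []
    for train_idx, test_idx in cv.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        y_score = model.predict_proba(X[test_idx])[:, 1]
        per_fold.append(tpr_at_fpr(y[test_idx], y_score))
    # Report the average over *all* folds rather than max(per_fold);
    # keeping only the best fold measures the luckiest split, not the mean.
    return np.mean(per_fold), np.std(per_fold)
```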

Could you clarify whether this approach is intentional? By recording only the best result, the evaluation may be biased toward favorable data splits, and it would not indicate how well the greedy selection strategy performs on average across different data partitions.

Thanks in advance for your response!
