
Private leaderboard percentiles for each individual competition #19

Open
boranhan opened this issue Oct 29, 2024 · 4 comments
Labels
question Further information is requested

Comments

@boranhan

Hello, thank you for your team's great work.

I'm wondering if you can provide the private leaderboard percentiles for each individual competition?

Thanks in advance!

Boran

@AnirudhDagar


Yes, could you please share the scores or the private leaderboard percentile ranks that were ultimately used for the claim of outperforming 50% of human participants? Are the Python files in https://github.com/WecoAI/aideml/tree/main/sample_results enough to reproduce those numbers? I understand that these numbers are an average over 12 submissions.

@ZhengyaoJiang this was also requested earlier in #4 (comment)

Even sharing some raw results would be helpful to understand the performance of WecoAI. I checked OpenAI's MLE-bench but that doesn't seem to report any numbers either.

@dexhunter added the question (Further information is requested) label on Nov 1, 2024
@dexhunter
Member

I think you can try submitting the best solution to Kaggle to confirm the performance/leaderboard percentile after running the aide application.
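For anyone trying that route, here is a minimal sketch, assuming the official `kaggle` Python package is installed and an API token is configured, of submitting an AIDE-generated `submission.csv`. The file name, message, and competition slug are illustrative.

```python
# A minimal sketch, assuming the official `kaggle` package is installed and
# ~/.kaggle/kaggle.json contains a valid API token.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
api.competition_submit(
    file_name="submission.csv",                         # output of an aide run
    message="AIDE best solution",
    competition="tabular-playground-series-apr-2021",   # example competition slug
)
```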

@AnirudhDagar commented Nov 1, 2024

I tried submitting based on the provided code, and it gives me extremely poor results. Running the aide application for all the competitions would also require compute resources; rather than reproducing the results, it would be easier, and best, if you could share either the code or the scores/percentiles from your experiments.

For example, for the competition https://www.kaggle.com/competitions/tabular-playground-series-apr-2021/, using the provided code I made this submission (https://www.kaggle.com/code/anirudhdagar/aide-solution-tabular-playground-series-apr-2021), only to get a private LB score of 0.70524, which places me at 1192/1250 on the leaderboard, i.e., outperforming only 4.64% of participants.
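For reference, a small hypothetical helper (not part of aideml) showing how that 4.64% figure follows from the rank and the number of teams:

```python
# Hypothetical helper (not part of aideml): convert a private-leaderboard
# rank into the percentage of teams outperformed, the metric discussed here.
def percent_outperformed(rank: int, num_teams: int) -> float:
    """Percentage of teams that finished strictly below the given rank."""
    return (num_teams - rank) / num_teams * 100

# The Tabular Playground (Apr 2021) example above: rank 1192 out of 1250 teams.
print(f"{percent_outperformed(1192, 1250):.2f}%")  # -> 4.64%
```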

@dexhunter
Member

> I tried submitting based on the provided code

That is an example output, which might differ from the final submission to Kaggle. You can also try different large language models and other metrics in the config.
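As a rough sketch of what "try a different model in the config" could look like: the snippet below uses OmegaConf-style overrides; the config path and key names (`agent.code.model`, `agent.steps`) are assumptions and should be checked against the actual aideml config schema.

```python
# A minimal sketch, assuming aideml's OmegaConf-based config; the path and
# key names below are illustrative assumptions, not the confirmed schema.
from omegaconf import OmegaConf

cfg = OmegaConf.load("aide/utils/config.yaml")      # assumed config location
overrides = OmegaConf.from_dotlist([
    "agent.code.model=gpt-4o",                      # hypothetical key: coding LLM
    "agent.steps=30",                               # hypothetical key: search budget
])
cfg = OmegaConf.merge(cfg, overrides)
print(OmegaConf.to_yaml(cfg))                       # inspect the merged config
```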
