Useful benchmarks that have human scores beyond AI SOTA. - Google Docs #954
Labels
ai-leaderboards
leaderboards for LLMs and other ML models
llm-benchmarks
testing and benchmarking large language models
llm-evaluation
Evaluating large language model performance and behavior through human-written evaluation sets
Useful benchmarks that have human scores beyond AI SOTA
Snippet
Useful benchmarks that have human scores beyond AI SOTA.
Full Content
Useful benchmarks that have human scores beyond AI SOTA.
There are a number of important real-world benchmarks where human performance surpasses the current state-of-the-art (SOTA) in AI:
These benchmarks suggest that there remain significant gaps between current AI capabilities and human-level performance on many real-world tasks. Closing these gaps will be an important area of research going forward.
Suggested labels
None