Skip to content

BerriAI/llm-benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 

Repository files navigation

LLM-Benchmark

This is a list tracking good LLM Benchmarks. Unfortunately not all can be run with API endpoints. If there's any you'd like to use with API endpoints, create an issue.

Library ChatCompletions Completions Custom Proxy Comments
EvalPlus Evaluates code gen
LM Eval harness
MT-Bench w/ LLM Judge Evaluates chat assistants. Asks turn-by-turn conversation questions and then uses another LLM to evaluate results
RAGAS
HELM Link
FLASK
bigcode-project
HumanEval
BigBench
Fiddler
LLM Attacks
GPT Fathomhttps://github.com/GPT-Fathom/GPT-Fathom

About

List of good LLM Benchmarks

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published