Athina is building monitoring and evaluation tools for LLM developers.
- Evals SDK: Open-source framework for evaluating LLMs (Python + CLI)
- Platform: Monitor your production inferences and automatically run evals against them
Documentation | Quick Start | Running Evals
We have a library of preset evaluators, but you can also write custom evaluators within the Athina framework (see the usage sketch after the list below).
- Context Contains Enough Information: Detect bad or insufficient retrievals.
- Does Response Answer Query: Detect incomplete or irrelevant responses.
- Response Faithfulness: Detect when responses are deviating from the provided context.
- Summarization Accuracy: Detect hallucinations and mistakes in summaries.
- Grading Criteria: If X, then fail. Otherwise pass.
- Custom Evals: Custom prompt for LLM-powered evaluation.
- RAGAS: A set of evaluators that return RAGAS metrics.
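Here is a minimal sketch of running a preset evaluator with the Python SDK. The class and method names (`DoesResponseAnswerQuery`, `OpenAiApiKey.set_key`, `run`) follow the SDK's preset-evaluator pattern but should be treated as assumptions; check the Quick Start for the current signatures.

```python
import os

# Assumed import paths for the athina-evals package; verify against the docs.
from athina.evals import DoesResponseAnswerQuery
from athina.keys import OpenAiApiKey

# Preset evaluators are LLM-powered, so an OpenAI API key is required.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])

# Run a single preset evaluator on one datapoint.
result = DoesResponseAnswerQuery().run(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)
```

The same evaluators can be run over a full dataset (and the results tracked on the platform) instead of one datapoint at a time; see Running Evals for the batch workflow.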
Results can also be viewed and tracked on our platform.
Documentation | Demo Video | Sign Up
- UI for monitoring and visibility into your LLM inferences.
- Run evals automatically against logged inferences in production (see the logging sketch after this list).
- Track cost, token usage, response times, feedback, pass rate and other eval metrics.
- Analytics segmented by customer ID, model, prompt, environment, and more.
- Topic Classification
- Data Exports
- ... and more
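Production monitoring starts with logging your inferences. The sketch below assumes the `athina_logger` Python package with an `InferenceLogger.log_inference` call; the field names shown are illustrative, so confirm the exact parameters in the documentation.

```python
import os

# Assumed import paths for Athina's Python logger; verify against the docs.
from athina_logger.api_key import AthinaApiKey
from athina_logger.inference_logger import InferenceLogger

AthinaApiKey.set_api_key(os.environ["ATHINA_API_KEY"])

# Log one production inference; field names below are illustrative.
InferenceLogger.log_inference(
    prompt_slug="support_bot",  # groups inferences by prompt for analytics
    prompt=[{"role": "user", "content": "How do I reset my password?"}],
    response="You can reset it from Settings > Security.",
    language_model_id="gpt-4o",
    customer_id="customer-123",  # enables per-customer segmentation
    environment="production",
)
```

Once inferences are logged, the configured evals run against them automatically and the results appear alongside cost, token usage, and response-time metrics in the dashboard.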
Contact [email protected] if you have any questions.