OpenBench
Provider-agnostic, open-source evaluation infrastructure for language models, offering standardized benchmarking across 20+ evaluation suites.
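The idea behind provider-agnostic benchmarking is that the same suite of evaluation items can be scored against any model backend through a common interface. The following is a minimal illustrative sketch of that pattern; the names (`EvalItem`, `run_suite`, the stub providers) are hypothetical and are not part of OpenBench's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalItem:
    prompt: str
    expected: str

# A "provider" is abstracted as a callable: prompt in, model output out.
# Real backends (API clients for different vendors) would plug in here.
Provider = Callable[[str], str]

def run_suite(items: List[EvalItem], providers: Dict[str, Provider]) -> Dict[str, float]:
    """Run the same items against every provider; score by exact match."""
    scores = {}
    for name, ask in providers.items():
        correct = sum(ask(item.prompt).strip() == item.expected for item in items)
        scores[name] = correct / len(items)
    return scores

if __name__ == "__main__":
    suite = [EvalItem("2+2=", "4"), EvalItem("Capital of France?", "Paris")]
    # Stub providers stand in for real model API clients.
    providers = {
        "always-4": lambda p: "4",
        "echo": lambda p: p,
    }
    print(run_suite(suite, providers))  # {'always-4': 0.5, 'echo': 0.0}
```

Because every provider satisfies the same callable interface, adding a new backend never requires changing the benchmark itself, which is what makes standardized cross-provider comparison possible.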
Eris
An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.
Set-Eval
A multimodal benchmark that tests AI models' vision and reasoning capabilities.