Projects

Research projects, evaluation frameworks, and technical work.

OpenBench

Provider-agnostic, open-source evaluation infrastructure for language models, providing standardized benchmarking across 20+ evaluation suites.

Eris

An evaluation framework that uses debate simulations to assess AI models' reasoning and communication skills.

Set-Eval

A multimodal benchmark for testing vision capabilities and reasoning in AI models.