Eris
An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.
Set-Eval
A multimodal benchmark for testing vision capabilities and reasoning in AI models.
Research projects, evaluation frameworks, and technical work.