Projects

Research projects, evaluation frameworks, and technical work.

Eris

An evaluation framework that uses simulated debates to assess the reasoning and communication skills of AI models.

Set-Eval

A multimodal benchmark that tests the vision capabilities and visual reasoning of AI models.