prev. Researcher in Residence @ Nous Research
A novel evaluation framework using debate simulations to assess AI models' reasoning, knowledge, and communication skills. Released with OpenRouter and Weights & Biases.
Fortis Fortuna Adiuvat (Fortune favors the brave)