Aarush Sah

Infra & Benchmarking @ Groq

prev. Researcher in Residence @ Nous Research

What I like to do:

Featured Projects

ChronoBench

COMING SOON

View Project

Eris

A novel evaluation framework that uses debate simulations to assess AI models' reasoning, knowledge, and communication skills. Released with OpenRouter and Weights & Biases.

View Project

Set-Eval

A multimodal benchmark that tests vision capabilities and reasoning.

View Project

Fortis Fortuna Adiuvat