
Aarush Sah
Head of Evals at Groq
I lead the Evals team at Groq, where we're building openbench - an open-source standard for running evals easily, reliably, and reproducibly. Before Groq, I was at Nous Research, developing synthetic data pipelines for training language models.
When I'm not working on eval infrastructure, I find great joy in reading epic fantasy novels (especially Brandon Sanderson's works) and optimizing little parts of my life with software.
Projects
openbench
pip install openbench
Provider-agnostic, open-source evaluation infrastructure for language models. Standardized benchmarking across 20+ evaluation suites.
July 2024
An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.
Writing
Speaking
RAG, Agents, and Latency
June 2025 · Webinar with Jason Liu
AI Evals for Engineers & PMs
May 2025 · Maven Course Guest Lecture
Guest Lecture on Evals
February 2025 · Stanford CS224G