Aarush Sah

Head of Evals at Groq

I lead the Evals team at Groq, where we're building openbench, an open-source standard for running evals easily, reliably, and reproducibly. Before Groq, I was at Nous Research, developing synthetic data pipelines for training language models. When I'm not working on eval infrastructure, I find great joy in reading epic fantasy novels (especially Brandon Sanderson's works) and optimizing little parts of my life with software.

Projects

openbench

July 2025

Provider-agnostic, open-source evaluation infrastructure for language models. Standardized benchmarking across 20+ evaluation suites.

July 2024

An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.

March 2024

A multimodal benchmark for testing vision capabilities and reasoning in AI models.

Writing

Speaking

RAG, Agents, and Latency

June 2025

Webinar with Jason Liu

AI Evals for Engineers & PMs

May 2025

Maven Course Guest Lecture

Guest Lecture on Evals

February 2025

Stanford CS224G