Work

NVIDIA

Jan 2026 – Feb 2026

Evals Lead

Joined after the Groq–NVIDIA partnership. Worked on inference and evaluation systems at the intersection of hardware and model performance.

Groq

Jul 2024 – Jan 2026

Head of Evals

Led the Evals team. Built openbench, an open-source standard for running evals easily, reliably, and reproducibly. Designed evaluation infrastructure that standardized benchmarking across 20+ evaluation suites and became the backbone of Groq's model quality process.

Nous Research

Apr 2024 – Jun 2024

Researcher in Residence

Developed synthetic data pipelines for training language models. Built systems that generated high-quality training data at scale, contributing to Nous's open-source model releases.

Projects

openbench

Provider-agnostic, open-source evaluation infrastructure for language models. Standardized benchmarking across 20+ evaluation suites.

Eris

An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.

Set-Eval

A multimodal benchmark for testing vision capabilities and reasoning in AI models.

Speaking

RAG, Agents, and Latency

Webinar with Jason Liu · June 2025

AI Evals for Engineers & PMs

Maven Course Guest Lecture · May 2025

Guest Lecture on Evals

Stanford CS224G · February 2025