The Internship Game
There's a curious disconnect in how we talk about tech internships. The conventional wisdom—polish your resume, practice interview questions, network aggressively—isn't wrong, exactly. But it misses something essential.