I lead evaluation systems at Groq. I'm interested in how we can better understand AI capabilities through systematic evaluation. In my spare time, I build benchmarks for LLMs.

Before Groq, I was at Nous Research developing synthetic data pipelines for training language models.

AIME Problem #1 - 2025 AIME I
Problem: Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.
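A quick way to sanity-check an answer to this problem is a brute-force search. Since $17_b = b + 7$ and $97_b = 9b + 7 = 9(b + 7) - 56$, any valid base must satisfy $(b + 7) \mid 56$, so no base above 49 can work. The sketch below is only an illustration, not part of any evaluation harness: the helper names `digits_to_int` and `valid_bases` and the search limit of 100 are assumptions chosen for this example.

```python
def digits_to_int(digits, base):
    """Interpret a list of digits (most significant first) in the given base."""
    value = 0
    for d in digits:
        value = value * base + d
    return value


def valid_bases(limit=100):
    """Return every base b with 9 < b <= limit for which 17_b divides 97_b.

    The limit is an assumption; the divisibility argument above shows that
    b + 7 must divide 56, so any limit of at least 49 covers all solutions.
    """
    return [
        b
        for b in range(10, limit + 1)
        if digits_to_int([9, 7], b) % digits_to_int([1, 7], b) == 0
    ]


bases = valid_bases()
total = sum(bases)
print(bases, total)
```

Under these assumptions the search finds the bases 21 and 49, whose sum is 70, matching the divisors 28 and 56 of 56 that exceed 16.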

Projects

AI Debate Evaluation
Topic: The potential of quantum computing in breaking current encryption standards
Affirmative Position
Negative Position
Set-Eval

Blog