Meta Superintelligence Labs
Feb 2026 – Present
Software Engineer
Evals Lead
Joined after the Groq–NVIDIA partnership. Working on inference and evaluation systems at the intersection of hardware and model performance.
Head of Evals
Led the Evals team. Built openbench, an open-source standard for running evals easily, reliably, and reproducibly. Designed evaluation infrastructure that standardized benchmarking across 20+ evaluation suites and became the backbone of Groq's model quality process.
Researcher in Residence
Developed synthetic data pipelines for training language models. Built systems that generated high-quality training data at scale, contributing to Nous's open-source model releases.
Provider-agnostic, open-source evaluation infrastructure for language models. Standardized benchmarking across 20+ evaluation suites.
An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.
A multimodal benchmark for testing vision capabilities and reasoning in AI models.
RAG, Agents, and Latency
Webinar with Jason Liu · June 2025
AI Evals for Engineers & PMs
Maven Course Guest Lecture · May 2025
Guest Lecture on Evals
Stanford CS224G · February 2025