Meta Superintelligence Labs
Software Engineer, Superintelligence
Feb 2026 – Present
Joined after the Groq–NVIDIA partnership. Worked on inference and evaluation systems at the intersection of hardware and model performance.
Head of Evals
Led the Evals team. Built openbench, an open-source standard for running evals easily, reliably, and reproducibly. Designed evaluation infrastructure that standardized benchmarking across 20+ evaluation suites and became the backbone of Groq's model quality process.
Provider-agnostic, open-source evaluation infrastructure for language models. Standardized benchmarking across 20+ evaluation suites.
An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.
A multimodal benchmark for testing vision capabilities and reasoning in AI models.