Work

NVIDIA

Jan 2026 – Feb 2026

Evals Lead

Joined after the Groq–NVIDIA partnership. Worked on inference and evaluation systems at the intersection of hardware and model performance.

Groq

Jul 2024 – Jan 2026

Head of Evals

Led the Evals team. Built openbench, an open-source standard for running evals easily, reliably, and reproducibly. Designed evaluation infrastructure that standardized benchmarking across 20+ evaluation suites and became the backbone of Groq's model quality process.

Nous Research

Apr 2024 – Jun 2024

Researcher in Residence

Developed synthetic data pipelines for training language models. Built systems that generated high-quality training data at scale, contributing to Nous's open-source model releases.

Projects

openbench

Provider-agnostic, open-source evaluation infrastructure for language models. Standardized benchmarking across 20+ evaluation suites.

Eris

An evaluation framework using debate simulations to assess AI models' reasoning and communication skills.

Set-Eval

A multimodal benchmark for testing vision capabilities and reasoning in AI models.

Speaking

RAG, Agents, and Latency

Webinar with Jason Liu · June 2025

AI Evals for Engineers & PMs

Maven Course Guest Lecture · May 2025

Guest Lecture on Evals

Stanford CS224G · February 2025