Aarush Sah

Machine Learning Scientist

Intern @ Groq

prev. Researcher in Residence @ Nous Research

I'm an LLM researcher interested in how well language models generalize.

What I like to do:

Featured Projects

ChronoBench

Time will reveal the true measure of intelligence...

View Project

Eris

A novel evaluation framework using debate simulations to assess AI models' reasoning, knowledge, and communication skills. Released with OpenRouter and Weights & Biases.

View Project

Set-Eval

An easy-to-reproduce evaluation framework built on Inspect-AI, offering dataset generation and evaluation tools.

View Project

Prompt Optimizer

A Python script that automates prompt engineering using Anthropic's Claude language model.

View Project

BookTrailers

A Python script that automates the generation of book trailers, which libraries can use to drive teen engagement.

View Project

LLM-PCI

A Python script that allows you to inject the context of an entire project folder into long-context language models for use as a coding assistant.

View Project

Calliope

An AI-powered story-writing copilot that coaxes you into writing a story and architects plot points, characters, and chapters.

View Project

Fortis Fortuna Adiuvat