prev. Researcher in Residence @ Nous Research
I'm an LLM researcher interested in how well LLMs generalize.
A novel evaluation framework using debate simulations to assess AI models' reasoning, knowledge, and communication skills. Released with OpenRouter and Weights & Biases.
An easy-to-reproduce evaluation framework built on Inspect-AI, offering dataset generation and evaluation tools.
A Python script that automates prompt engineering using Anthropic's Claude language model.
A Python script that automatically generates book trailers for libraries to drive teen engagement.
A Python script that injects the contents of an entire project folder into a long-context language model's context, so the model can serve as a coding assistant.
An AI-powered storywriting copilot that coaxes you into writing a story and architects plot points, characters, and chapters.
Fortis Fortuna Adiuvat