Cookbook
This cookbook, inspired by OpenAI's cookbook, is a collection of recipes for common Braintrust use cases. Each recipe is an open-source, self-contained example hosted on GitHub. We welcome community contributions and hope the cookbook becomes a collaborative, living collection of best practices for building high-quality AI products.
TypeScript | Aug 5, 2025
Building reliable AI agents
Ornella Altunyan
agent, tools, evals, typescript, logging

TypeScript | May 22, 2025
Using PDF attachments in playgrounds
Carlos Esteban, Ornella Altunyan
logging, multimodal, playground, typescript

TypeScript | May 15, 2025
Tracing Vercel AI SDK applications
Phil Hetzel
logging, Next.js

Python | May 14, 2025
Evaluating video QA with Twelve Labs
James Le, Ornella Altunyan
eval, video, multimodal

Python | Mar 8, 2025
Evaluating a web agent
Ornella Altunyan, Adrian Barbir
eval, agent, multimodal

Python | Feb 24, 2025
Prompt versioning and deployment
Adrian Barbir
evals, prompting, functions

Python | Feb 18, 2025
Evaluating video QA
Adrian Barbir
evals, video, datasets

Python | Feb 13, 2025
Evaluating a voice agent
Adrian Barbir
agent, evals, voice

TypeScript | Feb 8, 2025
Classifying spam using structured outputs
Ornella Altunyan
classifier, structured outputs, playground

Python | Jan 30, 2025
Evaluating a prompt chaining agent
Adrian Barbir
agent, evals, python

Python | Jan 17, 2025
Evaluating the precision and recall of an emotion classifier
Adrian Barbir
recall, precision, evals, classifier, python

TypeScript | Dec 14, 2024
Evaluating audio with the OpenAI Realtime API
Ornella Altunyan
evals, tools, audio

Python | Dec 6, 2024
Evaluating SimpleQA
Ankur Goyal, Ornella Altunyan
datasets, evals

TypeScript | Nov 22, 2024
Using Python functions to extract text from images
Ornella Altunyan
python, tools, ocr, functions

TypeScript | Oct 31, 2024
Using OpenTelemetry for LLM observability
Ornella Altunyan
evals, tools

TypeScript | Oct 8, 2024
Using functions to build a RAG agent
Ornella Altunyan, Ankur Goyal
functions, rag, tools

Python | Sep 30, 2024
Evaluating multimodal receipt extraction
Ankur Goyal
evals, multimodal, receipts

TypeScript | Aug 28, 2024
Unreleased AI: A full stack Next.js app for generating changelogs
Ornella Altunyan
evals, logging, next.js

Python | Aug 12, 2024
An agent that runs OpenAPI commands
Ankur Goyal
agent, rag, evals

TypeScript | Jul 29, 2024
Benchmarking inference providers
Ankur Goyal
evals, llama-3.1, providers

TypeScript | Jul 26, 2024
Tool calls in LLaMa 3.1
Ankur Goyal
evals, llama-3.1, tools

TypeScript | Jul 16, 2024
Evaluating a chat assistant
Tara Nagar
evals, chat

Python | May 29, 2024
LLM Eval For Text2SQL
Ankur Goyal
evals, datasets, text2sql

Python | May 27, 2024
Optimizing Ragas to evaluate a RAG pipeline
Ankur Goyal, Nelson Auner
evals, rag

TypeScript | May 22, 2024
Comparing evals across multiple AI models
John Huang
evals, charts

Python | May 20, 2024
Detecting Prompt Injections
Nelson Auner
evals, classification

Python | Mar 4, 2024
AI Search Bar
Austin Moehle
evals, sql

TypeScript | Feb 13, 2024
How Zapier uses assertions to evaluate tool usage in chatbots
Vítor Balocco
evals, assertions, tools

TypeScript | Feb 2, 2024
Generating release notes and hill-climbing to improve them
Ankur Goyal
evals, hill-climbing

TypeScript | Jan 29, 2024
Generating beautiful HTML components
Ankur Goyal
logging, datasets, evals

Python | Dec 21, 2023
Coda's Help Desk with and without RAG
Austin Moehle, Kenny Wong
evals, rag

TypeScript | Oct 29, 2023
Improving GitHub issue titles using their contents
Ankur Goyal
evals, summarization

Python | Sep 1, 2023
Classifying news articles
David Song
evals, classification

Python | Aug 12, 2023
Text-to-SQL
Ankur Goyal
evals, sql