Cookbook

This cookbook, inspired by OpenAI's cookbook, is a collection of recipes for common Braintrust use cases. Each recipe is an open-source, self-contained example hosted on GitHub. We welcome community contributions and want the cookbook to be a collaborative, living collection of best practices for building high-quality AI products.

TypeScript | Aug 5, 2025
Building reliable AI agents
Ornella Altunyan
agent, tools, evals, typescript, logging
TypeScript | May 22, 2025
Using PDF attachments in playgrounds
Carlos Esteban, Ornella Altunyan
logging, multimodal, playground, typescript
TypeScript | May 15, 2025
Tracing Vercel AI SDK applications
Phil Hetzel
logging, Next.js
Python | May 14, 2025
Evaluating video QA with Twelve Labs
James Le, Ornella Altunyan
eval, video, multimodal
Python | Mar 8, 2025
Evaluating a web agent
Ornella Altunyan, Adrian Barbir
eval, agent, multimodal
Python | Feb 24, 2025
Prompt versioning and deployment
Adrian Barbir
evals, prompting, functions
Python | Feb 18, 2025
Evaluating video QA
Adrian Barbir
evals, video, datasets
Python | Feb 13, 2025
Evaluating a voice agent
Adrian Barbir
agent, evals, voice
TypeScript | Feb 8, 2025
Classifying spam using structured outputs
Ornella Altunyan
classifier, structured outputs, playground
Python | Jan 30, 2025
Evaluating a prompt chaining agent
Adrian Barbir
agent, evals, python
Python | Jan 17, 2025
Evaluating the precision and recall of an emotion classifier
Adrian Barbir
recall, precision, evals, classifier, python
TypeScript | Dec 14, 2024
Evaluating audio with the OpenAI Realtime API
Ornella Altunyan
evals, tools, audio
Python | Dec 6, 2024
Evaluating SimpleQA
Ankur Goyal, Ornella Altunyan
datasets, evals
TypeScript | Nov 22, 2024
Using Python functions to extract text from images
Ornella Altunyan
python, tools, ocr, functions
TypeScript | Oct 31, 2024
Using OpenTelemetry for LLM observability
Ornella Altunyan
evals, tools
TypeScript | Oct 8, 2024
Using functions to build a RAG agent
Ornella Altunyan, Ankur Goyal
functions, rag, tools
Python | Sep 30, 2024
Evaluating multimodal receipt extraction
Ankur Goyal
evals, multimodal, receipts
TypeScript | Aug 28, 2024
Unreleased AI: A full stack Next.js app for generating changelogs
Ornella Altunyan
evals, logging, next.js
Python | Aug 12, 2024
An agent that runs OpenAPI commands
Ankur Goyal
agent, rag, evals
TypeScript | Jul 29, 2024
Benchmarking inference providers
Ankur Goyal
evals, llama-3.1, providers
TypeScript | Jul 26, 2024
Tool calls in LLaMa 3.1
Ankur Goyal
evals, llama-3.1, tools
TypeScript | Jul 16, 2024
Evaluating a chat assistant
Tara Nagar
evals, chat
Python | May 29, 2024
LLM Eval For Text2SQL
Ankur Goyal
evals, datasets, text2sql
Python | May 27, 2024
Optimizing Ragas to evaluate a RAG pipeline
Ankur Goyal, Nelson Auner
evals, rag
TypeScript | May 22, 2024
Comparing evals across multiple AI models
John Huang
evals, charts
Python | May 20, 2024
Detecting Prompt Injections
Nelson Auner
evals, classification
Python | Mar 4, 2024
AI Search Bar
Austin Moehle
evals, sql
TypeScript | Feb 13, 2024
How Zapier uses assertions to evaluate tool usage in chatbots
Vítor Balocco
evals, assertions, tools
TypeScript | Feb 2, 2024
Generating release notes and hill-climbing to improve them
Ankur Goyal
evals, hill-climbing
TypeScript | Jan 29, 2024
Generating beautiful HTML components
Ankur Goyal
logging, datasets, evals
Python | Dec 21, 2023
Coda's Help Desk with and without RAG
Austin Moehle, Kenny Wong
evals, rag
TypeScript | Oct 29, 2023
Improving Github issue titles using their contents
Ankur Goyal
evals, summarization
Python | Sep 1, 2023
Classifying news articles
David Song
evals, classification
Python | Aug 12, 2023
Text-to-SQL
Ankur Goyal
evals, sql