---
title: Changelog
---

# Changelog

## Week of 2025-08-11

* Pro plan organizations can now downgrade to the Free plan via the settings page without contacting support
* Prevent read-only users from downloading data from the UI

## Data plane (1.1.19)

* Add support for GPT-5 models
* OTel tracing support for Google Agent Development Kit
* OTel support for deleting fields
* Fix binder error handling for malformed BTQL queries
* Enable environment tags on prompt versions

## Week of 2025-08-04

* @mention team members in comments to notify them via email. To mention someone, type "@" and a team member's name or email in any comment input.
* You can now assign users to rows in experiments, logs, and datasets. Once assigned, you can filter rows by a specific user or a group of users.
* View configuration no longer auto-saves changes. The view now shows a dirty state, and you can either save the changes or reset them back to the base view.

## Python SDK version 0.2.2

* Added an `environment` parameter to `load_prompt`
* The OTel SpanProcessor now keeps `traceloop.*` spans by default.
* Experiments can now be run without sending results to the server.
* Span creation is significantly faster in Python.

## JS SDK version 0.2.3

* Added an `environment` parameter to `load_prompt`
* The OTel SpanProcessor now keeps `traceloop.*` spans by default.
* Experiments can now be run without sending results to the server.
* Fix `npx braintrust pull` for large prompts

## JS SDK version 0.2.2

* Fix ai-sdk tool call formatting in output
* Log OpenAI Agents input and output to the root span
* Wrap OpenAI responses.parse
* Add @traced support for generator functions

## Python SDK version 0.2.1

* Fix langchain-py integration tracing when users use a @traced method
* Wrap OpenAI responses.parse
* Add @traced support for generator functions

## Week of 2025-07-28

* New, improved UI for the trace tree.
* Token and cost metrics are computed per sub-tree in the trace viewer.
* Download BTQL sandbox results as JSON or CSV

## Data plane (1.1.18)

This is our largest data plane release in a while, and it includes several significant performance improvements, bug fixes, and new features:

* Improve performance for non-selective searches, e.g. make `foo != 'bar'` faster.
* Improve performance for score filters, e.g. make `scores.correctness = 0` faster.
* Improve group-by performance. This should make the monitor page and project summary page significantly faster.
* Add syntax for explicit casting. You can now use explicit casting functions to cast data to any datatype, e.g. `to_number(input.foo)`, `to_datetime(input.foo)`, etc.
* Fix ILIKE queries on nested JSON: ILIKE queries previously returned incorrect results on nested JSON objects. ILIKE now works as expected for all JSON objects.
* Improve backfill performance. New objects should get picked up faster.
* Improve compaction latency. Indexing kicks in much sooner, which means data gets indexed a lot faster.
* Improved support for OTel mappings, including the new [GenAI Agent](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) conventions and the [Strands framework](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/).
* Add Gemini 2.5 Flash-Lite GA, GPT-OSS models on several providers, and Claude Opus 4.1.

## Week of 2025-07-21

* Moved monitor chart legends to the bottom and increased chart heights.
* Fixed a monitor chart issue where the series toggle selector would filter the incorrect series.
* Improved the monitor fullscreen experience: charts now open faster and retain their series filter state.
* Loop is now available on the experiments page and can render interactive components inside the chat to help you find the UI element that Loop is referencing.
* You can now use remote evals with the "+Experiment" button to create a new experiment. Previously, they were only available in the playground.

## JS SDK version 0.2.1

* Fix support for the `openai.chat.completions.parse` method when used with `wrapOpenAI`.
* Added support for ai-sdk\@beta with the new `BraintrustMiddleware`
* Support running remote evals as full experiments.

## JS SDK version 0.2.0

* When running multiple trials per input (`trial_count > 1`), you can now access the current trial index (0-based) via `hooks.trialIndex` in your task function.
* Added `BraintrustExporter` in addition to `BraintrustSpanProcessor`.
* Bound max ancestors in git to 1,000.

## Python SDK version 0.2.0

* When running multiple trials per input (`trial_count > 1`), you can now access the current trial index (0-based) via `hooks.trial_index` in your task function.
* New LiteLLM `wrap_litellm` wrapper.
* Increase max ancestors in git to 1,000.

## Data plane (1.1.15)

* Add ability to run scorers as tasks in the playground
* You can now use object storage, instead of Redis, as a lock manager.
* Support async Python in inline code functions
* Don't re-trigger online scoring on existing traces if only metadata fields like `tags` change.

## Week of 2025-07-14

* Add a UTC timezone toggle to the monitor page
* Improved trace view loading performance for large traces.

## Python SDK version 0.1.8

* Added `BraintrustSpanProcessor` to simplify Braintrust's integration with OpenTelemetry.

## JS SDK version 0.1.1

* Added `BraintrustSpanProcessor` to simplify integration with OpenTelemetry.

## Data plane (1.1.14)

* Switch the default query shape from `traces` to `spans` in the API. BTQL queries will now return one row per span, rather than one row per trace. This change also applies to the REST API.
* Service tokens with scoped, user-independent credentials for system integrations.
* Fix a bug where very large experiments (run through the API) would drop spans if they could not flush data fast enough.
* Support built-in OTel metrics (contact your account team for more details)
* New parallel backfiller improves performance of loading data into Brainstore across many projects.

## Python SDK version 0.1.7

* Added support for loading prompts by ID via the `load_prompt` function. You can now load prompts directly by their unique identifier:

```python
prompt = braintrust.load_prompt(id="prompt_id_123")
```

## JS SDK version 0.1.0

* Fix a bug where large experiments would drop spans if they could not flush data fast enough.
* Fix a bug in attachment uploading in evals executed with `npx braintrust eval`.
* Upgrade the zod dependency from `^3.22.4` to `^3.25.3`
* Added support for loading prompts by ID via the `loadPrompt` function. You can now load prompts directly by their unique identifier:

```typescript #skip-compile
const prompt = await loadPrompt({ id: "prompt_id_123" });
```

## Week of 2025-07-07

* Loop can now create custom code scorers in playgrounds
* Schema builder UI for structured outputs
* Sort datasets when the `Faster tables` feature flag is enabled
* Change LLM duration to be the sum, not the average, of LLM duration across spans
* Add support for Grok 4 and Mistral's Devstral Small Latest

## Data plane (1.1.13)

* Fix support for `COALESCE` with variadic arguments
* Add option to select logs for online scoring with a BTQL filter
* Add ability to test online scoring configuration on existing logs
* Mmap-based indexing optimization enabled by default for Brainstore

## Data plane (1.1.12)

\[skipped]

## Week of 2025-06-30

* Time range filters on the logs page

## Data plane (1.1.11)

* Add support for Llama 4 Scout for Cerebras
* Turn on index validation (which enables self-healing of failed compactions) in the CloudFormation template by default.
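As a concrete illustration of the variadic `COALESCE` fix in data plane 1.1.13 above, the function can now take any number of fallback arguments in a BTQL expression. A hypothetical fragment (the field names are illustrative, not from a real schema):

```
coalesce(metadata.model, metadata.deployment, metadata.provider, 'unknown')
```

Each argument is evaluated in order, and the first non-null value is returned.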
## Week of 2025-06-23

* Add support for multi-factor authentication
* Fix a bug with Vertex AI calls when the request includes the anthropic-beta header
* Add a Zapier integration to trigger Zaps when there's a new automation event or a new project.

## Data plane (1.1.7)

* Improve performance of error count queries in Brainstore
* Automatically heal segments that fail to compact
* Add support for new models, including o3-pro
* Improve error messages for LLM-originated errors in the proxy

## Autoevals.js v0.0.130

* Remove dependency on `@braintrust/core`

## JS SDK version 0.0.209

* Ensure SpanComponentsV3 encoding works in the browser.

## JS SDK version 0.0.208

* Ensure running remote evals (i.e. `runDevServer`) works without the CLI wrapper.
* Add span + parent ids to `StartSpanArgs`

## Week of 2025-06-16

* Add OpenAI's [o3-pro](https://platform.openai.com/docs/models/o3-pro) model to the playground and AI proxy.
* View parameters are now present in the URL when viewing a default view
* Experiment charting controls have been added into views
* Experiment objects now support tags through the API and on the experiments view
* Add support for Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite

### Python SDK version 0.1.5

* The SDK's under-the-hood log queue no longer blocks when full and has a default size of 25,000 logs. You can configure the max size by setting `BRAINTRUST_LOG_QUEUE_MAX_SIZE` in your environment. The environment variable `BRAINTRUST_QUEUE_DROP_WHEN_FULL` is no longer used.
* Improvements to the logging of parallel tool calls.
* Attachments are now converted to base64 data URLs, making it easier to work with image attachments in prompts.

### JS SDK version 0.0.207

* The SDK's under-the-hood queue for sending logs now has a default size of 5,000 logs. You can configure the max size by setting `BRAINTRUST_LOG_QUEUE_MAX_SIZE` in your environment.
* Improvements to the logging of parallel tool calls.
* Attachments are now converted to base64 data URLs, making it easier to work with image attachments in prompts.

## Data plane (1.1.6)

* Patch a bug in 1.1.5 related to the `realtime_state` field in the API response.

## Data plane (1.1.5)

* The default query timeout in Brainstore is now 32 seconds.
* Auto-recompact segments which have been rendered unusable due to an S3-related issue.
* Gemini 2.5 models

## Data plane (1.1.4)

* Optimize "Activity" (audit log) queries, which reduces the query workload on Postgres for large traces (even if you are using Brainstore).
* Automatically convert base64 payloads to attachments in the data plane. This reduces the amount of data that needs to be stored in the data plane and improves page load times. You can disable this by setting `DISABLE_ATTACHMENT_OPTIMIZATION=true` or `DisableAttachmentOptimization=true` in your stack.
* Improve AI proxy errors for status codes 401 through 409
* Increase the real-time query memory limit to 10GB in Brainstore

## Week of 2025-06-09

* Correctly propagate `expected` and `metadata` values to function calls when running `invoke`. Now, if you provide `expected` or `metadata`, `input` refers to the top-level input argument. For example, if you pass in a value like `{input: "a"}` along with `expected` or `metadata`, you must use `{{input.input}}` to refer to the string "a". This should have no effect on the playground or scorers.
* Chat-like thread layout that simplifies thread display to LLM and score data
* Enable all agent nodes to access dataset variables with the mustache variable `{{dataset}}`. For example, to access `metadata.foo` in the third prompt in an agent, you can use `{{dataset.metadata.foo}}`.
* Improve the reliability of online scoring when logging high volumes of data to a project.
* Tags can now be sorted on the project configuration page, which changes their display order in other parts of the UI.
* System-only messages are now supported in Anthropic and Bedrock models.
* The logs page UI can now filter nested data fields in `metadata`, `input`, `output`, and `expected`.

### Python SDK version 0.1.4

* Add `project.publish()` to directly `push` prompts to Braintrust (without running `braintrust push`).
* `@traced` now works correctly with async generator functions.
* The OpenAI and Anthropic wrappers set `provider` metadata.

### JS SDK version 0.0.206

* Add support for `project.publish()` to directly `push` prompts to Braintrust (without running `braintrust push`).
* The OpenAI and Anthropic wrappers set `provider` metadata.

## Week of 2025-06-02

* Support reasoning params and reasoning tokens in streaming and non-streaming responses in the [AI proxy](/docs/guides/proxy) and across the product (requires a stack update to 0.0.74).
* New [braintrust-proxy](https://pypi.org/project/braintrust-proxy/) Python library to help developers integrate with their IDEs to support new reasoning input and output types.
* New `@braintrust/proxy/types` module to augment OpenAI libraries with reasoning input and output types.
* A new streaming protocol between Brainstore and the API server speeds up queries.
* Time brushing interaction enabled on monitor page charts.
* You can now create user-defined views on the monitoring page.
* Live updating time mode added to the monitoring page.
* The `anthropic` package is now included by default in Python functions.
* Audit log queries must now specify an `id` filter for the set of rows to fetch. These queries will only return the audit log for the specified rows, rather than the whole trace.
* (Beta) Continuously export logs, experiments, and datasets to S3.
* Enable passing `metadata` and `expected` as arguments to the first agent prompt node.

### Python SDK version 0.1.3

* Improve retry logic in the control plane connection (used to create new experiments and datasets).

## Week of 2025-05-26

* The "Faster tables" flag is now the default (you may need to update your data plane if you are self-hosted). You should notice experiments, datasets, and the logs page load much faster.
* Add Claude 4 models in Bedrock and Vertex to the AI proxy and playground.
* Braintrust now incorporates cached tokens into the cost calculations for experiments and logs. The monitor page also includes separate lines so you can track costs and counts for uncached, cached, and cache-creation tokens.
* Native support for thinking parameters in the playground.
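Tracking cached tokens separately matters because providers typically price cached reads far below uncached input. A rough, self-contained sketch of this kind of calculation (the rates, function name, and formula here are hypothetical, not Braintrust's actual pricing logic):

```python
# Hypothetical per-million-token rates -- real rates vary by model and
# provider; this is not Braintrust's implementation.
RATE_UNCACHED = 3.00      # $ per 1M uncached input tokens
RATE_CACHE_READ = 0.30    # $ per 1M cached (read) input tokens
RATE_CACHE_WRITE = 3.75   # $ per 1M cache-creation (write) tokens

def input_cost(uncached: int, cached: int, cache_creation: int) -> float:
    """Price each token class at its own rate, then sum."""
    return (
        uncached * RATE_UNCACHED
        + cached * RATE_CACHE_READ
        + cache_creation * RATE_CACHE_WRITE
    ) / 1_000_000

# A request that reuses a large cached prefix is far cheaper than the
# same request with no cache hits.
mostly_cached = input_cost(2_000, 100_000, 0)
fully_uncached = input_cost(102_000, 0, 0)
print(f"${mostly_cached:.3f} vs ${fully_uncached:.3f}")
```

Collapsing all three classes into a single token count would overstate the cost of cache-heavy workloads, which is why separate monitor-page lines per class are useful.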