Batch processing
The Message Batches API is Anthropic's system for submitting large numbers of independent Claude API requests as a single grouped job for asynchronous processing. Instead of sending requests one at a time and waiting for each response, you package up to 100,000 requests into a batch, submit it once, and retrieve all results when processing completes — typically within an hour, with a 24-hour maximum window.
The primary benefit is cost: all batch requests are charged at 50% of standard API prices. This discount can be combined with prompt caching discounts, so heavily cached batches can cost significantly less than real-time equivalents. Throughput is also higher than sequential real-time calls, because Anthropic's infrastructure processes batches during available capacity.
Batch processing is designed for workloads that don't need immediate responses — things like overnight data enrichment, bulk content generation, large-scale evaluations, or any job where a one-hour turnaround is acceptable. It supports all standard Messages API features including vision, tool use, extended thinking, and prompt caching. Results are stored for 29 days after batch creation.
When you’d use it
- ◆Bulk sentiment analysis — A company has 50,000 customer support tickets and wants to classify each as positive, negative, or neutral before importing into a dashboard. Submitting all 50,000 as a batch completes in roughly an hour at half the cost of sequential real-time calls.
- ◆LLM-as-a-judge evaluations — An engineering team wants to test a new prompt against 10,000 edge-case inputs before deploying to production. They submit all test cases as a batch, then analyze the returned scores and pass/fail flags to catch regressions before launch.
- ◆Content moderation at scale — A social platform needs to review a backlog of 200,000 user posts for policy violations. Batching avoids saturating the real-time API used for live user traffic and reduces moderation costs significantly.
- ◆Bulk document data extraction — A data engineering team has 5,000 scanned PDF reports and needs to extract structured fields (dates, amounts, entities) from each. The Batches API processes all documents asynchronously, returning JSON objects that can be loaded directly into a database.
- ◆Large-scale content generation — An e-commerce company needs product descriptions written for 30,000 SKUs. Each SKU's attributes are sent as a separate batch request; the completed descriptions are downloaded overnight and imported into the product catalog.
What changed recently
- ◆2024-12-17 — Message Batches API launched in public beta, initially supporting Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku.
- ◆2025-10-31 — Message Batches API reached General Availability on the Anthropic API, graduating from public beta.
- ◆2026-03-24 — Extended output beta support added for the Message Batches API. Using the 'output-300k-2026-03-24' beta header raises the max_tokens ceiling to 300,000 for Claude Opus 4 and Sonnet 4 models, enabling long-form document generation, large code output, and exhaustive data extraction within a single batch request.
This is the short version
The full chapter has three worked examples, the common pitfalls, and the workflow that makes it pay — plus the other 84 features, kept current.
Get Claude Master — $97 →