When would you use Extended thinking (API)?

Mathematical proofs and formal reasoning: A researcher needs Claude to produce a rigorous, step-by-step proof (e.g., that sqrt(2) is irrational). Extended thinking lets Claude lay out assumptions, test cases, and logical dependencies before committing to a final written proof, reducing errors in multi-step deduction.

When would you use Extended thinking (API)?

Complex multi-file code debugging: A backend engineer asks Claude to find the root cause of a failing database migration that spans multiple files. Claude uses the thinking budget to map file dependencies, trace the migration's execution path, and identify the constraint violation before suggesting a fix.

When would you use Extended thinking (API)?

Autonomous multi-step agentic workflows: A data engineering team builds a coding agent that receives a high-level requirement and must plan, implement, test, and iterate across many autonomous steps. Adaptive thinking lets the model calibrate reasoning depth per step, spending more thinking tokens on complex planning phases and fewer on straightforward actions.

← ContentsClaude API · advanced

Extended thinking (API)

Extended thinking is a feature of the Claude Messages API that gives Claude a dedicated token budget to reason through complex problems step by step before producing its final response. Instead of jumping straight to an answer, Claude performs multiple sequential reasoning steps internally — a process sometimes called 'serial test-time compute' — and the API returns either a summary or the full record of that reasoning alongside the final text response. The feature has two modes. The older 'manual' mode (type: 'enabled') lets developers set an explicit budget_tokens value to control exactly how many tokens Claude may spend thinking. The newer 'adaptive' mode (type: 'adaptive'), introduced with the Claude 4 family, accepts an effort level (low, medium, high, or max) and lets Claude decide how much reasoning each step actually needs. Adaptive mode is the current standard; manual budget mode is deprecated on Claude Sonnet 4.6 and later and is strictly blocked on Claude Opus 4.7 and 4.8. Extended thinking is particularly effective for mathematics, formal proofs, complex code debugging, multi-step planning, and autonomous agentic workflows. Thinking tokens count toward billing at the same per-token output rate as regular tokens, but the billed count may differ from the token count visible in the response because you are charged for the full thinking generated, not just any summary returned.

🎧 Listen to this as a podcast episode

When you’d use it

◆Mathematical proofs and formal reasoning — A researcher needs Claude to produce a rigorous, step-by-step proof (e.g., that sqrt(2) is irrational). Extended thinking lets Claude lay out assumptions, test cases, and logical dependencies before committing to a final written proof, reducing errors in multi-step deduction.
◆Complex multi-file code debugging — A backend engineer asks Claude to find the root cause of a failing database migration that spans multiple files. Claude uses the thinking budget to map file dependencies, trace the migration's execution path, and identify the constraint violation before suggesting a fix.
◆Autonomous multi-step agentic workflows — A data engineering team builds a coding agent that receives a high-level requirement and must plan, implement, test, and iterate across many autonomous steps. Adaptive thinking lets the model calibrate reasoning depth per step, spending more thinking tokens on complex planning phases and fewer on straightforward actions.
◆Long-horizon research and analysis — A legal-tech company needs Claude to analyze a large body of case law, identify contradictions across rulings, and synthesize a coherent summary. Extended thinking gives Claude space to track parallel arguments and cross-reference evidence before writing the final report.
◆Interleaved tool orchestration in agents — An AI assistant uses web search and database tools to answer a complex financial question. With interleaved thinking enabled, Claude can reason about each tool result before deciding which tool to call next, chaining multiple tool calls with reasoning steps in between — without the developer needing to build complex multi-prompt orchestration logic.

What changed recently

◆2026-05 — Claude Opus 4.8 released with adaptive thinking only. Manual budget mode (type: 'enabled') is strictly blocked on Opus 4.8 and returns a 400 error. Pricing: $5/million input tokens, $25/million output tokens. Up to 90% savings with prompt caching, 50% with batch processing.
◆2026-05 — Claude Opus 4.7 released with the same adaptive-only restriction as Opus 4.8.
◆2025-11 — Interleaved thinking launched in public beta. Enabled via the 'anthropic-beta': 'interleaved-thinking-2025-05-14' header. Allows Claude to generate thinking blocks between tool calls in agentic workflows.
◆2025-10 — Claude Opus 4.6 and Claude Sonnet 4.6 launched with adaptive thinking mode as the recommended approach. Manual budget_tokens mode deprecated on these models but still accepted. Extended thinking response now returns a summary of Claude's reasoning rather than the full thinking trace by default to prevent misuse.

This is the short version

The full chapter has three worked examples, the common pitfalls, and the workflow that makes it pay — plus the other 84 features, kept current.

Get Claude Master — $97 →