Long-context strategies

Long-context strategies are prompting techniques and system-design patterns that help you get the best results when working with large amounts of text (typically 20,000 tokens or more) within Claude's context window. The context window is everything Claude can 'see' at once: your system prompt, conversation history, uploaded documents, tool results, and your current message. Getting the most out of that space requires deliberate choices about what to include, where to place it, and how to manage it over time.

The core strategies fall into three categories: (1) structural placement, meaning long documents go near the top of your prompt and your actual question goes at the bottom; (2) document formatting, meaning content is wrapped in XML tags so Claude can navigate it clearly; and (3) context lifecycle management, meaning tools like server-side compaction and prompt caching keep conversations coherent and cost-effective over many turns or very long sessions.

These are not features you enable with a toggle. They are techniques you apply when writing prompts or building applications. They work on every Claude plan and across every interface (Claude.ai chat, Claude Code, and the API), though some advanced capabilities, such as server-side compaction, require specific API headers or are still rolling out in beta.
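As a rough illustration of the first two strategies, the sketch below assembles a prompt with XML-wrapped documents at the top and the question at the very end. The function name `build_long_context_prompt`, the tag names (`documents`, `document`, `source`, `document_content`), and the sample inputs are all assumptions made for this example, not a required schema.

```python
from xml.sax.saxutils import escape


def build_long_context_prompt(documents, question):
    """Place XML-wrapped documents first and the question last.

    `documents` is a list of (source, text) pairs. The tag names are an
    illustrative convention, not a fixed schema.
    """
    parts = ["<documents>"]
    for index, (source, text) in enumerate(documents, start=1):
        parts.append(f'<document index="{index}">')
        # Escape the source label so a stray "<" or "&" cannot break the wrapper.
        parts.append(f"<source>{escape(source)}</source>")
        parts.append("<document_content>")
        # Document text is passed through verbatim apart from trimming.
        parts.append(text.strip())
        parts.append("</document_content>")
        parts.append("</document>")
    parts.append("</documents>")
    # The question goes after all documents, at the end of the prompt.
    parts.append("")
    parts.append(question.strip())
    return "\n".join(parts)


# Hypothetical inputs, for illustration only.
prompt = build_long_context_prompt(
    [
        ("report_a.pdf", "Recommends raising the cap."),
        ("report_b.pdf", "Recommends lowering the cap."),
    ],
    "Where do these reports conflict? Quote the relevant passages first.",
)
```

The resulting string can be sent as a single user message. Keeping the per-document metadata in its own tag makes it easier to ask for answers that cite a specific source.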

When you’d use it

◆Multi-document research synthesis: A policy analyst uploads five 40-page government reports and asks Claude to extract conflicting recommendations and produce a unified briefing. Placing all documents at the top and the synthesis request at the bottom helps Claude attend to the right content.
◆Whole-codebase review: A developer pastes or uploads an entire Node.js project (hundreds of files totaling 150K tokens) and asks Claude to identify security vulnerabilities and outdated dependencies in a single pass.
◆Long-running autonomous agent: An engineering team builds a refactoring agent in Claude Code that runs for 30+ minutes, reads and rewrites files, and uses server-side compaction to continue working after each context reset without losing project state.
◆Contract and legal document analysis: A legal team uploads 200 pages of contracts and asks Claude to flag non-standard clauses. XML document tags with source metadata let Claude cite specific sections accurately.
◆Multi-turn customer support with cost optimization: A SaaS company builds a support bot with a 4,000-token system prompt containing product knowledge. With prompt caching, the system prompt is written to the cache once and then read at a discounted rate for as long as the cache stays warm (by default, five minutes since its last use), cutting costs on system-prompt tokens by up to 90%.
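The "up to 90%" figure in the last example can be sanity-checked with a little arithmetic. The sketch below assumes a cache write costs 1.25x the base input rate, a cache read costs 0.1x, and every request after the first arrives while the cache is still warm. These multipliers, the helper names, and the sample prompt text are assumptions for illustration, so check current pricing before relying on them. It also shows one way to mark a system prompt as cacheable with a `cache_control` block when building an API request.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: the first request writes the cache
CACHE_READ_MULTIPLIER = 0.10   # assumed: later requests read from the cache


def cacheable_system(prompt_text):
    """Build a `system` value whose text block is marked for caching."""
    return [{
        "type": "text",
        "text": prompt_text,
        "cache_control": {"type": "ephemeral"},
    }]


def system_prompt_savings(requests):
    """Fraction of system-prompt input cost saved over `requests` calls.

    Assumes every call after the first hits a warm cache. Token count and
    base price cancel out, so only the multipliers matter.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    uncached = float(requests)
    cached = CACHE_WRITE_MULTIPLIER + (requests - 1) * CACHE_READ_MULTIPLIER
    return 1 - cached / uncached


# Hypothetical system prompt text, for illustration only.
system = cacheable_system("You are a support assistant for ExampleApp. ...")

for n in (1, 2, 10, 100):
    print(n, round(system_prompt_savings(n), 3))
```

Under these assumptions a single request costs more with caching than without, because the write premium is never recovered, and the saving approaches but never quite reaches 90% as the number of warm-cache requests grows.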

What changed recently

◆2026-01: Sonnet 4.6 gained a 1M token context window in beta, bringing million-token capacity, previously available only on Opus models, to a more cost-efficient model tier.
◆2025-12: Server-side compaction entered beta for the API. Access requires passing the beta header 'compact-2026-01-12'. It is available for Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Claude Mythos Preview.
◆2026-05: The free tier was upgraded to Sonnet 4.6 as the default model, which includes automatic compaction. Long-context strategies via compaction are now available at no cost.
◆2026-01: The minimum cacheable token threshold for prompt caching was reduced to 1,024 tokens on Opus 4.8 and Sonnet 4.6, making caching practical for mid-sized system prompts.
