When would you use Cost & token monitoring?

Spot-checking spend before a heavy task: A developer is mid-session and about to ask Claude to refactor a large multi-file module. They run /cost to see whether the context has already accumulated significant spend before adding more expensive output tokens.

When would you use Cost & token monitoring?

Choosing the right model to contain costs: An engineer realizes their session estimate is climbing fast. They check /model to compare available models, switch from Opus to Sonnet for routine edits, and reserve Opus only for architectural decisions where deeper reasoning justifies the higher per-token price.

When would you use Cost & token monitoring?

Monthly budget reporting for an engineering team: An Engineering Manager exports a CSV from Console's Usage page at month-end, filters by team API keys, and identifies which developers or projects consumed the most tokens, then uses that data to forecast next month's API budget.

← ContentsClaude Code · advanced

Cost & token monitoring

Cost and token monitoring in Claude Code is the practice of tracking how many tokens your sessions consume and what those tokens cost, so you can manage spend, stay within plan limits, and optimize model selection. Because Claude Code operates as an agentic system that automatically loads files, conversation history, tool definitions, and subagent outputs into the context window on every turn, token counts can grow quickly and silently. Monitoring gives you visibility into that growth before it becomes a surprise on your bill. The feature spans several layers: a built-in `/cost` command in the CLI that shows a session-level estimate, a `/usage` command that adds plan-limit breakdowns for subscribers, the Usage page in Claude Console for authoritative billing data, an Admin API for programmatic org-wide reporting, and OpenTelemetry integration for exporting metrics to external observability stacks. Each layer serves a different audience, from individual developers checking their burn rate mid-session to platform teams building automated cost governance pipelines. Understanding the distinction between the local cost estimate and your actual bill is fundamental. The CLI computes costs locally from token counts using cached pricing tables; the result is an approximation useful for relative comparisons during a session, not a substitute for the Console's authoritative usage data. Subscription users (Pro, Max, Team, Enterprise) see quota utilization rather than dollar amounts, because their plan includes usage rather than billing per token.

🎧 Listen to this as a podcast episode

When you’d use it

◆Spot-checking spend before a heavy task — A developer is mid-session and about to ask Claude to refactor a large multi-file module. They run /cost to see whether the context has already accumulated significant spend before adding more expensive output tokens.
◆Choosing the right model to contain costs — An engineer realizes their session estimate is climbing fast. They check /model to compare available models, switch from Opus to Sonnet for routine edits, and reserve Opus only for architectural decisions where deeper reasoning justifies the higher per-token price.
◆Monthly budget reporting for an engineering team — An Engineering Manager exports a CSV from Console's Usage page at month-end, filters by team API keys, and identifies which developers or projects consumed the most tokens, then uses that data to forecast next month's API budget.
◆Automated cost attribution across a large org — A platform team uses the Claude Code Analytics Admin API to pull daily usage data broken down by user and model, feeds the results into their internal BI tool, and generates per-team cost dashboards that engineering directors can review without accessing Console directly.
◆Understanding caching savings in an agentic pipeline — A developer building an automated code-review pipeline tracks cache_creation_input_tokens and cache_read_input_tokens separately in the Agent SDK usage object to measure how much the automatic prompt caching is reducing their effective cost versus a naïve no-cache baseline.

What changed recently

◆2026-05 — The /usage command was updated to display a granular per-category breakdown for subscription plans, explicitly attributing quota consumption to skills, subagents, plugins, and individual MCP servers as percentages of the total.
◆2026-05 — Contribution metrics entered public beta for Team and Enterprise organizations, connecting Claude Code token usage data to GitHub repository activity so managers can correlate AI spend with pull request velocity.
◆2025 — Claude Code Analytics Admin API introduced, providing programmatic access to per-user, per-model token and cost data with cursor-based pagination and up to 1-hour data latency.
◆2025 — OpenTelemetry export support added via CLAUDE_CODE_ENABLE_TELEMETRY environment variable, enabling integration with OTLP-compatible observability platforms.

This is the short version

The full chapter has three worked examples, the common pitfalls, and the workflow that makes it pay — plus the other 84 features, kept current.

Get Claude Master — $97 →