Computer use tool
The computer use tool is a beta API capability that lets Claude autonomously interact with desktop computer environments. Claude uses its vision capabilities to capture screenshots of a display, then issues actions — mouse clicks, mouse movement, keyboard input, and scrolling — to control the computer as a human would. This allows it to operate standard desktop applications, web browsers, and other GUI-based software without requiring custom API integrations for each tool.
Instead of building task-specific integrations, developers teach Claude general computer skills, enabling it to complete multi-step workflows across arbitrary applications. For example, Claude can open a browser, navigate to a website, fill out a form, switch to a spreadsheet application, and save results — all autonomously within a single session.
The feature is implemented through an 'agent loop': the client application captures a screenshot and sends it to Claude via the API; Claude analyzes it and returns one or more tool-use actions (e.g., click at coordinate X,Y or type text); the client executes those actions on the actual OS; a new screenshot is captured; and the loop repeats until the task is complete or a maximum iteration limit is reached.
When you’d use it
- ◆Automated UI / QA Testing — A developer wants to verify that a newly shipped web application renders correctly and that all buttons, forms, and navigation elements function as expected, without writing a custom Selenium or Playwright test suite.
- ◆Legacy Application Data Entry — A company uses a decades-old desktop ERP system with no public API. Claude can visually navigate the application's menus, locate the correct data-entry forms, and input records extracted from modern data sources.
- ◆Cross-Application Data Aggregation — A researcher needs to pull financial metrics from three separate browser-based dashboards, compile them into a local spreadsheet, and save the file — a workflow that would otherwise require manual copy-paste across applications.
- ◆Web Form Completion from Structured Data — An operations team must submit dozens of government or vendor portal forms populated from a spreadsheet. Claude can read each row, open the browser portal, fill in the correct fields, and submit, repeating the cycle for each record.
- ◆Software Development and Debugging Assistance — Claude can open a code editor, run terminal commands, observe error output in screenshots, edit source files, re-run tests, and iterate — acting as an autonomous coding agent on a real development environment.
What changed recently
- ◆2025-01-24 — New beta header 'computer-use-2025-01-24' released. Updated computer_20250124 tool introduced with new command options: hold_key, left_mouse_down, left_mouse_up, scroll, triple_click, and wait. The bash_20250124 and text_editor_20250124 tools were decoupled from the computer use system prompt and made independent — they no longer require the computer use beta header.
- ◆2025-11 — Claude Sonnet 4.5 released with significantly improved computer use performance. On the OSWorld benchmark (real-world computer tasks), Sonnet 4.5 reached 61.4%, up from 42.2% for Sonnet 4 just four months prior. Claude Opus 4.5 released, described as the best model for coding, agents, and computer use at that time.
- ◆2025-11-24 — Updated beta header 'computer-use-2025-11-24' released, enabling higher-tier computer interaction capabilities on newer-generation Opus and Sonnet models.
- ◆2026-03-23 — Computer use became accessible to Pro and Max plan users without requiring a custom API setup, through the Cowork and Claude Code features in the Claude desktop application for macOS and Windows.
This is the short version
The full chapter has three worked examples, the common pitfalls, and the workflow that makes it pay — plus the other 84 features, kept current.
Get Claude Master — $97 →