Claude 3.5 Sonnet Computer Use: The AI That Operates Software for You
Anthropic's computer use beta lets Claude click, type, and navigate software autonomously. Here is what executives need to know before testing it.
What matters today
Anthropic's computer use beta lets Claude click, type, and navigate software autonomously. Here is what executives need to know before testing it.
Key points
- How Computer Use Works
- Setting Up Safely
- What Computer Use Cannot Do Well (Yet)
- Action Steps for Executives
What You'll Learn
- How Claude computer use works technically and what it can operate
- Which business workflows are the best candidates for autonomous operation
- How to set up a safe sandboxed testing environment during the beta
- The current limitations and where human oversight is non-negotiable
- A five-step plan for piloting computer use in your organization
A contract management analyst at a mid-size legal firm spends 90 minutes every morning opening vendor portals, downloading PDFs, extracting key dates and dollar amounts, and entering that data into a master spreadsheet. The work is accurate because it has to be. It is also entirely mechanical. Every click, every copy-paste, every tab-switch follows the same sequence, day after day.
That workflow is now a candidate for automation. Not through a custom integration or a no-code tool that requires maintaining API connections. Through Claude 3.5 Sonnet looking at the screen, reasoning about what to click next, and doing it.
Anthropic launched computer use in public beta on November 4, 2024. The gap between "AI assists" and "AI acts" just narrowed in a way that will define enterprise AI adoption in 2025. The teams that test autonomous workflows now will have 6 to 12 months of operational experience before this capability becomes standard.
SUBSCRIBER BREAK -- Premium Content Below
How Computer Use Works
Claude computer use operates through a loop: task in natural language, screenshot of current screen, reasoning about next action, execution of that action (click, type, scroll), another screenshot, and repeat until complete. The model sees the world as a series of screenshots. It does not have direct DOM access. This means it works with any application that has a visual interface, regardless of whether that application exposes an API.
Three categories of work are the clearest candidates for computer use automation: recurring data transfer workflows (reading from one system, entering into another), web-based research and monitoring, and form completion and submission across portals and government systems.
Setting Up Safely
Anthropic warns explicitly that computer use is susceptible to prompt injection from malicious web content. Running computer use on a live machine with access to sensitive accounts is not recommended during the beta period. The correct setup for initial testing follows five steps.
- Use a sandboxed virtual machine. Provision a clean cloud VM with no access to production systems or financial accounts. AWS EC2, Google Cloud, or Azure all work.
- Create test accounts. Set up throwaway accounts on any web services the model will interact with. Never use credentials for live systems during testing.
- Scope the task tightly. Start with a single workflow with a clear start and end state. "Extract three data fields from this website and put them in this spreadsheet" works. "Manage my inbox" does not.
- Review outputs before any real action. For the first 10 to 20 runs of any workflow, have a human review the output before it is used or forwarded. Build review checkpoints into the design for anything financial.
- Log everything. Capture screenshots at each step. If something goes wrong, the audit trail is how you understand where and why.
What Computer Use Cannot Do Well (Yet)
Captchas and bot detection will block many enterprise portal workflows. Highly dynamic interfaces that change based on screen size or user state may confuse the model. Numeric precision is a concern: the model reads numbers from screenshots, so very small or low-contrast text can produce misreads. And computer use is not fast: a 5-minute human task may take 15 to 20 minutes via computer use. Still valuable for volume and consistency, but not for time-sensitive work.
Action Steps for Executives
- Request API access. Go to console.anthropic.com and confirm your account has access to Claude 3.5 Sonnet. Computer use is available via the standard API with no separate approval as of November 2024.
- Identify your first candidate workflow. Pick a recurring task that uses a visual interface, follows a predictable sequence, and where a mistake is easy to catch. Document the exact steps a human currently takes.
- Provision a test environment. Stand up a clean VM with a browser. Install only the applications needed. Confirm no access to sensitive accounts.
- Write the task description. Give Claude a clear natural language description of what to accomplish. Include expected outputs and what to do if the model encounters an unexpected screen state.
- Run, review, iterate. Execute the workflow, capture all screenshots, review outputs. Identify where the model hesitated or erred. Refine and retry.
Three deep dives. Four useful moves. One email worth opening.
PromptHacker turns the AI firehose into practical next steps for work, health, family, and everything time keeps trying to steal.