Claude Opus 4.7 plus Routines: your first 24/7 AI workflow, wired up this weekend
Two Anthropic releases landed the same week. Opus 4.7 handles long-horizon work. Routines runs it on a schedule, an API call, or a webhook. Together they turn an overnight agent from a slide into something you can ship before Monday.
Anthropic shipped two things in the same week and most of the coverage treated them as separate headlines. They are one story. Claude Opus 4.7 went generally available on April 14. Claude Routines hit research preview three days later. One is the model that finally handles long-horizon agent loops without silently collapsing halfway through. The other is the scheduler, API trigger, and webhook receiver that lets you run those loops unattended.
Together they are the first credible setup for an overnight agent. A routine that triages your project backlog at 5 a.m., drafts a status report by 6 a.m., and pings Slack before you open your laptop is no longer a conference-stage slide. It is something you can wire up before Monday.
This article covers what each release actually changes, the three routine types and when to stack them, the real cost you should expect after the new tokenizer, and a weekend build plan that starts small on purpose.
Opus 4.7: the numbers that matter for unattended work
The benchmark most teams quote is SWE-bench Verified: 87.6 percent, up from roughly 81 percent on Opus 4.6. That is the headline. It is not the number that decides whether you trust a model to run for six hours without you watching it.
The number that decides that is tool-call reliability across chains of ten or more steps. Anthropic's own evals show roughly one-third the tool errors of Opus 4.6 on complex multi-step tasks, and a 14-point jump on the internal long-horizon suite. For an overnight agent, that difference compounds. If each step fails independently, a 5 percent per-step failure rate becomes a 40 percent chance of a broken run over ten steps (1 - 0.95^10 ≈ 0.40). A 1.5 percent per-step rate becomes a 14 percent chance. The first number is close to a coin flip on whether you wake up to anything useful. The second is something you can ship.
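The compounding arithmetic is easy to check for your own chain length. A minimal sketch, assuming every step fails independently with the same probability (a simplification; real tool errors tend to cluster), with a function name that is ours for illustration:

```python
def run_failure_probability(per_step_failure: float, steps: int) -> float:
    """Chance that at least one step fails, assuming independent, identical steps."""
    return 1.0 - (1.0 - per_step_failure) ** steps


# The two per-step rates discussed above, over a ten-step chain.
for rate in (0.05, 0.015):
    broken = run_failure_probability(rate, 10)
    print(f"{rate:.1%} per step over 10 steps -> {broken:.0%} chance of a broken run")
```

Change the step count to match your routine. A twenty-step chain roughly doubles the pain at either rate.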
Two other under-covered changes matter more than SWE-bench for unattended work.
First, the new task budgets API. You set a token ceiling on a full agentic loop, the model prioritizes its own work against the budget, and it finishes gracefully rather than cutting off mid-tool-call. That is the primitive you needed to bound the cost of a misbehaving routine. Without it, a runaway loop on Opus pricing could quietly spend hundreds of dollars overnight. With it, you set the number and walk away.
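To make the idea concrete, here is a client-side illustration of budget-bounded execution. It is not the task budgets API, whose request shape we are not reproducing here. Every name below (`TokenBudget`, `run_steps`, the per-step estimates) is hypothetical. The point is only the behavior: stop at a step boundary instead of dying mid-step.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical client-side tracker; NOT the Anthropic task budgets API."""
    ceiling: int
    used: int = 0

    def can_afford(self, estimated_tokens: int) -> bool:
        return self.used + estimated_tokens <= self.ceiling

    def spend(self, tokens: int) -> None:
        self.used += tokens


def run_steps(steps: list[tuple[str, int]], budget: TokenBudget) -> list[str]:
    """Run (name, estimated_tokens) steps in order, stopping at a step boundary.

    A step only starts if its estimate fits in the remaining budget, so the
    loop is never cut off halfway through a step.
    """
    log = []
    for name, estimate in steps:
        if not budget.can_afford(estimate):
            log.append(f"stopped before '{name}': budget exhausted")
            break
        budget.spend(estimate)  # stand-in for real work and its actual token usage
        log.append(name)
    return log
```

In a real routine the model does this prioritization itself against the ceiling you set. The sketch is only useful for reasoning about where a given budget would cut a given plan.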
Second, file-based memory finally works reliably. The "CLAUDE.md plus a notes directory" pattern was aspirational in the 4.5 era. In 4.7 the model reads and writes notes consistently across a session and picks them back up in the next one. That is what turns a series of one-shot agent calls into a system with memory, which is what turns a demo into a workflow.
Routines: the three trigger types, and when to use each
Routines ship in research preview with three trigger types. The first rule of building with them is that you will usually want to stack more than one on a single routine.
- Scheduled (cron-style). The obvious one. Run this routine every weekday at 5 a.m., every Friday at 4 p.m., first of the month at 9 a.m. Maps directly to a cron string inside the Routines dashboard. Best for rituals: standup digests, weekly reviews, monthly board-email drafts.
- API-triggered. The routine is addressable as a callable endpoint. You fire it from a button, a Shortcut, a Zapier step, or any other script. Best for on-demand work you want to move off your own machine: "run the competitor check," "regenerate this week's metrics."
- Webhook-triggered. GitHub is first-class out of the gate; anything that can POST to a URL works. Best for event-driven workflows: a PR is opened and a routine posts a code-review draft; an incident fires in PagerDuty and a routine drafts the post-mortem skeleton; a HubSpot deal hits stage X and a routine drafts the kickoff doc.
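The three example schedules in the first bullet can be written as standard five-field cron strings. The sketch below assumes the dashboard accepts that common dialect (minute, hour, day of month, month, day of week, with Sunday as 0); check the exact syntax and the timezone handling before relying on it. The small matcher is our own helper for sanity-checking a string locally. It supports only `*`, numbers, ranges, and comma lists.

```python
from datetime import datetime

# Cron strings for the schedules named above: minute hour day-of-month month day-of-week.
SCHEDULES = {
    "weekday_5am": "0 5 * * 1-5",
    "friday_4pm": "0 16 * * 5",
    "first_of_month_9am": "0 9 1 * *",
}


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', single numbers, 'a-b' ranges, comma lists."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` falls on a minute the cron expression would fire."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, cron_dow)
    )
```

Step values such as `*/15` and the day-of-month/day-of-week "either matches" rule of full cron are deliberately left out. None of the three schedules above needs them.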
Stacking matters. The overnight backlog triage example works best as a Scheduled run at 5 a.m. that is also exposed as an API endpoint so you can rerun it on demand at 11 a.m. when the first run missed something. The GitHub code-review routine works best as a webhook that is also callable from an API so you can replay it on a PR it chose to skip. Build the base routine once, attach all three trigger types, and you cover the retry and rerun cases without writing new logic.
Pricing reality: $5 and $25, but check the tokenizer
Headline pricing on Opus 4.7 is unchanged: $5 per million input tokens and $25 per million output tokens. If you use Opus in short bursts for reasoning, the sticker price is the price.
Agent loops on code-heavy prompts are a different story, because the 4.7 release also shipped a new tokenizer. On our internal code-review and SWE-bench-style workloads, token counts for equivalent prompts came in about 35 percent higher than under the 4.6 tokenizer. That is not a stealth price increase; it is a change in what counts as a token. The practical effect on your bill is the same as one, though: budget as if Opus 4.7 costs roughly $6.75 per million input and $33.75 per million output for code-heavy routines, and set task budgets accordingly. If your routines are mostly natural-language work over prose, the tokenizer delta is smaller and the sticker price holds.
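The adjustment is one multiplication, but it is worth keeping in a helper so you can swap in your own measured inflation. A minimal sketch using the sticker prices and the 35 percent figure quoted above; the function names are ours, and the inflation number is our measurement on our workloads, not a published constant:

```python
STICKER_INPUT_PER_M = 5.00    # USD per million input tokens
STICKER_OUTPUT_PER_M = 25.00  # USD per million output tokens
CODE_TOKEN_INFLATION = 0.35   # ~35% more tokens per equivalent code-heavy prompt


def effective_price(sticker_per_m: float, inflation: float = CODE_TOKEN_INFLATION) -> float:
    """Planning price per million 4.6-equivalent tokens once token counts inflate."""
    return round(sticker_per_m * (1.0 + inflation), 2)


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one run, billed at sticker price on the actual 4.7 token counts."""
    return (
        input_tokens / 1e6 * STICKER_INPUT_PER_M
        + output_tokens / 1e6 * STICKER_OUTPUT_PER_M
    )


print(effective_price(STICKER_INPUT_PER_M), effective_price(STICKER_OUTPUT_PER_M))
```

Use `effective_price` when you are porting a budget that was sized under the 4.6 tokenizer, and `run_cost` when you already have real 4.7 token counts from a run.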
The other cost to plan for is the floor. An overnight routine that runs seven days a week for a quarter will generate a meaningful bill if you do not cap it. The pattern we have landed on is: set a daily task budget, set a weekly ceiling in the dashboard, and have the routine itself log cost to a Google Sheet or your observability tool so you can see drift before the bill arrives.
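Seeing drift before the bill arrives comes down to a running total checked against a ceiling. A rough sketch of that check, with an in-memory list standing in for the Google Sheet or observability tool; `CostLog` and its methods are hypothetical names, and the weekly window here is a trailing seven days rather than whatever window the dashboard uses:

```python
from datetime import date, timedelta


class CostLog:
    """In-memory stand-in for the cost sheet; all names here are illustrative."""

    def __init__(self, weekly_ceiling_usd: float):
        self.weekly_ceiling_usd = weekly_ceiling_usd
        self.rows: list[tuple[date, float]] = []

    def record(self, day: date, cost_usd: float) -> None:
        """Append one run's cost, as the routine would add a row to the sheet."""
        self.rows.append((day, cost_usd))

    def trailing_week_total(self, today: date) -> float:
        start = today - timedelta(days=6)  # seven days, including today
        return sum(cost for day, cost in self.rows if start <= day <= today)

    def over_ceiling(self, today: date) -> bool:
        return self.trailing_week_total(today) > self.weekly_ceiling_usd
```

The useful signal is usually the trend rather than the ceiling itself: a routine whose per-run cost creeps up week over week is reading more than it used to, and that is worth a look before it trips any cap.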
The weekend build: one routine, three verified runs, then scale
The failure mode with new automation primitives is starting with the ambitious workflow and watching it fail in three different ways at once. The pattern that works is boring: one routine, one task, verified three times, before you add anything.
Here is the routine prompt I would start with this weekend to get your first overnight loop running by Monday morning.
You are my overnight chief-of-staff agent. Every weekday at 5:30 a.m. Pacific, do the following:
1. Read every open issue in my GitHub project board.
2. Read the last seven days of messages in the Slack channel #eng-standup.
3. Cross-reference: name the three issues most likely to need my attention today, based on recent activity, blockers, or stale status.
4. For each, write one sentence on what is blocking it and one on the recommended next action.
5. Post the result to the Slack channel #morning-digest as a threaded message, with the subject "Morning digest, [date]."
6. Log total token cost for the run to the Google Sheet "Routine cost log" in a new row.
Rules: task budget 40,000 tokens. If you exceed 35,000 tokens before step 5, skip step 4 for the third issue and still post. If a source returns nothing, say so in the digest so I know whether you skipped or the connection is broken. No hedging.
Wire that up as a Scheduled routine, then attach an API trigger so you can run it on demand. Verify it runs cleanly three mornings in a row before you add a second routine. The goal for week one is not impressive scope; it is confidence that the scheduler, the tool-call reliability, and the budget ceiling all behave the way the release notes claim.
Three routines that are already earning their keep
After two weeks of running Routines with Opus 4.7 in production, three patterns have paid for themselves. Copy whichever matches your week.
- The 5 a.m. backlog triage. Scheduled daily, stacked with an API trigger. Reads project board, deal pipeline, and inbox flags. Posts three items to Slack with the blocker and the recommended action. Replaces the 20 minutes of flailing most executives do between coffee and the first meeting.
- The GitHub PR code-review draft. Webhook-triggered on pull request open. Reads the diff, the linked issue, and the last two merged PRs in the same area. Posts a draft review as a comment the human reviewer can edit in two minutes instead of writing from scratch in twenty.
- The Friday close-the-week routine. Scheduled weekly at 4 p.m. Cross-references calendar, Slack mentions, HubSpot call logs, and meeting transcripts. Produces the executive weekly review we covered in the companion Pro Tip this week. Routines is what lets that prompt run without you remembering to push the button.
Where it still breaks
Three honest failure modes after two weeks of daily use.
- Webhooks outside GitHub are rougher. GitHub has first-class support. Everything else works, but the dashboard ergonomics around signature verification, retry behavior, and event filtering are thin. Plan an extra hour per non-GitHub webhook.
- Research preview means some runs get skipped. Roughly once a week we see a scheduled routine skip a run without notice, leaving only a terse error about capacity. The recovery is clean (the next run fires normally), but if the routine is load-bearing you want an alert on missed runs, not just a daily digest.
- File memory is reliable but not infinite. The CLAUDE.md plus notes pattern works. It does not replace a real database. If your routine needs durable state across hundreds of runs, write it to a sheet or a Postgres table; do not rely on the agent's own notes folder as your source of truth.
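On the first failure mode: if you end up putting a small relay of your own in front of a non-GitHub webhook (an assumption about your setup, not something Routines requires), signature verification is the part most worth getting right. The sketch below checks an HMAC-SHA256 signature in the `sha256=<hex digest>` form that GitHub uses for its `X-Hub-Signature-256` header. Other senders may use a different header, encoding, or algorithm, so treat the format as an assumption to confirm against each sender's docs.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check an HMAC-SHA256 signature of the form 'sha256=<hex digest>'.

    `body` must be the raw request bytes exactly as received, before any
    JSON parsing or re-serialization.
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest is constant-time, so it does not leak how much of the value matched
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Reject the request before doing anything else if this returns `False`. A routine that acts on unauthenticated POSTs is an open door.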
Monday morning action
Two hours this weekend is enough to ship your first overnight agent.
- Upgrade your Anthropic account to a tier with Routines access (Pro or above in research preview). Confirm you can see the Routines dashboard.
- Pick one of the three routines above. Do not invent your own on day one.
- Wire it as Scheduled plus an API trigger. Set a daily task budget. Run it manually three times and watch the output land where you expect.
- Turn the schedule on for Monday morning. Add one alert that fires if the routine does not post by 6 a.m.
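The alert in the last step is a single comparison: has the deadline passed today, and is the latest digest from before today? A minimal sketch of that check, assuming you can fetch the timestamp of the routine's most recent post yourself and that both datetimes are in the same local timezone; `missed_run` is a hypothetical helper, not part of any dashboard:

```python
from datetime import datetime, time
from typing import Optional


def missed_run(last_post: Optional[datetime], now: datetime,
               deadline: time = time(6, 0)) -> bool:
    """True if the deadline has passed today and nothing has been posted today."""
    if now.time() < deadline:
        return False  # too early to call the run missed
    return last_post is None or last_post.date() < now.date()
```

Run the check from something other than the routine itself, such as a separate scheduled job or your monitoring tool, so that a skipped run cannot also skip its own alert.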
The point is not to replace your judgment. It is to make sure that by the time you open your laptop on Monday, the first twenty minutes of flailing is already done. The model cannot make the hard call. It can make sure you know exactly which call is hard, and post it to Slack before your first meeting.