Issue 121 - The Week AI Became Native
Gemini shipped a Mac app. Anthropic shipped Opus 4.7. Microsoft shipped a control plane for agents. What executives actually need to do this week.
April 23, 2026 · 14 MIN READ
Good morning.
For about eighteen months, AI for executives was a tab in your browser. You opened ChatGPT, you opened Claude, you paid them each $20, and you pretended the context-switch tax was the cost of doing business.
That ended last week.
Google shipped a native Gemini app for Mac on April 15, with a global Option+Space shortcut that puts the model one keystroke away from anything on your screen. Anthropic shipped Opus 4.7 on April 16, posting the highest real-world software engineering score of any production model and cutting tool-call errors by roughly two thirds. And Microsoft confirmed that Agent 365, the governance layer for every AI agent running inside an enterprise, becomes generally available on May 1 as part of the new E7 suite.
Three different companies. One message: AI is moving from tab to operating system, and from individual assistant to governed fleet. If you are the person who signs off on tooling, this is the week your mental model needs to change.
Anthropic previews Claude Design
Launched April 17 in research preview for Pro, Max, Team, and Enterprise users. Lets you describe a visual and collaborate with Claude to iterate quickly, aimed at decks and campaign assets rather than polished production design.
OpenAI ships GPT-Rosalind for life sciences
A research-preview reasoning model tuned for biology, drug discovery, and translational medicine, plus a new Codex research plugin. Launching through a trusted-access program for qualified U.S. enterprise customers.
GPT-5.4 mini rolling out in ChatGPT
A cheaper, faster variant of OpenAI's current flagship, now live for consumer ChatGPT users. Business and Enterprise support coming soon. Worth benchmarking against Haiku 4.5 and Gemini Flash if you pay for inference.
Copilot Studio opens multi-agent orchestration
Microsoft made multi-agent coordination generally available across Microsoft Fabric, the Microsoft 365 Agents SDK, and the open Agent-to-Agent protocol. Agents can now hand work off to each other inside your tenancy without custom glue code.
Source: Microsoft Copilot blog
Claude Cowork hits general availability
Anthropic's desktop "do work for me" mode is now GA on macOS and Windows, with expanded analytics, OpenTelemetry support, and role-based access controls on Enterprise plans. The equivalent productivity surface to Gemini's new Mac app, aimed at non-developers.
Stanford AI Index: 70% of companies now using generative AI
The 2026 AI Index reports that roughly seven in ten companies now use generative AI in at least one business function. The "should we pilot AI" conversation is over.
Opus 4.7 is the first model that actually closes tickets
Anthropic released Claude Opus 4.7 on April 16. The only number that matters is SWE-bench Pro: 64.3 percent, the highest score any production model has posted on real-world software engineering tasks. That is 6.6 points ahead of GPT-5.4 and roughly 11 points better than Opus 4.6. Tool-call errors dropped by about two thirds in Anthropic's own evals. Pricing is flat at $5 per million input tokens and $25 per million output. Cost-per-resolved-ticket drops materially even with the same sticker price, because you stop paying for the 6 to 9 percent of runs that used to fail silently.
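To make the cost-per-resolved-ticket point concrete, here is a minimal back-of-envelope sketch. The prices are the ones quoted above. The token counts per ticket and the before and after failure rates are hypothetical, and the model assumes every failed run is simply retried at full cost.

```python
# Per-million-token prices quoted in this issue for Opus 4.7 (and 4.6).
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def cost_per_run(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent run at the quoted sticker prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def cost_per_resolved_ticket(run_cost: float, failure_rate: float) -> float:
    """Expected spend per successful run when failed runs are retried.

    With independent retries, the expected number of attempts per
    success is 1 / (1 - failure_rate).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return run_cost / (1.0 - failure_rate)


# Hypothetical workload: 200k input and 20k output tokens per ticket.
run_cost = cost_per_run(200_000, 20_000)

# Hypothetical failure rates: 9% before (top of the 6 to 9 percent range),
# 3% after (a two-thirds reduction, applied here as an assumption).
before = cost_per_resolved_ticket(run_cost, 0.09)
after = cost_per_resolved_ticket(run_cost, 0.03)
saving_pct = 100 * (1 - after / before)

print(f"cost per run:           ${run_cost:.2f}")
print(f"per resolved (9% fail): ${before:.3f}")
print(f"per resolved (3% fail): ${after:.3f}")
print(f"token-only saving:      {saving_pct:.1f}%")
```

Under these assumed numbers the token-only saving is about 6 percent. The sketch does not model the human time spent catching and redoing silently failed runs, which is likely where most of the saving would show up, so plug in your own figures before quoting a number internally.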
Business Impact
- Eliminates: The "is the agent going to fail silently halfway through?" hedge that made most teams keep a human in the loop.
- Time back: About two thirds fewer tool errors on multi-step workflows. If your agents touch GitHub, Jira, or your data warehouse, the blast radius of retries shrinks materially.
- Best for: Coding agents, customer support pipelines that call internal tools, and any workflow where you were about to hire a prompt engineer to babysit fallbacks.
Executive Action Steps
- Switch your highest-value agent (the one closest to revenue or cost) to Opus 4.7 on Monday. Keep 4.6 on a parallel route for two weeks of A/B.
- Re-run last month's top ten failed tasks. If the error rate drops under 10 percent, roll 4.7 forward as the default and sunset 4.6.
- If you have a custom eval harness, add SWE-bench-style "did the tool call succeed and produce the intended side effect" checks. Model quality is moving past tone scoring.
- Cache aggressively. Per-token pricing matches 4.6, so any drop in what you pay per token has to come from your side, not theirs.
Why it matters: The gap between "LLM that talks well" and "LLM that does work" was the entire 2025 story. 4.7 is the first checkpoint where the second one is convincingly the winner.
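The side-effect check and the 10 percent rollout rule from the action steps can be sketched in a few lines. This is an illustration under stated assumptions, not a real harness: `run_task`, `TaskResult`, and the in-memory `tickets` dict are hypothetical stand-ins for your own tool calls and systems of record.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskResult:
    task_id: str
    tool_call_ok: bool    # the tool call reported success
    side_effect_ok: bool  # the intended change is actually visible

    @property
    def passed(self) -> bool:
        return self.tool_call_ok and self.side_effect_ok


def run_task(task_id: str,
             call_tool: Callable[[], bool],
             verify_side_effect: Callable[[], bool]) -> TaskResult:
    """Run one task, then independently verify its side effect."""
    try:
        tool_ok = bool(call_tool())
    except Exception:
        tool_ok = False
    # A call that claims success still has to show the change it promised.
    side_ok = bool(verify_side_effect()) if tool_ok else False
    return TaskResult(task_id, tool_ok, side_ok)


def error_rate(results: list[TaskResult]) -> float:
    if not results:
        raise ValueError("no results to score")
    return sum(1 for r in results if not r.passed) / len(results)


def should_roll_forward(results: list[TaskResult],
                        threshold: float = 0.10) -> bool:
    """Roll forward only if the error rate is strictly under the threshold."""
    return error_rate(results) < threshold


# Hypothetical stand-in for a ticket system such as Jira or GitHub Issues.
tickets: dict[str, str] = {}


def close_ticket(ticket_id: str) -> bool:
    tickets[ticket_id] = "closed"
    return True


def silent_fail(ticket_id: str) -> bool:
    return True  # claims success but changes nothing


results = [
    run_task(f"T{i}",
             lambda i=i: close_ticket(f"T{i}"),
             lambda i=i: tickets.get(f"T{i}") == "closed")
    for i in range(9)
]
results.append(
    run_task("T9",
             lambda: silent_fail("T9"),
             lambda: tickets.get("T9") == "closed")
)

print(f"error rate: {error_rate(results):.0%}")
print(f"roll forward: {should_roll_forward(results)}")
```

One design note: with only ten tasks, "under 10 percent" means zero failures, because a single miss lands exactly on 10 percent. If that bar feels too strict for a go or no-go call, a larger task sample gives the threshold more room to work.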
Gemini for Mac is the first keystroke AI with Nano Banana and Veo built in
Google released the native Gemini app for macOS on April 15, free globally on macOS 15 and up, with a global Option+Space shortcut that invokes Gemini 2.5 Pro over whatever is on your screen. First-party screen awareness, Nano Banana for image generation, and Veo for video generation all ship integrated. 20 million people installed it in the first eight days. It does not yet read or write files, drive other applications, or call Google Workspace data, which is why it does not replace Cowork or Comet for execution workflows. It does replace the "open a ChatGPT tab and paste a screenshot" reflex completely.
Business Impact
- Eliminates: The copy-paste tax. Every screenshot and "let me paste this into the chat" round trip is now a keystroke.
- Time back: Roughly 30-40 minutes per knowledge-worker day if you actually retrain muscle memory.
- Best for: Exec assistants, sales leaders, anyone who lives inside slide decks, PDFs, and browser tabs.
Executive Action Steps
- Install the app at gemini.google/mac. The default Option+Space shortcut does not collide with Spotlight (which is Cmd+Space), but it can collide with launchers such as Alfred and Raycast, and with any window manager bound to the same keys. Check and remap before you roll out.
- Pick three recurring moments in your day where you currently copy something into a browser tab. Retrain to Option+Space instead. Measure in a week.
- Before rolling out company-wide, check your DLP posture. Gemini can see whatever is on screen when invoked.
- Treat this as a dress rehearsal. Apple Intelligence and Copilot desktop are converging on the same OS-level pattern.
Why it matters: When AI is always one shortcut away, the bottleneck stops being "did you open the tab" and becomes "do your people know what to ask for."
Microsoft Agent 365 makes agents a budget line, not a science project
Microsoft confirmed that Agent 365, its control plane for AI agents, hits general availability on May 1 as part of the new Microsoft 365 E7 suite. E7 bundles E5, Microsoft 365 Copilot, the Entra Suite, and Agent 365. Translation: Microsoft now sells a single SKU that says "humans plus governed agents, one contract, one admin console."
Business Impact
- Eliminates: The "one-off shadow agent" problem. If it runs inside your tenancy, Agent 365 sees it, and your CISO can log, gate, or kill it.
- Time back: Removes the typical 60 to 90 day security review for each new agent pilot.
- Best for: Any organization with more than one agent in production and a security team that has started asking hard questions.
Executive Action Steps
- Get an answer from your Microsoft rep on E7 upgrade pricing this week. Agent 365 is not sold standalone, so the SKU math matters.
- Inventory every agent running in your environment today, including Copilot Studio, Power Automate, Zapier, and homegrown scripts.
- Designate an "agent owner" role on your IT org chart before May 1. Not a committee. One person accountable for policy, cost, and incident response.
- Benchmark E7 against the Anthropic Enterprise plus Okta plus OpenTelemetry stack.
Why it matters: The AI budget conversation in 2025 was "how many seats of Copilot?" In 2026 it becomes "how do we govern a thousand agents we did not build?" Microsoft just priced the answer.
Claude Opus 4.7 plus Routines: your first 24/7 AI workflow
Anthropic shipped two releases in the same week and they are better together. Opus 4.7 went generally available on April 16 with stronger long-horizon performance at the same sticker price. Routines hit research preview the same week: scheduled runs, API-triggered runs, and GitHub webhook-triggered runs, all on top of Opus. The model that finally handles ten-plus-step agentic loops without silently collapsing now has the infrastructure to run them while you sleep. The overnight agent is no longer a slide; it is a weekend build.
Business Impact
- Eliminates: The "is this agent loop going to fail silently at step six?" hedge that kept unattended work gated behind humans. Roughly one-third the tool-call errors of Opus 4.6 on complex chains.
- Time back: 15 to 20 minutes per morning for any executive who used to start the day flailing through inboxes, dashboards, and Slack backlogs before the first meeting.
- Best for: Anyone who has wanted a 5 a.m. chief-of-staff agent, a GitHub PR review bot that actually posts a usable draft, or a Friday close-the-week routine that runs without you.
Executive Action Steps
- Upgrade your Anthropic account to a tier with Routines access (Pro or above in research preview) and confirm the Routines dashboard loads.
- Ship one routine this weekend, not three. Wire it as Scheduled and also expose it as an API endpoint for manual reruns. Set a daily task budget.
- Run it manually three mornings in a row before you flip the schedule on. The goal is confidence, not scope.
- Budget roughly 35% more real tokens on code-heavy prompts. Pricing is unchanged at $5 in and $25 out per million, but the new tokenizer counts more tokens for the same code. Cap spend in the dashboard and log cost per run.
Why it matters: Opus 4.7 makes multi-step agentic loops reliable. Routines makes them scheduled, callable, and event-driven. Together they are the first credible setup for a 24/7 AI workflow you can ship before Monday.
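The token-budget step is simple arithmetic, sketched below. The prices and the roughly 35% figure come from the steps above; applying the inflation to both input and output tokens is an assumption, and `DailyBudget` is a hypothetical local bookkeeping helper, not the Routines dashboard cap or any Anthropic API.

```python
INPUT_PRICE_PER_MTOK = 5.00    # quoted price, dollars per million input tokens
OUTPUT_PRICE_PER_MTOK = 25.00  # quoted price, dollars per million output tokens
CODE_TOKEN_INFLATION = 1.35    # the ~35% figure above, applied as an assumption


def estimate_run_cost(old_input_tokens: int,
                      old_output_tokens: int,
                      code_heavy: bool = True) -> float:
    """Estimate dollars per run from token counts measured on the old tokenizer."""
    factor = CODE_TOKEN_INFLATION if code_heavy else 1.0
    return (old_input_tokens * factor * INPUT_PRICE_PER_MTOK
            + old_output_tokens * factor * OUTPUT_PRICE_PER_MTOK) / 1_000_000


class DailyBudget:
    """Refuse runs that would push the day's spend past the cap; log the rest."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.log: list[tuple[str, float]] = []

    def try_spend(self, run_name: str, cost_usd: float) -> bool:
        if self.spent_usd + cost_usd > self.cap_usd:
            return False
        self.spent_usd += cost_usd
        self.log.append((run_name, cost_usd))
        return True


# Hypothetical routine: 100k input and 10k output tokens on the old tokenizer.
cost = estimate_run_cost(100_000, 10_000)
budget = DailyBudget(cap_usd=2.00)

first = budget.try_spend("morning-briefing", cost)
second = budget.try_spend("manual-rerun", cost)

print(f"estimated cost per run: ${cost:.4f}")
print(f"first run allowed:  {first}")
print(f"second run allowed: {second}")
print(f"spent today: ${budget.spent_usd:.4f}")
```

With these assumed numbers a run that used to cost $0.75 is budgeted at about $1.01, so a $2.00 daily cap covers one run but not a manual rerun. Size the cap for the reruns you actually expect.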
The Opus 4.7 Friday review that reads your calendar, inbox, meetings, Slack, and deals on its own
Connect five MCP sources once (Calendar, Gmail or Outlook, Granola or Fireflies, Slack or Teams, and HubSpot), then run this prompt every Friday at 4pm. Claude Opus 4.7 fetches the week on its own. No pasting, no exports, no screenshots. Twelve minutes to a real executive review.
You are my chief of staff. Run a weekly review covering the trailing seven calendar days, ending today.
Pull the following on your own using my connected tools:
• Calendar: every meeting I attended, title, attendees, duration. Flag anything longer than 45 minutes or with more than six attendees.
• Email: threads in my "Flagged" or "Follow-up" label, plus any thread where I sent more than two replies.
• Meeting notes: Granola (or Fireflies / Zoom) transcripts and action items from the week.
• Slack/Teams: every direct message, mention, and thread I reacted to in the last seven days.
• HubSpot: every call logged on a deal, every deal that changed stage, every contact I touched.
Produce a review in this exact structure. No preamble.
1) Three wins.
2) Three things that slipped.
3) The single most important decision I must make before end of day Tuesday next week.
4) People I owe a reply, sorted by cost of forgetting.
5) The one meeting I can kill from next week.
6) The one deal most at risk based on the week's data.
Evidence required. No hedging. Under 500 words. If a source returned nothing, say so explicitly.
Claude Cowork: the three MCP-backed routines that make the install non-negotiable
Cowork hit general availability this week. It is the only desktop AI that reads and writes your actual systems of record through MCP. The mistake most people make is treating it like a faster chat window. It is a teammate. Connect Calendar, Gmail, HubSpot, and Drive once, then run these three routines on day one. None of them require pasting or exporting anything.
- The Monday briefing. Cowork pulls your calendar, your flagged email threads, and your top HubSpot deals on its own and writes a one-page briefing to your workspace folder. Open in any editor and edit for five minutes. Weekly spine done.
- The MCP inbox triage. Cowork reads your inbox directly, sorts by urgency using Calendar context (who you are meeting this week matters), and drafts two-sentence replies to urgent threads. You approve or revise inside the workspace. Gmail Send stays off.
- The deal pulse check. Cowork queries HubSpot for deals changed this week, cross-references meeting transcripts via Granola or Fireflies, and writes a one-page risk report. Catches the deal that slipped a week before the rep reports it.
The Apple Watch sleep briefing that turns last week into one change
Most of us wear an Apple Watch to bed, glance at the sleep score, and do nothing with it. One night is never the signal. The pattern across a week is. Here is the whole ritual.
The 60-second export
Open the Health app on iPhone. Tap Browse, then Sleep. Switch the view to "W" for week. Screenshot it. Scroll down to "Show More Sleep Data" and screenshot the averages block. Two screenshots, 60 seconds.
The prompt
Here are two screenshots from Apple Health: a week view of sleep stages from my Apple Watch and the averages block underneath. Do not diagnose anything. Do this:
1. Summarize the week in three sentences: average total sleep, consistency, most unusual night.
2. Name one pattern across the seven nights (bedtime drift, REM dip, fragmented nights).
3. Suggest ONE behavior change to test next week. Specific, measurable, free. Phrase it as "this week, try..."
4. Name the one thing I should watch in next week's data to know whether the change worked.
Rules: no hedging, no "consult a doctor" boilerplate, no lists longer than three items, total response under 250 words.
One pattern, one change, one thing to watch. Five minutes. Test next week. That is the loop.
Teach your kid how an AI “thinks” without a screen
The hardest thing for a kid (and many adults) to understand about AI is that the model follows instructions literally. The best way to teach it is not with a laptop. It is with a jar of peanut butter.
The setup
Bread, peanut butter, jelly, a butter knife, a plate, a notebook. Sit your kid down with the notebook. Tell them: "You are going to write down instructions for how to make a PB&J. I am going to be a robot. I will do exactly what you write down, nothing more and nothing less."
Play the robot literally
"Put peanut butter on bread." Set the unopened jar on top of the bread bag. They laugh, write a better rule, you find a new way to break it. Twenty minutes later the sandwich is made and the instructions are three pages long.
The four lessons
- Computers and AI do not have common sense.
- Every assumption has to be spelled out.
- Order matters.
- When an instruction is ambiguous, the result is surprising.
Then open Claude or ChatGPT together. Watch their first prompt. Ask, "Do you think the AI has common sense?" Watch the penny drop. That is AI literacy.
That is the week.
If even one of these moved your thinking, hit reply and tell me which one. Every reply lands in my personal inbox. I read all of them, I write back, and the best responses shape what ships in Issue 122.
Back Monday with the next one.
Pierre
PromptHacker.ai