Within 18 days, Google killed its pixel-level agent and Anthropic improved its own
On April 16, Anthropic shipped Claude Opus 4.7 with 1:1 pixel-coordinate Computer Use and 3.75-megapixel vision support. On May 4, Google shut down Project Mariner and moved its features to API-first agents. Two of three frontier labs just publicly disagreed on what an agent should see, and the registry layer has nothing to say about the pixel-level branch yet.
Two of the three frontier labs made structurally opposite decisions about pixel-level agent control inside an eighteen-day window in April and May 2026. On April 16, Anthropic released Claude Opus 4.7 with explicit improvements to its Computer Use capability: 1:1 pixel-coordinate mapping (Opus 4.6 required a scale-factor correction step) and vision support up to 2,576 pixels on the long edge, roughly three times the prior limit in total pixel count. On May 4, Google shut down Project Mariner, its screenshot-based web-browsing agent, citing high compute costs, reliability issues, and privacy concerns. Mariner's features were absorbed into the Gemini API and the new Gemini Agent rather than discontinued outright: what Google retired was the screenshot-driven architecture, and the capability moved to API-first agents.
Two frontier labs are now placing opposite bets on what an agent should see. The bet matters because pixel-level agent control rests on a fundamentally different trust model from API-mediated agent control, and the registry layer has nothing to say about it today.
The split, drawn out
Three events, three positions. Cloudflare, with Code Mode, bet on code-level orchestration through a sandbox. Anthropic doubled down on pixel-level operation through improved vision. Google exited pixel-level operation in favor of API integration. None of these bets is necessarily wrong; they reflect three different views of what an agent's eyes and hands should be.
What "pixel-level" actually means
A pixel-level agent operates on the screen the user sees. It takes a screenshot, reasons about what it can see, decides which pixel to click, sends a mouse or keyboard event, takes another screenshot, and loops. There is no API contract. The agent's understanding of the interface comes entirely from vision, and its actions are bound only by what an actual user could do at the same keyboard.
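The loop just described fits in a few lines. This is a minimal sketch, not Anthropic's or Google's implementation: `take_screenshot`, `decide`, and `send_input` are hypothetical stand-ins for a screen grabber, a vision-model call, and an OS input injector, and the `Click` and `TypeText` action types are invented for illustration. The point is structural: the only thing the agent receives from the interface is an image, and the only thing it returns is a human-style input event.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Click:
    x: int  # screen coordinates in pixels
    y: int


@dataclass
class TypeText:
    text: str


Action = Union[Click, TypeText]


def run_pixel_loop(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes], Optional[Action]],
    send_input: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Screenshot, decide, act, repeat; stop when decide() returns None."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = take_screenshot()  # the agent's only view of the interface
        action = decide(frame)     # a vision model picks the next input event
        if action is None:         # the model signals that the task is done
            break
        send_input(action)         # the same events a human could send
        history.append(action)
    return history


# Toy run: byte strings stand in for real screenshots.
screens = iter([b"login-screen", b"form-screen", b"done-screen"])
plan = {b"login-screen": Click(640, 360), b"form-screen": TypeText("hello")}
sent: List[Action] = []
log = run_pixel_loop(lambda: next(screens), plan.get, sent.append)
```

Nothing in the signature of `run_pixel_loop` describes what the agent can do; that is decided frame by frame inside `decide`.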
The capability that pixel-level operation makes possible is broad: a pixel-level agent can operate any application a human can, including ones with no API, ones with broken APIs, and ones whose APIs intentionally exclude automation. Together these categories are large: CAPTCHA-protected sites, SaaS dashboards built without machine consumers in mind, legacy desktop applications, and internal enterprise tools whose owners never wrote a public API. A pixel-level agent reaches all of them.
The cost of that breadth is what Google cited in shutting Mariner down. Vision tokens are expensive: a full-resolution screenshot at Opus 4.7's 2,576-pixel limit (about 3.75 megapixels) carries a significant token cost, and the agent pays it on every iteration of the loop. The agent's reliability degrades whenever the UI changes its layout. Privacy exposure scales with how much of the screen the agent has to see, which in practice is everything. Each cost is real and known; the question is whether the unique capability justifies them.
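A rough worked estimate shows why repeated screenshot rounds dominate the bill. This sketch assumes Anthropic's published rule of thumb for image cost, about width × height / 750 tokens per image, and assumes it still applies to Opus 4.7's larger images. The 2560×1440 screen, the 30-step session, and the choice between keeping or dropping old screenshots are hypothetical. It counts screenshot tokens only and ignores text, output tokens, and any caching.

```python
def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate vision tokens for one image (assumed rule: w * h / 750)."""
    return round(width_px * height_px / 750)


def session_screenshot_tokens(per_shot: int, steps: int, keep_history: bool) -> int:
    """Screenshot tokens sent over a whole session.

    keep_history=False: each request carries only the newest screenshot.
    keep_history=True:  each request re-sends every screenshot so far, so the
                        total is per_shot * (1 + 2 + ... + steps).
    """
    if keep_history:
        return per_shot * steps * (steps + 1) // 2
    return per_shot * steps


full = image_tokens(2560, 1440)  # hypothetical QHD screen at native resolution
half = image_tokens(1280, 720)   # the same screen downscaled 2x per edge

print(full, half)                                  # 4915 1229
print(session_screenshot_tokens(full, 30, False))  # 147450
print(session_screenshot_tokens(full, 30, True))   # 2285475
```

Halving each edge cuts per-screenshot cost roughly fourfold, at the expense of the fine detail a higher resolution limit exists to preserve. Re-sending every prior screenshot makes total cost grow with the square of session length, which is one reason to drop older frames.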
The trust model is different in kind
The API-mediated agent stack we've covered across this series (MCP servers, A2A signed cards, ERC-8004 identities, Cloudflare Code Mode) shares a structural property. The agent's available actions are bounded by a published specification, however informal that specification turns out to be in practice. A registry can probe what the spec claims and publish what the probe found. The probe is the load-bearing claim and the spec is the artifact a consumer can read.
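Because a spec-bounded agent's actions form a finite, named list, the probe can be as simple as a set difference between what is advertised and what is live. A minimal sketch, assuming the declared set comes from a published card or README and the observed set from a live listing (for an MCP server, its `tools/list` response); the tool names and the `ProbeReport` shape are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ProbeReport:
    confirmed: FrozenSet[str]      # advertised and actually exposed
    declared_only: FrozenSet[str]  # advertised but missing at probe time
    undeclared: FrozenSet[str]     # exposed but never advertised


def probe_against_spec(declared: set, observed: set) -> ProbeReport:
    """Compare a published action list with what a live probe found."""
    return ProbeReport(
        confirmed=frozenset(declared & observed),
        declared_only=frozenset(declared - observed),
        undeclared=frozenset(observed - declared),
    )


report = probe_against_spec(
    declared={"search_issues", "create_issue", "close_issue"},
    observed={"search_issues", "create_issue", "delete_repository"},
)
print(sorted(report.undeclared))  # ['delete_repository']
```

The `undeclared` set is what a registry would flag. The technique works only because both inputs are enumerable strings.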
Pixel-level agent control inverts that property. The agent's available actions are bounded only by what a pixel-level human-input loop allows on the target machine. There is no spec to probe. There is no card to sign. The capability surface is "the screen," and the screen is whatever the operating system happens to render at the moment the agent looks at it. A consumer cannot ask a registry "what will this agent do" in the way they can ask of an MCP server, because the answer depends on what the agent sees in the moment.
That distinction has direct security implications. The supply chain attack pattern from the previous post relied on injecting a rogue MCP server into a config file. The pixel-level equivalent would be UI redress: a malicious page that visually impersonates the page the agent expects to see. Browser sandboxing partly defends against this for web-only agents. Desktop-wide computer-use agents inherit the same problem at OS scope, and the defenses are less mature.
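For web-only agents, part of the defense is to gate every input event on the origin the browser reports rather than on what the page looks like, since a lookalike page can copy pixels but not its origin. A minimal sketch under that assumption; `origin_of`, `safe_to_act`, and the allowlist are hypothetical and not a feature of any specific agent. A desktop-wide agent has no single equivalent field to check.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """scheme://host[:port], taken from the parsed URL, never from the screenshot."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, with any user@ prefix stripped
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}"


def safe_to_act(current_url: str, allowed_origins: set) -> bool:
    """Allow input only if the browser-reported origin is on the task's allowlist."""
    return origin_of(current_url) in allowed_origins


allowed = {"https://bank.example"}
for url in (
    "https://bank.example/transfer",           # the page the agent expects
    "https://bank.example.evil.test/transfer", # lookalike subdomain
    "https://bank.example@evil.test/transfer", # userinfo trick; host is evil.test
):
    print(url, safe_to_act(url, allowed))
```

The two lookalike URLs could render a pixel-identical page; only the parsed origin tells them apart.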
What this means for the registry layer
A useful question for the next year of agent infrastructure is what a registry would even publish about a pixel-level agent. The signal a consumer needs is not "this agent's tool catalog" but something closer to "this agent has been observed to behave safely in the following pixel-level environments under the following input policies." That kind of probe data is harder to gather than an MCP card probe. It requires running the agent against real environments and recording the actions it takes.
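One possible probe shape, sketched under heavy assumptions: run the agent against a fixed set of recorded environments, log every input event it emits, and check each event against a declared input policy. Here screens are plain strings standing in for screenshots, the agent is a toy function, and `InputPolicy` and `ProbeRecord` are hypothetical structures, not an existing registry format. What the sketch shows is the output: a per-environment record of observed behavior rather than a catalog of declared tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

Event = Tuple[str, str]  # (kind, detail), e.g. ("click", "primary-button")


@dataclass(frozen=True)
class InputPolicy:
    name: str
    allowed_kinds: FrozenSet[str]  # event kinds the agent may emit


@dataclass
class ProbeRecord:
    environment: str
    policy: str
    events: List[Event]
    violations: List[Event] = field(default_factory=list)


def probe_pixel_agent(
    agent: Callable[[str], List[Event]],
    environments: Dict[str, str],
    policy: InputPolicy,
) -> List[ProbeRecord]:
    """Show the agent each environment and record what it does there."""
    records = []
    for env_name, screen in environments.items():
        events = agent(screen)
        bad = [e for e in events if e[0] not in policy.allowed_kinds]
        records.append(ProbeRecord(env_name, policy.name, events, bad))
    return records


def toy_agent(screen: str) -> List[Event]:
    """Clicks the primary button and types into any text field it sees."""
    events: List[Event] = [("click", "primary-button")]
    if "text field" in screen:
        events.append(("type", "hello"))
    return events


no_typing = InputPolicy("no-typing", frozenset({"click", "scroll"}))
records = probe_pixel_agent(
    toy_agent,
    {"settings-page": "a Save button", "search-page": "a text field and a Go button"},
    no_typing,
)
for r in records:
    print(r.environment, r.policy, len(r.events), r.violations)
```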
Anthropic's Opus 4.7 announcement addresses part of the registry question by specifying automated real-time cybersecurity safeguards that interpose between the agent and the actions it tries to take. That is a runtime guardrail, not a registry signal. The two compose: a registry can publish which guardrails an agent ships with, and a probe can publish what behavior emerges when those guardrails meet the screen.
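The composition can be illustrated with a wrapper that sits between the agent and the input injector. This is a hypothetical sketch, not how Anthropic's safeguards work: each rule either returns a reason to block or `None`, and `no_destructive_shell` and `guarded` are invented names. The rule names form a manifest a registry could publish; the blocked-action log is what a probe would observe when the guardrail meets real screens.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Dict[str, str]
Rule = Callable[[Action], Optional[str]]


def no_destructive_shell(action: Action) -> Optional[str]:
    """Hypothetical rule: refuse to type a recursive force-delete."""
    if action.get("kind") == "type" and "rm -rf" in action.get("text", ""):
        return "destructive shell command"
    return None


def guarded(send_input: Callable[[Action], None], rules: List[Rule]):
    """Wrap send_input so every action passes every rule before it executes."""
    blocked: List[Tuple[Action, str]] = []

    def send(action: Action) -> bool:
        for rule in rules:
            reason = rule(action)
            if reason is not None:
                blocked.append((action, reason))
                return False  # the action never reaches the screen
        send_input(action)
        return True

    manifest = [rule.__name__ for rule in rules]  # publishable: what ships
    return send, blocked, manifest


executed: List[Action] = []
send, blocked, manifest = guarded(executed.append, [no_destructive_shell])
send({"kind": "click", "x": "10", "y": "20"})
send({"kind": "type", "text": "rm -rf ~/projects"})
print(manifest, len(executed), blocked)
```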
Agenstry's funnel approach was built for spec-bounded API agents. The pixel-level branch of the agent stack needs a different probe shape, one that is observable but that no registry publishes today.
What we're watching
Three things, observable within the next two quarters:
- Whether OpenAI picks a side. Anthropic ships Computer Use; Google retreats from Mariner. OpenAI has shipped sandboxing for API-mediated agents and has released screenshot-driven browsing agents as research previews, but has not publicly committed to a pixel-level production offering comparable to Anthropic's. The next GPT-class release will indicate whether OpenAI sees Computer Use as a load-bearing capability or as a research direction.
- Whether vendors' published costs for pixel-level agent sessions converge. Vision tokens, repeated screenshot rounds, and the runtime safeguards Anthropic now ships all cost real money. The first transparent per-session pricing model for pixel-level agents will set the economic baseline for the rest of the field.
- Whether a public probe registry for pixel-level agents emerges. The probe shape differs from an MCP probe ("what does this agent do when shown the following screen"), but the measurement is concrete and the population is small enough to sample. The first registry that publishes that data will define what pixel-level agent transparency looks like.
The headline reading of the last eighteen days is "Google killed an AI agent." The structural reading is that two of three frontier labs just publicly disagreed about what an agent's eyes should be. The disagreement is real, the trade-offs are real, and the registry layer that has spent the past year cataloging the API-mediated branch has no equivalent for the pixel-level branch yet. The work that closes that gap is the work the next chapter of agent infrastructure has to do.
Sources
- Introducing Claude Opus 4.7 — Anthropic, April 16, 2026.
- What's new in Claude Opus 4.7 — Claude API Docs, accessed May 2026.
- Project Mariner is dead, but Google's browser-controlling AI plans are not — TechSpot, May 2026.
- Google Kills Project Mariner as the Industry Pivots to API-First Agents — AI2Work, May 2026.
- Anthropic Releases Claude Opus 4.7 with Automated Real-Time Cybersecurity Safeguards — Cybersecurity News, April 16, 2026.
- Computer use tool — Claude API Docs — Anthropic, accessed May 2026.