Google Managed Agents API Launches Isolated AI Sandboxes

Google's Managed Agents API lets developers spin up isolated Linux AI environments via one API call, powered by Gemini 3.5 Flash and built on Antigravity.

Key Takeaways

  • Single API call provisions full agent stack: Managed Agents spins up an isolated Linux environment with code execution, Google Search, and web browsing, eliminating the 40-60% of agent engineering effort previously spent on infrastructure setup
  • Gemini 3.5 Flash scores 76.2% on Terminal-Bench 2.1: the highest published benchmark score on the primary agentic evaluation framework, giving Google a measurable model quality advantage at launch
  • Per-compute-minute pricing for enterprise: billing on active work time rather than tokens directly addresses the cost unpredictability complaint that was Agentforce customers' top concern in Q1 2026
  • AGENTS.md and SKILL.md extensibility: file-based, version-controllable agent definitions align with the developer workflow convention being standardized across the industry, lowering skill transfer cost between platforms
  • Google Cloud enterprise AI at $9.4 billion annually: Managed Agents converts one-time API transactions into sustained infrastructure revenue, making every successful enterprise agent deployment a recurring contract rather than a usage spike

Building an AI agent that can actually do things has required, until today, a real infrastructure investment: a provisioned server, a containerized execution environment, a security boundary, file system access, web browsing capability, and a cleanup routine that prevents sandbox state from leaking between runs. At Google I/O 2026, that entire stack became a single API call. Google's Managed Agents in the Gemini API provisions all of it on demand, wraps it in an isolated Linux environment, and tears it down when the task completes. The developer writes a prompt. The rest is Google's problem.

What Actually Happened

Google launched Managed Agents in the Gemini API at I/O 2026, making it generally available to developers via the Interactions API and in Google AI Studio. A single API call provisions a remote Linux environment where an agent can reason, plan, call tools, execute code, manage files, and browse the web, all inside an isolated sandbox hosted by Google. The experience is powered by the Antigravity agent, which Google built on Gemini 3.5 Flash, the model that posted the highest scores on the agentic benchmarks Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo), and MCP Atlas (83.6%) at I/O 2026. Every provisioned agent instance starts with three default tools: code execution, Google Search, and URL context. The environment is ephemeral, meaning state does not persist between calls unless developers explicitly configure storage, preventing the class of security failures that plagued early multi-session agent deployments.
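To make the ephemeral-environment idea concrete, here is a minimal local analogy in Python. It is not Google's implementation and it is not a security boundary: it only shows the provision, run, and teardown lifecycle, using a throwaway directory in place of a remote Linux sandbox. The function name `run_in_ephemeral_workspace` and the marker file `state.txt` are invented for this sketch.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_ephemeral_workspace(code: str, timeout_s: int = 10) -> str:
    """Run code in a throwaway directory that is deleted when the run ends."""
    # "Provision": a fresh working directory for this run only.
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        script = Path(workdir) / "task.py"
        script.write_text(code, encoding="utf-8")
        # "Execute": a separate interpreter process, confined to workdir as cwd.
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
        return result.stdout
    # "Teardown": leaving the with-block removes workdir and everything in it.


# The task reports whether a previous run left state behind, then leaves some.
task = (
    "from pathlib import Path\n"
    "marker = Path('state.txt')\n"
    "print('leaked' if marker.exists() else 'clean')\n"
    "marker.write_text('left over from this run')\n"
)

first = run_in_ephemeral_workspace(task)
second = run_in_ephemeral_workspace(task)
print(first.strip(), second.strip())
```

Both runs report a clean workspace because the file written by the first run is destroyed with its directory. A hosted sandbox would add process, network, and filesystem isolation on top of this lifecycle.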

The API is extensible through a file-based instruction system. Developers can define custom agent behavior by creating AGENTS.md and SKILL.md configuration files and registering them as named managed agents via the Gemini API. An AGENTS.md file sets the agent's persona, capabilities, and operating constraints. A SKILL.md file defines reusable task modules the agent can invoke mid-run. This structure is functionally identical to the CLAUDE.md convention that Claude Code popularized for coding agents, a clear signal that Google is targeting the same developer workflow. Custom agents built on this system can be deployed through the Gemini API, shared across a team, or published to the Google AI Studio agent library. The entire system is version-controlled, meaning agent definitions can be managed in the same repository as application code rather than in a separate configuration database.
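As a rough illustration of what keeping agent definitions in the application repository could look like, the Python sketch below writes an AGENTS.md and a SKILL.md file into a repository directory. The file contents, section headings, and the helper `write_agent_definition` are hypothetical: the article does not describe the actual schema, and the registration call to the Gemini API is deliberately left out because its shape is not given here.

```python
import tempfile
from pathlib import Path

# Hypothetical contents: the real schema is not documented in this article.
AGENTS_MD = """\
# Release notes agent

## Persona
You are a careful release engineer.

## Capabilities
- Read the repository changelog
- Run the test suite

## Constraints
- Never push to the main branch
"""

SKILL_MD = """\
# Skill: summarize-changelog

Summarize the entries added since the last tag in five bullets or fewer.
"""


def write_agent_definition(repo_root: Path) -> list[Path]:
    """Write both definition files next to application code; return their paths."""
    files = {"AGENTS.md": AGENTS_MD, "SKILL.md": SKILL_MD}
    written = []
    for name, body in files.items():
        path = repo_root / name
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


# A temporary directory stands in for a real git repository.
repo = Path(tempfile.mkdtemp(prefix="agent-repo-"))
paths = write_agent_definition(repo)
print([p.name for p in paths])
```

Because the definitions are ordinary text files, they can be committed, diffed, and reviewed like any other source file.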

Enterprise access is available through the Gemini Enterprise Agent Platform, which entered preview alongside the developer API launch. Enterprise-tier Managed Agents add audit logging, resource quotas, access controls, and integration with Google Cloud's identity management system. Enterprises can configure agents to run inside a VPC perimeter that prevents data exfiltration, satisfying the data residency requirements that have blocked cloud AI adoption in regulated industries. Google confirmed that enterprise Managed Agents are priced per active compute minute rather than per token, aligning incentives between the developer and the platform: you pay for the time the agent spends working, not for the number of tokens it generates while reasoning about what to do next. The pricing model is a direct response to the complaint that token-based pricing makes agentic workflows economically unpredictable because agents generate large numbers of intermediate reasoning tokens that add cost without adding user-visible value.

Why This Matters More Than People Think

The removal of infrastructure friction is not just a developer convenience story. It widens the pool of people who can build production-grade AI agents from a small set of well-funded engineering teams to essentially any developer with a Gemini API key. Before Managed Agents, shipping an agent that could reliably execute code, browse the web, and handle file operations required maintaining containerized infrastructure, writing security policies for untrusted code execution, implementing timeout and retry logic, and debugging the interaction between the agent's reasoning and the execution environment. That work alone consumed 40 to 60% of the engineering effort in early agent projects, based on reported timelines from companies that shipped agents in 2024 and 2025. Managed Agents eliminates that entire workload. The time-to-first-functional-agent drops from weeks to hours.

The benchmark numbers for Gemini 3.5 Flash, which powers the Antigravity agent underneath Managed Agents, are the hidden story of this announcement. Terminal-Bench 2.1 measures an agent's ability to complete real shell tasks in a live Linux environment, the exact skill set needed for Managed Agents. A 76.2% success rate on Terminal-Bench 2.1 means the underlying model completes roughly three out of four of the benchmark's terminal tasks without human intervention. The MCP Atlas benchmark, which tests tool use across the Model Context Protocol, is the emerging standard for evaluating agent-native capability in 2026. Scoring 83.6% on MCP Atlas positions Gemini 3.5 Flash as the strongest publicly benchmarked model on agentic tasks as of this announcement. Developers building on Managed Agents are not just buying infrastructure convenience; they are accessing the highest-performing agentic model currently available through any major provider's API.

The per-compute-minute pricing model for enterprise Managed Agents deserves closer examination because it signals a fundamental rethinking of how AI infrastructure should be billed. Token pricing creates a perverse incentive: models that reason extensively before acting generate more billable tokens, which means more thoughtful agents cost more than impulsive ones. Compute-minute pricing inverts this: an agent that completes a task in 30 seconds costs less than one that takes 3 minutes, regardless of how many tokens it generated internally. This rewards efficiency and enables developers to build agents that reason as thoroughly as needed without watching the token meter. Salesforce reported in its Agentforce Q1 2026 earnings commentary that token cost unpredictability was the single largest complaint from enterprise Agentforce customers. Google's pricing choice is a direct market attack on that vulnerability.
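The inversion described above can be checked with a small worked example in Python. Every number in it is a made-up placeholder chosen for illustration; neither rate is a published Google or competitor price, and the two sample runs are hypothetical.

```python
from dataclasses import dataclass

# Placeholder rates for illustration only; not published prices.
TOKEN_PRICE_PER_1K = 0.002    # hypothetical dollars per 1,000 tokens
COMPUTE_PRICE_PER_MIN = 0.05  # hypothetical dollars per active compute minute


@dataclass
class AgentRun:
    name: str
    tokens: int          # total tokens generated, including intermediate reasoning
    active_seconds: int  # time the sandbox spent actively working


def token_cost(run: AgentRun) -> float:
    """Cost under per-token billing."""
    return run.tokens / 1000 * TOKEN_PRICE_PER_1K


def compute_cost(run: AgentRun) -> float:
    """Cost under per-compute-minute billing."""
    return run.active_seconds / 60 * COMPUTE_PRICE_PER_MIN


runs = [
    AgentRun("heavy reasoning, 30 seconds", tokens=200_000, active_seconds=30),
    AgentRun("light reasoning, 3 minutes", tokens=20_000, active_seconds=180),
]

for run in runs:
    print(f"{run.name}: token=${token_cost(run):.3f} compute=${compute_cost(run):.3f}")
```

With these placeholder rates, the heavy-reasoning run is the more expensive one under token billing and the cheaper one under compute-minute billing, which is the reversal the paragraph describes. Real rates would change the magnitudes but not the direction of the comparison for these two runs.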

The Competitive Landscape

The direct competitors for Managed Agents are OpenAI's Codex API, which provides a similar remote code execution environment, and Amazon Bedrock Agents, which offers managed agent orchestration on AWS infrastructure. OpenAI Codex executes code in isolated containers and has demonstrated strong software engineering benchmarks, but it focuses narrowly on coding tasks rather than general-purpose reasoning and web interaction. Amazon Bedrock Agents integrates deeply with AWS services and is the default choice for enterprises already committed to AWS infrastructure, but the agent definition model requires more configuration overhead than Google's AGENTS.md file approach. Microsoft's Azure AI Agent Service, announced at Build 2026, targets the same enterprise segment but with a focus on integration with Office 365 data rather than open-ended web browsing and code execution. The competitive gap that Managed Agents occupies is the intersection of general-purpose reasoning, live web access, and managed infrastructure at developer-friendly pricing.

The historical parallel worth studying is AWS Lambda's 2015 launch, which eliminated server provisioning for functions and triggered a wave of serverless architecture adoption that took three years to become mainstream but ultimately restructured how backend services were built. Lambda's initial adoption was slow because developers underestimated the operational convenience and overestimated the performance trade-offs. Managed Agents is facing an identical adoption curve challenge: developers who have already invested in custom agent infrastructure are unlikely to migrate immediately, and developers who have not yet built agents will evaluate multiple options rather than defaulting to Google's. The path to dominance runs through the second group, the developers building their first real agents now, who have no sunk costs and will default to the lowest-friction option. That is precisely the segment that Managed Agents is optimized to capture.

Critics argue, however, that Google's managed infrastructure creates a dependency that is harder to escape than a standard API integration. An agent built on Managed Agents runs inside Google's Linux environment, uses Google's search tools, and generates logs stored in Google's systems. The portable part is the AGENTS.md instruction file; the execution infrastructure is entirely Google-controlled. A developer who needs to migrate to a different provider for cost, performance, or regulatory reasons must rebuild the entire execution layer from scratch. The OpenAI SDK and the Anthropic SDK both allow the same agent code to run on self-hosted infrastructure by swapping the model client, while Managed Agents intentionally abstracts away the infrastructure layer in a way that makes portability structurally difficult. For startups building their first agent product, that trade-off is probably worth taking. For enterprises with five-year infrastructure commitments and vendor management policies, it deserves careful analysis before adoption.

Hidden Insight: The Agent Platform Wars Are Really Infrastructure Wars

The AI industry has framed the competitive dynamic of 2026 as a model quality race: whose reasoning is best, whose benchmarks are highest, whose safety record is cleanest. Managed Agents reveals that the real competition is at the infrastructure layer. Google is not just selling a better model. It is selling a platform where the hardest parts of building production agents, the execution environment, the security boundary, the web access, the state management, are owned by Google and billed as a utility. The developer who builds on Managed Agents is not writing infrastructure code; they are writing business logic. That is a qualitatively different relationship between developer and platform than any previous AI product has offered. The question for the next 24 months is whether the developer who builds on Managed Agents today is still comfortable building on it when their product has 10 million users and Google controls the execution environment their business depends on.

The AGENTS.md and SKILL.md extensibility system is more strategically important than Google's documentation suggests. By adopting a file-based, version-controllable instruction format, Google is positioning agent definitions as code artifacts rather than configuration records in a proprietary database. This means agent definitions can be reviewed in pull requests, deployed through CI/CD pipelines, and rolled back with standard git tooling. The developer who writes AGENTS.md today will write a similar file for Claude Code, for GitHub Copilot agents, and for any future agentic platform, because the concept of a markdown-based agent instruction file is becoming the de facto standard across the industry. Google is not inventing this standard; it is adopting and accelerating it, which is the smarter move. Anthropic's CLAUDE.md convention and Google's AGENTS.md convention will converge toward a common format, and the developer who learns either will transfer skills to the other immediately.

The Gemini Enterprise Agent Platform preview, released simultaneously with the developer API, reveals Google's actual business target. Developer adoption drives headline announcements; enterprise contracts drive revenue. The enterprise version's compute-minute pricing, VPC perimeter support, and identity management integration are not developer conveniences; they are the features that enterprise procurement teams require before signing a platform contract. Google Cloud's enterprise AI revenue grew from a negligible base in 2023 to approximately $9.4 billion annually by Q4 2025. Every enterprise that builds production agents on Managed Agents commits its agent workloads to Google Cloud infrastructure, which means every successful agent deployment is a sustained revenue stream rather than a one-time API transaction. Google is building a recurring revenue business on top of a developer convenience product, and the two reinforce each other in a way that pure model API businesses do not.

The benchmark position matters more than most commentary has acknowledged. Terminal-Bench 2.1 and MCP Atlas are relatively new evaluation frameworks, but they are quickly becoming the standards by which enterprise buyers evaluate agent platforms. A procurement team evaluating Managed Agents against Amazon Bedrock Agents will increasingly rely on these benchmarks as proxies for real-world performance, the same way cloud buyers used SPECint benchmarks for servers in the 2000s. Google has the highest published scores on both frameworks as of this announcement. The company that leads the benchmark leaderboard when a standard solidifies tends to hold that position for several years, because product roadmaps and enterprise contracts form around the assumption of continued leadership. Google needs to maintain the benchmark advantage for 12 to 18 months to convert it into durable market share.

What to Watch Next

The 30-day signal is developer adoption velocity in Google AI Studio. Google AI Studio's active user base was approximately 4.2 million developers as of I/O 2026 based on published figures. If Managed Agents shows 500,000 or more active agent projects within 30 days, it validates the hypothesis that removing infrastructure friction alone is sufficient to unlock a wave of new agent development. Watch Google's developer blog and the Gemini API changelog in June and July 2026 for usage milestone announcements, as Google has historically published early adoption numbers for major API features when they are strong. A slow start, below 100,000 projects in the first month, would suggest that the developer market for general-purpose managed agents is smaller than Google's internal projections estimated.

The 90-day signal is the first wave of enterprise Gemini Enterprise Agent Platform contract announcements. Google typically publishes customer references within 60 to 90 days of enterprise preview launches when early customer use cases are compelling enough to quote. Look specifically for announcements from financial services and healthcare customers, the two verticals where data residency requirements have historically been the largest blocker to cloud AI adoption. A financial services customer using Managed Agents inside a VPC perimeter would validate the compliance positioning more credibly than any Google marketing claim. Microsoft's Build 2026 announcements of Office 365 Agent Mode and Azure AI Foundry Agent Orchestrator mean that Google has a tight window to sign enterprise anchor customers before Microsoft's enterprise AI relationships create switching cost.

The 180-day indicator is whether OpenAI responds with a comparable managed execution product. OpenAI's Codex API offers remote code execution but lacks the general-purpose web browsing and file management that define Managed Agents. OpenAI's product roadmap for H2 2026 has not publicly addressed the infrastructure layer in the way that Google's I/O 2026 announcements did. If OpenAI launches a full-stack managed agent environment by Q3 2026, it signals that the infrastructure layer is the real competitive front and that both companies recognize it. If OpenAI does not respond, it suggests the company believes model quality alone is sufficient to retain developer loyalty even as Google removes the operational friction that currently discourages agent development. That is the strategic bet investors should watch most carefully heading into OpenAI's anticipated Q4 2026 IPO.

The developer who builds on Managed Agents is not writing infrastructure code; they are writing business logic, and that shift in what developers own changes what gets built.

Questions Worth Asking

  1. Google's Managed Agents abstracts the execution infrastructure in a way that makes portability structurally difficult. Is the developer productivity gain worth the long-term vendor dependency, especially for startups that will eventually need to negotiate pricing from a position of infrastructure lock-in?
  2. If per-compute-minute pricing rewards efficient agents and penalizes slow ones, does it create pressure to reduce the reasoning depth of agents to minimize compute time, potentially trading thoroughness for cost savings?
  3. Google controls the execution environment, the search tools, the web browsing capability, and the audit logs for every Managed Agent. At what scale does that level of platform visibility into enterprise workflows become a data advantage for Google's own AI development that competitors cannot match?