Google’s Cloud AI leads on the three frontiers of model capability

Michael Gerstenhaber is a product VP at Google Cloud, where he primarily works on Vertex AI. This is the company’s unified platform for deploying enterprise AI. His role provides a high-level view of how companies are actually using AI models and what still needs to be done to unleash the potential of agentic AI.

In a conversation with Gerstenhaber, one idea stood out as particularly novel. He explained that AI models are simultaneously pushing against three frontiers. The first is raw intelligence. The second is response time. The third is less about raw capability than about cost: whether a model can be deployed cheaply enough to run at massive, unpredictable scale. This framework offers a valuable way to think about model capabilities, especially for anyone trying to push frontier models in a new direction.

Gerstenhaber shared his background and current work. He has been in AI for about two years, spending a year and a half at Anthropic before joining Google almost half a year ago. He runs Vertex AI, Google’s developer platform. Most customers are engineers building their own applications who want access to agentic patterns, an agentic platform, and inference from the world’s smartest models. Gerstenhaber’s team provides that infrastructure, while companies like Shopify and Thomson Reuters build the actual applications for their domains.

He was drawn to Google because of its unique vertical integration. The company controls everything from the infrastructure layer, building data centers and power plants, to its own chips, models, and inference layer. It also controls the agentic layer with APIs for memory and interleaved code writing, an agent engine for compliance and governance, and consumer and enterprise chat interfaces like Gemini. This comprehensive control is a key strength.

When considering the competitive landscape, Gerstenhaber sees three distinct boundaries for model capabilities. Models like Gemini Pro are tuned for raw intelligence, ideal for tasks like code writing where the best possible output is the priority, regardless of time.

A second boundary is latency. For applications like customer support, where an agent needs to apply a policy or answer a question, intelligence is required but speed is critical. The most intelligent response is useless if it arrives after the customer has hung up.

The third boundary is cost at scale. A company like Reddit or Meta, which needs to moderate vast amounts of content, requires a model intelligent enough for the task but also cheap enough to scale against an unpredictable volume of work. For these cases, cost becomes the paramount concern.
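The three boundaries above amount to a selection problem: given an application's latency and cost constraints, pick the smartest model that still satisfies them. The sketch below illustrates that logic in Python. The tier names, quality scores, latencies, and prices are all invented for illustration; they are not real Google models or real Vertex AI pricing.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model tiers mirroring the three frontiers Gerstenhaber
# describes. All names and numbers are invented for illustration.
@dataclass
class ModelTier:
    name: str
    quality: int            # relative intelligence (higher is better)
    p95_latency_s: float    # typical 95th-percentile response time
    cost_per_1k_tokens: float

TIERS = [
    ModelTier("frontier", quality=10, p95_latency_s=20.0, cost_per_1k_tokens=0.050),
    ModelTier("fast",     quality=7,  p95_latency_s=2.0,  cost_per_1k_tokens=0.010),
    ModelTier("scale",    quality=5,  p95_latency_s=1.0,  cost_per_1k_tokens=0.001),
]

def pick_tier(max_latency_s: Optional[float] = None,
              max_cost: Optional[float] = None) -> ModelTier:
    """Return the smartest tier that satisfies the caller's latency/cost caps."""
    candidates = [
        t for t in TIERS
        if (max_latency_s is None or t.p95_latency_s <= max_latency_s)
        and (max_cost is None or t.cost_per_1k_tokens <= max_cost)
    ]
    if not candidates:
        raise ValueError("no tier satisfies the given constraints")
    return max(candidates, key=lambda t: t.quality)

# Code writing: best possible output, regardless of time.
print(pick_tier().name)                       # frontier
# Customer support: answer before the customer hangs up.
print(pick_tier(max_latency_s=3.0).name)      # fast
# Content moderation: cost dominates at unpredictable volume.
print(pick_tier(max_cost=0.002).name)         # scale
```

The point is not the specific numbers but the shape of the decision: each use case fixes a different constraint, and the optimal model falls out of whichever frontier binds.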

Regarding the slower-than-expected adoption of agentic AI systems, Gerstenhaber notes that the technology is only about two years old and still lacks essential infrastructure: established patterns for auditing agent actions or authorizing data access on an agent's behalf do not yet exist. Production deployment always lags technological capability, and two years is not enough time for the models' full intelligence to manifest in live systems.

He observes that agentic AI has moved uniquely quickly in software engineering because it fits neatly into the existing development lifecycle. Safe dev environments, promotion to test, and human-in-the-loop processes like code audits make implementation low-risk. The challenge is to develop similar safe and structured patterns for other professions and use cases.