On Thursday, Google released a reimagined version of its research agent, Gemini Deep Research. This new tool is based on the company’s state-of-the-art foundation model, Gemini 3 Pro. While it can still produce detailed research reports, its primary new feature allows developers to embed Google’s deep research capabilities directly into their own applications. This integration is made possible through Google’s new Interactions API, designed to give developers more control in the emerging agentic AI era.
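The announcement does not spell out what the Interactions API looks like, so the snippet below is only a minimal sketch of how a developer might wire a research-style request into an application today using the existing google-genai Python SDK; the model identifier and the prompt are illustrative assumptions, not documented parameters of the new API.

    # Minimal sketch, NOT the documented Interactions API: this reuses the
    # existing google-genai SDK call pattern with an assumed model identifier.
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # standard SDK client setup

    # Hypothetical request: ask the model to run a multi-step research task
    # and return a synthesized report. A real Interactions API integration
    # would presumably expose intermediate steps rather than one blocking call.
    response = client.models.generate_content(
        model="gemini-3-pro",  # assumed identifier for Gemini 3 Pro
        contents="Produce a due-diligence style research report on <topic>, "
                 "citing the sources consulted.",
    )

    print(response.text)  # the synthesized research report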
The enhanced Gemini Deep Research agent is built to synthesize vast amounts of information and handle long-context prompts. Google reports that customers are already using it for a wide range of tasks, from due diligence to drug toxicity and safety research. Google also plans to integrate the agent into several of its own services soon, including Google Search, Google Finance, the Gemini app, and NotebookLM. This is another step toward a future in which AI agents, rather than humans, perform information-seeking tasks.
Google emphasizes that Deep Research inherits the strengths of Gemini 3 Pro, which the company describes as its most factual model, trained specifically to minimize hallucinations during complex tasks. Hallucinations, in which a large language model generates false information, are a critical challenge for long-running, autonomous agentic tasks: the more decisions an agent makes over an extended period, the greater the risk that a single hallucination invalidates the entire output.
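To see why long-running tasks amplify this risk, consider a back-of-envelope calculation (the per-step error rate below is an illustrative assumption, not a figure reported by Google): if each autonomous step carries a small independent chance of hallucinating, the probability of a fully clean run shrinks geometrically with the number of steps.

    # Illustrative only: assumes an independent 2% hallucination chance per
    # step, a made-up figure rather than a measured Gemini 3 Pro error rate.
    step_error_rate = 0.02

    for steps in (10, 50, 100):
        clean_run = (1 - step_error_rate) ** steps  # probability no step hallucinated
        print(f"{steps:>3} steps -> {clean_run:.1%} chance of an error-free run")

    # Output:
    #  10 steps -> 81.7% chance of an error-free run
    #  50 steps -> 36.4% chance of an error-free run
    # 100 steps -> 13.3% chance of an error-free run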
To demonstrate its progress, Google introduced and open-sourced a new benchmark, DeepSearchQA, which tests AI agents on complex, multi-step information-seeking tasks. The company also evaluated Deep Research on independently created benchmarks, including Humanity's Last Exam, a general-knowledge test built around obscure, niche questions, and BrowseComp, a benchmark for browser-based agentic tasks.
As might be expected, Google's new agent outperformed the competition on its own DeepSearchQA benchmark and on Humanity's Last Exam. However, OpenAI's GPT-5 Pro was a close second in those tests and slightly outperformed Google's agent on BrowseComp.
These comparisons were quickly overshadowed, however. On the same day as Google's announcement, OpenAI launched its highly anticipated GPT-5.2 model, codenamed Garlic, and claims the new model surpasses rivals, Google in particular, on a suite of standard benchmarks, including OpenAI's own. The timing suggests Google deliberately scheduled its announcement to land alongside OpenAI's anticipated release.

