OpenAI says AI browsers may always be vulnerable to prompt injection attacks

OpenAI continues to strengthen the security of its Atlas AI browser, but the company acknowledges that prompt injection attacks remain a persistent threat. This type of attack manipulates AI agents by hiding malicious instructions within web pages or emails, and OpenAI admits it is unlikely to ever be fully eliminated. This reality raises significant questions about the safety of AI agents operating freely on the open web.

In a recent blog post, OpenAI stated that prompt injection, similar to online scams and social engineering, is a perennial challenge. The company specifically noted that the “agent mode” in ChatGPT Atlas expands the potential surface for security threats. Since its launch in October, security researchers have quickly demonstrated vulnerabilities, such as embedding words in a Google Doc that could alter the browser’s core behavior.

OpenAI is not alone in this assessment. The UK’s National Cyber Security Centre also warned that prompt injection attacks against generative AI may never be completely mitigated, advising professionals to focus on reducing risk rather than expecting to stop the attacks entirely. Other companies, like Brave and Perplexity, have highlighted that indirect prompt injection is a systematic challenge for all AI-powered browsers.

To address this ongoing issue, OpenAI is employing a proactive, rapid-response cycle. A key part of their strategy is an “LLM-based automated attacker.” This is a bot trained with reinforcement learning to act as a hacker, constantly searching for new ways to deliver malicious prompts to an AI agent. The bot tests attacks in simulation, studying the target AI’s internal reasoning to refine its approach repeatedly. This internal access allows OpenAI’s system to potentially discover flaws faster than external attackers could.

OpenAI reports that this automated attacker can engineer sophisticated, multi-step harmful workflows and has uncovered novel attack strategies missed by human testers. In one demonstration, the attacker placed a malicious email in a user’s inbox. When the AI agent scanned the messages, it followed hidden instructions to send a resignation letter instead of an out-of-office reply. Following a security update, Atlas’s agent mode now detects such attempts and alerts the user.

While asserting that foolproof security is difficult, OpenAI relies on large-scale testing and faster updates to harden its systems ahead of real-world exploitation. The company recommends users take precautions, such as limiting an agent’s logged-in access and requiring user confirmation for actions like sending messages or making payments. They also advise giving agents specific instructions rather than broad, unsupervised authority.

However, security experts urge caution regarding the risk-to-benefit balance of such tools. Rami McCarthy, a principal security researcher at Wiz, notes that while reinforcement learning helps adapt to attackers, agentic browsers occupy a high-risk space due to their combination of moderate autonomy and very high access to sensitive data. McCarthy suggests that for most everyday uses, the current risk profile of these browsers may not yet justify their value, as the trade-offs between powerful access and potential exposure remain very real.