Google says its AI-based bug hunter found 20 security vulnerabilities

Google’s AI-powered bug hunter has reported its first batch of security vulnerabilities. Heather Adkins, Google’s vice president of security, announced that the company’s LLM-based vulnerability researcher, Big Sleep, found and reported 20 flaws in popular open-source software.

Big Sleep, developed jointly by Google’s AI lab DeepMind and its elite hacker team Project Zero, identified its first-ever vulnerabilities, mostly in widely used open-source tools such as the audio-video library FFmpeg and the image-editing suite ImageMagick. Because the vulnerabilities are not yet fixed, Google has not disclosed their impact or severity, in keeping with its standard policy of withholding details until patches are available.

The significance lies in Big Sleep’s ability to discover and reproduce these flaws autonomously. Kimberly Samra, a Google spokesperson, confirmed that although a human expert reviews each report before submission to ensure quality, the AI agent found and reproduced every vulnerability without human intervention.

Royal Hansen, Google’s vice president of engineering, described the findings as a breakthrough in automated vulnerability discovery. AI-powered bug hunters such as Big Sleep, RunSybil, and XBOW are already a reality, with XBOW recently topping a leaderboard on the bug bounty platform HackerOne. Even so, human verification remains crucial to confirm that findings are legitimate.

Vlad Ionescu, co-founder of RunSybil, called Big Sleep a credible project, citing its solid design and the expertise of the Project Zero and DeepMind teams behind it. Still, challenges persist, false positives chief among them. Some software maintainers have complained of AI-generated bug reports that turn out to be hallucinations, which waste triage time and draw comparisons to low-effort, AI-generated spam.

The rise of AI in security research offers promise but also underscores the need for accuracy and human validation to distinguish real threats from misleading noise.