Google DeepMind has unveiled Genie 3, its latest foundation world model, which the AI lab describes as a crucial step toward artificial general intelligence (AGI), or human-like intelligence.
Shlomi Fruchter, a research director at DeepMind, explained during a press briefing that Genie 3 is the first real-time interactive general-purpose world model. Unlike previous narrow world models, Genie 3 is not limited to specific environments. It can generate both photorealistic and imaginary worlds, as well as everything in between.
Currently in research preview and not publicly available, Genie 3 builds on its predecessor, Genie 2, which could generate new environments for AI agents. It also incorporates advancements from DeepMind’s latest video generation model, Veo 3, which demonstrates a deep understanding of physics.
With a simple text prompt, Genie 3 can generate multiple minutes of diverse, interactive 3D environments at 24 frames per second with a resolution of 720p—a significant improvement over Genie 2’s 10 to 20 seconds. The model also introduces “promptable world events,” allowing users to modify the generated world with additional prompts.
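Genie 3 is not publicly available, so there is no real API to show; purely as a hypothetical sketch of that interaction pattern (every class, method, and prompt below is invented for illustration), generating a world from a text prompt and then injecting a promptable world event might look something like this:

```python
# Hypothetical sketch only: these names are placeholders, not Genie 3's API.

class GeneratedWorld:
    """Toy stand-in for a prompt-generated interactive environment."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        self.events = []  # world events applied after initial generation

    def step(self, action: str) -> str:
        # A real system would render a 720p frame at 24 fps here.
        return f"frame after '{action}' in world: {self.prompt} {self.events}"

    def apply_event(self, event_prompt: str) -> None:
        # "Promptable world events": modify the running world with new text.
        self.events.append(event_prompt)


world = GeneratedWorld("a photorealistic alpine village at dusk")
print(world.step("walk toward the bridge"))
world.apply_event("a sudden snowstorm rolls in")
print(world.step("keep walking"))
```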
One of Genie 3’s most notable features is its ability to maintain physical consistency over time. The model remembers what it has previously generated, an emergent capability that DeepMind researchers did not explicitly program. That consistency suggests an intuitive grasp of physics, similar to the way humans internalize real-world dynamics.
Fruchter noted that while Genie 3 has potential applications in education, gaming, and creative prototyping, its true breakthrough lies in training AI agents for general-purpose tasks—a critical component in achieving AGI.
Jack Parker-Holder, a research scientist at DeepMind, emphasized that world models are essential for developing embodied AI agents, particularly because simulating real-world scenarios is inherently challenging.
Genie 3 operates as an autoregressive model, generating one frame at a time and referencing its past outputs to decide what comes next. This memory-based approach keeps its simulations coherent, making it an ideal training ground for general-purpose agents. The model can create endless, diverse environments, pushing AI agents to adapt, learn, and improve through experience, mirroring how humans learn.
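In rough terms, that autoregressive loop follows the pattern below. This is a minimal, hypothetical Python sketch (the class and function names are invented, not DeepMind's): the model keeps a buffer of frames it has already produced and conditions each new frame on that buffer plus the latest action.

```python
# Minimal sketch of an autoregressive world-model rollout loop.
# WorldModel and predict_next_frame are hypothetical placeholders.

from collections import deque


class WorldModel:
    """Toy stand-in for a frame-by-frame generative world model."""

    def __init__(self, context_length: int = 64):
        # A bounded buffer of past frames acts as the model's "memory".
        self.context = deque(maxlen=context_length)

    def predict_next_frame(self, action: str) -> dict:
        # A real model would condition a neural network on the stored
        # frames and the incoming action; here we just record the step.
        frame = {"step": len(self.context), "action": action}
        self.context.append(frame)
        return frame


def rollout(model: WorldModel, actions: list[str]) -> list[dict]:
    """Generate frames one at a time, each conditioned on prior output."""
    frames = []
    for action in actions:
        frames.append(model.predict_next_frame(action))
    return frames


if __name__ == "__main__":
    model = WorldModel()
    # Each new frame depends on the frames already in the context buffer,
    # which is what keeps the simulated world consistent over time.
    print(rollout(model, ["move_forward", "turn_left", "move_forward"]))
```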
However, there are limitations. The range of actions an agent can take is still restricted, and modeling complex interactions between multiple agents remains difficult. Additionally, Genie 3 currently supports only a few minutes of continuous interaction, whereas hours would be necessary for comprehensive training.
Despite these challenges, Genie 3 represents a significant advancement in AI development. It moves beyond reactive systems, enabling agents to plan, explore, and learn through trial and error—key components of general intelligence.
Parker-Holder compared the potential impact of Genie 3 to DeepMind’s AlphaGo, which famously made an unconventional move in a 2016 match against world champion Lee Sedol. That moment symbolized AI’s ability to discover strategies beyond human understanding. With Genie 3, Parker-Holder believes a new era of embodied AI learning could be on the horizon.