I built marshmallow castles in Google’s new AI-world generator

Google DeepMind is opening up access to Project Genie, its AI tool for creating interactive game worlds from text prompts or images. Starting Thursday, Google AI Ultra subscribers in the U.S. can experiment with this research prototype. The tool is powered by a combination of Google’s latest world model, Genie 3; its image-generation model, Nano Banana Pro; and Gemini.

This move comes five months after the research preview of Genie 3. It is part of a broader push to gather user feedback and training data as DeepMind races to develop more capable world models. World models are AI systems that generate an internal representation of an environment, which can be used to predict future outcomes and plan actions. Many AI leaders, including those at DeepMind, believe world models are a crucial step toward achieving artificial general intelligence, or AGI. In the nearer term, labs like DeepMind envision a path that starts with video games and entertainment and later branches out into training embodied agents, such as robots, in simulation.

DeepMind’s release of Project Genie arrives as competition in world models intensifies. Fei-Fei Li’s World Labs released its first commercial product, Marble, late last year. Runway, the AI video-generation startup, recently launched a world model of its own. And former Meta chief AI scientist Yann LeCun’s startup, AMI Labs, will also focus on developing world models.

DeepMind researchers were upfront about the tool’s experimental nature. It can be inconsistent, sometimes impressively generating playable worlds and other times producing baffling results that miss the mark.

Here is how Project Genie works. You start with a “world sketch” by providing text prompts for both the environment and a main character, whom you will later be able to maneuver through the world in either first- or third-person view. Nano Banana Pro creates an image based on the prompts that you can, in theory, modify before Genie uses the image as the starting point for an interactive world. The modifications mostly worked, though the model occasionally stumbled, giving the character purple hair, for example, when asked for green.

You can also use real-life photos as a baseline for the model to build a world on, which again was hit or miss. Once you are satisfied with the image, Project Genie takes a few seconds to create an explorable world. You can also remix existing worlds into new interpretations by building on their prompts, or browse a gallery of curated worlds for inspiration. You can then download videos of the worlds you explore.

DeepMind currently grants only 60 seconds of world generation and navigation, partly because of budget and compute constraints. Because Genie 3 is an auto-regressive model, it requires significant dedicated compute, which limits how much DeepMind can provide to users. A research director explained that the 60-second cap lets the tool reach more users, since a dedicated chip is allocated to each session. Extending sessions beyond 60 seconds would add little testing value anyway, as the environments currently offer a limited level of interaction and dynamism.

During testing, the model’s safety guardrails were active. It could not generate anything resembling nudity or worlds that referenced Disney or other copyrighted material. This follows a cease-and-desist letter Disney sent to Google last year, accusing the firm’s AI models of copyright infringement. Attempts to generate worlds of mermaids or ice queens were also blocked.

The demo showed impressive results for whimsical, artistic concepts. For instance, a prompt for a claymation-style castle in the clouds made of marshmallows with a chocolate river and candy trees delivered a charming and playful world. However, Project Genie still has kinks to work out. It excelled at creating worlds based on artistic styles like watercolors or anime but tended to fail at photorealistic or cinematic worlds, often producing results that looked like a video game rather than a real setting.

The model also did not always respond well to real photos. When given a photo of an office and asked to create a world based on it, the result included similar furnishings but laid out differently in a sterile, digital-looking space. When provided a photo of a desk with a stuffed toy, Project Genie animated the toy navigating the space, with other objects occasionally reacting as it moved past them. That interactivity is an area DeepMind is working to improve; on several occasions, characters walked through walls or other solid objects.

The navigation controls, which use the arrow keys to look around and the W-A-S-D keys to move, were also a point of friction. For non-gamers, the controls did not feel natural; they were sometimes unresponsive or sent the character in the wrong direction, making simple navigation a chaotic challenge.

The research team acknowledged these shortcomings, reiterating that Project Genie is an experimental prototype. They hope to enhance realism, improve interaction capabilities, and give users more control over actions and environments in the future. They view the project not as a finished product for daily use, but as a glimpse of something interesting and unique that cannot be done in another way.