
Google DeepMind has introduced SIMA 2, the latest iteration of its generalist AI agent, building on last year’s SIMA (Scalable Instructable Multiworld Agent). The original SIMA could follow instructions across multiple virtual environments, carrying out more than 600 language-based tasks such as “turn left,” “climb the ladder,” and “open the map.”
Features of SIMA 2
SIMA 2 evolves this approach, combining instruction-following with reasoning, goal-oriented actions, and self-improvement capabilities. It includes:
Advanced Reasoning and Goal-Oriented Actions
SIMA 2 incorporates the Gemini model, enabling it to interpret high-level user goals, reason about tasks, and explain its actions. The agent can describe the steps it takes to achieve objectives, answer user questions, and assess its own behavior and environment. Training included human demonstration videos with language labels, augmented with Gemini-generated labels.

Generalization Across Games
The agent generalizes better than its predecessor and can execute complex instructions in games it was not explicitly trained on, including the Viking survival game ASKA and MineDojo, a Minecraft research environment.

SIMA 2 can understand long, multi-step tasks and multimodal prompts such as sketches, and it accepts instructions in multiple languages and even as emojis. It can also transfer learned concepts between games, for instance by applying what it knows about “mining” in one game to “harvesting” in another.
Interaction in Newly Generated Worlds
When combined with Genie 3, which generates real-time 3D environments from images or text, SIMA 2 can navigate and perform goal-directed actions in previously unseen worlds. This demonstrates the agent’s adaptability to novel environments.

Self-Improvement and Iterative Learning
SIMA 2 can improve independently through self-directed play. Initial training relies on human demonstrations, after which the agent can generate experience data to train future versions. This iterative process allows the agent to attempt increasingly complex tasks and learn in newly created environments without additional human data.
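The loop described above can be illustrated with a toy sketch. This is not DeepMind's training code: the function names, the single numeric "skill" value, and the update rule are all hypothetical stand-ins, chosen only to show the shape of the process (a first version trained on human data, then repeated rounds of self-directed play whose results train the next version).

```python
import random


def collect_experience(skill, difficulty, episodes, rng):
    """Self-directed play: return the episodes the agent completed.

    Toy assumption: success probability is skill / difficulty, capped at 1.
    """
    completed = []
    for episode in range(episodes):
        if rng.random() < min(1.0, skill / difficulty):
            completed.append((difficulty, episode))
    return completed


def train_next_version(skill, experience):
    """Hypothetical update rule: each usable episode adds a little skill."""
    return skill + 0.01 * len(experience)


def self_improvement_loop(human_demos, generations, episodes=50, seed=0):
    """Return the skill of each agent version, starting with version 0."""
    rng = random.Random(seed)
    # Version 0 is trained on human demonstrations only.
    skill = train_next_version(0.0, human_demos)
    history = [skill]
    difficulty = 1.0
    for _ in range(generations):
        # The current agent generates its own experience data...
        experience = collect_experience(skill, difficulty, episodes, rng)
        # ...which trains the next version, with no new human data.
        skill = train_next_version(skill, experience)
        history.append(skill)
        # Later rounds attempt harder tasks.
        difficulty += 0.5
    return history
```

Calling `self_improvement_loop(human_demos=list(range(50)), generations=3)` returns one skill value per agent version. In this toy model the values never decrease, because each round can only add experience.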

Embodied Intelligence Applications
The skills developed by SIMA 2, including navigation, tool use, and collaborative task execution, provide a foundation for research into general embodied intelligence.

Although the agent can operate across diverse gaming environments, it remains limited in tasks that require long-horizon planning, precise low-level actions, or robust visual understanding. SIMA 2 also retains only a limited window of its past interactions in memory.
Responsible Development and Access
SIMA 2 is available as a limited research preview to a small cohort of academics and game developers. DeepMind emphasizes oversight of the agent's self-improvement capabilities and is seeking interdisciplinary feedback to ensure responsible development and mitigate potential risks.
