Developing generalist agents capable of solving open-ended tasks in visually rich, dynamic environments remains a core pursuit of embodied AI. While Minecraft has emerged as a compelling benchmark, existing agents often suffer from fragmented cognitive abilities, lacking synergy between reflexive execution (System 1) and deliberative reasoning (System 2). In this paper, we introduce Optimus-3, a generalist agent that integrates these dual capabilities within a unified framework. To achieve this, we address three fundamental challenges. First, to overcome the scarcity of reasoning data, we propose a Knowledge-Enhanced Automated Data Generation Pipeline. It synthesizes high-quality System 2 reasoning traces from raw System 1 interaction trajectories and mitigates hallucinations by injecting domain knowledge. We release the resulting dataset, OptimusM$^{4}$, to the community. Second, to reconcile the conflicting computational requirements of the dual systems, we design a Dual-Router Aligned MoE Architecture. It employs a Task Router to prevent task interference via parameter decoupling, and a Layer Router to dynamically modulate reasoning depth, creating a computational ``Fast Path'' for System 1 and a ``Deep Path'' for System 2. Third, to activate the reasoning capabilities of System 2, we propose the Dual-Granularity Reasoning-Aware Policy Optimization (DGRPO) algorithm. It enforces Process-Outcome Co-Supervision via dual-granularity dense rewards, ensuring consistency between the thought process and the answer. Extensive evaluations demonstrate that Optimus-3 surpasses existing state-of-the-art methods on both System 2 tasks (improvements of 21\% on Planning, 66\% on Captioning, 76\% on Embodied QA, 3.4$\times$ on Grounding, and 18\% on Reflection) and System 1 tasks (3\% on Long-Horizon Action), with a notable 60\% success rate on open-ended tasks.
In this paper, we presented Optimus-3, a unified generalist agent that integrates System 1 action loops with System 2 reasoning capabilities within an end-to-end framework. To overcome the challenges of data scarcity, architectural conflict, and open-world generalization, we made contributions along three dimensions. First, we introduced a Knowledge-Enhanced Data Generation Pipeline that synthesizes high-fidelity System 2 reasoning traces from raw interaction trajectories. By leveraging domain constraints to filter hallucinations, we constructed and released the OptimusM$^4$ dataset. Second, we proposed the Dual-Router Aligned MoE architecture to address the computational conflict between the two systems. Through horizontal parameter decoupling and vertical depth adaptation, it efficiently maintains a ``Fast Path'' for reflexive control and a ``Deep Path'' for deliberative reasoning. Third, we developed the Dual-Granularity Reasoning-Aware Policy Optimization (DGRPO) algorithm. It establishes a Process-Outcome Co-Supervision mechanism, utilizing dual-granularity rewards to align reasoning chains with visual evidence. Extensive experiments demonstrate that Optimus-3 achieves superior performance across diverse tasks, marking a significant step toward general-purpose embodied intelligence in complex, open-ended worlds.