PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion

Yuyang Yin1*, HaoXiang Guo2*, Fangfu Liu3, Mengyu Wang1, Hanwen Liang4,
Eric Li2, Yikai Wang5†, Xiaojie Jin1, Yao Zhao1, Yunchao Wei1†
1Beijing Jiaotong University    2Skywork AI    3Tsinghua University
4University of Toronto    5Beijing Normal University

*Equal Contribution    †Corresponding Authors

Abstract

Generating a complete and explorable 360-degree visual world enables a wide range of downstream applications. While prior works have advanced the field, they remain constrained either by narrow fields of view, which hinder the synthesis of continuous, holistic scenes, or by limited camera controllability, which restricts free exploration by users or autonomous agents. To address this, we propose PanoWorld-X, a novel framework for high-fidelity and controllable panoramic video generation along diverse camera trajectories. Specifically, we first construct a large-scale dataset of paired panoramic videos and exploration routes by simulating camera trajectories in virtual 3D environments built with Unreal Engine. Because the spherical geometry of panoramic data is misaligned with the inductive priors of conventional video diffusion models, we then introduce a Sphere-Aware Diffusion Transformer that reprojects equirectangular features onto the spherical surface to model geometric adjacency in latent space, significantly enhancing visual fidelity and spatiotemporal continuity. Extensive experiments demonstrate that PanoWorld-X achieves superior performance in motion range, control precision, and visual quality, underscoring its potential for real-world applications.
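The reprojection idea can be made concrete with a small sketch. The paper's Sphere-Aware Diffusion Transformer itself is not reproduced here; the snippet below only illustrates, under our own assumptions, how equirectangular latent coordinates map to unit vectors on the sphere, and how dot products between those vectors expose the geometric adjacency (great-circle proximity) that a flat 2D grid misses, e.g. across the left/right seam and near the poles. The function name equirect_to_sphere is illustrative, not from the paper.

    import torch

    def equirect_to_sphere(h: int, w: int) -> torch.Tensor:
        """Map each cell of an h x w equirectangular grid to a unit vector
        on the sphere (longitude in [-pi, pi), latitude in (-pi/2, pi/2))."""
        lon = (torch.arange(w) + 0.5) / w * 2.0 * torch.pi - torch.pi
        lat = torch.pi / 2.0 - (torch.arange(h) + 0.5) / h * torch.pi
        lat, lon = torch.meshgrid(lat, lon, indexing="ij")
        # Spherical-to-Cartesian conversion: rows near the top and bottom of
        # the grid converge toward the poles, unlike on a flat 2D grid.
        return torch.stack(
            [lat.cos() * lon.cos(), lat.cos() * lon.sin(), lat.sin()], dim=-1
        )  # shape (h, w, 3)

    # Cosine of the great-circle distance between all pairs of latent
    # locations; high values mark true spherical neighbors, including
    # pairs that sit on opposite borders of the equirectangular map.
    dirs = equirect_to_sphere(16, 32).reshape(-1, 3)   # (h*w, 3)
    adjacency = dirs @ dirs.T                          # (h*w, h*w)

Directions computed this way could drive geometry-aware positional encodings or attention neighborhoods in place of the row-column distances a conventional video DiT assumes; how PanoWorld-X consumes them inside its architecture is described in the paper, not in this sketch.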

[Figure: Teaser]
[Figure: Method Overview]

Video Presentation

BibTeX

The official BibTeX entry has not been posted yet.
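Until it is, a placeholder in standard BibTeX form, assuming an arXiv preprint and leaving the unknown year and identifier as fill-ins, might look like:

    @misc{yin_panoworldx,
      title         = {PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion},
      author        = {Yin, Yuyang and Guo, HaoXiang and Liu, Fangfu and Wang, Mengyu and Liang, Hanwen and Li, Eric and Wang, Yikai and Jin, Xiaojie and Zhao, Yao and Wei, Yunchao},
      year          = {},   % placeholder: year not stated on this page
      eprint        = {},   % placeholder: arXiv identifier not stated on this page
      archivePrefix = {arXiv}
    }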