A curated collection of World Models (for Autonomous Driving) papers.
If you notice any missing papers, feel free to create pull requests, open issues, or email me / Qi Wang. Contributions in any form to make this list more comprehensive are welcome. 📣📣📣
If you find this repository useful, please consider giving us a star 🌟.
Feel free to share this list with others! 🥳🥳🥳
- CVPR 2024 Workshop & Challenge | OpenDriveLab
  Track #4: Predictive World Model. Serving as an abstract spatio-temporal representation of reality, the world model can predict future states based on the current state. The learning process of world models has the potential to elevate a pre-trained foundation model to the next level. Given vision-only inputs, the neural network outputs future point clouds to demonstrate its predictive capability of the world. (See the sketch below.)
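To make the task's input/output contract concrete, here is a minimal, hypothetical sketch of such a predictive world model: past camera frames in, a future point cloud out. All module names, tensor shapes, and architecture choices are illustrative assumptions, not the challenge's official baseline or API.

```python
# A minimal sketch (not the official challenge API) of a predictive world
# model: a history of camera frames in, N future 3D points out.
import torch
import torch.nn as nn

class PredictiveWorldModel(nn.Module):
    """Encodes a history of camera frames and decodes N future points."""

    def __init__(self, num_future_points: int = 4096, latent_dim: int = 256):
        super().__init__()
        self.num_future_points = num_future_points
        # Per-frame image encoder (a stand-in for any vision backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Temporal model aggregating the frame history into one state.
        self.temporal = nn.GRU(latent_dim, latent_dim, batch_first=True)
        # Decoder regressing (x, y, z) for each future point.
        self.decoder = nn.Linear(latent_dim, num_future_points * 3)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W) past camera observations.
        b, t, c, h, w = frames.shape
        feats = self.encoder(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, state = self.temporal(feats)           # (1, batch, latent_dim)
        points = self.decoder(state[-1])          # (batch, N * 3)
        return points.reshape(b, self.num_future_points, 3)

model = PredictiveWorldModel()
future_pc = model(torch.randn(2, 4, 3, 128, 128))  # 4 past frames per sample
print(future_pc.shape)  # torch.Size([2, 4096, 3])
```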
- CVPR 2023 Workshop on Autonomous Driving
  Challenge 3: Argoverse Challenges, 3D Occupancy Forecasting using the Argoverse 2 Sensor Dataset. Predict the spacetime occupancy of the world for the next 3 seconds. (See the sketch below.)
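The "spacetime occupancy" this challenge forecasts can be pictured as a 4D boolean voxel grid over the prediction horizon. The sketch below shows that representation with a simple IoU score; the grid sizes and the metric are illustrative assumptions, not the official Argoverse 2 evaluation protocol.

```python
# A toy spacetime occupancy grid: (T, X, Y, Z) booleans over 3 seconds.
import numpy as np

T, X, Y, Z = 6, 100, 100, 8   # e.g. 6 future frames at 2 Hz = 3 seconds

def occupancy_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between predicted and true occupied voxels."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Random stand-ins for a model's forecast and the ground-truth occupancy.
rng = np.random.default_rng(0)
pred = rng.random((T, X, Y, Z)) > 0.9
gt = rng.random((T, X, Y, Z)) > 0.9
print(f"IoU over all {T} future frames: {occupancy_iou(pred, gt):.3f}")
```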
- Using Occupancy Grids for Mobile Robot Perception and Navigation [paper]
- Yann LeCun: A Path Towards Autonomous Machine Intelligence [paper] [Video]
- CVPR'23 WAD Keynote - Ashok Elluswamy, Tesla [Video]
- Wayve: Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [blog]
  "World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental 'what if' thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations."
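The "learned simulator / mental 'what if'" idea from the quote above has a compact algorithmic core: roll candidate action sequences through a learned latent dynamics model and keep the one with the highest predicted return (random-shooting planning, as used in model-based RL). The toy dynamics and reward networks below are assumptions for illustration; they stand in for a trained world model such as GAIA-1 or a Dreamer-style agent.

```python
# A minimal sketch of planning inside a learned world model.
import torch
import torch.nn as nn

latent_dim, action_dim = 32, 2

dynamics = nn.Sequential(  # s_{t+1} = f(s_t, a_t), learned from driving data
    nn.Linear(latent_dim + action_dim, 64), nn.ELU(), nn.Linear(64, latent_dim)
)
reward = nn.Linear(latent_dim, 1)  # predicted reward for a latent state

def plan(state: torch.Tensor, horizon: int = 10, candidates: int = 64):
    """Mental 'what if': imagine rollouts and return the best first action."""
    with torch.no_grad():
        actions = torch.randn(candidates, horizon, action_dim)
        s = state.expand(candidates, latent_dim)   # branch into candidates
        total = torch.zeros(candidates)
        for t in range(horizon):
            s = dynamics(torch.cat([s, actions[:, t]], dim=-1))
            total += reward(s).squeeze(-1)         # accumulate imagined reward
        return actions[total.argmax(), 0]          # best rollout's first action

best_action = plan(torch.zeros(1, latent_dim))
print(best_action)
```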
- A Survey on Multimodal Large Language Models for Autonomous Driving. WACVW 2024 [Paper] [Code]
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI. arXiv 2024.7 [Paper] [Code]
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. arXiv 2024.5 [Paper] [Code]
- World Models for Autonomous Driving: An Initial Survey. arXiv 2024.3 [Paper]
- [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. TITS [Paper]
- [DriveDreamer] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. ECCV 2024 [Paper] [Code]
- [GenAD] GenAD: Generative End-to-End Autonomous Driving. ECCV 2024 [Paper] [Code]
- [OccWorld] OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. ECCV 2024 [Paper] [Code]
- [CarFormer] CarFormer: Self-Driving with Learned Object-Centric Representations. ECCV 2024 [Paper] [Code]
- [MARL-CCE] Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. ECCV 2024 [Code]
- [DrivingDiffusion] DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model. ECCV 2024 [Paper] [Code]
- [3D-VLA] 3D-VLA: A 3D Vision-Language-Action Generative World Model. ICML 2024 [Paper]
- [RoboDreamer] RoboDreamer: Learning Compositional World Models for Robot Imagination. ICML 2024 [Paper] [Code]
- [ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. CVPR 2024 [Paper] [Code]
- [GenAD] Generalized Predictive Model for Autonomous Driving. CVPR 2024 [Paper] [Data]
- [Cam4DOcc] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. CVPR 2024 [Paper] [Code]
- [Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. CVPR 2024 [Paper] [Code]
- [DriveWorld] DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Paper]
- [Panacea] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving. CVPR 2024 [Paper] [Code]
- [MagicDrive] MagicDrive: Street View Generation with Diverse 3D Geometry Control. ICLR 2024 [Paper] [Code]
- [Copilot4D] Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. ICLR 2024 [Paper]
- [SafeDreamer] SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024 [Paper] [Code]
- [DriveGenVLM] DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving. arXiv 2024.8 [Paper]
- [Drive-OccWorld] Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. arXiv 2024.8 [Paper]
- [BEVWorld] BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space. arXiv 2024.7 [Paper] [Code]
- [TOKEN] Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. arXiv 2024.7 [Paper]
- [UnO] UnO: Unsupervised Occupancy Fields for Perception and Forecasting. arXiv 2024.6 [Paper] [Code]
- [UMAD] UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving. arXiv 2024.6 [Paper]
- [SimGen] SimGen: Simulator-conditioned Driving Scene Generation. arXiv 2024.6 [Paper] [Code]
- [AdaptiveDriver] Planning with Adaptive World Models for Autonomous Driving. arXiv 2024.6 [Paper] [Code]
- [LAW] Enhancing End-to-End Autonomous Driving with Latent World Model. arXiv 2024.6 [Paper] [Code]
- [Delphi] Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. arXiv 2024.6 [Paper] [Code]
- [OccSora] OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. arXiv 2024.5 [Paper] [Code]
- [Vista] Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. arXiv 2024.5 [Paper] [Code]
- [MagicDrive3D] MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes. arXiv 2024.5 [Paper] [Code]
- [CarDreamer] CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving. arXiv 2024.5 [Paper] [Code]
- [DriveSim] Probing Multimodal LLMs as World Models for Driving. arXiv 2024.5 [Paper] [Code]
- [LidarDM] LidarDM: Generative LiDAR Simulation in a Generated World. arXiv 2024.4 [Paper] [Code]
- [SubjectDrive] SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control. arXiv 2024.3 [Paper] [Project]
- [DriveDreamer-2] DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. arXiv 2024.3 [Paper] [Code]
- [Think2Drive] Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. arXiv 2024.2 [Paper]
- [TrafficBots] TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction. ICRA 2023 [Paper] [Code]
- [WoVoGen] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. arXiv 2023.12 [Paper] [Code]
- [CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. arXiv 2023.11 [Paper]
- [MUVO] MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. arXiv 2023.11 [Paper]
- [ADriver-I] ADriver-I: A General World Model for Autonomous Driving. arXiv 2023.11 [Paper]
- [GAIA-1] GAIA-1: A Generative World Model for Autonomous Driving. arXiv 2023.9 [Paper]
- [UniWorld] UniWorld: Autonomous Driving Pre-training via World Models. arXiv 2023.8 [Paper] [Code]
- [MILE] Model-Based Imitation Learning for Urban Driving. NeurIPS 2022 [Paper] [Code]
- [Iso-Dream] Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Code]
- [Symphony] Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. ICRA 2022 [Paper]
- Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. IROS 2022 [Paper]
- [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. NeurIPS 2022 Workshop [Paper]
- [DWL] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. RSS 2024 (Best Paper Award Finalist) [Paper]
- [LLM-Sim] Can Language Models Serve as Text-Based World Simulators? ACL 2024 [Paper] [Code]
- [Δ-IRIS] Efficient World Models with Context-Aware Tokenization. ICML 2024 [Paper] [Code]
- [AD3] AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. ICML 2024 [Paper]
- [Hieros] Hieros: Hierarchical Imagination on Structured State Space Sequence World Models. ICML 2024 [Paper]
- [HRSSM] Learning Latent Dynamic Robust Representations for World Models. ICML 2024 [Paper] [Code]
- [HarmonyDream] HarmonyDream: Task Harmonization Inside World Models. ICML 2024 [Paper] [Code]
- [REM] Improving Token-Based World Models with Parallel Observation Prediction. ICML 2024 [Paper] [Code]
- Do Transformer World Models Give Better Policy Gradients? ICML 2024 [Paper]
- [TD-MPC2] TD-MPC2: Scalable, Robust World Models for Continuous Control. ICLR 2024 [Paper] [Torch Code]
- [DreamSmooth] DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing. ICLR 2024 [Paper]
- [R2I] Mastering Memory Tasks with World Models. ICLR 2024 [Paper] [JAX Code]
- [MAMBA] MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024 [Paper] [Code]
- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. arXiv 2024.8 [Paper]
- [MoReFree] World Models Increase Autonomy in Reinforcement Learning. arXiv 2024.8 [Paper] [Project]
- [UrbanWorld] UrbanWorld: An Urban World Model for 3D City Generation. arXiv 2024.7 [Paper]
- [PWM] PWM: Policy Learning with Large World Models. arXiv 2024.7 [Paper] [Code]
- [Predicting vs. Acting] Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling. arXiv 2024.7 [Paper]
- [GenRL] Multimodal Foundation World Models for Generalist Embodied Agents. arXiv 2024.6 [Paper] [Code]
- [DLLM] World Models with Hints of Large Language Models for Goal Achieving. arXiv 2024.6 [Paper]
- Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model. arXiv 2024.6 [Paper]
- [CityBench] CityBench: Evaluating the Capabilities of Large Language Model as World Model. arXiv 2024.6 [Paper] [Code]
- [CoDreamer] CoDreamer: Communication-Based Decentralised World Models. arXiv 2024.6 [Paper]
- [EBWM] Cognitively Inspired Energy-Based World Models. arXiv 2024.6 [Paper]
- Evaluating the World Model Implicit in a Generative Model. arXiv 2024.6 [Paper] [Code]
- Transformers and Slot Encoding for Sample Efficient Physical World Modelling. arXiv 2024.5 [Paper] [Code]
- [Puppeteer] Hierarchical World Models as Visual Whole-Body Humanoid Controllers. arXiv 2024.5 [Paper] [Code]
- [BWArea Model] BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. arXiv 2024.5 [Paper]
- [Pandora] Pandora: Towards General World Model with Natural Language Actions and Video States. [Paper] [Code]
- [WKM] Agent Planning with World Knowledge Model. arXiv 2024.5 [Paper] [Code]
- [DIAMOND] Diffusion for World Modeling: Visual Details Matter in Atari. arXiv 2024.5 [Paper] [Code]
- [Newton] Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI [Blog]
- [Compete and Compose] Compete and Compose: Learning Independent Mechanisms for Modular World Models. arXiv 2024.4 [Paper]
- [MagicTime] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. arXiv 2024.4 [Paper] [Code]
- [Dreaming of Many Worlds] Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. arXiv 2024.3 [Paper] [Code]
- [ManiGaussian] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. arXiv 2024.3 [Paper] [Code]
- [V-JEPA] V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI [Blog] [Paper] [Code]
- [IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI [Paper]
- [Genie] Genie: Generative Interactive Environments. DeepMind [Paper] [Blog]
- [Sora] Video generation models as world simulators. OpenAI [Technical report]
- [LWM] World Model on Million-Length Video And Language With RingAttention. arXiv 2024.2 [Paper] [Code]
- Planning with an Ensemble of World Models. OpenReview [Paper]
- [WorldDreamer] WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. arXiv 2024.1 [Paper] [Code]
- [IRIS] Transformers are Sample-Efficient World Models. ICLR 2023 Oral [Paper] [Torch Code]
- [STORM] STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning. NeurIPS 2023 [Paper] [Torch Code]
- [TWM] Transformer-based World Models Are Happy with 100k Interactions. ICLR 2023 [Paper] [Torch Code]
- [Dynalang] Learning to Model the World with Language. arXiv 2023.8 [Paper] [JAX Code]
- [CoWorld] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. arXiv 2023.5 [Paper]
- [DreamerV3] Mastering Diverse Domains through World Models. arXiv 2023.1 [Paper] [JAX Code] [Torch Code]
- [TD-MPC] Temporal Difference Learning for Model Predictive Control. ICML 2022 [Paper] [Torch Code]
- [DreamerPro] DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. ICML 2022 [Paper] [TF Code]
- [DayDreamer] DayDreamer: World Models for Physical Robot Learning. CoRL 2022 [Paper] [TF Code]
- Deep Hierarchical Planning from Pixels. NeurIPS 2022 [Paper] [TF Code]
- [Iso-Dream] Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Torch Code]
- [DreamingV2] DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction. arXiv 2022.3 [Paper]
- [DreamerV2] Mastering Atari with Discrete World Models. ICLR 2021 [Paper] [TF Code] [Torch Code]
- [Dreaming] Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. ICRA 2021 [Paper]
- [DreamerV1] Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020 [Paper] [TF Code] [Torch Code]
- [Plan2Explore] Planning to Explore via Self-Supervised World Models. ICML 2020 [Paper] [TF Code] [Torch Code]
- World Models. NeurIPS 2018 Oral [Paper]