This is the PyTorch implementation for the paper
MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving.
The simplest way to install all required dependencies is to create a conda environment by running

```bash
conda env create -f conda_env.yml
```

Then activate the conda environment with

```bash
conda activate muvo
```

Alternatively, create your own virtual environment and install the requirements by running

```bash
pip install -r requirements.txt
```
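As a quick sanity check that PyTorch was installed correctly (our suggestion, not part of the official setup):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```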
Use CARLA to collect data. First install CARLA by following its documentation. Change the settings in `config/`, then run

```bash
bash run/data_collect.sh ${PORT}
```

where `${PORT}` is the port CARLA runs on (usually 2000). The data collection code is adapted from CARLA-Roach and MILE; refer to those repositories for details on some of the config settings.
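A typical session might look like the following sketch; `CarlaUE4.sh` is the standard CARLA server launcher, but the exact launch flags depend on your CARLA version and setup:

```bash
# Terminal 1: start the CARLA server (listens on port 2000 by default;
# see the CARLA documentation for off-screen/headless options)
./CarlaUE4.sh

# Terminal 2: run data collection against the same port
bash run/data_collect.sh 2000
```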
After collecting the data with CARLA, create the voxel data by running `data/generate_voxels.py`; the voxel settings can be changed in `data_preprocess.yaml`.
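For example, assuming you run it from the repository root:

```bash
# Edit the voxel settings in data_preprocess.yaml first, then:
python data/generate_voxels.py
```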
After completing the above steps, or otherwise obtaining the dataset, reorganize the dataset files to match the structure below. The main branch includes most of the results presented in the paper; in the 2D branch, you can find 2D latent states, perceptual losses, and a new transformer backbone. The data is organized in the following format:
```
/carla_dataset/trainval/
├── train/
│   ├── Town01/
│   │   ├── 0000/
│   │   │   ├── birdview/
│   │   │   │   ├── birdview_000000000.png
│   │   │   │   └── ...
│   │   │   ├── depth_semantic/
│   │   │   │   ├── depth_semantic_000000000.png
│   │   │   │   └── ...
│   │   │   ├── image/
│   │   │   │   ├── image_000000000.png
│   │   │   │   └── ...
│   │   │   ├── points/
│   │   │   │   ├── points_000000000.png
│   │   │   │   └── ...
│   │   │   ├── points_semantic/
│   │   │   │   ├── points_semantic_000000000.png
│   │   │   │   └── ...
│   │   │   ├── routemap/
│   │   │   │   ├── routemap_000000000.png
│   │   │   │   └── ...
│   │   │   ├── voxel/
│   │   │   │   ├── voxel_000000000.png
│   │   │   │   └── ...
│   │   │   └── pd_dataframe.pkl
│   │   ├── 0001/
│   │   ├── 0002/
│   │   ├── ...
│   │   └── 0024/
│   ├── Town03/
│   ├── Town04/
│   ├── ...
│   └── Town06/
├── val0/
├── ...
└── val1/
```
Run

```bash
python train.py --config-file muvo/configs/your_config.yml
```

You can use the default config file `muvo/configs/muvo.yml`, or create your own config file in `muvo/configs/`.
In the config file (`*.yml`), you can set any of the options listed in `muvo/config.py`. Before training, make sure that the required input/output data as well as the model structure/dimensions are set correctly in `muvo/configs/your_config.yml`.
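A minimal workflow could look like this (the name `my_config.yml` is just a placeholder):

```bash
# Start from the provided default config and adapt it
cp muvo/configs/muvo.yml muvo/configs/my_config.yml
# ... edit my_config.yml: input/output data, model structure/dimensions ...
python train.py --config-file muvo/configs/my_config.yml
```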
We provide weights for pre-trained models, each trained for around 100,000 steps: `weights` is for a 1D latent space and `weights_2D` for a 2D latent space. We provide config files for each:
- 'basic_voxel' in `weights_2D` is the basic 2D latent-space model: it uses ResNet-18 as the backbone, no BEV mapping for image features, a range view for the point cloud, and a transformer to fuse features; the corresponding config file is 'test_base_2d.yml'.
- 'mobilevit' only changes the backbone compared to the 'basic_voxel' weights; the corresponding config file is 'test_mobilevit_2d.yml'.
- 'RV_WOB_TR_1d_Voxel' and 'RV_WOB_TR_1d_no_Voxel' in `weights` use the same basic settings but with a 1D latent space; 'test_base_1d.yml' and 'test_base_1d_without_voxel.yml' are the corresponding config files.
Run

```bash
python prediction.py --config-file muvo/configs/test.yml
```

The config file format is the same as in training.
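For example, to evaluate the basic 2D model with the provided weights (assuming you place `test_base_2d.yml` in `muvo/configs/`):

```bash
python prediction.py --config-file muvo/configs/test_base_2d.yml
```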
In `muvo/data/dataset.py`, in the `setup` function of the `DataModule` class, you can change the test dataset/sampler type.
Our code is based on MILE. Thanks to CARLA-Roach for providing a gym wrapper around CARLA.