Multi-Modal Fusion of Event and RGB for Monocular Depth Estimation Using Transformer Architecture (ER-F2D)
To install and run this project, you need the following Python packages:
- PyTorch == 2.0.1
- scikit-learn == 1.3.0
- scikit-image == 0.21.0
- opencv == 4.8.0
- Matplotlib == 3.7.2
- kornia == 0.7.0
- tensorboard == 2.13.0
- torchvision == 0.15.2
You can install these packages using the following command:
pip install -r requirements.txt
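After installing, you can check that the pinned versions are the ones actually present in your environment. Below is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+). `check_versions` is a hypothetical helper. `opencv-python` is an assumed distribution name for the `opencv` entry, so adjust it if you installed a different OpenCV wheel (e.g. `opencv-python-headless`).

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the list above. "opencv-python" is an ASSUMED
# distribution name for OpenCV; change it if you use another wheel.
REQUIRED = {
    "torch": "2.0.1",
    "scikit-learn": "1.3.0",
    "scikit-image": "0.21.0",
    "opencv-python": "4.8.0",
    "matplotlib": "3.7.2",
    "kornia": "0.7.0",
    "tensorboard": "2.13.0",
    "torchvision": "0.15.2",
}


def check_versions(required: dict[str, str]) -> dict[str, str | None]:
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches: dict[str, str | None] = {}
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            mismatches[name] = None  # package is not installed at all
            continue
        # Drop local build tags such as "+cu118" and accept longer
        # release strings such as "4.8.0.76" for a "4.8.0" pin.
        base = installed.split("+")[0]
        if base != wanted and not base.startswith(wanted + "."):
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for pkg, found in check_versions(REQUIRED).items():
        print(f"{pkg}: expected {REQUIRED[pkg]}, found {found}")
```

An empty result means every pinned package matched.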
Download the datasets through the links provided below:
Download the pre-trained weights for each dataset from the links below:
Download the pre-trained weights of vit-base here.
To train the model, run:
CUDA_VISIBLE_DEVICES=0 python3 --epochs 70 --batch_size 16
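`CUDA_VISIBLE_DEVICES=0` restricts the process to the first GPU, which PyTorch then sees as `cuda:0`. The two hyperparameters on the training command line could be parsed with `argparse` as in the sketch below. `build_parser` and its defaults are hypothetical and may not match the actual training script's parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two flags in the training command."""
    parser = argparse.ArgumentParser(description="ER-F2D training (illustrative sketch)")
    # Both flags appear in the command above; these defaults are assumptions.
    parser.add_argument("--epochs", type=int, default=70, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=16, help="samples per mini-batch")
    return parser
```

For example, `build_parser().parse_args(["--epochs", "70", "--batch_size", "16"])` returns a namespace with `epochs=70` and `batch_size=16`. Calling `parse_args()` with no list reads `sys.argv` instead.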
Testing is done in two steps. First, run the test script, which saves the prediction outputs to a folder. To do so, download the pre-trained weights of the transformer-based models for the datasets and run the command below.
CUDA_VISIBLE_DEVICES=0 python --path_to_model experiments/exp_1/checkpoints/model_best.pth.tar --output_folder experiments/exp_1/test/ --data_folder test
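The first step writes one depth map per frame to disk. The sketch below shows one way such an output folder could be laid out. `save_predictions` and its file-name pattern are hypothetical. Only the `npy/depth/` sub-folder is taken from the `--predictions_dataset` path in the evaluation command of this README.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np


def save_predictions(predictions, output_folder: str) -> list[Path]:
    """Write one .npy file per predicted depth map under <output_folder>/npy/depth/.

    The file-name pattern is a hypothetical choice for this sketch.
    """
    out_dir = Path(output_folder) / "npy" / "depth"
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx, depth in enumerate(predictions):
        # Zero-padded indices keep lexicographic order equal to frame order.
        path = out_dir / f"depth_{idx:010d}.npy"
        np.save(path, np.asarray(depth, dtype=np.float32))
        paths.append(path)
    return paths
```

Zero-padding matters if a later step pairs files by sorted name, because `depth_10.npy` would otherwise sort before `depth_2.npy`.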
Second, run the evaluation script. It takes both the ground truth and the prediction outputs as inputs, recovers metric depth from the logarithmic depth maps using clip_distance and reg_factor, and computes the depth metrics.
python --target_dataset experiments/exp_1/test/ground_truth/npy/gt/ --predictions_dataset experiments/exp_1/test/npy/depth/ --clip_distance 80 --reg_factor 3.70378
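The evaluation step can be sketched as follows. This assumes the repository uses the normalized log-depth convention from earlier event-based depth work (e.g. E2Depth): D_log = 1 + ln(D / D_max) / α, so D = D_max · exp(−α(1 − D_log)), with D_max = clip_distance and α = reg_factor.

All four function names are hypothetical helpers. The metric set (abs_rel, RMSE, MAE) is an illustrative choice. Pairing files by sorted name is an assumption about the folder layout. The actual evaluation script may differ on all of these points.

```python
from pathlib import Path

import numpy as np


def log_to_metric(depth_log, clip_distance=80.0, reg_factor=3.70378):
    """Normalized log depth -> metres: D = D_max * exp(-alpha * (1 - D_log))."""
    depth_log = np.asarray(depth_log, dtype=np.float64)
    return clip_distance * np.exp(-reg_factor * (1.0 - depth_log))


def metric_to_log(depth, clip_distance=80.0, reg_factor=3.70378):
    """Metres -> normalized log depth: D_log = 1 + ln(D / D_max) / alpha."""
    d_min = clip_distance * np.exp(-reg_factor)  # depth that maps to D_log = 0
    depth = np.clip(np.asarray(depth, dtype=np.float64), d_min, clip_distance)
    return 1.0 + np.log(depth / clip_distance) / reg_factor


def depth_metrics(gt, pred):
    """Depth errors in metres over pixels with finite, positive ground truth."""
    mask = np.isfinite(gt) & (gt > 0)
    gt, pred = gt[mask], pred[mask]
    abs_diff = np.abs(pred - gt)
    return {
        "abs_rel": float(np.mean(abs_diff / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "mae": float(np.mean(abs_diff)),
    }


def evaluate_folders(target_dir, pred_dir, clip_distance=80.0, reg_factor=3.70378):
    """Pair sorted .npy files from both folders and average per-frame metrics."""
    targets = sorted(Path(target_dir).glob("*.npy"))
    preds = sorted(Path(pred_dir).glob("*.npy"))
    if not targets or len(targets) != len(preds):
        raise ValueError(f"found {len(targets)} targets and {len(preds)} predictions")
    per_frame = []
    for target_path, pred_path in zip(targets, preds):
        # Both folders are ASSUMED to hold normalized log depth of equal shape.
        gt = log_to_metric(np.load(target_path), clip_distance, reg_factor)
        pred = log_to_metric(np.load(pred_path), clip_distance, reg_factor)
        per_frame.append(depth_metrics(gt, pred))
    return {key: float(np.mean([m[key] for m in per_frame])) for key in per_frame[0]}
```

With the default values, D_log = 1 maps to 80 m and D_log = 0 maps to roughly 1.97 m. Any prediction in [0, 1] therefore lands in that range. Calling `evaluate_folders("experiments/exp_1/test/ground_truth/npy/gt/", "experiments/exp_1/test/npy/depth/")` mirrors the command above.

Averaging per frame rather than pooling all pixels is a design choice. Pooling would weight frames with more valid ground-truth pixels more heavily.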