🔥 🔥 🔥 [22/12/2023] The pre-trained model and the code for real-world inference, training, and testing are now available.
This is the official repository of the paper "Reference-based Restoration of Digitized Analog Videotapes".
Analog magnetic tapes have been the main video data storage device for several decades. Videos stored on analog videotapes exhibit unique degradation patterns caused by tape aging and reader device malfunctioning that are different from those observed in film and digital video restoration tasks. In this work, we present a reference-based approach for the resToration of digitized Analog videotaPEs (TAPE). We leverage CLIP for zero-shot artifact detection to identify the cleanest frames of each video through textual prompts describing different artifacts. Then, we select the clean frames most similar to the input ones and employ them as references. We design a transformer-based Swin-UNet network that exploits both neighboring and reference frames via our Multi-Reference Spatial Feature Fusion (MRSFF) blocks. MRSFF blocks rely on cross-attention and attention pooling to take advantage of the most useful parts of each reference frame. To address the absence of ground truth in real-world videos, we create a synthetic dataset of videos exhibiting artifacts that closely resemble those commonly found in analog videotapes. Both quantitative and qualitative experiments show the effectiveness of our approach compared to other state-of-the-art methods.
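The reference-selection step described above (picking the clean frames most similar to the input ones) can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes every frame has already been encoded into a feature vector (for example, a CLIP image embedding) and that each clean frame is scored by its mean cosine similarity to the frames of the input window. The names `cosine` and `select_references` and the mean aggregation are hypothetical choices made for this sketch. `num_refs` plays the role of the number of reference frames D (5 by default in the inference script).

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_references(window_feats: Sequence[Sequence[float]],
                      clean_feats: Sequence[Sequence[float]],
                      num_refs: int = 5) -> List[int]:
    """Return the indices of the num_refs clean frames most similar to the window.

    Hypothetical aggregation: a clean frame's score is its mean cosine similarity
    to all frames in the input window. Ties are broken by the lower index.
    """
    if not window_feats:
        raise ValueError("the input window must contain at least one frame")
    scored = []
    for idx, clean in enumerate(clean_feats):
        mean_sim = sum(cosine(w, clean) for w in window_feats) / len(window_feats)
        scored.append((mean_sim, idx))
    # Highest similarity first.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:num_refs]]
```

For instance, with a window of two frames close to `[1, 0]`, the clean frame `[1, 0]` ranks first and `[0.7, 0.7]` second, while `[-1, 0]` ranks last.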
Overview of the proposed approach. Left: given a video, we identify the cleanest frames with CLIP. First, we measure the similarity between the frames and textual prompts that describe different artifacts. Then, we employ Otsu's method to define a threshold for classifying the frames based on their similarity scores, resulting in a set of clean frames. Right: given a window of
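The thresholding step in the overview can be illustrated with an exact, one-dimensional version of Otsu's method. It tries every cut between sorted scores and keeps the one that maximizes the between-class variance w0 · w1 · (mu0 − mu1)². This is a sketch under stated assumptions, not the repository's implementation. It assumes each frame has already been reduced to a single artifact score (for example, its highest CLIP similarity across the artifact prompts), so frames scoring *below* the threshold are treated as clean. `otsu_threshold` and `clean_frame_indices` are hypothetical names.

```python
from typing import List, Sequence


def otsu_threshold(scores: Sequence[float]) -> float:
    """Exact Otsu's method on 1-D scores.

    Every cut between consecutive distinct sorted values is evaluated, and the
    midpoint of the cut with the largest between-class variance is returned.
    """
    values = sorted(scores)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    total = sum(values)
    best_var, best_thr = -1.0, values[0]
    prefix = 0.0  # sum of the values in the lower class
    for i in range(1, n):
        prefix += values[i - 1]
        if values[i] == values[i - 1]:
            continue  # no valid cut between equal values
        w0, w1 = i / n, (n - i) / n
        mu0 = prefix / i
        mu1 = (total - prefix) / (n - i)
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var = between_var
            best_thr = (values[i - 1] + values[i]) / 2
    return best_thr


def clean_frame_indices(artifact_scores: Sequence[float]) -> List[int]:
    """Assumption: a lower artifact score means a cleaner frame."""
    thr = otsu_threshold(artifact_scores)
    return [i for i, s in enumerate(artifact_scores) if s < thr]
```

With two well-separated groups of scores, for example `[0.10, 0.12, 0.11, 0.30, 0.32, 0.31]`, the threshold falls halfway between the groups (0.21), and the first three frames are labeled clean. A data-driven threshold like this avoids hand-tuning a fixed cut-off for every video.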
We release a dataset of videos synthetically degraded with Adobe After Effects to exhibit artifacts resembling those of real-world analog videotapes. The original high-quality videos belong to the Venice scene of the Harmonic dataset. The artifacts taken into account are: 1) tape mistracking; 2) VHS edge waving; 3) chroma loss along the scanlines; 4) tape noise; 5) undersaturation. The dataset comprises a total of 26,392 frames corresponding to 40 clips. The clips are randomly divided into training and test sets with a 75%-25% ratio.
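Splitting by clip rather than by frame keeps frames of the same clip from appearing in both sets. With 40 clips and a 75%-25% ratio, this yields 30 training clips and 10 test clips. The helper below is a hypothetical sketch of such a split, using an arbitrary seed. It does **not** reproduce the released split, so use the provided train and test folders to compare against the paper.

```python
import random
from typing import Iterable, List, Tuple


def split_clips(clip_ids: Iterable, train_ratio: float = 0.75,
                seed: int = 0) -> Tuple[List, List]:
    """Randomly split clip identifiers (not individual frames) into train/test sets."""
    ids = list(clip_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]


train_clips, test_clips = split_clips(range(40))
# len(train_clips) == 30, len(test_clips) == 10
```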
The dataset can be downloaded here. We release both the mp4 videos and the LMDB files associated with each split.
@inproceedings{agnolucci2024reference,
title={Reference-based Restoration of Digitized Analog Videotapes},
author={Agnolucci, Lorenzo and Galteri, Leonardo and Bertini, Marco and Del Bimbo, Alberto},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1659--1668},
year={2024}
}
We recommend using the Anaconda package manager to avoid dependency/reproducibility problems. For Linux systems, you can find a conda installation guide here.
- Clone the repository
git clone https://github.com/miccunifi/TAPE
- Install Python dependencies
conda create -n TAPE -y python=3.10
conda activate TAPE
cd TAPE
chmod +x install_requirements.sh
./install_requirements.sh
- (Optional) If you want to compute the VMAF score, you first need to install ffmpeg. Then, follow the instructions reported here to install the VMAF Python library. Finally, place the vmaf folder inside the utils directory.
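Before running an evaluation with VMAF enabled, you can check the optional prerequisites above with a small script. This is a hypothetical helper, not part of the repository. It assumes that ffmpeg is reachable via PATH and that the folder is expected at utils/vmaf relative to the repository root. It does not verify that the VMAF Python library itself is installed correctly.

```python
import shutil
from pathlib import Path
from typing import Dict


def check_vmaf_prerequisites(repo_root: str = ".") -> Dict[str, bool]:
    """Report whether ffmpeg is on PATH and whether utils/vmaf exists in the repo."""
    root = Path(repo_root)
    return {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "vmaf_folder_in_utils": (root / "utils" / "vmaf").is_dir(),
    }


# Example (run from inside the TAPE folder):
# print(check_vmaf_prerequisites("."))
```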
Download the dataset from here. After downloading, the directory structure should look like this:
├── data_base_path
|
| ├── train
| | ├── input
| | | ├── input.lmdb
| | | ├── videos
| | ├── gt
| | | ├── gt.lmdb
| | | ├── videos
|
| ├── test
| | ├── input
| | | ├── input.lmdb
| | | ├── videos
| | ├── gt
| | | ├── gt.lmdb
| | | ├── videos
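To catch a misplaced folder before training, you can check the layout above programmatically. The sketch below is a hypothetical helper, not part of the repository. It lists which of the eight expected entries are missing under data_base_path. It only checks that each path exists: it does not validate the contents of the LMDB stores, which are usually directories rather than single files.

```python
from pathlib import Path
from typing import List

# Expected entries relative to data_base_path, following the tree above.
EXPECTED_ENTRIES = [
    f"{split}/{kind}/{name}"
    for split in ("train", "test")
    for kind in ("input", "gt")
    for name in (f"{kind}.lmdb", "videos")
]


def missing_dataset_entries(data_base_path: str) -> List[str]:
    """Return the expected entries that do not exist under data_base_path."""
    base = Path(data_base_path)
    return [entry for entry in EXPECTED_ENTRIES if not (base / entry).exists()]


# Example: an empty list means the layout matches the expected tree.
# print(missing_dataset_entries("/path/to/data_base_path"))
```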
To use our method for restoring a real-world video, download the pre-trained model from the release and place it under the experiments/pretrained_model directory. Then, run the following command:
python real_world_inference.py --input-path <path_to_video> --output-path <path_to_output_folder>
--input-path <str> Path to the video to restore
--output-path <str> Path to the output folder
--checkpoint-path <str> Path to the pretrained model checkpoint (default=experiments/pretrained_model/checkpoint.pth)
--num-input-frames <int> Number of input frames T for each input window (default=5)
--num-reference-frames <int> Number of reference frames D for each input window (default=5)
--preprocess-mode <str> Preprocessing mode, options: ['crop', 'resize', 'none']. 'crop' extracts the --patch-size center
crop, 'resize' resizes the longest side to --patch-size while keeping the aspect ratio, 'none'
applies no preprocessing (default=crop)
--patch-size <int> Maximum patch size for --preprocess-mode ['crop', 'resize'] (default=512)
--frame-format <str> Frame format of the extracted and restored frames (default=jpg)
--generate-combined-video <store_true> Whether to generate the combined video (i.e. input and restored videos side by side)
--no-intermediate-products <store_true> Whether to delete intermediate products (i.e. input frames, restored frames, references)
--batch-size <int> Batch size (default=1)
--num-workers <int> Number of workers of the data loader (default=20)
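To see what the three --preprocess-mode options do to a frame's geometry, the sketch below computes the crop box or output size for each mode, following the option descriptions above. It is an illustration, not the script's code. Two assumptions apply. First, 'crop' is clamped to the frame size when the frame is smaller than --patch-size. Second, 'resize' always rescales so that the longest side equals --patch-size, including upscaling small frames, although the real script may behave differently. `preprocess_geometry` is a hypothetical name.

```python
from typing import Tuple


def preprocess_geometry(width: int, height: int, mode: str = "crop",
                        patch_size: int = 512) -> Tuple[int, int, int, int]:
    """Return (left, top, out_width, out_height) for a width x height frame."""
    if mode == "crop":
        # Center crop of at most patch_size x patch_size (assumed clamp to frame size).
        crop_w, crop_h = min(width, patch_size), min(height, patch_size)
        return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h
    if mode == "resize":
        # Longest side becomes patch_size; the aspect ratio is preserved.
        scale = patch_size / max(width, height)
        return 0, 0, round(width * scale), round(height * scale)
    if mode == "none":
        return 0, 0, width, height
    raise ValueError(f"unknown preprocess mode: {mode!r}")
```

For a 720x576 PAL frame, the default settings (crop, 512) keep the central 512x512 region starting at (104, 32). With resize, a 1920x1080 frame becomes 512x288.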
To train our model from scratch, run the following command:
python main.py --experiment-name <name_of_the_experiment> --data-base-path <data_base_path> --comet-api-key <comet_api_key> --comet-project-name <comet_project_name>
You need a Comet ML account for logging. See main.py for all the available options. The checkpoints will be saved inside the experiments/<name_of_the_experiment>/checkpoints folder. After training, main.py will run the evaluation on the test set and save the results inside the experiments/<name_of_the_experiment>/results folder.
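If you want to reuse the most recent checkpoint of a run (for example, to inspect it or pass it to --checkpoint-path), a small helper can find it. This sketch is hypothetical. It assumes that checkpoints are stored as .pth files (matching the default checkpoint.pth in the inference options) and that "most recent" means the latest modification time. The actual file naming in experiments/<name_of_the_experiment>/checkpoints may differ.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(experiment_name: str,
                      experiments_root: str = "experiments") -> Optional[Path]:
    """Return the most recently modified .pth file of an experiment, or None."""
    ckpt_dir = Path(experiments_root) / experiment_name / "checkpoints"
    if not ckpt_dir.is_dir():
        return None
    candidates = [p for p in ckpt_dir.iterdir() if p.is_file() and p.suffix == ".pth"]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```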
If you want to skip the training and just run the evaluation on the test set, add the --test-only flag to the command above. In addition, if you want to avoid computing the VMAF score, add the --no-vmaf flag.
You can test our pre-trained model by adding the --eval-type pretrained flag. Note that you first need to download the pre-trained model from the release and place it under the experiments/pretrained_model directory.
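When scripting several evaluations, it can help to assemble the command from the flags documented above instead of editing it by hand. The sketch below is a hypothetical wrapper, not part of the repository. It assumes that --eval-type pretrained is combined with --test-only and the Comet ML arguments of the training command, and it uses the current interpreter (sys.executable) so that the command runs inside the active conda environment.

```python
import subprocess
import sys
from typing import List


def build_eval_command(experiment_name: str, data_base_path: str,
                       comet_api_key: str, comet_project_name: str,
                       pretrained: bool = False,
                       compute_vmaf: bool = True) -> List[str]:
    """Assemble a main.py test-only command from the documented flags."""
    cmd = [sys.executable, "main.py",
           "--experiment-name", experiment_name,
           "--data-base-path", data_base_path,
           "--comet-api-key", comet_api_key,
           "--comet-project-name", comet_project_name,
           "--test-only"]
    if not compute_vmaf:
        cmd.append("--no-vmaf")
    if pretrained:
        cmd += ["--eval-type", "pretrained"]
    return cmd


def run_eval(*args, **kwargs) -> subprocess.CompletedProcess:
    """Run the evaluation from the repository root; raises if main.py fails."""
    return subprocess.run(build_eval_command(*args, **kwargs), check=True)
```

For example, `build_eval_command("tape_eval", "/data/tape", "<key>", "<project>", pretrained=True, compute_vmaf=False)` produces a command that ends with `--test-only --no-vmaf --eval-type pretrained`. Passing the arguments as a list avoids shell-quoting problems with paths that contain spaces.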
This work was partially supported by the European Commission under European Horizon 2020 Programme, grant number 101004545 - ReInHerit.
All material is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.