First, prepare the MMC4 dataset as described below.

Image-text Data

Interleaved Image-text Data

MMC4

Count the samples in the MMC4 WebDataset shards:

    python src/utils/count_webdataset_sample.py --data_dir /datadrive_d/jinpeng/Code/videogpt4/datas/raw/mmc4/subdir

Then install the package in editable mode:

    pip install -e .

Run

    amlt run
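For the sample-counting step in the MMC4 preparation, the repository's `count_webdataset_sample.py` is not shown here, so the following is only a minimal sketch of what such a count could look like. It assumes the shards are `.tar` files sitting directly in the data directory and that they follow the usual WebDataset convention, where files sharing a name up to the first dot belong to one sample. The function names `sample_key`, `count_shard_samples`, and `count_dataset_samples` are hypothetical and are not taken from the repository.

```python
import os
import tarfile
from pathlib import Path


def sample_key(member_name: str) -> str:
    """Return the sample key: the member path up to the first dot in its file name."""
    directory, filename = os.path.split(member_name)
    stem = filename.split(".", 1)[0]
    return f"{directory}/{stem}" if directory else stem


def count_shard_samples(shard_path: Path) -> int:
    """Count distinct sample keys in one tar shard (assumed WebDataset layout)."""
    keys = set()
    with tarfile.open(shard_path, "r") as archive:
        for member in archive:
            # Skip directories and other non-file entries.
            if member.isfile():
                keys.add(sample_key(member.name))
    return len(keys)


def count_dataset_samples(data_dir: str) -> dict[str, int]:
    """Map each *.tar shard directly under data_dir to its sample count."""
    counts = {}
    for shard_path in sorted(Path(data_dir).glob("*.tar")):
        counts[shard_path.name] = count_shard_samples(shard_path)
    return counts
```

Calling `count_dataset_samples` on the shard directory returns a mapping from shard file name to sample count, and summing its values gives the dataset total. Comparing that total against the expected size of the download is a quick way to spot truncated or missing shards before training.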