Generating video descriptions using deep learning in Keras
Start with AWS Ubuntu Deep Learning AMI on a EC2 p2.xlarge instance. (or better, p2.xlarge costs $0.9/hour on-demand and ~$0.3/hour as a spot instance)
source activate tensorflow_p27
conda install scikit-learn
conda install scikit-image
If you are not using AWS, ensure you have a recent version of Keras and Tensorflow installed and working, and also install scikit-learn and scikit-image if you want to train tag prediction models
git clone https://github.com/rohit-gupta/V2L-MSVD.git
cd V2L-MSVD
bash fetch-pretrained-model.sh
sudo bash install-youtube-dl.sh
bash fetch-youtube-video.sh https://www.youtube.com/watch?v=cKWuNQAy2Sk
bash process-youtube-video.sh
bash fetch-pretrained-model.sh
bash fetch-from-localpath.sh /home/ubuntu/vid1.mp4
bash process-youtube-video.sh
bash fetch-data.sh
If you only want to use Verified descriptions ->
bash preprocess-data.sh CleanOnly
If you want to use both verified and unverified descriptions ->
bash preprocess-data.sh
bash extract_frames.sh
bash run-feature-extractor.sh
bash train-simple-tag-prediction-model.sh
bash train-language-model.sh
bash score-language-model.sh
- If at any stage you get an error that contains
/lib/libstdc++.so.6: version `CXXABI_1.3.x' not found
You can fix it with:
cd ~/anaconda3/envs/tensorflow_p27/lib && mv libstdc++.a stdcpp_bkp && mv libstdc++.so stdcpp_bkp && mv libstdc++.so.6 stdcpp_bkp && mv libstdc++.so.6.0.19 stdcpp_bkp/ && mv libstdc++.so.6.0.19-gdb.py stdcpp_bkp/ && mv libstdc++.so.6.0.21 stdcpp_bkp/ && mv libstdc++.so.6.0.24 stdcpp_bkp/ && cd -
- Tensorflow 1.3 has a memory leak bug that might affect this code
You can fix it by upgrading Tensorflow.
Reference for this problem: #3
The video captioning model here uses Mean Pooled ResNet50 features of video frames along with Object, Action and Attribute tags predicted by a simple feedforward network.
The Table below compares the performance of our model with some other models that also rely on mean pooled frame features. It is sourced from papers 1, 2 and 3.
Model | METEOR score on MSVD |
---|---|
Mean Pooled (AlexNet Features) | 26.9 |
Mean Pooled (VGG Features) | 27.7 |
Mean Pooled (GoogleNet Features) | 28.7 |
Ours (Mean Pooled ResNet50 Features + Predicted Tags) | 29.0 |