You need to download the global pool5 and spatial res4f_relu features and extract them into this folder.
You can download the 2048-dimensional pool5 features, extracted from a ResNet-50 pre-trained on ImageNet classification, by clicking here (105MB). The tarball contains Flickr30k features for the train, val, test2016 and test2017 sets, as well as for the secondary ambiguous test set from MSCOCO.
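Once the tarball is downloaded, it can be unpacked with any tar tool. As a minimal sketch in Python, the helper below extracts an archive into a target folder using the standard `tarfile` module. The function name `extract_features` and the archive name in the usage comment are hypothetical, since the actual file name and the layout inside the tarball are not stated here.

```python
import tarfile
from pathlib import Path


def extract_features(tarball, dest="."):
    """Extract a feature tarball into dest and return the archived member names."""
    dest = Path(dest).resolve()
    # "r:*" lets tarfile detect the compression (gz, bz2, xz or none).
    with tarfile.open(tarball, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse members that would be written outside dest.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return [member.name for member in members]


# Hypothetical usage, run from this folder (the archive name is an assumption):
# names = extract_features("pool5_features.tar.bz2")
# print(len(names), "entries extracted")
```

Listing the returned names is a quick way to check that the files for every split were actually extracted.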