A detailed description of this task can be found at PAN 2018. This code analyzes only English tweets.
The dataset used for these experiments can be downloaded from PAN 2018.
- gensim
- sklearn
- nltk
The GloVe models (100d & 200d) are required for word embeddings.
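As a rough illustration of what the GloVe files contain, the sketch below parses the plain-text GloVe format (one token per line, followed by its space-separated vector components) using only the standard library. The function name `load_glove` and the file path are hypothetical, and the repository itself may load the vectors differently, for example through gensim.

```python
import os
import tempfile


def load_glove(path):
    """Parse a GloVe text file into a dict mapping token -> list of floats.

    Assumes the standard GloVe text layout: a token, then its vector
    components, all separated by single spaces, one entry per line.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


if __name__ == "__main__":
    # Tiny stand-in file; a real GloVe file has 100 or 200 components per line.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "glove_demo.txt")  # hypothetical name
        with open(demo_path, "w", encoding="utf-8") as f:
            f.write("the 0.1 0.2\ncat 0.3 0.4\n")
        print(load_glove(demo_path)["cat"])
```

A plain dictionary is enough for lookups in a small experiment; for the full-size models a NumPy array or gensim's keyed vectors would use far less memory.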
Image captions were generated with a Chainer-based image caption generation tool. Before running the code, generate captions for the images with that tool and store them in a CSV file (format: imageid\tcaption, one image per line) in the resource folder.
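The caption file layout described above can be sketched as follows. This is a minimal example, assuming one `imageid<TAB>caption` pair per line with no header row; the helper names and the file name `captions.csv` are hypothetical, and the exact file name expected by `master.py` is not stated here.

```python
import os
import tempfile


def write_captions(path, captions):
    """Write a dict of image id -> caption as tab-separated lines."""
    with open(path, "w", encoding="utf-8") as f:
        for image_id, caption in captions.items():
            # Collapse tabs and newlines so each caption stays on one line.
            clean = " ".join(caption.split())
            f.write(f"{image_id}\t{clean}\n")


def read_captions(path):
    """Read tab-separated lines back into a dict of image id -> caption."""
    captions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore empty lines
            image_id, _, caption = line.partition("\t")
            captions[image_id] = caption
    return captions


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "captions.csv")  # hypothetical file name
        write_captions(path, {"img001": "a man riding a bike"})
        print(read_captions(path))
```

Splitting on the first tab only keeps the reader simple and tolerates captions that contain commas or quotes.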
python master.py training_input_add test_input_add test_output_add
The output will be an XML file as specified in this link.
Please cite the following paper if you find this code useful.
B. G. Patra, G. Das, and D. Das. 2018. Multimodal Author Profiling for Twitter - Notebook for PAN at CLEF 2018. In Proceedings of the PAN 2018 at CLEF-2018, Avignon, France. link
If you have any questions, please e-mail us. We welcome bug fixes and new features.