A detailed description of this task can be found at PAN 2018. This code analyzes English tweets only.
Dataset: The dataset used for these experiments can be downloaded from PAN 2018.
Dependencies:
- gensim
- sklearn
- nltk
Other requirements:
Pre-trained GloVe models (100d and 200d) are required for the word embeddings.
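This README does not say how the GloVe vectors are combined into features. One common approach, shown here as an assumption rather than the authors' method, is to average the vectors of a tweet's tokens. The stdlib-only sketch below parses GloVe's plain-text format (each line holds a word followed by its space-separated components) and averages the in-vocabulary tokens. `load_glove` and `tweet_vector` are hypothetical names, and lower-casing tokens assumes a lower-cased GloVe vocabulary.

```python
import io
from typing import Dict, Iterable, List, TextIO


def load_glove(handle: TextIO) -> Dict[str, List[float]]:
    """Parse GloVe's plain-text format: a word followed by its space-separated components."""
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def tweet_vector(tokens: Iterable[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; return all zeros if no token is known."""
    total = [0.0] * dim
    count = 0
    for token in tokens:
        # Assumption: the GloVe vocabulary is lower-cased, so tokens are lower-cased too.
        vec = vectors.get(token.lower())
        if vec is None:
            continue  # out-of-vocabulary token
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else total


if __name__ == "__main__":
    # A two-word toy vocabulary in GloVe text format (dim = 2), standing in for a real file.
    glove = load_glove(io.StringIO("good 1.0 2.0\nday 3.0 4.0\n"))
    print(tweet_vector("Good day !".split(), glove, dim=2))  # [2.0, 3.0]
```

With a real model, `handle` would be `open(path, encoding="utf-8")` on the 100d or 200d file. If gensim 4.x is installed, `KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)` reads the same text format directly.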
Image captions were produced with the Chainer-based image caption generation tool. Extract the captions with that tool before running this code, and store them in a .csv file with one tab-separated line per image (format: imageid \t text).
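The caption file format above (`imageid \t text`) is simple enough to produce and read with the standard library. The sketch below is an illustration, not code from this repository: `save_captions` and `load_captions` are hypothetical helpers, and the image ids used in the example are made up. Captions are normalized to a single line so that every row keeps exactly two fields.

```python
import csv
import io
from typing import Dict, TextIO


def save_captions(captions: Dict[str, str], handle: TextIO) -> None:
    """Write one 'imageid<TAB>text' line per image."""
    for image_id, text in captions.items():
        clean = " ".join(text.split())  # collapse tabs/newlines so each row stays two fields
        handle.write(f"{image_id}\t{clean}\n")


def load_captions(handle: TextIO) -> Dict[str, str]:
    """Read 'imageid<TAB>text' rows into a dict keyed by image id."""
    captions: Dict[str, str] = {}
    # QUOTE_NONE: quote characters inside captions are kept as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        captions[row[0].strip()] = "\t".join(row[1:]).strip()
    return captions


if __name__ == "__main__":
    buf = io.StringIO()
    save_captions({"img_001": "a man riding a bike", "img_002": "a dog on the grass"}, buf)
    buf.seek(0)
    print(load_captions(buf)["img_002"])  # a dog on the grass
```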
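Usage:
python master.py training_input_add test_input_add test_output_add
where training_input_add and test_input_add are the paths to the training and test data, and test_output_add is the path where the predictions are written.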
Output is written in XML format.
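The exact output layout is not shown here. As far as we know, the PAN 2018 author-profiling task asked for one XML file per author containing an `author` element with `id`, `lang`, and the text, image, and combined gender predictions (`gender_txt`, `gender_img`, `gender_comb`, each `male` or `female`). The sketch below assumes that format and a `<author_id>.xml` file name; check the PAN 2018 task page before relying on either. `author_xml` and `write_author_file` are hypothetical helpers, not functions from `master.py`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed label set for the PAN 2018 gender predictions.
GENDERS = {"male", "female"}


def author_xml(author_id: str, gender_txt: str, gender_img: str,
               gender_comb: str, lang: str = "en") -> str:
    """Serialize one author's predictions as a single empty <author .../> element."""
    for label in (gender_txt, gender_img, gender_comb):
        if label not in GENDERS:
            raise ValueError(f"unexpected gender label: {label!r}")
    elem = ET.Element("author", {
        "id": author_id,
        "lang": lang,  # this code handles English only, hence the "en" default
        "gender_txt": gender_txt,
        "gender_img": gender_img,
        "gender_comb": gender_comb,
    })
    return ET.tostring(elem, encoding="unicode")


def write_author_file(out_dir: str, author_id: str, **labels: str) -> str:
    """Write <out_dir>/<author_id>.xml (assumed naming scheme) and return its path."""
    path = os.path.join(out_dir, f"{author_id}.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(author_xml(author_id, **labels) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        path = write_author_file(out_dir, "abc123", gender_txt="male",
                                 gender_img="female", gender_comb="male")
        with open(path, encoding="utf-8") as fh:
            print(fh.read().strip())
```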
Please cite the following paper if you find this code useful.
B. G. Patra, G. Das, and D. Das. 2018. Multimodal Author Profiling for Twitter - Notebook for PAN at CLEF 2018. In Proceedings of the PAN 2018 at CLEF-2018, Avignon, France.
If you have any questions, please e-mail us. We welcome bug fixes and new features.