When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues
This repository contains the code required to reproduce the results of our ACL 2022 paper, "When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues".
The paper introduces the novel task of Sarcasm Explanation in Dialogue (SED). Set in a multimodal and code-mixed setting, the task aims to generate natural language explanations of satirical conversations. An example case is illustrated in the figure below:
It contains a dyadic conversation of four utterances ⟨u1, u2, u3, u4⟩, where the last utterance (u4) is a sarcastic remark. Note that in this example, the literal opposite of what is being said, "I don't have to think about it," is not what the speaker means; this reinforces our hypothesis that sarcasm explanation goes beyond simply negating the dialogue's language. The discourse is also accompanied by ancillary audio-visual markers of satire, such as an ironic intonation of the pitch, a blank face, or a roll of the eyes. Thus, by conglomerating the conversation history, multimodal signals, and speaker information, SED aims to generate a coherent and cohesive natural language explanation for such sarcastic dialogues.
We augment MaSaC, a multi-modal, code-mixed dataset for sarcasm detection in dialogues, with code-mixed explanations to create the WITS (Why Is This Sarcastic?) dataset. Each text split is a JSON file with the following structure:
```
{
    'd_id': {
        'episode_name': title,
        'target_speaker': speaker1,
        'target_utterance': t_utt,
        'context_speakers': [sp_a, ..., sp_b],
        'context_utterances': [utt_a, ..., utt_b],
        'code_mixed_explanation': cmt_exp,
        'sarcasm_target': t_sp,
        'start_time': start_time,
        'end_time': end_time
    },
    ...
}
```
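For illustration, a minimal sketch of reading one text split, assuming the JSON layout above with dialogue IDs as top-level keys (file paths follow the 'Data' layout described below):

```python
import json

# Read the training split of the text data
# (path per the 'Data' layout described below).
with open('Data/Text/train_text.json', encoding='utf-8') as f:
    train = json.load(f)

# Inspect the first dialogue instance.
d_id, dialogue = next(iter(train.items()))
print('Dialogue ID:', d_id)
print('Sarcastic utterance:', dialogue['target_speaker'], '->', dialogue['target_utterance'])
print('Context:', list(zip(dialogue['context_speakers'], dialogue['context_utterances'])))
print('Explanation:', dialogue['code_mixed_explanation'])
```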
The audio and video feature files share the text fields above and add the extracted features. Audio (`*_audio.p`):

| episode_name | target_speaker | target_utterance | context_speakers | context_utterances | sarcasm_target | code_mixed_explanation | start_time | end_time | audio_feats |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |

Video (`*_video.p`):

| episode_name | target_speaker | target_utterance | context_speakers | context_utterances | sarcasm_target | code_mixed_explanation | start_time | end_time | video_feats |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
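As a quick way to inspect these files, a minimal sketch, assuming the `.p` files are pickled pandas DataFrames with the columns above (if they are plain pickles, `pickle.load` works analogously; paths follow the 'Data' layout described below):

```python
import pandas as pd

# Load the pickled audio features for the training split
# (path per the 'Data' layout described below).
audio = pd.read_pickle('Data/Audio/train_audio.p')

print(audio.columns.tolist())        # the columns listed above, ending in 'audio_feats'
print(audio.iloc[0]['audio_feats'])  # extracted audio features for the first instance
```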
- Download data features from this drive link.
- Place the text, audio, and video feature files in the 'Data' folder using the layout below (a quick sanity check is sketched after the listing):
  - Data
    - Text
      - train_text.json
      - val_text.json
      - test_text.json
    - Audio
      - train_audio.p
      - val_audio.p
      - test_audio.p
    - Video
      - train_video.p
      - val_video.p
      - test_video.p
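Before running anything, a small (hypothetical) sanity check that the files are in place:

```python
import os

# Expected files per the 'Data' layout above.
TEMPLATES = [
    'Data/Text/{split}_text.json',
    'Data/Audio/{split}_audio.p',
    'Data/Video/{split}_video.p',
]

for split in ('train', 'val', 'test'):
    for template in TEMPLATES:
        path = template.format(split=split)
        print('ok     ' if os.path.exists(path) else 'MISSING', path)
```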
- Execution
- Go to the 'Code' directory and run:

  ```
  python Trimodal-BART-driver.py
  ```
Models will be saved in the 'models' directory, while the generated explanations for the val and test sets will be saved in the 'results' folder.
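To score the generated explanations against gold references, one option is ROUGE via the `rouge-score` package. A sketch, assuming (hypothetically) one explanation per line in plain-text files; the actual files in 'results' may be formatted differently:

```python
from rouge_score import rouge_scorer

# Hypothetical file names; adapt to the actual contents of the 'results' folder.
with open('results/test_generated.txt', encoding='utf-8') as f:
    hypotheses = [line.strip() for line in f]
with open('results/test_references.txt', encoding='utf-8') as f:
    references = [line.strip() for line in f]

# ROUGE-L F1, averaged over the test set.
scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
scores = [scorer.score(ref, hyp)['rougeL'].fmeasure
          for ref, hyp in zip(references, hypotheses)]
print('Mean ROUGE-L F1:', sum(scores) / len(scores))
```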
```bibtex
@inproceedings{kumar-etal-2022-become,
title = "When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues",
author = "Kumar, Shivani and
Kulkarni, Atharva and
Akhtar, Md Shad and
Chakraborty, Tanmoy",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.411",
pages = "5956--5968",
abstract = "Indirect speech such as sarcasm achieves a constellation of discourse goals in human communication. While the indirectness of figurative language warrants speakers to achieve certain pragmatic goals, it is challenging for AI agents to comprehend such idiosyncrasies of human communication. Though sarcasm identification has been a well-explored topic in dialogue analysis, for conversational systems to truly grasp a conversation{'}s innate meaning and generate appropriate responses, simply detecting sarcasm is not enough; it is vital to explain its underlying sarcastic connotation to capture its true essence. In this work, we study the discourse structure of sarcastic conversations and propose a novel task {--} Sarcasm Explanation in Dialogue (SED). Set in a multimodal and code-mixed setting, the task aims to generate natural language explanations of satirical conversations. To this end, we curate WITS, a new dataset to support our task. We propose MAF (Modality Aware Fusion), a multimodal context-aware attention and global information fusion module to capture multimodality and use it to benchmark WITS. The proposed attention module surpasses the traditional multimodal fusion baselines and reports the best performance on almost all metrics. Lastly, we carry out detailed analysis both quantitatively and qualitatively.",
}
```