diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml index b81c16095d..57f536bd80 100644 --- a/data/xml/2022.acl.xml +++ b/data/xml/2022.acl.xml @@ -34,6 +34,7 @@ MultiRC QNLI SST + 10.18653/v1/2022.acl-long.1 Quantified Reproducibility Assessment of <fixed-case>NLP</fixed-case> Results @@ -44,6 +45,7 @@ This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 different system and evaluation measure combinations (involving diverse NLP tasks and types of evaluation), for each of which we have the original results and one to seven reproduction results. The proposed QRA method produces degree-of-reproducibility scores that are comparable across multiple reproductions not only of the same, but also of different, original studies. We find that the proposed method facilitates insights into causes of variation between reproductions, and as a result, allows conclusions to be drawn about what aspects of system and/or evaluation design need to be changed in order to improve reproducibility. 2022.acl-long.2 belz-etal-2022-quantified + 10.18653/v1/2022.acl-long.2 Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings @@ -59,6 +61,7 @@ yu-etal-2022-rare WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.3 <fixed-case>A</fixed-case>leph<fixed-case>BERT</fixed-case>: Language Model Pre-training and Evaluation from Sub-Word to Sentence Level @@ -73,6 +76,7 @@ 2022.acl-long.4 seker-etal-2022-alephbert OSCAR + 10.18653/v1/2022.acl-long.4 Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning @@ -88,6 +92,7 @@ li-etal-2022-learning DROP HybridQA + 10.18653/v1/2022.acl-long.5 Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification @@ -99,6 +104,7 @@ Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, whereas its difficulty is augmented by the scarcity of available datasets which vary greatly in terms of domains and languages. As such, it becomes increasingly more difficult to develop a robust model that generalizes across a wide array of input examples. In this paper, we propose a novel training technique for the CWI task based on domain adaptation to improve the target character and context representations. This technique addresses the problem of working with multiple domains, inasmuch as it creates a way of smoothing the differences between the explored datasets. Moreover, we also propose a similar auxiliary task, namely text simplification, that can be used to complement lexical complexity prediction. Our model obtains a boost of up to 2.42% in terms of Pearson Correlation Coefficients in contrast to vanilla training techniques, when considering the CompLex from the Lexical Complexity Prediction 2021 dataset. At the same time, we obtain an increase of 3% in Pearson scores, while considering a cross-lingual setup relying on the Complex Word Identification 2018 dataset. In addition, our model yields state-of-the-art results in terms of Mean Absolute Error. 
2022.acl-long.6 zaharia-etal-2022-domain + 10.18653/v1/2022.acl-long.6 <fixed-case>J</fixed-case>oint<fixed-case>CL</fixed-case>: A Joint Contrastive Learning Framework for Zero-Shot Stance Detection @@ -115,6 +121,7 @@ 2022.acl-long.7.software.zip liang-etal-2022-jointcl hitsz-hlt/jointcl + 10.18653/v1/2022.acl-long.7 [<fixed-case>CASPI</fixed-case>] Causal-aware Safe Policy Improvement for Task-oriented Dialogue @@ -126,6 +133,7 @@ 2022.acl-long.8 ramachandran-etal-2022-caspi MultiWOZ + 10.18653/v1/2022.acl-long.8 <fixed-case>U</fixed-case>ni<fixed-case>T</fixed-case>ran<fixed-case>S</fixed-case>e<fixed-case>R</fixed-case>: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System @@ -139,6 +147,7 @@ 2022.acl-long.9.software.zip ma-etal-2022-unitranser MMD + 10.18653/v1/2022.acl-long.9 Dynamic Schema Graph Fusion Network for Multi-Domain Dialogue State Tracking @@ -152,6 +161,7 @@ 2022.acl-long.10 feng-etal-2022-dynamic SGD + 10.18653/v1/2022.acl-long.10 Attention Temperature Matters in Abstractive Summarization Distillation @@ -165,6 +175,7 @@ 2022.acl-long.11.software.zip zhang-etal-2022-attention shengqiang-zhang/plate + 10.18653/v1/2022.acl-long.11 Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation @@ -182,6 +193,7 @@ ghchen18/acl22-sixtp FLORES-101 FLoRes + 10.18653/v1/2022.acl-long.12 <fixed-case>T</fixed-case>op<fixed-case>WORDS</fixed-case>-Seg: Simultaneous Text Segmentation and Word Discovery for Open-Domain <fixed-case>C</fixed-case>hinese Texts via <fixed-case>B</fixed-case>ayesian Inference @@ -192,6 +204,7 @@ Processing open-domain Chinese texts has been a critical bottleneck in computational linguistics for decades, partially because text segmentation and word discovery often entangle with each other in this challenging scenario. No existing methods yet can achieve effective text segmentation and word discovery simultaneously in the open domain. This study fills in this gap by proposing a novel method called TopWORDS-Seg based on Bayesian inference, which enjoys robust performance and transparent interpretation when no training corpus and domain vocabulary are available. Advantages of TopWORDS-Seg are demonstrated by a series of experimental studies. 2022.acl-long.13 pan-etal-2022-topwords + 10.18653/v1/2022.acl-long.13 An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition @@ -207,6 +220,7 @@ 2022.acl-long.14.software.zip li-etal-2022-unsupervised-multiple CoNLL-2003 + 10.18653/v1/2022.acl-long.14 Discriminative Marginalized Probabilistic Neural Method for Multi-Document Summarization of Medical Literature @@ -218,6 +232,7 @@ Although current state-of-the-art Transformer-based solutions have succeeded in a wide range of single-document NLP tasks, they still struggle to address multi-input tasks such as multi-document summarization. Many solutions truncate the inputs, thus ignoring potential summary-relevant contents, which is unacceptable in the medical domain where every piece of information can be vital. Others leverage linear model approximations to apply multi-input concatenation, worsening the results because all information is considered, even if it is conflicting or noisy with respect to a shared background. Despite the importance and social impact of medicine, there are no ad-hoc solutions for multi-document summarization. 
For this reason, we propose a novel discriminative marginalized probabilistic method (DAMEN) trained to discriminate critical information from a cluster of topic-related medical documents and generate a multi-document summary via token probability marginalization. Results prove that we outperform the previous state-of-the-art on a biomedical dataset for multi-document summarization of systematic literature reviews. Moreover, we perform extensive ablation studies to motivate the design choices and prove the importance of each module of our method. 2022.acl-long.15 moro-etal-2022-discriminative + 10.18653/v1/2022.acl-long.15 Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm @@ -239,6 +254,7 @@ huang-etal-2022-sparse GLUE QNLI + 10.18653/v1/2022.acl-long.16 <fixed-case>C</fixed-case>ipher<fixed-case>DA</fixed-case>ug: Ciphertext based Data Augmentation for Neural Machine Translation @@ -250,6 +266,7 @@ 2022.acl-long.17 kambhatla-etal-2022-cipherdaug protonish/cipherdaug-nmt + 10.18653/v1/2022.acl-long.17 Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages @@ -262,6 +279,7 @@ patil-etal-2022-overlap vaidehi99/obpe XNLI + 10.18653/v1/2022.acl-long.18 Long-range Sequence Modeling with Predictable Sparse Attention @@ -273,6 +291,7 @@ 2022.acl-long.19 zhuang-etal-2022-long LRA + 10.18653/v1/2022.acl-long.19 Improving Personalized Explanation Generation through Visualization @@ -286,6 +305,7 @@ In modern recommender systems, there are usually comments or reviews from users that justify their ratings for different items. Trained on such a textual corpus, explainable recommendation models learn to discover user interests and generate personalized explanations. Though able to provide plausible explanations, existing models tend to generate repeated sentences for different items or empty sentences with insufficient details. This begs an interesting question: can we immerse the models in a multimodal environment to gain proper awareness of real-world concepts and alleviate the above shortcomings? To this end, we propose a visually-enhanced approach named METER with the help of visualization generation and text–image matching discrimination: the explainable recommendation model is encouraged to visualize what it refers to while incurring a penalty if the visualization is incongruent with the textual explanation. Experimental results and a manual assessment demonstrate that our approach can improve not only the text quality but also the diversity and explainability of the generated explanations. 2022.acl-long.20 geng-etal-2022-improving + 10.18653/v1/2022.acl-long.20 New Intent Discovery with Pre-training and Contrastive Learning @@ -300,6 +320,7 @@ zhang-etal-2022-new zhang-yu-wei/mtp-clnn CLINC150 + 10.18653/v1/2022.acl-long.21 <fixed-case>M</fixed-case>odeling <fixed-case>U.S.</fixed-case> State-Level Policies by Extracting Winners and Losers from Legislative Texts @@ -310,6 +331,7 @@ Decisions on state-level policies have a deep effect on many aspects of our everyday life, such as health-care and education access. However, there is little understanding of how these policies and decisions are being formed in the legislative process. We take a data-driven approach by decoding the impact of legislation on relevant stakeholders (e.g., teachers in education bills) to understand legislators’ decision-making process and votes. 
We build a new dataset for multiple US states that interconnects multiple sources of data including bills, stakeholders, legislators, and money donors. Next, we develop a textual graph-based model to embed and analyze state bills. Our model predicts winners/losers of bills and then utilizes them to better determine the legislative body’s vote breakdown according to demographic/ideological criteria, e.g., gender. 2022.acl-long.22 davoodi-etal-2022-modeling + 10.18653/v1/2022.acl-long.22 Structural Characterization for Dialogue Disentanglement @@ -323,6 +345,7 @@ ma-etal-2022-structural xbmxb/structurecharacterization4dd Molweni + 10.18653/v1/2022.acl-long.23 Multi-Party Empathetic Dialogue Generation: A New Task for Dialog Systems @@ -338,6 +361,7 @@ zhu-etal-2022-multi MELD PEC + 10.18653/v1/2022.acl-long.24 <fixed-case>MISC</fixed-case>: A Mixed Strategy-Aware Model integrating <fixed-case>COMET</fixed-case> for Emotional Support Conversation @@ -354,6 +378,7 @@ morecry/misc ATOMIC ConceptNet + 10.18653/v1/2022.acl-long.25 <fixed-case>GLM</fixed-case>: General Language Model Pretraining with Autoregressive Blank Infilling @@ -375,6 +400,7 @@ SuperGLUE WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.26 <fixed-case>Q</fixed-case>uote<fixed-case>R</fixed-case>: A Benchmark of Quote Recommendation for Writing @@ -391,6 +417,7 @@ qi-etal-2022-quoter thunlp/quoter BookCorpus + 10.18653/v1/2022.acl-long.27 Towards Comprehensive Patent Approval Predictions: Beyond Traditional Document Classification @@ -405,6 +432,7 @@ Predicting the approval chance of a patent application is a challenging problem involving multiple facets. The most crucial facet is arguably the novelty — 35 U.S. Code § 102 rejects more recent applications that have very similar prior arts. Such novelty evaluations differentiate patent approval prediction from conventional document classification — successful patent applications may share similar writing patterns; however, too-similar newer applications would receive the opposite label, thus confusing standard document classifiers (e.g., BERT). To address this issue, we propose a novel framework that unifies the document classifier with handcrafted features, particularly time-dependent novelty scores. Specifically, we formulate the novelty scores by comparing each application with millions of prior arts using a hybrid of efficient filters and a neural bi-encoder. Moreover, we impose a new regularization term into the classification objective to enforce the monotonic change of approval prediction w.r.t. novelty scores. From extensive experiments on a large-scale USPTO dataset, we find that standard BERT fine-tuning can partially learn the correct relationship between novelty and approvals from inconsistent data. However, our time-dependent novelty features offer a boost on top of it. Also, our monotonic regularization, while shrinking the search space, can drive the optimizer to better local optima, yielding a further small performance gain. 
2022.acl-long.28 gao-etal-2022-towards + 10.18653/v1/2022.acl-long.28 Hypergraph <fixed-case>T</fixed-case>ransformer: <fixed-case>W</fixed-case>eakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering @@ -420,6 +448,7 @@ yujungheo/kbvqa-public DBpedia Visual Question Answering + 10.18653/v1/2022.acl-long.29 Cross-Utterance Conditioned <fixed-case>VAE</fixed-case> for Non-Autoregressive Text-to-Speech @@ -438,6 +467,7 @@ li-etal-2022-cross-utterance neurowave-ai/cucvae-tts LJSpeech + 10.18653/v1/2022.acl-long.30 Mix and Match: Learning-free Controllable Text Generation using Energy Language Models @@ -450,6 +480,7 @@ mireshghallah-etal-2022-mix mireshghallah/mixmatch GYAFC + 10.18653/v1/2022.acl-long.31 So Different Yet So Alike! Constrained Unsupervised Text Style Transfer @@ -463,6 +494,7 @@ 2022.acl-long.32 ramesh-kashyap-etal-2022-different abhinavkashyap/dct + 10.18653/v1/2022.acl-long.32 e-<fixed-case>CARE</fixed-case>: a New Dataset for Exploring Explainable Causal Reasoning @@ -479,6 +511,7 @@ COPA CommonsenseQA GenericsKB + 10.18653/v1/2022.acl-long.33 Fantastic Questions and Where to Find Them: <fixed-case>F</fixed-case>airytale<fixed-case>QA</fixed-case> – An Authentic Dataset for Narrative Comprehension @@ -508,6 +541,7 @@ CLOTH NarrativeQA RACE + 10.18653/v1/2022.acl-long.34 <fixed-case>K</fixed-case>a<fixed-case>FSP</fixed-case>: Knowledge-Aware Fuzzy Semantic Parsing for Conversational Question Answering over a Large-Scale Knowledge Base @@ -519,6 +553,7 @@ 2022.acl-long.35.software.zip li-xiong-2022-kafsp CSQA + 10.18653/v1/2022.acl-long.35 Multilingual Knowledge Graph Completion with Self-Supervised Adaptive Graph Alignment @@ -536,6 +571,7 @@ 2022.acl-long.36 huang-etal-2022-multilingual amzn/ss-aga-kgc + 10.18653/v1/2022.acl-long.36 Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization @@ -548,6 +584,7 @@ Automatic code summarization, which aims to describe the source code in natural language, has become an essential task in software maintenance. Our fellow researchers have attempted to achieve such a purpose through various machine learning-based approaches. One key challenge keeping these approaches from being practical lies in failing to retain the semantic structure of source code, which has unfortunately been overlooked by the state-of-the-art. Existing approaches resort to representing the syntax structure of code by modeling the Abstract Syntax Trees (ASTs). However, the hierarchical structures of ASTs have not been well explored. In this paper, we propose CODESCRIBE to model the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages the graph neural network and Transformer to preserve the structural and sequential information of code, respectively. In addition, we propose a pointer-generator network that pays attention to both the structure and sequential tokens of code for better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our proposed approach when compared with several state-of-the-art baselines. 
2022.acl-long.37 guo-etal-2022-modeling + 10.18653/v1/2022.acl-long.37 <fixed-case>F</fixed-case>ew<fixed-case>NLU</fixed-case>: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding @@ -575,6 +612,7 @@ MultiRC SuperGLUE WSC + 10.18653/v1/2022.acl-long.38 Learn to Adapt for Generalized Zero-Shot Text Classification @@ -590,6 +628,7 @@ zhang-etal-2022-learn quareia/lta ATIS + 10.18653/v1/2022.acl-long.39 <fixed-case>T</fixed-case>able<fixed-case>F</fixed-case>ormer: Robust Transformer Modeling for Table-Text Encoding @@ -606,6 +645,7 @@ google-research/tapas SQA TabFact + 10.18653/v1/2022.acl-long.40 Perceiving the World: Question-guided Reinforcement Learning for Text-based Games @@ -620,6 +660,7 @@ 2022.acl-long.41 xu-etal-2022-perceiving yunqiuxu/qwa + 10.18653/v1/2022.acl-long.41 Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization @@ -635,6 +676,7 @@ jia-etal-2022-neural MLSUM WikiLingua + 10.18653/v1/2022.acl-long.42 Few-Shot Class-Incremental Learning for Named Entity Recognition @@ -649,6 +691,7 @@ Previous work on class-incremental learning for Named Entity Recognition (NER) relies on the assumption that there exists an abundance of labeled data for the training of new classes. In this work, we study a more challenging but practical problem, i.e., few-shot class-incremental learning for NER, where an NER model is trained with only a few labeled samples of the new classes, without forgetting knowledge of the old ones. To alleviate the problem of catastrophic forgetting in few-shot class-incremental learning, we reconstruct synthetic training data of the old classes using the trained NER model, augmenting the training of new classes. We further develop a framework that distills from the existing model with both synthetic data and real data from the current training set. Experimental results show that our approach achieves significant improvements over existing baselines. 
2022.acl-long.43 wang-etal-2022-shot + 10.18653/v1/2022.acl-long.43 Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation @@ -666,6 +709,7 @@ 2022.acl-long.44.software.zip zhao-etal-2022-improving PERSONA-CHAT + 10.18653/v1/2022.acl-long.44 Quality Controlled Paraphrase Generation @@ -681,6 +725,7 @@ bandel-etal-2022-quality ibm/quality-controlled-paraphrase-generation COCO + 10.18653/v1/2022.acl-long.45 Controllable Dictionary Example Generation: Generating Example Sentences for Specific Targeted Audiences @@ -691,6 +736,7 @@ 2022.acl-long.46 2022.acl-long.46.software.zip he-yiu-2022-controllable + 10.18653/v1/2022.acl-long.46 <fixed-case>A</fixed-case>ra<fixed-case>T</fixed-case>5: Text-to-Text Transformers for <fixed-case>A</fixed-case>rabic Language Generation @@ -703,6 +749,7 @@ nagoudi-etal-2022-arat5 C4 mC4 + 10.18653/v1/2022.acl-long.47 Legal Judgment Prediction via Event Extraction with Constraints @@ -715,6 +762,7 @@ 2022.acl-long.48.software.zip feng-etal-2022-legal wapay/epm + 10.18653/v1/2022.acl-long.48 Answer-level Calibration for Free-form Multiple Choice Question Answering @@ -733,6 +781,7 @@ SWAG Social IQA WinoGrande + 10.18653/v1/2022.acl-long.49 Learning When to Translate for Streaming Speech @@ -747,6 +796,7 @@ dong-etal-2022-learning dqqcasia/mosst MuST-C + 10.18653/v1/2022.acl-long.50 Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking @@ -759,6 +809,7 @@ yang-etal-2022-compact yingrui-yang/ContextualQuantizer MS MARCO + 10.18653/v1/2022.acl-long.51 Early Stopping Based on Unlabeled Samples in Text Classification @@ -774,6 +825,7 @@ AG News IMDb Movie Reviews SST + 10.18653/v1/2022.acl-long.52 Meta-learning via Language Model In-context Tuning @@ -788,6 +840,7 @@ chen-etal-2022-meta yandachen/in-context-tuning LAMA + 10.18653/v1/2022.acl-long.53 It is <fixed-case>AI</fixed-case>’s Turn to Ask Humans a Question: Question-Answer Pair Generation for Children’s Story Books @@ -806,6 +859,7 @@ MS MARCO NarrativeQA PAQ + 10.18653/v1/2022.acl-long.54 Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning @@ -820,6 +874,7 @@ zhang-etal-2022-prompt rz-zhang/prboost AG News + 10.18653/v1/2022.acl-long.55 Constrained Multi-Task Learning for Bridging Resolution @@ -831,6 +886,7 @@ 2022.acl-long.56 kobayashi-etal-2022-constrained juntaoy/dali-bridging + 10.18653/v1/2022.acl-long.56 <fixed-case>DEAM</fixed-case>: Dialogue Coherence Evaluation using <fixed-case>AMR</fixed-case>-based Semantic Manipulations @@ -846,6 +902,7 @@ FED PERSONA-CHAT Topical-Chat + 10.18653/v1/2022.acl-long.57 <fixed-case>HIBRIDS</fixed-case>: Attention with Hierarchical Biases for Structure-aware Long Document Summarization @@ -855,6 +912,7 @@ Document structure is critical for efficient information consumption. However, it is challenging to encode it efficiently into the modern Transformer architecture. In this work, we present HIBRIDS, which injects Hierarchical Biases foR Incorporating Document Structure into attention score calculation. We further present a new task, hierarchical question-summary generation, for summarizing salient content in the source document into a hierarchy of questions and summaries, where each follow-up question inquires about the content of its parent question-summary pair. We also annotate a new dataset with 6,153 question-summary hierarchies labeled on government reports. 
Experiment results show that our model produces better question-summary hierarchies than comparisons on both hierarchy quality and content coverage, a finding also echoed by human judges. Additionally, our model improves the generation of long-form summaries from long government reports and Wikipedia articles, as measured by ROUGE scores. 2022.acl-long.58 cao-wang-2022-hibrids + 10.18653/v1/2022.acl-long.58 De-Bias for Generative Extraction in Unified <fixed-case>NER</fixed-case> Task @@ -868,6 +926,7 @@ 2022.acl-long.59 zhang-etal-2022-de GENIA + 10.18653/v1/2022.acl-long.59 An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels @@ -890,6 +949,7 @@ CommonsenseQA IMDb Movie Reviews LAMBADA + 10.18653/v1/2022.acl-long.60 Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation @@ -902,6 +962,7 @@ wang-etal-2022-expanding cindyxinyiwang/expand-via-lexicon-based-adaptation MasakhaNER + 10.18653/v1/2022.acl-long.61 Language-agnostic <fixed-case>BERT</fixed-case> Sentence Embedding @@ -918,6 +979,7 @@ MPQA Opinion Corpus SST SentEval + 10.18653/v1/2022.acl-long.62 Nested Named Entity Recognition with Span-level Graphs @@ -930,6 +992,7 @@ 2022.acl-long.63 wan-etal-2022-nested GENIA + 10.18653/v1/2022.acl-long.63 <fixed-case>C</fixed-case>og<fixed-case>T</fixed-case>askonomy: Cognitively Inspired Task Taxonomy Is Beneficial to Transfer Learning in <fixed-case>NLP</fixed-case> @@ -944,6 +1007,7 @@ GLUE QNLI Taskonomy + 10.18653/v1/2022.acl-long.64 <fixed-case>R</fixed-case>o<fixed-case>CB</fixed-case>ert: Robust <fixed-case>C</fixed-case>hinese Bert with Multimodal Contrastive Pretraining @@ -959,6 +1023,7 @@ 2022.acl-long.65 2022.acl-long.65.software.zip su-etal-2022-rocbert + 10.18653/v1/2022.acl-long.65 Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues @@ -982,6 +1047,7 @@ SNLI-VE VCR Visual Question Answering + 10.18653/v1/2022.acl-long.66 Parallel Instance Query Network for Named Entity Recognition @@ -1006,6 +1072,7 @@ GENIA NNE OntoNotes 5.0 + 10.18653/v1/2022.acl-long.67 <fixed-case>P</fixed-case>rophet<fixed-case>C</fixed-case>hat: Enhancing Dialogue Generation with Simulation of Future Conversation @@ -1022,6 +1089,7 @@ liu-etal-2022-prophetchat DailyDialog PERSONA-CHAT + 10.18653/v1/2022.acl-long.68 Modeling Multi-hop Question Answering as Single Sequence Prediction @@ -1037,6 +1105,7 @@ HotpotQA IIRC SQuAD + 10.18653/v1/2022.acl-long.69 Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension @@ -1057,6 +1126,7 @@ TyDi QA TyDiQA-GoldP XQuAD + 10.18653/v1/2022.acl-long.70 Multi-Granularity Structural Knowledge Distillation for Language Model Compression @@ -1074,6 +1144,7 @@ MRPC QNLI SST + 10.18653/v1/2022.acl-long.71 Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts @@ -1086,6 +1157,7 @@ guo-etal-2022-auto CrowS-Pairs GLUE + 10.18653/v1/2022.acl-long.72 Where to Go for the Holidays: Towards Mixed-Type Dialogs for Clarification of User Goals @@ -1103,6 +1175,7 @@ DuRecDial KdConv MultiWOZ + 10.18653/v1/2022.acl-long.73 Semi-supervised Domain Adaptation for Dependency Parsing with Dynamic Matching Network @@ -1113,6 +1186,7 @@ Supervised parsing models have achieved impressive results on in-domain texts. However, their performances drop drastically on out-of-domain texts due to the data distribution shift. 
The shared-private model has shown its promising advantages for alleviating this problem via feature separation, whereas prior works pay more attention to enhancing shared features but neglect the in-depth relevance of specific ones. To address this issue, we for the first time apply a dynamic matching network on the shared-private model for semi-supervised cross-domain dependency parsing. Meanwhile, considering the scarcity of target-domain labeled data, we leverage unlabeled data from two aspects, i.e., designing a new training strategy to improve the capability of the dynamic matching network and fine-tuning BERT to obtain domain-related contextualized representations. Experiments on benchmark datasets show that our proposed model consistently outperforms various baselines, leading to new state-of-the-art results on all domains. Detailed analysis on different matching strategies demonstrates that it is essential to learn suitable matching weights to emphasize useful features and ignore useless or even harmful ones. Besides, our proposed model can be directly extended to multi-source domain adaptation and achieves the best performance among various baselines, further verifying its effectiveness and robustness. 2022.acl-long.74 li-etal-2022-semi + 10.18653/v1/2022.acl-long.74 A Closer Look at How Fine-tuning Changes <fixed-case>BERT</fixed-case> @@ -1123,6 +1197,7 @@ 2022.acl-long.75 zhou-srikumar-2022-closer utahnlp/BERT-fine-tuning-analysis + 10.18653/v1/2022.acl-long.75 Sentence-aware Contrastive Learning for Open-Domain Passage Retrieval @@ -1137,6 +1212,7 @@ Natural Questions SQuAD TriviaQA + 10.18653/v1/2022.acl-long.76 <fixed-case>F</fixed-case>ai<fixed-case>RR</fixed-case>: Faithful and Robust Deductive Reasoning over Natural Language @@ -1150,6 +1226,7 @@ sanyal-etal-2022-fairr ink-usc/fairr ProofWriter + 10.18653/v1/2022.acl-long.77 <fixed-case>H</fixed-case>i<fixed-case>T</fixed-case>ab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation @@ -1171,6 +1248,7 @@ FinQA ToTTo WikiSQL + 10.18653/v1/2022.acl-long.78 Doctor Recommendation in Online Health Forums via Expertise Learning @@ -1183,6 +1261,7 @@ 2022.acl-long.79 lu-etal-2022-doctor polyusmart/doctor-recommendation + 10.18653/v1/2022.acl-long.79 Continual Prompt Tuning for Dialog State Tracking @@ -1197,6 +1276,7 @@ 2022.acl-long.80.software.zip zhu-etal-2022-continual thu-coai/cpt4dst + 10.18653/v1/2022.acl-long.80 There’s a Time and Place for Reasoning Beyond the Image @@ -1211,6 +1291,7 @@ fu-etal-2022-theres zeyofu/tara WIT + 10.18653/v1/2022.acl-long.81 <fixed-case>FORTAP</fixed-case>: Using Formulas for Numerical-Reasoning-Aware Table Pretraining @@ -1227,6 +1308,7 @@ 2022.acl-long.82.software.zip cheng-etal-2022-fortap microsoft/TUTA_table_understanding + 10.18653/v1/2022.acl-long.82 Multimodal fusion via cortical network inspired losses @@ -1235,6 +1317,7 @@ Information integration from different modalities is an active area of research. Human beings and, in general, biological neural systems are quite adept at using a multitude of signals from different sensory perceptive fields to interact with the environment and each other. Recent work in deep fusion models via neural networks has led to substantial improvements over unimodal approaches in areas like speech recognition, emotion recognition and analysis, captioning and image description. 
However, such research has mostly focused on architectural changes allowing for fusion of different modalities while keeping the model complexity manageable. Inspired by neuroscientific ideas about multisensory integration and processing, we investigate the effect of introducing neural dependencies in the loss functions. Experiments on multimodal sentiment analysis tasks with different models show that our approach provides a consistent performance boost. 2022.acl-long.83 shankar-2022-multimodal + 10.18653/v1/2022.acl-long.83 Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension @@ -1252,6 +1335,7 @@ zhang-etal-2022-modeling RecipeQA Visual Question Answering + 10.18653/v1/2022.acl-long.84 Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning @@ -1264,6 +1348,7 @@ 2022.acl-long.85.software.zip saha-etal-2022-explanation swarnahub/explagraphgen + 10.18653/v1/2022.acl-long.85 Unsupervised Extractive Opinion Summarization Using Sparse Coding @@ -1276,6 +1361,7 @@ 2022.acl-long.86.software.zip basu-roy-chowdhury-etal-2022-unsupervised brcsomnath/semae + 10.18653/v1/2022.acl-long.86 <fixed-case>L</fixed-case>ex<fixed-case>S</fixed-case>ub<fixed-case>C</fixed-case>on: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution @@ -1289,6 +1375,7 @@ 2022.acl-long.87.software.zip michalopoulos-etal-2022-lexsubcon gmichalo/lexsubcon + 10.18653/v1/2022.acl-long.87 Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation @@ -1307,6 +1394,7 @@ zhou-etal-2022-think ConceptNet MuTual + 10.18653/v1/2022.acl-long.88 Flow-Adapter Architecture for Unsupervised Machine Translation @@ -1317,6 +1405,7 @@ In this work, we propose a flow-adapter architecture for unsupervised NMT. It leverages normalizing flows to explicitly model the distributions of sentence-level latent representations, which are subsequently used in conjunction with the attention mechanism for the translation task. The primary novelties of our model are: (a) capturing language-specific sentence representations separately for each language using normalizing flows and (b) using a simple transformation of these latent representations for translating from one language to another. This architecture allows for unsupervised training of each language independently. While there is prior work on latent variables for supervised MT, to the best of our knowledge, this is the first work that uses latent variables and normalizing flows for unsupervised MT. We obtain competitive results on several unsupervised MT benchmarks. 
2022.acl-long.89 liu-etal-2022-flow + 10.18653/v1/2022.acl-long.89 Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning @@ -1330,6 +1419,7 @@ complementizer/rl-sentence-compression NEWSROOM Sentence Compression + 10.18653/v1/2022.acl-long.90 Tracing Origins: Coreference-aware Machine Reading Comprehension @@ -1346,6 +1436,7 @@ Quoref SQuAD SearchQA + 10.18653/v1/2022.acl-long.91 <fixed-case>W</fixed-case>at<fixed-case>C</fixed-case>laim<fixed-case>C</fixed-case>heck: A new Dataset for Claim Entailment and Inference @@ -1357,6 +1448,7 @@ 2022.acl-long.92 khan-etal-2022-watclaimcheck PUBHEALTH + 10.18653/v1/2022.acl-long.92 <fixed-case>F</fixed-case>rugal<fixed-case>S</fixed-case>core: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation @@ -1369,6 +1461,7 @@ 2022.acl-long.93 kamal-eddine-etal-2022-frugalscore CNN/Daily Mail + 10.18653/v1/2022.acl-long.93 A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation @@ -1385,6 +1478,7 @@ narayan-etal-2022-well google-research/language SQuAD + 10.18653/v1/2022.acl-long.94 Synthetic Question Value Estimation for Domain Adaptation of Question Answering @@ -1400,6 +1494,7 @@ Natural Questions NewsQA TriviaQA + 10.18653/v1/2022.acl-long.95 Better Language Model with Hypernym Class Prediction @@ -1415,6 +1510,7 @@ richardbaihe/robustlm WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.96 Tackling Fake News Detection by Continually Improving Social Context Representations using Graph Neural Networks @@ -1426,6 +1522,7 @@ 2022.acl-long.97 mehta-etal-2022-tackling hockeybro12/fakenews_inference_operators + 10.18653/v1/2022.acl-long.97 Understanding Gender Bias in Knowledge Base Embeddings @@ -1439,6 +1536,7 @@ Knowledge base (KB) embeddings have been shown to contain gender biases. In this paper, we study two questions regarding these biases: how to quantify them, and how to trace their origins in KB? Specifically, first, we develop two novel bias measures respectively for a group of person entities and an individual person entity. Evidence of their validity is observed by comparison with real-world census data. Second, we use the influence function to inspect the contribution of each triple in KB to the overall group bias. To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings. 2022.acl-long.98 du-etal-2022-understanding + 10.18653/v1/2022.acl-long.98 Computational Historical Linguistics and Language Diversity in <fixed-case>S</fixed-case>outh <fixed-case>A</fixed-case>sia @@ -1451,6 +1549,7 @@ 2022.acl-long.99 arora-etal-2022-computational Universal Dependencies + 10.18653/v1/2022.acl-long.99 Faithful or Extractive? 
On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization @@ -1465,6 +1564,7 @@ ladhak-etal-2022-faithful fladhak/effective-faithfulness WikiHow + 10.18653/v1/2022.acl-long.100 Slangvolution: <fixed-case>A</fixed-case> Causal Analysis of Semantic Change and Frequency Dynamics in Slang @@ -1478,6 +1578,7 @@ 2022.acl-long.101.software.zip keidar-etal-2022-slangvolution andreasopedal/slangvolution + 10.18653/v1/2022.acl-long.101 Spurious Correlations in Reference-Free Evaluation of Text Generation @@ -1491,6 +1592,7 @@ esdurmus/adversarial_eval DailyDialog PERSONA-CHAT + 10.18653/v1/2022.acl-long.102 On The Ingredients of an Effective Zero-shot Semantic Parser @@ -1502,6 +1604,7 @@ Semantic parsers map natural language utterances into meaning representations (e.g., programs). Such models are typically bottlenecked by the paucity of training data due to the required laborious annotation efforts. Recent studies have performed zero-shot learning by synthesizing training examples of canonical utterances and programs from a grammar, and further paraphrasing these utterances to improve linguistic diversity. However, such synthetic examples cannot fully capture patterns in real data. In this paper we analyze zero-shot parsers through the lenses of the language and logical gaps (Herzig and Berant, 2019), which quantify the discrepancy of language and programmatic patterns between the canonical examples and real-world user-issued ones. We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods using canonical examples that most likely reflect real user intents. Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data. 2022.acl-long.103 yin-etal-2022-ingredients + 10.18653/v1/2022.acl-long.103 Bias Mitigation in Machine Translation Quality Estimation @@ -1516,6 +1619,7 @@ agesb/transquest MLQE-PE WikiMatrix + 10.18653/v1/2022.acl-long.104 Unified Speech-Text Pre-training for Speech Translation and Recognition @@ -1537,6 +1641,7 @@ Libri-Light LibriSpeech MuST-C + 10.18653/v1/2022.acl-long.105 Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability @@ -1548,6 +1653,7 @@ 2022.acl-long.106 fujinuma-etal-2022-match XNLI + 10.18653/v1/2022.acl-long.106 Structured Pruning Learns Compact and Accurate Models @@ -1566,6 +1672,7 @@ QNLI SQuAD SST + 10.18653/v1/2022.acl-long.107 How can <fixed-case>NLP</fixed-case> Help Revitalize Endangered Languages? A Case Study and Roadmap for the <fixed-case>C</fixed-case>herokee Language @@ -1577,6 +1684,7 @@ 2022.acl-long.108 zhang-etal-2022-nlp zhangshiyue/revitalizecherokee + 10.18653/v1/2022.acl-long.108 Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization @@ -1588,6 +1696,7 @@ The IMPRESSIONS section of a radiology report about an imaging study is a summary of the radiologist’s reasoning and conclusions, and it also aids the referring physician in confirming or excluding certain diagnoses. A cascade of tasks are required to automatically generate an abstractive summary of the typical information-rich radiology report. These tasks include acquisition of salient content from the report and generation of a concise, easily consumable IMPRESSIONS section. Prior research on radiology report summarization has focused on single-step end-to-end models – which subsume the task of salient content acquisition. 
To fully explore the cascade structure and explainability of radiology report summarization, we introduce two innovations. First, we design a two-step approach: extractive summarization followed by abstractive summarization. Second, we additionally break down the extractive part into two independent tasks: extraction of salient (1) sentences and (2) keywords. Experiments on English radiology reports from two clinical sites show our novel approach leads to a more precise summary compared to single-step and to two-step-with-single-extractive-process baselines with an overall improvement in F1 score of 3-4%. 2022.acl-long.109 karn-etal-2022-differentiable + 10.18653/v1/2022.acl-long.109 Online Semantic Parsing for Latency Reduction in Task-Oriented Dialogue @@ -1601,6 +1710,7 @@ Standard conversational semantic parsing maps a complete user utterance into an executable program, after which the program is executed to respond to the user. This could be slow when the program contains expensive function calls. We investigate the opportunity to reduce latency by predicting and executing function calls while the user is still speaking. We introduce the task of online semantic parsing for this purpose, with a formal latency reduction metric inspired by simultaneous machine translation. We propose a general framework with first a learned prefix-to-program prediction module, and then a simple yet effective thresholding heuristic for subprogram selection for early execution. Experiments on the SMCalFlow and TreeDST datasets show our approach achieves large latency reduction with good parsing quality, with a 30%–65% latency reduction depending on function execution time and allowed cost. 2022.acl-long.110 zhou-etal-2022-online + 10.18653/v1/2022.acl-long.110 Few-Shot Tabular Data Enrichment Using Fine-Tuned Transformer Architectures @@ -1611,6 +1721,7 @@ 2022.acl-long.111 2022.acl-long.111.software.zip harari-katz-2022-shot + 10.18653/v1/2022.acl-long.111 <fixed-case>S</fixed-case>umm<tex-math>^N</tex-math>: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents @@ -1630,6 +1741,7 @@ psunlpgroup/summ-n GovReport QMSum + 10.18653/v1/2022.acl-long.112 Open Domain Question Answering with A Unified Knowledge Interface @@ -1648,6 +1760,7 @@ Natural Questions OTT-QA WebQuestions + 10.18653/v1/2022.acl-long.113 Principled Paraphrase Generation with Parallel Corpora @@ -1661,6 +1774,7 @@ 2022.acl-long.114 ormazabal-etal-2022-principled aitorormazabal/paraphrasing-from-parallel + 10.18653/v1/2022.acl-long.114 <fixed-case>G</fixed-case>lobal<fixed-case>W</fixed-case>o<fixed-case>Z</fixed-case>: Globalizing <fixed-case>M</fixed-case>ulti<fixed-case>W</fixed-case>o<fixed-case>Z</fixed-case> to Develop Multilingual Task-Oriented Dialogue Systems @@ -1677,6 +1791,7 @@ 2022.acl-long.115.software.zip ding-etal-2022-globalwoz MultiWOZ + 10.18653/v1/2022.acl-long.115 Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation @@ -1691,6 +1806,7 @@ dmcb-gist/doktra BLUE HOC + 10.18653/v1/2022.acl-long.116 Retrieval-guided Counterfactual Generation for <fixed-case>QA</fixed-case> @@ -1709,6 +1825,7 @@ Quoref SQuAD TriviaQA + 10.18653/v1/2022.acl-long.117 <fixed-case>DYLE</fixed-case>: Dynamic Latent Extraction for Abstractive Long-Input Summarization @@ -1729,6 +1846,7 @@ yale-lily/dyle GovReport QMSum + 10.18653/v1/2022.acl-long.118 Searching for fingerspelled content in <fixed-case>A</fixed-case>merican <fixed-case>S</fixed-case>ign 
<fixed-case>L</fixed-case>anguage @@ -1740,6 +1858,7 @@ Natural language processing for sign language video—including tasks like recognition, translation, and search—is crucial for making artificial intelligence technologies accessible to deaf individuals, and is gaining research interest in recent years. In this paper, we address the problem of searching for fingerspelled keywords or key phrases in raw sign language videos. This is an important task since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before. We propose an end-to-end model for this task, FSS-Net, that jointly detects fingerspelling and matches it to a text sequence. Our experiments, done on a large public dataset of ASL fingerspelling in the wild, show the importance of fingerspelling detection as a component of a search and retrieval model. Our model significantly outperforms baseline methods adapted from prior work on related tasks. 2022.acl-long.119 shi-etal-2022-searching + 10.18653/v1/2022.acl-long.119 Skill Induction and Planning with Latent Language @@ -1751,6 +1870,7 @@ 2022.acl-long.120 sharma-etal-2022-skill ALFRED + 10.18653/v1/2022.acl-long.120 <fixed-case>F</fixed-case>ully-<fixed-case>S</fixed-case>emantic <fixed-case>P</fixed-case>arsing and <fixed-case>G</fixed-case>eneration: the <fixed-case>B</fixed-case>abel<fixed-case>N</fixed-case>et <fixed-case>M</fixed-case>eaning <fixed-case>R</fixed-case>epresentation @@ -1762,6 +1882,7 @@ 2022.acl-long.121 martinez-lorenzo-etal-2022-fully sapienzanlp/bmr + 10.18653/v1/2022.acl-long.121 Leveraging Similar Users for Personalized Language Modeling with Limited Data @@ -1774,6 +1895,7 @@ Personalized language models are designed and trained to capture language patterns specific to individual users. This makes them more accurate at predicting what a user will write. However, when a new user joins a platform and not enough text is available, it is harder to build effective personalized language models. We propose a solution for this problem, using a model trained on users that are similar to a new user. In this paper, we explore strategies for finding the similarity between new users and existing ones and methods for using the data from existing users who are a good match. We further explore the trade-off between available data for new users and how well their language can be modeled. 2022.acl-long.122 welch-etal-2022-leveraging + 10.18653/v1/2022.acl-long.122 <fixed-case>DEEP</fixed-case>: <fixed-case>DE</fixed-case>noising Entity Pre-training for Neural Machine Translation @@ -1786,6 +1908,7 @@ 2022.acl-long.123 hu-etal-2022-deep ParaCrawl + 10.18653/v1/2022.acl-long.123 Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network @@ -1802,6 +1925,7 @@ 2022.acl-long.124 2022.acl-long.124.software.zip liang-etal-2022-multi + 10.18653/v1/2022.acl-long.124 Composable Sparse Fine-Tuning for Cross-Lingual Transfer @@ -1817,6 +1941,7 @@ CoNLL-2003 GLUE MasakhaNER + 10.18653/v1/2022.acl-long.125 Toward Annotator Group Bias in Crowdsourcing @@ -1832,6 +1957,7 @@ Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though there are a few works investigating individual annotator bias, the group effects in annotators are largely overlooked. 
In this work, we reveal that annotators within the same demographic group tend to show consistent group bias in annotation tasks and thus we conduct an initial study on annotator group bias. We first empirically verify the existence of annotator group bias in various real-world crowdsourcing datasets. Then, we develop a novel probabilistic graphical framework GroupAnno to capture annotator group bias with an extended Expectation Maximization (EM) algorithm. We conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the effectiveness of our model in modeling annotator group bias in label aggregation and model learning over competitive baselines. 2022.acl-long.126 liu-etal-2022-toward + 10.18653/v1/2022.acl-long.126 Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation @@ -1847,6 +1973,7 @@ mgaido91/FBK-fairseq-ST Europarl-ST WinoBias + 10.18653/v1/2022.acl-long.127 Answering Open-Domain Multi-Answer Questions via a Recall-then-Verify Framework @@ -1858,6 +1985,7 @@ shao-huang-2022-answering zhihongshao/rectify Natural Questions + 10.18653/v1/2022.acl-long.128 Probing as Quantifying Inductive Bias @@ -1871,6 +1999,7 @@ immer-etal-2022-probing BoolQ SuperGLUE + 10.18653/v1/2022.acl-long.129 Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency @@ -1892,6 +2021,7 @@ TyDi QA XNLI XQuAD + 10.18653/v1/2022.acl-long.130 <fixed-case>GPT</fixed-case>-<fixed-case>D</fixed-case>: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models @@ -1906,6 +2036,7 @@ 2022.acl-long.131.software.zip li-etal-2022-gpt linguisticanomalies/hammer-nets + 10.18653/v1/2022.acl-long.131 An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models @@ -1921,6 +2052,7 @@ CrowS-Pairs StereoSet WikiText-2 + 10.18653/v1/2022.acl-long.132 Exploring and Adapting <fixed-case>C</fixed-case>hinese <fixed-case>GPT</fixed-case> to <fixed-case>P</fixed-case>inyin Input Method @@ -1937,6 +2069,7 @@ 2022.acl-long.133 tan-etal-2022-exploring VisualJoyce/Transformers4IME + 10.18653/v1/2022.acl-long.133 Enhancing Cross-lingual Natural Language Inference by Prompt-learning from Cross-lingual Templates @@ -1951,6 +2084,7 @@ qi-etal-2022-enhancing qikunxun/pct PAWS-X + 10.18653/v1/2022.acl-long.134 Sense Embeddings are also Biased – Evaluating Social Biases in Static and Contextualised Sense Embeddings @@ -1964,6 +2098,7 @@ zhou-etal-2022-sense CrowS-Pairs StereoSet + 10.18653/v1/2022.acl-long.135 Hybrid Semantics for Goal-Directed Natural Language Generation @@ -1973,6 +2108,7 @@ We consider the problem of generating natural language given a communicative goal and a world description. We ask the question: is it possible to combine complementary meaning representations to scale a goal-directed NLG system without losing expressiveness? In particular, we consider using two meaning representations, one based on logical semantics and the other based on distributional semantics. We build upon an existing goal-directed generation system, S-STRUCT, which models sentence generation as planning in a Markov decision process. We develop a hybrid approach, which uses distributional semantics to quickly and imprecisely add the main elements of the sentence and then uses first-order logic based semantics to more slowly add the precise details. 
We find that our hybrid method allows S-STRUCT’s generation to scale significantly better in early phases of generation and that the hybrid can often generate sentences with the same quality as S-STRUCT in substantially less time. However, we also observe and give insight into cases where the imprecision in distributional semantics leads to generation that is not as good as using pure logical semantics. 2022.acl-long.136 baumler-ray-2022-hybrid + 10.18653/v1/2022.acl-long.136 Predicting Intervention Approval in Clinical Trials through Multi-Document Summarization @@ -1982,6 +2118,7 @@ Clinical trials offer a fundamental opportunity to discover new treatments and advance medical knowledge. However, the uncertainty of the outcome of a trial can lead to unforeseen costs and setbacks. In this study, we propose a new method to predict the effectiveness of an intervention in a clinical trial. Our method relies on generating an informative summary from multiple documents available in the literature about the intervention under study. Specifically, our method first gathers all the abstracts of PubMed articles related to the intervention. Then, an evidence sentence, which conveys information about the effectiveness of the intervention, is extracted automatically from each abstract. Based on the set of evidence sentences extracted from the abstracts, a short summary about the intervention is constructed. Finally, the produced summaries are used to train a BERT-based classifier, in order to infer the effectiveness of an intervention. To evaluate our proposed method, we introduce a new dataset which is a collection of clinical trials together with their associated PubMed articles. Our experiments demonstrate the effectiveness of producing short informative summaries and using them to predict the effectiveness of an intervention. 2022.acl-long.137 katsimpras-paliouras-2022-predicting + 10.18653/v1/2022.acl-long.137 <fixed-case>B</fixed-case>i<fixed-case>TIIMT</fixed-case>: A Bilingual Text-infilling Method for Interactive Machine Translation @@ -1997,6 +2134,7 @@ 2022.acl-long.138 xiao-etal-2022-bitiimt WMT 2014 + 10.18653/v1/2022.acl-long.138 Distributionally Robust Finetuning <fixed-case>BERT</fixed-case> for Covariate Drift in Spoken Language Understanding @@ -2007,6 +2145,7 @@ In this study, we investigate robustness against covariate drift in spoken language understanding (SLU). Covariate drift can occur in SLU when there is a drift between training and testing regarding what users request or how they request it. To study this, we propose a method that exploits natural variations in data to create a covariate drift in SLU datasets. Experiments show that a state-of-the-art BERT-based model suffers performance loss under this drift. To mitigate the performance loss, we investigate distributionally robust optimization (DRO) for finetuning BERT-based models. We discuss some recent DRO methods, propose two new variants and empirically show that DRO improves robustness under drift. 2022.acl-long.139 broscheit-etal-2022-distributionally + 10.18653/v1/2022.acl-long.139 Enhancing <fixed-case>C</fixed-case>hinese Pre-trained Language Model via Heterogeneous Linguistics Graph @@ -2026,6 +2165,7 @@ CMRC CMRC 2018 DRCD + 10.18653/v1/2022.acl-long.140 Divide and Denoise: Learning from Noisy Labels in Fine-Grained Entity Typing with Cluster-Wise Loss Correction @@ -2037,6 +2177,7 @@ Fine-grained Entity Typing (FET) has made great progress based on distant supervision but still suffers from label noise. 
Existing FET noise learning methods rely on prediction distributions in an instance-independent manner, which causes the problem of confirmation bias. In this work, we propose a clustering-based loss correction framework named Feature Cluster Loss Correction (FCLC) to address these two problems. FCLC first trains a coarse backbone model as a feature extractor and noise estimator. Loss correction is then applied to each feature cluster, learning directly from the noisy labels. Experimental results on three public datasets show that FCLC achieves the best performance over existing competitive systems. Auxiliary experiments further demonstrate that FCLC is stable to hyperparameters and it does help mitigate confirmation bias. We also find that in the extreme case of no clean data, the FCLC framework still achieves competitive performance. 2022.acl-long.141 pang-etal-2022-divide + 10.18653/v1/2022.acl-long.141 Towards Robustness of Text-to-<fixed-case>SQL</fixed-case> Models Against Natural and Realistic Adversarial Table Perturbation @@ -2055,6 +2196,7 @@ ConceptNet SParC WikiSQL + 10.18653/v1/2022.acl-long.142 Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation @@ -2067,6 +2209,7 @@ ictnlp/cokd CIFAR-10 CIFAR-100 + 10.18653/v1/2022.acl-long.143 Metaphors in Pre-Trained Language Models: Probing and Generalization Across Datasets and Languages @@ -2079,6 +2222,7 @@ 2022.acl-long.144.software.zip aghazadeh-etal-2022-metaphors ehsanaghazadeh/metaphors_in_plms + 10.18653/v1/2022.acl-long.144 Discrete Opinion Tree Induction for Aspect-based Sentiment Analysis @@ -2092,6 +2236,7 @@ 2022.acl-long.145.software.zip chen-etal-2022-discrete MAMS + 10.18653/v1/2022.acl-long.145 Investigating Non-local Features for Neural Constituency Parsing @@ -2104,6 +2249,7 @@ 2022.acl-long.146.software.zip cui-etal-2022-investigating ringos/nfc-parser + 10.18653/v1/2022.acl-long.146 Learning from Sibling Mentions with Scalable Graph Inference in Fine-Grained Entity Typing @@ -2119,6 +2265,7 @@ 2022.acl-long.147 2022.acl-long.147.software.zip chen-etal-2022-learning-sibling + 10.18653/v1/2022.acl-long.147 A Variational Hierarchical Model for Neural Cross-Lingual Summarization @@ -2136,6 +2283,7 @@ liang-etal-2022-variational xl2248/vhm LCSTS + 10.18653/v1/2022.acl-long.148 On the Robustness of Question Rewriting Systems to Questions of Varying Hardness @@ -2149,6 +2297,7 @@ ye-etal-2022-robustness CANARD QuAC + 10.18653/v1/2022.acl-long.149 <fixed-case>O</fixed-case>pen<fixed-case>H</fixed-case>ands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages @@ -2165,6 +2314,7 @@ AUTSL GSL WLASL + 10.18653/v1/2022.acl-long.150 bert2<fixed-case>BERT</fixed-case>: Towards Reusable Pretrained Language Models @@ -2185,6 +2335,7 @@ BookCorpus CoLA GLUE + 10.18653/v1/2022.acl-long.151 Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis @@ -2196,6 +2347,7 @@ 2022.acl-long.152 ling-etal-2022-vision nustm/vlp-mabsa + 10.18653/v1/2022.acl-long.152 “<fixed-case>Y</fixed-case>ou might think about slightly revising the title”: Identifying Hedges in Peer-tutoring Interactions @@ -2206,6 +2358,7 @@ Hedges have an important role in the management of rapport. 
In peer-tutoring, they are notably used by tutors in dyads experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with teenagers in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features and the benefits of such a hybrid model approach. 2022.acl-long.153 raphalen-etal-2022-might + 10.18653/v1/2022.acl-long.153 Efficient Cluster-Based <tex-math>k</tex-math>-Nearest-Neighbor Machine Translation @@ -2221,6 +2374,7 @@ wang-etal-2022-efficient tjunlp-lab/pckmt WikiMatrix + 10.18653/v1/2022.acl-long.154 Headed-Span-Based Projective Dependency Parsing @@ -2232,6 +2386,7 @@ yang-tu-2022-headed sustcsonglin/span-based-dependency-parsing Penn Treebank + 10.18653/v1/2022.acl-long.155 Decoding Part-of-Speech from Human <fixed-case>EEG</fixed-case> Signals @@ -2243,6 +2398,7 @@ This work explores techniques to predict Part-of-Speech (PoS) tags from neural signals measured at millisecond resolution with electroencephalography (EEG) during text reading. We first show that information about word length, frequency and word class is encoded by the brain at different post-stimulus latencies. We then demonstrate that pre-training on averaged EEG data and data augmentation techniques boost PoS decoding accuracy for single EEG trials. Finally, applying optimised temporally-resolved decoding techniques we show that Transformers substantially outperform linear-SVMs on PoS tagging of unigram and bigram data. 
2022.acl-long.156 murphy-etal-2022-decoding + 10.18653/v1/2022.acl-long.156 Robust Lottery Tickets for Pre-trained Language Models @@ -2263,6 +2419,7 @@ AG News IMDb Movie Reviews SST + 10.18653/v1/2022.acl-long.157 Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification @@ -2282,6 +2439,7 @@ thunlp/knowledgeableprompttuning C4 IMDb Movie Reviews + 10.18653/v1/2022.acl-long.158 Cross-Lingual Contrastive Learning for Fine-Grained Entity Typing for Low-Resource Languages @@ -2301,6 +2459,7 @@ thunlp/crosset Few-NERD Open Entity + 10.18653/v1/2022.acl-long.159 <fixed-case>MELM</fixed-case>: Data Augmentation with Masked Entity Language Modeling for Low-Resource <fixed-case>NER</fixed-case> @@ -2317,6 +2476,7 @@ 2022.acl-long.160.software.zip zhou-etal-2022-melm randyzhouran/melm + 10.18653/v1/2022.acl-long.160 <fixed-case>W</fixed-case>ord2<fixed-case>B</fixed-case>ox: Capturing Set-Theoretic Semantics of Words using Box Embeddings @@ -2332,6 +2492,7 @@ 2022.acl-long.161 2022.acl-long.161.software.zip dasgupta-etal-2022-word2box + 10.18653/v1/2022.acl-long.161 <fixed-case>IAM</fixed-case>: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks @@ -2348,6 +2509,7 @@ cheng-etal-2022-iam liyingcheng95/iam IAM Dataset + 10.18653/v1/2022.acl-long.162 <fixed-case>PLANET</fixed-case>: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation @@ -2361,6 +2523,7 @@ Despite recent progress of pre-trained language models on generating fluent text, existing methods still suffer from incoherence problems in long-form text generation tasks that require proper content control and planning to form a coherent high-level logical flow. In this work, we propose PLANET, a novel generation framework leveraging autoregressive self-attention mechanism to conduct content planning and surface realization dynamically. To guide the generation of output sentences, our framework enriches the Transformer decoder with latent representations to maintain sentence-level semantic plans grounded by bag-of-words. Moreover, we introduce a new coherence-based contrastive learning objective to further improve the coherence of output. Extensive experiments are conducted on two challenging long-form text generation tasks including counterargument generation and opinion article generation. Both automatic and human evaluations show that our method significantly outperforms strong baselines and generates more coherent texts with richer contents. 2022.acl-long.163 hu-etal-2022-planet + 10.18653/v1/2022.acl-long.163 <fixed-case>CTRLE</fixed-case>val: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation @@ -2376,6 +2539,7 @@ 2022.acl-long.164 2022.acl-long.164.software.zip ke-etal-2022-ctrleval + 10.18653/v1/2022.acl-long.164 Beyond the Granularity: Multi-Perspective Dialogue Collaborative Selection for Dialogue State Tracking @@ -2390,6 +2554,7 @@ 2022.acl-long.165.software.zip guo-etal-2022-beyond guojinyu88/dicos-master + 10.18653/v1/2022.acl-long.165 Are Prompt-based Models Clueless? 
@@ -2403,6 +2568,7 @@ GLUE SNLI SuperGLUE + 10.18653/v1/2022.acl-long.166 Learning Confidence for Transformer-based Neural Machine Translation @@ -2417,6 +2583,7 @@ 2022.acl-long.167.software.zip lu-etal-2022-learning yulu-dada/learned-conf-nmt + 10.18653/v1/2022.acl-long.167 Things not Written in Text: Exploring Spatial Commonsense from Visual Signals @@ -2432,6 +2599,7 @@ xxxiaol/spatial-commonsense COCO Relative Size + 10.18653/v1/2022.acl-long.168 Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation @@ -2447,6 +2615,7 @@ 2022.acl-long.169 zhang-etal-2022-conditional songmzhang/cbmi + 10.18653/v1/2022.acl-long.169 <fixed-case>C</fixed-case>luster<fixed-case>F</fixed-case>ormer: Neural Clustering Attention for Efficient and Effective Transformer @@ -2465,6 +2634,7 @@ MPQA Opinion Corpus SNLI WikiQA + 10.18653/v1/2022.acl-long.170 Bottom-Up Constituency Parsing and Nested Named Entity Recognition with Pointer Networks @@ -2477,6 +2647,7 @@ sustcsonglin/pointer-net-for-nested GENIA Penn Treebank + 10.18653/v1/2022.acl-long.171 Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation @@ -2489,6 +2660,7 @@ Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models, which eases the training of NAT models at the cost of losing important information for translating low-frequency words. In this work, we provide an appealing alternative for NAT – monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data. Monolingual KD is able to transfer both the knowledge of the original bilingual data (implicitly encoded in the trained AT teacher model) and that of the new monolingual data to the NAT student model. Extensive experiments on eight WMT benchmarks over two advanced NAT models show that monolingual KD consistently outperforms the standard KD by improving low-frequency word translation, without introducing any computational cost. Monolingual KD enjoys desirable expandability, which can be further enhanced (when given more computational budget) by combining with the standard KD, a reverse monolingual KD, or enlarging the scale of monolingual data. Extensive analyses demonstrate that these techniques can be used together profitably to further recall the useful information lost in the standard KD. Encouragingly, combining with standard KD, our approach achieves 30.4 and 34.1 BLEU points on the WMT14 English-German and German-English datasets, respectively. Our code and trained models are freely available at https://github.com/alphadl/RLFW-NAT.mono. 2022.acl-long.172 ding-etal-2022-redistributing + 10.18653/v1/2022.acl-long.172 Dependency Parsing as <fixed-case>MRC</fixed-case>-based Span-Span Prediction @@ -2507,6 +2679,7 @@ ShannonAI/mrc-for-dependency-parsing Penn Treebank Universal Dependencies + 10.18653/v1/2022.acl-long.173 Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis @@ -2516,6 +2689,7 @@ Cross-domain sentiment analysis has achieved promising results with the help of pre-trained language models. With the emergence of GPT-3, prompt tuning has been widely explored to enable better semantic modeling in many natural language processing tasks. However, directly using a fixed predefined template for cross-domain research cannot model different distributions of the \operatorname{[MASK]} token in different domains, thus underusing the prompt tuning technique.
In this paper, we propose a novel Adversarial Soft Prompt Tuning method (AdSPT) to better model cross-domain sentiment analysis. On the one hand, AdSPT adopts separate soft prompts instead of hard templates to learn different vectors for different domains, thus alleviating the domain discrepancy of the \operatorname{[MASK]} token in the masked language modeling task. On the other hand, AdSPT uses a novel domain adversarial training strategy to learn domain-invariant representations between each source domain and the target domain. Experiments on a publicly available sentiment analysis dataset show that our model achieves new state-of-the-art results for both single-source domain adaptation and multi-source domain adaptation. 2022.acl-long.174 wu-shi-2022-adversarial + 10.18653/v1/2022.acl-long.174 Generating Scientific Claims for Zero-Shot Scientific Fact Checking @@ -2533,6 +2707,7 @@ allenai/scientific-claim-generation FEVER SciFact + 10.18653/v1/2022.acl-long.175 Modeling Dual Read/Write Paths for Simultaneous Machine Translation @@ -2543,6 +2718,7 @@ 2022.acl-long.176 zhang-feng-2022-modeling ictnlp/dual-paths + 10.18653/v1/2022.acl-long.176 <fixed-case>E</fixed-case>xt<fixed-case>E</fixed-case>n<fixed-case>D</fixed-case>: Extractive Entity Disambiguation @@ -2555,6 +2731,7 @@ barba-etal-2022-extend sapienzanlp/extend AIDA CoNLL-YAGO + 10.18653/v1/2022.acl-long.177 Hierarchical Sketch Induction for Paraphrase Generation @@ -2570,6 +2747,7 @@ GLUE Paralex Quora Question Pairs + 10.18653/v1/2022.acl-long.178 Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction @@ -2585,6 +2763,7 @@ kolluru-etal-2022-alignment dair-iitd/moie X-SRL + 10.18653/v1/2022.acl-long.179 Text-to-Table: A New Way of Information Extraction @@ -2598,6 +2777,7 @@ shirley-wu/text_to_table RotoWire WikiBio + 10.18653/v1/2022.acl-long.180 Accelerating Code Search with Deep Hashing and Code Classification @@ -2613,6 +2793,7 @@ 2022.acl-long.181 gu-etal-2022-accelerating CodeSearchNet + 10.18653/v1/2022.acl-long.181 Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions @@ -2628,6 +2809,7 @@ 2022.acl-long.182.software.zip lin-etal-2022-roles xiaolinandy/rods + 10.18653/v1/2022.acl-long.182 <fixed-case>C</fixed-case>lar<fixed-case>ET</fixed-case>: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification @@ -2644,6 +2826,7 @@ yczhou001/ClarET GLUE ROCStories + 10.18653/v1/2022.acl-long.183 Measuring and Mitigating Name Biases in Neural Machine Translation @@ -2654,6 +2837,7 @@ Neural Machine Translation (NMT) systems exhibit problematic biases, such as stereotypical gender bias in the translation of occupation terms into languages with grammatical gender. In this paper we describe a new source of bias prevalent in NMT systems, relating to translations of sentences containing person names. To correctly translate such sentences, an NMT system needs to determine the gender of the name. We show that leading systems are particularly poor at this task, especially for female given names. This bias is deeper than given name gender: we show that the translation of terms with ambiguous sentiment can also be affected by person names, and the same holds true for proper nouns denoting race. To mitigate these biases we propose a simple but effective data augmentation method based on randomly switching entities during translation, which effectively eliminates the problem without any effect on translation quality.
2022.acl-long.184 wang-etal-2022-measuring + 10.18653/v1/2022.acl-long.184 Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation @@ -2668,6 +2852,7 @@ In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: On one hand, it helps NMT models to produce more diverse translations and reduce adequacy-related translation errors. On the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit the translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach can consistently improve both translation performance and model robustness upon Seq2Seq pretraining. 2022.acl-long.185 wang-etal-2022-understanding + 10.18653/v1/2022.acl-long.185 <fixed-case>MSCTD</fixed-case>: A Multimodal Sentiment Chat Translation Dataset @@ -2684,6 +2869,7 @@ BMELD MELD OpenViDial + 10.18653/v1/2022.acl-long.186 Learning Disentangled Textual Representations via Statistical Measures of Similarity @@ -2696,6 +2882,7 @@ 2022.acl-long.187 2022.acl-long.187.software.zip colombo-etal-2022-learning + 10.18653/v1/2022.acl-long.187 On the Sensitivity and Stability of Model Interpretations in <fixed-case>NLP</fixed-case> @@ -2710,6 +2897,7 @@ uclanlp/nlp-interpretation-faithfulness AG News SST + 10.18653/v1/2022.acl-long.188 Down and Across: Introducing Crossword-Solving as a New <fixed-case>NLP</fixed-case> Benchmark @@ -2721,6 +2909,7 @@ Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle. In this work, we introduce solving crossword puzzles as a new natural language understanding task. We release a corpus of crossword puzzles collected from the New York Times daily crossword spanning 25 years and comprised of a total of around nine thousand puzzles. These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual, as well as clues that depend on the answers to other clues. We separately release the clue-answer pairs from these puzzles as an open-domain question answering dataset containing over half a million unique clue-answer pairs. For the question answering task, our baselines include several sequence-to-sequence and retrieval-based generative models. We also introduce a non-parametric constraint satisfaction baseline for solving the entire crossword puzzle. Finally, we propose an evaluation framework which consists of several complementary performance metrics. 
2022.acl-long.189 kulshreshtha-etal-2022-across + 10.18653/v1/2022.acl-long.189 Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets @@ -2737,6 +2926,7 @@ HANS MultiNLI SNLI + 10.18653/v1/2022.acl-long.190 <fixed-case>GL</fixed-case>-<fixed-case>CL</fixed-case>e<fixed-case>F</fixed-case>: A Global–Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding @@ -2753,6 +2943,7 @@ 2022.acl-long.191.software.zip qin-etal-2022-gl lightchen233/gl-clef + 10.18653/v1/2022.acl-long.191 Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource <fixed-case>NER</fixed-case> @@ -2773,6 +2964,7 @@ lee-etal-2022-good ink-usc/fewner BC5CDR + 10.18653/v1/2022.acl-long.192 Contextual Representation Learning beyond Masked Language Modeling @@ -2790,6 +2982,7 @@ MRPC QNLI SST + 10.18653/v1/2022.acl-long.193 Efficient Hyper-parameter Search for Knowledge Graph Embedding @@ -2804,6 +2997,7 @@ automl-research/kgtuner FB15k-237 OGB + 10.18653/v1/2022.acl-long.194 A Meta-framework for Spatiotemporal Quantity Extraction from Text @@ -2817,6 +3011,7 @@ News events are often associated with quantities (e.g., the number of COVID-19 patients or the number of arrests in a protest), and it is often important to extract their type, time, and location from unstructured text in order to analyze these quantity events. This paper thus formulates the NLP problem of spatiotemporal quantity extraction, and proposes the first meta-framework for solving it. This meta-framework contains a formalism that decomposes the problem into several information extraction tasks, a shareable crowdsourcing pipeline, and transformer-based baseline models. We demonstrate the meta-framework in three domains—the COVID-19 pandemic, Black Lives Matter protests, and 2020 California wildfires—to show that the formalism is general and extensible, the crowdsourcing pipeline facilitates fast and high-quality data annotation, and the baseline system can handle spatiotemporal quantity extraction well enough to be practically useful. We release all resources for future research on this topic at https://github.com/steqe. 
2022.acl-long.195 ning-etal-2022-meta + 10.18653/v1/2022.acl-long.195 Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-Modal Knowledge Transfer @@ -2838,6 +3033,7 @@ PIQA WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.196 A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models @@ -2858,6 +3054,7 @@ OK-VQA Visual Genome nocaps + 10.18653/v1/2022.acl-long.197 Continual Few-shot Relation Learning via Embedding Space Regularization and Data Augmentation @@ -2870,6 +3067,7 @@ qin-joty-2022-continual qcwthu/continual_fewshot_relation_learning FewRel + 10.18653/v1/2022.acl-long.198 Variational Graph Autoencoding as Cheap Supervision for <fixed-case>AMR</fixed-case> Coreference Resolution @@ -2882,6 +3080,7 @@ 2022.acl-long.199 li-etal-2022-variational AMR Bank + 10.18653/v1/2022.acl-long.199 Identifying <fixed-case>C</fixed-case>hinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations @@ -2897,6 +3096,7 @@ 2022.acl-long.200.software.zip zhang-etal-2022-identifying MPQA Opinion Corpus + 10.18653/v1/2022.acl-long.200 Sequence-to-Sequence Knowledge Graph Completion and Question Answering @@ -2915,6 +3115,7 @@ WebQuestions WebQuestionsSP WikiMovies + 10.18653/v1/2022.acl-long.201 Learning to Mediate Disparities Towards Pragmatic Communication @@ -2926,6 +3127,7 @@ 2022.acl-long.202 bao-etal-2022-learning sled-group/pragmatic-rational-speaker + 10.18653/v1/2022.acl-long.202 Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval @@ -2939,6 +3141,7 @@ MS MARCO Natural Questions TriviaQA + 10.18653/v1/2022.acl-long.203 Multimodal Dialogue Response Generation @@ -2958,6 +3161,7 @@ 2022.acl-long.204.software.zip sun-etal-2022-multimodal ImageNet + 10.18653/v1/2022.acl-long.204 <fixed-case>CAKE</fixed-case>: A Scalable Commonsense-Aware Framework For Multi-View Knowledge Graph Completion @@ -2973,6 +3177,7 @@ ConceptNet FB15k-237 NELL-995 + 10.18653/v1/2022.acl-long.205 Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation @@ -2987,6 +3192,7 @@ 2022.acl-long.206 2022.acl-long.206.software.zip zhou-etal-2022-confidence + 10.18653/v1/2022.acl-long.206 <fixed-case>BRIO</fixed-case>: Bringing Order to Abstractive Summarization @@ -3001,6 +3207,7 @@ yixinl7/brio CNN/Daily Mail XSum + 10.18653/v1/2022.acl-long.207 Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling @@ -3012,6 +3219,7 @@ 2022.acl-long.208.software.zip ai-fang-2022-leveraging LAMBADA + 10.18653/v1/2022.acl-long.208 <fixed-case>FIBER</fixed-case>: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework @@ -3031,6 +3239,7 @@ ActivityNet Captions VATEX Visual Question Answering + 10.18653/v1/2022.acl-long.209 <fixed-case>K</fixed-case>en<fixed-case>M</fixed-case>e<fixed-case>SH</fixed-case>: Knowledge-enhanced End-to-end Biomedical Text Labelling @@ -3042,6 +3251,7 @@ 2022.acl-long.210 wang-etal-2022-kenmesh xdwang0726/kenmesh + 10.18653/v1/2022.acl-long.210 A Taxonomy of Empathetic Questions in Social Dialogs @@ -3055,6 +3265,7 @@ 2022.acl-long.211.software.zip svikhnushina-etal-2022-taxonomy sea94/eqt + 10.18653/v1/2022.acl-long.211 Enhanced Multi-Channel Graph Convolutional Network for Aspect Sentiment Triplet Extraction @@ -3069,6 +3280,7 @@ 2022.acl-long.212.software.zip chen-etal-2022-enhanced ccchenhao997/emcgcn-aste + 10.18653/v1/2022.acl-long.212 
<fixed-case>P</fixed-case>roto<fixed-case>TE</fixed-case>x: Explaining Model Decisions with Prototype Tensors @@ -3082,6 +3294,7 @@ 2022.acl-long.213 das-etal-2022-prototex anubrata/prototex + 10.18653/v1/2022.acl-long.213 Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data @@ -3099,6 +3312,7 @@ zhou-etal-2022-show shuyanzhou/wikihow_hierarchy HowTo100M + 10.18653/v1/2022.acl-long.214 Cross-Modal Discrete Representation Learning @@ -3115,6 +3329,7 @@ ImageNet MSR-VTT Places205 + 10.18653/v1/2022.acl-long.215 Improving Event Representation via Simultaneous Weakly Supervised Contrastive Learning and Clustering @@ -3130,6 +3345,7 @@ 2022.acl-long.216.software.zip gao-etal-2022-improving gaojun4ever/swcc4event + 10.18653/v1/2022.acl-long.216 Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations @@ -3140,6 +3356,7 @@ 2022.acl-long.217 2022.acl-long.217.software.zip wolfe-caliskan-2022-contrastive + 10.18653/v1/2022.acl-long.217 <fixed-case>C</fixed-case>on<fixed-case>T</fixed-case>in<fixed-case>T</fixed-case>in: Continual Learning from Task Instructions @@ -3150,6 +3367,7 @@ The mainstream machine learning paradigms for NLP often work with two underlying presumptions. First, the target task is predefined and static; a system merely needs to learn to solve it exclusively. Second, the supervision of a task mainly comes from a set of labeled examples. A question arises: how to build a system that can keep learning new tasks from their instructions? This work defines a new learning paradigm ConTinTin (Continual Learning from Task Instructions), in which a system should learn a sequence of new tasks one by one, where each task is explained by a piece of textual instruction. The system is required to (i) generate the expected outputs of a new task by learning from its instruction, (ii) transfer the knowledge acquired from upstream tasks to help solve downstream tasks (i.e., forward-transfer), and (iii) retain or even improve the performance on earlier tasks after learning new tasks (i.e., backward-transfer). This new problem is studied on a stream of more than 60 tasks, each equipped with an instruction. Technically, our method InstructionSpeak contains two strategies that make full use of task instructions to improve forward-transfer and backward-transfer: one is to learn from negative outputs, the other is to re-visit instructions of previous tasks. To our knowledge, this is the first study of ConTinTin in NLP. In addition to the problem formulation and our promising approach, this work also contributes to providing rich analyses for the community to better understand this novel learning problem.
2022.acl-long.218 yin-etal-2022-contintin + 10.18653/v1/2022.acl-long.218 Automated Crossword Solving @@ -3166,6 +3384,7 @@ 2022.acl-long.219.software.zip wallace-etal-2022-automated albertkx/berkeley-crossword-solver + 10.18653/v1/2022.acl-long.219 Learned Incremental Representations for Parsing @@ -3179,6 +3398,7 @@ kitaev-etal-2022-learned thomaslu2000/incremental-parsing-representations Penn Treebank + 10.18653/v1/2022.acl-long.220 Knowledge Enhanced Reflection Generation for Counseling Dialogues @@ -3192,6 +3412,7 @@ 2022.acl-long.221 shen-etal-2022-knowledge ConceptNet + 10.18653/v1/2022.acl-long.221 Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines @@ -3209,6 +3430,7 @@ skgabriel/mrf-modeling CoAID RealNews + 10.18653/v1/2022.acl-long.222 On Continual Model Refinement in Out-of-Distribution Data Streams @@ -3227,6 +3449,7 @@ Natural Questions SQuAD SearchQA + 10.18653/v1/2022.acl-long.223 Achieving Conversational Goals with Unsupervised Post-hoc Knowledge Injection @@ -3240,6 +3463,7 @@ 2022.acl-long.224.software.zip majumder-etal-2022-achieving majumderb/poki + 10.18653/v1/2022.acl-long.224 Generated Knowledge Prompting for Commonsense Reasoning @@ -3261,6 +3485,7 @@ ConceptNet NumerSense QASC + 10.18653/v1/2022.acl-long.225 Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data @@ -3287,6 +3512,7 @@ WikiHow WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.226 Life after <fixed-case>BERT</fixed-case>: What do Other Muppets Understand about Language? @@ -3300,6 +3526,7 @@ lialin-etal-2022-life kev-zhao/life-after-bert WebText + 10.18653/v1/2022.acl-long.227 Tailor: Generating and Perturbing Text with Semantic Controls @@ -3317,6 +3544,7 @@ SNLI StylePTB Universal Dependencies + 10.18653/v1/2022.acl-long.228 <fixed-case>T</fixed-case>ruthful<fixed-case>QA</fixed-case>: Measuring How Models Mimic Human Falsehoods @@ -3330,6 +3558,7 @@ lin-etal-2022-truthfulqa sylinrl/truthfulqa TruthfulQA + 10.18653/v1/2022.acl-long.229 Adaptive Testing and Debugging of <fixed-case>NLP</fixed-case> Models @@ -3340,6 +3569,7 @@ 2022.acl-long.230 ribeiro-lundberg-2022-adaptive PAWS + 10.18653/v1/2022.acl-long.230 Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning @@ -3354,6 +3584,7 @@ 2022.acl-long.231 gupta-etal-2022-right TabFact + 10.18653/v1/2022.acl-long.231 Interactive Word Completion for <fixed-case>P</fixed-case>lains <fixed-case>C</fixed-case>ree @@ -3364,6 +3595,7 @@ The composition of richly-inflected words in morphologically complex languages can be a challenge for language learners developing literacy. Accordingly, Lane and Bird (2020) proposed a finite state approach which maps prefixes in a language to a set of possible completions up to the next morpheme boundary, for the incremental building of complex words. In this work, we develop an approach to morph-based auto-completion based on a finite state morphological analyzer of Plains Cree (nêhiyawêwin), showing the portability of the concept to a much larger, more complete morphological transducer. Additionally, we propose and compare various novel ranking strategies on the morph auto-complete output. The best weighting scheme ranks the target completion in the top 10 results in 64.9% of queries, and in the top 50 in 73.9% of queries. 
2022.acl-long.232 lane-etal-2022-interactive + 10.18653/v1/2022.acl-long.232 <fixed-case>LAG</fixed-case>r: Label Aligned Graphs for Better Systematic Generalization in Semantic Parsing @@ -3374,6 +3606,7 @@ 2022.acl-long.233 jambor-bahdanau-2022-lagr CFQ + 10.18653/v1/2022.acl-long.233 <fixed-case>T</fixed-case>oxi<fixed-case>G</fixed-case>en: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection @@ -3391,6 +3624,7 @@ ToxiGen Hate Speech Implicit Hate + 10.18653/v1/2022.acl-long.234 Direct Speech-to-Speech Translation With Discrete Units @@ -3411,6 +3645,7 @@ 2022.acl-long.235 lee-etal-2022-direct LibriSpeech + 10.18653/v1/2022.acl-long.235 Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization @@ -3423,6 +3658,7 @@ 2022.acl-long.236.software.zip cao-etal-2022-hallucinated mcao516/entfa + 10.18653/v1/2022.acl-long.236 <fixed-case>E</fixed-case>nt<fixed-case>SUM</fixed-case>: A Data Set for Entity-Centric Extractive Summarization @@ -3434,6 +3670,7 @@ 2022.acl-long.237 maddela-etal-2022-entsum bloomberg/entsum + 10.18653/v1/2022.acl-long.237 Sentence-level Privacy for Document Embeddings @@ -3445,6 +3682,7 @@ 2022.acl-long.238 meehan-etal-2022-sentence IMDb Movie Reviews + 10.18653/v1/2022.acl-long.238 Dataset Geography: Mapping Language Data to Language Users @@ -3460,6 +3698,7 @@ Natural Questions SQuAD TyDi QA + 10.18653/v1/2022.acl-long.239 <fixed-case>ILDAE</fixed-case>: Instance-Level Difficulty Analysis of Evaluation Data @@ -3479,6 +3718,7 @@ SNLI SWAG WinoGrande + 10.18653/v1/2022.acl-long.240 Image Retrieval from Contextual Descriptions @@ -3496,6 +3736,7 @@ Spot-the-diff Video Storytelling YouCook + 10.18653/v1/2022.acl-long.241 Multilingual Molecular Representation Learning via Contrastive Pre-training @@ -3509,6 +3750,7 @@ 2022.acl-long.242 guo-etal-2022-multilingual MoleculeNet + 10.18653/v1/2022.acl-long.242 Investigating Failures of Automatic Translation @@ -3521,6 +3763,7 @@ in the Case of Unambiguous Gender 2022.acl-long.243 2022.acl-long.243.software.zip renduchintala-williams-2022-investigating + 10.18653/v1/2022.acl-long.243 Cross-Task Generalization via Natural Language Crowdsourcing Instructions @@ -3540,6 +3783,7 @@ in the Case of Unambiguous Gender QASC Quoref WinoGrande + 10.18653/v1/2022.acl-long.244 Imputing Out-of-Vocabulary Embeddings with <fixed-case>LOVE</fixed-case> Makes <fixed-case>L</fixed-case>anguage<fixed-case>M</fixed-case>odels Robust with Little Cost @@ -3553,6 +3797,7 @@ in the Case of Unambiguous Gender chen-etal-2022-imputing tigerchen52/love SST + 10.18653/v1/2022.acl-long.245 <fixed-case>N</fixed-case>um<fixed-case>GLUE</fixed-case>: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks @@ -3570,6 +3815,7 @@ in the Case of Unambiguous Gender GLUE MATH SuperGLUE + 10.18653/v1/2022.acl-long.246 <fixed-case>U</fixed-case>pstream <fixed-case>M</fixed-case>itigation <fixed-case>I</fixed-case>s <i> @@ -3584,6 +3830,7 @@ in the Case of Unambiguous Gender 2022.acl-long.247 2022.acl-long.247.software.zip steed-etal-2022-upstream + 10.18653/v1/2022.acl-long.247 Improving Multi-label Malevolence Detection in Dialogues through Multi-faceted Label Correlation Enhancement @@ -3598,6 +3845,7 @@ in the Case of Unambiguous Gender 2022.acl-long.248.software.zip zhang-etal-2022-improving-multi repozhang/malevolent_dialogue + 10.18653/v1/2022.acl-long.248 How Do We Answer Complex Questions: Discourse Structure of Long-form Answers @@ -3611,6 +3859,7 @@ in the Case of 
Unambiguous Gender utcsnlp/lfqa_discourse ELI5 Natural Questions + 10.18653/v1/2022.acl-long.249 Understanding Iterative Revision from Human-Written Text @@ -3625,6 +3874,7 @@ in the Case of Unambiguous Gender 2022.acl-long.250 du-etal-2022-understanding-iterative vipulraheja/iterater + 10.18653/v1/2022.acl-long.250 Making Transformers Solve Compositional Tasks @@ -3640,6 +3890,7 @@ in the Case of Unambiguous Gender google-research/google-research CFQ SCAN + 10.18653/v1/2022.acl-long.251 Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation @@ -3651,6 +3902,7 @@ in the Case of Unambiguous Gender 2022.acl-long.252 2022.acl-long.252.software.zip dankers-etal-2022-transformer + 10.18653/v1/2022.acl-long.252 <fixed-case>C</fixed-case>onditional<fixed-case>QA</fixed-case>: A Complex Reading Comprehension Dataset with Conditional Answers @@ -3666,6 +3918,7 @@ in the Case of Unambiguous Gender PolicyQA QASPER ShARC + 10.18653/v1/2022.acl-long.253 Prompt-free and Efficient Few-shot Learning with Language Models @@ -3688,6 +3941,7 @@ in the Case of Unambiguous Gender SST SuperGLUE WiC + 10.18653/v1/2022.acl-long.254 Continual Sequence Generation with Adaptive Compositional Modules @@ -3702,6 +3956,7 @@ in the Case of Unambiguous Gender GT-SALT/Adaptive-Compositional-Modules MultiWOZ WikiSQL + 10.18653/v1/2022.acl-long.255 An Investigation of the (In)effectiveness of Counterfactually Augmented Data @@ -3714,6 +3969,7 @@ in the Case of Unambiguous Gender joshi-he-2022-investigation joshinh/investigation-cad BoolQ + 10.18653/v1/2022.acl-long.256 Inducing Positive Perspectives with Text Reframing @@ -3727,6 +3983,7 @@ in the Case of Unambiguous Gender 2022.acl-long.257 ziems-etal-2022-inducing gt-salt/positive-frames + 10.18653/v1/2022.acl-long.257 <fixed-case>VALUE</fixed-case>: <fixed-case>U</fixed-case>nderstanding Dialect Disparity in <fixed-case>NLU</fixed-case> @@ -3742,6 +3999,7 @@ in the Case of Unambiguous Gender CoLA GLUE QNLI + 10.18653/v1/2022.acl-long.258 From the Detection of Toxic Spans in Online Discussions to the Analysis of Toxic-to-Civil Transfer @@ -3755,6 +4013,7 @@ in the Case of Unambiguous Gender 2022.acl-long.259 pavlopoulos-etal-2022-detection ipavlopoulos/toxic_spans + 10.18653/v1/2022.acl-long.259 <fixed-case>F</fixed-case>orm<fixed-case>N</fixed-case>et: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction @@ -3773,6 +4032,7 @@ in the Case of Unambiguous Gender 2022.acl-long.260 lee-etal-2022-formnet FUNSD + 10.18653/v1/2022.acl-long.260 The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems @@ -3787,6 +4047,7 @@ in the Case of Unambiguous Gender ziems-etal-2022-moral gt-salt/mic ETHICS + 10.18653/v1/2022.acl-long.261 Token Dropping for Efficient <fixed-case>BERT</fixed-case> Pretraining @@ -3805,6 +4066,7 @@ in the Case of Unambiguous Gender QNLI SQuAD SST + 10.18653/v1/2022.acl-long.262 <fixed-case>D</fixed-case>ial<fixed-case>F</fixed-case>act: A Benchmark for Fact-Checking in Dialogue @@ -3821,6 +4083,7 @@ in the Case of Unambiguous Gender FEVER VitaminC Wizard of Wikipedia + 10.18653/v1/2022.acl-long.263 The Trade-offs of Domain Adaptation for Neural Language Models @@ -3830,6 +4093,7 @@ in the Case of Unambiguous Gender This work connects language model adaptation with concepts of machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. 
We derive how the benefit of training a model on either set depends on the size of the sets and the distance between their underlying distributions. We analyze how out-of-domain pre-training before in-domain fine-tuning achieves better generalization than either solution independently. Finally, we show how adaptation techniques based on data selection, such as importance sampling, intelligent data selection and influence functions, can be presented in a common framework which highlights their similarity and also their subtle differences. 2022.acl-long.264 grangier-iter-2022-trade + 10.18653/v1/2022.acl-long.264 Towards Afrocentric <fixed-case>NLP</fixed-case> for <fixed-case>A</fixed-case>frican Languages: Where We Are and Where We Can Go @@ -3839,6 +4103,7 @@ Aligning with ACL 2022 special Theme on “Language Diversity: from Low Resource to Endangered Languages”, we discuss the major linguistic and sociopolitical challenges facing development of NLP technologies for African languages. Situating African languages in a typological framework, we discuss how the particulars of these languages can be harnessed. To facilitate future research, we also highlight current efforts, communities, venues, datasets, and tools. Our main objective is to motivate and advocate for an Afrocentric approach to technology development. With this in mind, we recommend what technologies to build and how to build, evaluate, and deploy them based on the needs of local African communities. 2022.acl-long.265 adebara-abdul-mageed-2022-towards + 10.18653/v1/2022.acl-long.265 Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction @@ -3853,6 +4118,7 @@ makstarnavskyi/gector-large FCE WI-LOCNESS + 10.18653/v1/2022.acl-long.266 Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching @@ -3866,6 +4132,7 @@ 2022.acl-long.267.software.zip ostapenko-etal-2022-speaker ostapen/switch-and-explain + 10.18653/v1/2022.acl-long.267 Detecting Unassimilated Borrowings in <fixed-case>S</fixed-case>panish: <fixed-case>A</fixed-case>n Annotated Corpus and Approaches to Modeling @@ -3877,6 +4144,7 @@ 2022.acl-long.268.software.zip alvarez-mellado-lignos-2022-detecting lirondos/coalas + 10.18653/v1/2022.acl-long.268 Is Attention Explanation? An Introduction to the Debate @@ -3891,6 +4159,7 @@ The performance of deep learning models in NLP and other fields of machine learning has led to a rise in their popularity, and so the need for explanations of these models becomes paramount. Attention has been seen as a solution to increase performance, while providing some explanations. However, a debate has started to cast doubt on the explanatory power of attention in neural networks. Although the debate has created a vast literature thanks to contributions from various areas, the lack of communication is becoming more and more tangible. In this paper, we provide a clear overview of the insights on the debate by critically confronting works from these different areas. This holistic vision can be of great interest for future works in all the communities concerned by this debate. We sum up the main challenges spotted in these areas, and we conclude by discussing the most promising future avenues on attention as an explanation.
2022.acl-long.269 bibal-etal-2022-attention + 10.18653/v1/2022.acl-long.269 There Are a Thousand Hamlets in a Thousand People’s Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory @@ -3904,6 +4173,7 @@ 2022.acl-long.270 2022.acl-long.270.software.zip fu-etal-2022-thousand + 10.18653/v1/2022.acl-long.270 Neural Pipeline for Zero-Shot Data-to-Text Generation @@ -3915,6 +4185,7 @@ kasner-dusek-2022-neural kasnerz/zeroshot-d2t-pipeline WikiSplit + 10.18653/v1/2022.acl-long.271 Not always about you: Prioritizing community needs when developing endangered language technology @@ -3926,6 +4197,7 @@ Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and Seneca, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely-spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders. 2022.acl-long.272 liu-etal-2022-always + 10.18653/v1/2022.acl-long.272 Automatic Identification and Classification of Bragging in Social Media @@ -3937,6 +4209,7 @@ Bragging is a speech act employed with the goal of constructing a favorable self-image through positive statements about oneself. It is widespread in daily communication and especially popular in social media, where users aim to build a positive image of their persona directly or indirectly. In this paper, we present the first large-scale study of bragging in computational linguistics, building on previous research in linguistics and pragmatics. To facilitate this, we introduce a new publicly available data set of tweets annotated for bragging and their types. We empirically evaluate different transformer-based models injected with linguistic information in (a) binary bragging classification, i.e., if tweets contain bragging statements or not; and (b) multi-class bragging type prediction including not bragging. Our results show that our models can predict bragging with macro F1 up to 72.42 and 35.95 in the binary and multi-class classification tasks respectively. Finally, we present an extensive linguistic and error analysis of bragging prediction to guide future research on this topic.
2022.acl-long.273 jin-etal-2022-automatic + 10.18653/v1/2022.acl-long.273 Automatic Error Analysis for Document-level Information Extraction @@ -3954,6 +4227,7 @@ in the Case of Unambiguous Gender das-etal-2022-automatic icejinx33/auto-err-template-fill SciREX + 10.18653/v1/2022.acl-long.274 Learning Functional Distributional Semantics with Visual Data @@ -3964,6 +4238,7 @@ in the Case of Unambiguous Gender 2022.acl-long.275 liu-emerson-2022-learning Visual Question Answering + 10.18653/v1/2022.acl-long.275 e<fixed-case>P</fixed-case>i<fixed-case>C</fixed-case>: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding @@ -3976,6 +4251,7 @@ in the Case of Unambiguous Gender ghosh-srivastava-2022-epic sgdgp/epic GLUE + 10.18653/v1/2022.acl-long.276 Chart-to-Text: A Large-Scale Benchmark for Chart Summarization @@ -3993,6 +4269,7 @@ in the Case of Unambiguous Gender Chart-to-text Chart2Text + 10.18653/v1/2022.acl-long.277 Characterizing Idioms: Conventionality and Contingency @@ -4004,6 +4281,7 @@ in the Case of Unambiguous Gender Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted. 2022.acl-long.278 socolof-etal-2022-characterizing + 10.18653/v1/2022.acl-long.278 Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide <fixed-case>MLP</fixed-case> @@ -4015,6 +4293,7 @@ in the Case of Unambiguous Gender 2022.acl-long.279.software.zip galke-scherp-2022-bag lgalke/text-clf-baselines + 10.18653/v1/2022.acl-long.279 Generative Pretraining for Paraphrase Evaluation @@ -4032,6 +4311,7 @@ in the Case of Unambiguous Gender PARANMT-50M PAWS SNLI + 10.18653/v1/2022.acl-long.280 Incorporating Stock Market Signals for <fixed-case>T</fixed-case>witter Stance Detection @@ -4047,6 +4327,7 @@ in the Case of Unambiguous Gender 2022.acl-long.281.software.zip conforti-etal-2022-incorporating cambridge-wtwt/acl2022-wtwt-stocks + 10.18653/v1/2022.acl-long.281 Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation @@ -4060,6 +4341,7 @@ in the Case of Unambiguous Gender Multilingual neural machine translation models are trained to maximize the likelihood of a mix of examples drawn from multiple language pairs. The dominant inductive bias applied to these models is a shared vocabulary and a shared set of parameters across languages; the inputs and labels corresponding to examples drawn from different language pairs might still reside in distinct sub-spaces. In this paper, we introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level. Our approach interpolates instances from different language pairs into joint ‘crossover examples’ in order to encourage sharing input and output spaces across languages. 
To ensure better fusion of examples in multilingual settings, we propose several techniques to improve example interpolation across dissimilar languages under heavy data imbalance. Experiments on a large-scale WMT multilingual dataset demonstrate that our approach significantly improves quality on English-to-Many, Many-to-English and zero-shot translation tasks (from +0.5 BLEU up to +5.5 BLEU points). Results on code-switching sets demonstrate the capability of our approach to improve model generalization to out-of-distribution multilingual examples. We also conduct qualitative and quantitative representation comparisons to analyze the advantages of our approach at the representation level. 2022.acl-long.282 cheng-etal-2022-multilingual + 10.18653/v1/2022.acl-long.282 Word Segmentation as Unsupervised Constituency Parsing @@ -4069,6 +4351,7 @@ 2022.acl-long.283 alhama-2022-word OpenSubtitles + 10.18653/v1/2022.acl-long.283 <fixed-case>S</fixed-case>afety<fixed-case>K</fixed-case>it: First Aid for Measuring Safety in Open-domain Conversational Systems @@ -4085,6 +4368,7 @@ dinan-etal-2022-safetykit Blended Skill Talk HONEST + 10.18653/v1/2022.acl-long.284 Zero-Shot Cross-lingual Semantic Parsing @@ -4099,6 +4383,7 @@ ATIS MKQA ParaCrawl + 10.18653/v1/2022.acl-long.285 The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study @@ -4111,6 +4396,7 @@ 2022.acl-long.286 2022.acl-long.286.software.zip dankers-etal-2022-paradox i-machine-think/compositionality_paradox_mt + 10.18653/v1/2022.acl-long.286 Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents @@ -4124,6 +4410,7 @@ Document-level neural machine translation (DocNMT) achieves coherent translations by incorporating cross-sentence context. However, for most language pairs there’s a shortage of parallel documents, although parallel sentences are readily available. In this paper, we study whether and how contextual modeling in DocNMT is transferable via multilingual modeling. We focus on the scenario of zero-shot transfer from teacher languages with document level data to student languages with no documents but sentence level data, and for the first time treat document-level translation as a transfer learning problem. Using simple concatenation-based DocNMT, we explore the effect of 3 factors on the transfer: the number of teacher languages with document level data, the balance between document and sentence level data at training, and the data condition of parallel documents (genuine vs. back-translated). Our experiments on Europarl-7 and IWSLT-10 show the feasibility of multilingual transfer for DocNMT, particularly on document-specific metrics. We observe that more teacher languages and adequate data balance both contribute to better transfer quality. Surprisingly, the transfer is less sensitive to the data condition, where multilingual DocNMT delivers decent performance with either back-translated or genuine document pairs. 2022.acl-long.287 zhang-etal-2022-multilingual + 10.18653/v1/2022.acl-long.287 Cross-Lingual Phrase Retrieval @@ -4139,6 +4426,7 @@ Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level.
However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/. 2022.acl-long.288 zheng-etal-2022-cross-lingual + 10.18653/v1/2022.acl-long.288 Improving Compositional Generalization with Self-Training for Data-to-Text Generation @@ -4154,6 +4442,7 @@ mehta-etal-2022-improving google-research/google-research SGD + 10.18653/v1/2022.acl-long.289 <fixed-case>MMC</fixed-case>o<fixed-case>QA</fixed-case>: Conversational Question Answering over Text, Tables, and Images @@ -4168,6 +4457,7 @@ liyongqi67/mmcoqa ManyModalQA ORConvQA + 10.18653/v1/2022.acl-long.290 Effective Token Graph Modeling using a Novel Labeling Strategy for Structured Sentiment Analysis @@ -4183,6 +4473,7 @@ xgswlg/tgls MPQA Opinion Corpus NoReC_fine + 10.18653/v1/2022.acl-long.291 <fixed-case>P</fixed-case>rom<fixed-case>DA</fixed-case>: Prompt-based Data Augmentation for Low-Resource <fixed-case>NLU</fixed-case> Tasks @@ -4201,6 +4492,7 @@ garyyufei/promda CoNLL-2003 SST + 10.18653/v1/2022.acl-long.292 Disentangled Sequence to Sequence Learning for Compositional Generalization @@ -4212,6 +4504,7 @@ zheng-lapata-2022-disentangled mswellhao/dangle CFQ + 10.18653/v1/2022.acl-long.293 <fixed-case>RST</fixed-case> Discourse Parsing with Second-Stage <fixed-case>EDU</fixed-case>-Level Pre-training @@ -4224,6 +4517,7 @@ 2022.acl-long.294 2022.acl-long.294.software.zip yu-etal-2022-rst + 10.18653/v1/2022.acl-long.294 <fixed-case>S</fixed-case>im<fixed-case>KGC</fixed-case>: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models @@ -4237,6 +4531,7 @@ 2022.acl-long.295.software.zip wang-etal-2022-simkgc intfloat/simkgc + 10.18653/v1/2022.acl-long.295 Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze? @@ -4250,6 +4545,7 @@ eberle-etal-2022-transformer oeberle/task_gaze_transformers SST + 10.18653/v1/2022.acl-long.296 <fixed-case>L</fixed-case>ex<fixed-case>GLUE</fixed-case>: A Benchmark Dataset for Legal Language Understanding in <fixed-case>E</fixed-case>nglish @@ -4272,6 +4568,7 @@ ECtHR GLUE SuperGLUE + 10.18653/v1/2022.acl-long.297 <fixed-case>D</fixed-case>i<fixed-case>B</fixed-case>i<fixed-case>MT</fixed-case>: A Novel Benchmark for Measuring <fixed-case>W</fixed-case>ord <fixed-case>S</fixed-case>ense <fixed-case>D</fixed-case>isambiguation Biases in <fixed-case>M</fixed-case>achine <fixed-case>T</fixed-case>ranslation @@ -4286,6 +4583,7 @@ campolungo-etal-2022-dibimt Various fixes throughout the paper.
+ 10.18653/v1/2022.acl-long.298 Improving Word Translation via Two-Stage Contrastive Learning @@ -4302,6 +4600,7 @@ in the Case of Unambiguous Gender cambridgeltl/contrastivebli PanLex-BLI XLING + 10.18653/v1/2022.acl-long.299 Scheduled Multi-task Learning for Neural Chat Translation @@ -4316,6 +4615,7 @@ in the Case of Unambiguous Gender liang-etal-2022-scheduled xl2248/sml BMELD + 10.18653/v1/2022.acl-long.300 <fixed-case>F</fixed-case>air<fixed-case>L</fixed-case>ex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing @@ -4332,6 +4632,7 @@ in the Case of Unambiguous Gender chalkidis-etal-2022-fairlex coastalcph/fairlex ECtHR + 10.18653/v1/2022.acl-long.301 Towards Abstractive Grounded Summarization of Podcast Transcripts @@ -4345,6 +4646,7 @@ in the Case of Unambiguous Gender 2022.acl-long.302 song-etal-2022-towards tencent-ailab/grndpodcastsum + 10.18653/v1/2022.acl-long.302 <fixed-case>F</fixed-case>i<fixed-case>NER</fixed-case>: Financial Numeric Entity Recognition for <fixed-case>XBRL</fixed-case> Tagging @@ -4361,6 +4663,7 @@ in the Case of Unambiguous Gender loukas-etal-2022-finer nlpaueb/finer FiNER-139 + 10.18653/v1/2022.acl-long.303 Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation @@ -4381,6 +4684,7 @@ in the Case of Unambiguous Gender 2022.acl-long.304.software.zip li-etal-2022-keywords ROCStories + 10.18653/v1/2022.acl-long.304 <fixed-case>EPT</fixed-case>-<fixed-case>X</fixed-case>: An Expression-Pointer Transformer model that generates e<fixed-case>X</fixed-case>planations for numbers @@ -4393,6 +4697,7 @@ in the Case of Unambiguous Gender 2022.acl-long.305 2022.acl-long.305.software.tgz kim-etal-2022-ept + 10.18653/v1/2022.acl-long.305 Identifying the Human Values behind Arguments @@ -4407,6 +4712,7 @@ in the Case of Unambiguous Gender 2022.acl-long.306 kiesel-etal-2022-identifying webis-de/acl-22 + 10.18653/v1/2022.acl-long.306 <fixed-case>B</fixed-case>ench<fixed-case>IE</fixed-case>: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation @@ -4423,6 +4729,7 @@ in the Case of Unambiguous Gender gashteovski-etal-2022-benchie gkiril/benchie BenchIE + 10.18653/v1/2022.acl-long.307 Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition @@ -4443,6 +4750,7 @@ in the Case of Unambiguous Gender LRW Libri-Light LibriSpeech + 10.18653/v1/2022.acl-long.308 <fixed-case>S</fixed-case>umma<fixed-case>R</fixed-case>eranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization @@ -4457,6 +4765,7 @@ in the Case of Unambiguous Gender ntunlp/summareranker CNN/Daily Mail Reddit TIFU + 10.18653/v1/2022.acl-long.309 Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals @@ -4473,6 +4782,7 @@ in the Case of Unambiguous Gender wu-etal-2022-understanding RecipeQA WikiHow + 10.18653/v1/2022.acl-long.310 Zoom Out and Observe: News Environment Perception for Fake News Detection @@ -4487,6 +4797,7 @@ in the Case of Unambiguous Gender 2022.acl-long.311 sheng-etal-2022-zoom ictmcg/news-environment-perception + 10.18653/v1/2022.acl-long.311 Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models @@ -4502,6 +4813,7 @@ in the Case of Unambiguous Gender IWSLT 2017 OpenSubtitles WMT 2014 + 10.18653/v1/2022.acl-long.312 Saliency as Evidence: Event Detection with Trigger Saliency Attribution @@ -4514,6 +4826,7 @@ in the Case of Unambiguous Gender 
2022.acl-long.313.software.zip liu-etal-2022-saliency MAVEN + 10.18653/v1/2022.acl-long.313 <fixed-case>SRL4E</fixed-case> – <fixed-case>S</fixed-case>emantic <fixed-case>R</fixed-case>ole <fixed-case>L</fixed-case>abeling for <fixed-case>E</fixed-case>motions: <fixed-case>A</fixed-case> Unified Evaluation Framework @@ -4525,6 +4838,7 @@ in the Case of Unambiguous Gender 2022.acl-long.314 campagnano-etal-2022-srl4e sapienzanlp/srl4e + 10.18653/v1/2022.acl-long.314 Context Matters: A Pragmatic Study of <fixed-case>PLM</fixed-case>s’ Negation Understanding @@ -4536,6 +4850,7 @@ in the Case of Unambiguous Gender gubelmann-handschuh-2022-context GLUE SuperGLUE + 10.18653/v1/2022.acl-long.315 Probing for Predicate Argument Structures in Pretrained Language Models @@ -4546,6 +4861,7 @@ in the Case of Unambiguous Gender 2022.acl-long.316 conia-navigli-2022-probing sapienzanlp/srl-pas-probing + 10.18653/v1/2022.acl-long.316 Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction @@ -4560,6 +4876,7 @@ in the Case of Unambiguous Gender 2022.acl-long.317.software.zip huang-etal-2022-multilingual-generative pluslabnlp/x-gear + 10.18653/v1/2022.acl-long.317 Identifying Moments of Change from Longitudinal User Text @@ -4573,6 +4890,7 @@ in the Case of Unambiguous Gender Identifying changes in individuals’ behaviour and mood, as observed via content shared on online platforms, is increasingly gaining importance. Most research to-date on this topic focuses on either: (a) identifying individuals at risk or with a certain mental health condition given a batch of posts or (b) providing equivalent labels at the post level. A disadvantage of such work is the lack of a strong temporal component and the inability to make longitudinal assessments following an individual’s trajectory and allowing timely interventions. Here we define a new task, that of identifying moments of change in individuals on the basis of their shared content online. The changes we consider are sudden shifts in mood (switches) or gradual mood progression (escalations). We have created detailed guidelines for capturing moments of change and a corpus of 500 manually annotated user timelines (18.7K posts). We have developed a variety of baseline models drawing inspiration from related tasks and show that the best performance is obtained through context aware sequential modelling. We also introduce new metrics for capturing rare events in temporal windows. 
2022.acl-long.318 tsakalidis-etal-2022-identifying + 10.18653/v1/2022.acl-long.318 Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System @@ -4589,6 +4907,7 @@ in the Case of Unambiguous Gender 2022.acl-long.319.software.zip su-etal-2022-multi awslabs/pptod + 10.18653/v1/2022.acl-long.319 Graph Enhanced Contrastive Learning for Radiology Findings Summarization @@ -4604,6 +4923,7 @@ in the Case of Unambiguous Gender 2022.acl-long.320.software.zip hu-etal-2022-graph jinpeng01/aig_cl + 10.18653/v1/2022.acl-long.320 Semi-Supervised Formality Style Transfer with Consistency Training @@ -4616,6 +4936,7 @@ in the Case of Unambiguous Gender liu-etal-2022-semi aolius/semi-fst GYAFC + 10.18653/v1/2022.acl-long.321 Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure @@ -4627,6 +4948,7 @@ in the Case of Unambiguous Gender 2022.acl-long.322 chai-etal-2022-cross XNLI + 10.18653/v1/2022.acl-long.322 Rare and Zero-shot Word Sense Disambiguation using <fixed-case>Z</fixed-case>-Reweighting @@ -4641,6 +4963,7 @@ in the Case of Unambiguous Gender su-etal-2022-rare suytingwan/wsd-z-reweighting Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison + 10.18653/v1/2022.acl-long.323 <fixed-case>N</fixed-case>ibbling at the Hard Core of <fixed-case>W</fixed-case>ord <fixed-case>S</fixed-case>ense <fixed-case>D</fixed-case>isambiguation @@ -4654,6 +4977,7 @@ in the Case of Unambiguous Gender maru-etal-2022-nibbling sapienzanlp/wsd-hard-benchmark Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison + 10.18653/v1/2022.acl-long.324 Large Scale Substitution-based Word Sense Induction @@ -4667,6 +4991,7 @@ in the Case of Unambiguous Gender eyal-etal-2022-large CoarseWSD-20 WiC + 10.18653/v1/2022.acl-long.325 Can Synthetic Translations Improve Bitext Quality? 
@@ -4677,6 +5002,7 @@ in the Case of Unambiguous Gender 2022.acl-long.326 briakou-carpuat-2022-synthetic WikiMatrix + 10.18653/v1/2022.acl-long.326 Unsupervised Dependency Graph Network @@ -4692,6 +5018,7 @@ in the Case of Unambiguous Gender shen-etal-2022-unsupervised yikangshen/udgn Penn Treebank + 10.18653/v1/2022.acl-long.327 <fixed-case>W</fixed-case>iki<fixed-case>D</fixed-case>iverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types @@ -4710,6 +5037,7 @@ in the Case of Unambiguous Gender wang-etal-2022-wikidiverse wangxw5/wikidiverse ZESHEL + 10.18653/v1/2022.acl-long.328 Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models @@ -4728,6 +5056,7 @@ in the Case of Unambiguous Gender BLUE BioLAMA LAMA + 10.18653/v1/2022.acl-long.329 Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient <fixed-case>BERT</fixed-case> @@ -4745,6 +5074,7 @@ in the Case of Unambiguous Gender GLUE QNLI RACE + 10.18653/v1/2022.acl-long.330 Compression of Generative Pre-trained Language Models via Quantization @@ -4764,6 +5094,7 @@ in the Case of Unambiguous Gender PERSONA-CHAT WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.331 Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration @@ -4780,6 +5111,7 @@ in the Case of Unambiguous Gender Conceptual Captions Objects365 Places + 10.18653/v1/2022.acl-long.332 <fixed-case>D</fixed-case>ialog<fixed-case>VED</fixed-case>: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation @@ -4803,6 +5135,7 @@ in the Case of Unambiguous Gender DSTC7 Task 2 DailyDialog PERSONA-CHAT + 10.18653/v1/2022.acl-long.333 Contextual Fine-to-Coarse Distillation for Coarse-grained Response Selection in Open-Domain Conversations @@ -4823,6 +5156,7 @@ in the Case of Unambiguous Gender 2022.acl-long.334 chen-etal-2022-contextual lemuria-wchen/CFC + 10.18653/v1/2022.acl-long.334 Textomics: A Dataset for Genomics Data Summary Generation @@ -4835,6 +5169,7 @@ in the Case of Unambiguous Gender 2022.acl-long.335.software.zip wang-etal-2022-textomics amos814/textomics + 10.18653/v1/2022.acl-long.335 A Contrastive Framework for Learning Sentence Representations from Pairwise and Triple-wise Perspective in Angular Space @@ -4852,6 +5187,7 @@ in the Case of Unambiguous Gender MRPC SST SentEval + 10.18653/v1/2022.acl-long.336 Packed Levitated Marker for Entity and Relation Extraction @@ -4871,6 +5207,7 @@ in the Case of Unambiguous Gender Few-NERD OntoNotes 5.0 SciERC + 10.18653/v1/2022.acl-long.337 An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation @@ -4884,6 +5221,7 @@ in the Case of Unambiguous Gender 2022.acl-long.338.software.zip yang-etal-2022-interpretable shiquanyang/ns-dial + 10.18653/v1/2022.acl-long.338 Impact of Evaluation Methodologies on Code Summarization @@ -4897,6 +5235,7 @@ in the Case of Unambiguous Gender 2022.acl-long.339 nie-etal-2022-impact engineeringsoftware/time-segmented-evaluation + 10.18653/v1/2022.acl-long.339 <fixed-case>KG</fixed-case>-<fixed-case>F</fixed-case>i<fixed-case>D</fixed-case>: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering @@ -4915,6 +5254,7 @@ in the Case of Unambiguous Gender yu-etal-2022-kg Natural Questions TriviaQA + 10.18653/v1/2022.acl-long.340 Which side are you on? 
Insider-Outsider classification in conspiracy-theoretic social media @@ -4928,6 +5268,7 @@ in the Case of Unambiguous Gender 2022.acl-long.341 2022.acl-long.341.software.zip holur-etal-2022-side + 10.18653/v1/2022.acl-long.341 Learning From Failure: Data Capture in an <fixed-case>A</fixed-case>ustralian Aboriginal Community @@ -4938,6 +5279,7 @@ in the Case of Unambiguous Gender Most low resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore it is worth exploring new ways of engaging with speakers which generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community. 2022.acl-long.342 le-ferrand-etal-2022-learning + 10.18653/v1/2022.acl-long.342 Deep Inductive Logic Reasoning for Multi-Hop Reading Comprehension @@ -4949,6 +5291,7 @@ in the Case of Unambiguous Gender wang-pan-2022-deep MedHop WikiHop + 10.18653/v1/2022.acl-long.343 <fixed-case>CICERO</fixed-case>: A Dataset for Contextualized Commonsense Inference in Dialogues @@ -4966,6 +5309,7 @@ in the Case of Unambiguous Gender DREAM DailyDialog MuTual + 10.18653/v1/2022.acl-long.344 A Comparative Study of Faithfulness Metrics for Model Interpretability Methods @@ -4978,6 +5322,7 @@ in the Case of Unambiguous Gender chan-etal-2022-comparative IMDb Movie Reviews SST + 10.18653/v1/2022.acl-long.345 <fixed-case>SP</fixed-case>o<fixed-case>T</fixed-case>: Better Frozen Model Adaptation through Soft Prompt Transfer @@ -5013,6 +5358,7 @@ in the Case of Unambiguous Gender WSC WiC WinoGrande + 10.18653/v1/2022.acl-long.346 Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models @@ -5034,6 +5380,7 @@ in the Case of Unambiguous Gender OLID QNLI SST + 10.18653/v1/2022.acl-long.347 Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-centric Summarization @@ -5050,6 +5397,7 @@ in the Case of Unambiguous Gender zhao-etal-2022-educational zhaozj89/Educational-Question-Generation FairytaleQA + 10.18653/v1/2022.acl-long.348 <fixed-case>H</fixed-case>eter<fixed-case>MPC</fixed-case>: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations @@ -5065,6 +5413,7 @@ in the Case of Unambiguous Gender 2022.acl-long.349 gu-etal-2022-hetermpc lxchtan/hetermpc + 10.18653/v1/2022.acl-long.349 The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature @@ -5076,6 +5425,7 @@ in the Case of Unambiguous Gender Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency. 
In this paper, we examine the summaries generated by two current models in order to understand the deficiencies of existing evaluation approaches in the context of the challenges that arise in the MDS task. Based on this analysis, we propose a new approach to human evaluation and identify several challenges that must be overcome to develop effective biomedical MDS systems. 2022.acl-long.350 otmakhova-etal-2022-patient + 10.18653/v1/2022.acl-long.350 A Multi-Document Coverage Reward for <fixed-case>RELAX</fixed-case>ed Multi-Document Summarization @@ -5089,6 +5439,7 @@ in the Case of Unambiguous Gender jacob-parnell-rozetta/longformer_coverage Multi-News WCEP + 10.18653/v1/2022.acl-long.351 <fixed-case>KNN</fixed-case>-Contrastive Learning for Out-of-Domain Intent Classification @@ -5099,6 +5450,7 @@ in the Case of Unambiguous Gender The Out-of-Domain (OOD) intent classification is a basic and challenging task for dialogue systems. Previous methods commonly restrict the region (in feature space) of In-domain (IND) intent features to be compact or simply-connected implicitly, which assumes no OOD intents reside, to learn discriminative semantic features. Then the distribution of the IND intent features is often assumed to obey a hypothetical distribution (Gaussian mostly) and samples outside this distribution are regarded as OOD samples. In this paper, we start from the nature of OOD intent classification and explore its optimization objective. We further propose a simple yet effective method, named KNN-contrastive learning. Our approach utilizes k-nearest neighbors (KNN) of IND intents to learn discriminative semantic features that are more conducive to OOD detection. Notably, the density-based novelty detection algorithm is so well-grounded in the essence of our method that it is reasonable to use it as the OOD detection algorithm without making any requirements for the feature distribution. Extensive experiments on four public datasets show that our approach can not only enhance the OOD detection performance substantially but also improve the IND intent classification while requiring no restrictions on feature distribution.
2022.acl-long.352 zhou-etal-2022-knn + 10.18653/v1/2022.acl-long.352 A Neural Network Architecture for Program Understanding Inspired by Human Behaviors @@ -5115,6 +5467,7 @@ in the Case of Unambiguous Gender recklessronan/pgnn-ek CodeSearchNet CodeXGLUE + 10.18653/v1/2022.acl-long.353 <fixed-case>F</fixed-case>a<fixed-case>VIQ</fixed-case>: <fixed-case>FA</fixed-case>ct Verification from Information-seeking Questions @@ -5134,6 +5487,7 @@ in the Case of Unambiguous Gender FM2 KILT Natural Questions + 10.18653/v1/2022.acl-long.354 Simulating Bandit Learning from User Feedback for Extractive Question Answering @@ -5152,6 +5506,7 @@ in the Case of Unambiguous Gender SQuAD SearchQA TriviaQA + 10.18653/v1/2022.acl-long.355 Beyond Goldfish Memory: Long-Term Open-Domain Conversation @@ -5163,6 +5518,7 @@ in the Case of Unambiguous Gender 2022.acl-long.356 xu-etal-2022-beyond PERSONA-CHAT + 10.18653/v1/2022.acl-long.356 <fixed-case>R</fixed-case>e<fixed-case>CLIP</fixed-case>: A Strong Zero-Shot Baseline for Referring Expression Comprehension @@ -5180,6 +5536,7 @@ in the Case of Unambiguous Gender CLEVR COCO RefCOCO + 10.18653/v1/2022.acl-long.357 Dynamic Prefix-Tuning for Generative Template-based Event Extraction @@ -5191,6 +5548,7 @@ in the Case of Unambiguous Gender We consider event extraction in a generative manner with template-based conditional generation. Although there is a rising trend of casting the task of event extraction as a sequence generation problem with prompts, these generation-based methods have two significant challenges, including using suboptimal prompts and static event type information. In this paper, we propose a generative template-based event extraction method with dynamic prefix (GTEE-DynPref) by integrating context information with type-specific prefixes to learn a context-specific prefix for each context. Experimental results show that our model achieves competitive results with the state-of-the-art classification-based model OneIE on ACE 2005 and achieves the best performances on ERE. Additionally, our model is proven to be portable to new types of events effectively. 2022.acl-long.358 liu-etal-2022-dynamic + 10.18653/v1/2022.acl-long.358 <fixed-case>E</fixed-case>-<fixed-case>LANG</fixed-case>: Energy-Based Joint Inferencing of Super and Swift Language Models @@ -5204,6 +5562,7 @@ in the Case of Unambiguous Gender GLUE QNLI SuperGLUE + 10.18653/v1/2022.acl-long.359 <fixed-case>PRIMERA</fixed-case>: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization @@ -5223,6 +5582,7 @@ in the Case of Unambiguous Gender WikiSum arXiv arXiv Summarization Dataset + 10.18653/v1/2022.acl-long.360 Dynamic Global Memory for Document-level Argument Extraction @@ -5235,6 +5595,7 @@ in the Case of Unambiguous Gender 2022.acl-long.361.software.zip du-etal-2022-dynamic xinyadu/memory_docie + 10.18653/v1/2022.acl-long.361 Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns @@ -5246,6 +5607,7 @@ in the Case of Unambiguous Gender There is a growing interest in the combined use of NLP and machine learning methods to predict gaze patterns during naturalistic reading. While promising results have been obtained through the use of transformer-based language models, little work has been undertaken to relate the performance of such models to general text characteristics.
In this paper, we report on experiments with two eye-tracking corpora of naturalistic reading and two language models (BERT and GPT-2). In all experiments, we test effects of a broad spectrum of features for predicting human reading behavior that fall into five categories (syntactic complexity, lexical richness, register-based multiword combinations, readability and psycholinguistic word properties). Our experiments show that both the features included and the architecture of the transformer-based language models play a role in predicting multiple eye-tracking measures during naturalistic reading. We also report the results of experiments aimed at determining the relative importance of features from different groups using SP-LIME. 2022.acl-long.362 wiechmann-kerz-2022-measuring + 10.18653/v1/2022.acl-long.362 Alternative Input Signals Ease Transfer in Multilingual Machine Translation @@ -5260,6 +5622,7 @@ in the Case of Unambiguous Gender Recent work in multilingual machine translation (MMT) has focused on the potential of positive transfer between languages, particularly cases where higher-resourced languages can benefit lower-resourced ones. While training an MMT model, the supervision signals learned from one language pair can be transferred to the other via the tokens shared by multiple source languages. However, the transfer is inhibited when the token overlap among source languages is small, which manifests naturally when languages use different writing systems. In this paper, we tackle inhibited transfer by augmenting the training data with alternative signals that unify different writing systems, such as phonetic, romanized, and transliterated input. We test these signals on Indic and Turkic languages, two language families where the writing systems differ but languages still share common features. Our results indicate that a straightforward multi-source self-ensemble – training a model on a mixture of various signals and ensembling the outputs of the same model fed with different signals during inference – outperforms strong ensemble baselines by 1.3 BLEU points on both language families. Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when the training set is small, leading to +5 BLEU when only 5% of the total training data is accessible. Finally, our analysis demonstrates that including alternative signals yields more consistency and translates named entities more accurately, which is crucial for increased factuality of automated systems.
2022.acl-long.363 sun-etal-2022-alternative + 10.18653/v1/2022.acl-long.363 Phone-ing it in: Towards Flexible Multi-Modal Language Model Training by Phonetic Representations of Data @@ -5271,6 +5634,7 @@ in the Case of Unambiguous Gender leong-whitenack-2022-phone sil-ai/phone-it-in MasakhaNER + 10.18653/v1/2022.acl-long.364 Noisy Channel Language Model Prompting for Few-Shot Text Classification @@ -5285,6 +5649,7 @@ in the Case of Unambiguous Gender shmsw25/Channel-LM-Prompting AG News SST + 10.18653/v1/2022.acl-long.365 Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages @@ -5297,6 +5662,7 @@ in the Case of Unambiguous Gender 2022.acl-long.366 downey-etal-2022-multilingual cmdowney88/xlslm + 10.18653/v1/2022.acl-long.366 <fixed-case>K</fixed-case>inya<fixed-case>BERT</fixed-case>: a Morphology-aware <fixed-case>K</fixed-case>inyarwanda Language Model @@ -5310,6 +5676,7 @@ in the Case of Unambiguous Gender anzeyimana/kinyabert-acl2022 GLUE QNLI + 10.18653/v1/2022.acl-long.367 On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency @@ -5321,6 +5688,7 @@ in the Case of Unambiguous Gender park-caragea-2022-calibration SNLI SWAG + 10.18653/v1/2022.acl-long.368 <fixed-case>IMPLI</fixed-case>: Investigating <fixed-case>NLI</fixed-case> Models’ Performance on Figurative Language @@ -5332,6 +5700,7 @@ in the Case of Unambiguous Gender 2022.acl-long.369 stowe-etal-2022-impli ukplab/acl2022-impli + 10.18653/v1/2022.acl-long.369 <fixed-case>QAC</fixed-case>onv: Question Answering on Informative Conversations @@ -5353,6 +5722,7 @@ in the Case of Unambiguous Gender Molweni QuAC SQuAD + 10.18653/v1/2022.acl-long.370 Prix-<fixed-case>LM</fixed-case>: Pretraining for Multilingual Knowledge Base Construction @@ -5369,6 +5739,7 @@ in the Case of Unambiguous Gender DBpedia LAMA XL-BEL + 10.18653/v1/2022.acl-long.371 Semantic Composition with <fixed-case>PSHRG</fixed-case> for Derivation Tree Reconstruction from Graph-Based Meaning Representations @@ -5379,6 +5750,7 @@ in the Case of Unambiguous Gender We introduce a data-driven approach to generating derivation trees from meaning representation graphs with probabilistic synchronous hyperedge replacement grammar (PSHRG). SHRG has been used to produce meaning representation graphs from texts and syntax trees, but little is known about its viability on the reverse. In particular, we experiment on Dependency Minimal Recursion Semantics (DMRS) and adapt PSHRG as a formalism that approximates the semantic composition of DMRS graphs and simultaneously recovers the derivations that license the DMRS graphs. Consistent results are obtained as evaluated on a collection of annotated corpora. This work reveals the ability of PSHRG in formalizing a syntax–semantics interface, modelling compositional graph-to-tree translations, and channelling explainability to surface realization. 
2022.acl-long.372 lo-etal-2022-semantic + 10.18653/v1/2022.acl-long.372 <fixed-case>HOLM</fixed-case>: Hallucinating Objects with Language Models for Referring Expression Recognition in Partially-Observed Scenes @@ -5390,6 +5762,7 @@ in the Case of Unambiguous Gender 2022.acl-long.373 cirik-etal-2022-holm Visual Genome + 10.18653/v1/2022.acl-long.373 Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models @@ -5409,6 +5782,7 @@ in the Case of Unambiguous Gender XCOPA XNLI XQuAD + 10.18653/v1/2022.acl-long.374 <tex-math>\infty</tex-math>-former: Infinite Memory Transformer @@ -5423,6 +5797,7 @@ in the Case of Unambiguous Gender PG-19 WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.375 Systematic Inequalities in Language Technology Performance across the World’s Languages @@ -5435,6 +5810,7 @@ in the Case of Unambiguous Gender 2022.acl-long.376.software.zip blasi-etal-2022-systematic neubig/globalutility + 10.18653/v1/2022.acl-long.376 <fixed-case>CaMEL</fixed-case>: <fixed-case>C</fixed-case>ase <fixed-case>M</fixed-case>arker <fixed-case>E</fixed-case>xtraction without <fixed-case>L</fixed-case>abels @@ -5448,6 +5824,7 @@ in the Case of Unambiguous Gender 2022.acl-long.377.software.zip weissweiler-etal-2022-camel leonieweissweiler/camel + 10.18653/v1/2022.acl-long.377 Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors @@ -5460,6 +5837,7 @@ in the Case of Unambiguous Gender 2022.acl-long.378.software.zip nejadgholi-etal-2022-improving isarnejad/tcav-for-text-classifiers + 10.18653/v1/2022.acl-long.378 Reports of personal experiences and stories in argumentation: datasets and analysis @@ -5469,6 +5847,7 @@ in the Case of Unambiguous Gender Reports of personal experiences or stories can play a crucial role in argumentation, as they represent an immediate and (often) relatable way to back up one’s position with respect to a given topic. They are easy to understand and increase empathy: this makes them powerful in argumentation. The impact of personal reports and stories in argumentation has been studied in the Social Sciences, but it is still largely underexplored in NLP. Our work is the first step towards filling this gap: our goal is to develop robust classifiers to identify documents containing personal experiences and reports. The main challenge is the scarcity of annotated data: our solution is to leverage existing annotations to be able to scale-up the analysis. Our contribution is two-fold. First, we conduct a set of in-domain and cross-domain experiments involving three datasets (two from Argument Mining, one from the Social Sciences), modeling architectures, training setups and fine-tuning options tailored to the involved domains. We show that despite the differences among datasets and annotations, robust cross-domain classification is possible. Second, we employ linear regression for performance mining, identifying performance trends both for overall classification performance and individual classifier predictions. 
2022.acl-long.379 falk-lapesa-2022-reports + 10.18653/v1/2022.acl-long.379 Non-neural Models Matter: a Re-evaluation of Neural Referring Expression Generation Systems @@ -5481,6 +5860,7 @@ in the Case of Unambiguous Gender 2022.acl-long.380.software.zip same-etal-2022-non WebNLG + 10.18653/v1/2022.acl-long.380 Bridging the Generalization Gap in Text-to-<fixed-case>SQL</fixed-case> Parsing with Schema Expansion @@ -5492,6 +5872,7 @@ in the Case of Unambiguous Gender Text-to-SQL parsers map natural language questions to programs that are executable over tables to generate answers, and are typically evaluated on large-scale datasets like Spider (Yu et al., 2018). We argue that existing benchmarks fail to capture a certain out-of-domain generalization problem that is of significant practical importance: matching domain specific phrases to composite operation over columns. To study this problem, we first propose a synthetic dataset along with a re-purposed train/test split of the Squall dataset (Shi et al., 2020) as new benchmarks to quantify domain generalization over column operations, and find existing state-of-the-art parsers struggle in these benchmarks. We propose to address this problem by incorporating prior domain knowledge by preprocessing table schemas, and design a method that consists of two components: schema expansion and schema pruning. This method can be easily applied to multiple existing base parsers, and we show that it significantly outperforms baseline parsers on this domain generalization problem, boosting the underlying parsers’ overall performance by up to 13.8% relative accuracy gain (5.1% absolute) on the new Squall data split. 2022.acl-long.381 zhao-etal-2022-bridging + 10.18653/v1/2022.acl-long.381 Predicate-Argument Based Bi-Encoder for Paraphrase Identification @@ -5505,6 +5886,7 @@ in the Case of Unambiguous Gender peng-etal-2022-predicate GLUE PIT + 10.18653/v1/2022.acl-long.382 <fixed-case>MINER</fixed-case>: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective @@ -5523,6 +5905,7 @@ in the Case of Unambiguous Gender wang-etal-2022-miner beyonderxx/miner WNUT 2017 + 10.18653/v1/2022.acl-long.383 Leveraging <fixed-case>W</fixed-case>ikipedia article evolution for promotional tone detection @@ -5533,6 +5916,7 @@ in the Case of Unambiguous Gender 2022.acl-long.384 de-kock-vlachos-2022-leveraging christinedekock11/wiki-evolve + 10.18653/v1/2022.acl-long.384 From text to talk: <fixed-case>H</fixed-case>arnessing conversational corpora for humane and diversity-aware language technology @@ -5542,6 +5926,7 @@ in the Case of Unambiguous Gender Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future. 
2022.acl-long.385 dingemanse-liesenfeld-2022-text + 10.18653/v1/2022.acl-long.385 Flooding-<fixed-case>X</fixed-case>: Improving <fixed-case>BERT</fixed-case>’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning @@ -5562,6 +5947,7 @@ in the Case of Unambiguous Gender AG News IMDb Movie Reviews SST + 10.18653/v1/2022.acl-long.386 <fixed-case>R</fixed-case>o<fixed-case>M</fixed-case>e: A Robust Metric for Evaluating Natural Language Generation @@ -5577,6 +5963,7 @@ in the Case of Unambiguous Gender rashad101/rome CoLA KELM + 10.18653/v1/2022.acl-long.387 Finding Structural Knowledge in Multimodal-<fixed-case>BERT</fixed-case> @@ -5590,6 +5977,7 @@ in the Case of Unambiguous Gender vsjmilewski/multimodal-probes Flickr30k Visual Genome + 10.18653/v1/2022.acl-long.388 Fully Hyperbolic Neural Networks @@ -5607,6 +5995,7 @@ in the Case of Unambiguous Gender chen-etal-2022-fully chenweize1998/fully-hyperbolic-nn FB15k-237 + 10.18653/v1/2022.acl-long.389 Neural Machine Translation with Phrase-Level Universal Visual Representations @@ -5617,6 +6006,7 @@ in the Case of Unambiguous Gender 2022.acl-long.390 fang-feng-2022-neural ictnlp/pluvr + 10.18653/v1/2022.acl-long.390 <fixed-case>M</fixed-case>3<fixed-case>ED</fixed-case>: Multi-modal Multi-scene Multi-label Emotional Dialogue Database @@ -5639,6 +6029,7 @@ in the Case of Unambiguous Gender EmotionLines IEMOCAP MELD + 10.18653/v1/2022.acl-long.391 Few-shot Named Entity Recognition with Self-describing Networks @@ -5654,6 +6045,7 @@ in the Case of Unambiguous Gender chen-etal-2022-shot chen700564/sdnet WNUT 2017 + 10.18653/v1/2022.acl-long.392 <fixed-case>S</fixed-case>peech<fixed-case>T</fixed-case>5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing @@ -5681,6 +6073,7 @@ in the Case of Unambiguous Gender MuST-C VoxCeleb1 WHAM! + 10.18653/v1/2022.acl-long.393 Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation @@ -5697,6 +6090,7 @@ in the Case of Unambiguous Gender 2022.acl-long.394 moramarco-etal-2022-human CNN/Daily Mail + 10.18653/v1/2022.acl-long.394 Unified Structure Generation for Universal Information Extraction @@ -5714,6 +6108,7 @@ in the Case of Unambiguous Gender lu-etal-2022-unified CoNLL-2003 SciERC + 10.18653/v1/2022.acl-long.395 Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering @@ -5729,6 +6124,7 @@ in the Case of Unambiguous Gender 2022.acl-long.396 zhang-etal-2022-subgraph ruckbreasoning/subgraphretrievalkbqa + 10.18653/v1/2022.acl-long.396 Pre-training to Match for Unified Low-shot Relation Extraction @@ -5743,6 +6139,7 @@ in the Case of Unambiguous Gender liu-etal-2022-pre fc-liu/mcmn FewRel + 10.18653/v1/2022.acl-long.397 Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View @@ -5760,6 +6157,7 @@ in the Case of Unambiguous Gender BioLAMA LAMA WebText + 10.18653/v1/2022.acl-long.398 Evaluating Extreme Hierarchical Multi-label Classification @@ -5769,6 +6167,7 @@ in the Case of Unambiguous Gender Several natural language processing (NLP) tasks are defined as a classification problem in its most complex form: Multi-label Hierarchical Extreme classification, in which items may be associated with multiple classes from a set of thousands of possible classes organized in a hierarchy and with a highly unbalanced distribution both in terms of class frequency and the number of labels per item. 
We analyze the state of the art of evaluation metrics based on a set of formal properties and we define an information-theoretic metric inspired by the Information Contrast Model (ICM). Experiments on synthetic data and a case study on real data show the suitability of the ICM for such scenarios. 2022.acl-long.399 amigo-delgado-2022-evaluating + 10.18653/v1/2022.acl-long.399 What does the sea say to the shore? A <fixed-case>BERT</fixed-case> based <fixed-case>DST</fixed-case> style approach for speaker to dialogue attribution in novels @@ -5779,6 +6178,7 @@ in the Case of Unambiguous Gender We present a complete pipeline to extract characters in a novel and link them to their direct-speech utterances. Our model is divided into three independent components: extracting direct-speech, compiling a list of characters, and attributing those characters to their utterances. Although we find that existing systems can perform the first two tasks accurately, attributing characters to direct speech is a challenging problem due to the narrator’s lack of explicit character mentions, and the frequent use of nominal and pronominal coreference when such explicit mentions are made. We adapt the progress made on Dialogue State Tracking to tackle a new problem: attributing speakers to dialogues. This is the first application of deep learning to speaker attribution, and it shows that it is possible to overcome the need for the hand-crafted features and rules used in the past. Our full pipeline improves the performance of state-of-the-art models by a relative 50% in F1-score. 2022.acl-long.400 cuesta-lazaro-etal-2022-sea + 10.18653/v1/2022.acl-long.400 Measuring Fairness of Text Classifiers via Prediction Sensitivity @@ -5792,6 +6192,7 @@ in the Case of Unambiguous Gender With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation – accumulated prediction sensitivity, which measures fairness in machine learning models based on the model’s prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness. It also correlates well with humans’ perception of fairness. We conduct experiments on two text classification datasets – Jigsaw Toxicity, and Bias in Bios, and evaluate the correlations between metrics and manual annotations on whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
2022.acl-long.401 krishna-etal-2022-measuring + 10.18653/v1/2022.acl-long.401 <fixed-case>R</fixed-case>otate<fixed-case>QVS</fixed-case>: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion @@ -5806,6 +6207,7 @@ in the Case of Unambiguous Gender chen-etal-2022-rotateqvs ICEWS YAGO + 10.18653/v1/2022.acl-long.402 Feeding What You Need by Understanding What You Learned @@ -5823,6 +6225,7 @@ in the Case of Unambiguous Gender HotpotQA RACE SQuAD + 10.18653/v1/2022.acl-long.403 Probing Simile Knowledge from Pre-trained Language Models @@ -5842,6 +6245,7 @@ in the Case of Unambiguous Gender chen-etal-2022-probing nairoj/Probing-Simile-from-PLM BookCorpus + 10.18653/v1/2022.acl-long.404 An Effective and Efficient Entity Alignment Decoding Algorithm via Third-Order Tensor Isomorphism @@ -5858,6 +6262,7 @@ in the Case of Unambiguous Gender 2022.acl-long.405 2022.acl-long.405.software.zip mao-etal-2022-effective + 10.18653/v1/2022.acl-long.405 Entailment Graph Learning with Textual Entailment and Soft Transitivity @@ -5870,6 +6275,7 @@ in the Case of Unambiguous Gender chen-etal-2022-entailment zacharychenpk/egt2 FIGER + 10.18653/v1/2022.acl-long.406 Logic Traps in Evaluating Attribution Scores @@ -5886,6 +6292,7 @@ in the Case of Unambiguous Gender GLUE RACE SST + 10.18653/v1/2022.acl-long.407 Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network @@ -5900,6 +6307,7 @@ in the Case of Unambiguous Gender 2022.acl-long.408 gong-etal-2022-continual MATH + 10.18653/v1/2022.acl-long.408 Multitasking Framework for Unsupervised Simple Definition Generation @@ -5914,6 +6322,7 @@ in the Case of Unambiguous Gender 2022.acl-long.409.software.zip kong-etal-2022-multitasking blcuicall/simpdefiner + 10.18653/v1/2022.acl-long.409 Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction @@ -5929,6 +6338,7 @@ in the Case of Unambiguous Gender Math23K MathQA SVAMP + 10.18653/v1/2022.acl-long.410 When did you become so smart, oh wise one?! 
Sarcasm Explanation in Multi-modal Multi-party Dialogues @@ -5943,6 +6353,7 @@ in the Case of Unambiguous Gender kumar-etal-2022-become lcs2-iiitd/maf WITS + 10.18653/v1/2022.acl-long.411 Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning @@ -5957,6 +6368,7 @@ in the Case of Unambiguous Gender lee-etal-2022-toward sh0416/clrcmd SNLI + 10.18653/v1/2022.acl-long.412 Pre-training and Fine-tuning Neural Topic Model: A Simple yet Effective Approach to Incorporating External Knowledge @@ -5972,6 +6384,7 @@ in the Case of Unambiguous Gender zhang-etal-2022-pre OpenWebText WebText + 10.18653/v1/2022.acl-long.413 Multi-View Document Representation Learning for Open-Domain Dense Retrieval @@ -5987,6 +6400,7 @@ in the Case of Unambiguous Gender Natural Questions SQuAD TriviaQA + 10.18653/v1/2022.acl-long.414 Graph Pre-training for <fixed-case>AMR</fixed-case> Parsing and Generation @@ -6004,6 +6418,7 @@ in the Case of Unambiguous Gender LDC2020T02 New3 The Little Prince + 10.18653/v1/2022.acl-long.415 Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills @@ -6018,6 +6433,7 @@ in the Case of Unambiguous Gender oriyor/turning_tables DROP IIRC + 10.18653/v1/2022.acl-long.416 <fixed-case>RNG</fixed-case>-<fixed-case>KBQA</fixed-case>: Generation Augmented Iterative Ranking for Knowledge Base Question Answering @@ -6032,6 +6448,7 @@ in the Case of Unambiguous Gender 2022.acl-long.417.software.zip ye-etal-2022-rng salesforce/rng-kbqa + 10.18653/v1/2022.acl-long.417 Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling @@ -6043,6 +6460,7 @@ in the Case of Unambiguous Gender 2022.acl-long.418 2022.acl-long.418.software.zip jwalapuram-etal-2022-rethinking + 10.18653/v1/2022.acl-long.418 Just Rank: Rethinking Evaluation with Word and Sentence Similarities @@ -6059,6 +6477,7 @@ in the Case of Unambiguous Gender SST SciCite SentEval + 10.18653/v1/2022.acl-long.419 <fixed-case>M</fixed-case>arkup<fixed-case>LM</fixed-case>: Pre-training of Text and Markup Language for Visually Rich Document Understanding @@ -6070,6 +6489,7 @@ in the Case of Unambiguous Gender Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images. However, there are still a large number of digital documents where the layout information is not fixed and needs to be interactively and dynamically rendered for visualization, making existing layout-based pre-training approaches not easy to apply. In this paper, we propose MarkupLM for document understanding tasks with markup languages as the backbone, such as HTML/XML-based documents, where text and markup information is jointly pre-trained. Experiment results show that the pre-trained MarkupLM significantly outperforms the existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available at https://aka.ms/markuplm.
2022.acl-long.420 li-etal-2022-markuplm + 10.18653/v1/2022.acl-long.420 <fixed-case>CLIP</fixed-case> Models are Few-Shot Learners: Empirical Studies on <fixed-case>VQA</fixed-case> and Visual Entailment @@ -6085,6 +6505,7 @@ in the Case of Unambiguous Gender song-etal-2022-clip SNLI-VE Visual Question Answering + 10.18653/v1/2022.acl-long.421 <fixed-case>KQA</fixed-case> Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base @@ -6109,6 +6530,7 @@ in the Case of Unambiguous Gender ComplexWebQuestions MetaQA WebQuestions + 10.18653/v1/2022.acl-long.422 Debiased Contrastive Learning of Unsupervised Sentence Representations @@ -6121,6 +6543,7 @@ in the Case of Unambiguous Gender 2022.acl-long.423 zhou-etal-2022-debiased rucaibox/dclr + 10.18653/v1/2022.acl-long.423 <fixed-case>MSP</fixed-case>: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators @@ -6133,6 +6556,7 @@ in the Case of Unambiguous Gender 2022.acl-long.424 tan-etal-2022-msp thunlp-mt/plm4mt + 10.18653/v1/2022.acl-long.424 <fixed-case>S</fixed-case>ales<fixed-case>B</fixed-case>ot: Transitioning from Chit-Chat to Task-Oriented Dialogues @@ -6148,6 +6572,7 @@ in the Case of Unambiguous Gender CommonsenseQA SGD SWAG + 10.18653/v1/2022.acl-long.425 <fixed-case>UCT</fixed-case>opic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining @@ -6165,6 +6590,7 @@ in the Case of Unambiguous Gender KP20k KPTimes WNUT 2017 + 10.18653/v1/2022.acl-long.426 <fixed-case>XLM</fixed-case>-<fixed-case>E</fixed-case>: Cross-lingual Language Model Pre-training via <fixed-case>ELECTRA</fixed-case> @@ -6191,6 +6617,7 @@ in the Case of Unambiguous Gender XNLI XQuAD XTREME + 10.18653/v1/2022.acl-long.427 Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing @@ -6203,6 +6630,7 @@ in the Case of Unambiguous Gender lou-etal-2022-nested louchao98/nner_as_parsing NNE + 10.18653/v1/2022.acl-long.428 Can Explanations Be Useful for Calibrating Black Box Models? @@ -6218,6 +6646,7 @@ in the Case of Unambiguous Gender MRPC QNLI SQuAD + 10.18653/v1/2022.acl-long.429 <fixed-case>OIE</fixed-case>@<fixed-case>OIA</fixed-case>: an Adaptable and Efficient Open Information Extraction Framework @@ -6229,6 +6658,7 @@ in the Case of Unambiguous Gender Different Open Information Extraction (OIE) tasks require different types of information, so the OIE field requires strong adaptability of OIE algorithms to meet different task requirements. This paper discusses the adaptability problem in existing OIE systems and designs a new adaptable and efficient OIE system – OIE@OIA – as a solution. OIE@OIA follows the methodology of Open Information eXpression (OIX): parsing a sentence to an Open Information Annotation (OIA) Graph and then adapting the OIA graph to different OIE tasks with simple rules. As the core of our OIE@OIA system, we implement an end-to-end OIA generator by annotating a dataset (we make it openly available) and designing an efficient learning algorithm for the complex OIA graph. We easily adapt the OIE@OIA system to accomplish three popular OIE tasks. The experimental results show that our OIE@OIA achieves new SOTA performances on these tasks, showing the great adaptability of our OIE@OIA system. Furthermore, compared to other end-to-end OIE baselines that need millions of samples for training, our OIE@OIA needs much fewer training samples (12K), showing a significant advantage in terms of efficiency.
2022.acl-long.430 wang-etal-2022-oie + 10.18653/v1/2022.acl-long.430 <fixed-case>R</fixed-case>e<fixed-case>ACC</fixed-case>: A Retrieval-Augmented Code Completion Framework @@ -6245,6 +6675,7 @@ in the Case of Unambiguous Gender celbree/reacc CodeSearchNet CodeXGLUE + 10.18653/v1/2022.acl-long.431 Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in <fixed-case>D</fixed-case>oc<fixed-case>RED</fixed-case> @@ -6260,6 +6691,7 @@ in the Case of Unambiguous Gender huang-etal-2022-recommend andrewzhe/revisit-docred DocRED + 10.18653/v1/2022.acl-long.432 <fixed-case>U</fixed-case>ni<fixed-case>PELT</fixed-case>: A Unified Framework for Parameter-Efficient Language Model Tuning @@ -6278,6 +6710,7 @@ in the Case of Unambiguous Gender morningmoni/unipelt GLUE QNLI + 10.18653/v1/2022.acl-long.433 An Empirical Study of Memorization in <fixed-case>NLP</fixed-case> @@ -6290,6 +6723,7 @@ in the Case of Unambiguous Gender xszheng2020/memorization CIFAR-10 SST + 10.18653/v1/2022.acl-long.434 <fixed-case>A</fixed-case>mericas<fixed-case>NLI</fixed-case>: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages @@ -6317,6 +6751,7 @@ in the Case of Unambiguous Gender AmericasNLP/americasnlp2021 SNLI SuperGLUE + 10.18653/v1/2022.acl-long.435 Towards Learning (Dis)-Similarity of Source Code from Program Contrasts @@ -6331,6 +6766,7 @@ in the Case of Unambiguous Gender 2022.acl-long.436 ding-etal-2022-towards CodeXGLUE + 10.18653/v1/2022.acl-long.436 Guided Attention Multimodal Multitask Financial Forecasting with Inter-Company Relationships and Global and Local News @@ -6341,6 +6777,7 @@ in the Case of Unambiguous Gender 2022.acl-long.437 2022.acl-long.437.software.zip ang-lim-2022-guided + 10.18653/v1/2022.acl-long.437 On Vision Features in Multimodal Machine Translation @@ -6356,6 +6793,7 @@ in the Case of Unambiguous Gender 2022.acl-long.438 li-etal-2022-vision libeineu/fairseq_mmt + 10.18653/v1/2022.acl-long.438 <fixed-case>CONT</fixed-case>ai<fixed-case>NER</fixed-case>: Few-Shot Named Entity Recognition via Contrastive Learning @@ -6370,6 +6808,7 @@ in the Case of Unambiguous Gender psunlpgroup/container Few-NERD WNUT 2017 + 10.18653/v1/2022.acl-long.439 <fixed-case>C</fixed-case>ree Corpus: A Collection of nêhiyawêwin Resources @@ -6382,6 +6821,7 @@ in the Case of Unambiguous Gender Plains Cree (nêhiyawêwin) is an Indigenous language that is spoken in Canada and the USA. It is the most widely spoken dialect of Cree and a morphologically complex language that is polysynthetic, highly inflective, and agglutinative. It is an extremely low resource language, with no existing corpus that is both available and prepared for supporting the development of language technologies. To support nêhiyawêwin revitalization and preservation, we developed a corpus covering diverse genres, time periods, and texts for a variety of intended audiences. The data has been verified and cleaned; it is ready for use in developing language technologies for nêhiyawêwin. The corpus includes the corresponding English phrases or audio files where available. We demonstrate the utility of the corpus through its community use and its use to build language technologies that can provide the types of support that community members have expressed are desirable. The corpus is available for public use. 
2022.acl-long.440 teodorescu-etal-2022-cree + 10.18653/v1/2022.acl-long.440 Learning to Rank Visual Stories From Human Ranking Data @@ -6399,6 +6839,7 @@ in the Case of Unambiguous Gender academiasinicanlplab/vhed VIST VIST-Edit + 10.18653/v1/2022.acl-long.441 Universal Conditional Masked Language Pre-training for Neural Machine Translation @@ -6412,6 +6853,7 @@ in the Case of Unambiguous Gender 2022.acl-long.442 li-etal-2022-universal huawei-noah/Pretrained-Language-Model + 10.18653/v1/2022.acl-long.442 <fixed-case>CARETS</fixed-case>: A Consistency And Robustness Evaluative Test Suite for <fixed-case>VQA</fixed-case> @@ -6427,6 +6869,7 @@ in the Case of Unambiguous Gender GQA Visual Genome Visual Question Answering + 10.18653/v1/2022.acl-long.443 Phrase-aware Unsupervised Constituency Parsing @@ -6439,6 +6882,7 @@ in the Case of Unambiguous Gender Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task. Despite their high accuracy in identifying low-level structures, prior arts tend to struggle in capturing high-level structures like clauses, since the MLM task usually only requires information from local context. In this work, we revisit LM-based constituency parsing from a phrase-centered perspective. Inspired by the natural reading process of human, we propose to regularize the parser with phrases extracted by an unsupervised phrase tagger to help the LM model quickly manage low-level structures. For a better understanding of high-level structures, we propose a phrase-guided masking strategy for LM to emphasize more on reconstructing non-phrase words. We show that the initial phrase regularization serves as an effective bootstrap, and phrase-guided masking improves the identification of high-level structures. Experiments on the public benchmark with two different backbone models demonstrate the effectiveness and generality of our method. 2022.acl-long.444 gu-etal-2022-phrase + 10.18653/v1/2022.acl-long.444 Achieving Reliable Human Assessment of Open-Domain Dialogue Systems @@ -6454,6 +6898,7 @@ in the Case of Unambiguous Gender tianboji/dialogue-eval ConvAI2 FED + 10.18653/v1/2022.acl-long.445 Updated Headline Generation: Creating Updated Summaries for Evolving News Stories @@ -6464,6 +6909,7 @@ in the Case of Unambiguous Gender We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline. The system must identify the novel information in the article update, and modify the existing headline accordingly. We create data for this task using the NewsEdits corpus by automatically identifying contiguous article versions that are likely to require a substantive headline update. We find that models conditioned on the prior headline and body revisions produce headlines judged by humans to be as factual as gold headlines while making fewer unnecessary edits compared to a standard headline generation model. Our experiments establish benchmarks for this new contextual summarization task. 2022.acl-long.446 panthaplackel-etal-2022-updated + 10.18653/v1/2022.acl-long.446 <fixed-case>S</fixed-case>a<fixed-case>F</fixed-case>e<fixed-case>RD</fixed-case>ialogues: Taking Feedback Gracefully after Conversational Safety Failures @@ -6474,6 +6920,7 @@ in the Case of Unambiguous Gender Current open-domain conversational models can easily be made to talk in inadequate ways. 
Online learning from conversational feedback given by the conversation partner is a promising avenue for a model to improve and adapt, so as to generate fewer of these safety failures. However, current state-of-the-art models tend to react to feedback with defensive or oblivious responses. This makes for an unpleasant experience and may discourage conversation partners from giving feedback in the future. This work proposes SaFeRDialogues, a task and dataset of graceful responses to conversational feedback about safety failures. We collect a dataset of 8k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback. We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil conversation, without sacrificing engagingness or general conversational ability. 2022.acl-long.447 ung-etal-2022-saferdialogues + 10.18653/v1/2022.acl-long.447 Compositional Generalization in Dependency Parsing @@ -6485,6 +6932,7 @@ in the Case of Unambiguous Gender Compositionality—the ability to combine familiar units like words into novel phrases and sentences—has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behaviour of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser’s lower performance on the most challenging splits.
2022.acl-long.448 goodwin-etal-2022-compositional + 10.18653/v1/2022.acl-long.448 <fixed-case>ASPECTNEWS</fixed-case>: Aspect-Oriented Summarization of News Documents @@ -6499,6 +6947,7 @@ in the Case of Unambiguous Gender 2022.acl-long.449.software.zip ahuja-etal-2022-aspectnews oja/aosumm + 10.18653/v1/2022.acl-long.449 <fixed-case>M</fixed-case>em<fixed-case>S</fixed-case>um: Extractive Summarization of Long Documents Using Multi-Step Episodic <fixed-case>M</fixed-case>arkov Decision Processes @@ -6512,6 +6961,7 @@ in the Case of Unambiguous Gender gu-etal-2022-memsum nianlonggu/memsum GovReport + 10.18653/v1/2022.acl-long.450 <fixed-case>CLUES</fixed-case>: A Benchmark for Learning Classifiers using Natural Language Explanations @@ -6524,6 +6974,7 @@ in the Case of Unambiguous Gender 2022.acl-long.451.software.zip menon-etal-2022-clues CLUES (Classifier Learning Using natural language ExplanationS) + 10.18653/v1/2022.acl-long.451 Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing @@ -6537,6 +6988,7 @@ in the Case of Unambiguous Gender shi-etal-2022-substructure Universal Dependencies WikiMatrix + 10.18653/v1/2022.acl-long.452 Multilingual Detection of Personal Employment Status on <fixed-case>T</fixed-case>witter @@ -6550,6 +7002,7 @@ in the Case of Unambiguous Gender 2022.acl-long.453 tonneau-etal-2022-multilingual manueltonneau/twitter-unemployment + 10.18653/v1/2022.acl-long.453 <fixed-case>M</fixed-case>ulti<fixed-case>H</fixed-case>iertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data @@ -6567,6 +7020,7 @@ in the Case of Unambiguous Gender HybridQA MATH MathQA + 10.18653/v1/2022.acl-long.454 Transformers in the loop: Polarity in neural models of language @@ -6581,6 +7035,7 @@ in the Case of Unambiguous Gender altsoph/transformers-in-the-loop Natural sentences that contain *any* Synthetic parallel sentences that contain *any* + 10.18653/v1/2022.acl-long.455 Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation @@ -6595,6 +7050,7 @@ in the Case of Unambiguous Gender 2022.acl-long.456.software.zip he-etal-2022-bridging zwhe99/selftraining4unmt + 10.18653/v1/2022.acl-long.456 <fixed-case>SDR</fixed-case>: Efficient Neural Re-ranking using Succinct Document Representation @@ -6607,6 +7063,7 @@ in the Case of Unambiguous Gender 2022.acl-long.457 cohen-etal-2022-sdr MS MARCO + 10.18653/v1/2022.acl-long.457 The <fixed-case>AI</fixed-case> Doctor Is In: A Survey of Task-Oriented Dialogue Systems for Healthcare Applications @@ -6616,6 +7073,7 @@ in the Case of Unambiguous Gender Task-oriented dialogue systems are increasingly prevalent in healthcare settings, and have been characterized by a diverse range of architectures and objectives. Although these systems have been surveyed in the medical community from a non-technical perspective, a systematic review from a rigorous computational perspective has to date remained noticeably absent. As a result, many important implementation details of healthcare-oriented dialogue systems remain limited or underspecified, slowing the pace of innovation in this area. To fill this gap, we investigated an initial pool of 4070 papers from well-known computer science, natural language processing, and artificial intelligence venues, identifying 70 papers discussing the system-level implementation of task-oriented dialogue systems for healthcare applications. 
We conducted a comprehensive technical review of these papers, and present our key findings including identified gaps and corresponding recommendations. 2022.acl-long.458 valizadeh-parde-2022-ai + 10.18653/v1/2022.acl-long.458 <fixed-case>SHIELD</fixed-case>: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher @@ -6627,6 +7085,7 @@ in the Case of Unambiguous Gender 2022.acl-long.459 le-etal-2022-shield lethaiq/shield-defend-adversarial-texts + 10.18653/v1/2022.acl-long.459 Accurate Online Posterior Alignments for Principled Lexically-Constrained Decoding @@ -6637,6 +7096,7 @@ in the Case of Unambiguous Gender Online alignment in machine translation refers to the task of aligning a target word to a source word when the target sequence has only been partially decoded. Good online alignments facilitate important applications such as lexically constrained translation where user-defined dictionaries are used to inject lexical constraints into the translation model. We propose a novel posterior alignment technique that is truly online in its execution and superior in terms of alignment error rates compared to existing methods. Our proposed inference technique jointly considers alignment and token probabilities in a principled manner and can be seamlessly integrated within existing constrained beam-search decoding algorithms. On five language pairs, including two distant language pairs, we achieve a consistent drop in alignment error rates. When deployed on seven lexically constrained translation tasks, we achieve significant improvements in BLEU specifically around the constrained positions. 2022.acl-long.460 chatterjee-etal-2022-accurate + 10.18653/v1/2022.acl-long.460 Leveraging Task Transferability to Meta-learning for Clinical Section Classification with Limited Data @@ -6648,6 +7108,7 @@ in the Case of Unambiguous Gender Identifying sections is one of the critical components of understanding medical information from unstructured clinical notes and developing assistive technologies for clinical note-writing tasks. Most state-of-the-art text classification systems require thousands of in-domain text data to achieve high performance. However, collecting in-domain and recent clinical note data with section labels is challenging given the high level of privacy and sensitivity. The present paper proposes an algorithmic way to improve the task transferability of meta-learning-based text classification in order to address the issue of low-resource target data. Specifically, we explore how to make the best use of the source dataset and propose a unique task transferability measure named Normalized Negative Conditional Entropy (NNCE). Leveraging the NNCE, we develop strategies for selecting clinical categories and sections from source task data to boost cross-domain meta-learning accuracy. Experimental results show that our task selection strategies improve section classification accuracy significantly compared to meta-learning algorithms.
2022.acl-long.461 chen-etal-2022-leveraging + 10.18653/v1/2022.acl-long.461 Reinforcement Guided Multi-Task Learning Framework for Low-Resource Stereotype Detection @@ -6664,6 +7125,7 @@ in the Case of Unambiguous Gender Hate Speech Hate Speech and Offensive Language StereoSet + 10.18653/v1/2022.acl-long.462 Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings @@ -6674,6 +7136,7 @@ in the Case of Unambiguous Gender 2022.acl-long.463 boldsen-paggio-2022-letters syssel/letters-from-the-past + 10.18653/v1/2022.acl-long.463 A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation @@ -6690,6 +7153,7 @@ in the Case of Unambiguous Gender 2022.acl-long.464.software.zip liu-etal-2022-token microsoft/HaDes + 10.18653/v1/2022.acl-long.464 Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice @@ -6702,6 +7166,7 @@ in the Case of Unambiguous Gender 2022.acl-long.465.software.zip grivas-etal-2022-low andreasgrv/unargmaxable + 10.18653/v1/2022.acl-long.465 <fixed-case>P</fixed-case>rompt for Extraction? <fixed-case>PAIE</fixed-case>: <fixed-case>P</fixed-case>rompting Argument Interaction for Event Argument Extraction @@ -6718,6 +7183,7 @@ in the Case of Unambiguous Gender 2022.acl-long.466.software.zip ma-etal-2022-prompt mayubo2333/paie + 10.18653/v1/2022.acl-long.466 Reducing Position Bias in Simultaneous Machine Translation with Length-Aware Framework @@ -6727,6 +7193,7 @@ in the Case of Unambiguous Gender Simultaneous machine translation (SiMT) starts translating while receiving the streaming source inputs, and hence the source sentence is always incomplete during translating. Different from the full-sentence MT using the conventional seq-to-seq architecture, SiMT often applies prefix-to-prefix architecture, which forces each target word to only align with a partial source prefix to adapt to the incomplete source in streaming inputs. However, the source words in the front positions are always illusorily considered more important since they appear in more prefixes, resulting in position bias, which makes the model pay more attention to the front source positions in testing. In this paper, we first analyze the phenomenon of position bias in SiMT, and develop a Length-Aware Framework to reduce the position bias by bridging the structural gap between SiMT and full-sentence MT. Specifically, given the streaming inputs, we first predict the full-sentence length and then fill the future source position with positional encoding, thereby turning the streaming inputs into a pseudo full-sentence. The proposed framework can be integrated into most existing SiMT methods to further improve performance. Experiments on two representative SiMT methods, including the state-of-the-art adaptive policy, show that our method successfully reduces the position bias and thereby achieves better SiMT performance.
2022.acl-long.467 zhang-feng-2022-reducing + 10.18653/v1/2022.acl-long.467 A Statutory Article Retrieval Dataset in <fixed-case>F</fixed-case>rench @@ -6739,6 +7206,7 @@ in the Case of Unambiguous Gender louis-spanakis-2022-statutory maastrichtlawtech/bsard BSARD + 10.18653/v1/2022.acl-long.468 <fixed-case>P</fixed-case>ara<fixed-case>D</fixed-case>etox: Detoxification with Parallel Data @@ -6755,6 +7223,7 @@ in the Case of Unambiguous Gender 2022.acl-long.469 logacheva-etal-2022-paradetox skoltech-nlp/paradetox + 10.18653/v1/2022.acl-long.469 Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color @@ -6767,6 +7236,7 @@ in the Case of Unambiguous Gender 2022.acl-long.470.software.zip boldsen-etal-2022-interpreting syssel/interpreting-character-embeddings + 10.18653/v1/2022.acl-long.470 Fine-Grained Controllable Text Generation Using Non-Residual Prompting @@ -6783,6 +7253,7 @@ in the Case of Unambiguous Gender freddefrallan/non-residual-prompting C4 CommonGen + 10.18653/v1/2022.acl-long.471 Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features @@ -6794,6 +7265,7 @@ in the Case of Unambiguous Gender lux-vu-2022-language digitalphonetics/ims-toucan CSS10 + 10.18653/v1/2022.acl-long.472 <fixed-case>T</fixed-case>witt<fixed-case>I</fixed-case>rish: A <fixed-case>U</fixed-case>niversal <fixed-case>D</fixed-case>ependencies Treebank of Tweets in <fixed-case>M</fixed-case>odern <fixed-case>I</fixed-case>rish @@ -6805,6 +7277,7 @@ in the Case of Unambiguous Gender Modern Irish is a minority language lacking sufficient computational resources for the task of accurate automatic syntactic parsing of user-generated content such as tweets. Although language technology for the Irish language has been developing in recent years, these tools tend to perform poorly on user-generated content. As with other languages, the linguistic style observed in Irish tweets differs, in terms of orthography, lexicon, and syntax, from that of standard texts more commonly used for the development of language models and parsers. We release the first Universal Dependencies treebank of Irish tweets, facilitating natural language processing of user-generated content in Irish. In this paper, we explore the differences between Irish tweets and standard Irish text, and the challenges associated with dependency parsing of Irish tweets. We describe our bootstrapping method of treebank development and report on preliminary parsing experiments. 
2022.acl-long.473 cassidy-etal-2022-twittirish + 10.18653/v1/2022.acl-long.473 Length Control in Abstractive Summarization by Pretraining Information Selection @@ -6817,6 +7290,7 @@ in the Case of Unambiguous Gender 2022.acl-long.474.software.zip liu-etal-2022-length yizhuliu/lengthcontrol + 10.18653/v1/2022.acl-long.474 <fixed-case>CQG</fixed-case>: A Simple and Effective Controlled Generation Framework for Multi-hop Question Generation @@ -6833,6 +7307,7 @@ in the Case of Unambiguous Gender fei-etal-2022-cqg sion-zcfei/cqg HotpotQA + 10.18653/v1/2022.acl-long.475 Word Order Does Matter and Shuffled Language Models Know It @@ -6852,6 +7327,7 @@ in the Case of Unambiguous Gender ReCoRD SuperGLUE WinoGrande + 10.18653/v1/2022.acl-long.476 An Empirical Study on Explanations in Out-of-Domain Settings @@ -6865,6 +7341,7 @@ in the Case of Unambiguous Gender gchrysostomou/ood_faith IMDb Movie Reviews SST + 10.18653/v1/2022.acl-long.477 <fixed-case>MILIE</fixed-case>: Modular & Iterative Multilingual Open Information Extraction @@ -6881,6 +7358,7 @@ in the Case of Unambiguous Gender 2022.acl-long.478 2022.acl-long.478.software.zip kotnis-etal-2022-milie + 10.18653/v1/2022.acl-long.478 What Makes Reading Comprehension Questions Difficult? @@ -6896,6 +7374,7 @@ in the Case of Unambiguous Gender MCTest RACE ReClor + 10.18653/v1/2022.acl-long.479 From Simultaneous to Streaming Machine Translation by Leveraging Streaming History @@ -6908,6 +7387,7 @@ in the Case of Unambiguous Gender 2022.acl-long.480.software.zip iranzo-sanchez-etal-2022-simultaneous MuST-C + 10.18653/v1/2022.acl-long.480 A Rationale-Centric Framework for Human-in-the-loop Machine Learning @@ -6922,6 +7402,7 @@ in the Case of Unambiguous Gender GeorgeLuImmortal/RDL-Rationales-centric-Double-robustness-Learning IMDb Movie Reviews SST + 10.18653/v1/2022.acl-long.481 Challenges and Strategies in Cross-Cultural <fixed-case>NLP</fixed-case> @@ -6944,6 +7425,7 @@ in the Case of Unambiguous Gender 2022.acl-long.482 hershcovich-etal-2022-challenges MaRVL + 10.18653/v1/2022.acl-long.482 Prototypical Verbalizer for Prompt-based Few-shot Tuning @@ -6958,6 +7440,7 @@ in the Case of Unambiguous Gender cui-etal-2022-prototypical thunlp/OpenPrompt Few-NERD + 10.18653/v1/2022.acl-long.483 Clickbait Spoiling via Question Answering and Passage Retrieval @@ -6974,6 +7457,7 @@ in the Case of Unambiguous Gender MS MARCO SQuAD TriviaQA + 10.18653/v1/2022.acl-long.484 <fixed-case>BERT</fixed-case> Learns to Teach: Knowledge Distillation with Meta Learning @@ -6990,6 +7474,7 @@ in the Case of Unambiguous Gender MRPC QNLI SST + 10.18653/v1/2022.acl-long.485 <fixed-case>STEMM</fixed-case>: Self-learning with Speech-text Manifold Mixup for Speech Translation @@ -7004,6 +7489,7 @@ in the Case of Unambiguous Gender fang-etal-2022-stemm ictnlp/stemm MuST-C + 10.18653/v1/2022.acl-long.486 Integrating Vectorized Lexical Constraints for Neural Machine Translation @@ -7015,6 +7501,7 @@ in the Case of Unambiguous Gender 2022.acl-long.487 wang-etal-2022-integrating shuo-git/vecconstnmt + 10.18653/v1/2022.acl-long.487 <fixed-case>MPII</fixed-case>: Multi-Level Mutual Promotion for Inference and Interpretation @@ -7030,6 +7517,7 @@ in the Case of Unambiguous Gender MultiNLI SNLI e-SNLI + 10.18653/v1/2022.acl-long.488 <fixed-case>S</fixed-case>table<fixed-case>M</fixed-case>o<fixed-case>E</fixed-case>: Stable Routing Strategy for Mixture of Experts @@ -7047,6 +7535,7 @@ in the Case of Unambiguous Gender dai-etal-2022-stablemoe hunter-ddm/stablemoe CC100 + 
10.18653/v1/2022.acl-long.489 Boundary Smoothing for Named Entity Recognition @@ -7061,6 +7550,7 @@ in the Case of Unambiguous Gender CoNLL++ Resume NER Weibo NER + 10.18653/v1/2022.acl-long.490 Incorporating Hierarchy into Text Encoder: a Contrastive Learning Approach for Hierarchical Text Classification @@ -7076,6 +7566,7 @@ in the Case of Unambiguous Gender wzh9969/contrastive-htc RCV1 WOS + 10.18653/v1/2022.acl-long.491 Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models @@ -7091,6 +7582,7 @@ in the Case of Unambiguous Gender 2022.acl-long.492.software.zip chu-etal-2022-signal comp-syn/garble + 10.18653/v1/2022.acl-long.492 Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering @@ -7117,6 +7609,7 @@ in the Case of Unambiguous Gender MS MARCO Natural Questions TriviaQA + 10.18653/v1/2022.acl-long.493 <fixed-case>A</fixed-case>da<fixed-case>L</fixed-case>o<fixed-case>GN</fixed-case>: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension @@ -7133,6 +7626,7 @@ in the Case of Unambiguous Gender nju-websoft/adalogn LogiQA ReClor + 10.18653/v1/2022.acl-long.494 <fixed-case>CAMERO</fixed-case>: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing @@ -7151,6 +7645,7 @@ in the Case of Unambiguous Gender MRPC QNLI SST + 10.18653/v1/2022.acl-long.495 Interpretability for Language Learners Using Example-Based Grammatical Error Correction @@ -7166,6 +7661,7 @@ in the Case of Unambiguous Gender kanekomasahiro/eb-gec FCE JFLEG + 10.18653/v1/2022.acl-long.496 Rethinking Negative Sampling for Handling Missing Entity Annotations @@ -7176,6 +7672,7 @@ in the Case of Unambiguous Gender Negative sampling is highly effective in handling missing annotations for named entity recognition (NER). One of our contributions is an analysis of why it works, introducing two insightful concepts: missampling and uncertainty. Empirical studies show that a low missampling rate and high uncertainty are both essential for achieving promising performance with negative sampling. Based on the sparsity of named entities, we also theoretically derive a lower bound for the probability of zero missampling rate, which depends only on sentence length. The other contribution is an adaptive and weighted sampling distribution that further improves negative sampling, building on the preceding analysis. Experiments on synthetic datasets and well-annotated datasets (e.g., CoNLL-2003) show that our proposed approach benefits negative sampling in terms of F1 score and loss convergence. Moreover, models with improved negative sampling have achieved new state-of-the-art results on real-world datasets (e.g., EC).
2022.acl-long.497 li-etal-2022-rethinking + 10.18653/v1/2022.acl-long.497 Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning @@ -7187,6 +7684,7 @@ in the Case of Unambiguous Gender 2022.acl-long.498 2022.acl-long.498.software.zip zhou-etal-2022-distantly + 10.18653/v1/2022.acl-long.498 <fixed-case>U</fixed-case>ni<fixed-case>X</fixed-case>coder: Unified Cross-Modal Pre-training for Code Representation @@ -7204,6 +7702,7 @@ in the Case of Unambiguous Gender CoSQA CodeSearchNet CodeXGLUE + 10.18653/v1/2022.acl-long.499 One Country, 700+ Languages: <fixed-case>NLP</fixed-case> Challenges for Underrepresented Languages and Dialects in <fixed-case>I</fixed-case>ndonesia @@ -7223,6 +7722,7 @@ in the Case of Unambiguous Gender NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation of the world, we provide an overview of the current state of NLP research for Indonesia’s 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for languages of Indonesia but also other underrepresented languages. 2022.acl-long.500 aji-etal-2022-one + 10.18653/v1/2022.acl-long.500 Is <fixed-case>GPT</fixed-case>-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text @@ -7237,6 +7737,7 @@ in the Case of Unambiguous Gender 2022.acl-long.501.software.zip dou-etal-2022-gpt WebText + 10.18653/v1/2022.acl-long.501 Transkimmer: Transformer Learns to Layer-wise Skim @@ -7252,6 +7753,7 @@ in the Case of Unambiguous Gender GLUE IMDb Movie Reviews QNLI + 10.18653/v1/2022.acl-long.502 <fixed-case>S</fixed-case>kip<fixed-case>BERT</fixed-case>: Efficient Inference with Shallow Layer Skipping @@ -7269,6 +7771,7 @@ in the Case of Unambiguous Gender MRPC SQuAD SST + 10.18653/v1/2022.acl-long.503 Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models @@ -7279,6 +7782,7 @@ in the Case of Unambiguous Gender 2022.acl-long.504 ri-tsuruoka-2022-pretraining Penn Treebank + 10.18653/v1/2022.acl-long.504 m<fixed-case>LUKE</fixed-case>: <fixed-case>T</fixed-case>he Power of Entity Representations in Multilingual Pretrained Language Models @@ -7298,6 +7802,7 @@ in the Case of Unambiguous Gender RELX SQuAD XQuAD + 10.18653/v1/2022.acl-long.505 Evaluating Factuality in Text Simplification @@ -7313,6 +7818,7 @@ in the Case of Unambiguous Gender ashologn/evaluating-factuality-in-text-simplification Newsela WikiLarge + 10.18653/v1/2022.acl-long.506 Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization @@ -7326,6 +7832,7 @@ in the Case of Unambiguous Gender This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization. By building speech synthesis systems for three Indigenous languages spoken in Canada, Kanien’kéha, Gitksan & SENĆOŦEN, we re-evaluate the question of how much data is required to build low-resource speech synthesis systems featuring state-of-the-art neural models. 
For example, preliminary results with English data show that a FastSpeech2 model trained with 1 hour of training data can produce speech with comparable naturalness to a Tacotron2 model trained with 10 hours of data. Finally, we motivate future research in evaluation and classroom integration in the field of speech synthesis for language revitalization. 2022.acl-long.507 pine-etal-2022-requirements + 10.18653/v1/2022.acl-long.507 Sharpness-Aware Minimization Improves Language Model Generalization @@ -7343,6 +7850,7 @@ in the Case of Unambiguous Gender TyDi QA TyDiQA-GoldP WebQuestions + 10.18653/v1/2022.acl-long.508 Adversarial Authorship Attribution for Deobfuscation @@ -7354,6 +7862,7 @@ in the Case of Unambiguous Gender Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially trained authorship attributors that are aware of potential obfuscation. To fill this gap, we investigate the problem of adversarial authorship attribution for deobfuscation. We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate the effectiveness of adversarial training when the attributor makes incorrect assumptions about whether and which obfuscator was used. While there is a clear degradation in attribution accuracy, it is noteworthy that this degradation is still at or above the attribution accuracy of the attributor that is not adversarially trained at all. Our results motivate the need to develop authorship obfuscation approaches that are resistant to deobfuscation.
2022.acl-long.509 zhai-etal-2022-adversarial + 10.18653/v1/2022.acl-long.509 Weakly Supervised Word Segmentation for Computational Language Documentation @@ -7365,6 +7874,7 @@ in the Case of Unambiguous Gender 2022.acl-long.510 okabe-etal-2022-weakly shuokabe/pyseg + 10.18653/v1/2022.acl-long.510 <fixed-case>S</fixed-case>ci<fixed-case>NLI</fixed-case>: A Corpus for Natural Language Inference on Scientific Text @@ -7381,6 +7891,7 @@ in the Case of Unambiguous Gender SNLI SWAG SuperGLUE + 10.18653/v1/2022.acl-long.511 Neural reality of argument structure constructions @@ -7395,6 +7906,7 @@ in the Case of Unambiguous Gender 2022.acl-long.512.software.zip li-etal-2022-neural spoclab-ca/neural-reality-constructions + 10.18653/v1/2022.acl-long.512 On the Robustness of Offensive Language Classifiers @@ -7408,6 +7920,7 @@ in the Case of Unambiguous Gender rusert-etal-2022-robustness jonrusert/robustnessofoffensiveclassifiers OLID + 10.18653/v1/2022.acl-long.513 Few-shot Controllable Style Transfer for Low-Resource Multilingual Settings @@ -7423,6 +7936,7 @@ in the Case of Unambiguous Gender Samanantar XFORMAL mC4 + 10.18653/v1/2022.acl-long.514 <fixed-case>ABC</fixed-case>: Attention with Bounded-memory Control @@ -7443,6 +7957,7 @@ in the Case of Unambiguous Gender WMT 2014 WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-long.515 The Dangers of Underclaiming: Reasons for Caution When Reporting How <fixed-case>NLP</fixed-case> Systems Fail @@ -7453,6 +7968,7 @@ in the Case of Unambiguous Gender bowman-2022-dangers SQuAD SuperGLUE + 10.18653/v1/2022.acl-long.516 <fixed-case>REL</fixed-case>i<fixed-case>C</fixed-case>: Retrieving Evidence for Literary Claims @@ -7467,6 +7983,7 @@ in the Case of Unambiguous Gender martiansideofthemoon/relic-retrieval RELiC BEIR + 10.18653/v1/2022.acl-long.517 Analyzing Generalization of Vision and Language Navigation to Unseen Outdoor Areas @@ -7480,6 +7997,7 @@ in the Case of Unambiguous Gender raphael-sch/map2seq_vln Touchdown Dataset map2seq + 10.18653/v1/2022.acl-long.518 Adapting Coreference Resolution Models through Active Learning @@ -7493,6 +8011,7 @@ in the Case of Unambiguous Gender 2022.acl-long.519 yuan-etal-2022-adapting forest-snow/incremental-coref + 10.18653/v1/2022.acl-long.519 An Imitation Learning Curriculum for Text Editing with Non-Autoregressive Models @@ -7503,6 +8022,7 @@ in the Case of Unambiguous Gender 2022.acl-long.520 agrawal-carpuat-2022-imitation Newsela + 10.18653/v1/2022.acl-long.520 Memorisation versus Generalisation in Pre-trained Language Models @@ -7517,6 +8037,7 @@ in the Case of Unambiguous Gender CoNLL++ CoNLL-2003 WNUT 2017 + 10.18653/v1/2022.acl-long.521 <fixed-case>C</fixed-case>hat<fixed-case>M</fixed-case>atch: Evaluating Chatbots by Autonomous Chat Tournaments @@ -7530,6 +8051,7 @@ in the Case of Unambiguous Gender 2022.acl-long.522.software.zip yang-etal-2022-chatmatch ruolanyang/chatmatch + 10.18653/v1/2022.acl-long.522 Do self-supervised speech models develop human-like perception biases? 
@@ -7541,6 +8063,7 @@ in the Case of Unambiguous Gender millet-dunbar-2022-self AudioSet LibriSpeech + 10.18653/v1/2022.acl-long.523 Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions @@ -7559,6 +8082,7 @@ in the Case of Unambiguous Gender RxR StreetLearn Talk the Walk + 10.18653/v1/2022.acl-long.524 Learning to Generate Programs for Table Fact Verification via Structure-Aware Semantic Parsing @@ -7570,6 +8094,7 @@ in the Case of Unambiguous Gender ou-liu-2022-learning ousuixin/sasp TabFact + 10.18653/v1/2022.acl-long.525 Cluster & Tune: <fixed-case>B</fixed-case>oost Cold Start Performance in Text Classification @@ -7585,6 +8110,7 @@ in the Case of Unambiguous Gender 2022.acl-long.526 shnarch-etal-2022-cluster ibm/intermediate-training-using-clustering + 10.18653/v1/2022.acl-long.526 Overcoming a Theoretical Limitation of Self-Attention @@ -7595,6 +8121,7 @@ in the Case of Unambiguous Gender 2022.acl-long.527 chiang-cholak-2022-overcoming ndnlp/parity + 10.18653/v1/2022.acl-long.527 Prediction Difference Regularization against Perturbation for Neural Machine Translation @@ -7606,6 +8133,7 @@ in the Case of Unambiguous Gender Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years. Despite their simplicity and effectiveness, we argue that these methods are limited by the under-fitting of training data. In this paper, we utilize prediction difference for ground-truth tokens to analyze the fitting of token-level samples and find that under-fitting is almost as common as over-fitting. We introduce prediction difference regularization (PD-R), a simple and effective method that can reduce over-fitting and under-fitting at the same time. For all token-level samples, PD-R minimizes the prediction difference between the original pass and the input-perturbed pass, making the model less sensitive to small input changes, thus more robust to both perturbations and under-fitted training data. Experiments on three widely used WMT translation tasks show that our approach can significantly improve over existing perturbation regularization methods. On the WMT16 En-De task, our model achieves a 1.80 SacreBLEU improvement over the vanilla transformer. 2022.acl-long.528 guo-etal-2022-prediction + 10.18653/v1/2022.acl-long.528 Make the Best of Cross-lingual Transfer: Evidence from <fixed-case>POS</fixed-case> Tagging with over 100 Languages @@ -7617,6 +8145,7 @@ in the Case of Unambiguous Gender 2022.acl-long.529 de-vries-etal-2022-make wietsedv/xpos + 10.18653/v1/2022.acl-long.529 Should a Chatbot be Sarcastic? Understanding User Preferences Towards Sarcasm Generation @@ -7627,6 +8156,7 @@ in the Case of Unambiguous Gender Previous sarcasm generation research has focused on how to generate text that people perceive as sarcastic to create more human-like interactions. In this paper, we argue that we should first turn our attention to the question of when sarcasm should be generated, finding that humans consider sarcastic responses inappropriate for many input utterances. Next, we use a theory-driven framework for generating sarcastic responses, which allows us to control the linguistic devices included during generation. For each device, we investigate how much humans associate it with sarcasm, finding that pragmatic insincerity and emotional markers are devices crucial for making sarcasm recognisable.
2022.acl-long.530 oprea-etal-2022-chatbot + 10.18653/v1/2022.acl-long.530 How Do <fixed-case>S</fixed-case>eq2<fixed-case>S</fixed-case>eq Models Perform on End-to-End Data-to-Text Generation? @@ -7639,6 +8169,7 @@ in the Case of Unambiguous Gender xunjianyin/seq2seqondata2text ToTTo WikiBio + 10.18653/v1/2022.acl-long.531 Probing for Labeled Dependency Trees @@ -7652,6 +8183,7 @@ in the Case of Unambiguous Gender muller-eberstein-etal-2022-probing personads/depprobe Universal Dependencies + 10.18653/v1/2022.acl-long.532 <fixed-case>D</fixed-case>o<fixed-case>C</fixed-case>o<fixed-case>G</fixed-case>en: <fixed-case>D</fixed-case>omain Counterfactual Generation for Low Resource Domain Adaptation @@ -7664,6 +8196,7 @@ in the Case of Unambiguous Gender 2022.acl-long.533 calderon-etal-2022-docogen nitaytech/docogen + 10.18653/v1/2022.acl-long.533 <fixed-case>L</fixed-case>i<fixed-case>LT</fixed-case>: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding @@ -7680,6 +8213,7 @@ in the Case of Unambiguous Gender FUNSD RVL-CDIP XFUND + 10.18653/v1/2022.acl-long.534 Dependency-based Mixture Language Models @@ -7693,6 +8227,7 @@ in the Case of Unambiguous Gender fadedcosine/dependency-guided-neural-text-generation Penn Treebank ROCStories + 10.18653/v1/2022.acl-long.535 Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining? @@ -7706,6 +8241,7 @@ in the Case of Unambiguous Gender 2022.acl-long.536.software.zip dutta-etal-2022-unsupervised jeevesh8/arg_mining + 10.18653/v1/2022.acl-long.536 Entity-based Neural Local Coherence Modeling @@ -7717,6 +8253,7 @@ in the Case of Unambiguous Gender jeon-strube-2022-entity sdeva14/acl22-entity-neural-local-cohe GCDC + 10.18653/v1/2022.acl-long.537 “That Is a Suspicious Reaction!”: Interpreting Logits Variation to Detect <fixed-case>NLP</fixed-case> Adversarial Attacks @@ -7731,6 +8268,7 @@ in the Case of Unambiguous Gender mosca-etal-2022-suspicious AG News IMDb Movie Reviews + 10.18653/v1/2022.acl-long.538 Local Languages, Third Spaces, and other High-Resource Scenarios @@ -7739,6 +8277,7 @@ in the Case of Unambiguous Gender How can language technology address the diverse situations of the world’s languages? In one view, languages exist on a resource continuum and the challenge is to scale existing solutions, bringing under-resourced languages into the high-resource world. In another view, presented here, the world’s language ecology includes standardised languages, local languages, and contact languages. These are often subsumed under the label of “under-resourced languages” even though they have distinct functions and prospects. I explore this position and propose some ecologically-aware language technology agendas. 2022.acl-long.539 bird-2022-local + 10.18653/v1/2022.acl-long.539 That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory @@ -7748,6 +8287,7 @@ in the Case of Unambiguous Gender The evolution of language follows the rule of gradual change. Grammar, vocabulary, and lexical semantic shifts take place over time, resulting in a diachronic linguistic gap. As such, a considerable number of texts are written in languages of different eras, which creates obstacles for natural language processing tasks, such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a specific era.
Therefore, we propose a cross-era learning framework for Chinese word segmentation (CWS), CROSSWISE, which uses the Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that performance on each corpus significantly improves. Further analyses also demonstrate that the SM can effectively integrate the knowledge of the eras into the neural network. 2022.acl-long.540 tang-su-2022-slepen + 10.18653/v1/2022.acl-long.540 Fair and Argumentative Language Modeling for Computational Argumentation @@ -7760,6 +8300,7 @@ in the Case of Unambiguous Gender 2022.acl-long.541.software.zip holtermann-etal-2022-fair umanlp/fairargumentativelm + 10.18653/v1/2022.acl-long.541 Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation @@ -7773,6 +8314,7 @@ in the Case of Unambiguous Gender zhang-etal-2022-learning BSTC MuST-C + 10.18653/v1/2022.acl-long.542 Can Pre-trained Language Models Interpret Similes as Smart as Human? @@ -7786,6 +8328,7 @@ in the Case of Unambiguous Gender 2022.acl-long.543 he-etal-2022-pre abbey4799/plms-interpret-simile + 10.18653/v1/2022.acl-long.543 <fixed-case>CBLUE</fixed-case>: A <fixed-case>C</fixed-case>hinese Biomedical Language Understanding Evaluation Benchmark @@ -7828,6 +8371,7 @@ in the Case of Unambiguous Gender CLUE CMeIE SuperGLUE + 10.18653/v1/2022.acl-long.544 Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization @@ -7839,6 +8383,7 @@ in the Case of Unambiguous Gender 2022.acl-long.545 liu-etal-2022-learning manga-uofa/naus + 10.18653/v1/2022.acl-long.545 Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation @@ -7854,6 +8399,7 @@ in the Case of Unambiguous Gender 2022.acl-long.546 wei-etal-2022-learning pemywei/csanmt + 10.18653/v1/2022.acl-long.546 Lexical Knowledge Internalization for Neural Dialog Generation @@ -7869,6 +8415,7 @@ in the Case of Unambiguous Gender lividwo/ki DailyDialog Wizard of Wikipedia + 10.18653/v1/2022.acl-long.547 Modeling Syntactic-Semantic Dependency Correlations in Semantic Role Labeling Using Mixture Models @@ -7881,6 +8428,7 @@ in the Case of Unambiguous Gender 2022.acl-long.548.software.zip chen-etal-2022-modeling christomartin/syn-sem_dependency_correlation_mixture_model + 10.18653/v1/2022.acl-long.548 Learning the Beauty in Songs: Neural Singing Voice Beautifier @@ -7894,6 +8442,7 @@ in the Case of Unambiguous Gender 2022.acl-long.549 liu-etal-2022-learning-beauty moonintheriver/neuralsvb + 10.18653/v1/2022.acl-long.549 A Model-agnostic Data Manipulation Method for Persona-based Dialogue Generation @@ -7909,6 +8458,7 @@ in the Case of Unambiguous Gender cao-etal-2022-model caoyu-noob/d3 PERSONA-CHAT + 10.18653/v1/2022.acl-long.550 <fixed-case>L</fixed-case>ink<fixed-case>BERT</fixed-case>: Pretraining Language Models with Document Links @@ -7943,6 +8493,7 @@ in the Case of Unambiguous Gender SQuAD SearchQA TriviaQA + 10.18653/v1/2022.acl-long.551 Improving Time Sensitivity for Question Answering over Temporal Knowledge Graphs @@ -7955,6 +8506,7 @@ in the Case of Unambiguous Gender 2022.acl-long.552 shang-etal-2022-improving CronQuestions + 10.18653/v1/2022.acl-long.552 Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition @@ -7967,6 +8519,7 @@ in the Case of Unambiguous Gender 2022.acl-long.553 wang-etal-2022-self LibriSpeech + 10.18653/v1/2022.acl-long.553 Softmax Bottleneck Makes Language Models Unable to Represent Multi-mode Word
Distributions @@ -7979,6 +8532,7 @@ in the Case of Unambiguous Gender chang-mccallum-2022-softmax ProtoQA WebText + 10.18653/v1/2022.acl-long.554 Ditch the Gold Standard: Re-evaluating Conversational Question Answering @@ -7995,6 +8549,7 @@ in the Case of Unambiguous Gender CANARD CoQA QuAC + 10.18653/v1/2022.acl-long.555 Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity @@ -8011,6 +8566,7 @@ in the Case of Unambiguous Gender AG News MPQA Opinion Corpus SST + 10.18653/v1/2022.acl-long.556 Situated Dialogue Learning through Procedural Environment Generation @@ -8021,6 +8577,7 @@ in the Case of Unambiguous Gender We teach goal-driven agents to interactively act and speak in situated environments by training on generated curriculums. Our agents operate in LIGHT (Urbanek et al. 2019)—a large-scale crowd-sourced fantasy text adventure game wherein an agent perceives and interacts with the world through textual natural language. Goals in this environment take the form of character-based quests, consisting of personas and motivations. We augment LIGHT by learning to procedurally generate additional novel textual worlds and quests to create a curriculum of steadily increasing difficulty for training agents to achieve such goals. In particular, we measure curriculum difficulty in terms of the rarity of the quest in the original training distribution—an easier environment is one that is more likely to have been found in the unaugmented dataset. An ablation study shows that this method of learning from the tail of a distribution results in significantly higher generalization abilities as measured by zero-shot performance on never-before-seen quests. 2022.acl-long.557 ammanabrolu-etal-2022-situated + 10.18653/v1/2022.acl-long.557 <fixed-case>U</fixed-case>ni<fixed-case>TE</fixed-case>: Unified Translation Evaluation @@ -8037,6 +8594,7 @@ in the Case of Unambiguous Gender 2022.acl-long.558.software.zip wan-etal-2022-unite nlp2ct/unite + 10.18653/v1/2022.acl-long.558 Program Transfer for Answering Complex Questions over Knowledge Bases @@ -8056,6 +8614,7 @@ in the Case of Unambiguous Gender thu-keg/programtransfer ComplexWebQuestions WebQuestions + 10.18653/v1/2022.acl-long.559 <fixed-case>EAG</fixed-case>: Extract and Generate Multi-way Aligned Corpus for Complete Multi-lingual Neural Machine Translation @@ -8069,6 +8628,7 @@ in the Case of Unambiguous Gender 2022.acl-long.560.software.zip xu-etal-2022-eag OPUS-100 + 10.18653/v1/2022.acl-long.560 Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings @@ -8084,6 +8644,7 @@ in the Case of Unambiguous Gender Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; 2) proposing a post-processing retrofitting method for static embeddings independent of training by employing a priori synonym knowledge and weighted vector distribution. Through extrinsic and intrinsic tasks, our methods are shown to outperform the baselines by a large margin.
2022.acl-long.561 zheng-etal-2022-using + 10.18653/v1/2022.acl-long.561 Multimodal Sarcasm Target Identification in Tweets @@ -8098,6 +8659,7 @@ in the Case of Unambiguous Gender 2022.acl-long.562.software.zip wang-etal-2022-multimodal wjq-learning/msti + 10.18653/v1/2022.acl-long.562 Flexible Generation from Fragmentary Linguistic Input @@ -8109,6 +8671,7 @@ in the Case of Unambiguous Gender qian-levy-2022-flexible pqian11/fragment-completion New York Times Annotated Corpus + 10.18653/v1/2022.acl-long.563 Revisiting Over-Smoothness in Text to Speech @@ -8122,6 +8685,7 @@ in the Case of Unambiguous Gender 2022.acl-long.564 ren-etal-2022-revisiting LJSpeech + 10.18653/v1/2022.acl-long.564 Coherence boosting: When your pretrained language model is not paying enough attention @@ -8144,6 +8708,7 @@ in the Case of Unambiguous Gender PIQA SST WebText + 10.18653/v1/2022.acl-long.565 Uncertainty Estimation of Transformer Predictions for Misclassification Detection @@ -8169,6 +8734,7 @@ in the Case of Unambiguous Gender GLUE MRPC SST + 10.18653/v1/2022.acl-long.566 <fixed-case>VALSE</fixed-case>: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena @@ -8187,6 +8753,7 @@ in the Case of Unambiguous Gender VisDial Visual Question Answering Visual7W + 10.18653/v1/2022.acl-long.567 The Grammar-Learning Trajectories of Neural Language Models @@ -8203,6 +8770,7 @@ in the Case of Unambiguous Gender OpenSubtitles OpenWebText WebText + 10.18653/v1/2022.acl-long.568 Generating Scientific Definitions with Controllable Complexity @@ -8214,6 +8782,7 @@ in the Case of Unambiguous Gender 2022.acl-long.569 august-etal-2022-generating talaugust/definition-complexity + 10.18653/v1/2022.acl-long.569 Label Semantic Aware Pre-training for Few-shot Text Classification @@ -8231,6 +8800,7 @@ in the Case of Unambiguous Gender SGD SNIPS TOPv2 + 10.18653/v1/2022.acl-long.570 <fixed-case>ODE</fixed-case> Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation @@ -8250,6 +8820,7 @@ in the Case of Unambiguous Gender 2022.acl-long.571.software.zip li-etal-2022-ode libeineu/ode-transformer + 10.18653/v1/2022.acl-long.571 A Comparison of Strategies for Source-Free Domain Adaptation @@ -8261,6 +8832,7 @@ in the Case of Unambiguous Gender 2022.acl-long.572 su-etal-2022-comparison xinsu626/sourcefreedomainadaptation + 10.18653/v1/2022.acl-long.572 Ethics Sheets for <fixed-case>AI</fixed-case> Tasks @@ -8269,6 +8841,7 @@ in the Case of Unambiguous Gender Several high-profile events, such as the mass testing of emotion recognition systems on vulnerable sub-populations and using question answering systems to make moral judgments, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. At issue here are not just individual systems and datasets, but also the AI tasks themselves. In this position paper, I make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. I will also present a template for ethics sheets with 50 ethical considerations, using the task of emotion recognition as a running example. 
Ethics sheets are a mechanism to engage with and document ethical considerations before building datasets and systems. Similar to survey articles, a small number of carefully created ethics sheets can serve numerous researchers and developers. 2022.acl-long.573 mohammad-2022-ethics + 10.18653/v1/2022.acl-long.573 Learning Disentangled Representations of Negation and Uncertainty @@ -8281,6 +8854,7 @@ in the Case of Unambiguous Gender 2022.acl-long.574 vasilakes-etal-2022-learning jvasilakes/disentanglement-vae + 10.18653/v1/2022.acl-long.574 <fixed-case>latent-GLAT</fixed-case>: Glancing at Latent Variables for Parallel Text Generation @@ -8298,6 +8872,7 @@ in the Case of Unambiguous Gender bao-etal-2022-textit baoy-nlp/latent-glat DailyDialog + 10.18653/v1/2022.acl-long.575 <fixed-case>PPT</fixed-case>: Pre-trained Prompt Tuning for Few-shot Learning @@ -8317,6 +8892,7 @@ in the Case of Unambiguous Gender OCNLI SST SuperGLUE + 10.18653/v1/2022.acl-long.576 Deduplicating Training Data Makes Language Models Better @@ -8335,6 +8911,7 @@ in the Case of Unambiguous Gender Billion Word Benchmark RealNews Wiki-40B + 10.18653/v1/2022.acl-long.577 Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires @@ -8349,6 +8926,7 @@ in the Case of Unambiguous Gender nguyen-etal-2022-improving thongnt99/acl22-depression-phq9 SMHD + 10.18653/v1/2022.acl-long.578 <fixed-case>I</fixed-case>nternet-Augmented Dialogue Generation @@ -8362,6 +8940,7 @@ in the Case of Unambiguous Gender PERSONA-CHAT Topical-Chat Wizard of Wikipedia + 10.18653/v1/2022.acl-long.579 <fixed-case>SUPERB</fixed-case>-<fixed-case>SG</fixed-case>: Enhanced Speech processing Universal <fixed-case>PER</fixed-case>formance Benchmark for Semantic and Generative Capabilities @@ -8391,6 +8970,7 @@ in the Case of Unambiguous Gender Common Voice DEMAND LibriMix + 10.18653/v1/2022.acl-long.580 Knowledge Neurons in Pretrained Transformers @@ -8406,6 +8986,7 @@ in the Case of Unambiguous Gender 2022.acl-long.581.software.zip dai-etal-2022-knowledge hunter-ddm/knowledge-neurons + 10.18653/v1/2022.acl-long.581 Meta-Learning for Fast Cross-Lingual Adaptation in Dependency Parsing @@ -8422,6 +9003,7 @@ in the Case of Unambiguous Gender 2022.acl-long.582.software.zip langedijk-etal-2022-meta annaproxy/udify-metalearning + 10.18653/v1/2022.acl-long.582 <fixed-case>F</fixed-case>rench <fixed-case>C</fixed-case>row<fixed-case>S</fixed-case>-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than <fixed-case>E</fixed-case>nglish @@ -8434,6 +9016,7 @@ in the Case of Unambiguous Gender 2022.acl-long.583 neveol-etal-2022-french CrowS-Pairs + 10.18653/v1/2022.acl-long.583 Few-Shot Learning with <fixed-case>S</fixed-case>iamese Networks and Label Tuning @@ -8451,6 +9034,7 @@ in the Case of Unambiguous Gender IMDb Movie Reviews ISEAR SNLI + 10.18653/v1/2022.acl-long.584 Inferring Rewards from Language in Context @@ -8463,6 +9047,7 @@ in the Case of Unambiguous Gender 2022.acl-long.585 lin-etal-2022-inferring jlin816/rewards-from-language + 10.18653/v1/2022.acl-long.585 Generating Biographies on <fixed-case>W</fixed-case>ikipedia: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies @@ -8473,6 +9058,7 @@ in the Case of Unambiguous Gender 2022.acl-long.586 fan-gardent-2022-generating WikiSum + 10.18653/v1/2022.acl-long.586 Your Answer is Incorrect... Would you like to know why? 
Introducing a Bilingual Short Answer Feedback Dataset @@ -8488,6 +9074,7 @@ in the Case of Unambiguous Gender filighera-etal-2022-answer sebochs/saf SNLI + 10.18653/v1/2022.acl-long.587 Towards Better Characterization of Paraphrases @@ -8502,6 +9089,7 @@ in the Case of Unambiguous Gender GLUE MRPC PAWS + 10.18653/v1/2022.acl-long.588 <fixed-case>S</fixed-case>umm<fixed-case>S</fixed-case>creen: A Dataset for Abstractive Screenplay Summarization @@ -8516,6 +9104,7 @@ in the Case of Unambiguous Gender mingdachen/SummScreen Multi-News TVRecap + 10.18653/v1/2022.acl-long.589 Sparsifying Transformer Models with Trainable Representation Pooling @@ -8530,6 +9119,7 @@ in the Case of Unambiguous Gender applicaai/pyramidions Pubmed arXiv Summarization Dataset + 10.18653/v1/2022.acl-long.590 Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models @@ -8541,6 +9131,7 @@ in the Case of Unambiguous Gender 2022.acl-long.591 stahlberg-etal-2022-uncertainty JFLEG + 10.18653/v1/2022.acl-long.591 <fixed-case>F</fixed-case>lip<fixed-case>DA</fixed-case>: Effective and Robust Data Augmentation for Few-Shot Learning @@ -8562,6 +9153,7 @@ in the Case of Unambiguous Gender SuperGLUE WSC WiC + 10.18653/v1/2022.acl-long.592 Text-Free Prosody-Aware Generative Spoken Language Modeling @@ -8582,6 +9174,7 @@ in the Case of Unambiguous Gender kharitonov-etal-2022-text pytorch/fairseq LibriSpeech + 10.18653/v1/2022.acl-long.593 Lite Unified Modeling for Discriminative Reading Comprehension @@ -8598,6 +9191,7 @@ in the Case of Unambiguous Gender DREAM RACE SQuAD + 10.18653/v1/2022.acl-long.594 Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining @@ -8608,6 +9202,7 @@ in the Case of Unambiguous Gender 2022.acl-long.595 tien-steinert-threlkeld-2022-bilingual cctien/bimultialign + 10.18653/v1/2022.acl-long.595 End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding @@ -8627,6 +9222,7 @@ in the Case of Unambiguous Gender Natural language spatial video grounding aims to detect the relevant objects in video frames with descriptive sentences as the query. In spite of the great advances, most existing methods rely on dense video frame annotations, which require a tremendous amount of human effort. To achieve effective grounding under a limited annotation budget, we investigate one-shot video grounding and learn to ground natural language in all video frames with solely one frame labeled, in an end-to-end manner. One major challenge of end-to-end one-shot video grounding is the existence of video frames that are irrelevant to either the language query or the labeled frame. Another challenge relates to the limited supervision, which might result in ineffective representation learning. To address these challenges, we designed an end-to-end model via Information Tree for One-Shot video grounding (IT-OS). Its key module, the information tree, can eliminate the interference of irrelevant frames based on branch search and branch cropping techniques. In addition, several self-supervised tasks are proposed based on the information tree to improve the representation learning under insufficient labeling. Experiments on the benchmark dataset demonstrate the effectiveness of our model.
2022.acl-long.596 li-etal-2022-end + 10.18653/v1/2022.acl-long.596 <fixed-case>RNS</fixed-case>um: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization @@ -8639,6 +9235,7 @@ in the Case of Unambiguous Gender A release note is a technical document that describes the latest changes to a software product and is crucial in open source software development. However, it still remains challenging to generate release notes automatically. In this paper, we present a new dataset called RNSum, which contains approximately 82,000 English release notes and the associated commit messages derived from the online repositories in GitHub. Then, we propose classwise extractive-then-abstractive/abstractive summarization approaches to this task, which can employ a modern transformer-based seq2seq network like BART and can be applied to various repositories without specific constraints. The experimental results on the RNSum dataset show that the proposed methods can generate less noisy release notes at higher coverage than the baselines. We also observe that there is a significant gap in the coverage of essential information when compared to human references. Our dataset and the code are publicly available. 2022.acl-long.597 kamezawa-etal-2022-rnsum + 10.18653/v1/2022.acl-long.597 Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge @@ -8654,6 +9251,7 @@ in the Case of Unambiguous Gender C3 ConceptNet DialogRE + 10.18653/v1/2022.acl-long.598 Modeling Persuasive Discourse to Adaptively Support Students’ Argumentative Writing @@ -8665,6 +9263,7 @@ in the Case of Unambiguous Gender 2022.acl-long.599.software.zip wambsganss-niklaus-2022-modeling thiemowa/-argumentative_business_model_pitches + 10.18653/v1/2022.acl-long.599 Active Evaluation: Efficient <fixed-case>NLG</fixed-case> Evaluation with Few Pairwise Comparisons @@ -8680,6 +9279,7 @@ in the Case of Unambiguous Gender ParaBank WMT 2015 WMT 2016 + 10.18653/v1/2022.acl-long.600 The Moral Debater: A Study on the Computational Generation of Morally Framed Arguments @@ -8693,6 +9293,7 @@ in the Case of Unambiguous Gender 2022.acl-long.601.software.zip alshomary-etal-2022-moral webis-de/acl-22 + 10.18653/v1/2022.acl-long.601 Pyramid-<fixed-case>BERT</fixed-case>: Reducing Complexity via Successive Core-set based Token Selection @@ -8708,6 +9309,7 @@ in the Case of Unambiguous Gender GLUE LRA QNLI + 10.18653/v1/2022.acl-long.602 Probing for the Usage of Grammatical Number @@ -8720,6 +9322,7 @@ in the Case of Unambiguous Gender A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious—i.e., the model might not rely on it when making predictions. In this paper, we try to find an encoding that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model’s representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. 
We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb. 2022.acl-long.603 lasri-etal-2022-probing + 10.18653/v1/2022.acl-long.603 @@ -8755,6 +9358,7 @@ in the Case of Unambiguous Gender QNLI SQuAD SST + 10.18653/v1/2022.acl-short.1 Are Shortest Rationales the Best Explanations for Human Understanding? @@ -8767,6 +9371,7 @@ in the Case of Unambiguous Gender 2022.acl-short.2 shen-etal-2022-shortest huashen218/limitedink + 10.18653/v1/2022.acl-short.2 Analyzing Wrap-Up Effects through an Information-Theoretic Lens @@ -8779,6 +9384,7 @@ in the Case of Unambiguous Gender Numerous analyses of reading time (RT) data have been undertaken in the effort to learn more about the internal processes that occur during reading comprehension. However, data measured on words at the end of a sentence–or even clause–is often omitted due to the confounding factors introduced by so-called “wrap-up effects,” which manifests as a skewed distribution of RTs for these words. Consequently, the understanding of the cognitive processes that might be involved in these effects is limited. In this work, we attempt to learn more about these processes by looking for the existence–or absence–of a link between wrap-up effects and information theoretic quantities, such as word and context information content. We find that the information distribution of prior context is often predictive of sentence- and clause-final RTs (while not of sentence-medial RTs), which lends support to several prior hypotheses about the processes involved in wrap-up effects. 2022.acl-short.3 meister-etal-2022-analyzing + 10.18653/v1/2022.acl-short.3 Have my arguments been replied to? Argument Pair Extraction as Machine Reading Comprehension @@ -8791,6 +9397,7 @@ in the Case of Unambiguous Gender 2022.acl-short.4 2022.acl-short.4.software.zip bao-etal-2022-arguments + 10.18653/v1/2022.acl-short.4 On the probability–quality paradox in language generation @@ -8802,6 +9409,7 @@ in the Case of Unambiguous Gender When generating natural language from neural probabilistic models, high probability does not always coincide with high quality: It has often been observed that mode-seeking decoding methods, i.e., those that produce high-probability text under the model, lead to unnatural language. On the other hand, the lower-probability text generated by stochastic methods is perceived as more human-like. In this note, we offer an explanation for this phenomenon by analyzing language generation through an information-theoretic lens. Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings. Further, we posit that language with substantially more (or less) information is undesirable. We provide preliminary empirical evidence in favor of this hypothesis; quality ratings of both human and machine-generated text—covering multiple tasks and common decoding strategies—suggest high-quality text has an information content significantly closer to the entropy than we would expect by chance. 
2022.acl-short.5 meister-etal-2022-high + 10.18653/v1/2022.acl-short.5 Disentangled Knowledge Transfer for <fixed-case>OOD</fixed-case> Intent Discovery with Unified Contrastive Learning @@ -8818,6 +9426,7 @@ in the Case of Unambiguous Gender 2022.acl-short.6 mou-etal-2022-disentangled myt517/dkt + 10.18653/v1/2022.acl-short.6 Voxel-informed Language Grounding @@ -8831,6 +9440,7 @@ in the Case of Unambiguous Gender corona-etal-2022-voxel rcorona/voxel_informed_language_grounding SNARE + 10.18653/v1/2022.acl-short.7 <fixed-case>P</fixed-case>-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks @@ -8848,6 +9458,7 @@ in the Case of Unambiguous Gender GLUE SQuAD SuperGLUE + 10.18653/v1/2022.acl-short.8 On Efficiently Acquiring Annotations for Multilingual Models @@ -8858,6 +9469,7 @@ in the Case of Unambiguous Gender When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by zero-shot transfer to the remaining languages. In this work, we show that the strategy of joint learning across multiple languages using a single model performs substantially better than the aforementioned alternatives. We also demonstrate that active learning provides additional, complementary benefits. We show that this simple approach enables the model to be data efficient by allowing it to arbitrate its annotation budget to query languages it is less certain on. We illustrate the effectiveness of our proposed method on a diverse set of tasks: a classification task with 4 languages, a sequence tagging task with 4 languages and a dependency parsing task with 5 languages. Our proposed method, whilst simple, substantially outperforms the other viable alternatives for building a model in a multilingual setting under constrained budgets. 2022.acl-short.9 moniz-etal-2022-efficiently + 10.18653/v1/2022.acl-short.9 Automatic Detection of Entity-Manipulated Text using Factual Knowledge @@ -8869,6 +9481,7 @@ in the Case of Unambiguous Gender 2022.acl-short.10 jawahar-etal-2022-automatic RealNews + 10.18653/v1/2022.acl-short.10 Does <fixed-case>BERT</fixed-case> Know that the <fixed-case>IS</fixed-case>-A Relation Is Transitive? @@ -8880,6 +9493,7 @@ in the Case of Unambiguous Gender 2022.acl-short.11.software.zip lin-ng-2022-bert nusnlp/probe-bert-transitivity + 10.18653/v1/2022.acl-short.11 Buy Tesla, Sell Ford: Assessing Implicit Stock Market Preference in Pre-trained Language Models @@ -8889,6 +9503,7 @@ in the Case of Unambiguous Gender Pretrained language models such as BERT have achieved remarkable success in several NLP tasks. With the wide adoption of BERT in real-world applications, researchers have begun to investigate the implicit biases encoded in BERT. In this paper, we assess the implicit stock market preferences in BERT and its finance domain-specific model FinBERT. We find some interesting patterns. For example, the language models are overall more positive towards the stock market, but there are significant differences in preferences between a pair of industry sectors, or even within a sector. Given the prevalence of NLP models in financial decision-making systems, this work raises the awareness of their potential implicit preferences in the stock markets. Awareness of such problems can help practitioners improve robustness and accountability of their financial NLP pipelines.
2022.acl-short.12 chuang-yang-2022-buy + 10.18653/v1/2022.acl-short.12 Pixie: Preference in Implicit and Explicit Comparisons @@ -8901,6 +9516,7 @@ in the Case of Unambiguous Gender 2022.acl-short.13 haque-etal-2022-pixie ahaque2/pixie + 10.18653/v1/2022.acl-short.13 Counterfactual Explanations for Natural Language Interfaces @@ -8914,6 +9530,7 @@ in the Case of Unambiguous Gender 2022.acl-short.14.software.zip tolkachev-etal-2022-counterfactual georgeto20/counterfactual_explanations + 10.18653/v1/2022.acl-short.14 Predicting Difficulty and Discrimination of Natural Language Questions @@ -8925,6 +9542,7 @@ in the Case of Unambiguous Gender 2022.acl-short.15.software.zip byrd-srivastava-2022-predicting HotpotQA + 10.18653/v1/2022.acl-short.15 How does the pre-training objective affect what large language models learn about linguistic properties? @@ -8935,6 +9553,7 @@ in the Case of Unambiguous Gender 2022.acl-short.16 alajrami-aletras-2022-pre GLUE + 10.18653/v1/2022.acl-short.16 The Power of Prompt Tuning for Low-Resource Semantic Parsing @@ -8945,6 +9564,7 @@ in the Case of Unambiguous Gender Prompt tuning has recently emerged as an effective method for adapting pre-trained language models to a number of language understanding and generation tasks. In this paper, we investigate prompt tuning for semantic parsing—the task of mapping natural language utterances onto formal meaning representations. On the low-resource splits of Overnight and TOPv2, we find that a prompt-tuned T5-xl significantly outperforms its fine-tuned counterpart, as well as strong GPT-3 and BART baselines. We also conduct ablation studies across different model scales and target representations, finding that, with increasing model scale, prompt-tuned T5 models improve at generating target representations that are far from the pre-training distribution. 2022.acl-short.17 schucher-etal-2022-power + 10.18653/v1/2022.acl-short.17 Data Contamination: From Memorization to Exploitation @@ -8956,6 +9576,7 @@ in the Case of Unambiguous Gender magar-schwartz-2022-data schwartz-lab-nlp/data_contamination SST + 10.18653/v1/2022.acl-short.18 Detecting Annotation Errors in Morphological Data with the Transformer @@ -8965,6 +9586,7 @@ in the Case of Unambiguous Gender Annotation errors that stem from various sources are usually unavoidable when performing large-scale annotation of linguistic data. In this paper, we evaluate the feasibility of using the Transformer model to detect various types of annotator errors in morphological data sets that contain inflected word forms. We evaluate our error detection model on four languages by introducing three different types of artificial errors in the data: (1) typographic errors, where single characters in the data are inserted, replaced, or deleted; (2) linguistic confusion errors where two inflected forms are systematically swapped; and (3) self-adversarial errors where the Transformer model itself is used to generate plausible-looking, but erroneous forms by retrieving high-scoring predictions from the search beam. Results show that the Transformer model can detect errors with perfect or near-perfect recall in all three scenarios, even when significant amounts of the annotated data (5%-30%) are corrupted, in all languages tested. Precision varies across the languages and types of errors, but is high enough that the model can be very effectively used to flag suspicious entries in large data sets for further scrutiny by human annotators.
2022.acl-short.19 liu-hulden-2022-detecting + 10.18653/v1/2022.acl-short.19 Estimating the Entropy of Linguistic Distributions @@ -8976,6 +9598,7 @@ in the Case of Unambiguous Gender 2022.acl-short.20 2022.acl-short.20.software.zip arora-etal-2022-estimating + 10.18653/v1/2022.acl-short.20 Morphological Reinflection with Multiple Arguments: An Extended Annotation schema and a <fixed-case>G</fixed-case>eorgian Case Study @@ -8986,6 +9609,7 @@ in the Case of Unambiguous Gender In recent years, a flurry of morphological datasets has emerged, most notably UniMorph, a multi-lingual repository of inflection tables. However, the flat structure of the current morphological annotation makes the treatment of some languages quirky, if not impossible, specifically in cases of polypersonal agreement. In this paper we propose a general solution for such cases and expand the UniMorph annotation schema to naturally address this phenomenon, in which verbs agree with multiple arguments using true affixes. We apply this extended schema to one such language, Georgian, and provide a human-verified, accurate and balanced morphological dataset for Georgian verbs. The dataset has 4 times more tables and 6 times more verb forms compared to the existing UniMorph dataset, covering all possible variants of argument marking, demonstrating the adequacy of our proposed scheme. Experiments on a reinflection task show that generalization is easy when the data is split at the form level, but extremely hard when splitting along lemma lines. Expanding the other languages in UniMorph according to this schema is expected to improve the coverage, consistency, and interpretability of this benchmark. 2022.acl-short.21 guriel-etal-2022-morphological + 10.18653/v1/2022.acl-short.21 <fixed-case>DQ</fixed-case>-<fixed-case>BART</fixed-case>: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization @@ -9004,6 +9628,7 @@ in the Case of Unambiguous Gender li-etal-2022-dq CNN/Daily Mail ELI5 + 10.18653/v1/2022.acl-short.22 Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension @@ -9021,6 +9646,7 @@ in the Case of Unambiguous Gender CRD3 DREAM MovieNet + 10.18653/v1/2022.acl-short.23 Kronecker Decomposition for <fixed-case>GPT</fixed-case> Compression @@ -9040,6 +9666,7 @@ in the Case of Unambiguous Gender WebText WikiText-103 WikiText-2 + 10.18653/v1/2022.acl-short.24 Simple and Effective Knowledge-Driven Query Expansion for <fixed-case>QA</fixed-case>-Based Product Attribute Extraction @@ -9051,6 +9678,7 @@ in the Case of Unambiguous Gender A key challenge in attribute value extraction (AVE) from e-commerce sites is how to handle a large number of attributes for diverse products. Although this challenge is partially addressed by a question answering (QA) approach which finds a value in product data for a given query (attribute), it does not work effectively for rare and ambiguous queries. We thus propose simple knowledge-driven query expansion based on possible answers (values) of a query (attribute) for QA-based AVE. We retrieve values of a query (attribute) from the training data to expand the query. We train a model with two tricks, knowledge dropout and knowledge token mixing, which mimic the imperfection of the value knowledge in testing. Experimental results on our cleaned version of the AliExpress dataset show that our method improves the performance of AVE (+6.08 macro F1), especially for rare and ambiguous attributes (+7.82 and +6.86 macro F1, respectively).
2022.acl-short.25 shinzato-etal-2022-simple + 10.18653/v1/2022.acl-short.25 Event-Event Relation Extraction using Probabilistic Box Embedding @@ -9064,6 +9692,7 @@ in the Case of Unambiguous Gender To understand a story with multiple events, it is important to capture the proper relations across these events. However, existing event relation extraction (ERE) frameworks regard it as a multi-class classification task and do not guarantee any coherence between different relation types, such as anti-symmetry. If a phone line “died” after a “storm”, then it is obvious that the “storm” happened before the “died”. Current frameworks for event relation extraction do not guarantee this coherence and thus enforce it via a constraint loss function (Wang et al., 2020). In this work, we propose to modify the underlying ERE model to guarantee coherence by representing each event as a box representation (BERE) without applying explicit constraints. From our experiments, BERE also shows stronger conjunctive constraint satisfaction while performing on par or better in F1 compared to previous models with constraint injection. 2022.acl-short.26 hwang-etal-2022-event + 10.18653/v1/2022.acl-short.26 Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation @@ -9076,6 +9705,7 @@ in the Case of Unambiguous Gender 2022.acl-short.27.software.tgz lam-etal-2022-sample Europarl-ST + 10.18653/v1/2022.acl-short.27 Predicting Sentence Deletions for Text Simplification Using a Functional Discourse Structure @@ -9087,6 +9717,7 @@ in the Case of Unambiguous Gender 2022.acl-short.28 zhang-etal-2022-predicting Newsela + 10.18653/v1/2022.acl-short.28 Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer @@ -9100,6 +9731,7 @@ in the Case of Unambiguous Gender laihuiyuan/multilingual-tst GYAFC XFORMAL + 10.18653/v1/2022.acl-short.29 When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning @@ -9110,6 +9742,7 @@ in the Case of Unambiguous Gender Transfer learning (TL) in natural language processing (NLP) has seen a surge of interest in recent years, as pre-trained models have shown an impressive ability to transfer to novel tasks. Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning: training on an intermediate task before training on the target task (STILTs), using multi-task learning (MTL) to train jointly on a supplementary task and the target task (pairwise MTL), or simply using MTL to train jointly on all available datasets (MTL-ALL). In this work, we compare all three TL methods in a comprehensive analysis on the GLUE dataset suite. We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task and vice versa. We show that this holds true in more than 92% of applicable cases on the GLUE dataset and validate this hypothesis with experiments varying dataset size. The simplicity and effectiveness of this heuristic are surprising and warrant additional exploration by the TL community. Furthermore, we find that MTL-ALL is worse than the pairwise methods in almost every case. We hope this study will aid others as they choose between TL methods for NLP tasks. 
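The selection heuristic in the abstract above is simple enough to state as code. A worked example, with dataset sizes chosen for illustration (GLUE's RTE has roughly 2.5k training instances, MNLI roughly 393k):

```python
def choose_transfer_method(n_target: int, n_supporting: int) -> str:
    """Decision rule from the abstract: pairwise MTL tends to win when the
    target task is smaller than the supporting task, STILTs otherwise."""
    return "pairwise MTL" if n_target < n_supporting else "STILTs"

# A small target task (RTE) supported by a large one (MNLI): pick pairwise MTL.
print(choose_transfer_method(n_target=2_500, n_supporting=393_000))
```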
2022.acl-short.30 weller-etal-2022-use + 10.18653/v1/2022.acl-short.30 Leveraging Explicit Lexico-logical Alignments in Text-to-<fixed-case>SQL</fixed-case> Parsing @@ -9125,6 +9758,7 @@ in the Case of Unambiguous Gender 2022.acl-short.31 2022.acl-short.31.software.zip sun-etal-2022-leveraging + 10.18653/v1/2022.acl-short.31 Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning @@ -9144,6 +9778,7 @@ in the Case of Unambiguous Gender li-etal-2022-complex lee-zix/cen ICEWS + 10.18653/v1/2022.acl-short.32 Mismatch between Multi-turn Dialogue and its Evaluation Metric in Dialogue State Tracking @@ -9157,6 +9792,7 @@ in the Case of Unambiguous Gender 2022.acl-short.33 kim-etal-2022-mismatch MultiWOZ + 10.18653/v1/2022.acl-short.33 <fixed-case>LM</fixed-case>-<fixed-case>BFF</fixed-case>-<fixed-case>MS</fixed-case>: Improving Few-Shot Fine-tuning of Language Models based on Multiple Soft Demonstration Memory @@ -9175,6 +9811,7 @@ in the Case of Unambiguous Gender MRPC SNLI SST + 10.18653/v1/2022.acl-short.34 Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances @@ -9188,6 +9825,7 @@ in the Case of Unambiguous Gender dey-etal-2022-towards suvodipdey/fga MultiWOZ + 10.18653/v1/2022.acl-short.35 Exploiting Language Model Prompts Using Similarity Measures: A Case Study on the Word-in-Context Task @@ -9202,6 +9840,7 @@ in the Case of Unambiguous Gender SST SuperGLUE WiC + 10.18653/v1/2022.acl-short.36 Hierarchical Curriculum Learning for <fixed-case>AMR</fixed-case> Parsing @@ -9219,6 +9858,7 @@ in the Case of Unambiguous Gender wang-etal-2022-hierarchical wangpeiyi9979/hcl-text2amr Bio + 10.18653/v1/2022.acl-short.37 <fixed-case>PARE</fixed-case>: A Simple and Strong Baseline for Monolingual and Multilingual Distantly Supervised Relation Extraction @@ -9233,6 +9873,7 @@ in the Case of Unambiguous Gender rathore-etal-2022-pare dair-iitd/dsre DiS-ReX + 10.18653/v1/2022.acl-short.38 To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo @@ -9249,6 +9890,7 @@ in the Case of Unambiguous Gender COCO Visual Genome Who’s Waldo + 10.18653/v1/2022.acl-short.39 Translate-Train Embracing Translationese Artifacts @@ -9261,6 +9903,7 @@ in the Case of Unambiguous Gender 2022.acl-short.40 yu-etal-2022-translate TyDi QA + 10.18653/v1/2022.acl-short.40 <fixed-case>C</fixed-case>-<fixed-case>MORE</fixed-case>: Pretraining to Answer Open-Domain Questions by Consulting Millions of References @@ -9277,6 +9920,7 @@ in the Case of Unambiguous Gender xiangyue9607/c-more Natural Questions TriviaQA + 10.18653/v1/2022.acl-short.41 k-<fixed-case>R</fixed-case>ater <fixed-case>R</fixed-case>eliability: <fixed-case>T</fixed-case>he Correct Unit of Reliability for Aggregated Human Annotations @@ -9286,6 +9930,7 @@ in the Case of Unambiguous Gender Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many Natural Language Processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, the data reliability is under-reported, and a proposed k-rater reliability (kRR) should be used as the correct data reliability for aggregated datasets. It is a multi-rater generalization of inter-rater reliability (IRR). 
We conducted two replications of the WordSim-353 benchmark, and present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353. These methods produce very similar results. We hope this discussion will nudge researchers to report kRR in addition to IRR. 2022.acl-short.42 wong-paritosh-2022-k + 10.18653/v1/2022.acl-short.42 An Embarrassingly Simple Method to Mitigate Undesirable Properties of Pretrained Language Model Tokenizers @@ -9298,6 +9943,7 @@ in the Case of Unambiguous Gender 2022.acl-short.43.software.zip hofmann-etal-2022-embarrassingly valentinhofmann/flota + 10.18653/v1/2022.acl-short.43 <fixed-case>SCD</fixed-case>: Self-Contrastive Decorrelation of Sentence Embeddings @@ -9312,6 +9958,7 @@ in the Case of Unambiguous Gender MRPC SST SentEval + 10.18653/v1/2022.acl-short.44 Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words @@ -9325,6 +9972,7 @@ in the Case of Unambiguous Gender zhou-etal-2022-problems katezhou/cosine_and_frequency WiC + 10.18653/v1/2022.acl-short.45 Revisiting the Compositional Generalization Abilities of Neural Sequence Models @@ -9339,6 +9987,7 @@ in the Case of Unambiguous Gender patel-etal-2022-revisiting arkilpatel/compositional-generalization-seq2seq SCAN + 10.18653/v1/2022.acl-short.46 A Copy-Augmented Generative Model for Open-Domain Question Answering @@ -9353,6 +10002,7 @@ in the Case of Unambiguous Gender liu-etal-2022-copy Natural Questions TriviaQA + 10.18653/v1/2022.acl-short.47 Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation @@ -9368,6 +10018,7 @@ in the Case of Unambiguous Gender starsuzi/dar Natural Questions TriviaQA + 10.18653/v1/2022.acl-short.48 <fixed-case>WLASL</fixed-case>-<fixed-case>LEX</fixed-case>: a Dataset for Recognising Phonological Properties in <fixed-case>A</fixed-case>merican <fixed-case>S</fixed-case>ign <fixed-case>L</fixed-case>anguage @@ -9382,6 +10033,7 @@ in the Case of Unambiguous Gender 2022.acl-short.49.software.zip tavella-etal-2022-wlasl WLASL + 10.18653/v1/2022.acl-short.49 Investigating person-specific errors in chat-oriented dialogue systems @@ -9393,6 +10045,7 @@ in the Case of Unambiguous Gender Creating chatbots to behave like real people is important in terms of believability. Errors in general chatbots and chatbots that follow a rough persona have been studied, but those in chatbots that behave like real people have not been thoroughly investigated. We collected a large number of user interactions with a generation-based chatbot trained from large-scale dialogue data of a specific character, i.e., the target person, and analyzed errors related to that person. We found that person-specific errors can be divided into two types: errors in attributes and those in relations, each of which can be divided into two levels: self and other. The correspondence with an existing taxonomy of errors was also investigated, and person-specific errors that should be addressed in the future were clarified. 
2022.acl-short.50 mitsuda-etal-2022-investigating + 10.18653/v1/2022.acl-short.50 Direct parsing to sentiment graphs @@ -9409,6 +10062,7 @@ in the Case of Unambiguous Gender samuel-etal-2022-direct jerbarnes/direct_parsing_to_sent_graph MPQA Opinion Corpus + 10.18653/v1/2022.acl-short.51 <fixed-case>XDBERT</fixed-case>: <fixed-case>D</fixed-case>istilling Visual Information to <fixed-case>BERT</fixed-case> from Cross-Modal Systems to Improve Language Understanding @@ -9422,6 +10076,7 @@ in the Case of Unambiguous Gender hsu-etal-2022-xdbert GLUE SWAG + 10.18653/v1/2022.acl-short.52 As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning @@ -9433,6 +10088,7 @@ in the Case of Unambiguous Gender 2022.acl-short.53.software.zip vamvas-sennrich-2022-little zurichnlp/coverage-contrastive-conditioning + 10.18653/v1/2022.acl-short.53 How Distributed are Distributed Representations? An Observation on the Locality of Syntactic Information in Verb Agreement Tasks @@ -9443,6 +10099,7 @@ in the Case of Unambiguous Gender This work addresses the question of the localization of syntactic information encoded in transformer representations. We tackle this question from two perspectives, considering the object-past participle agreement in French, by identifying, first, in which part of the sentence and, second, in which part of the representation the syntactic information is encoded. The results of our experiments, using probing, causal analysis, and feature selection methods, show that syntactic information is encoded locally in a way consistent with French grammar. 2022.acl-short.54 li-etal-2022-distributed + 10.18653/v1/2022.acl-short.54 Machine Translation for <fixed-case>L</fixed-case>ivonian: Catering to 20 Speakers @@ -9455,6 +10112,7 @@ in the Case of Unambiguous Gender Livonian is one of the most endangered languages in Europe with just a tiny handful of speakers and virtually no publicly available corpora. In this paper we tackle the task of developing neural machine translation (NMT) between Livonian and English, with a two-fold aim: on one hand, preserving the language and on the other – enabling access to Livonian folklore, life stories and other textual intangible heritage as well as making it easier to create further parallel corpora. We rely on Livonian’s linguistic similarity to Estonian and Latvian and collect parallel and monolingual data for the four languages for translation experiments. We combine different low-resource NMT techniques like zero-shot translation, cross-lingual transfer and synthetic data creation to reach the highest possible translation quality as well as to find which base languages are empirically more helpful for transfer to Livonian. The resulting NMT systems and the collected monolingual and parallel data, including a manually translated and verified translation benchmark, are publicly released via OPUS and Huggingface repositories. 
2022.acl-short.55 rikters-etal-2022-machine + 10.18653/v1/2022.acl-short.55 Fire Burns, Sword Cuts: Commonsense Inductive Bias for Exploration in Text-based Games @@ -9470,6 +10128,7 @@ in the Case of Unambiguous Gender ryu-etal-2022-fire ktr0921/comm-expl-kg-a2c Jericho + 10.18653/v1/2022.acl-short.56 A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models @@ -9489,6 +10148,7 @@ in the Case of Unambiguous Gender LAMA S2ORC T-REx + 10.18653/v1/2022.acl-short.57 S<tex-math>^4</tex-math>-Tuning: A Simple Cross-lingual Sub-network Tuning Method @@ -9503,6 +10163,7 @@ in the Case of Unambiguous Gender xu-etal-2022-s4 PAWS-X XNLI + 10.18653/v1/2022.acl-short.58 Region-dependent temperature scaling for certainty calibration and application to class-imbalanced token classification @@ -9513,6 +10174,7 @@ in the Case of Unambiguous Gender 2022.acl-short.59 dawkins-nejadgholi-2022-region Few-NERD + 10.18653/v1/2022.acl-short.59 Developmental Negation Processing in Transformer Language Models @@ -9524,6 +10186,7 @@ in the Case of Unambiguous Gender 2022.acl-short.60.software.zip laverghetta-jr-licato-2022-developmental advancing-machine-human-reasoning-lab/negation-processing-acl-2022 + 10.18653/v1/2022.acl-short.60 Canary Extraction in Natural Language Understanding Models @@ -9535,6 +10198,7 @@ in the Case of Unambiguous Gender 2022.acl-short.61 parikh-etal-2022-canary SNIPS + 10.18653/v1/2022.acl-short.61 On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations @@ -9550,6 +10214,7 @@ in the Case of Unambiguous Gender 2022.acl-short.62 cao-etal-2022-intrinsic StereoSet + 10.18653/v1/2022.acl-short.62 Sequence-to-sequence <fixed-case>AMR</fixed-case> Parsing with Ancestor Information @@ -9559,6 +10224,7 @@ in the Case of Unambiguous Gender AMR parsing is the task that maps a sentence to an AMR semantic graph automatically. The difficulty comes from generating the complex graph structure. The previous state-of-the-art method translates the AMR graph into a sequence, then directly fine-tunes a pretrained sequence-to-sequence Transformer model (BART). However, purely treating the graph as a sequence does not take advantage of structural information about the graph. In this paper, we design several strategies to add the important ancestor information into the Transformer Decoder. Our experiments show that we can improve the performance for both the AMR 2.0 and AMR 3.0 datasets and achieve new state-of-the-art results. 2022.acl-short.63 yu-gildea-2022-sequence + 10.18653/v1/2022.acl-short.63 Zero-Shot Dependency Parsing with Worst-Case Aware Automated Curriculum Learning @@ -9570,6 +10236,7 @@ in the Case of Unambiguous Gender 2022.acl-short.64 de-lhoneux-etal-2022-zero mdelhoneux/machamp-worst_case_acl + 10.18653/v1/2022.acl-short.64 <fixed-case>P</fixed-case>ri<fixed-case>M</fixed-case>ock57: A Dataset Of Primary Care Mock Consultations @@ -9581,6 +10248,7 @@ in the Case of Unambiguous Gender Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, thus slowing down normal research practices. We detail the development of a public-access, high-quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. 
Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts. 2022.acl-short.65 papadopoulos-korfiatis-etal-2022-primock57 + 10.18653/v1/2022.acl-short.65 <fixed-case>U</fixed-case>ni<fixed-case>GDD</fixed-case>: <fixed-case>A</fixed-case> Unified Generative Framework for Goal-Oriented Document-Grounded Dialogue @@ -9593,6 +10261,7 @@ in the Case of Unambiguous Gender gao-etal-2022-unigdd gao-xiao-bai/UniGDD Doc2Dial + 10.18653/v1/2022.acl-short.66 <fixed-case>DM</fixed-case>ix: Adaptive Distance-aware Interpolative Mixup @@ -9611,6 +10280,7 @@ in the Case of Unambiguous Gender CoLA GLUE SST + 10.18653/v1/2022.acl-short.67 Sub-Word Alignment is Still Useful: A Vest-Pocket Method for Enhancing Low-Resource Machine Translation @@ -9622,6 +10292,7 @@ in the Case of Unambiguous Gender 2022.acl-short.68.software.zip xu-hong-2022-sub Cosmos-Break/transfer-mt-submit + 10.18653/v1/2022.acl-short.68 <fixed-case>HYPHEN</fixed-case>: Hyperbolic <fixed-case>H</fixed-case>awkes Attention For Text Streams @@ -9636,6 +10307,7 @@ in the Case of Unambiguous Gender 2022.acl-short.69.software.zip agarwal-etal-2022-hyphen gtfintechlab/hyphen-acl + 10.18653/v1/2022.acl-short.69 A Risk-Averse Mechanism for Suicidality Assessment on Social Media @@ -9646,6 +10318,7 @@ in the Case of Unambiguous Gender Recent studies have shown that social media has increasingly become a platform for users to express suicidal thoughts outside traditional clinical settings. With advances in Natural Language Processing strategies, it is now possible to design automated systems to assess suicide risk. However, such systems may generate uncertain predictions, leading to severe consequences. We hence reformulate suicide risk assessment as a selective prioritized prediction problem over the Columbia Suicide Severity Risk Scale (C-SSRS). We propose SASI, a risk-averse and self-aware transformer-based hierarchical attention classifier, augmented to refrain from making uncertain predictions. We show that SASI is able to refrain from 83% of incorrect predictions on real-world Reddit data. Furthermore, we discuss the qualitative, practical, and ethical aspects of SASI for suicide risk assessment as a human-in-the-loop framework. 2022.acl-short.70 sawhney-etal-2022-risk + 10.18653/v1/2022.acl-short.70 When classifying grammatical role, <fixed-case>BERT</fixed-case> doesn’t care about word order... except when it matters @@ -9657,6 +10330,7 @@ in the Case of Unambiguous Gender 2022.acl-short.71 2022.acl-short.71.software.tgz papadimitriou-etal-2022-classifying-grammatical + 10.18653/v1/2022.acl-short.71 Triangular Transfer: Freezing the Pivot for Triangular Machine Translation @@ -9667,6 +10341,7 @@ in the Case of Unambiguous Gender Triangular machine translation is a special case of low-resource machine translation where the language pair of interest has limited parallel data, but both languages have abundant parallel data with a pivot language. Naturally, the key to triangular machine translation is the successful exploitation of such auxiliary data. In this work, we propose a transfer-learning-based approach that utilizes all types of auxiliary data. 
As we train auxiliary source-pivot and pivot-target translation models, we initialize some parameters of the pivot side with a pre-trained language model and freeze them to encourage both translation models to work in the same pivot language space, so that they can be smoothly transferred to the source-target translation model. Experiments show that our approach can outperform previous ones. 2022.acl-short.72 zhang-etal-2022-triangular + 10.18653/v1/2022.acl-short.72 Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge @@ -9678,6 +10353,7 @@ in the Case of Unambiguous Gender madureira-schlangen-2022-visual COCO VisDial + 10.18653/v1/2022.acl-short.73 Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation @@ -9690,6 +10366,7 @@ in the Case of Unambiguous Gender 2022.acl-short.74.software.zip chen-etal-2022-focus chenllliang/MLS + 10.18653/v1/2022.acl-short.74 Contrastive Learning-Enhanced Nearest Neighbor Mechanism for Multi-Label Text Classification @@ -9701,6 +10378,7 @@ in the Case of Unambiguous Gender 2022.acl-short.75 su-etal-2022-contrastive RCV1 + 10.18653/v1/2022.acl-short.75 <fixed-case>N</fixed-case>oisy<fixed-case>T</fixed-case>une: A Little Noise Can Help You Finetune Pretrained Language Models Better @@ -9714,6 +10392,7 @@ in the Case of Unambiguous Gender wu-etal-2022-noisytune GLUE XTREME + 10.18653/v1/2022.acl-short.76 Adjusting the Precision-Recall Trade-Off with Align-and-Predict Decoding for Grammatical Error Correction @@ -9724,6 +10403,7 @@ in the Case of Unambiguous Gender 2022.acl-short.77 sun-wang-2022-adjusting autotemp/align-and-predict + 10.18653/v1/2022.acl-short.77 On the Effect of Isotropy on <fixed-case>VAE</fixed-case> Representations of Text @@ -9736,6 +10416,7 @@ in the Case of Unambiguous Gender 2022.acl-short.78.software.zip zhang-etal-2022-effect lanzhang128/IGPVAE + 10.18653/v1/2022.acl-short.78 Efficient Classification of Long Documents Using Transformers @@ -9747,6 +10428,7 @@ in the Case of Unambiguous Gender 2022.acl-short.79 park-etal-2022-efficient EURLEX57K + 10.18653/v1/2022.acl-short.79 Rewarding Semantic Similarity under Optimized Alignments for <fixed-case>AMR</fixed-case>-to-Text Generation @@ -9756,6 +10438,7 @@ in the Case of Unambiguous Gender A common way to combat exposure bias is by applying scores from evaluation metrics as rewards in reinforcement learning (RL). Metrics leveraging contextualized embeddings appear more flexible than their n-gram matching counterparts and thus ideal as training rewards. However, metrics such as BERTScore greedily align candidate and reference tokens, which can allow system outputs to receive excess credit relative to a reference. Furthermore, past approaches featuring semantic similarity rewards suffer from repetitive outputs and overfitting. We address these issues by proposing metrics that replace the greedy alignments in BERTScore with optimized ones. We compute them on a model’s trained token embeddings to prevent domain mismatch. Our model optimizing discrete alignment metrics consistently outperforms cross-entropy and BLEU reward baselines on AMR-to-text generation. In addition, we find that this approach enjoys stable training compared to a non-RL setting. 
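The idea in the Rewarding Semantic Similarity abstract above, replacing BERTScore's greedy token matching with an optimized alignment, can be approximated with an optimal one-to-one assignment over a token similarity matrix. The sketch below is one plausible instantiation, not the authors' exact metric; the names and the use of scipy's assignment solver are our assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def aligned_similarity(cand: np.ndarray, ref: np.ndarray) -> float:
    """Score a candidate against a reference under an optimal one-to-one
    token alignment, instead of the greedy argmax matching in BERTScore.
    `cand` and `ref` are (num_tokens, dim) embedding matrices."""
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = cand @ ref.T                        # pairwise cosine similarities
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    return float(sim[rows, cols].mean())      # usable as an RL reward

rng = np.random.default_rng(0)
print(aligned_similarity(rng.normal(size=(5, 8)), rng.normal(size=(6, 8))))
```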
2022.acl-short.80 jin-gildea-2022-rewarding + 10.18653/v1/2022.acl-short.80 An Analysis of Negation in Natural Language Understanding Corpora @@ -9777,6 +10460,7 @@ in the Case of Unambiguous Gender SuperGLUE WSC WiC + 10.18653/v1/2022.acl-short.81 <fixed-case>P</fixed-case>rimum <fixed-case>N</fixed-case>on <fixed-case>N</fixed-case>ocere: <fixed-case>B</fixed-case>efore working with <fixed-case>I</fixed-case>ndigenous data, the <fixed-case>ACL</fixed-case> must confront ongoing colonialism @@ -9785,6 +10469,7 @@ in the Case of Unambiguous Gender In this paper, we challenge the ACL community to reckon with historical and ongoing colonialism by adopting a set of ethical obligations and best practices drawn from the Indigenous studies literature. While the vast majority of NLP research focuses on a very small number of very high resource languages (English, Chinese, etc), some work has begun to engage with Indigenous languages. No research involving Indigenous language data can be considered ethical without first acknowledging that Indigenous languages are not merely very low resource languages. The toxic legacy of colonialism permeates every aspect of interaction between Indigenous communities and outside researchers. To this end, we propose that the ACL draft and adopt an ethical framework for NLP researchers and computational linguists wishing to engage in research involving Indigenous languages. 2022.acl-short.82 schwartz-2022-primum + 10.18653/v1/2022.acl-short.82 Unsupervised multiple-choice question generation for out-of-domain <fixed-case>Q</fixed-case>&<fixed-case>A</fixed-case> fine-tuning @@ -9801,6 +10486,7 @@ in the Case of Unambiguous Gender QASC SQuAD SciQ + 10.18653/v1/2022.acl-short.83 Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models @@ -9811,6 +10497,7 @@ in the Case of Unambiguous Gender 2022.acl-short.84 2022.acl-short.84.software.zip liu-hulden-2022-transformer + 10.18653/v1/2022.acl-short.84 Probing the Robustness of Trained Metrics for Conversational Dialogue Systems @@ -9826,6 +10513,7 @@ in the Case of Unambiguous Gender jderiu/metric-robustness DailyDialog PERSONA-CHAT + 10.18653/v1/2022.acl-short.85 Rethinking and Refining the Distinct Metric @@ -9840,6 +10528,7 @@ in the Case of Unambiguous Gender 2022.acl-short.86 liu-etal-2022-rethinking DailyDialog + 10.18653/v1/2022.acl-short.86 How reparametrization trick broke differentially-private text representation learning @@ -9850,6 +10539,7 @@ in the Case of Unambiguous Gender 2022.acl-short.87.software.zip habernal-2022-reparametrization trusthlt/acl2022-reparametrization-trick-broke-differential-privacy + 10.18653/v1/2022.acl-short.87 Towards Consistent Document-level Entity Linking: Joint Models for Entity Linking and Coreference Resolution @@ -9864,6 +10554,7 @@ in the Case of Unambiguous Gender zaporojets-etal-2022-towards klimzaporojets/consistent-el DWIE + 10.18653/v1/2022.acl-short.88 A Flexible Multi-Task Model for <fixed-case>BERT</fixed-case> Serving @@ -9879,6 +10570,7 @@ in the Case of Unambiguous Gender MRPC QNLI SST + 10.18653/v1/2022.acl-short.89 Understanding Game-Playing Agents with Natural Language Annotations @@ -9891,6 +10583,7 @@ in the Case of Unambiguous Gender 2022.acl-short.90.software.zip tomlin-etal-2022-understanding andrehe02/go-probe + 10.18653/v1/2022.acl-short.90 Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic <fixed-case>ICD</fixed-case> Coding @@ -9904,6 +10597,7 @@ in the Case of Unambiguous Gender 
yuan-etal-2022-code ganjinzero/icd-msmn MIMIC-III + 10.18653/v1/2022.acl-short.91 <fixed-case>C</fixed-case>o<fixed-case>DA</fixed-case>21: Evaluating Language Understanding Capabilities of <fixed-case>NLP</fixed-case> Models With Context-Definition Alignment @@ -9915,6 +10609,7 @@ in the Case of Unambiguous Gender 2022.acl-short.92 senel-etal-2022-coda21 lksenel/coda21 + 10.18653/v1/2022.acl-short.92 On the Importance of Effectively Adapting Pretrained Language Models for Active Learning @@ -9929,6 +10624,7 @@ in the Case of Unambiguous Gender AG News IMDb Movie Reviews SST + 10.18653/v1/2022.acl-short.93 A Recipe for Arbitrary Text Style Transfer with Large Language Models @@ -9943,6 +10639,7 @@ in the Case of Unambiguous Gender 2022.acl-short.94 2022.acl-short.94.software.zip reif-etal-2022-recipe + 10.18653/v1/2022.acl-short.94 <fixed-case>D</fixed-case>i<fixed-case>S</fixed-case>-<fixed-case>R</fixed-case>e<fixed-case>X</fixed-case>: A Multilingual Dataset for Distantly Supervised Relation Extraction @@ -9957,6 +10654,7 @@ in the Case of Unambiguous Gender dair-iitd/DiS-ReX DiS-ReX RELX + 10.18653/v1/2022.acl-short.95 (Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models’ Performance @@ -9967,6 +10665,7 @@ in the Case of Unambiguous Gender In the domain of Morphology, Inflection is a fundamental and important task that gained a lot of traction in recent years, mostly via SIGMORPHON’s shared-tasks. With average accuracy above 0.9 over the scores of all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models. In particular, as opposed to the naïve split-by-form, we propose a split-by-lemma method to challenge the performance on existing benchmarks. Our experiments with the three top-ranked systems on SIGMORPHON’s 2020 shared-task show that the lemma-split presents an average drop of 30 percentage points in macro-average for the 90 languages included. The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average. Our results clearly show that generalizing inflection to unseen lemmas is far from being solved, presenting a simple yet effective means to promote more sophisticated models. 2022.acl-short.96 goldman-etal-2022-un + 10.18653/v1/2022.acl-short.96 Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks @@ -9981,6 +10680,7 @@ in the Case of Unambiguous Gender wu-etal-2022-text SNIPS SST + 10.18653/v1/2022.acl-short.97 @@ -10006,6 +10706,7 @@ in the Case of Unambiguous Gender This work presents two experiments with the goal of replicating the transferability of dependency parsers and POS taggers trained on closely related languages within the low-resource language family Tupían. The experiments include both zero-shot settings and multilingual models. Previous studies have found that even a comparably small treebank from a closely related language will improve sequence labelling considerably in such cases. Results from both POS tagging and dependency parsing confirm previous evidence that the closer the phylogenetic relation between two languages, the better the predictions for sequence labelling tasks get. 
In many cases, the results are improved if multiple languages from the same family are combined. This suggests that in addition to leveraging similarity between two related languages, the incorporation of multiple languages of the same family might lead to better results in transfer learning for NLP applications. 2022.acl-srw.1 blum-2022-evaluating + 10.18653/v1/2022.acl-srw.1 <fixed-case>RFBFN</fixed-case>: A Relation-First Blank Filling Network for Joint Relational Triple Extraction @@ -10019,6 +10720,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.2 li-etal-2022-rfbfn lizhe2016/rfbfn + 10.18653/v1/2022.acl-srw.2 Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions @@ -10033,6 +10735,7 @@ in the Case of Unambiguous Gender EmoBank EmotionLines Story Commonsense + 10.18653/v1/2022.acl-srw.3 Darkness can not drive out darkness: Investigating Bias in Hate <fixed-case>S</fixed-case>peech<fixed-case>D</fixed-case>etection Models @@ -10041,6 +10744,7 @@ in the Case of Unambiguous Gender It has become crucial to develop tools for automated hate speech and abuse detection. These tools would help to stop the bullies and the haters and provide a safer environment for individuals, especially those from marginalized groups, to freely express themselves. However, recent research shows that machine learning models are biased and they might make the right decisions for the wrong reasons. In this thesis, I set out to understand the performance of hate speech and abuse detection models and the different biases that could influence them. I show that hate speech and abuse detection models are not only subject to social bias but also to other types of bias that have not been explored before. Finally, I investigate the causal effect of the social and intersectional bias on the performance and unfairness of hate speech detection models. 2022.acl-srw.4 elsafoury-2022-darkness + 10.18653/v1/2022.acl-srw.4 Ethical Considerations for Low-resourced Machine Translation @@ -10049,6 +10753,7 @@ in the Case of Unambiguous Gender This paper considers some ethical implications of machine translation for low-resourced languages. I use Armenian as a case study and investigate specific needs for and concerns arising from the creation and deployment of improved machine translation between English and Armenian. To do this, I conduct stakeholder interviews and construct Value Scenarios (Nathan et al., 2007) from the themes that emerge. These scenarios illustrate some of the potential harms that low-resourced language communities may face due to the deployment of improved machine translation systems. Based on these scenarios, I recommend 1) collaborating with stakeholders in order to create more useful and reliable machine translation tools, and 2) determining which other forms of language technology should be developed alongside efforts to improve machine translation in order to mitigate harms rendered to vulnerable language communities. Both of these goals require treating low-resourced machine translation as a language-specific, rather than language-agnostic, task. 2022.acl-srw.5 haroutunian-2022-ethical + 10.18653/v1/2022.acl-srw.5 Integrating Question Rewrites in Conversational Question Answering: A Reinforcement Learning Approach @@ -10065,6 +10770,7 @@ in the Case of Unambiguous Gender CoQA QReCC QuAC + 10.18653/v1/2022.acl-srw.6 What Do You Mean by Relation Extraction? 
A Survey on Datasets and Study on Scientific Relation Classification @@ -10079,6 +10785,7 @@ in the Case of Unambiguous Gender DocRED FewRel FewRel 2.0 + 10.18653/v1/2022.acl-srw.7 Logical Inference for Counting on Semi-structured Tables @@ -10090,6 +10797,7 @@ in the Case of Unambiguous Gender kurosawa-yanaka-2022-logical ynklab/sst_count InfoTabS + 10.18653/v1/2022.acl-srw.8 <fixed-case>GNN</fixed-case>er: Reducing Overlapping in Span-based <fixed-case>NER</fixed-case> Using Graph Neural Networks @@ -10104,6 +10812,7 @@ in the Case of Unambiguous Gender urchade/gnner CoNLL-2003 SciERC + 10.18653/v1/2022.acl-srw.9 Compositional Semantics and Inference System for Temporal Order based on <fixed-case>J</fixed-case>apanese <fixed-case>CCG</fixed-case> @@ -10114,6 +10823,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.10 sugimoto-yanaka-2022-compositional ynklab/ccgtemp + 10.18653/v1/2022.acl-srw.10 Combine to Describe: Evaluating Compositional Generalization in Image Captioning @@ -10125,6 +10835,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.11 pantazopoulos-etal-2022-combine COCO + 10.18653/v1/2022.acl-srw.11 Towards Unification of Discourse Annotation Frameworks @@ -10133,6 +10844,7 @@ in the Case of Unambiguous Gender Discourse information is difficult to represent and annotate. Among the major frameworks for annotating discourse information, RST, PDTB and SDRT are widely discussed and used, each having its own theoretical foundation and focus. Corpora annotated under different frameworks vary considerably. To make better use of the existing discourse corpora and achieve the possible synergy of different frameworks, it is worthwhile to investigate the systematic relations between different frameworks and devise methods of unifying the frameworks. Although the issue of framework unification has been a topic of discussion for a long time, there is currently no comprehensive approach which considers unifying both discourse structure and discourse relations and evaluates the unified framework intrinsically and extrinsically. We plan to use automatic means for the unification task and evaluate the result with structural complexity and downstream tasks. We will also explore the application of the unified framework in multi-task learning and graphical models. 2022.acl-srw.12 fu-2022-towards + 10.18653/v1/2022.acl-srw.12 <fixed-case>AMR</fixed-case> Alignment for Morphologically-rich and Pro-drop Languages @@ -10142,6 +10854,7 @@ in the Case of Unambiguous Gender Alignment between concepts in an abstract meaning representation (AMR) graph and the words within a sentence is one of the important stages of AMR parsing. Although there exist high-performing AMR aligners for English, unfortunately, these are not well suited for many languages where many concepts appear from morpho-semantic elements. For the first time in the literature, this paper presents an AMR aligner tailored for morphologically-rich and pro-drop languages by experimenting on the Turkish language being a prominent example of this language group. Our aligner focuses on the meaning considering the rich Turkish morphology and aligns AMR concepts that emerge from morphemes using a tree traversal approach without additional resources or rules. We evaluate our aligner over a manually annotated gold data set in terms of precision, recall and F1 score. 
Our aligner outperforms the Turkish adaptations of the previously proposed aligners for English and Portuguese with an F1 score of 0.87 and provides a relative error reduction of up to 76%. 2022.acl-srw.13 oral-eryigit-2022-amr + 10.18653/v1/2022.acl-srw.13 Sketching a Linguistically-Driven Reasoning Dialog Model for Social Talk @@ -10150,6 +10863,7 @@ in the Case of Unambiguous Gender The capability of holding social talk (or casual conversation) and making sense of conversational content requires context-sensitive natural language understanding and reasoning, which cannot be handled efficiently by the current popular open-domain dialog systems and chatbots. Heavily relying on corpus-based machine learning techniques to encode and decode context-sensitive meanings, these systems focus on fitting a particular training dataset, but not tracking what is actually happening in a conversation, and therefore easily derail in a new context. This work sketches out a more linguistically-informed architecture to handle social talk in English, in which corpus-based methods form the backbone of the relatively context-insensitive components (e.g. part-of-speech tagging, approximation of lexical meaning and constituent chunking), while symbolic modeling is used for reasoning out the context-sensitive components, which do not have any consistent mapping to linguistic forms. All components are fitted into a Bayesian game-theoretic model to address the interactive and rational aspects of conversation. 2022.acl-srw.14 luu-2022-sketching + 10.18653/v1/2022.acl-srw.14 Scoping natural language processing in <fixed-case>I</fixed-case>ndonesian and <fixed-case>M</fixed-case>alay for education applications @@ -10161,6 +10875,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.15 maxwelll-smith-etal-2022-scoping IndoNLU Benchmark + 10.18653/v1/2022.acl-srw.15 <fixed-case>E</fixed-case>nglish-<fixed-case>M</fixed-case>alay Cross-Lingual Embedding Alignment using Bilingual Lexicon Augmentation @@ -10170,6 +10885,7 @@ in the Case of Unambiguous Gender As high-quality Malay language resources are still scarce, cross-lingual word embeddings make it possible for richer English resources to be leveraged for downstream Malay text classification tasks. This paper focuses on creating English-Malay cross-lingual word embeddings using embedding alignment by exploiting existing language resources. We augmented the training bilingual lexicons using machine translation with the goal of improving the alignment precision of our cross-lingual word embeddings. We investigated the quality of the current state-of-the-art English-Malay bilingual lexicon and worked on improving its quality using Google Translate. We also examined the effect of Malay word coverage on the quality of cross-lingual word embeddings. Experimental results with a precision up to 28.17% show that the alignment precision of the cross-lingual word embeddings would inevitably degrade after 1-NN but a better seed lexicon and cleaner nearest neighbours can reduce the number of word pairs required to achieve satisfactory performance. As the English and Malay monolingual embeddings are pre-trained on informal language corpora, our proposed English-Malay embeddings alignment approach is also able to map non-standard Malay translations in the English nearest neighbours. 
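Embedding alignment with a bilingual seed lexicon, as described in the English-Malay abstract above, is commonly solved with the orthogonal Procrustes method. A self-contained sketch on synthetic data (the English and Malay matrices here are random stand-ins, not real embeddings):

```python
import numpy as np

def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal map W minimizing ||src @ W.T - tgt|| for embedding rows
    paired by a bilingual seed lexicon (the classic Procrustes solution)."""
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt

rng = np.random.default_rng(0)
en = rng.normal(size=(1000, 50))             # stand-in English vectors
true_w, _ = np.linalg.qr(rng.normal(size=(50, 50)))
ms = en @ true_w.T                           # stand-in Malay vectors
w = procrustes_align(en[:500], ms[:500])     # fit on a 500-pair seed lexicon
print(np.allclose(en @ w.T, ms, atol=1e-6))  # True: the rotation is recovered
```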
2022.acl-srw.16 lim-liew-2022-english + 10.18653/v1/2022.acl-srw.16 Towards Detecting Political Bias in <fixed-case>H</fixed-case>indi News Articles @@ -10181,6 +10897,7 @@ in the Case of Unambiguous Gender Political propaganda in recent times has been amplified by media news portals through biased reporting, creating untruthful narratives on serious issues that misinform public opinion in the interest of siding with and helping a particular political party. This poses the challenging NLP task of detecting political bias in news articles. We propose a transformer-based transfer learning method to fine-tune the pre-trained network on our data for this bias detection. As the required dataset for this particular task was not available, we created our dataset comprising 1388 Hindi news articles and their headlines from various Hindi news media outlets. We marked them on whether they are biased towards, against, or neutral to BJP, a political party, and the current ruling party at the centre in India. 2022.acl-srw.17 agrawal-etal-2022-towards + 10.18653/v1/2022.acl-srw.17 Restricted or Not: A General Training Framework for Neural Machine Translation @@ -10193,6 +10910,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.18 li-etal-2022-restricted ASPEC + 10.18653/v1/2022.acl-srw.18 What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge @@ -10203,6 +10921,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.19 hagstrom-johansson-2022-models lovhag/measure-visual-commonsense-knowledge + 10.18653/v1/2022.acl-srw.19 <fixed-case>T</fixed-case>elugu<fixed-case>NER</fixed-case>: Leveraging Multi-Domain Named Entity Recognition with Deep Transformers @@ -10215,6 +10934,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.20 duggenpudi-etal-2022-teluguner WikiAnn + 10.18653/v1/2022.acl-srw.20 Using Neural Machine Translation Methods for Sign Language Translation @@ -10226,6 +10946,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.21 angelova-etal-2022-using PHOENIX14T + 10.18653/v1/2022.acl-srw.21 Flexible Visual Grounding @@ -10242,6 +10963,7 @@ in the Case of Unambiguous Gender RefCOCO Visual Genome Visual7W + 10.18653/v1/2022.acl-srw.22 A large-scale computational study of content preservation measures for text style transfer and paraphrase generation @@ -10254,6 +10976,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.23 babakov-etal-2022-large skoltech-nlp/mutual_implication_score + 10.18653/v1/2022.acl-srw.23 Explicit Object Relation Alignment for Vision and Language Navigation @@ -10264,6 +10987,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.24 zhang-kordjamshidi-2022-explicit hlr/object-grounding-for-vln + 10.18653/v1/2022.acl-srw.24 Mining Logical Event Schemas From Pre-Trained Language Models @@ -10274,6 +10998,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.25 lawley-schubert-2022-mining FrameNet + 10.18653/v1/2022.acl-srw.25 Exploring Cross-lingual Text Detoxification with Large Multilingual Language Models. @@ -10284,6 +11009,7 @@ in the Case of Unambiguous Gender Detoxification is the task of generating text in a polite style while preserving the meaning and fluency of the original toxic text. Existing detoxification methods are monolingual, i.e., designed to work in one specific language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting. 
Unlike previous works, we aim to make large language models able to perform detoxification without direct fine-tuning in a given language. Experiments show that multilingual models are capable of performing multilingual style transfer. However, the tested state-of-the-art models are not able to perform cross-lingual detoxification, and direct fine-tuning in the target language currently remains unavoidable, motivating the need for further research in this direction. 2022.acl-srw.26 moskovskiy-etal-2022-exploring + 10.18653/v1/2022.acl-srw.26 <fixed-case>MEKER</fixed-case>: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering @@ -10298,6 +11024,7 @@ in the Case of Unambiguous Gender chekalina-etal-2022-meker FB15k-237 SimpleQuestions + 10.18653/v1/2022.acl-srw.27 Discourse on <fixed-case>ASR</fixed-case> Measurement: Introducing the <fixed-case>ARPOCA</fixed-case> Assessment Tool @@ -10307,6 +11034,7 @@ in the Case of Unambiguous Gender Automatic speech recognition (ASR) has evolved from a pipeline architecture with pronunciation dictionaries, phonetic features and language models to end-to-end systems performing a direct translation from a raw waveform into a word sequence. With the increase in accuracy and the availability of pre-trained models, ASR systems are now omnipresent in our daily applications. On the other hand, the models’ interpretability and their computational cost have become more challenging, particularly when dealing with less-common languages or identifying regional variations of speakers. This research proposal will follow a four-stage process: 1) Providing an overview of acoustic features and feature extraction algorithms; 2) Exploring current ASR models, tools, and performance assessment techniques; 3) Aligning features with interpretable phonetic transcripts; and 4) Designing a prototype ARPOCA to increase awareness of regional language variation and improve model feedback by developing semi-automatic acoustic feature extraction using PRAAT in conjunction with phonetic transcription. 2022.acl-srw.28 merz-scrivner-2022-discourse + 10.18653/v1/2022.acl-srw.28 Pretrained Knowledge Base Embeddings for improved Sentential Relation Extraction @@ -10319,6 +11047,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.29 papaluca-etal-2022-pretrained brunoliegibastonliegi/pretrained-kb-embeddings-for-re + 10.18653/v1/2022.acl-srw.29 Improving Cross-domain, Cross-lingual and Multi-modal Deception Detection @@ -10329,6 +11058,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.30 panda-levitan-2022-improving LIAR + 10.18653/v1/2022.acl-srw.30 Automatic Generation of Distractors for Fill-in-the-Blank Exercises with Round-Trip Neural Machine Translation @@ -10340,6 +11070,7 @@ in the Case of Unambiguous Gender In a fill-in-the-blank exercise, a student is presented with a carrier sentence with one word hidden, and a multiple-choice list that includes the correct answer and several inappropriate options, called distractors. We propose to automatically generate distractors using round-trip neural machine translation: the carrier sentence is translated from English into another (pivot) language and back, and distractors are produced by aligning the original sentence and its round-trip translation. We show that using hundreds of translations for a given sentence allows us to generate a rich set of challenging distractors. Further, using multiple pivot languages produces a diverse set of candidates. 
The distractors are evaluated against a real corpus of cloze exercises and checked manually for validity. We demonstrate that the proposed method significantly outperforms two strong baselines. 2022.acl-srw.31 panda-etal-2022-automatic + 10.18653/v1/2022.acl-srw.31 On the Locality of Attention in Direct Speech Translation @@ -10351,6 +11082,7 @@ in the Case of Unambiguous Gender Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the complexity of the self-attention mechanism scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, like in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contributions in the self-attention of the encoder, unveiling local diagonal patterns. To prove that some attention weights are avoidable, we propose to substitute the standard self-attention with a local efficient one, setting the amount of context used based on the results of the analysis. With this approach, our model matches the baseline performance, and improves the efficiency by skipping the computation of those weights that standard attention discards. 2022.acl-srw.32 alastruey-etal-2022-locality + 10.18653/v1/2022.acl-srw.32 Extraction of Diagnostic Reasoning Relations for Clinical Knowledge Graphs @@ -10359,6 +11091,7 @@ in the Case of Unambiguous Gender Clinical knowledge graphs lack meaningful diagnostic relations (e.g. comorbidities, sign/symptoms), limiting their ability to represent real-world diagnostic processes. Previous methods in biomedical relation extraction have focused on concept relations, such as gene-disease and disease-drug, and largely ignored clinical processes. In this thesis, we leverage a clinical reasoning ontology and propose methods to extract such relations from a physician-facing point-of-care reference wiki and consumer health resource texts. Given the lack of data labeled with diagnostic relations, we also propose new methods of evaluating the correctness of extracted triples in the zero-shot setting. We describe a process for the intrinsic evaluation of new facts by triple confidence filtering and clinician manual review, as well as extrinsic evaluation in the form of a differential diagnosis prediction task. 2022.acl-srw.33 socrates-2022-extraction + 10.18653/v1/2022.acl-srw.33 Scene-Text Aware Image and Text Retrieval with Dual-Encoder @@ -10372,6 +11105,7 @@ in the Case of Unambiguous Gender 2022.acl-srw.34 miyawaki-etal-2022-scene TextCaps + 10.18653/v1/2022.acl-srw.34 Towards Fine-grained Classification of Climate Change related Social Media Text @@ -10382,6 +11116,7 @@ in the Case of Unambiguous Gender With climate change becoming a cause of concern worldwide, it becomes essential to gauge people’s reactions. This can help educate and spread awareness about it and help leaders improve decision-making. This work explores the fine-grained classification and stance detection of climate change-related social media text. Firstly, we create two datasets, ClimateStance and ClimateEng, consisting of 3777 tweets each, posted during the 2019 United Nations Framework Convention on Climate Change and comprehensively outline the dataset collection, annotation methodology, and dataset composition. Secondly, we propose the task of Climate Change stance detection based on our proposed ClimateStance dataset. 
Thirdly, we propose a fine-grained classification based on the ClimateEng dataset, classifying social media text into five categories: Disaster, Ocean/Water, Agriculture/Forestry, Politics, and General. We benchmark both the datasets for climate change stance detection and fine-grained classification using state-of-the-art methods in text classification. We also create a Reddit-based dataset for both tasks, ClimateReddit, consisting of 6262 pseudo-labeled comments along with 329 manually annotated comments for the label. We then perform semi-supervised experiments for both tasks and benchmark their results using the best-performing model for the supervised experiments. Lastly, we provide insights into ClimateStance and ClimateReddit using part-of-speech tagging and named-entity recognition. 2022.acl-srw.35 vaid-etal-2022-towards + 10.18653/v1/2022.acl-srw.35 Deep Neural Representations for Multiword Expressions Detection @@ -10391,6 +11126,7 @@ in the Case of Unambiguous Gender Effective methods for multiword expression detection are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme applied to an annotated corpus, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two approaches. We present a novel weakly supervised multiword expressions extraction method which focuses on their behaviour in various contexts. Our method uses a lexicon of English multiword lexical units acquired from The Oxford Dictionary of English as a reference knowledge base and leverages neural language modelling with deep learning architectures. In our approach, we do not need a corpus annotated specifically for the task. The only required components are: a lexicon of multiword units, a large corpus, and a general contextual embeddings model. We propose a method for building a silver dataset by spotting multiword expression occurrences and acquiring statistical collocations as negative samples. Sample representation has been inspired by representations used in Natural Language Inference and relation recognition. Very good results (F1=0.8) were obtained with a CNN network applied to individual occurrences, followed by weighted voting to combine results from the whole corpus. The proposed method can be quite easily applied to other languages. 2022.acl-srw.36 kanclerz-piasecki-2022-deep + 10.18653/v1/2022.acl-srw.36 A Checkpoint on Multilingual Misogyny Identification @@ -10400,6 +11136,7 @@ in the Case of Unambiguous Gender We address the problem of identifying misogyny in tweets in monolingual and multilingual settings in three languages: English, Italian, and Spanish. We explore model variations considering single and multiple languages both in the pre-training of the transformer and in the training of the downstream task to explore the feasibility of detecting misogyny through a transfer learning approach across multiple languages. That is, we train monolingual transformers with monolingual data, and multilingual transformers with both monolingual and multilingual data. Our models reach state-of-the-art performance on all three languages. The single-language BERT models perform the best, closely followed by different configurations of multilingual BERT models. The performance drops in zero-shot classification across languages. Our error analysis shows that multilingual and monolingual models tend to make the same mistakes. 
2022.acl-srw.37 muti-barron-cedeno-2022-checkpoint + 10.18653/v1/2022.acl-srw.37 Using dependency parsing for few-shot learning in distributional semantics @@ -10409,6 +11146,7 @@ in the Case of Unambiguous Gender In this work, we explore the novel idea of employing dependency parsing information in the context of few-shot learning, the task of learning the meaning of a rare word based on a limited number of context sentences. Firstly, we use dependency-based word embedding models as background spaces for few-shot learning. Secondly, we introduce two few-shot learning methods which enhance the additive baseline model by using dependencies. 2022.acl-srw.38 preda-emerson-2022-using + 10.18653/v1/2022.acl-srw.38 A Dataset and <fixed-case>BERT</fixed-case>-based Models for Targeted Sentiment Analysis on <fixed-case>T</fixed-case>urkish Texts @@ -10418,6 +11156,7 @@ in the Case of Unambiguous Gender Targeted Sentiment Analysis aims to extract sentiment towards a particular target from a given text. It is a field that is attracting attention due to the increasing accessibility of the Internet, which leads people to generate an enormous amount of data. Sentiment analysis, which in general requires annotated data for training, is a well-researched area for widely studied languages such as English. For low-resource languages such as Turkish, there is a lack of such annotated data. We present an annotated Turkish dataset suitable for targeted sentiment analysis. We also propose BERT-based models with different architectures to accomplish the task of targeted sentiment analysis. The results demonstrate that the proposed models outperform the traditional sentiment analysis models for the targeted sentiment analysis task. 2022.acl-srw.39 mutlu-ozgur-2022-dataset + 10.18653/v1/2022.acl-srw.39 @@ -10449,6 +11188,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.1 lin-etal-2022-dotat fxlp/marktool + 10.18653/v1/2022.acl-demo.1 <fixed-case>UKP</fixed-case>-<fixed-case>SQUARE</fixed-case>: An Online Platform for Question Answering Research @@ -10475,6 +11215,7 @@ in the Case of Unambiguous Gender MS MARCO Natural Questions SQuAD + 10.18653/v1/2022.acl-demo.2 <fixed-case>V</fixed-case>i<fixed-case>LM</fixed-case>edic: a framework for research at the intersection of vision and language in medical <fixed-case>AI</fixed-case> @@ -10494,6 +11235,7 @@ in the Case of Unambiguous Gender jbdel/vilmedic PadChest Visual Question Answering + 10.18653/v1/2022.acl-demo.3 <fixed-case>T</fixed-case>ext<fixed-case>P</fixed-case>runer: A Model Pruning Toolkit for Pre-Trained Language Models @@ -10504,6 +11246,7 @@ in the Case of Unambiguous Gender Pre-trained language models have prevailed in natural language processing and become the backbone of many NLP tasks, but the demands for computational resources have limited their applications. In this paper, we introduce TextPruner, an open-source model pruning toolkit designed for pre-trained language models, targeting fast and easy model compression. TextPruner offers structured post-training pruning methods, including vocabulary pruning and transformer pruning, and can be applied to various models and tasks. We also propose a self-supervised pruning method that can be applied without labeled data. Our experiments with several NLP tasks demonstrate the ability of TextPruner to reduce the model size without re-training the model. 
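Vocabulary pruning, one of the structured methods the TextPruner abstract above mentions, can be illustrated framework-free: keep only the embedding rows for tokens observed in a target corpus and remap the ids. The sketch below is a simplified stand-in for what the toolkit does on real PyTorch models; all names are ours, not the toolkit's API.

```python
import numpy as np

def prune_vocabulary(embeddings, vocab, corpus_tokens):
    """Drop embedding rows for vocabulary entries never seen in the target
    corpus and remap token ids; the embedding matrix often dominates the
    size of a compact model, so this shrinks it without re-training."""
    id_to_token = {i: t for t, i in vocab.items()}
    keep = sorted({vocab[t] for t in corpus_tokens if t in vocab})
    new_vocab = {id_to_token[old_id]: new_id for new_id, old_id in enumerate(keep)}
    return embeddings[keep], new_vocab

emb = np.arange(20, dtype=float).reshape(5, 4)  # toy 5-token embedding table
vocab = {"[UNK]": 0, "the": 1, "cat": 2, "sat": 3, "xylophone": 4}
small_emb, small_vocab = prune_vocabulary(emb, vocab, ["[UNK]", "the", "cat", "sat"])
print(small_emb.shape, small_vocab)             # (4, 4), "xylophone" removed
```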
2022.acl-demo.4 yang-etal-2022-textpruner + 10.18653/v1/2022.acl-demo.4 <fixed-case>A</fixed-case>nn<fixed-case>IE</fixed-case>: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark @@ -10519,6 +11262,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.5 friedrich-etal-2022-annie nfriedri/annie-annotation-platform + 10.18653/v1/2022.acl-demo.5 <fixed-case>A</fixed-case>dapter<fixed-case>H</fixed-case>ub Playground: Simple and Flexible Few-Shot Learning with Adapters @@ -10539,6 +11283,7 @@ in the Case of Unambiguous Gender IMDb Movie Reviews MRPC SST + 10.18653/v1/2022.acl-demo.6 <fixed-case>Q</fixed-case>iu<fixed-case>N</fixed-case>iu: A <fixed-case>C</fixed-case>hinese Lyrics Generation System with Passage-Level Input @@ -10550,6 +11295,7 @@ in the Case of Unambiguous Gender Lyrics generation has been a very popular application of natural language generation. Previous works mainly focused on generating lyrics based on a couple of attributes or keywords, rendering very limited control over the content of the lyrics. In this paper, we demonstrate QiuNiu, a Chinese lyrics generation system which is conditioned on passage-level text rather than a few attributes or keywords. By using the passage-level text as input, the content of generated lyrics is expected to reflect the nuances of users’ needs. The QiuNiu system supports various forms of passage-level input, such as short stories, essays, and poetry. Its training is conducted under the framework of unsupervised machine translation, due to the lack of an aligned passage-level text-to-lyrics corpus. We initialize the parameters of QiuNiu with a custom pretrained Chinese GPT-2 model and adopt a two-step process to finetune the model for better alignment between passage-level text and lyrics. Additionally, a post-processing module is used to filter and rerank the generated lyrics to select the ones of highest quality. The demo video of the system is available at https://youtu.be/OCQNzahqWgM. 2022.acl-demo.7 zhang-etal-2022-qiuniu + 10.18653/v1/2022.acl-demo.7 Automatic Gloss Dictionary for Sign Language Learners @@ -10563,6 +11309,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.8 xu-etal-2022-automatic WLASL + 10.18653/v1/2022.acl-demo.8 <fixed-case>P</fixed-case>rompt<fixed-case>S</fixed-case>ource: An Integrated Development Environment and Repository for Natural Language Prompts @@ -10599,6 +11346,7 @@ in the Case of Unambiguous Gender bach-etal-2022-promptsource bigscience-workshop/promptsource SNLI + 10.18653/v1/2022.acl-demo.9 <fixed-case>O</fixed-case>pen<fixed-case>P</fixed-case>rompt: An Open-source Framework for Prompt-learning @@ -10615,6 +11363,7 @@ in the Case of Unambiguous Gender ding-etal-2022-openprompt thunlp/OpenPrompt GLUE + 10.18653/v1/2022.acl-demo.10 Guided K-best Selection for Semantic Parsing Annotation @@ -10632,6 +11381,7 @@ in the Case of Unambiguous Gender Collecting data for conversational semantic parsing is a time-consuming and demanding process. In this paper we consider, given an incomplete dataset with only a small amount of data, how to build an AI-powered human-in-the-loop process to enable efficient data collection. A guided K-best selection process is proposed, which (i) generates a set of possible valid candidates; (ii) allows users to quickly traverse the set and filter incorrect parses; and (iii) asks users to select the correct parse, with minimal modification when necessary.
We investigate how to best support users in efficiently traversing the candidate set and locating the correct parse, in terms of speed and accuracy. In our user study, consisting of five annotators labeling 300 instances each, we find that combining keyword searching, where keywords can be used to query relevant candidates, and keyword suggestion, where representative keywords are automatically generated, enables fast and accurate annotation. 2022.acl-demo.11 belyy-etal-2022-guided + 10.18653/v1/2022.acl-demo.11 Hard and Soft Evaluation of <fixed-case>NLP</fixed-case> models with <fixed-case>BOO</fixed-case>t<fixed-case>ST</fixed-case>rap <fixed-case>SA</fixed-case>mpling - <fixed-case>B</fixed-case>oo<fixed-case>S</fixed-case>t<fixed-case>S</fixed-case>a @@ -10643,6 +11393,7 @@ in the Case of Unambiguous Gender The applied nature of Natural Language Processing (NLP) makes it necessary to select the most effective and robust models. Producing slightly higher performance is insufficient; we want to know whether this advantage will carry over to other data sets. Bootstrapped significance tests can indicate that ability. So, while necessary, computing the significance of models’ performance differences has many levels of complexity. It can be tedious, especially when the experimental design has many conditions to compare and several runs of experiments. We present BooStSa, a tool that makes it easy to compute significance levels with the BOOtSTrap SAmpling procedure to evaluate models that predict not only standard hard labels but also soft labels (i.e., probability distributions over different classes). 2022.acl-demo.12 fornaciari-etal-2022-hard + 10.18653/v1/2022.acl-demo.12 <fixed-case>COVID</fixed-case>-19 Claim Radar: A Structured Claim Extraction and Tracking System @@ -10659,6 +11410,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.13 li-etal-2022-covid uiucnlp/covid-claim-radar + 10.18653/v1/2022.acl-demo.13 <fixed-case>TS</fixed-case>-<fixed-case>ANNO</fixed-case>: An Annotation Tool to Build, Annotate and Evaluate Text Simplification Corpora @@ -10670,6 +11422,7 @@ in the Case of Unambiguous Gender stodden-kallmeyer-2022-ts ASSET ASSET Corpus + 10.18653/v1/2022.acl-demo.14 Language Diversity: Visible to Humans, Exploitable by Machines @@ -10683,6 +11436,7 @@ in the Case of Unambiguous Gender The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over two thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the abstract notion of linguistic diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.
2022.acl-demo.15 bella-etal-2022-language + 10.18653/v1/2022.acl-demo.15 <fixed-case>C</fixed-case>og<fixed-case>KGE</fixed-case>: A Knowledge Graph Embedding Toolkit and Benchmark for Representing Multi-source and Heterogeneous Knowledge @@ -10703,6 +11457,7 @@ in the Case of Unambiguous Gender jinzhuoran/cogkge ConceptNet FrameNet + 10.18653/v1/2022.acl-demo.16 Dynatask: A Framework for Creating Dynamic <fixed-case>AI</fixed-case> Benchmark Tasks @@ -10724,6 +11479,7 @@ in the Case of Unambiguous Gender ANLI AdversarialQA GLUE + 10.18653/v1/2022.acl-demo.17 <fixed-case>D</fixed-case>ata<fixed-case>L</fixed-case>ab: A Platform for Data Analysis and Intervention @@ -10741,6 +11497,7 @@ in the Case of Unambiguous Gender xiao-etal-2022-datalab BeerAdvocate SNLI + 10.18653/v1/2022.acl-demo.18 Cue-bot: A Conversational Agent for Assistive Technology @@ -10755,6 +11512,7 @@ in the Case of Unambiguous Gender Intelligent conversational assistants have become an integral part of our lives for performing simple tasks. However, such agents, for example, Google bots, Alexa, and others, are yet to have any social impact on minority populations, for example, people with neurological disorders and people with speech, language and social communication disorders, sometimes with locked-in states where speaking or typing is a challenge. Language model technologies can be very powerful tools in enabling these users to carry out daily communication and social interactions. In this work, we present a system that users with varied levels of disabilities can use to interact with the world, supported by eye-tracking, mouse controls and an intelligent agent, Cue-bot, that can represent the user in a conversation. The agent provides relevant controllable ‘cues’ to generate desirable responses quickly for an ongoing dialog context. In the context of usage of such systems for people with degenerative disorders, we present automatic and human evaluation of our cue/keyword predictor and the controllable dialog system and show that our models perform significantly better than models without control and can also reduce user effort (fewer keystrokes) and speed up communication (typing time) significantly.
2022.acl-demo.19 h-kumar-etal-2022-cue + 10.18653/v1/2022.acl-demo.19 <fixed-case>M</fixed-case>-<fixed-case>SENA</fixed-case>: An Integrated Platform for Multimodal Sentiment Analysis @@ -10772,6 +11530,7 @@ in the Case of Unambiguous Gender CH-SIMS CMU-MOSEI Multimodal Opinion-level Sentiment Intensity + 10.18653/v1/2022.acl-demo.20 <fixed-case>HOSMEL</fixed-case>: A Hot-Swappable Modularized Entity Linking Toolkit for <fixed-case>C</fixed-case>hinese @@ -10788,6 +11547,7 @@ in the Case of Unambiguous Gender zhang-li-etal-2022-hosmel thudm/hosmel CLUE + 10.18653/v1/2022.acl-demo.21 <fixed-case>BMI</fixed-case>nf: An Efficient Toolkit for Big Model Inference and Tuning @@ -10805,6 +11565,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.22 han-etal-2022-bminf openbmb/bminf + 10.18653/v1/2022.acl-demo.22 <fixed-case>MMEKG</fixed-case>: Multi-modal Event Knowledge Graph towards Universal Representation across Modalities @@ -10824,6 +11585,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.23 ma-etal-2022-mmekg FrameNet + 10.18653/v1/2022.acl-demo.23 <fixed-case>S</fixed-case>ocio<fixed-case>F</fixed-case>illmore: A Tool for Discovering Perspectives @@ -10836,6 +11598,7 @@ in the Case of Unambiguous Gender SOCIOFILLMORE is a multilingual tool which helps to bring to the fore the focus or the perspective that a text expresses in depicting an event. Our tool, whose rationale we also support through a large collection of human judgements, is theoretically grounded on frame semantics and cognitive linguistics, and implemented using the LOME frame semantic parser. We describe SOCIOFILLMORE’s development and functionalities, show how non-NLP researchers can easily interact with the tool, and present some example case studies which are already incorporated in the system, together with the kind of analysis that can be visualised. 2022.acl-demo.24 minnema-etal-2022-sociofillmore + 10.18653/v1/2022.acl-demo.24 <fixed-case>T</fixed-case>ime<fixed-case>LM</fixed-case>s: Diachronic Language Models from <fixed-case>T</fixed-case>witter @@ -10850,6 +11613,7 @@ in the Case of Unambiguous Gender loureiro-etal-2022-timelms cardiffnlp/timelms TweetEval + 10.18653/v1/2022.acl-demo.25 Adaptor: Objective-Centric Adaptation Framework for Language Models @@ -10862,6 +11626,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.26 stefanik-etal-2022-adaptor gaussalgo/adaptor + 10.18653/v1/2022.acl-demo.26 <fixed-case>Q</fixed-case>uick<fixed-case>G</fixed-case>raph: A Rapid Annotation Tool for Knowledge Graph Extraction from Technical Text @@ -10873,6 +11638,7 @@ in the Case of Unambiguous Gender 2022.acl-demo.27 bikaun-etal-2022-quickgraph nlp-tlp/quickgraph + 10.18653/v1/2022.acl-demo.27 @@ -10905,6 +11671,7 @@ in the Case of Unambiguous Gender 2022.acl-tutorials.1 church-etal-2022-gentle GLUE + 10.18653/v1/2022.acl-tutorials.1 Towards Reproducible Machine Learning Research in Natural Language Processing @@ -10920,6 +11687,7 @@ in the Case of Unambiguous Gender While recent progress in the field of ML has been significant, the reproducibility of these cutting-edge results is often lacking, with many submissions omitting the information necessary to ensure subsequent reproducibility. Despite proposals such as the Reproducibility Checklist and reproducibility criteria at several major conferences, the reflex for carrying out research with reproducibility in mind is still missing in the broader ML community.
We propose this tutorial as a gentle introduction to ensuring reproducible research in ML, with a specific emphasis on computational linguistics and NLP. We also provide a framework for using reproducibility as a teaching tool in university-level computer science programs. 2022.acl-tutorials.2 lucic-etal-2022-towards + 10.18653/v1/2022.acl-tutorials.2 Knowledge-Augmented Methods for Natural Language Processing @@ -10936,6 +11704,7 @@ in the Case of Unambiguous Gender CommonGen CommonsenseQA ConceptNet + 10.18653/v1/2022.acl-tutorials.3 Non-Autoregressive Sequence Generation @@ -10945,6 +11714,7 @@ in the Case of Unambiguous Gender Non-autoregressive sequence generation (NAR) attempts to generate the entire or partial output sequences in parallel to speed up the generation process and avoid potential issues (e.g., label bias, exposure bias) in autoregressive generation. While it has received much research attention and has been applied in many sequence generation tasks in natural language and speech, naive NAR models still face many challenges in closing the performance gap with state-of-the-art autoregressive models because of a lack of modeling power. In this tutorial, we will provide a thorough introduction and review of non-autoregressive sequence generation, in four sections: 1) Background, which covers the motivation of NAR generation, the problem definition, the evaluation protocol, and the comparison with standard autoregressive generation approaches. 2) Method, which includes different aspects: model architecture, objective function, training data, learning paradigm, and additional inference tricks. 3) Application, which covers different tasks in text and speech generation, and some advanced topics in applications. 4) Conclusion, in which we describe several research challenges and discuss the potential future research directions. We hope this tutorial can serve both academic researchers and industry practitioners working on non-autoregressive sequence generation. 2022.acl-tutorials.4 gu-tan-2022-non + 10.18653/v1/2022.acl-tutorials.4 Learning with Limited Text Data @@ -10955,6 +11725,7 @@ in the Case of Unambiguous Gender Natural Language Processing (NLP) has achieved great progress in the past decade on the basis of neural models, which often make use of large amounts of labeled data to achieve state-of-the-art performance. The dependence on labeled data prevents NLP models from being applied to low-resource settings and languages because of the time, money, and expertise that is often required to label massive amounts of textual data. Consequently, the ability to learn with limited labeled data is crucial for deploying neural systems to real-world NLP applications. Recently, numerous approaches have been explored to alleviate the need for labeled data in NLP such as data augmentation and semi-supervised learning. This tutorial aims to provide a systematic and up-to-date overview of these methods in order to help researchers and practitioners understand the landscape of approaches and the challenges associated with learning from limited labeled data, an emerging topic in the computational linguistics community. We will consider applications to a wide variety of NLP tasks (including text classification, generation, and structured prediction) and will highlight current challenges and future directions.
2022.acl-tutorials.5 yang-etal-2022-learning + 10.18653/v1/2022.acl-tutorials.5 Zero- and Few-Shot <fixed-case>NLP</fixed-case> with Pretrained Language Models @@ -10967,6 +11738,7 @@ in the Case of Unambiguous Gender The ability to efficiently learn from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult. This is a challenging setting both academically and practically—particularly because training neural models typically requires large amounts of labeled data. More recently, advances in pretraining on unlabelled data have brought up the potential of better zero-shot or few-shot learning (Devlin et al., 2019; Brown et al., 2020). In particular, over the past year, a great deal of research has been conducted to better learn from limited data using large-scale language models. In this tutorial, we aim at bringing interested NLP researchers up to speed about the recent and ongoing techniques for zero- and few-shot learning with pretrained language models. Additionally, our goal is to reveal new research opportunities to the audience, which will hopefully bring us closer to addressing existing challenges in this domain. 2022.acl-tutorials.6 beltagy-etal-2022-zero + 10.18653/v1/2022.acl-tutorials.6 Vision-Language Pretraining: Current Trends and the Future @@ -10978,6 +11750,7 @@ in the Case of Unambiguous Gender 2022.acl-tutorials.7 agrawal-etal-2022-vision Visual Question Answering + 10.18653/v1/2022.acl-tutorials.7 Natural Language Processing for Multilingual Task-Oriented Dialogue @@ -10990,6 +11763,7 @@ in the Case of Unambiguous Gender Recent advances in deep learning have also enabled fast progress in the research of task-oriented dialogue (ToD) systems. However, the majority of ToD systems are developed for English and merely a handful of other widely spoken languages, e.g., Chinese and German. This hugely limits the global reach and, consequently, the transformative socioeconomic potential of such systems. In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current research gaps, challenges and initiatives related to multilingual ToD systems, with a particular focus on their connections to current research and challenges in multilingual and low-resource NLP. The tutorial will aim to provide answers or shed new light on the following questions: a) Why are multilingual dialogue systems so hard to build: what makes multilinguality for dialogue more challenging than for other NLP applications and tasks? b) What are the best existing methods and datasets for multilingual and cross-lingual (task-oriented) dialog systems? How are (multilingual) ToD systems usually evaluated? c) What are the promising future directions for multilingual ToD research: where can one draw inspiration from related NLP areas and tasks? 2022.acl-tutorials.8 razumovskaia-etal-2022-natural + 10.18653/v1/2022.acl-tutorials.8 diff --git a/data/xml/2022.bigscience.xml b/data/xml/2022.bigscience.xml index 4b10218a6d..ae20e9fc45 100644 --- a/data/xml/2022.bigscience.xml +++ b/data/xml/2022.bigscience.xml @@ -33,6 +33,7 @@ jin-etal-2022-lifelong S2ORC SciERC + 10.18653/v1/2022.bigscience-1.1 Using <fixed-case>ASR</fixed-case>-Generated Text for Spoken Language Modeling @@ -47,6 +48,7 @@ This paper aims at improving spoken language modeling (LM) using a very large amount of automatically transcribed speech.
We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR on 350,000 hours of diverse TV shows. From this, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or through training an LM from scratch. The new models (FlauBERT-Oral) will be shared with the community and are evaluated not only in terms of word prediction accuracy but also for two downstream tasks: classification of TV shows and syntactic parsing of speech. Experimental results show that FlauBERT-Oral is better than its initial FlauBERT version, demonstrating that, despite its inherent noisy nature, ASR-generated text can be useful to improve spoken language modeling. 2022.bigscience-1.2 herve-etal-2022-using + 10.18653/v1/2022.bigscience-1.2 You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings @@ -71,6 +73,7 @@ 2022.bigscience-1.3 talat-etal-2022-reap CrowS-Pairs + 10.18653/v1/2022.bigscience-1.3 Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model @@ -87,6 +90,7 @@ SQuAD SST SuperGLUE + 10.18653/v1/2022.bigscience-1.4 <fixed-case>UNIREX</fixed-case>: A Unified Learning Framework for Language Model Rationale Extraction @@ -107,6 +111,7 @@ MultiRC SST e-SNLI + 10.18653/v1/2022.bigscience-1.5 Pipelines for Social Bias Testing of Large Language Models @@ -117,6 +122,7 @@ The maturity level of language models is now at a stage in which many companies rely on them to solve various tasks. However, while research has shown how biased and harmful these models are, systematic ways of integrating social bias tests into development pipelines are still lacking. This short paper suggests how to use these verification techniques in development pipelines. We take inspiration from software testing and suggest addressing social bias evaluation as software testing. We hope to open a discussion on the best methodologies to handle social bias testing in language models. 2022.bigscience-1.6 nozza-etal-2022-pipelines + 10.18653/v1/2022.bigscience-1.6 Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0 @@ -131,6 +137,7 @@ In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in 3 languages as a test-bed, we use prompts to extract possible named entities. Our results show that a naive approach for prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but highlights the potential of such an approach for historical languages lacking labeled datasets. Moreover, we also find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
2022.bigscience-1.7 de-toni-etal-2022-entities + 10.18653/v1/2022.bigscience-1.7 A Holistic Assessment of the Carbon Footprint of Noor, a Very Large <fixed-case>A</fixed-case>rabic Language Model @@ -144,6 +151,7 @@ 2022.bigscience-1.8 lakim-etal-2022-holistic CCNet + 10.18653/v1/2022.bigscience-1.8 <fixed-case>GPT</fixed-case>-<fixed-case>N</fixed-case>eo<fixed-case>X</fixed-case>-20<fixed-case>B</fixed-case>: An Open-Source Autoregressive Language Model @@ -179,6 +187,7 @@ PROST SuperGLUE The Pile + 10.18653/v1/2022.bigscience-1.9 Dataset Debt in Biomedical Language Modeling @@ -200,6 +209,7 @@ fries-etal-2022-dataset BLUE BLURB + 10.18653/v1/2022.bigscience-1.10 Emergent Structures and Training Dynamics in Large Language Models @@ -214,6 +224,7 @@ Large language models have achieved success on a number of downstream tasks, particularly in a few- and zero-shot manner. As a consequence, researchers have been investigating both the kind of information these networks learn and how such information can be encoded in the parameters of the model. We survey the literature on changes in the network during training, drawing from work outside of NLP when necessary, and on learned representations of linguistic features in large language models. We note in particular the lack of sufficient research on the emergence of functional units, subsections of the network where related functions are grouped or organised, within large language models and motivate future work that grounds the study of language models in an analysis of their changing internal structure during training time. 2022.bigscience-1.11 teehan-etal-2022-emergent + 10.18653/v1/2022.bigscience-1.11 Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned @@ -245,6 +256,7 @@ WSC WebText WiC + 10.18653/v1/2022.bigscience-1.12 diff --git a/data/xml/2022.bionlp.xml b/data/xml/2022.bionlp.xml index dd540d0613..482668f3aa 100644 --- a/data/xml/2022.bionlp.xml +++ b/data/xml/2022.bionlp.xml @@ -27,6 +27,7 @@ The healthcare domain suffers from the spread of poor-quality articles on the Internet. While manual efforts exist to check the quality of online healthcare articles, they are not sufficient to assess all those in circulation. Such quality assessment can be automated as a text classification task; however, explanations for the labels are necessary for the users to trust the model predictions. While current explainable systems tackle explanation generation as summarization, we propose a new approach based on question answering (QA) that allows us to generate explanations for multiple criteria using a single model. We show that this QA-based approach is competitive with the current state-of-the-art, and complements summarization-based models for explainable quality assessment. We also introduce a human evaluation protocol more appropriate than automatic metrics for the evaluation of explanation generation models. 2022.bionlp-1.1 boissonnet-etal-2022-explainable + 10.18653/v1/2022.bionlp-1.1 A sequence-to-sequence approach for document-level relation extraction @@ -41,6 +42,7 @@ BC5CDR CDR DocRED + 10.18653/v1/2022.bionlp-1.2 Position-based Prompting for Health Outcome Generation @@ -52,6 +54,7 @@ Probing factual knowledge in Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases.
To this end, this phenomenon has been effective, especially when these LMs are fine-tuned towards not just the data, but also the style or linguistic pattern of the prompts themselves. We observe that satisfying a particular linguistic pattern in prompts is an unsustainable, time-consuming constraint in the probing task, especially because prompts are often manually designed and the range of possible prompt template patterns can vary depending on the prompting task. To alleviate this constraint, we propose using a position-attention mechanism to capture positional information of each word in a prompt relative to the mask to be filled, hence avoiding the need to re-construct prompts when the prompts’ linguistic pattern changes. Using our approach, we demonstrate the ability to elicit answers (in a case study on health outcome generation) not only for common prompt templates like Cloze and Prefix but also for rare ones, such as Postfix and Mixed patterns whose masks are respectively at the start and in multiple random places of the prompt. Moreover, using various biomedical PLMs, our approach consistently outperforms a baseline in which the default PLM representation is used to predict masked tokens. 2022.bionlp-1.3 abaho-etal-2022-position + 10.18653/v1/2022.bionlp-1.3 How You Say It Matters: Measuring the Impact of Verbal Disfluency Tags on Automated Dementia Detection @@ -63,6 +66,7 @@ 2022.bionlp-1.4 farzana-etal-2022-say ashwindeshpande96/measuring_the_impact_of_verbal_disfluency_tags_on_automated_dementia_detection + 10.18653/v1/2022.bionlp-1.4 Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training @@ -76,6 +80,7 @@ soleimani-etal-2022-zero asoleimanib/zeroshotaspectbased FacetSum + 10.18653/v1/2022.bionlp-1.5 Data Augmentation for Biomedical Factoid Question Answering @@ -90,6 +95,7 @@ BIOMRC BioASQ SQuAD + 10.18653/v1/2022.bionlp-1.6 Slot Filling for Biomedical Information Extraction @@ -104,6 +110,7 @@ ypapanik/biomedical-slot-filling KILT Natural Questions + 10.18653/v1/2022.bionlp-1.7 Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations @@ -116,6 +123,7 @@ zeng-etal-2022-automatic GanjinZero/CODER BC5CDR + 10.18653/v1/2022.bionlp-1.8 <fixed-case>B</fixed-case>io<fixed-case>BART</fixed-case>: Pretraining and Evaluation of A Biomedical Generative Language Model @@ -138,6 +146,7 @@ MeQSum MedMentions Semantic Scholar + 10.18653/v1/2022.bionlp-1.9 Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation @@ -150,6 +159,7 @@ Medical dialogue systems have the potential to assist doctors in expanding access to medical care, improving the quality of patient experiences, and lowering medical expenses. The computational methods are still in their early stages and are not ready for widespread application despite their great potential. Existing transformer-based language models have shown promising results but lack domain-specific knowledge. However, to diagnose like doctors, an automatic medical diagnosis necessitates more stringent requirements for the rationality of the dialogue in the context of relevant knowledge. In this study, we propose a new method that addresses the challenges of medical dialogue generation by incorporating medical knowledge into transformer-based language models. We present a method that leverages an external medical knowledge graph and injects triples as domain knowledge into the utterances.
Automatic and human evaluation on a publicly available dataset demonstrates that incorporating medical knowledge outperforms several state-of-the-art baseline methods. 2022.bionlp-1.10 naseem-etal-2022-incorporating + 10.18653/v1/2022.bionlp-1.10 Memory-aligned Knowledge Graph for Clinically Accurate Radiology Image Report Generation @@ -158,6 +168,7 @@ Automatically generating clinically accurate radiology reports from X-ray images is important but challenging. The identification of multi-grained abnormal regions in images and the corresponding abnormalities is difficult for data-driven neural models. In this work, we introduce a Memory-aligned Knowledge Graph (MaKG) of clinical abnormalities to better learn the visual patterns of abnormalities and their relationships by integrating it into a deep model architecture for report generation. We carry out extensive experiments and show that the proposed MaKG deep model can improve the clinical accuracy of the generated reports. 2022.bionlp-1.11 yan-2022-memory + 10.18653/v1/2022.bionlp-1.11 Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts @@ -167,6 +178,7 @@ Data augmentation is important in addressing data sparsity and low resources in NLP. Unlike data augmentation for other tasks such as sentence-level and sentence-pair ones, data augmentation for named entity recognition (NER) requires preserving the semantics of entities. To that end, in this paper we propose a simple semantic-based data augmentation method for biomedical NER. Our method leverages semantic information from pre-trained language models at both the entity level and the sentence level. Experimental results on two datasets, i2b2-2010 (English) and VietBioNER (Vietnamese), showed that the proposed method could improve NER performance. 2022.bionlp-1.12 phan-nguyen-2022-simple + 10.18653/v1/2022.bionlp-1.12 Auxiliary Learning for Named Entity Recognition with Multiple Auxiliary Biomedical Training Data @@ -181,6 +193,7 @@ 2022.bionlp-1.13 watanabe-etal-2022-auxiliary NCBI Disease + 10.18653/v1/2022.bionlp-1.13 <fixed-case>SNP</fixed-case>2<fixed-case>V</fixed-case>ec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study @@ -196,6 +209,7 @@ 2022.bionlp-1.14 cahyawijaya-etal-2022-snp2vec hltchkust/snp2vec + 10.18653/v1/2022.bionlp-1.14 Biomedical <fixed-case>NER</fixed-case> using Novel Schema and Distant Supervision @@ -207,6 +221,7 @@ Biomedical Named Entity Recognition (BMNER) is one of the most important tasks in the field of biomedical text mining. Most work so far on this task has not focused on identification of discontinuous and overlapping entities, even though they are present in significant fractions in real-life biomedical datasets. In this paper, we introduce a novel annotation schema to capture complex entities, and explore the effects of distant supervision on our deep-learning sequence labelling model. For the BMNER task, our annotation schema outperforms other BIO-based annotation schemes on the same model. We also achieve higher F1-scores than state-of-the-art models on multiple corpora without fine-tuning embeddings, highlighting the efficacy of neural feature extraction using our model. 2022.bionlp-1.15 khandelwal-etal-2022-biomedical + 10.18653/v1/2022.bionlp-1.15 Improving Supervised Drug-Protein Relation Extraction with Distantly Supervised Models @@ -217,6 +232,7 @@ This paper proposes novel drug-protein relation extraction models that indirectly utilize distant supervision data.
Concretely, instead of adding distant supervision data to the manually annotated training data, our models incorporate distantly supervised models, that is, relation extraction models trained with distant supervision data. Distantly supervised learning has been proposed to generate a large amount of pseudo-training data at low cost. However, there is still a problem of low prediction performance due to the inclusion of mislabeled data. Therefore, several methods have been proposed to suppress the effects of noisy cases by utilizing some manually annotated training data. However, their performance is lower than that of supervised learning on manually annotated data because mislabeled data that cannot be fully suppressed becomes noise when training the model. To overcome this issue, our methods indirectly utilize distant supervision data with manually annotated training data. The experimental results on the DrugProt corpus in the BioCreative VII Track 1 showed that our proposed model can consistently improve the supervised models in different settings. 2022.bionlp-1.16 iinuma-etal-2022-improving + 10.18653/v1/2022.bionlp-1.16 Named Entity Recognition for Cancer Immunology Research Using Distant Supervision @@ -227,6 +243,7 @@ Cancer immunology research involves several important cell and protein factors. Extracting the information on such cells and proteins and the interactions between them from text is crucial in text mining for cancer immunology research. However, there are few available datasets for these entities, and the number of annotated documents is not sufficient compared with other major named entity types. In this work, we introduce our automatically annotated dataset of key named entities, i.e., T-cells, cytokines, and transcription factors, which are engaged in recent cancer immunotherapy. The entities are annotated based on the UniProtKB knowledge base using dictionary matching. We build a neural named entity recognition (NER) model to be trained on this dataset and evaluate it on manually annotated data. Experimental results show that we can achieve promising NER performance even though our data is automatically annotated. Our dataset also enhances the NER performance when combined with existing data, especially gaining improvement on named entities that have not yet been widely investigated, such as cytokines and transcription factors. 2022.bionlp-1.17 trieu-etal-2022-named + 10.18653/v1/2022.bionlp-1.17 Intra-Template Entity Compatibility based Slot-Filling for Clinical Trial Information Extraction @@ -236,6 +253,7 @@ We present a deep learning based information extraction system that can extract the design and results of a published abstract describing a Randomized Controlled Trial (RCT). In contrast to other approaches, our system does not regard the PICO elements as flat objects or labels but as structured objects. We thus model the task as one of filling a set of templates and slots; our two-step approach recognizes relevant slot candidates as a first step and assigns them to a corresponding template as a second step, relying on a learned pairwise scoring function that models the compatibility of the different slot values. We evaluate the approach on a dataset of 211 manually annotated abstracts for type 2 Diabetes and Glaucoma, showing the positive impact of modelling intra-template entity compatibility.
As its main benefit, our approach yields a structured object for every RCT abstract that supports the aggregation and summarization of clinical trial results across published studies and can facilitate the task of creating a systematic review or meta-analysis. 2022.bionlp-1.18 witte-cimiano-2022-intra + 10.18653/v1/2022.bionlp-1.18 Pretrained Biomedical Language Models for Clinical <fixed-case>NLP</fixed-case> in <fixed-case>S</fixed-case>panish @@ -253,6 +271,7 @@ 2022.bionlp-1.19 carrino-etal-2022-pretrained PlanTL-GOB-ES/lm-biomedical-clinical-es + 10.18653/v1/2022.bionlp-1.19 Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts @@ -268,6 +287,7 @@ amin-etal-2022-shot suamin/t2ner CoNLL 2002 + 10.18653/v1/2022.bionlp-1.20 <fixed-case>VPAI</fixed-case>_<fixed-case>L</fixed-case>ab at <fixed-case>M</fixed-case>ed<fixed-case>V</fixed-case>id<fixed-case>QA</fixed-case> 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification @@ -283,6 +303,7 @@ lireanstar/medvidcl Kinetics MedVidQA + 10.18653/v1/2022.bionlp-1.21 <fixed-case>G</fixed-case>en<fixed-case>C</fixed-case>ompare<fixed-case>S</fixed-case>um: a hybrid unsupervised summarization method using salience @@ -298,6 +319,7 @@ Pubmed S2ORC arXiv + 10.18653/v1/2022.bionlp-1.22 <fixed-case>B</fixed-case>io<fixed-case>C</fixed-case>ite: A Deep Learning-based Citation Linkage Framework for Biomedical Research Articles @@ -307,6 +329,7 @@ Research papers reflect scientific advances. Citations are widely used in research publications to support the new findings and show their benefits, while also regulating the information flow to make the contents clearer for the audience. A citation in a research article refers to the information’s source, but not the specific text span from that source article. In biomedical research articles, this task is challenging as the same chemical or biological component can be represented in multiple ways in different papers from various domains. This paper suggests a mechanism for linking citing sentences in a publication with cited sentences in referenced sources. The framework presented here pairs the citing sentence with all of the sentences in the reference text, and then tries to retrieve the semantically equivalent pairs. These semantically related sentences from the reference paper are chosen as the cited statements. This effort involves designing a citation linkage framework utilizing sequential and tree-structured siamese deep learning models. This paper also provides a method to create a synthetic corpus for such a task. 2022.bionlp-1.23 singha-roy-mercer-2022-biocite + 10.18653/v1/2022.bionlp-1.23 Low Resource Causal Event Detection from Biomedical Literature @@ -318,6 +341,7 @@ Recognizing causal precedence relations among the chemical interactions in biomedical literature is crucial to understanding the underlying biological mechanisms. However, detecting such causal relations can be hard because: (1) many times, such causal relations among events are not explicitly expressed by certain phrases but implied by very diverse expressions in the text, and (2) annotating such causal relation detection datasets requires considerable expert knowledge and effort. In this paper, we propose a strategy to address both challenges by training neural models with in-domain pre-training and knowledge distillation.
We show that, by using a very limited amount of labeled data and a sufficient amount of unlabeled data, the neural models outperform previous baselines on the causal precedence detection task, and are ten times faster at inference compared to the BERT base model. 2022.bionlp-1.24 liang-etal-2022-low + 10.18653/v1/2022.bionlp-1.24 Overview of the <fixed-case>M</fixed-case>ed<fixed-case>V</fixed-case>id<fixed-case>QA</fixed-case> 2022 Shared Task on Medical Video Question-Answering @@ -329,6 +353,7 @@ gupta-demner-fushman-2022-overview HowTo100M MedVidQA + 10.18653/v1/2022.bionlp-1.25 Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations @@ -339,6 +364,7 @@ It is commonly claimed that inter-annotator agreement (IAA) is the ceiling of machine learning (ML) performance, i.e., that the agreement between an ML system’s predictions and an annotator cannot be higher than the agreement between two annotators. Although Boguslav & Cohen (2017) showed that this claim is falsified by many real-world ML systems, the claim has persisted. As a complement to this real-world evidence, we conducted a comprehensive set of simulations, and show that an ML model can beat IAA even if (and especially if) annotators are noisy and differ in their underlying classification functions, as long as the ML model is reasonably well-specified. Although the latter condition has long been elusive, leading ML models to underperform IAA, we anticipate that this condition will be increasingly met in the era of big data and deep learning. Our work has implications for (1) maximizing the value of machine learning, (2) adherence to ethical standards in computing, and (3) economical use of annotated resources, which is paramount in settings where annotation is especially expensive, like biomedical natural language processing. 2022.bionlp-1.26 richie-etal-2022-inter + 10.18653/v1/2022.bionlp-1.26 Conversational Bots for Psychotherapy: A Study of Generative Transformer Models Using Domain-specific Dialogues @@ -356,6 +382,7 @@ 2022.bionlp-1.27 das-etal-2022-conversational WebText + 10.18653/v1/2022.bionlp-1.27 <fixed-case>BEEDS</fixed-case>: Large-Scale Biomedical Event Extraction using Distant Supervision and Question Answering @@ -367,6 +394,7 @@ 2022.bionlp-1.28 wang-etal-2022-beeds wangxii/beeds + 10.18653/v1/2022.bionlp-1.28 Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection @@ -376,6 +404,7 @@ We study the problem of entity detection and normalization applied to patient self-reports of symptoms that arise as side-effects of vaccines. Our application domain presents unique challenges that render traditional classification methods ineffective: the number of entity types is large; and many symptoms are rare, resulting in a long-tail distribution of training examples per entity type. We tackle these challenges with an autoregressive model that generates standardized names of symptoms. We introduce a data augmentation technique to increase the number of training examples for rare symptoms. Experiments on real-life patient vaccine symptom self-reports show that our approach outperforms strong baselines, and that additional examples improve performance on the long-tail entities.
2022.bionlp-1.29 kim-nakashole-2022-data + 10.18653/v1/2022.bionlp-1.29 Improving <fixed-case>R</fixed-case>omanian <fixed-case>B</fixed-case>io<fixed-case>NER</fixed-case> Using a Biologically Inspired System @@ -385,6 +414,7 @@ Recognition of named entities present in text is an important step towards information extraction and natural language understanding. This work presents a named entity recognition system for the Romanian biomedical domain. The system makes use of a new and extended version of the SiMoNERo corpus, which is open-sourced. Also, the best system is available for direct usage in the RELATE platform. 2022.bionlp-1.30 mitrofan-pais-2022-improving + 10.18653/v1/2022.bionlp-1.30 <fixed-case>B</fixed-case>angla<fixed-case>B</fixed-case>io<fixed-case>M</fixed-case>ed: A Biomedical Named-Entity Annotated Corpus for <fixed-case>B</fixed-case>angla (<fixed-case>B</fixed-case>engali) @@ -394,6 +424,7 @@ 2022.bionlp-1.31 sazzed-2022-banglabiomed CoWeSe + 10.18653/v1/2022.bionlp-1.31 <fixed-case>ICDB</fixed-case>ig<fixed-case>B</fixed-case>ird: A Contextual Embedding Model for <fixed-case>ICD</fixed-case> Code Classification @@ -406,6 +437,7 @@ The International Classification of Diseases (ICD) system is the international standard for classifying diseases and procedures during a healthcare encounter and is widely used for healthcare reporting and management purposes. Assigning correct codes for clinical procedures is important for clinical, operational and financial decision-making in healthcare. Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. However, these models have yet to achieve state-of-the-art results in the ICD classification task, since one of their main disadvantages is that they can only process documents that contain a small number of tokens, which is rarely the case with real patient notes. In this paper, we introduce ICDBigBird, a BigBird-based model which can integrate a Graph Convolutional Network (GCN), that takes advantage of the relations between ICD codes in order to create ‘enriched’ representations of their embeddings, with a BigBird contextual model that can process larger documents. Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task as it outperforms the previous state-of-the-art models. 2022.bionlp-1.32 michalopoulos-etal-2022-icdbigbird + 10.18653/v1/2022.bionlp-1.32 Doctor <fixed-case>XA</fixed-case>v<fixed-case>I</fixed-case>er: Explainable Diagnosis on Physician-Patient Dialogues and <fixed-case>XAI</fixed-case> Evaluation @@ -416,6 +448,7 @@ 2022.bionlp-1.33 ngai-rudzicz-2022-doctor hillary-ngai/doctor_xavier + 10.18653/v1/2022.bionlp-1.33 <fixed-case>DISTANT</fixed-case>-<fixed-case>CTO</fixed-case>: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature @@ -425,6 +458,7 @@ PICO recognition is an information extraction task for identifying participant, intervention, comparator, and outcome information from clinical literature. Manually identifying PICO information is the most time-consuming step for conducting systematic reviews (SR), which is already labor-intensive. A lack of diversified and large, annotated corpora restricts innovation and adoption of automated PICO recognition systems. The largest-available PICO entity/span corpus is manually annotated, which is too expensive for a majority of the scientific community.
To break through this bottleneck, we propose DISTANT-CTO, a novel distantly supervised PICO entity extraction approach using the clinical trials literature, to generate a massive weakly-labeled dataset with more than a million ‘Intervention’ and ‘Comparator’ entity annotations. We train distant NER (named-entity recognition) models using this weakly-labeled dataset and demonstrate that it outperforms even the sophisticated models trained on the manually annotated dataset with a 2% F1 improvement on the Intervention entity of the PICO benchmark and more than 5% improvement when combined with the manually annotated dataset. We investigate the generalizability of our approach and gain an impressive F1 score on another domain-specific PICO benchmark. The approach is not only zero-cost but is also scalable for a constant stream of PICO entity annotations. 2022.bionlp-1.34 dhrangadhariya-muller-2022-distant + 10.18653/v1/2022.bionlp-1.34 <fixed-case>E</fixed-case>cho<fixed-case>G</fixed-case>en: Generating Conclusions from Echocardiogram Notes @@ -440,6 +474,7 @@ 2022.bionlp-1.35 tang-etal-2022-echogen MIMIC-III + 10.18653/v1/2022.bionlp-1.35 Quantifying Clinical Outcome Measures in Patients with Epilepsy Using the Electronic Health Record @@ -451,6 +486,7 @@ A wealth of important clinical information lies untouched in the Electronic Health Record, often in the form of unstructured textual documents. For patients with Epilepsy, such information includes outcome measures like Seizure Frequency and Dates of Last Seizure, key parameters that guide all therapy for these patients. Transformer models have been able to extract such outcome measures from unstructured clinical note text as sentences with human-like accuracy; however, these sentences are not yet usable in a quantitative analysis for large-scale studies. In this study, we developed a pipeline to quantify these outcome measures. We used text summarization models to convert unstructured sentences into specific formats, and then employed rules-based quantifiers to calculate seizure frequencies and dates of last seizure. We demonstrated that our pipeline of models does not excessively propagate errors and we analyzed its mistakes. We anticipate that our methods can be generalized outside of epilepsy to other disorders to drive large-scale clinical research. 2022.bionlp-1.36 xie-etal-2022-quantifying + 10.18653/v1/2022.bionlp-1.36 Comparing Encoder-Only and Encoder-Decoder Transformers for Relation Extraction from Biomedical Texts: An Empirical Study on Ten Benchmark Datasets @@ -462,6 +498,7 @@ 2022.bionlp-1.37 sarrouti-etal-2022-comparing DDI + 10.18653/v1/2022.bionlp-1.37 Utility Preservation of Clinical Text After De-Identification @@ -471,6 +508,7 @@ Electronic health records contain valuable information about symptoms, diagnosis, treatment and outcomes of the treatments of individual patients. However, the records may also contain information that can reveal the identity of the patients. Removing these identifiers - the Protected Health Information (PHI) - can protect the identity of the patient. Automatic de-identification is a process which employs machine learning techniques to detect and remove PHI. However, automatic techniques are imperfect in their precision and introduce noise into the data. This study examines the impact of this noise on the utility of Swedish de-identified clinical data by using human evaluators and by training and testing BERT models.
Our results indicate that de-identification does not harm the utility for clinical NLP and that human evaluators are less sensitive to noise from de-identification than expected. 2022.bionlp-1.38 vakili-dalianis-2022-utility + 10.18653/v1/2022.bionlp-1.38 Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for <fixed-case>ICD</fixed-case>-9 Coding @@ -483,6 +521,7 @@ 2022.bionlp-1.39 falis-etal-2022-horses MIMIC-III + 10.18653/v1/2022.bionlp-1.39 Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models @@ -495,6 +534,7 @@ 2022.bionlp-1.40 chandak-etal-2022-towards vt-nlp/sciarg + 10.18653/v1/2022.bionlp-1.40 Model Distillation for Faithful Explanations of Medical Code Predictions @@ -505,6 +545,7 @@ Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human-machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. Work in explainable AI must balance competing objectives along two different axes: 1) Models should ideally be both accurate and simple. 2) Explanations must balance faithfulness to the model’s decision-making with their plausibility to a domain expert. We propose to use knowledge distillation, or training a student model that mimics the behavior of a trained teacher model, as a technique to generate faithful and plausible explanations. We evaluate our approach on the task of assigning ICD codes to clinical notes to demonstrate that the student model is faithful to the teacher model’s behavior and produces quality natural language explanations. 2022.bionlp-1.41 wood-doughty-etal-2022-model + 10.18653/v1/2022.bionlp-1.41 Towards Generalizable Methods for Automating Risk Score Calculation @@ -521,6 +562,7 @@ liang-etal-2022-towards MIMIC-III emrQA + 10.18653/v1/2022.bionlp-1.42 <fixed-case>D</fixed-case>o<fixed-case>SSIER</fixed-case> at <fixed-case>M</fixed-case>ed<fixed-case>V</fixed-case>id<fixed-case>QA</fixed-case> 2022: Text-based Approaches to Medical Video Answer Localization Problem @@ -534,6 +576,7 @@ 2022.bionlp-1.43 kusa-etal-2022-dossier MedVidQA + 10.18653/v1/2022.bionlp-1.43 diff --git a/data/xml/2022.cmcl.xml b/data/xml/2022.cmcl.xml index ae2be7a3ad..b5d65c0ebc 100644 --- a/data/xml/2022.cmcl.xml +++ b/data/xml/2022.cmcl.xml @@ -31,6 +31,7 @@ DannyMerkx/speech2image COCO ImageNet + 10.18653/v1/2022.cmcl-1.1 A Neural Model for Compositional Word Embeddings and Sentence Processing @@ -40,6 +41,7 @@ We propose a new neural model for word embeddings, which uses Unitary Matrices as the primary device for encoding lexical information. It uses simple matrix multiplication to derive matrices for large units, yielding a sentence processing model that is strictly compositional, does not lose information over time steps, and is transparent, in the sense that word embeddings can be analysed regardless of context. This model does not employ activation functions, and so the network is fully accessible to analysis by the methods of linear algebra at each point in its operation on an input sequence. We test it in two NLP agreement tasks and obtain rule-like perfect accuracy, with greater stability than current state-of-the-art systems.
Our proposed model goes some way towards offering a class of computationally powerful deep learning systems that can be fully understood and compared to human cognitive processes for natural language learning and representation. 2022.cmcl-1.2 lappin-bernardy-2022-neural + 10.18653/v1/2022.cmcl-1.2 Visually Grounded Interpretation of Noun-Noun Compounds in <fixed-case>E</fixed-case>nglish @@ -52,6 +54,7 @@ 2022.cmcl-1.3 lang-etal-2022-visually ImageNet + 10.18653/v1/2022.cmcl-1.3 Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via <fixed-case>CLIP</fixed-case> @@ -63,6 +66,7 @@ 2022.cmcl-1.4 takmaz-etal-2022-less ecekt/clip-desc-disc + 10.18653/v1/2022.cmcl-1.4 Codenames as a Game of Co-occurrence Counting @@ -75,6 +79,7 @@ 2022.cmcl-1.5 cserhati-etal-2022-codenames xerevity/codenamesagent + 10.18653/v1/2022.cmcl-1.5 Estimating word co-occurrence probabilities from pretrained static embeddings using a log-bilinear model @@ -83,6 +88,7 @@ We investigate how to use pretrained static word embeddings to deliver improved estimates of bilexical co-occurrence probabilities: conditional probabilities of one word given a single other word in a specific relationship. Such probabilities play important roles in psycholinguistics, corpus linguistics, and usage-based cognitive modeling of language more generally. We propose a log-bilinear model taking pretrained vector representations of the two words as input, enabling generalization based on the distributional information contained in both vectors. We show that this model outperforms baselines in estimating probabilities of adjectives given nouns that they attributively modify, and probabilities of nominal direct objects given their head verbs, given limited training data in Arabic, English, Korean, and Spanish. 2022.cmcl-1.6 futrell-2022-estimating + 10.18653/v1/2022.cmcl-1.6 Modeling the Relationship between Input Distributions and Learning Trajectories with the Tolerance Principle @@ -91,6 +97,7 @@ Child language learners develop with remarkable uniformity, both in their learning trajectories and ultimate outcomes, despite major differences in their learning environments. In this paper, we explore the role that the frequencies and distributions of irregular lexical items in the input play in driving learning trajectories. We conclude that while the Tolerance Principle, a type-based model of productivity learning, accounts for inter-learner uniformity, it also interacts with input distributions to drive cross-linguistic variation in learning trajectories. 2022.cmcl-1.7 kodner-2022-modeling + 10.18653/v1/2022.cmcl-1.7 Predicting scalar diversity with context-driven uncertainty over alternatives @@ -101,6 +108,7 @@ Scalar implicature (SI) arises when a speaker uses an expression (e.g., “some”) that is semantically compatible with a logically stronger alternative on the same scale (e.g., “all”), leading the listener to infer that they did not intend to convey the stronger meaning. Prior work has demonstrated that SI rates are highly variable across scales, raising the question of what factors determine the SI strength for a particular scale. Here, we test the hypothesis that SI rates depend on the listener’s confidence in the underlying scale, which we operationalize as uncertainty over the distribution of possible alternatives conditioned on the context. We use a T5 model fine-tuned on a text infilling task to estimate this distribution.
We find that scale uncertainty predicts human SI rates, measured as entropy over the sampled alternatives and over latent classes among alternatives in sentence embedding space. Furthermore, we do not find a significant effect of the surprisal of the strong scalemate. Our results suggest that pragmatic inferences depend on listeners’ context-driven uncertainty over alternatives. 2022.cmcl-1.8 hu-etal-2022-predicting + 10.18653/v1/2022.cmcl-1.8 Eye Gaze and Self-attention: How Humans and Transformers Attend Words in Sentences @@ -119,6 +127,7 @@ GLUE MovieQA SuperGLUE + 10.18653/v1/2022.cmcl-1.9 About Time: Do Transformers Learn Temporal Verbal Aspect? @@ -130,6 +139,7 @@ 2022.cmcl-1.10 metheniti-etal-2022-time lenakmeth/telicity_classification + 10.18653/v1/2022.cmcl-1.10 Poirot at <fixed-case>CMCL</fixed-case> 2022 Shared Task: Zero Shot Crosslingual Eye-Tracking Data Prediction using Multilingual Transformer Models @@ -138,6 +148,7 @@ Eye tracking data during reading is a useful source of information to understand the cognitive processes that take place during language comprehension. Different languages account for different cognitive triggers; however, there seem to be some uniform indicators across languages. In this paper, we describe our submission to the CMCL 2022 shared task on predicting human reading patterns for a multilingual dataset. Our model uses text representations from transformers and some hand-engineered features with a regression layer on top to predict statistical measures of mean and standard deviation for 2 main eye-tracking features. We train an end-to-end model to extract meaningful information from different languages and test our model on two separate datasets. We compare different transformer models and show ablation studies affecting model performance. Our final submission ranked 4th place for SubTask-1 and 1st place for SubTask-2 for the shared task. 2022.cmcl-1.11 srivastava-2022-poirot + 10.18653/v1/2022.cmcl-1.11 <fixed-case>NU</fixed-case> <fixed-case>HLT</fixed-case> at <fixed-case>CMCL</fixed-case> 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space @@ -147,6 +158,7 @@ 2022.cmcl-1.12 imperial-2022-nu imperialite/cmcl2022-unified-eye-tracking-ipa + 10.18653/v1/2022.cmcl-1.12 <fixed-case>H</fixed-case>k<fixed-case>A</fixed-case>msters at <fixed-case>CMCL</fixed-case> 2022 Shared Task: Predicting Eye-Tracking Data from a Gradient Boosting Framework with Linguistic Features @@ -157,6 +169,7 @@ Eye movement data are used in psycholinguistic studies to infer information regarding cognitive processes during reading. In this paper, we describe our proposed method for the Shared Task of Cognitive Modeling and Computational Linguistics (CMCL) 2022 - Subtask 1, which involves data from multiple datasets on 6 languages. We compared different regression models using features of the target word and its previous word, and target word surprisal as regression features. Our final system, using a gradient boosting regressor, achieved the lowest mean absolute error (MAE), resulting in the best system of the competition. 2022.cmcl-1.13 salicchi-etal-2022-hkamsters + 10.18653/v1/2022.cmcl-1.13 <fixed-case>CMCL</fixed-case> 2022 Shared Task on Multilingual and Crosslingual Prediction of Human Reading Behavior @@ -170,6 +183,7 @@ We present the second shared task on eye-tracking data prediction of the Cognitive Modeling and Computational Linguistics Workshop (CMCL).
Differently from the previous edition, participating teams are asked to predict eye-tracking features from multiple languages, including a surprise language for which there were no available training data. Moreover, the task also included the prediction of standard deviations of feature values in order to account for individual differences between readers. A total of six teams registered for the task. For the first subtask on multilingual prediction, the winning team proposed a regression model based on lexical features, while for the second subtask on cross-lingual prediction, the winning team used a hybrid model based on multilingual transformer embeddings as well as statistical features. 2022.cmcl-1.14 hollenstein-etal-2022-cmcl + 10.18653/v1/2022.cmcl-1.14 Team <fixed-case>ÚFAL</fixed-case> at <fixed-case>CMCL</fixed-case> 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models @@ -180,6 +194,7 @@ Eye-tracking data is a very useful source of information to study cognition and especially language comprehension in humans. In this paper, we describe our systems for the CMCL 2022 shared task on predicting eye-tracking information. We describe our experiments with pretrained models like BERT and XLM and the different ways in which we used those representations to predict four eye-tracking features. Along with analysing the effect of using two different kinds of pretrained multilingual language models and different ways of pooling the token-level representations, we also explore how contextual information affects the performance of the systems. Finally, we also explore whether factors like augmenting linguistic information affect the predictions. Our submissions achieved an average MAE of 5.72 and ranked 5th in the shared task. The average MAE showed further reduction to 5.25 in post-task evaluation. 2022.cmcl-1.15 bhattacharya-etal-2022-team + 10.18653/v1/2022.cmcl-1.15 Team <fixed-case>DMG</fixed-case> at <fixed-case>CMCL</fixed-case> 2022 Shared Task: Transformer Adapters for the Multi- and Cross-Lingual Prediction of Human Reading Behavior @@ -188,6 +203,7 @@ In this paper, we present the details of our approaches that attained second place in the shared task of the ACL 2022 Cognitive Modeling and Computational Linguistics Workshop. The shared task is focused on multi- and cross-lingual prediction of eye movement features in human reading behavior, which could provide valuable information regarding language processing. To this end, we train ‘adapters’ inserted into the layers of frozen transformer-based pretrained language models. We find that multilingual models equipped with adapters perform well in predicting eye-tracking features. Our results suggest that utilizing language- and task-specific adapters is beneficial and translating test sets into similar languages that exist in the training set could help with zero-shot transferability in the prediction of human reading behavior. 2022.cmcl-1.16 takmaz-2022-team + 10.18653/v1/2022.cmcl-1.16 diff --git a/data/xml/2022.computel.xml b/data/xml/2022.computel.xml index e198a7aca7..6b262d154e 100644 --- a/data/xml/2022.computel.xml +++ b/data/xml/2022.computel.xml @@ -31,6 +31,7 @@ In this paper we present the speech corpus for the Siberian Ingrian Finnish language. The speech corpus includes audio data, annotations, software tools for data-processing, two databases and a web application. We have published part of the audio data and annotations.
The software tool for parsing annotation files and populating a relational database has been developed and published under a free license. A web application has also been developed and is available. At the moment, about 300 words and 200 phrases can be displayed using this web application. 2022.computel-1.1 ubaleht-raudalainen-2022-development + 10.18653/v1/2022.computel-1.1 New syntactic insights for automated <fixed-case>W</fixed-case>olof <fixed-case>U</fixed-case>niversal <fixed-case>D</fixed-case>ependency parsing @@ -39,6 +40,7 @@ Focus on language-specific properties with insights from formal minimalist syntax can improve universal dependency (UD) parsing. Such improvements are especially sensitive for low-resource African languages, like Wolof, which have fewer UD treebanks in number and amount of annotations, and fewer contributing annotators. For two different UD parser pipelines, one parser model was trained on the original Wolof treebank, and one was trained on an edited treebank. For each parser pipeline, the accuracy of the edited treebank was higher than that of the original for both the dependency relations and dependency labels. Accuracy for universal dependency relations improved as much as 2.90%, while accuracy for universal dependency labels increased as much as 3.38%. An annotation scheme that better fits a language’s distinct syntax results in better parsing accuracy. 2022.computel-1.2 dyer-2022-new + 10.18653/v1/2022.computel-1.2 Corpus Development of Kiswahili Speech Recognition Test and Evaluation sets, Preemptively Mitigating Demographic Bias Through Collaboration with Linguists @@ -53,6 +55,7 @@ Language technologies, particularly speech technologies, are becoming more pervasive for access to digital platforms and resources. This brings to the forefront concerns of their inclusivity, first in terms of language diversity. Additionally, research shows speech recognition to be more accurate for men than for women and more accurate for individuals younger than 30 years of age than those older. In the Global South where languages are low resource, these same issues should be taken into consideration in data collection efforts to not replicate these mistakes. It is also important to note that in varying contexts within the Global South, this work presents additional nuance and potential for bias based on accents, related dialects and variants of a language. This paper documents i) the design and execution of a Linguists Engagement for purposes of building an inclusive Kiswahili Speech Recognition dataset, representative of the diversity among speakers of the language ii) the unexpected yet key learning in terms of socio-linguistics, which demonstrates the importance of multi-disciplinarity in teams developing datasets and NLP technologies iii) the creation of a test dataset intended to be used for evaluating the performance of Speech Recognition models on demographic groups that are likely to be underrepresented. 2022.computel-1.3 siminyu-etal-2022-corpus + 10.18653/v1/2022.computel-1.3 <fixed-case>CLD</fixed-case>² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages @@ -63,6 +66,7 @@ Language revitalisation should not be understood as a direct outcome of language documentation, which is mainly focused on the creation of language repositories.
Natural language processing (NLP) offers the potential to complement and exploit these repositories through the development of language technologies that may contribute to improving the vitality status of endangered languages. In this paper, we discuss the current state of the interaction between language documentation and computational linguistics, present a diagnosis of how the outputs of recent documentation projects for endangered languages are underutilised by the NLP community, and discuss how the situation could change from both the documentary linguistics and NLP perspectives. All this is introduced as a bridging paradigm dubbed Computational Language Documentation and Development (CLD²). CLD² calls for (1) the inclusion of NLP-friendly annotated data as a deliverable of future language documentation projects; and (2) the exploitation of language documentation databases by the NLP community to promote the computerization of endangered languages, as one way to contribute to their revitalization. 2022.computel-1.4 zariquiey-etal-2022-cld2 + 10.18653/v1/2022.computel-1.4 One Wug, Two Wug+s Transformer Inflection Models Hallucinate Affixes @@ -72,6 +76,7 @@ Data augmentation strategies are increasingly important in NLP pipelines for low-resourced and endangered languages, and in neural morphological inflection, augmentation by so-called data hallucination is a popular technique. This paper presents a detailed analysis of inflection models trained with and without data hallucination for the low-resourced Canadian Indigenous language Gitksan. Our analysis reveals evidence for a concatenative inductive bias in augmented models—in contrast to models trained without hallucination, they strongly prefer affixing inflection patterns over suppletive ones. We find that preference for affixation in general improves inflection performance in “wug test”-like settings, where the model is asked to inflect lexemes missing from the training set. However, data hallucination dramatically reduces prediction accuracy for reduplicative forms due to a misanalysis of reduplication as affixation. While the overall impact of data hallucination for unseen lexemes remains positive, our findings call for greater qualitative analysis and more varied evaluation conditions in testing automatic inflection systems. Our results indicate that further innovations in data augmentation for computational morphology are desirable. 2022.computel-1.5 samir-silfverberg-2022-one + 10.18653/v1/2022.computel-1.5 Automated speech tools for helping communities process restricted-access corpora for language revival efforts @@ -88,6 +93,7 @@ Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely-used language such as English for meta-linguistic commentary and questions (e.g. What is the word for ‘tree’?).
We integrate voice activity detection (VAD), spoken language identification (SLI), and automatic speech recognition (ASR) to transcribe the metalinguistic content, which an authorised person can quickly scan to triage recordings that can be annotated by people with lower levels of access. We report work-in-progress processing 136 hours of archival audio containing a mix of English and Muruwari. Our collaborative work with the Muruwari custodian of the archival materials shows that this workflow reduces metalanguage transcription time by 20% even given only minimal amounts of annotated training data: 10 utterances per language for SLI, and at most 39 minutes, possibly as little as 39 seconds, for ASR. 2022.computel-1.6 san-etal-2022-automated + 10.18653/v1/2022.computel-1.6 <fixed-case>G</fixed-case><tex-math>_i</tex-math>2<fixed-case>P</fixed-case><tex-math>_i</tex-math> Rule-based, index-preserving grapheme-to-phoneme transformations @@ -105,6 +111,7 @@ This paper describes the motivation and implementation details for a rule-based, index-preserving grapheme-to-phoneme engine ‘G_i2P_i’ implemented in pure Python and released under the open source MIT license. The engine and interface have been designed to prioritize the developer experience of potential contributors without requiring a high level of programming knowledge. ‘G_i2P_i’ already provides mappings for 30 (mostly Indigenous) languages, and the package is accompanied by a web-based interactive development environment, a RESTful API, and extensive documentation to encourage the addition of more mappings in the future. We also present three downstream applications of ‘G_i2P_i’ and show results of a preliminary evaluation. 2022.computel-1.7 pine-etal-2022-gi22pi + 10.18653/v1/2022.computel-1.7 Shallow Parsing for <fixed-case>N</fixed-case>epal <fixed-case>B</fixed-case>hasa Complement Clauses @@ -115,6 +122,7 @@ Accelerating the process of data collection, annotation, and analysis is an urgent need for linguistic fieldwork and documentation of endangered languages (Bird, 2009). Our experiments describe how we maximize the quality of the Nepal Bhasa syntactic complement structure chunking model. Native speaker language consultants were trained to annotate a minimally selected raw data set (Suárez et al., 2019). The embedded clauses, matrix verbs, and embedded verbs are annotated. We apply both statistical training algorithms and transfer learning in our training, including Naive Bayes, MaxEnt, and fine-tuning the pre-trained mBERT model (Devlin et al., 2018). We show that with limited annotated data, the model is already sufficient for the task. The modeling resources we used are largely available for many other endangered languages. The practice is easy to duplicate for training a shallow parser for other endangered languages in general. 2022.computel-1.8 zhang-etal-2022-shallow + 10.18653/v1/2022.computel-1.8 Using <fixed-case>LARA</fixed-case> to create image-based and phonetically annotated multimodal texts for endangered languages @@ -131,6 +139,7 @@ We describe recent extensions to the open source Learning And Reading Assistant (LARA) supporting image-based and phonetically annotated texts. We motivate the utility of these extensions both in general and specifically in relation to endangered and archaic languages, and illustrate with examples from the revived Australian language Barngarla, Icelandic Sign Language, Irish Gaelic, Old Norse manuscripts and Egyptian hieroglyphics.
2022.computel-1.9 bedi-etal-2022-using + 10.18653/v1/2022.computel-1.9 Recovering Text from Endangered Languages Corrupted <fixed-case>PDF</fixed-case> documents @@ -139,6 +148,7 @@ In this paper we present an approach to efficiently recover texts from corrupted documents of endangered languages. Textual resources for such languages are scarce, and sometimes the few available resources are corrupted PDF documents. Endangered languages are not supported by standard tools and present the additional difficulty of not possessing any corpus on which to train language models to assist with the recovery. The approach presented is able to fully recover born-digital PDF documents with minimal effort, thereby helping the preservation effort of endangered languages, by extending the range of documents usable for corpus building. 2022.computel-1.10 stefanovitch-2022-recovering + 10.18653/v1/2022.computel-1.10 Learning Through Transcription @@ -148,6 +158,7 @@ Transcribing speech for primarily oral, local languages is often a joint effort involving speakers and outsiders. It is commonly motivated by externally-defined scientific goals, alongside local motivations such as language acquisition and access to heritage materials. We explore the task of ‘learning through transcription’ through the design of a system for collaborative speech annotation. We have developed a prototype to support local and remote learner-speaker interactions in remote Aboriginal communities in northern Australia. We show that situated systems design for inclusive non-expert practice is a promising new direction for working with speakers of local languages. 2022.computel-1.11 bettinson-bird-2022-learning + 10.18653/v1/2022.computel-1.11 Developing a Part-Of-Speech tagger for te reo <fixed-case>M</fixed-case>āori @@ -160,6 +171,7 @@ This paper discusses the development of a Part-of-Speech tagger for te reo Māori, the Indigenous language of Aotearoa, also known as New Zealand (see Morrison). Henceforth, Part-of-Speech will be referred to as POS throughout this paper and te reo Māori will be referred to as Māori, while Universal Dependencies will be referred to as UD. Prior to the development of this tagger, there was no POS tagger for Māori from Aotearoa. POS taggers tag words according to their syntactic or grammatical category. However, many traditional syntactic categories, and by consequence POS labels, do not “work for” Māori. By this we mean that, for some of the traditional categories: the definition of, or guidelines for, an existing category are not suitable for Māori; there is no existing category for certain word classes of Māori; and they do not reflect a Māori worldview of the Māori language. We wanted a tagset that is usable with industry-wide tools, but we also needed a tagset that would meet the needs of Māori. Therefore, we based our tagset and guidelines on the UD tagset and tagging conventions, however the categorization of words has been significantly altered to be appropriate for Māori. This is because at the time of development of our POS tagger, the UD conventions had still not been used to tag a Polynesian language such as Māori, nor did they provide any guidelines about how to tag them. To that end, we worked with highly-proficient, specially-selected Māori speakers and linguists who are specialists in Māori. This has ensured that our POS labels and guidelines faithfully reflect a Māori speaker’s conceptualization of their language.
2022.computel-1.12 finn-etal-2022-developing + 10.18653/v1/2022.computel-1.12 Challenges and Perspectives for Innu-Aimun within Indigenous Language Technologies @@ -171,6 +183,7 @@ Innu-Aimun is an Algonquian language spoken in Eastern Canada. It is the language of the Innu, an Indigenous people that now lives for the most part in a dozen communities across Quebec and Labrador. Although it is alive, Innu-Aimun faces important preservation and revitalization challenges and issues. The state of its technology is still nascent, with very few existing applications. This paper proposes a first survey of the available linguistic resources and existing technology for Innu-Aimun. Considering the existing linguistic and textual resources, we argue that developing language technology is feasible and propose first steps towards NLP applications like machine translation. The goal of developing such technologies is first and foremost to help efforts in improving language transmission and cultural safety and preservation for Innu-Aimun speakers, as those are considered urgent and vital issues. Finally, we discuss the importance of close collaboration and consultation with the Innu community in order to ensure that language technologies are developed respectfully and in accordance with that goal. 2022.computel-1.13 cadotte-etal-2022-challenges + 10.18653/v1/2022.computel-1.13 Using Speech and <fixed-case>NLP</fixed-case> Resources to build an i<fixed-case>CALL</fixed-case> platform for a minority language, the story of An Scéalaí, the <fixed-case>I</fixed-case>rish experience to date @@ -184,6 +197,7 @@ This paper describes how emerging linguistic resources and technologies can be used to build a language learning platform for Irish, an endangered language. This platform, An Scéalaí, harvests learner corpora - a vital resource both to study the stages of learners’ language acquisition and to guide future platform development. A technical description of the platform is provided, including details of how different speech technologies and linguistic resources are fused to provide a holistic learner experience. The active continuous participation of the community, and platform evaluations by learners and teachers, are discussed. 2022.computel-1.14 ni-chiarain-etal-2022-using + 10.18653/v1/2022.computel-1.14 Closing the <fixed-case>NLP</fixed-case> Gap: Documentary Linguistics and <fixed-case>NLP</fixed-case> Need a Shared Software Infrastructure @@ -192,6 +206,7 @@ For decades, researchers in natural language processing and computational linguistics have been developing models and algorithms that aim to serve the needs of language documentation projects. However, these models have seen little use in language documentation despite their great potential for making documentary linguistic artefacts better and easier to produce. In this work, we argue that a major reason for this NLP gap is the lack of a strong foundation of application software which can on the one hand serve the complex needs of language documentation and on the other hand provide effortless integration with NLP models. We further present and describe a work-in-progress system we have developed to serve this need, Glam. 2022.computel-1.15 gessler-2022-closing + 10.18653/v1/2022.computel-1.15 Can We Use Word Embeddings for Enhancing <fixed-case>G</fixed-case>uarani-<fixed-case>S</fixed-case>panish Machine Translation?
@@ -203,6 +218,7 @@ 2022.computel-1.16 gongora-etal-2022-use sgongora27/Guarani-embeddings-for-MT + 10.18653/v1/2022.computel-1.16 Faoi Gheasa an adaptive game for <fixed-case>I</fixed-case>rish language learning @@ -213,6 +229,7 @@ In this paper, we present a game with a purpose (GWAP) (Von Ahn 2006). The aim of the game is to promote language learning and ‘noticing’ (Skehan, 2013). The game has been designed for Irish, but the framework could be used for other languages. Irish is a minority language which means that L2 learners have limited opportunities for exposure to the language, and additionally, there are also limited (digital) learning resources available. This research incorporates game development, language pedagogy and ICALL language materials development. This paper will focus on the language materials development as this is a bottleneck in the teaching and learning of minority and endangered languages. 2022.computel-1.17 xu-etal-2022-faoi + 10.18653/v1/2022.computel-1.17 Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages @@ -224,6 +241,7 @@ Many endangered Uralic languages have multilingual machine readable dictionaries saved in an XML format. However, the dictionaries cover translations very inconsistently between language pairs, for instance, the Livonian dictionary has some translations to Finnish, Latvian and Estonian, and the Komi-Zyrian dictionary has some translations to Finnish, English and Russian. We utilize graph-based approaches to augment such dictionaries by predicting new translations to existing and new languages based on different dictionaries for endangered languages and Wiktionaries. Our study focuses on the lexical resources for Komi-Zyrian (kpv), Erzya (myv) and Livonian (liv). We evaluate our approach by human judges fluent in the three endangered languages in question. Based on the evaluation, the method predicted good or acceptable translations 77% of the time. Furthermore, we train a neural prediction model to predict the quality of the automatically predicted translations with an 81% accuracy. The resulting extensions to the dictionaries are made available on the online dictionary platform used by the speakers of these languages. 2022.computel-1.18 alnajjar-etal-2022-using + 10.18653/v1/2022.computel-1.18 Reusing a Multi-lingual Setup to Bootstrap a Grammar Checker for a Very Low Resource Language without Data @@ -234,6 +252,7 @@ Grammar checkers (GEC) are needed for digital language survival. Very low resource languages like Lule Sámi with less than 3,000 speakers need to hurry to build these tools, but do not have the big corpus data that are required for the construction of machine learning tools. We present a rule-based tool and a workflow where the work done for a related language can speed up the process. We use an existing grammar to infer rules for the new language, and we do not need a large gold corpus of annotated grammar errors, but a smaller corpus of regression tests is built while developing the tool. We present a test case for Lule Sámi reusing resources from North Sámi, show how we achieve a categorisation of the most frequent errors, and present a preliminary evaluation of the system. We hope this serves as an inspiration for small languages that need advanced tools in a limited amount of time, but do not have big data. 
2022.computel-1.19 lill-sigga-mikkelsen-etal-2022-reusing + 10.18653/v1/2022.computel-1.19 A Word-and-Paradigm Workflow for Fieldwork Annotation @@ -246,6 +265,7 @@ There are many challenges in morphological fieldwork annotation: it heavily relies on segmentation and feature labeling (which have both practical and theoretical drawbacks), it is time-intensive, and the annotator needs to be linguistically trained and may still annotate things inconsistently. We propose a workflow that relies on unsupervised and active learning grounded in Word-and-Paradigm morphology (WP). Machine learning has the potential to greatly accelerate the annotation process and allow a human annotator to focus on problematic cases, while the WP approach makes for an annotation system that is word-based and relational, removing the need to make decisions about feature labeling and segmentation early in the process and allowing speakers of the language of interest to participate more actively, since linguistic training is not necessary. We present a proof-of-concept for the first step of the workflow; in a realistic fieldwork setting, annotators can process hundreds of forms per hour. 2022.computel-1.20 copot-etal-2022-word + 10.18653/v1/2022.computel-1.20 Fine-tuning pre-trained models for Automatic Speech Recognition, experiments on a fieldwork corpus of Japhug (Trans-Himalayan family) @@ -263,6 +283,7 @@ This is a report on results obtained in the development of speech recognition tools intended to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family. The goal is to reduce the transcription workload of field linguists. The method used is a deep learning approach based on the language-specific tuning of a generic pre-trained representation model, XLS-R, using a Transformer architecture. We note difficulties in implementation in terms of learning stability, but this approach nonetheless brings significant improvements. The quality of phonemic transcription is improved over earlier experiments; and most significantly, the new approach allows for reaching the stage of automatic word recognition. Subjective evaluation of the tool by the author of the training data confirms the usefulness of this approach. 2022.computel-1.21 guillaume-etal-2022-fine + 10.18653/v1/2022.computel-1.21 Morphologically annotated corpora of Pomak @@ -280,6 +301,7 @@ The project XXXX is developing a platform to enable researchers of living languages to easily create and make available state-of-the-art spoken and textual annotated resources. As a case study we use Greek and Pomak, the latter being an endangered oral Slavic language of the Balkans (including Thrace/Greece). The linguistic documentation of Pomak is an ongoing work by an interdisciplinary team in close cooperation with the Pomak community of Greece. We describe our experience in the development of a Latin-based orthography and morphologically annotated text corpora of Pomak with state-of-the-art NLP technology. These resources will be made openly available on the XXXX site and the gold annotated corpora of Pomak will be made available on the Universal Dependencies treebank repository.
2022.computel-1.22 jusuf-karahoga-etal-2022-morphologically + 10.18653/v1/2022.computel-1.22 Enhancing Documentation of <fixed-case>H</fixed-case>upa with Automatic Speech Recognition @@ -290,6 +312,7 @@ This study investigates applications of automatic speech recognition (ASR) techniques to Hupa, a critically endangered Native American language from the Dene (Athabaskan) language family. Using around 9h12m of spoken data produced by one elder who is a first-language Hupa speaker, we experimented with different evaluation schemes and training settings. On average a fully connected deep neural network reached a word error rate of 35.26%. Our overall results illustrate the utility of ASR for making Hupa language documentation more accessible and usable. In addition, we found that when training acoustic models, using recordings with transcripts that were not carefully verified did not necessarily have a negative effect on model performance. This shows promise for speech corpora of indigenous languages that commonly include transcriptions produced by second-language speakers or linguists who have advanced knowledge in the language of interest. 2022.computel-1.23 liu-etal-2022-enhancing + 10.18653/v1/2022.computel-1.23 diff --git a/data/xml/2022.constraint.xml b/data/xml/2022.constraint.xml index 0cbdbf1a75..ca2e33c2ed 100644 --- a/data/xml/2022.constraint.xml +++ b/data/xml/2022.constraint.xml @@ -33,6 +33,7 @@ We present the findings of the shared task at the CONSTRAINT 2022 Workshop: Hero, Villain, and Victim: Dissecting harmful memes for Semantic role labeling of entities. The task aims to delve deeper into the domain of meme comprehension by deciphering the connotations behind the entities present in a meme. In more nuanced terms, the shared task focuses on determining the victimizing, glorifying, and vilifying intentions embedded in meme entities to explicate their connotations. To this end, we curate HVVMemes, a novel meme dataset of about 7000 memes spanning the domains of COVID-19 and US Politics, each containing entities and their associated roles: hero, villain, victim, or none. The shared task attracted 105 participants, but eventually only 6 submissions were made. Most of the successful submissions relied on fine-tuning pre-trained language and multimodal models along with ensembles. The best submission achieved an F1-score of 58.67. 2022.constraint-1.1 sharma-etal-2022-findings + 10.18653/v1/2022.constraint-1.1 <fixed-case>DD</fixed-case>-<fixed-case>TIG</fixed-case> at Constraint@<fixed-case>ACL</fixed-case>2022: Multimodal Understanding and Reasoning for Role Labeling of Entities in Hateful Memes @@ -47,6 +48,7 @@ zhou-etal-2022-dd Hateful Memes VCR + 10.18653/v1/2022.constraint-1.2 Are you a hero or a villain? A semantic role labelling approach for detecting harmful memes. @@ -60,6 +62,7 @@ Identifying good and evil through representations of victimhood, heroism, and villainy (i.e., role labeling of entities) has recently caught the research community’s interest. Because of the growing popularity of memes, the amount of offensive information published on the internet is expanding at an alarming rate. It generated a larger need to address this issue and analyze the memes for content moderation. Framing is used to show the entities engaged as heroes, villains, victims, or others so that readers may better anticipate and understand their attitudes and behaviors as characters. 
Positive phrases are used to characterize heroes, whereas negative terms depict victims and villains, and terms that tend to be neutral are mapped to others. In this paper, we propose two approaches to role-label the entities of the meme as hero, villain, victim, or other through Named-Entity Recognition (NER), Sentiment Analysis, etc. With an F1-score of 23.855, our team secured eighth position in the Shared Task @ Constraint 2022. 2022.constraint-1.3 fharook-etal-2022-hero + 10.18653/v1/2022.constraint-1.3 Logically at the Constraint 2022: Multimodal role labelling @@ -70,6 +73,7 @@ This paper describes our system for the Constraint 2022 challenge at ACL 2022, whose goal is to detect which entities are glorified, vilified or victimised within a meme. The task should be done considering the perspective of the meme’s author. In our work, the challenge is treated as a multi-class classification task. For a given pair of a meme and an entity, we need to classify whether the entity is being referenced as a Hero, a Villain, a Victim or Other. Our solution ensembles different models: a unimodal (text-only) model and a multimodal (text + image) model. We conduct several experiments and benchmark different competitive pre-trained transformers and vision models in this work. Our solution, based on an ensembling method, is ranked first on the leaderboard and obtains a macro F1-score of 0.58 on the test set. The code for the experiments and results are available at https://bitbucket.org/logicallydevs/constraint_2022/src/master/ 2022.constraint-1.4 kun-etal-2022-logically + 10.18653/v1/2022.constraint-1.4 Combining Language Models and Linguistic Information to Label Entities in Memes @@ -80,6 +84,7 @@ This paper describes the system we developed for the shared task ‘Hero, Villain and Victim: Dissecting harmful memes for Semantic role labelling of entities’ organised in the framework of the Second Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (Constraint 2022). We present an ensemble approach combining transformer-based models and linguistic information, such as the presence of irony and implicit sentiment associated with the target named entities. The ensemble system obtains promising classification scores, resulting in a third place finish in the competition. 2022.constraint-1.5 singh-etal-2022-combining + 10.18653/v1/2022.constraint-1.5 Detecting the Role of an Entity in Harmful Memes: Techniques and their Limitations @@ -93,6 +98,7 @@ robi56/harmful_memes_block_fusion Hateful Memes Hateful Memes Challenge + 10.18653/v1/2022.constraint-1.6 Fine-tuning and Sampling Strategies for Multimodal Role Labeling of Entities under Class Imbalance @@ -104,6 +110,7 @@ We propose our solution to the multimodal semantic role labeling task from the CONSTRAINT’22 workshop. The task aims at classifying entities in memes into classes such as “hero” and “villain”. We use several pre-trained multi-modal models to jointly encode the text and image of the memes, and implement three systems to classify the role of the entities. We propose dynamic sampling strategies to tackle the issue of class imbalance. Finally, we perform qualitative analysis on the representations of the entities.
2022.constraint-1.7 montariol-etal-2022-fine + 10.18653/v1/2022.constraint-1.7 Document Retrieval and Claim Verification to Mitigate <fixed-case>COVID</fixed-case>-19 Misinformation @@ -119,6 +126,7 @@ sundriyal-etal-2022-document CORD-19 FEVER + 10.18653/v1/2022.constraint-1.8 <fixed-case>M</fixed-case>-<fixed-case>BAD</fixed-case>: A Multilabel Dataset for Detecting Aggressive Texts and Their Targets @@ -129,6 +137,7 @@ Recently, detection and categorization of undesired (e.g., aggressive, abusive, offensive, hate) content from online platforms has grabbed the attention of researchers because of its detrimental impact on society. Several attempts have been made to mitigate the usage and propagation of such content. However, most past studies were conducted primarily for English, where low-resource languages like Bengali remained out of focus. Therefore, to facilitate research in this arena, this paper introduces a novel multilabel Bengali dataset (named M-BAD) containing 15650 texts to detect aggressive texts and their targets. Each text of M-BAD went through rigorous two-level annotations. At the primary level, each text is labelled as either aggressive or non-aggressive. At the secondary level, the aggressive texts have been further annotated into five fine-grained target classes: religion, politics, verbal, gender and race. Baseline experiments are carried out with different machine learning (ML), deep learning (DL) and transformer models, where Bangla-BERT acquired the highest weighted f_1-score in both detection (0.92) and target identification (0.83) tasks. Error analysis of the models exhibits the difficulty of identifying context-dependent aggression, and this work argues that further research is required to address these issues. 2022.constraint-1.9 sharif-etal-2022-bad + 10.18653/v1/2022.constraint-1.9 How does fake news use a thumbnail?
<fixed-case>CLIP</fixed-case>-based Multimodal Detection on the Unrepresentative News Image @@ -141,6 +150,7 @@ 2022.constraint-1.10 choi-etal-2022-fake ssu-humane/fake-news-thumbnail + 10.18653/v1/2022.constraint-1.10 Detecting False Claims in Low-Resource Regions: A Case Study of Caribbean Islands @@ -153,6 +163,7 @@ 2022.constraint-1.11 lucas-etal-2022-detecting CoAID + 10.18653/v1/2022.constraint-1.11 diff --git a/data/xml/2022.csrr.xml b/data/xml/2022.csrr.xml index fe8dac8214..e0b4e93554 100644 --- a/data/xml/2022.csrr.xml +++ b/data/xml/2022.csrr.xml @@ -34,6 +34,7 @@ CommonsenseQA ConceptNet OpenBookQA + 10.18653/v1/2022.csrr-1.1 Cloze Evaluation for Deeper Understanding of Commonsense Stories in <fixed-case>I</fixed-case>ndonesian @@ -45,6 +46,7 @@ 2022.csrr-1.2 koto-etal-2022-cloze ROCStories + 10.18653/v1/2022.csrr-1.2 Psycholinguistic Diagnosis of Language Models’ Commonsense Reasoning @@ -55,6 +57,7 @@ cong-2022-psycholinguistic yancong222/pragamtics-commonsense-lms SuperGLUE + 10.18653/v1/2022.csrr-1.3 Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks @@ -71,6 +74,7 @@ Conceptual Captions VCR Visual Question Answering + 10.18653/v1/2022.csrr-1.4 Materialized Knowledge Bases from Commonsense Transformers @@ -82,6 +86,7 @@ nguyen-razniewski-2022-materialized ConceptNet WebText + 10.18653/v1/2022.csrr-1.5 Knowledge-Augmented Language Models for Cause-Effect Relation Classification @@ -96,6 +101,7 @@ BCOPA-CE COPA TCR + 10.18653/v1/2022.csrr-1.6 <fixed-case>CURIE</fixed-case>: An Iterative Querying Approach for Reasoning About Situations @@ -114,6 +120,7 @@ QuaRTz QuaRel WIQA + 10.18653/v1/2022.csrr-1.7 diff --git a/data/xml/2022.deelio.xml b/data/xml/2022.deelio.xml index 029bb68a5e..516c647df5 100644 --- a/data/xml/2022.deelio.xml +++ b/data/xml/2022.deelio.xml @@ -24,6 +24,7 @@ Cross-lingual transfer learning typically involves training a model on a high-resource source language and applying it to a low-resource target language. In this work we introduce a lexical database called Valency Patterns Leipzig (ValPal) which provides the argument pattern information about various verb-forms in multiple languages including low-resource languages. We also provide a framework to integrate the ValPal database knowledge into the state-of-the-art LSTM-based model for cross-lingual semantic role labelling. Experimental results show that integrating such knowledge resulted in an improvement in the performance of the model on all the target languages on which it is evaluated. 2022.deelio-1.1 choudhary-oriordan-2022-cross + 10.18653/v1/2022.deelio-1.1 How Do Transformer-Architecture Models Address Polysemy of <fixed-case>K</fixed-case>orean Adverbial Postpositions?
@@ -34,6 +35,7 @@ 2022.deelio-1.2 2022.deelio-1.2.software.zip mun-desagulier-2022-transformer + 10.18653/v1/2022.deelio-1.2 Query Generation with External Knowledge for Dense Retrieval @@ -52,6 +54,7 @@ SciDocs SciFact SimpleQuestions + 10.18653/v1/2022.deelio-1.3 Uncovering Values: Detecting Latent Moral Content from Natural Language with Explainable and Non-Trained Methods @@ -67,6 +70,7 @@ asprino-etal-2022-uncovering stendoipanni/moraldilemmas DBpedia + 10.18653/v1/2022.deelio-1.4 Jointly Identifying and Fixing Inconsistent Readings from Information Extraction Systems @@ -79,6 +83,7 @@ padia-etal-2022-jointly FEVER TACRED + 10.18653/v1/2022.deelio-1.5 <fixed-case>KIQA</fixed-case>: Knowledge-Infused Question Answering Model for Financial Table-Text Data @@ -89,6 +94,7 @@ While entity retrieval models continue to advance their capabilities, our understanding of their wide-ranging applications is limited, especially in domain-specific settings. We highlighted this issue by using recent general-domain entity-linking models, LUKE and GENRE, to inject external knowledge into a question-answering (QA) model for a financial QA task with a hybrid tabular-textual dataset. We found that both models improved the baseline model by 1.57% overall and 8.86% on textual data. Nonetheless, the challenge remains as they still struggle to handle tabular inputs. We subsequently conducted a comprehensive attention-weight analysis, revealing how LUKE utilizes external knowledge supplied by GENRE. The analysis also elaborates how the injection of symbolic knowledge can be helpful and what needs further improvement, paving the way for future research on this challenging QA task and advancing our understanding of how a language model incorporates external knowledge. 2022.deelio-1.6 nararatwong-etal-2022-kiqa + 10.18653/v1/2022.deelio-1.6 Trans-<fixed-case>KBLSTM</fixed-case>: An External Knowledge Enhanced Transformer <fixed-case>B</fixed-case>i<fixed-case>LSTM</fixed-case> Model for Tabular Reasoning @@ -101,6 +107,7 @@ varun-etal-2022-trans ConceptNet GLUE + 10.18653/v1/2022.deelio-1.7 Fast Few-shot Debugging for <fixed-case>NLU</fixed-case> Test Suites @@ -113,6 +120,7 @@ malon-etal-2022-fast necla-ml/debug-test-suites SST + 10.18653/v1/2022.deelio-1.8 On Masked Language Models for Contextual Link Prediction @@ -123,6 +131,7 @@ In the real world, many relational facts require context; for instance, a politician holds a given elected position only for a particular timespan. This context (the timespan) is typically ignored in knowledge graph link prediction tasks, or is leveraged by models designed specifically to make use of it (i.e. n-ary link prediction models). Here, we show that the task of n-ary link prediction is easily performed using language models, applied with a basic method for constructing cloze-style query sentences. We introduce a pre-training methodology based around an auxiliary entity-linked corpus that outperforms other popular pre-trained models like BERT, even with a smaller model. This methodology also enables n-ary link prediction without access to any n-ary training set, which can be invaluable in circumstances where expensive and time-consuming curation of n-ary knowledge graphs is not feasible. We achieve state-of-the-art performance on the primary n-ary link prediction dataset WD50K and on WikiPeople facts that include literals - typically ignored by knowledge graph embedding methods. 
2022.deelio-1.9 brayne-etal-2022-masked + 10.18653/v1/2022.deelio-1.9 What Makes Good In-Context Examples for <fixed-case>GPT</fixed-case>-3? @@ -144,6 +153,7 @@ SNLI SST TriviaQA + 10.18653/v1/2022.deelio-1.10 diff --git a/data/xml/2022.dialdoc.xml b/data/xml/2022.dialdoc.xml index 24ca4287e8..8bf38b5cde 100644 --- a/data/xml/2022.dialdoc.xml +++ b/data/xml/2022.dialdoc.xml @@ -27,6 +27,7 @@ 2022.dialdoc-1.1 feng-etal-2022-msamsum xcfcode/msamsum + 10.18653/v1/2022.dialdoc-1.1 <fixed-case>U</fixed-case>ni<fixed-case>DS</fixed-case>: A Unified Dialogue System for Chit-Chat and Task-oriented Dialogues @@ -43,6 +44,7 @@ With the advances in deep learning, tremendous progress has been made with chit-chat dialogue systems and task-oriented dialogue systems. However, these two systems are often tackled separately in current methods. To achieve more natural interaction with humans, dialogue systems need to be capable of both chatting and accomplishing tasks. To this end, we propose a unified dialogue system (UniDS) with the two aforementioned skills. In particular, we design a unified dialogue data schema, compatible with both chit-chat and task-oriented dialogues. Besides, we propose a two-stage training method to train UniDS based on the unified dialogue data schema. UniDS does not need to add extra parameters to existing chit-chat dialogue systems. Experimental results demonstrate that the proposed UniDS performs comparably to the state-of-the-art chit-chat dialogue systems and task-oriented dialogue systems. More importantly, UniDS achieves better robustness than pure dialogue systems and a satisfactory ability to switch between the two types of dialogues. 2022.dialdoc-1.2 zhao-etal-2022-unids + 10.18653/v1/2022.dialdoc-1.2 Low-Resource Adaptation of Open-Domain Generative Chatbots @@ -57,6 +59,7 @@ Blended Skill Talk ConvAI2 QReCC + 10.18653/v1/2022.dialdoc-1.3 Pseudo Ambiguous and Clarifying Questions Based on Sentence Structures Toward Clarifying Question Answering System @@ -70,6 +73,7 @@ 2022.dialdoc-1.4 nakano-etal-2022-pseudo HotpotQA + 10.18653/v1/2022.dialdoc-1.4 Parameter-Efficient Abstractive Question Answering over Tables or Text @@ -82,6 +86,7 @@ pal-etal-2022-parameter kolk/pea-qa NarrativeQA + 10.18653/v1/2022.dialdoc-1.5 Conversation- and Tree-Structure Losses for Dialogue Disentanglement @@ -93,6 +98,7 @@ When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately. This task is referred to as dialogue disentanglement. A significant drawback of previous studies on disentanglement is that they focus only on pair-wise relationships between utterances while neglecting the conversation structure, which is important for conversation structure modeling. In this paper, we propose a hierarchical model, named Dialogue BERT (DIALBERT), which integrates the local and global semantics in the context range by using BERT to encode each message-pair and using BiLSTM to aggregate the chronological context information into the output of BERT. In order to integrate the conversation structure information into the model, two types of loss are designed: a conversation-structure loss and a tree-structure loss. In this way, our model can implicitly learn and leverage the conversation structures without being restricted by the lack of explicit access to such structures during the inference stage.
Experimental results on two large datasets show that our method outperforms previous methods by substantial margins, achieving strong performance on dialogue disentanglement. 2022.dialdoc-1.6 li-etal-2022-conversation + 10.18653/v1/2022.dialdoc-1.6 Conversational Search with Mixed-Initiative - Asking Good Clarification Questions backed-up by Passage Retrieval @@ -104,6 +110,7 @@ We deal with the scenario of conversational search, where user queries are under-specified or ambiguous. This calls for a mixed-initiative setup: the user asks (queries) and the system answers, but the system also asks (clarification questions) and the user responds, in order to clarify her information needs. We focus on the task of selecting the next clarification question, given the conversation context. Our method leverages passage retrieval from background content to fine-tune two deep-learning models for ranking candidate clarification questions. We evaluated our method on two different use-cases. The first is open-domain conversational search in a large web collection. The second is a task-oriented customer-support setup. We show that our method performs well on both use-cases. 2022.dialdoc-1.7 mass-etal-2022-conversational + 10.18653/v1/2022.dialdoc-1.7 Graph-combined Coreference Resolution Methods on Conversational Machine Reading Comprehension with Pre-trained Language Model @@ -115,6 +122,7 @@ wang-komatani-2022-graph CANARD CoQA + 10.18653/v1/2022.dialdoc-1.8 Construction of Hierarchical Structured Knowledge-based Recommendation Dialogue Dataset and Dialogue System @@ -127,6 +135,7 @@ kodama-etal-2022-construction KdConv Wizard of Wikipedia + 10.18653/v1/2022.dialdoc-1.9 Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters @@ -144,6 +153,7 @@ xu-etal-2022-retrieval hltchkust/knowexpert Wizard of Wikipedia + 10.18653/v1/2022.dialdoc-1.10 G4: Grounding-guided Goal-oriented Dialogues Generation with Multiple Documents @@ -157,6 +167,7 @@ 2022.dialdoc-1.11 zhang-etal-2022-g4 MultiDoc2Dial + 10.18653/v1/2022.dialdoc-1.11 <fixed-case>U</fixed-case><fixed-case>G</fixed-case>ent-<fixed-case>T2K</fixed-case> at the 2nd <fixed-case>D</fixed-case>ial<fixed-case>D</fixed-case>oc Shared Task: A Retrieval-Focused Dialog System Grounded in Multiple Documents @@ -172,6 +183,7 @@ Doc2Dial MultiDoc2Dial doc2dial + 10.18653/v1/2022.dialdoc-1.12 Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout @@ -186,6 +198,7 @@ MultiDoc2Dial presents an important challenge on modeling dialogues grounded with multiple documents. This paper proposes a pipeline system of “retrieve, re-rank, and generate”, where each component is individually optimized. This enables the passage re-ranker and response generator to fully exploit training with ground-truth data. Furthermore, we use a deep cross-encoder trained with localized hard negative passages from the retriever. For the response generator, we use grounding span prediction as an auxiliary task to be jointly trained with the main task of response generation. We also adopt a passage dropout and regularization technique to improve response generation performance. Experimental results indicate that the system clearly surpasses the competitive baseline and our team CPII-NLP ranked 1st among the public submissions on ALL four leaderboards based on the sum of F1, SacreBLEU, METEOR and RougeL scores.
2022.dialdoc-1.13 li-etal-2022-grounded + 10.18653/v1/2022.dialdoc-1.13 A Knowledge storage and semantic space alignment Method for Multi-documents dialogue generation @@ -200,6 +213,7 @@ CoQA MultiDoc2Dial QuAC + 10.18653/v1/2022.dialdoc-1.14 Improving Multiple Documents Grounded Goal-Oriented Dialog Systems via Diverse Knowledge Enhanced Pretrained Language Model @@ -216,6 +230,7 @@ jang-etal-2022-improving CoQA MultiDoc2Dial + 10.18653/v1/2022.dialdoc-1.15 Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval @@ -234,6 +249,7 @@ MultiDoc2Dial QuAC doc2dial + 10.18653/v1/2022.dialdoc-1.16 R3 : Refined Retriever-Reader pipeline for Multidoc2dial @@ -256,6 +272,7 @@ Natural Questions QuAC doc2dial + 10.18653/v1/2022.dialdoc-1.17 <fixed-case>D</fixed-case>ial<fixed-case>D</fixed-case>oc 2022 Shared Task: Open-Book Document-grounded Dialogue Modeling @@ -269,6 +286,7 @@ Doc2Dial MultiDoc2Dial doc2dial + 10.18653/v1/2022.dialdoc-1.18 <fixed-case>TRUE</fixed-case>: Re-evaluating Factual Consistency Evaluation @@ -292,6 +310,7 @@ GLUE PAWS VitaminC + 10.18653/v1/2022.dialdoc-1.19 Handling Comments in Documents through Interactions @@ -301,6 +320,7 @@ Comments are widely used by users in collaborative documents every day. The documents’ comments enable collaborative editing and review dynamics, transforming each document into a context-sensitive communication channel. Understanding the role of comments in communication dynamics within documents is the first step towards automating their management. In this paper we propose the first-ever taxonomy for different types of in-document comments based on analysis of a large-scale dataset of public documents from the web. We envision that the next generation of intelligent collaborative document experiences will allow interactive creation and consumption of content. We also introduce the components necessary for developing novel tools that automate the handling of comments through natural language interaction with the documents. We identify the commands that users would use to respond to various types of comments. We train machine learning algorithms to recognize the different types of comments and assess their feasibility. We conclude by discussing some of the implications for the design of automatic document management tools.
2022.dialdoc-1.20 nouri-toxtli-2022-handling + 10.18653/v1/2022.dialdoc-1.20 <fixed-case>T</fixed-case>ask2<fixed-case>D</fixed-case>ial: A Novel Task and Dataset for Commonsense-enhanced Task-based Dialogue Grounded in Documents @@ -313,6 +333,7 @@ CoQA Doc2Dial doc2dial + 10.18653/v1/2022.dialdoc-1.21 diff --git a/data/xml/2022.dravidianlangtech.xml b/data/xml/2022.dravidianlangtech.xml index 4e385bc9a2..3956b1d9f4 100644 --- a/data/xml/2022.dravidianlangtech.xml +++ b/data/xml/2022.dravidianlangtech.xml @@ -31,6 +31,7 @@ 2022.dravidianlangtech-1.1 kumar-etal-2022-bert Universal Dependencies + 10.18653/v1/2022.dravidianlangtech-1.1 A Dataset for Detecting Humor in <fixed-case>T</fixed-case>elugu Social Media Text @@ -42,6 +43,7 @@ 2022.dravidianlangtech-1.2 bellamkonda-etal-2022-dataset shaswa123/telugu_humour_dataset + 10.18653/v1/2022.dravidianlangtech-1.2 <fixed-case>M</fixed-case>u<fixed-case>C</fixed-case>o<fixed-case>T</fixed-case>: Multilingual Contrastive Training for Question-Answering in Low-resource Languages @@ -56,6 +58,7 @@ gokulkarthik/mucot ChAII - Hindi and Tamil Question Answering SQuAD + 10.18653/v1/2022.dravidianlangtech-1.3 <fixed-case>T</fixed-case>amil<fixed-case>ATIS</fixed-case>: Dataset for Task-Oriented Dialog in <fixed-case>T</fixed-case>amil @@ -67,6 +70,7 @@ 2022.dravidianlangtech-1.4 s-etal-2022-tamilatis ATIS + 10.18653/v1/2022.dravidianlangtech-1.4 <fixed-case>DE</fixed-case>-<fixed-case>ABUSE</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case> 2022: Transliteration as Data Augmentation for Abuse Detection in <fixed-case>T</fixed-case>amil @@ -78,6 +82,7 @@ With the rise of social media and the internet, there is a necessity to provide an inclusive space and prevent abusive topics against any gender, race or community. This paper describes the system submitted to the ACL-2022 shared task on fine-grained abuse detection in Tamil. In our approach we transliterated the code-mixed dataset as an augmentation technique to increase the size of the data. Using this method we were able to rank 3rd on the task with a 0.290 macro average F1 score and a 0.590 weighted F1 score. 2022.dravidianlangtech-1.5 palanikumar-etal-2022-de + 10.18653/v1/2022.dravidianlangtech-1.5 <fixed-case>UMUT</fixed-case>eam@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Emotional Analysis in <fixed-case>T</fixed-case>amil @@ -88,6 +93,7 @@ These working notes summarise the participation of the UMUTeam in the TamilNLP (ACL 2022) shared task concerning emotion analysis in Tamil. We participated in the two multi-classification challenges proposed with a neural network that combines linguistic features with different feature sets based on contextual and non-contextual sentence embeddings. Our proposal achieved first place in the second subtask, with an f1-score of 15.1% discerning among 30 different emotions. However, our results for the first subtask were not recorded in the official leaderboard. Accordingly, we report our results for this subtask with the validation split, reaching a macro f1-score of 32.360%.
2022.dravidianlangtech-1.6 garcia-diaz-etal-2022-umuteam + 10.18653/v1/2022.dravidianlangtech-1.6 <fixed-case>UMUT</fixed-case>eam@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Abusive Detection in <fixed-case>T</fixed-case>amil using Linguistic Features and Transformers @@ -98,6 +104,7 @@ Social media has become a dangerous place as bullies take advantage of the anonymity the Internet provides to target and intimidate vulnerable individuals and groups. In the past few years, the research community has focused on developing automatic classification tools for detecting hate-speech, its variants, and other types of abusive behaviour. However, these methods are still at an early stage in low-resource languages. With the aim of reducing this barrier, the TamilNLP shared task has proposed a multi-classification challenge for Tamil written in Tamil script and code-mixed to detect abusive comments and hope-speech. Our participation consists of a knowledge integration strategy that combines sentence embeddings from BERT, RoBERTa, FastText and a subset of language-independent linguistic features. We achieved our best result in code-mixed, reaching 3rd position with a macro-average f1-score of 35%. 2022.dravidianlangtech-1.7 garcia-diaz-etal-2022-umuteam-tamilnlp + 10.18653/v1/2022.dravidianlangtech-1.7 hate-alert@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Ensembling Multi-Modalities for <fixed-case>T</fixed-case>amil <fixed-case>T</fixed-case>roll<fixed-case>M</fixed-case>eme Classification @@ -108,6 +115,7 @@ Social media platforms often act as breeding grounds for various forms of trolling or malicious content targeting users or communities. One way of trolling users is by creating memes, which in most cases unite an image with a short piece of text embedded on top of it. The situation is more complex for multilingual (e.g., Tamil) memes due to the lack of benchmark datasets and models. We explore several models to detect Troll memes in Tamil based on the shared task, “Troll Meme Classification in DravidianLangTech2022” at ACL-2022. We observe that while the text-based model MURIL performs better for Non-troll meme classification, the image-based model VGG16 performs better for Troll-meme classification. Further fusing these two modalities helps us achieve stable outcomes in both classes. Our fusion model achieved a 0.561 weighted average F1 score and ranked second in this task. 2022.dravidianlangtech-1.8 das-etal-2022-hate + 10.18653/v1/2022.dravidianlangtech-1.8 <fixed-case>J</fixed-case>udith<fixed-case>J</fixed-case>eyafreeda<fixed-case>A</fixed-case>ndrew@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022:<fixed-case>CNN</fixed-case> for Emotion Analysis in <fixed-case>T</fixed-case>amil @@ -116,6 +124,7 @@ Using technology for analysis of human emotion is a relatively nascent research area. There are several types of data where emotion recognition can be employed, such as text, images, audio and video. In this paper, the focus is on emotion recognition in text data. Emotion recognition in text can be performed from both written comments and from conversations. In this paper, the dataset used for emotion recognition is a list of comments. While extensive research is being performed in this area, the language of the text plays a very important role. In this work, the focus is on the Dravidian language of Tamil.
The language and its script demand extensive pre-processing. The paper contributes to this by adapting various pre-processing methods to the Dravidian language of Tamil. A CNN method has been adopted for the task at hand. The proposed method has achieved a comparable result. 2022.dravidianlangtech-1.9 andrew-2022-judithjeyafreedaandrew + 10.18653/v1/2022.dravidianlangtech-1.9 <fixed-case>MUCIC</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Abusive Comment Detection in <fixed-case>T</fixed-case>amil Language using 1<fixed-case>D</fixed-case> Conv-<fixed-case>LSTM</fixed-case> @@ -128,6 +137,7 @@ 2022.dravidianlangtech-1.10 balouchzahi-etal-2022-mucic anushamdgowda/abusive-detection + 10.18653/v1/2022.dravidianlangtech-1.10 <fixed-case>CEN</fixed-case>-<fixed-case>T</fixed-case>amil@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Abusive Comment detection in <fixed-case>T</fixed-case>amil using <fixed-case>TF</fixed-case>-<fixed-case>IDF</fixed-case> and Random Kitchen Sink Algorithm @@ -140,6 +150,7 @@ This paper describes the approach of team CEN-Tamil used for abusive comment detection in Tamil. This task aims to identify whether a given comment contains abusive content. We used TF-IDF with char-wb analyzers and the Random Kitchen Sink (RKS) algorithm to create feature vectors, and the Support Vector Machine (SVM) classifier with a polynomial kernel for classification. We used this method for both Tamil and Tamil-English datasets and secured first place with an f1-score of 0.32 and seventh place with an f1-score of 0.25, respectively. The code for our approach is shared in the GitHub repository. 2022.dravidianlangtech-1.11 s-n-etal-2022-cen + 10.18653/v1/2022.dravidianlangtech-1.11 <fixed-case>NITK</fixed-case>-<fixed-case>IT</fixed-case>_<fixed-case>NLP</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Transformer based model for Toxic Span Identification in <fixed-case>T</fixed-case>amil @@ -150,6 +161,7 @@ Toxic span identification in Tamil is a shared task that focuses on identifying harmful content contributing to offensiveness. In this work, we have built a model that can efficiently identify the span of text contributing to offensive content. We have used various transformer-based models to develop the system, out of which the fine-tuned MuRIL model was able to achieve the best overall character F1-score of 0.4489. 2022.dravidianlangtech-1.12 lekshmiammal-etal-2022-nitk + 10.18653/v1/2022.dravidianlangtech-1.12 <fixed-case>T</fixed-case>eam<fixed-case>X</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: A Comparative Analysis for Troll-Based Meme Classification @@ -162,6 +174,7 @@ nandi-etal-2022-teamx Hateful Memes Hateful Memes Challenge + 10.18653/v1/2022.dravidianlangtech-1.13 <fixed-case>GJG</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Emotion Analysis and Classification in <fixed-case>T</fixed-case>amil using Transformers @@ -172,6 +185,7 @@ This paper describes the systems built by our team for the “Emotion Analysis in Tamil” shared task at the Second Workshop on Speech and Language Technologies for Dravidian Languages at ACL 2022. There were two multi-class classification sub-tasks as a part of this shared task.
The dataset for sub-task A contained 11 types of emotions while sub-task B was more fine-grained with 31 emotions. We fine-tuned an XLM-RoBERTa and a DeBERTa base model for each sub-task. For sub-task A, the XLM-RoBERTa model achieved an accuracy of 0.46 and the DeBERTa model achieved an accuracy of 0.45. We had the best classification performance out of 11 teams for sub-task A. For sub-task B, the XLM-RoBERTa model’s accuracy was 0.33 and the DeBERTa model had an accuracy of 0.26. We ranked 2nd out of 7 teams for sub-task B. 2022.dravidianlangtech-1.14 prasad-etal-2022-gjg + 10.18653/v1/2022.dravidianlangtech-1.14 <fixed-case>GJG</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Using Transformers for Abusive Comment Classification in <fixed-case>T</fixed-case>amil @@ -182,6 +196,7 @@ This paper presents transformer-based models for the “Abusive Comment Detection” shared task at the Second Workshop on Speech and Language Technologies for Dravidian Languages at ACL 2022. Our team participated in both multi-class classification sub-tasks of this shared task. The dataset for sub-task A was Tamil text, while that for sub-task B was code-mixed Tamil-English text. Both datasets contained 8 classes of abusive comments. We trained an XLM-RoBERTa and a DeBERTa base model on the training splits for each sub-task. For sub-task A, the XLM-RoBERTa model achieved an accuracy of 0.66 and the DeBERTa model achieved an accuracy of 0.62. For sub-task B, both models achieved a classification accuracy of 0.72; however, the DeBERTa model performed better in other classification metrics. Our team ranked 2nd in the code-mixed classification sub-task and 8th in the Tamil-text sub-task. 2022.dravidianlangtech-1.15 prasad-etal-2022-gjg-tamilnlp + 10.18653/v1/2022.dravidianlangtech-1.15 <fixed-case>IIITDWD</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Transformer-based approach to classify abusive content in <fixed-case>D</fixed-case>ravidian Code-mixed text @@ -191,6 +206,7 @@ Identifying abusive content or hate speech in social media text has raised the research community’s interest in recent times. The major driving force behind this is the widespread use of social media websites. Further, it also leads to identifying abusive content in low-resource regional languages, which is an important research problem in computational linguistics. As part of ACL-2022, the organizers of DravidianLangTech@ACL 2022 have released a shared task on abusive category identification in Tamil and Tamil-English code-mixed text to encourage further research on offensive content identification in low-resource Indic languages. This paper presents the working notes for the model submitted by IIITDWD at DravidianLangTech@ACL 2022. Our team competed in Sub-Task B and finished in 9th place among the participating teams. In our proposed approach, we used a pre-trained transformer model, Indic-BERT, for feature extraction, and on top of that, an SVM classifier is used for stance detection. Further, our model achieved 62% accuracy on code-mixed Tamil-English text.
2022.dravidianlangtech-1.16 biradar-saumya-2022-iiitdwd + 10.18653/v1/2022.dravidianlangtech-1.16 <fixed-case>PANDAS</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Emotion Analysis in <fixed-case>T</fixed-case>amil Text using Language Agnostic Embeddings @@ -204,6 +220,7 @@ As the world around us continues to become increasingly digital, it has been acknowledged that there is a growing need for emotion analysis of social media content. The task of identifying the emotion in a given text has many practical applications ranging from screening public health to business and management. In this paper, we propose a language-agnostic model that focuses on emotion analysis in Tamil text. Our experiments yielded an F1-score of 0.010. 2022.dravidianlangtech-1.17 k-etal-2022-pandas + 10.18653/v1/2022.dravidianlangtech-1.17 <fixed-case>PANDAS</fixed-case>@Abusive Comment Detection in <fixed-case>T</fixed-case>amil Code-Mixed Data Using Custom Embeddings with <fixed-case>L</fixed-case>a<fixed-case>BSE</fixed-case> @@ -216,6 +233,7 @@ Abusive language has lately been prevalent in comments on various social media platforms. The increasing hostility observed on the internet calls for the creation of a system that can identify and flag such acerbic content, to prevent conflict and mental distress. This task becomes more challenging when low-resource languages like Tamil, as well as the often-observed Tamil-English code-mixed text, are involved. The approach used in this paper for the classification model includes different methods of feature extraction and the use of traditional classifiers. We propose a novel method of combining language-agnostic sentence embeddings with the TF-IDF vector representation that uses a curated corpus of words as vocabulary, to create a custom embedding, which is then passed to an SVM classifier. Our experimentation yielded an accuracy of 52% and an F1-score of 0.54. 2022.dravidianlangtech-1.18 swaminathan-etal-2022-pandas + 10.18653/v1/2022.dravidianlangtech-1.18 Translation Techies @<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022-Machine Translation in <fixed-case>D</fixed-case>ravidian Languages @@ -227,6 +245,7 @@ This paper discusses the details of the submission made by team Translation Techies to the Shared Task on Machine Translation in Dravidian languages at ACL 2022. In connection to the task, five language pairs were provided to test the accuracy of the submitted model. A baseline transformer model with the Neural Machine Translation (NMT) technique is used, which has been taken directly from the OpenNMT framework. On this baseline model, tokenization is applied using the IndicNLP library. Finally, the evaluation is performed using the BLEU scoring mechanism. 2022.dravidianlangtech-1.19 goyal-etal-2022-translation + 10.18653/v1/2022.dravidianlangtech-1.19 <fixed-case>SSNCSE</fixed-case>_<fixed-case>NLP</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Transformer based approach for Emotion analysis in <fixed-case>T</fixed-case>amil language @@ -236,6 +255,7 @@ Emotion analysis is the process of identifying and analyzing the underlying emotions expressed in textual data. Identifying emotions from a textual conversation is a challenging task due to the absence of gestures, vocal intonation, and facial expressions.
Once chatbots and messengers detect and report the emotions of the user, a comfortable conversation can be carried out with no misunderstandings. Our task is to categorize text into a predefined notion of emotion. In this work, it is required to classify text into several emotional labels depending on the task. We have adopted the transformer model approach to identify the emotions present in the text sequence. Our task is to identify whether a given comment contains emotion, and the emotion it stands for. The datasets were provided to us by the LT-EDI organizers (CITATION) for two tasks, in the Tamil language. We have evaluated the datasets using the pretrained transformer models and obtained micro-averaged F1 scores of 0.19 and 0.12 for Task 1 and Task 2, respectively. 2022.dravidianlangtech-1.20 b-varsha-2022-ssncse + 10.18653/v1/2022.dravidianlangtech-1.20 <fixed-case>SSN</fixed-case>_<fixed-case>MLRG</fixed-case>1@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Troll Meme Classification in <fixed-case>T</fixed-case>amil using Transformer Models @@ -248,6 +268,7 @@ The ACL shared task of DravidianLangTech-2022 for Troll Meme classification is a binary classification task that involves identifying Tamil memes as troll or not-troll. Classification of memes is a challenging task since memes express humour and sarcasm in an implicit way. Team SSN_MLRG1 tested and compared results obtained by using three models, namely BERT, ALBERT and XLNet. The XLNet model outperformed the other two models in terms of various performance metrics. The proposed XLNet model obtained the 3rd rank in the shared task with a weighted F1-score of 0.558. 2022.dravidianlangtech-1.21 hariprasad-etal-2022-ssn + 10.18653/v1/2022.dravidianlangtech-1.21 <fixed-case>B</fixed-case>p<fixed-case>H</fixed-case>igh@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Effects of Data Augmentation on Indic-Transformer based classifier for Abusive Comments Detection in <fixed-case>T</fixed-case>amil @@ -256,6 +277,7 @@ Social Media platforms have grown their reach worldwide. As an effect of this growth, many vernacular social media platforms have also emerged, focusing more on the diverse languages in the specific regions. Tamil has also emerged as a popular language for use on social media platforms due to the increasing penetration of vernacular media like Sharechat and Moj, which focus more on local Indian languages than English and encourage their users to converse in Indic languages. Abusive language remains a significant challenge in the social media framework, more so when we consider languages like Tamil, which are low-resource languages, have poor performance on multilingual models and lack language-specific models. Based on this shared task, “Abusive Comment detection in Tamil@DravidianLangTech-ACL 2022”, we present an exploration of different techniques used to tackle and increase the accuracy of our models using data augmentation in NLP. We also show the results of these techniques.
2022.dravidianlangtech-1.22 pahwa-2022-bphigh + 10.18653/v1/2022.dravidianlangtech-1.22 <fixed-case>MUCS</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech@<fixed-case>ACL</fixed-case>2022: Ensemble of Logistic Regression Penalties to Identify Emotions in <fixed-case>T</fixed-case>amil Text @@ -266,6 +288,7 @@ Emotion Analysis (EA) is the process of automatically analyzing and categorizing the input text into one of the predefined sets of emotions. In recent years, people have turned to social media to express their emotions, opinions or feelings about news, movies, products, services, and so on. These users’ emotions may help the public, governments, business organizations, film producers, and others in devising strategies, making decisions, and so on. The increasing number of social media users and the increasing amount of user-generated text containing emotions on social media demand automated tools for the analysis of such data, as handling this data manually is labor intensive and error prone. Further, the characteristics of social media data make EA challenging. Most EA research works have focused on the English language, leaving several Indian languages, including Tamil, unexplored for this task. To address the challenges of EA in Tamil texts, in this paper, we, team MUCS, describe the model submitted to the shared task on Emotion Analysis in Tamil at DravidianLangTech@ACL 2022. Out of the two subtasks in this shared task, our team submitted the model only for Task a. The proposed model comprises an Ensemble of Logistic Regression (LR) classifiers with three penalties, namely L1, L2, and Elasticnet. This Ensemble model, trained with Term Frequency - Inverse Document Frequency (TF-IDF) of character bigrams and trigrams, secured 4th rank in Task a with a macro averaged F1-score of 0.04. The code to reproduce the proposed models is available on GitHub. 2022.dravidianlangtech-1.23 hegde-etal-2022-mucs + 10.18653/v1/2022.dravidianlangtech-1.23 <fixed-case>BPHC</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022-A comparative analysis of classical and pre-trained models for troll meme classification in <fixed-case>T</fixed-case>amil @@ -277,6 +300,7 @@ Trolling refers to any user behaviour on the internet intended to provoke or instigate conflict, predominantly in social media. This paper aims to classify troll meme captions in Tamil-English code-mixed form. Embeddings are obtained for raw code-mixed text and for the translated and transliterated versions of the text, and their relative performances are compared. Furthermore, this paper compares the performances of 11 different classification algorithms using Accuracy and F1-Score. We conclude that we were able to achieve a weighted F1 score of 0.74 through the MuRIL pretrained model. 2022.dravidianlangtech-1.24 v-etal-2022-bphc + 10.18653/v1/2022.dravidianlangtech-1.24 <fixed-case>SSNCSE</fixed-case> <fixed-case>NLP</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Transformer based approach for detection of abusive comment for <fixed-case>T</fixed-case>amil language @@ -286,6 +310,7 @@ Social media platforms along with many other public forums on the Internet have shown a significant rise in the cases of abusive behavior such as Misogynism, Misandry, Homophobia, and Cyberbullying.
To tackle these concerns, technologies are being developed and applied, as it is a tedious and time-consuming task to identify, report and block these offenders. Our task was to automate the process of identifying abusive comments and classify them into appropriate categories. The datasets provided by the DravidianLangTech@ACL2022 organizers were a code-mixed form of Tamil text. We trained the datasets using pre-trained transformer models such as BERT, m-BERT, and XLNet and achieved weighted average F1 scores of 0.96 for Tamil-English code-mixed text and 0.59 for Tamil text. 2022.dravidianlangtech-1.25 b-varsha-2022-ssncse-nlp + 10.18653/v1/2022.dravidianlangtech-1.25 <fixed-case>V</fixed-case>arsini_and_<fixed-case>K</fixed-case>irthanna@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022-Emotional Analysis in <fixed-case>T</fixed-case>amil @@ -299,6 +324,7 @@ In this paper, we present our system for the task of Emotion analysis in Tamil. Over 3.96 million people use these platforms to send messages formed using texts, images, videos, audio or combinations of these to express their thoughts and feelings. Text communication on social media platforms is quite overwhelming due to its enormous quantity and simplicity. The data must be processed to understand the general feeling felt by the author. We present a lexicon-based approach for the extraction of emotion in Tamil texts. We use dictionaries of words labelled with their respective emotions. We assign an emotional label to each text and then capture the main emotion expressed in it. Finally, the F1-score on the official test set is 0.0300 and our method ranks 5th. 2022.dravidianlangtech-1.26 s-etal-2022-varsini + 10.18653/v1/2022.dravidianlangtech-1.26 <fixed-case>CUET</fixed-case>-<fixed-case>NLP</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Investigating Deep Learning Techniques to Detect Multimodal Troll Memes @@ -311,6 +337,7 @@ With the substantial rise of internet usage, social media has become a powerful communication medium to convey information, opinions, and feelings on various issues. Recently, memes have become a popular way of sharing information on social media. Usually, memes are visuals with text incorporated into them that quickly disseminate hatred and offensive content. Detecting or classifying memes is challenging due to their region-specific interpretation and multimodal nature. This work presents a meme classification technique in Tamil developed by the CUET NLP team under the shared task (DravidianLangTech-ACL2022). Several computational models have been investigated to perform the classification task. This work also explored visual and textual features using VGG16, ResNet50, VGG19, CNN and CNN+LSTM models. Multimodal features are extracted by combining image (VGG16) and text (CNN, LSTM+CNN) characteristics. Results demonstrate that the textual strategy with CNN+LSTM achieved the highest weighted f_1-score (0.52) and recall (0.57). Moreover, the CNN-Text+VGG16 outperformed the other models concerning multimodal meme detection by achieving the highest f_1-score of 0.49, but the LSTM+CNN model allowed the team to achieve 4^{th} place in the shared task.
2022.dravidianlangtech-1.27 hasan-etal-2022-cuet + 10.18653/v1/2022.dravidianlangtech-1.27 <fixed-case>PICT</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Neural Machine Translation On <fixed-case>D</fixed-case>ravidian Languages @@ -325,6 +352,7 @@ vyawahare-etal-2022-pict IndicCorp Samanantar + 10.18653/v1/2022.dravidianlangtech-1.28 Sentiment Analysis on Code-Switched <fixed-case>D</fixed-case>ravidian Languages with Kernel Based Extreme Learning Machines @@ -335,6 +363,7 @@ Code-switching refers to textual or spoken data containing multiple languages. Application of natural language processing (NLP) tasks like sentiment analysis is a harder problem on code-switched languages due to the irregularities in sentence structuring and ordering. This paper shows the experiment results of building Kernel based Extreme Learning Machines (ELM) for sentiment analysis for code-switched Dravidian languages with English. Our results show that ELM performs better than traditional machine learning classifiers on various metrics as well as trains faster than deep learning models. We also show that polynomial kernels perform better than others in the ELM architecture. We were able to achieve a median AUC of 0.79 with a polynomial kernel. 2022.dravidianlangtech-1.29 s-r-etal-2022-sentiment + 10.18653/v1/2022.dravidianlangtech-1.29 <fixed-case>CUET</fixed-case>-<fixed-case>NLP</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Exploiting Textual Features to Classify Sentiment of Multimodal Movie Reviews @@ -348,6 +377,7 @@ With the proliferation of internet usage, a massive growth of consumer-generated content on social media has been witnessed in recent years that provides people’s opinions on diverse issues. Through social media, users can convey their emotions and thoughts in distinctive forms such as text, image, audio, video, and emoji, which leads to the advancement of the multimodality of the content users share on social networking sites. This paper presents a technique for classifying multimodal sentiment using the text modality into five categories: highly positive, positive, neutral, negative, and highly negative. A shared task was organized to develop models that can identify the sentiments expressed by the videos of movie reviewers in both Malayalam and Tamil languages. This work applied several machine learning techniques (LR, DT, MNB, SVM) and deep learning (BiLSTM, CNN+BiLSTM) to accomplish the task. Results demonstrate that the proposed model with the decision tree (DT) outperformed the other methods and won the competition by acquiring the highest macro f_1-score of 0.24.
Therefore a shared task is organized to identify the underlying emotion of a given comment expressed in the Tamil language. The paper presents our approach to classifying the textual emotion in Tamil into 11 classes: ambiguous, anger, anticipation, disgust, fear, joy, love, neutral, sadness, surprise and trust. We investigated various machine learning (LR, DT, MNB, SVM), deep learning (CNN, LSTM, BiLSTM) and transformer-based models (Multilingual-BERT, XLM-R). Results reveal that the XLM-R model outdoes all other models by acquiring the highest macro f_1-score (0.33). 2022.dravidianlangtech-1.31 mustakim-etal-2022-cuet-nlp + 10.18653/v1/2022.dravidianlangtech-1.31 <fixed-case>DLRG</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Abusive Comment Detection in <fixed-case>T</fixed-case>amil using Multilingual Transformer Models @@ -371,6 +402,7 @@ Online social networks have let people connect and interact with each other. They do, however, also provide a platform for online abusers to propagate abusive content. The vast majority of abusive remarks are written in a multilingual style, which allows them to easily slip past internet inspection. This paper presents a system developed for the Shared Task on Abusive Comment Detection (Misogyny, Misandry, Homophobia, Transphobic, Xenophobia, CounterSpeech, Hope Speech) in Tamil at DravidianLangTech@ACL 2022 to detect the abusive category of each comment. We approach the task with three methodologies: Machine Learning, Deep Learning and Transformer-based modeling, for two sets of data: the Tamil and the Tamil+English language datasets. The dataset used in our system can be accessed from the competition on CodaLab. For Machine Learning, eight algorithms were implemented, among which Random Forest gave the best result with the Tamil+English dataset, with a weighted average F1-score of 0.78. For Deep Learning, Bi-Directional LSTM gave the best result with pre-trained word embeddings. In Transformer-based modeling, we used IndicBERT and mBERT with fine-tuning, among which mBERT gave the best result for the Tamil dataset with a weighted average F1-score of 0.7. 2022.dravidianlangtech-1.32 rajalakshmi-etal-2022-dlrg + 10.18653/v1/2022.dravidianlangtech-1.32 Aanisha@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Abusive Detection in <fixed-case>T</fixed-case>amil @@ -379,6 +411,7 @@ In social media, there are instances where people present their opinions in strong language, resorting to abusive/toxic comments. There are instances of communal hatred, hate-speech, toxicity and bullying. In this age of social media, it is very important to find means to keep a check on these toxic comments, so as to preserve the mental peace of people on social media. While there are tools and models to detect and potentially filter this kind of content, developing such models for the low-resource language space is an open research issue. In this paper, the task of abusive comment identification in the Tamil language is treated as a multi-class classification problem. Different pre-processing as well as modelling approaches are discussed in this paper. The different approaches are compared on the basis of weighted average accuracy.
2022.dravidianlangtech-1.33 bhattacharyya-2022-aanisha + 10.18653/v1/2022.dravidianlangtech-1.33 <fixed-case>COMBATANT</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Fine-grained Categorization of Abusive Comments using Logistic Regression @@ -391,6 +424,7 @@ With the widespread usage of social media and effortless internet access, millions of posts and comments are generated every minute. Unfortunately, with this substantial rise, the usage of abusive language has increased significantly in these mediums. This proliferation leads to many hazards such as cyber-bullying, vulgarity, online harassment and abuse. Therefore, it becomes a crucial issue to detect and mitigate the usage of abusive language. This work presents our system developed as part of the shared task to detect abusive language in Tamil. We employed three machine learning models (LR, DT, SVM), two deep learning models (CNN+BiLSTM, CNN+BiLSTM with FastText) and a transformer-based model (Indic-BERT). The experimental results show that the Logistic Regression (LR) and CNN+BiLSTM models outperformed the others. Both Logistic Regression (LR) and CNN+BiLSTM with FastText achieved a weighted F_1-score of 0.39. However, LR obtained a higher recall value (0.44) than CNN+BiLSTM (0.36). This led us to secure the 2^{nd} rank in the shared task competition. 2022.dravidianlangtech-1.34 hossain-etal-2022-combatant + 10.18653/v1/2022.dravidianlangtech-1.34 <fixed-case>O</fixed-case>ptimize_<fixed-case>P</fixed-case>rime@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Emotion Analysis in <fixed-case>T</fixed-case>amil @@ -403,6 +437,7 @@ This paper aims to perform an emotion analysis of social media comments in Tamil. Emotion analysis is the process of identifying the emotional context of the text. In this paper, we present the findings obtained by Team Optimize_Prime in the ACL 2022 shared task “Emotion Analysis in Tamil.” The task aimed to classify social media comments into categories of emotion like Joy, Anger, Trust, Disgust, etc. The task was further divided into two subtasks, one with 11 broad categories of emotions and the other with 31 specific categories of emotion. We implemented three different approaches to tackle this problem: transformer-based models, Recurrent Neural Networks (RNNs), and Ensemble models. XLM-RoBERTa performed the best on the first task with a macro-averaged f1 score of 0.27, while MuRIL provided the best results on the second task with a macro-averaged f1 score of 0.13. 2022.dravidianlangtech-1.35 gokhale-etal-2022-optimize + 10.18653/v1/2022.dravidianlangtech-1.35 <fixed-case>O</fixed-case>ptimize_<fixed-case>P</fixed-case>rime@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>ACL</fixed-case>2022: Abusive Comment Detection in <fixed-case>T</fixed-case>amil @@ -415,6 +450,7 @@ This paper tries to address the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments are targeted toward individuals belonging to specific ethnicities, genders, castes, races, sexualities, etc. Abusive Comment Detection is a significant problem, especially with the recent rise in social media users.
This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task “Abusive Comment Detection in Tamil.” This task detects and classifies YouTube comments in Tamil and Tamil-English code-mixed format into multiple categories. We have used three methods to optimize our results: Ensemble models, Recurrent Neural Networks, and Transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best performing models with a macro-averaged f1 score of 0.43. Furthermore, for the code-mixed data, MuRIL and M-BERT provided sublime results, with a macro-averaged f1 score of 0.45. 2022.dravidianlangtech-1.36 patankar-etal-2022-optimize + 10.18653/v1/2022.dravidianlangtech-1.36 Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction @@ -426,6 +462,7 @@ 2022.dravidianlangtech-1.37 ravikiran-chakravarthi-2022-zero manikandan-ravikiran/zero-shot-offensive-span + 10.18653/v1/2022.dravidianlangtech-1.37 <fixed-case>DLRG</fixed-case>@<fixed-case>T</fixed-case>amil<fixed-case>NLP</fixed-case>-<fixed-case>ACL</fixed-case>2022: Offensive Span Identification in <fixed-case>T</fixed-case>amil using <fixed-case>B</fixed-case>i<fixed-case>LSTM</fixed-case>-<fixed-case>CRF</fixed-case> approach @@ -439,6 +476,7 @@ Identifying offensive speech is an exciting and essential area of research, with ample traction in recent times. This paper presents our system submission to subtask 1, focusing on using supervised approaches for extracting offensive spans from code-mixed Tamil-English comments. To identify offensive spans, we developed a Bidirectional Long Short-Term Memory (BiLSTM) model with GloVe embeddings. To this end, the developed system achieved an overall F1 of 0.1728. Additionally, for comments with less than 30 characters, the developed system shows an F1 of 0.3890, competitive with other submissions. 2022.dravidianlangtech-1.38 rajalakshmi-etal-2022-dlrg-tamilnlp + 10.18653/v1/2022.dravidianlangtech-1.38 Findings of the Shared Task on Multimodal Sentiment Analysis and Troll Meme Classification in <fixed-case>D</fixed-case>ravidian Languages @@ -455,6 +493,7 @@ This paper presents the findings of the shared task on Multimodal Sentiment Analysis and Troll meme classification in Dravidian languages held at ACL 2022. Multimodal sentiment analysis deals with the identification of sentiment from video. In addition to video data, the task requires the analysis of corresponding text and audio features for the classification of movie reviews into five classes. We created a dataset for this task in Malayalam and Tamil. The Troll meme classification task aims to classify multimodal Troll memes into two categories. This task assumes the analysis of both text and image features for making better predictions. The performance of the participating teams was analysed using the F1-score. Only one team submitted their results in the Multimodal Sentiment Analysis task, whereas we received six submissions in the Troll meme classification task. The only team that participated in the Multimodal Sentiment Analysis shared task obtained an F1-score of 0.24. In the Troll meme classification task, the winning team achieved an F1-score of 0.596.
2022.dravidianlangtech-1.39 b-etal-2022-findings + 10.18653/v1/2022.dravidianlangtech-1.39 Findings of the Shared Task on Offensive Span Identification from <fixed-case>C</fixed-case>ode-Mixed <fixed-case>T</fixed-case>amil-<fixed-case>E</fixed-case>nglish Comments @@ -470,6 +509,7 @@ Offensive content moderation is vital in social media platforms to support healthy online discussions. However, its prevalence in code-mixed Dravidian languages is limited to classifying whole comments without identifying the part of a comment that contributes to offensiveness. Such a limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social comments with offensive spans. This paper outlines the dataset so released, the methods, and the results of the submitted systems. 2022.dravidianlangtech-1.40 ravikiran-etal-2022-findings + 10.18653/v1/2022.dravidianlangtech-1.40 Overview of the Shared Task on Machine Translation in <fixed-case>D</fixed-case>ravidian Languages @@ -485,6 +525,7 @@ 2022.dravidianlangtech-1.41 madasamy-etal-2022-overview Samanantar + 10.18653/v1/2022.dravidianlangtech-1.41 Findings of the Shared Task on Emotion Analysis in <fixed-case>T</fixed-case>amil @@ -505,6 +546,7 @@ This paper presents the overview of the shared task on emotional analysis in Tamil. The result of the shared task is presented at the workshop. This paper presents the dataset used in the shared task, the task description, the methodology used by the participants, and the evaluation results of the submissions. This task is organized as two tasks. Task A is carried out with data annotated with 11 emotions for social media comments in Tamil, and Task B is organized with data annotated with 31 fine-grained emotions for social media comments in Tamil. For conducting experiments, training and development datasets were provided to the participants, and results were evaluated on the unseen data. In total, we received around 24 submissions from 13 teams. For evaluating the models, Precision, Recall, and micro-average metrics are used. 2022.dravidianlangtech-1.42 sampath-etal-2022-findings + 10.18653/v1/2022.dravidianlangtech-1.42 Findings of the Shared Task on Multi-task Learning in <fixed-case>D</fixed-case>ravidian Languages @@ -523,6 +565,7 @@ We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian Languages is required to be classified according to two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian Languages, namely Kannada, Malayalam, and Tamil. It is one of the first shared tasks that focuses on Multi-task Learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, and due to the intricate nature of the task, especially in its first iteration, 3 submissions were received. 2022.dravidianlangtech-1.43 chakravarthi-etal-2022-findings + 10.18653/v1/2022.dravidianlangtech-1.43 Overview of Abusive Comment Detection in <fixed-case>T</fixed-case>amil-<fixed-case>ACL</fixed-case> 2022 @@ -538,6 +581,7 @@ Social media is one of the significant digital platforms that create a huge impact on people of all levels.
The comments posted on social media are powerful enough to change political and business scenarios in very few hours. They also tend to attack a particular individual or a group of individuals. This shared task aims at detecting abusive comments involving Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia and Transphobia. Hope speech is also identified. A dataset collected from social media, tagged with the above categories in Tamil and Tamil-English code-mixed languages, was given to the participants. The participants used different machine learning and deep learning algorithms. This paper presents the overview of this task, comprising the dataset details and the results of the participants. 2022.dravidianlangtech-1.44 priyadharshini-etal-2022-overview + 10.18653/v1/2022.dravidianlangtech-1.44 diff --git a/data/xml/2022.ecnlp.xml b/data/xml/2022.ecnlp.xml index c6eaea6589..4e4ec7c166 100644 --- a/data/xml/2022.ecnlp.xml +++ b/data/xml/2022.ecnlp.xml @@ -26,6 +26,7 @@ Defect Triage is a time-sensitive and critical process in a large-scale agile software development lifecycle for e-commerce. Inefficiencies arising from human and process dependencies in this domain have motivated research in automated approaches using machine learning to accurately assign defects to qualified teams. This work proposes a novel framework for automated defect triage (DEFTri) using fine-tuned state-of-the-art pre-trained BERT on label-fused text embeddings to improve contextual representations of human-generated product defects. For our multi-label text classification defect triage task, we also introduce a Walmart proprietary dataset of product defects using weak supervision and adversarial learning, in a few-shot setting. 2022.ecnlp-1.1 mohanty-2022-deftri + 10.18653/v1/2022.ecnlp-1.1 Interactive Latent Knowledge Selection for <fixed-case>E</fixed-case>-Commerce Product Copywriting Generation @@ -40,6 +41,7 @@ As multi-modal e-commerce is thriving, high-quality advertising product copywriting has gained more attention; it plays a crucial role in e-commerce recommender, advertising and even search platforms. Advertising product copywriting is able to enhance the user experience by highlighting the product’s characteristics with textual descriptions and thus improve the likelihood of user click and purchase. Automatically generating product copywriting has attracted noticeable interest from both academic and industrial communities, where existing solutions merely make use of a product’s title and attribute information to generate its corresponding description. However, in addition to the product title and attributes, we observe that there are various auxiliary descriptions created by the shoppers or marketers in e-commerce platforms (namely human knowledge), which contain valuable information for product copywriting generation, yet are always accompanied by lots of noise. In this work, we propose a novel solution to automatically generating product copywriting that involves the title, attributes and denoised auxiliary knowledge. To be specific, we design an end-to-end generation framework equipped with two variational autoencoders that work interactively to select informative human knowledge and generate diverse copywriting.
2022.ecnlp-1.2 wang-etal-2022-interactive + 10.18653/v1/2022.ecnlp-1.2 Leveraging Seq2seq Language Generation for Multi-level Product Issue Identification @@ -54,6 +56,7 @@ In a leading e-commerce business, we receive hundreds of millions of customer feedback messages from different text communication channels such as product reviews. The feedback can contain rich information regarding customers’ dissatisfaction with the quality of goods and services. To harness such information to better serve customers, in this paper, we created a machine learning approach to automatically identify product issues and uncover root causes from the customer feedback text. We identify issues at two levels: coarse grained (L-Coarse) and fine grained (L-Granular). We formulate this multi-level product issue identification problem as a seq2seq language generation problem. Specifically, we utilize transformer-based seq2seq models due to their versatility and strong transfer-learning capability. We demonstrate that our approach is label efficient and outperforms traditional approaches such as the multi-class multi-label classification formulation. Based on human evaluation, our fine-tuned model achieves 82.1% and 95.4% human-level performance for L-Coarse and L-Granular issue identification, respectively. Furthermore, our experiments illustrate that the model can generalize to identify unseen L-Granular issues. 2022.ecnlp-1.3 liu-etal-2022-leveraging + 10.18653/v1/2022.ecnlp-1.3 Data Quality Estimation Framework for Faster Tax Code Classification @@ -64,6 +67,7 @@ This paper describes a novel framework to estimate the data quality of a collection of product descriptions to identify required relevant information for accurate product listing classification for tax-code assignment. Our Data Quality Estimation (DQE) framework consists of a Question Answering (QA) based attribute value extraction model to identify missing attributes and a classification model to identify bad quality records. We show that our framework can accurately predict the quality of product descriptions. In addition to identifying low-quality product listings, our framework can also generate a detailed report at a category level showing missing product information resulting in a better customer experience. 2022.ecnlp-1.4 kondadadi-etal-2022-data + 10.18653/v1/2022.ecnlp-1.4 <fixed-case>CML</fixed-case>: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost @@ -77,6 +81,7 @@ Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive and meta learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on deidentified commercial voice assistant datasets and demonstrate that our model outperforms several SOTA approaches.
2022.ecnlp-1.5 dong-etal-2022-cml + 10.18653/v1/2022.ecnlp-1.5 Improving Relevance Quality in Product Search using High-Precision Query-Product Semantic Similarity @@ -92,6 +97,7 @@ Ensuring relevance quality in product search is a critical task as it impacts the customer’s ability to find intended products in the short-term as well as the general perception and trust of the e-commerce system in the long term. In this work we leverage a high-precision cross-encoder BERT model for semantic similarity between customer query and products and survey its effectiveness for three ranking applications where offline-generated scores could be used: (1) as an offline metric for estimating relevance quality impact, (2) as a re-ranking feature covering head/torso queries, and (3) as a training objective for optimization. We present results on the effectiveness of this strategy for the large e-commerce setting, which has general applicability for the choice of other high-precision models and tasks in ranking. 2022.ecnlp-1.6 bagheri-garakani-etal-2022-improving + 10.18653/v1/2022.ecnlp-1.6 Comparative Snippet Generation @@ -103,6 +109,7 @@ 2022.ecnlp-1.7 jain-etal-2022-comparative wing-nus/comparative-snippet-generation-dataset + 10.18653/v1/2022.ecnlp-1.7 Textual Content Moderation in <fixed-case>C</fixed-case>2<fixed-case>C</fixed-case> Marketplace @@ -113,6 +120,7 @@ Automatic monitoring systems for inappropriate user-generated messages have been found to be effective in reducing human operation costs in Consumer to Consumer (C2C) marketplace services, in which customers send messages directly to other customers. We propose a lightweight neural network that takes a conversation as input, which we deployed to a production service. Our results show that the system reduced the human operation costs to less than one-sixth compared to the conventional rule-based monitoring at Mercari. 2022.ecnlp-1.8 shido-etal-2022-textual + 10.18653/v1/2022.ecnlp-1.8 Spelling Correction using Phonetics in <fixed-case>E</fixed-case>-commerce Search @@ -127,6 +135,7 @@ In E-commerce search, spelling correction plays an important role in finding desired products for customers when processing user-typed search queries. However, resolving phonetic errors is a critical but much overlooked area. A query with phonetic spelling errors tends to appear correct based on pronunciation but is nonetheless inaccurate in spelling (e.g., “bluetooth sound system” vs. “blutut sant sistam”), with numerous noisy forms and sparse occurrences. In this work, we propose a generalized spelling correction system integrating phonetics to address phonetic errors in E-commerce search without additional latency cost. Using the India (IN) E-commerce market for illustration, the experiment shows that our proposed phonetic solution significantly improves the F1 score by 9%+ and the recall of phonetic errors by 8%+. This phonetic spelling correction system has been deployed to production, currently serving hundreds of millions of customers. 2022.ecnlp-1.9 yang-etal-2022-spelling + 10.18653/v1/2022.ecnlp-1.9 Logical Reasoning for Task Oriented Dialogue Systems @@ -139,6 +148,7 @@ In recent years, large pretrained models have been used in dialogue systems to improve successful task completion rates. However, the lack of reasoning capabilities of dialogue platforms makes it difficult to provide relevant and fluent responses, unless the designers of a conversational experience spend a considerable amount of time implementing these capabilities in external rule-based modules.
In this work, we propose a novel method to fine-tune pretrained transformer models such as RoBERTa and T5, to reason over a set of facts in a given dialogue context. Our method includes a synthetic data generation mechanism which helps the model learn logical relations, such as comparison between lists of numerical values, inverse relations (and negation), inclusion and exclusion for categorical attributes, application of a combination of attributes over both numerical and categorical values, and spoken form for numerical values, without the need for additional training data. We show that the transformer-based model can perform logical reasoning to answer questions when the dialogue context contains all the required information; otherwise, it is able to extract appropriate constraints to pass to downstream components (e.g. a knowledge base) when partial information is available. We observe that transformer-based models such as UnifiedQA-T5 can be fine-tuned to perform logical reasoning (such as numerical and categorical attributes’ comparison) over attributes seen at training time (e.g., accuracy of 90%+ for comparison of smaller than kmax=5 values over a heldout test dataset). 2022.ecnlp-1.10 beygi-etal-2022-logical + 10.18653/v1/2022.ecnlp-1.10 <fixed-case>C</fixed-case>o<fixed-case>VA</fixed-case>: Context-aware Visual Attention for Webpage Information Extraction @@ -153,6 +163,7 @@ kumar-etal-2022-cova kevalmorabia97/cova-web-object-detection CoVA + 10.18653/v1/2022.ecnlp-1.11 Product Titles-to-Attributes As a Text-to-Text Task @@ -162,6 +173,7 @@ Online marketplaces use attribute-value pairs, such as brand, size, size type, color, etc. to help define important and relevant facts about a listing. These help buyers to curate their search results using attribute filtering and overall create a richer experience. Despite their critical importance for listings’ discoverability, getting sellers to input tens of different attribute-value pairs per listing is costly and often results in missing information. This can later translate to the unnecessary removal of relevant listings from the search results when buyers are filtering by attribute values. In this paper we demonstrate using a Text-to-Text hierarchical multi-label ranking model framework to predict the most relevant attributes per listing, along with their expected values, using historic user behavioral data. This solution helps sellers by allowing them to focus on verifying information on attributes that are likely to be used by buyers, and thus increase the expected recall for their listings. Specifically for eBay’s case, we show that using this model can improve the relevancy of the attribute extraction process by 33.2% compared to the current highly-optimized production system. Apart from the empirical contribution, the highly generalized nature of the framework presented in this paper makes it relevant for many high-volume search-driven websites.
2022.ecnlp-1.12 fuchs-acriche-2022-product + 10.18653/v1/2022.ecnlp-1.12 Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices @@ -176,6 +188,7 @@ 2022.ecnlp-1.13 shen-etal-2022-product AmazonQA + 10.18653/v1/2022.ecnlp-1.13 semi<fixed-case>PQA</fixed-case>: A Study on Product Question Answering over Semi-structured Data @@ -192,6 +205,7 @@ Natural Questions NewsQA SQuAD + 10.18653/v1/2022.ecnlp-1.14 Improving Specificity in Review Response Generation with Data-Driven Data Filtering @@ -201,6 +215,7 @@ Responding to online customer reviews has become an essential part of successfully managing and growing a business both in e-commerce and the hospitality and tourism sectors. Recently, neural text generation methods intended to assist authors in composing responses have been shown to deliver highly fluent and natural looking texts. However, they also tend to learn a strong, undesirable bias towards generating overly generic, one-size-fits-all outputs to a wide range of inputs. While this often results in ‘safe’, high-probability responses, there are many practical settings in which greater specificity is preferable. In this work we examine the task of generating more specific responses for online reviews in the hospitality domain by identifying generic responses in the training data, filtering them and fine-tuning the generation model. We experiment with a range of data-driven filtering methods and show through automatic and human evaluation that, despite a 60% reduction in the amount of training data, filtering helps to derive models that are capable of generating more specific, useful responses. 2022.ecnlp-1.15 kew-volk-2022-improving + 10.18653/v1/2022.ecnlp-1.15 Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction @@ -211,6 +226,7 @@ Although most studies have treated attribute value extraction (AVE) as named entity recognition, these approaches are not practical in real-world e-commerce platforms because they perform poorly and require canonicalization of extracted values. Furthermore, since the values needed for actual services are static for many attributes, extraction of new values is not always necessary. Given the above, we formalize AVE as extreme multi-label classification (XMC). A major problem in solving AVE as XMC is that the distribution between positive and negative labels for products is heavily imbalanced. To mitigate the negative impact derived from such a biased distribution, we propose label masking, a simple and effective method to reduce the number of negative labels in training. We exploit the attribute taxonomy designed for e-commerce platforms to determine which labels are negative for products. Experimental results using a dataset collected from a Japanese e-commerce platform demonstrate that label masking improves micro and macro F_1 scores by 3.38 and 23.20 points, respectively. 2022.ecnlp-1.16 chen-etal-2022-extreme + 10.18653/v1/2022.ecnlp-1.16 Enhanced Representation with Contrastive Loss for Long-Tail Query Classification in e-commerce @@ -222,6 +238,7 @@ Query classification is a fundamental task in an e-commerce search engine, which assigns one or multiple predefined product categories in response to each search query. Taking click-through logs as training data in deep learning methods is a common and effective approach for query classification. However, the frequency distribution of queries typically has a long-tail property, which means that there are few logs for most of the queries.
The lack of reliable user feedback information results in worse performance for long-tail queries compared with frequent queries. To solve the above problem, we propose a novel method that leverages an auxiliary module to enhance the representations of long-tail queries by taking advantage of reliable supervised information from variant frequent queries. The long-tail queries are guided by the contrastive loss to obtain category-aligned representations in the auxiliary module, where the variant frequent queries serve as anchors in the representation space. We train our model with real-world click data from AliExpress and conduct evaluation on both offline labeled data and an online A/B test. The results and further analysis demonstrate the effectiveness of our proposed method. 2022.ecnlp-1.17 zhu-etal-2022-enhanced + 10.18653/v1/2022.ecnlp-1.17 Domain-specific knowledge distillation yields smaller and better models for conversational commerce @@ -239,6 +256,7 @@ We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average a 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages. 2022.ecnlp-1.18 howell-etal-2022-domain + 10.18653/v1/2022.ecnlp-1.18 <fixed-case>O</fixed-case>pen<fixed-case>B</fixed-case>rand: Open Brand Value Extraction from Product Descriptions @@ -250,6 +268,7 @@ 2022.ecnlp-1.19 sabeh-etal-2022-openbrand kassemsabeh/open-brand + 10.18653/v1/2022.ecnlp-1.19 Robust Product Classification with Instance-Dependent Noise @@ -259,6 +278,7 @@ Noisy labels in large E-commerce product data (i.e., product items placed into incorrect categories) are a critical issue for the product categorization task because they are unavoidable, non-trivial to remove and degrade prediction performance significantly. Training a product title classification model which is robust to noisy labels in the data is very important to make product classification applications more practical. In this paper, we study the impact of instance-dependent noise on the performance of product title classification by comparing our data denoising algorithm and different noise-resistance training algorithms which were designed to prevent a classifier model from over-fitting to noise. We develop a simple yet effective Deep Neural Network for product title classification to use as a base classifier. Along with recent methods of simulating instance-dependent noise, we propose a novel noise simulation algorithm based on product title similarity. Our experiments cover multiple datasets, various noise methods and different training solutions. Results uncover the limits of the classification task when the noise rate is not negligible and the data distribution is highly skewed.
2022.ecnlp-1.20 nguyen-khatwani-2022-robust + 10.18653/v1/2022.ecnlp-1.20 Structured Extraction of Terms and Conditions from <fixed-case>G</fixed-case>erman and <fixed-case>E</fixed-case>nglish Online Shops @@ -270,6 +290,7 @@ 2022.ecnlp-1.21 schamel-etal-2022-structured sebischair/lowestcommonancestorextractor + 10.18653/v1/2022.ecnlp-1.21 “Does it come in black?” <fixed-case>CLIP</fixed-case>-like models are zero-shot recommenders @@ -282,6 +303,7 @@ Product discovery is a crucial component for online shopping. However, item-to-item recommendations today do not allow users to explore changes along selected dimensions: given a query item, can a model suggest something similar but in a different color? We consider item recommendations of a comparative nature (e.g. “something darker”) and show how CLIP-based models can support this use case in a zero-shot manner. Leveraging a large model built for fashion, we introduce GradREC and its industry potential, and offer a first rounded assessment of its strengths and weaknesses. 2022.ecnlp-1.22 chia-etal-2022-come + 10.18653/v1/2022.ecnlp-1.22 Clause Topic Classification in <fixed-case>G</fixed-case>erman and <fixed-case>E</fixed-case>nglish Standard Form Contracts @@ -291,6 +313,7 @@ So-called standard form contracts, i.e. contracts that are drafted unilaterally by one party, like terms and conditions of online shops or terms of services of social networks, are cornerstones of our modern economy. Their processing is, therefore, of significant practical value. Often, the sheer size of these contracts allows the drafting party to hide unfavourable terms from the other party. In this paper, we compare different approaches for automatically classifying the topics of clauses in standard form contracts, based on a dataset of more than 6,000 clauses from more than 170 contracts, which we collected from German and English online shops and annotated based on a taxonomy of clause topics that we developed together with legal experts. We show that, in our comparison of seven approaches, from simple keyword matching to transformer language models, BERT performed best with an F1-score of up to 0.91; however, much simpler and computationally cheaper models like logistic regression also achieved similarly good results of up to 0.87. 2022.ecnlp-1.23 braun-matthes-2022-clause + 10.18653/v1/2022.ecnlp-1.23 Investigating the Generative Approach for Question Answering in <fixed-case>E</fixed-case>-Commerce @@ -302,6 +325,7 @@ Many e-commerce websites provide a Product-related Question Answering (PQA) platform where potential customers can ask questions related to a product, and other consumers can post an answer to that question based on their experience. Recently, there has been a growing interest in providing automated responses to product questions. In this paper, we investigate the suitability of the generative approach for PQA. We use state-of-the-art generative models proposed by Deng et al. (2020) and Lu et al. (2020) for this purpose. On closer examination, we find several drawbacks in this approach: (1) input reviews are not always utilized significantly for answer generation, (2) the performance of the models is abysmal when answering numerical questions, and (3) many of the generated answers contain phrases like “I do not know” which are taken from the reference answer in the training data, and these answers do not convey any information to the customer.
Although these approaches achieve high ROUGE scores, the metric does not reflect these shortcomings of the generated answers. We hope that our analysis will lead to more rigorous PQA approaches, and that future research will focus on addressing these shortcomings in PQA. 2022.ecnlp-1.24 roy-etal-2022-investigating + 10.18653/v1/2022.ecnlp-1.24 Utilizing Cross-Modal Contrastive Learning to Improve Item Categorization <fixed-case>BERT</fixed-case> Model @@ -311,6 +335,7 @@ Item categorization (IC) is a core natural language processing (NLP) task in e-commerce. As a special text classification task, fine-tuning pre-trained models, e.g., BERT, has become a mainstream solution. To improve IC performance further, other product metadata, e.g., product images, have been used. Although multimodal IC (MIC) systems show higher performance, expanding from processing text to more resource-demanding images brings large engineering impacts and hinders the deployment of such dual-input MIC systems. In this paper, we propose a new way of using product images to improve a text-only IC model: leveraging cross-modal signals between products’ titles and associated images to adapt BERT models in a self-supervised learning (SSL) way. Our experiments on the three genres in the public Amazon product dataset show that the proposed method yields better prediction accuracy and macro-F1 values than simply using the original BERT. Moreover, the proposed method is able to keep using the existing text-only IC inference implementation and shows a resource advantage over the deployment of a dual-input MIC system. 2022.ecnlp-1.25 chen-chou-2022-utilizing + 10.18653/v1/2022.ecnlp-1.25 Towards Generalizeable Semantic Product Search by Text Similarity Pre-training on Search Click Logs @@ -324,6 +349,7 @@ Recently, semantic search has been successfully applied to E-commerce product search, and the learned semantic space for query and product encoding is expected to generalize well to unseen queries or products. Yet, whether generalization can conveniently emerge has not been thoroughly studied in the domain thus far. In this paper, we examine several general-domain and domain-specific pre-trained Roberta variants and discover that general-domain fine-tuning does not really help generalization, which aligns with the findings of prior art, yet proper domain-specific fine-tuning with clickstream data can lead to better model generalization, based on a bucketed analysis of manually annotated query-product relevance data. 2022.ecnlp-1.26 liu-etal-2022-towards + 10.18653/v1/2022.ecnlp-1.26 Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions? @@ -334,6 +360,7 @@ For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales. While not all sellers are capable of providing such interesting descriptions, a language generation system can be a source of such descriptions at scale, and potentially assist sellers to improve their product descriptions. Most previous work has addressed this task based on statistical approaches (Wang et al., 2017), limited attributes such as titles (Chen et al., 2019; Chan et al., 2020), and focused on only one product type (Wang et al., 2017; Munigala et al., 2018; Hong et al., 2021). In this paper, we jointly train image features and 10 text attributes across 23 diverse product types, with two different target text types with different writing styles: bullet points and paragraph descriptions.
Our findings suggest that multimodal training with modern pretrained language models can generate fluent and persuasive advertisements, but the outputs are less faithful and informative, especially out of domain. 2022.ecnlp-1.27 koto-etal-2022-pretrained + 10.18653/v1/2022.ecnlp-1.27 A Simple Baseline for Domain Adaptation in End to End <fixed-case>ASR</fixed-case> Systems Using Synthetic Data @@ -343,6 +370,7 @@ Automatic Speech Recognition (ASR) has been dominated by deep learning-based end-to-end speech recognition models. These approaches require large amounts of labeled data in the form of audio-text pairs. Moreover, these models are more susceptible to domain shift as compared to traditional models. It is common practice to train generic ASR models and then adapt them to target domains using comparatively smaller data sets. We consider a more extreme case of domain adaptation where only a text-only corpus is available. In this work, we propose a simple baseline technique for domain adaptation in end-to-end speech recognition models. We convert the text-only corpus to audio data using a single-speaker Text to Speech (TTS) engine. The parallel data in the target domain is then used to fine-tune the final dense layer of generic ASR models. We show that single-speaker synthetic TTS data coupled with fine-tuning of only the final dense layer provides reasonable improvements in word error rates. We use text data from the address and e-commerce search domains to show the effectiveness of our low-cost baseline approach on CTC and attention-based models. 2022.ecnlp-1.28 joshi-singh-2022-simple + 10.18653/v1/2022.ecnlp-1.28 Lot or Not: Identifying Multi-Quantity Offerings in <fixed-case>E</fixed-case>-Commerce @@ -352,6 +380,7 @@ The term lot is defined to mean an offering that contains a collection of multiple identical items for sale. In a large online marketplace, lot offerings play an important role, allowing buyers and sellers to set price levels to optimally balance supply and demand needs. In spite of their central role, platforms often struggle to identify lot offerings, since explicit lot status identification is frequently not provided by sellers. The ability to identify lot offerings plays a key role in many fundamental tasks, from matching offerings to catalog products, through ranking search results, to providing effective pricing guidance. In this work, we seek to determine the lot status (and lot size) of each offering, in order to facilitate an improved buyer experience, while reducing the friction for sellers posting new offerings. We demonstrate experimentally the ability to accurately classify offerings as lots and predict their lot size using only the offer title, by adapting state-of-the-art natural language techniques to the lot identification problem. 2022.ecnlp-1.29 lavee-guy-2022-lot + 10.18653/v1/2022.ecnlp-1.29 diff --git a/data/xml/2022.fever.xml b/data/xml/2022.fever.xml index 43438c9852..be78c35d52 100644 --- a/data/xml/2022.fever.xml +++ b/data/xml/2022.fever.xml @@ -34,6 +34,7 @@ IIRC QASC eQASC + 10.18653/v1/2022.fever-1.1 Heterogeneous-Graph Reasoning and Fine-Grained Aggregation for Fact Checking @@ -44,6 +45,7 @@ 2022.fever-1.2 lin-fu-2022-heterogeneous FEVER + 10.18653/v1/2022.fever-1.2 Distilling Salient Reviews with Zero Labels @@ -57,6 +59,7 @@ Many people read online reviews to learn about real-world entities of their interest.
However, the majority of reviews only describe general experiences and opinions of the customers, and may not reveal facts that are specific to the entity being reviewed. In this work, we focus on a novel task of mining, from a review corpus, sentences that are unique to each entity. We refer to this task as Salient Fact Extraction. Salient facts are extremely scarce due to their very nature. Consequently, collecting labeled examples for training supervised models is tedious and cost-prohibitive. To alleviate this scarcity problem, we develop an unsupervised method, ZL-Distiller, which leverages contextual language representations of the reviews and their distributional patterns to identify salient sentences about entities. Our experiments on multiple domains (hotels, products, and restaurants) show that ZL-Distiller achieves state-of-the-art performance and further boosts the performance of other supervised/unsupervised algorithms for the task. Furthermore, we show that salient sentences mined by ZL-Distiller provide unique and detailed information about entities, which benefits downstream NLP applications including question answering and summarization. 2022.fever-1.3 huang-etal-2022-distilling + 10.18653/v1/2022.fever-1.3 Automatic Fake News Detection: Are current models “fact-checking” or “gut-checking”? @@ -71,6 +74,7 @@ kelk-etal-2022-automatic PolitiFact Snopes + 10.18653/v1/2022.fever-1.4 A Semantics-Aware Approach to Automated Claim Verification @@ -82,6 +86,7 @@ 2022.fever-1.5 calvo-figueras-etal-2022-semantics FEVER + 10.18653/v1/2022.fever-1.5 <fixed-case>PHEMEP</fixed-case>lus: Enriching Social Media Rumour Verification with External Evidence @@ -95,6 +100,7 @@ 2022.fever-1.6 dougrez-lewis-etal-2022-phemeplus FEVER + 10.18653/v1/2022.fever-1.6 <fixed-case>XI</fixed-case>nfo<fixed-case>T</fixed-case>ab<fixed-case>S</fixed-case>: Evaluating Multilingual Tabular Natural Language Inference @@ -108,6 +114,7 @@ 2022.fever-1.7 minhas-etal-2022-xinfotabs TabFact + 10.18653/v1/2022.fever-1.7 Neural Machine Translation for Fact-checking Temporal Claims @@ -119,6 +126,7 @@ Computational fact-checking aims at supporting the verification process of textual claims by exploiting trustworthy sources. However, there are large classes of complex claims that cannot be automatically verified, for instance, those related to temporal reasoning. To this aim, in this work, we focus on the verification of economic claims against time series sources. Starting from given textual claims in natural language, we propose a neural machine translation approach to produce respective queries expressed in a recently proposed temporal fragment of the Datalog language. The adopted deep neural approach shows promising preliminary results for the translation of 10 categories of claims extracted from real use cases. 2022.fever-1.8 mori-etal-2022-neural + 10.18653/v1/2022.fever-1.8 diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 11f3729ec8..d88f428e60 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -30,6 +30,7 @@ Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model. For the Chinese language, however, there is no subword because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. This difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT.
To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs the best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably. 2022.findings-acl.1 dai-etal-2022-whole + 10.18653/v1/2022.findings-acl.1 Compilable Neural Code Generation with Compiler Feedback @@ -48,6 +49,7 @@ 2022.findings-acl.2 wang-etal-2022-compilable CodeSearchNet + 10.18653/v1/2022.findings-acl.2 Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis @@ -59,6 +61,7 @@ Aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to determine the sentiment polarity towards targeted aspect terms occurring in the sentence. The development of the ABSA task is very much hindered by the lack of annotated data. To tackle this, prior works have studied the possibility of utilizing sentiment analysis (SA) datasets to assist in training the ABSA model, primarily via pretraining or multi-task learning. In this article, we follow this line, and for the first time, we manage to apply the Pseudo-Label (PL) method to merge the two homogeneous tasks. While it seems straightforward to use generated pseudo labels to handle this case of label granularity unification for two highly related tasks, we identify its major challenge in this paper and propose a novel framework, dubbed Dual-granularity Pseudo Labeling (DPL). Further, similar to PL, we regard DPL as a general framework capable of combining other prior methods in the literature. Through extensive experiments, DPL has achieved state-of-the-art performance on standard benchmarks, surpassing prior work significantly. 2022.findings-acl.3 zhang-etal-2022-towards + 10.18653/v1/2022.findings-acl.3 Input-specific Attention Subnetworks for Adversarial Detection @@ -78,6 +81,7 @@ QNLI SNLI SST + 10.18653/v1/2022.findings-acl.4 <fixed-case>R</fixed-case>elation<fixed-case>P</fixed-case>rompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction @@ -92,6 +96,7 @@ declare-lab/relationprompt FewRel Wiki-ZSL + 10.18653/v1/2022.findings-acl.5 Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation? @@ -108,6 +113,7 @@ 2022.findings-acl.6.software.zip lee-etal-2022-pre PMIndia + 10.18653/v1/2022.findings-acl.6 Multi-Scale Distribution Deep Variational Autoencoder for Explanation Generation @@ -120,6 +126,7 @@ Generating explanations for recommender systems is essential for improving their transparency, as users often wish to understand the reason for receiving a specified recommendation. Previous methods mainly focus on improving the generation quality, but often produce generic explanations that fail to incorporate user- and item-specific details.
To resolve this problem, we present Multi-Scale Distribution Deep Variational Autoencoders (MVAE). These are deep hierarchical VAEs with a prior network that eliminates noise while retaining meaningful signals in the input, coupled with a recognition network serving as the source of information to guide the learning of the prior network. Further, the Multi-scale distribution Learning Framework (MLF) and a Target Tracking Kullback-Leibler divergence (TKL) mechanism are proposed to employ multiple KL divergences at different scales for more effective learning. Extensive empirical experiments demonstrate that our methods can generate explanations with concrete input-specific contents. 2022.findings-acl.7 cai-etal-2022-multi + 10.18653/v1/2022.findings-acl.7 Dual Context-Guided Continuous Prompt Tuning for Few-Shot Learning @@ -133,6 +140,7 @@ The prompt-based paradigm has shown competitive performance in many NLP tasks. However, its success heavily depends on prompt design, and its effectiveness varies with the model and training data. In this paper, we propose a novel dual context-guided continuous prompt (DCCP) tuning method. To explore the rich contextual information in language structure and close the gap between discrete prompt tuning and continuous prompt tuning, DCCP introduces two auxiliary training objectives and constructs input in a pair-wise fashion. Experimental results demonstrate that our method is applicable to many NLP tasks, and can often outperform existing prompt tuning methods by a large margin in the few-shot setting. 2022.findings-acl.8 zhou-etal-2022-dual + 10.18653/v1/2022.findings-acl.8 Extract-Select: A Span Selection Framework for Nested Named Entity Recognition with Generative Adversarial Training @@ -146,6 +154,7 @@ Nested named entity recognition (NER) is a task in which named entities may overlap with each other. Span-based approaches regard nested NER as a two-stage span enumeration and classification task, thus having the innate ability to handle this task. However, they face the problems of error propagation, ignorance of span boundaries, difficulty in long entity recognition, and the requirement for large-scale annotated data. In this paper, we propose Extract-Select, a span selection framework for nested NER, to tackle these problems. Firstly, we introduce a span selection framework in which nested entities with different input categories would be separately extracted by the extractor, thus naturally avoiding error propagation in two-stage span-based approaches. In the inference phase, the trained extractor selects final results specific to the given entity category. Secondly, we propose a hybrid selection strategy in the extractor, which not only makes full use of span boundaries but also improves the ability of long entity recognition. Thirdly, we design a discriminator to evaluate the extraction result, and train both extractor and discriminator with generative adversarial training (GAT). The use of GAT greatly alleviates the stress on the dataset size. Experimental results on four benchmark datasets demonstrate that Extract-Select outperforms competitive nested NER models, obtaining state-of-the-art results. The proposed model also performs well when less labeled data are given, proving the effectiveness of GAT.
2022.findings-acl.9 huang-etal-2022-extract + 10.18653/v1/2022.findings-acl.9 Controlled Text Generation Using Dictionary Prior in Variational Autoencoders @@ -161,6 +170,7 @@ fang-etal-2022-controlled Penn Treebank SNLI + 10.18653/v1/2022.findings-acl.10 Challenges to Open-Domain Constituency Parsing @@ -175,6 +185,7 @@ yang-etal-2022-challenges ringos/multi-domain-parsing-analysis Penn Treebank + 10.18653/v1/2022.findings-acl.11 Going “Deeper”: Structured Sememe Prediction via Transformer with Tree Attention @@ -188,6 +199,7 @@ 2022.findings-acl.12.software.zip ye-etal-2022-going thunlp/stg + 10.18653/v1/2022.findings-acl.12 Table-based Fact Verification with Self-adaptive Mixture of Experts @@ -201,6 +213,7 @@ zhou-etal-2022-table thumlp/samoe TabFact + 10.18653/v1/2022.findings-acl.13 Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics @@ -215,6 +228,7 @@ Current practices in metric evaluation focus on one single dataset, e.g., the Newstest dataset in each year’s WMT Metrics Shared Task. However, in this paper, we qualitatively and quantitatively show that the performances of metrics are sensitive to data. The ranking of metrics varies when the evaluation is conducted on different datasets. Then this paper further investigates two potential hypotheses, i.e., insignificant data points and the deviation from the i.i.d. assumption, which may be responsible for the issue of data variance. In conclusion, our findings suggest that when evaluating automatic translation metrics, researchers should take data variance into account and be cautious about reporting results on unreliable datasets, because doing so may lead to results inconsistent with most of the other datasets. 2022.findings-acl.14 xiang-etal-2022-investigating + 10.18653/v1/2022.findings-acl.14 Sememe Prediction for <fixed-case>B</fixed-case>abel<fixed-case>N</fixed-case>et Synsets using Multilingual and Multimodal Information @@ -231,6 +245,7 @@ qi-etal-2022-sememe thunlp/msgi ImageNet + 10.18653/v1/2022.findings-acl.15 Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding @@ -244,6 +259,7 @@ 2022.findings-acl.16 wang-etal-2022-query MAVEN + 10.18653/v1/2022.findings-acl.16 <fixed-case>LEVEN</fixed-case>: A Large-Scale <fixed-case>C</fixed-case>hinese Legal Event Detection Dataset @@ -264,6 +280,7 @@ yao-etal-2022-leven thunlp/leven MAVEN + 10.18653/v1/2022.findings-acl.17 Analyzing Dynamic Adversarial Training Data in the Limit @@ -278,6 +295,7 @@ facebookresearch/dadc-limit FEVER SNLI + 10.18653/v1/2022.findings-acl.18 <fixed-case>A</fixed-case>bduction<fixed-case>R</fixed-case>ules: Training Transformers to Explain Unexpected Inputs @@ -291,6 +309,7 @@ young-etal-2022-abductionrules strong-ai-lab/abductionrules ProofWriter + 10.18653/v1/2022.findings-acl.19 On the Importance of Data Size in Probing Fine-tuned Models @@ -306,6 +325,7 @@ GLUE MRPC SST + 10.18653/v1/2022.findings-acl.20 <fixed-case>R</fixed-case>u<fixed-case>CC</fixed-case>o<fixed-case>N</fixed-case>: Clinical Concept Normalization in <fixed-case>R</fixed-case>ussian @@ -324,6 +344,7 @@ 2022.findings-acl.21 nesterov-etal-2022-ruccon XL-BEL + 10.18653/v1/2022.findings-acl.21 A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings @@ -337,6 +358,7 @@ 2022.findings-acl.22 tan-etal-2022-sentence namco0816/pt-bert + 10.18653/v1/2022.findings-acl.22 Eider: Empowering Document-level Relation Extraction with Efficient Evidence Extraction and Inference-stage Fusion @@ -351,6
+373,7 @@ xie-etal-2022-eider veronicium/eider DocRED + 10.18653/v1/2022.findings-acl.23 Meta-X<tex-math>_{NLG}</tex-math>: A Meta-Learning Approach Based on Language Clustering for Zero-Shot Cross-Lingual Transfer and Generation @@ -366,6 +389,7 @@ TyDi QA WikiLingua XQuAD + 10.18653/v1/2022.findings-acl.24 <fixed-case>MR</fixed-case>-<fixed-case>P</fixed-case>: A Parallel Decoding Algorithm for Iterative Refinement Non-Autoregressive Translation @@ -375,6 +399,7 @@ Non-autoregressive translation (NAT) predicts all the target tokens in parallel and significantly speeds up the inference process. The Conditional Masked Language Model (CMLM) is a strong baseline for NAT. It decodes with the Mask-Predict algorithm, which iteratively refines the output. Most work on CMLM focuses on the model structure and the training objective. However, the decoding algorithm is equally important. We propose a simple, effective, and easy-to-implement decoding algorithm that we call MaskRepeat-Predict (MR-P). The MR-P algorithm gives higher priority to consecutive repeated tokens when selecting tokens to mask for the next iteration and stops the iteration after target tokens converge. We conduct extensive experiments on six translation directions with varying data sizes. The results show that MR-P significantly improves the performance with the same model parameters. Specifically, we achieve a BLEU increase of 1.39 points in the WMT’14 En-De translation task. 2022.findings-acl.25 cheng-zhang-2022-mr + 10.18653/v1/2022.findings-acl.25 Open Relation Modeling: Learning to Define Relations between Entities @@ -388,6 +413,7 @@ 2022.findings-acl.26.software.zip huang-etal-2022-open jeffhj/open-relation-modeling + 10.18653/v1/2022.findings-acl.26 A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots @@ -409,6 +435,7 @@ SSD_NAME SSD_PHONE SSD_PLATE + 10.18653/v1/2022.findings-acl.27 Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction @@ -424,6 +451,7 @@ BREAK GEM SPLASH + 10.18653/v1/2022.findings-acl.28 <fixed-case>MINER</fixed-case>: Multi-Interest Matching Network for News Recommendation @@ -440,6 +468,7 @@ 2022.findings-acl.29 li-etal-2022-miner MIND + 10.18653/v1/2022.findings-acl.29 <fixed-case>KSAM</fixed-case>: Infusing Multi-Source Knowledge into Dialogue Generation via Knowledge Source Aware Multi-Head Decoding @@ -451,6 +480,7 @@ Knowledge-enhanced methods have bridged the gap between human beings and machines in generating dialogue responses. However, most previous works solely seek knowledge from a single source, and thus they often fail to obtain available knowledge because of the insufficient coverage of a single knowledge source. To this end, infusing knowledge from multiple sources becomes a trend. This paper proposes a novel approach, Knowledge Source Aware Multi-Head Decoding (KSAM), to infuse multi-source knowledge into dialogue generation more efficiently. Rather than following the traditional single decoder paradigm, KSAM uses multiple independent source-aware decoder heads to alleviate three challenging problems in infusing multi-source knowledge, namely, the diversity among different knowledge sources, the indefinite knowledge alignment issue, and the insufficient flexibility/scalability in knowledge usage. Experiments on a Chinese multi-source knowledge-aligned dataset demonstrate the superior performance of KSAM against various competitive approaches.
2022.findings-acl.30 wu-etal-2022-ksam + 10.18653/v1/2022.findings-acl.30 Towards Responsible Natural Language Annotation for the Varieties of <fixed-case>A</fixed-case>rabic @@ -460,6 +490,7 @@ When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance. In this position paper, we make the case for care and attention to such nuances, particularly in dataset annotation, as well as the inclusion of cultural and linguistic expertise in the process. We present a playbook for responsible dataset creation for polyglossic, multidialectal languages. This work is informed by a study on Arabic annotation of social media content. 2022.findings-acl.31 bergman-diab-2022-towards + 10.18653/v1/2022.findings-acl.31 Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection @@ -472,6 +503,7 @@ 2022.findings-acl.32 bose-etal-2022-dynamically tbose20/d-ref + 10.18653/v1/2022.findings-acl.32 Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems @@ -486,6 +518,7 @@ 2022.findings-acl.33 tuan-etal-2022-towards OpenDialKG + 10.18653/v1/2022.findings-acl.33 <fixed-case>MDER</fixed-case>ank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction @@ -502,6 +535,7 @@ 2022.findings-acl.34 zhang-etal-2022-mderank linhanz/mderank + 10.18653/v1/2022.findings-acl.34 Visualizing the Relationship Between Encoded Linguistic Information and Task Performance @@ -515,6 +549,7 @@ Probing is a popular way to analyze whether linguistic information can be captured by a well-trained deep neural model, but it is hard to answer how changes in the encoded linguistic information affect task performance. To this end, we study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality. Its key idea is to obtain a set of models which are Pareto-optimal in terms of both objectives. From this viewpoint, we propose a method to optimize the Pareto-optimal models by formalizing it as a multi-objective optimization problem. We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performance. Experimental results demonstrate that the proposed method is better than a baseline method. Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance, because the model architecture is also an important factor. 2022.findings-acl.35 xiang-etal-2022-visualizing + 10.18653/v1/2022.findings-acl.35 Efficient Argument Structure Extraction with Transfer Learning and Active Learning @@ -525,6 +560,7 @@ 2022.findings-acl.36 hua-wang-2022-efficient CDCP + 10.18653/v1/2022.findings-acl.36 Plug-and-Play Adaptation for Continuously-updated <fixed-case>QA</fixed-case> @@ -541,6 +577,7 @@ lee-etal-2022-plug Natural Questions SituatedQA + 10.18653/v1/2022.findings-acl.37 Reinforced Cross-modal Alignment for Radiology Report Generation @@ -552,6 +589,7 @@ 2022.findings-acl.38.software.zip qin-song-2022-reinforced CheXpert + 10.18653/v1/2022.findings-acl.38 What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation @@ -565,6 +603,7 @@ Deep learning has demonstrated performance advantages in a wide range of natural language processing tasks, including neural machine translation (NMT).
Transformer NMT models are typically strengthened by deeper encoder layers, but deepening their decoder layers usually results in failure. In this paper, we first identify the cause of the failure of the deep decoder in the Transformer model. Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT. Specifically, with respect to model structure, we propose a cross-attention drop mechanism to allow the decoder layers to perform their own different roles, to reduce the difficulty of deep-decoder learning. For model training, we propose a collapse reducing training approach to improve the stability and effectiveness of deep-decoder training. We experimentally evaluated our proposed Transformer NMT model structure modification and novel training methods on several popular machine translation benchmarks. The results showed that deepening the NMT model by increasing the number of decoder layers successfully prevented the deepened decoder from degrading to an unconditional language model. In contrast to prior work on deepening an NMT model on the encoder, our method can deepen the model on both the encoder and decoder at the same time, resulting in a deeper model and improved performance. 2022.findings-acl.39 li-etal-2022-works + 10.18653/v1/2022.findings-acl.39 <fixed-case>S</fixed-case>y<fixed-case>MC</fixed-case>o<fixed-case>M</fixed-case> - Syntactic Measure of Code Mixing A Study Of <fixed-case>E</fixed-case>nglish-<fixed-case>H</fixed-case>indi Code-Mixing @@ -578,6 +617,7 @@ 2022.findings-acl.40 kodali-etal-2022-symcom LinCE + 10.18653/v1/2022.findings-acl.40 <fixed-case>H</fixed-case>ybri<fixed-case>D</fixed-case>ialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data @@ -598,6 +638,7 @@ RecipeQA SQA ShARC + 10.18653/v1/2022.findings-acl.41 <fixed-case>NEWTS</fixed-case>: A Corpus for News Topic-Focused Summarization @@ -608,6 +649,7 @@ Text summarization models are approaching human levels of fidelity. Existing benchmarking corpora provide concordant pairs of full and abridged versions of Web, news or professional content. To date, all summarization datasets operate under a one-size-fits-all paradigm that may not reflect the full range of organic summarization needs. Several recently proposed models (e.g., plug and play language models) have the capacity to condition the generated summaries on a desired range of themes. These capacities remain largely unused and unevaluated as there is no dedicated dataset that would support the task of topic-focused summarization. This paper introduces the first topical summarization corpus NEWTS, based on the well-known CNN/Dailymail dataset, and annotated via online crowd-sourcing. Each source article is paired with two reference summaries, each focusing on a different theme of the source document. We evaluate a representative range of existing techniques and analyze the effectiveness of different prompting methods. 2022.findings-acl.42 bahrainian-etal-2022-newts + 10.18653/v1/2022.findings-acl.42 Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis @@ -618,6 +660,7 @@ Reddit is home to a broad spectrum of political activity, and users signal their political affiliations in multiple ways—from self-declarations to community participation.
Frequently, computational studies have treated political users as a single bloc, both in developing models to infer political leaning and in studying political behavior. Here, we test this assumption about political users and show that commonly-used political-inference models do not generalize, indicating heterogeneous types of political users. The models remain imprecise at best for most users, regardless of which sources of data or methods are used. Across a 14-year longitudinal analysis, we demonstrate that the choice of definition of a political user has significant implications for behavioral analysis. Controlling for multiple factors, political users are more toxic on the platform and inter-party interactions are even more toxic—but not all political users behave this way. Last, we identify a subset of political users who repeatedly flip affiliations, showing that these users are the most controversial of all, acting as provocateurs by more frequently bringing up politics, and are more likely to be banned, suspended, or deleted. 2022.findings-acl.43 alkiek-etal-2022-classification + 10.18653/v1/2022.findings-acl.43 Toward More Meaningful Resources for Lower-resourced Languages @@ -631,6 +674,7 @@ lignos-etal-2022-toward MasakhaNER WikiAnn + 10.18653/v1/2022.findings-acl.44 Better Quality Estimation for Low Resource Corpus Mining @@ -643,6 +687,7 @@ 2022.findings-acl.45.software.zip kocyigit-etal-2022-better MLQE + 10.18653/v1/2022.findings-acl.45 End-to-End Segmentation-based News Summarization @@ -654,6 +699,7 @@ 2022.findings-acl.46 liu-etal-2022-end CNN/Daily Mail + 10.18653/v1/2022.findings-acl.46 Fast Nearest Neighbor Machine Translation @@ -669,6 +715,7 @@ 2022.findings-acl.47 meng-etal-2022-fast ShannonAI/fast-knn-nmt + 10.18653/v1/2022.findings-acl.47 Extracting Latent Steering Vectors from Pretrained Language Models @@ -681,6 +728,7 @@ subramani-etal-2022-extracting nishantsubramani/steering_vectors StylePTB + 10.18653/v1/2022.findings-acl.48 Domain Generalisation of <fixed-case>NMT</fixed-case>: Fusing Adapters with Leave-One-Domain-Out Training @@ -692,6 +740,7 @@ Generalising to unseen domains is under-explored and remains a challenge in neural machine translation. Inspired by recent research in parameter-efficient transfer learning from pretrained models, this paper proposes a fusion-based generalisation method that learns to combine domain-specific parameters. We propose a leave-one-domain-out training strategy to avoid information leakage, addressing the challenge of not knowing the test domain during training time. Empirical results on three language pairs show that our proposed fusion method outperforms other baselines by up to +0.8 BLEU on average. 2022.findings-acl.49 vu-etal-2022-domain + 10.18653/v1/2022.findings-acl.49 Reframing Instructional Prompts to <fixed-case>GPT</fixed-case>k’s Language @@ -709,6 +758,7 @@ MC-TACO QASC WinoGrande + 10.18653/v1/2022.findings-acl.50 Read Top News First: A Document Reordering Approach for Multi-Document News Summarization @@ -725,6 +775,7 @@ zhaochaocs/mds-dr CNN/Daily Mail Multi-News + 10.18653/v1/2022.findings-acl.51 Human Language Modeling @@ -737,6 +788,7 @@ 2022.findings-acl.52 soni-etal-2022-human humanlab/hart + 10.18653/v1/2022.findings-acl.52 Inverse is Better!
Fast and Accurate Prompt for Few-shot Slot Tagging @@ -750,6 +802,7 @@ 2022.findings-acl.53 hou-etal-2022-inverse atmahou/promptslottagging + 10.18653/v1/2022.findings-acl.53 Cross-Modal Cloze Task: A New Task to Brain-to-Word Decoding @@ -762,6 +815,7 @@ 2022.findings-acl.54 zou-etal-2022-cross littletreezou/cross-modal-cloze-task + 10.18653/v1/2022.findings-acl.54 Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal @@ -780,6 +834,7 @@ 2022.findings-acl.55 gupta-etal-2022-mitigating WebText + 10.18653/v1/2022.findings-acl.55 Domain Representative Keywords Selection: A Probabilistic Approach @@ -795,6 +850,7 @@ akash-etal-2022-domain pritomsaha/keyword-selection AMiner + 10.18653/v1/2022.findings-acl.56 Hierarchical Inductive Transfer for Continual Dialogue Learning @@ -806,6 +862,7 @@ Pre-trained models have achieved excellent performance on the dialogue task. However, with the continual increase of online chit-chat scenarios, directly fine-tuning these models for each of the new tasks not only explodes the capacity of the dialogue system on the embedded devices but also causes knowledge forgetting on pre-trained models and knowledge interference among diverse dialogue tasks. In this work, we propose a hierarchical inductive transfer framework to learn and deploy the dialogue skills continually and efficiently. First, we introduce the adapter module into pre-trained models for learning new dialogue tasks. As the only trainable module, it is beneficial for the dialogue system on the embedded devices to acquire new dialogue skills with negligible additional parameters. Then, for alleviating knowledge interference between tasks yet benefiting the regularization between them, we further design hierarchical inductive transfer that enables new tasks to use general knowledge in the base adapter without being misled by diverse knowledge in task-specific adapters. Empirical evaluation and analysis indicate that our framework obtains comparable performance under deployment-friendly model capacity. 2022.findings-acl.57 feng-etal-2022-hierarchical + 10.18653/v1/2022.findings-acl.57 Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation @@ -820,6 +877,7 @@ kushalarora/quantifying_exposure_bias WikiText-103 WikiText-2 + 10.18653/v1/2022.findings-acl.58 Question Answering Infused Pre-training of General-Purpose Contextualized Representations @@ -840,6 +898,7 @@ SQuAD SST SearchQA + 10.18653/v1/2022.findings-acl.59 Automatic Song Translation for Tonal Languages @@ -854,6 +913,7 @@ This paper develops automatic song translation (AST) for tonal languages and addresses the unique challenge of aligning words’ tones with the melody of a song in addition to conveying the original meaning. We propose three criteria for effective AST—preserving meaning, singability and intelligibility—and design metrics for these criteria. We develop a new benchmark for English–Mandarin song translation and develop an unsupervised AST system, Guided AliGnment for Automatic Song Translation (GagaST), which combines pre-training with three decoding constraints. Both automatic and human evaluations show GagaST successfully balances semantics and singability. 2022.findings-acl.60 guo-etal-2022-automatic + 10.18653/v1/2022.findings-acl.60 Read before Generate!
Faithful Long Form Question Answering with Machine Reading @@ -873,6 +933,7 @@ KILT MS MARCO Natural Questions + 10.18653/v1/2022.findings-acl.61 A Simple yet Effective Relation Information Guided Approach for Few-Shot Relation Extraction @@ -887,6 +948,7 @@ liu-etal-2022-simple lylylylylyly/simplefsre FewRel + 10.18653/v1/2022.findings-acl.62 <fixed-case>MIMIC</fixed-case>ause: <fixed-case>R</fixed-case>epresentation and automatic extraction of causal relation types from clinical notes @@ -902,6 +964,7 @@ khetan-etal-2022-mimicause MIMIC-III ROCStories + 10.18653/v1/2022.findings-acl.63 Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation @@ -915,6 +978,7 @@ zhao-etal-2022-compressing xuandongzhao/hpd SNLI + 10.18653/v1/2022.findings-acl.64 Debiasing Event Understanding for Visual Commonsense Tasks @@ -929,6 +993,7 @@ 2022.findings-acl.65.software.zip seo-etal-2022-debiasing VCR + 10.18653/v1/2022.findings-acl.65 Fact-Tree Reasoning for N-ary Question Answering over Knowledge Graphs @@ -941,6 +1006,7 @@ The current Question Answering over Knowledge Graphs (KGQA) task mainly focuses on performing answer reasoning upon KGs with binary facts. However, it neglects the n-ary facts, which contain more than two entities. In this work, we highlight a more challenging but under-explored task: n-ary KGQA, i.e., answering n-ary fact questions upon n-ary KGs. Nevertheless, the multi-hop reasoning framework popular in the binary KGQA task is not directly applicable to n-ary KGQA. We propose two feasible improvements: 1) upgrade the basic reasoning unit from entity or relation to fact, and 2) upgrade the reasoning structure from chain to tree. Therefore, we propose a novel fact-tree reasoning framework, FacTree, which integrates the above two upgrades. FacTree transforms the question into a fact tree and performs iterative fact reasoning on the fact tree to infer the correct answer. Experimental results on the n-ary KGQA dataset we constructed and two binary KGQA benchmarks demonstrate the effectiveness of FacTree compared with state-of-the-art methods.
2022.findings-acl.66 zhang-etal-2022-fact + 10.18653/v1/2022.findings-acl.66 <fixed-case>D</fixed-case>eep<fixed-case>S</fixed-case>truct: Pretraining of Language Models for Structure Prediction @@ -965,6 +1031,7 @@ OPIEC T-REx TekGen + 10.18653/v1/2022.findings-acl.67 The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error @@ -977,6 +1044,7 @@ 2022.findings-acl.68 atwell-etal-2022-change anthonysicilia/change-that-matters-acl2022 + 10.18653/v1/2022.findings-acl.68 Mukayese: <fixed-case>T</fixed-case>urkish <fixed-case>NLP</fixed-case> Strikes Back @@ -990,6 +1058,7 @@ safaya-etal-2022-mukayese alisafaya/mukayese GLUE + 10.18653/v1/2022.findings-acl.69 Virtual Augmentation Supported Contrastive Learning of Sentence Representations @@ -1003,6 +1072,7 @@ 2022.findings-acl.70 zhang-etal-2022-virtual amazon-research/sentence-representations + 10.18653/v1/2022.findings-acl.70 <fixed-case>M</fixed-case>o<fixed-case>E</fixed-case>fication: Transformer Feed-forward Layers are Mixtures of Experts @@ -1020,6 +1090,7 @@ GLUE RACE SST + 10.18653/v1/2022.findings-acl.71 <fixed-case>DS</fixed-case>-<fixed-case>TOD</fixed-case>: Efficient Domain Specialization for Task-Oriented Dialog @@ -1034,6 +1105,7 @@ hung-etal-2022-ds umanlp/ds-tod CCNet + 10.18653/v1/2022.findings-acl.72 Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model @@ -1048,6 +1120,7 @@ lilynlp/distinguishing-non-natural IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.73 Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns @@ -1067,6 +1140,7 @@ GLUE LRA QNLI + 10.18653/v1/2022.findings-acl.74 Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment @@ -1079,6 +1153,7 @@ Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system’s performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data not only improves the accuracy of the deployed QA system but also that of other stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers. 2022.findings-acl.75 li-etal-2022-using + 10.18653/v1/2022.findings-acl.75 To be or not to be an Integer? Encoding Variables for Mathematical Text @@ -1092,6 +1167,7 @@ 2022.findings-acl.76 2022.findings-acl.76.software.zip ferreira-etal-2022-integer + 10.18653/v1/2022.findings-acl.76 <fixed-case>GRS</fixed-case>: Combining Generation and Revision in Unsupervised Sentence Simplification @@ -1106,6 +1182,7 @@ ASSET CoLA Newsela + 10.18653/v1/2022.findings-acl.77 <fixed-case>BPE</fixed-case> vs.
Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages @@ -1118,6 +1195,7 @@ Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and Wixarika. Then, we compare the morphologically inspired segmentation methods against Byte-Pair Encodings (BPEs) as inputs for machine translation (MT) when translating to and from Spanish. We show that for all language pairs except for Nahuatl, an unsupervised morphological segmentation algorithm outperforms BPEs consistently and that, although supervised methods achieve better segmentation scores, they under-perform in MT challenges. Finally, we contribute two new morphological segmentation datasets for Raramuri and Shipibo-Konibo, and a parallel corpus for Raramuri–Spanish. 2022.findings-acl.78 mager-etal-2022-bpe + 10.18653/v1/2022.findings-acl.78 Distributed <fixed-case>NLI</fixed-case>: Learning to Predict Human Opinion Distributions for Language Reasoning @@ -1132,6 +1210,7 @@ easonnie/ChaosNLI ChaosNLI SNLI + 10.18653/v1/2022.findings-acl.79 Morphological Processing of Low-Resource Languages: Where We Are and What’s Next @@ -1146,6 +1225,7 @@ Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language’s morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models bridged by two newly proposed models we devise perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by orders of magnitude. 2022.findings-acl.80 wiemerslage-etal-2022-morphological + 10.18653/v1/2022.findings-acl.80 Learning and Evaluating Character Representations in Novels @@ -1158,6 +1238,7 @@ 2022.findings-acl.81 inoue-etal-2022-learning naoya-i/charembench + 10.18653/v1/2022.findings-acl.81 Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension @@ -1169,6 +1250,7 @@ raina-gales-2022-answer RACE ReClor + 10.18653/v1/2022.findings-acl.82 Measuring the Language of Self-Disclosure across Corpora @@ -1181,6 +1263,7 @@ Being able to reliably estimate self-disclosure – a key component of friendship and intimacy – from language is important for many psychology studies. We build single-task models on five self-disclosure corpora, but find that these models generalize poorly; the within-domain accuracy of predicted message-level self-disclosure of the best-performing model (mean Pearson’s r=0.69) is much higher than the respective across-dataset accuracy (mean Pearson’s r=0.32), due to both variations in the corpora (e.g., medical vs.
general topics) and labeling instructions (target variables: self-disclosure, emotional disclosure, intimacy). However, some lexical features, such as the expression of negative emotions and the use of first-person personal pronouns such as ‘I’, reliably predict self-disclosure across corpora. We develop a multi-task model that yields better results, with an average Pearson’s r of 0.37 for out-of-corpora prediction. 2022.findings-acl.83 reuel-etal-2022-measuring + 10.18653/v1/2022.findings-acl.83 When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation @@ -1199,6 +1282,7 @@ QNLI SICK SQuAD + 10.18653/v1/2022.findings-acl.84 Explaining Classes through Stable Word Attributions @@ -1213,6 +1297,7 @@ 2022.findings-acl.85.software.tgz ronnqvist-etal-2022-explaining turkunlp/class-explainer + 10.18653/v1/2022.findings-acl.85 What to Learn, and How: <fixed-case>T</fixed-case>oward Effective Learning from Rationales @@ -1227,6 +1312,7 @@ FEVER MultiRC e-SNLI + 10.18653/v1/2022.findings-acl.86 Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments @@ -1241,6 +1327,7 @@ 2022.findings-acl.87 maronikolakis-etal-2022-listening antmarakis/xtremespeech + 10.18653/v1/2022.findings-acl.87 Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists @@ -1255,6 +1342,7 @@ attanasio-etal-2022-entropy g8a9/ear MLMA Hate Speech + 10.18653/v1/2022.findings-acl.88 From <fixed-case>BERT</fixed-case>’s <fixed-case>P</fixed-case>oint of <fixed-case>V</fixed-case>iew: <fixed-case>R</fixed-case>evealing the <fixed-case>P</fixed-case>revailing <fixed-case>C</fixed-case>ontextual <fixed-case>D</fixed-case>ifferences @@ -1265,6 +1353,7 @@ 2022.findings-acl.89 2022.findings-acl.89.software.zip schuster-hegelich-2022-berts + 10.18653/v1/2022.findings-acl.89 Learning Bias-reduced Word Embeddings Using Dictionary Definitions @@ -1276,6 +1365,7 @@ 2022.findings-acl.90 an-etal-2022-learning haozhe-an/dd-glove + 10.18653/v1/2022.findings-acl.90 Knowledge Graph Embedding by Adaptive Limit Scoring Loss Using Dynamic Weighting Strategy @@ -1291,6 +1381,7 @@ 2022.findings-acl.91 yang-etal-2022-knowledge FB15k-237 + 10.18653/v1/2022.findings-acl.91 <fixed-case>OCR</fixed-case> Improves Machine Translation for Low-Resource Languages @@ -1302,6 +1393,7 @@ We aim to investigate the performance of current OCR systems on low resource languages and low resource scripts. We introduce and make publicly available a novel benchmark, OCR4MT, consisting of real and synthetic data, enriched with noise, for 60 low-resource languages in low resource scripts. We evaluate state-of-the-art OCR systems on our benchmark and analyse the most common errors. We show that OCR monolingual data is a valuable resource that can increase the performance of Machine Translation models when used in backtranslation. We then perform an ablation study to investigate how OCR errors impact Machine Translation performance and determine the minimum level of OCR quality needed for the monolingual data to be useful for Machine Translation.
2022.findings-acl.92 ignat-etal-2022-ocr + 10.18653/v1/2022.findings-acl.92 <fixed-case>C</fixed-case>o<fixed-case>C</fixed-case>o<fixed-case>LM</fixed-case>: Complex Commonsense Enhanced Language Model with Discourse Relations @@ -1319,6 +1411,7 @@ LAMA ROCStories SuperGLUE + 10.18653/v1/2022.findings-acl.93 Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming @@ -1334,6 +1427,7 @@ 2022.findings-acl.94.software.zip maheshwari-etal-2022-learning SST + 10.18653/v1/2022.findings-acl.94 Multi-Granularity Semantic Aware Graph Model for Reducing Position Bias in Emotion Cause Pair Extraction @@ -1346,6 +1440,7 @@ The emotion cause pair extraction (ECPE) task aims to extract emotions and causes as pairs from documents. We observe that the relative distance distribution of emotions and causes is extremely imbalanced in the typical ECPE dataset. Existing methods have set a fixed-size window to capture relations between neighboring clauses. However, they neglect the effective semantic connections between distant clauses, leading to poor generalization ability towards position-insensitive data. To alleviate the problem, we propose a novel \textbf{M}ulti-\textbf{G}ranularity \textbf{S}emantic \textbf{A}ware \textbf{G}raph model (MGSAG) to incorporate fine-grained and coarse-grained semantic features jointly, without regard to distance limitation. In particular, we first explore semantic dependencies between clauses and keywords extracted from the document that convey fine-grained semantic features, obtaining keyword-enhanced clause representations. Besides, a clause graph is also established to model coarse-grained semantic relations between clauses. Experimental results indicate that MGSAG surpasses the existing state-of-the-art ECPE models. Notably, MGSAG significantly outperforms other models on position-insensitive data. 2022.findings-acl.95 bao-etal-2022-multi + 10.18653/v1/2022.findings-acl.95 Cross-lingual Inference with A <fixed-case>C</fixed-case>hinese Entailment Graph @@ -1362,6 +1457,7 @@ teddy-li/chineseentgraph CLUE FIGER + 10.18653/v1/2022.findings-acl.96 Multi-task Learning for Paraphrase Generation With Keyword and Part-of-Speech Reconstruction @@ -1374,6 +1470,7 @@ 2022.findings-acl.97.software.zip xie-etal-2022-multi COCO + 10.18653/v1/2022.findings-acl.97 <fixed-case>MDCS</fixed-case>pell: A Multi-task Detector-Corrector Framework for <fixed-case>C</fixed-case>hinese Spelling Correction @@ -1385,6 +1482,7 @@ Chinese Spelling Correction (CSC) is a task to detect and correct misspelled characters in Chinese texts. CSC is challenging since many Chinese characters are visually or phonologically similar but with quite different semantic meanings. Many recent works use BERT-based language models to directly correct each character of the input sentence. However, these methods can be sub-optimal since they correct every character of the sentence only by the context, which is easily negatively affected by the misspelled characters. Some other works propose to use an error detector to guide the correction by masking the detected errors. Nevertheless, these methods dampen the visual or phonological features from the misspelled characters, which could be critical for correction.
In this work, we propose a novel general detector-corrector multi-task framework where the corrector uses BERT to capture the visual and phonological features from each character in the raw sentence and uses a late fusion strategy to fuse the hidden states of the corrector with those of the detector to minimize the negative impact from the misspelled characters. Comprehensive experiments on benchmarks demonstrate that our proposed method can significantly outperform the state-of-the-art methods in the CSC task. 2022.findings-acl.98 zhu-etal-2022-mdcspell + 10.18653/v1/2022.findings-acl.98 <fixed-case>S</fixed-case><tex-math>^2</tex-math><fixed-case>SQL</fixed-case>: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-<fixed-case>SQL</fixed-case> Parsers @@ -1402,6 +1500,7 @@ 2022.findings-acl.99.software.zip hui-etal-2022-s2sql SPIDER + 10.18653/v1/2022.findings-acl.99 Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers @@ -1412,6 +1511,7 @@ This paper presents the first multi-objective transformer model for generating open cloze tests that exploits generation and discrimination capabilities to improve performance. Our model is further enhanced by tweaking its loss function and applying a post-processing re-ranking algorithm that improves overall test structure. Experiments using automatic and human evaluation show that our approach can achieve up to 82% accuracy according to experts, outperforming previous work and baselines. We also release a collection of high-quality open cloze tests along with sample system output and human annotations that can serve as a future benchmark. 2022.findings-acl.100 felice-etal-2022-constructing + 10.18653/v1/2022.findings-acl.100 <fixed-case>C</fixed-case>o-training an <fixed-case>U</fixed-case>nsupervised <fixed-case>C</fixed-case>onstituency <fixed-case>P</fixed-case>arser with <fixed-case>W</fixed-case>eak <fixed-case>S</fixed-case>upervision @@ -1424,6 +1524,7 @@ Nickil21/weakly-supervised-parsing Chinese Treebank Penn Treebank + 10.18653/v1/2022.findings-acl.101 <fixed-case>H</fixed-case>i<fixed-case>S</fixed-case>truct+: Improving Extractive Text Summarization with Hierarchical Structure Information @@ -1438,6 +1539,7 @@ QianRuan/histruct Pubmed arXiv + 10.18653/v1/2022.findings-acl.102 An Isotropy Analysis in the Multilingual <fixed-case>BERT</fixed-case> Embedding Space @@ -1448,6 +1550,7 @@ 2022.findings-acl.103 rajaee-pilehvar-2022-isotropy sara-rajaee/multilingual-isotropy + 10.18653/v1/2022.findings-acl.103 Multi-Stage Prompting for Knowledgeable Dialogue Generation @@ -1464,6 +1567,7 @@ liu-etal-2022-multi NVIDIA/Megatron-LM Wizard of Wikipedia + 10.18653/v1/2022.findings-acl.104 <tex-math>\textrm{DuReader}_{\textrm{vis}}</tex-math>: A <fixed-case>C</fixed-case>hinese Dataset for Open-domain Document Visual Question Answering @@ -1486,6 +1590,7 @@ InfographicVQA Natural Questions VisualMRC + 10.18653/v1/2022.findings-acl.105 Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models @@ -1500,6 +1605,7 @@ mueller-etal-2022-coloring sebschu/multilingual-transformations mC4 + 10.18653/v1/2022.findings-acl.106 <fixed-case>C</fixed-case><tex-math>^3</tex-math><fixed-case>KG</fixed-case>: A <fixed-case>C</fixed-case>hinese Commonsense Conversation Knowledge Graph @@ -1517,6 +1623,7 @@ ATOMIC ConceptNet MOD + 10.18653/v1/2022.findings-acl.107 Graph Neural Networks for Multiparallel Word Alignment @@ -1529,6 +1636,7 @@ After a period of
decrease, interest in word alignments is increasing again for their usefulness in domains such as typological research, cross-lingual annotation projection and machine translation. Generally, alignment algorithms only use bitext and do not make use of the fact that many parallel corpora are multiparallel. Here, we compute high-quality word alignments between multiple language pairs by considering all language pairs together. First, we create a multiparallel word alignment graph, joining all bilingual word alignment pairs in one graph. Next, we use graph neural networks (GNNs) to exploit the graph structure. Our GNN approach (i) utilizes information about the meaning, position and language of the input words, (ii) incorporates information from multiple parallel sentences, (iii) adds and removes edges from the initial alignments, and (iv) yields a prediction model that can generalize beyond the training sentences. We show that community detection algorithms can provide valuable information for multiparallel word alignment. Our method outperforms previous work on three word alignment datasets and on a downstream task. 2022.findings-acl.108 imani-etal-2022-graph + 10.18653/v1/2022.findings-acl.108 Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with <fixed-case>ASR</fixed-case> Errors @@ -1545,6 +1653,7 @@ wu-etal-2022-sentiment albertwy/SWRM Multimodal Opinionlevel Sentiment Intensity + 10.18653/v1/2022.findings-acl.109 A Novel Framework Based on Medical Concept Driven Attention for Explainable Medical Code Prediction via External Knowledge @@ -1557,6 +1666,7 @@ Medical code prediction from clinical notes aims at automatically associating medical codes with the clinical notes. The rare code problem, i.e., medical codes with low occurrences, is prominent in medical code prediction. Recent studies employ deep neural networks and external knowledge to tackle it. However, such approaches lack interpretability, which is a vital issue in medical application. Moreover, due to the lengthy and noisy clinical notes, such approaches fail to achieve satisfactory results. Therefore, in this paper, we propose a novel framework based on medical concept driven attention to incorporate external knowledge for explainable medical code prediction. Specifically, both the clinical notes and Wikipedia documents are aligned into topic space to extract medical concepts using topic modeling. Then, the medical concept-driven attention mechanism is applied to uncover the medical code related concepts which provide explanations for medical code prediction. Experimental results on the benchmark dataset show the superiority of the proposed framework over several state-of-the-art baselines. 2022.findings-acl.110 wang-etal-2022-novel + 10.18653/v1/2022.findings-acl.110 Effective Unsupervised Constrained Text Generation based on Perturbed Masking @@ -1568,6 +1678,7 @@ Unsupervised constrained text generation aims to generate text under a given set of constraints without any supervised data. Current state-of-the-art methods stochastically sample edit positions and actions, which may cause unnecessary search steps. In this paper, we propose PMCTG to improve effectiveness by searching for the best edit position and action in each step. Specifically, PMCTG extends the perturbed masking technique to effectively search for the most incongruent token to edit. Then it introduces four multi-aspect scoring functions to select edit action to further reduce search difficulty.
Since PMCTG does not require supervised data, it could be applied to different generation tasks. We show that under the unsupervised setting, PMCTG achieves new state-of-the-art results in two representative tasks, namely keywords-to-sentence generation and paraphrasing. 2022.findings-acl.111 fu-etal-2022-effective + 10.18653/v1/2022.findings-acl.111 Combining (Second-Order) Graph-Based and Headed-Span-Based Projective Dependency Parsing @@ -1579,6 +1690,7 @@ yang-tu-2022-combining sustcsonglin/span-based-dependency-parsing Penn Treebank + 10.18653/v1/2022.findings-acl.112 End-to-End Speech Translation for Code Switched Speech @@ -1595,6 +1707,7 @@ weller-etal-2022-end apple/ml-code-switched-speech-translation CoVoST + 10.18653/v1/2022.findings-acl.113 A Transformational Biencoder with In-Domain Negative Sampling for Zero-Shot Entity Linking @@ -1609,6 +1722,7 @@ 2022.findings-acl.114.software.zip sun-etal-2022-transformational ZESHEL + 10.18653/v1/2022.findings-acl.114 Finding the Dominant Winning Ticket in Pre-Trained Language Models @@ -1626,6 +1740,7 @@ gong-etal-2022-finding GLUE QNLI + 10.18653/v1/2022.findings-acl.115 <fixed-case>T</fixed-case>hai Nested Named Entity Recognition Corpus @@ -1642,6 +1757,7 @@ CoNLL-2003 DaN+ NNE + 10.18653/v1/2022.findings-acl.116 Two-Step Question Retrieval for Open-Domain <fixed-case>QA</fixed-case> @@ -1660,6 +1776,7 @@ Natural Questions PAQ TriviaQA + 10.18653/v1/2022.findings-acl.117 Semantically Distributed Robust Optimization for Vision-and-Language Inference @@ -1674,6 +1791,7 @@ gokhale-etal-2022-semantically asu-apg/vli_sdro Violin + 10.18653/v1/2022.findings-acl.118 Learning from Missing Relations: Contrastive Learning with Commonsense Knowledge Graphs for Commonsense Inference @@ -1691,6 +1809,7 @@ yongho94/solar-framework_commonsense-inference ConceptNet Event2Mind + 10.18653/v1/2022.findings-acl.119 Capture Human Disagreement Distributions by Calibrated Networks for Natural Language Inference @@ -1708,6 +1827,7 @@ wang-etal-2022-capture ChaosNLI MultiNLI + 10.18653/v1/2022.findings-acl.120 Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers @@ -1720,6 +1840,7 @@ andersen-maalej-2022-efficient jsandersen/cmt IMDb Movie Reviews + 10.18653/v1/2022.findings-acl.121 Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than <fixed-case>ROUGE</fixed-case>? @@ -1730,6 +1851,7 @@ It has been the norm for a long time to evaluate automated summarization tasks using the popular ROUGE metric. Although several studies in the past have highlighted the limitations of ROUGE, researchers have struggled to reach a consensus on a better alternative until today. One major limitation of the traditional ROUGE metric is the lack of semantic understanding (relies on direct overlap of n-grams). In this paper, we exclusively focus on the extractive summarization task and propose a semantic-aware nCG (normalized cumulative gain)-based evaluation metric (called Sem-nCG) for evaluating this task. One fundamental contribution of the paper is that it demonstrates how we can generate more reliable semantic-aware ground truths for evaluating extractive summarization tasks without any additional human intervention. To the best of our knowledge, this work is the first of its kind. We have conducted extensive experiments with this new metric using the widely used CNN/DailyMail dataset.
Experimental results show that the new Sem-nCG metric is indeed semantic-aware, shows higher correlation with human judgement (more reliable) and yields a large number of disagreements with the original ROUGE metric (suggesting that ROUGE often leads to inaccurate conclusions also verified by humans). 2022.findings-acl.122 akter-etal-2022-revisiting + 10.18653/v1/2022.findings-acl.122 Open Vocabulary Extreme Classification Using Generative Models @@ -1744,6 +1866,7 @@ The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However, in real-world scenarios this label set, although large, is often incomplete and experts frequently need to refine it. To develop systems that simplify this process, we introduce the task of open vocabulary XMC (OXMC): given a piece of content, predict a set of labels, some of which may be outside of the known tag set. Hence, in addition to not having training data for some labels, as is the case in zero-shot classification, models need to invent some labels on-the-fly. We propose GROOV, a fine-tuned seq2seq model for OXMC that generates the set of labels as a flat sequence and is trained using a novel loss independent of predicted label order. We show the efficacy of the approach, experimenting with popular XMC datasets for which GROOV is able to predict meaningful labels outside the given vocabulary while performing on par with state-of-the-art solutions for known labels. 2022.findings-acl.123 simig-etal-2022-open + 10.18653/v1/2022.findings-acl.123 Decomposed Meta-Learning for Few-Shot Named Entity Recognition @@ -1761,6 +1884,7 @@ CoNLL 2002 Few-NERD WNUT 2017 + 10.18653/v1/2022.findings-acl.124 <fixed-case>T</fixed-case>eg<fixed-case>T</fixed-case>ok: Augmenting Text Generation via Task-specific and Open-world Knowledge @@ -1777,6 +1901,7 @@ 2022.findings-acl.125 tan-etal-2022-tegtok lxchtan/tegtok + 10.18653/v1/2022.findings-acl.125 <fixed-case>E</fixed-case>mo<fixed-case>C</fixed-case>aps: Emotion Capsule based Model for Conversational Emotion Recognition @@ -1791,6 +1916,7 @@ li-etal-2022-emocaps IEMOCAP MELD + 10.18653/v1/2022.findings-acl.126 Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text @@ -1809,6 +1935,7 @@ wang-etal-2022-logic WangsyGit/LReasoner ReClor + 10.18653/v1/2022.findings-acl.127 Transfer Learning and Prediction Consistency for Detecting Offensive Spans of Text @@ -1823,6 +1950,7 @@ 2022.findings-acl.128 2022.findings-acl.128.software.zip pouran-ben-veyseh-etal-2022-transfer + 10.18653/v1/2022.findings-acl.128 Learning Reasoning Patterns for Relational Triple Extraction with Mutual Generation of Text and Graph @@ -1833,6 +1961,7 @@ Relational triple extraction is a critical task for constructing knowledge graphs. Existing methods focused on learning text patterns from explicit relational mentions. However, they usually suffered from ignoring relational reasoning patterns, and thus failed to extract the implicitly implied triples. Fortunately, the graph structure of a sentence’s relational triples can help find multi-hop reasoning paths. Moreover, the type inference logic through the paths can be captured with the sentence’s supplementary relational expressions that represent the real-world conceptual meanings of the paths’ composite relations.
In this paper, we propose a unified framework to learn the relational reasoning patterns for this task. To identify multi-hop reasoning paths, we construct a relational graph from the sentence (text-to-graph generation) and apply multi-layer graph convolutions to it. To capture the relation type inference logic of the paths, we propose to understand the unlabeled conceptual expressions by reconstructing the sentence from the relational graph (graph-to-text generation) in a self-supervised manner. Experimental results on several benchmark datasets demonstrate the effectiveness of our method. 2022.findings-acl.129 chen-etal-2022-learning + 10.18653/v1/2022.findings-acl.129 Document-Level Event Argument Extraction via Optimal Transport @@ -1846,6 +1975,7 @@ 2022.findings-acl.130 2022.findings-acl.130.software.zip pouran-ben-veyseh-etal-2022-document + 10.18653/v1/2022.findings-acl.130 N-Shot Learning for Augmenting Task-Oriented Dialogue State Tracking @@ -1858,6 +1988,7 @@ 2022.findings-acl.131 aksu-etal-2022-n MultiWOZ + 10.18653/v1/2022.findings-acl.131 Document-Level Relation Extraction with Adaptive Focal Loss and Knowledge Distillation @@ -1872,6 +2003,7 @@ tan-etal-2022-document tonytan48/kd-docre DocRED + 10.18653/v1/2022.findings-acl.132 Calibration of Machine Reading Systems at Scale @@ -1884,6 +2016,7 @@ 2022.findings-acl.133 dhuliawala-etal-2022-calibration Natural Questions + 10.18653/v1/2022.findings-acl.133 Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples @@ -1901,6 +2034,7 @@ xu-etal-2022-towards AG News SST + 10.18653/v1/2022.findings-acl.134 Morphosyntactic Tagging with Pre-trained Language Models for <fixed-case>A</fixed-case>rabic and its Dialects @@ -1912,6 +2046,7 @@ 2022.findings-acl.135 inoue-etal-2022-morphosyntactic camel-lab/camelbert_morphosyntactic_tagger + 10.18653/v1/2022.findings-acl.135 How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis @@ -1929,6 +2064,7 @@ 2022.findings-acl.136 li-etal-2022-pre LAMA + 10.18653/v1/2022.findings-acl.136 Metadata Shaping: A Simple Approach for Knowledge-Enhanced Language Models @@ -1944,6 +2080,7 @@ FewRel Open Entity TACRED + 10.18653/v1/2022.findings-acl.137 Enhancing Natural Language Representation with Large-Scale Out-of-Domain Commonsense @@ -1959,6 +2096,7 @@ QNLI WSC WinoGrande + 10.18653/v1/2022.findings-acl.138 Weighted self Distillation for <fixed-case>C</fixed-case>hinese word segmentation @@ -1972,6 +2110,7 @@ 2022.findings-acl.139.software.zip he-etal-2022-weighted anzi20/weidc + 10.18653/v1/2022.findings-acl.139 Sibylvariant Transformations for Robust Text Classification @@ -1987,6 +2126,7 @@ AG News IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.140 <fixed-case>D</fixed-case>a<fixed-case>LC</fixed-case>: Domain Adaptation Learning Curve Prediction for Neural Machine Translation @@ -1999,6 +2139,7 @@ Domain Adaptation (DA) of Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model which is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the amount of parallel samples it would require. It is however a desirable functionality that could help MT practitioners to make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. 
Our model relies on the NMT encoder representations combined with various instance and corpus-level features. We demonstrate that instance-level features are better able to distinguish between different domains compared to corpus-level frameworks proposed in previous studies. Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research. 2022.findings-acl.141 park-etal-2022-dalc + 10.18653/v1/2022.findings-acl.141 Hey <fixed-case>AI</fixed-case>, Can You Solve Complex Tasks by Talking to Agents? @@ -2013,6 +2154,7 @@ allenai/commaqa DROP MathQA + 10.18653/v1/2022.findings-acl.142 Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion @@ -2023,6 +2165,7 @@ 2022.findings-acl.143 yao-mihalcea-2022-modality MELD + 10.18653/v1/2022.findings-acl.143 <fixed-case>B</fixed-case>i<fixed-case>S</fixed-case>yn-<fixed-case>GAT</fixed-case>+: Bi-Syntax Aware Graph Attention Network for Aspect-based Sentiment Analysis @@ -2037,6 +2180,7 @@ liang-etal-2022-bisyn CCIIPLab/BiSyn_GAT_plus MAMS + 10.18653/v1/2022.findings-acl.144 <fixed-case>I</fixed-case>ndic<fixed-case>BART</fixed-case>: A Pre-trained Model for Indic Natural Language Generation @@ -2054,6 +2198,7 @@ FLoRes IndicCorp Samanantar + 10.18653/v1/2022.findings-acl.145 Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models @@ -2073,6 +2218,7 @@ ReQA SentEval SuperGLUE + 10.18653/v1/2022.findings-acl.146 Improving Relation Extraction through Syntax-induced Pre-training with Dependency Masking @@ -2088,6 +2234,7 @@ Updated code link in footnote. Penn Treebank SemEval-2010 Task 8 + 10.18653/v1/2022.findings-acl.147 Striking a Balance: Alleviating Inconsistency in Pre-trained Models for Symmetric Classification Tasks @@ -2103,6 +2250,7 @@ PAWS QNLI SST + 10.18653/v1/2022.findings-acl.148 Diversifying Content Generation for Commonsense Reasoning with Mixture of Knowledge Graph Experts @@ -2117,6 +2265,7 @@ 2022.findings-acl.149 yu-etal-2022-diversifying DM2-ND/MoKGE + 10.18653/v1/2022.findings-acl.149 Dict-<fixed-case>BERT</fixed-case>: Enhancing Language Model Pre-training with Dictionary @@ -2135,6 +2284,7 @@ GLUE QNLI WNLaMPro + 10.18653/v1/2022.findings-acl.150 A Feasibility Study of Answer-Unaware Question Generation for Education @@ -2151,6 +2301,7 @@ 2022.findings-acl.151 dugan-etal-2022-feasibility liamdugan/summary-qg + 10.18653/v1/2022.findings-acl.151 Relevant <fixed-case>C</fixed-case>ommon<fixed-case>S</fixed-case>ense Subgraphs for “What if...” Procedural Reasoning @@ -2162,6 +2313,7 @@ zheng-kordjamshidi-2022-relevant ConceptNet WIQA + 10.18653/v1/2022.findings-acl.152 Combining Feature and Instance Attribution to Detect Artifacts @@ -2176,6 +2328,7 @@ BoolQ IMDb Movie Reviews SuperGLUE + 10.18653/v1/2022.findings-acl.153 Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition @@ -2191,6 +2344,7 @@ reich-etal-2022-leveraging gt-salt/guided-adversarial-augmentation CoNLL-2003 + 10.18653/v1/2022.findings-acl.154 Label Semantics for Few Shot Named Entity Recognition @@ -2208,6 +2362,7 @@ CoNLL-2003 NCBI Disease WNUT 2017 + 10.18653/v1/2022.findings-acl.155 Detection, Disambiguation, Re-ranking: Autoregressive Entity Linking as a Multi-Task Problem @@ -2223,6 +2378,7 @@ mrini-etal-2022-detection AIDA CoNLL-YAGO COMETA + 10.18653/v1/2022.findings-acl.156 <fixed-case>VISITRON</fixed-case>: Visual Semantics-Aligned Interactively Trained Object-Navigator @@ -2240,6 +2396,7 @@
alexa/visitron Matterport3D RxR + 10.18653/v1/2022.findings-acl.157 Investigating Selective Prediction Approaches Across Several Tasks in <fixed-case>IID</fixed-case>, <fixed-case>OOD</fixed-case>, and Adversarial Settings @@ -2252,6 +2409,7 @@ 2022.findings-acl.158.software.zip varshney-etal-2022-investigating SNLI + 10.18653/v1/2022.findings-acl.158 Unsupervised Natural Language Inference Using <fixed-case>PHL</fixed-case> Triplet Generation @@ -2269,6 +2427,7 @@ ConceptNet MultiNLI SNLI + 10.18653/v1/2022.findings-acl.159 Data Augmentation and Learned Layer Aggregation for Improved Multilingual Language Understanding in Dialogue @@ -2281,6 +2440,7 @@ razumovskaia-etal-2022-data CC100 xSID + 10.18653/v1/2022.findings-acl.160 Ranking-Constrained Learning with Rationales for Text Classification @@ -2291,6 +2451,7 @@ We propose a novel approach that jointly utilizes the labels and elicited rationales for text classification to speed up the training of deep learning models with limited training data. We define and optimize a ranking-constrained loss function that combines cross-entropy loss with ranking losses as rationale constraints. We evaluate our proposed rationale-augmented learning approach on three human-annotated datasets, and show that our approach provides significant improvements over classification approaches that do not utilize rationales as well as other state-of-the-art rationale-augmented baselines. 2022.findings-acl.161 wang-etal-2022-ranking + 10.18653/v1/2022.findings-acl.161 <fixed-case>C</fixed-case>a<fixed-case>M</fixed-case>-<fixed-case>G</fixed-case>en: <fixed-case>C</fixed-case>ausally Aware Metric-Guided Text Generation @@ -2304,6 +2465,7 @@ Content is created for a well-defined purpose, often described by a metric or signal represented in the form of structured information. The relationship between the goal (metrics) of target content and the content itself is non-trivial. While large-scale language models show promising text generation capabilities, guiding the generated text with external metrics is challenging. These metrics and content tend to have inherent relationships and not all of them may be of consequence. We introduce CaM-Gen: Causally aware Generative Networks guided by user-defined target metrics incorporating the causal relationships between the metric and content features. We leverage causal inference techniques to identify causally significant aspects of a text that lead to the target metric and then explicitly guide generative models towards these by a feedback mechanism. We propose this mechanism for variational autoencoder and Transformer-based generative models. The proposed models beat baselines in terms of the target metric control while maintaining fluency and language quality of the generated text. To the best of our knowledge, this is one of the early attempts at controlled generation incorporating a metric guide using causal inference. 2022.findings-acl.162 goyal-etal-2022-cam + 10.18653/v1/2022.findings-acl.162 Training Dynamics for Text Summarization Models @@ -2315,6 +2477,7 @@ Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training time or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on summarization.
Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that a propensity to copy the input is learned early in the training process consistently across all datasets studied. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, though this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn and second, disregarding low-loss tokens that are learnt very quickly in the latter stages of the training process. We show that these simple training modifications allow us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness. 2022.findings-acl.163 goyal-etal-2022-training + 10.18653/v1/2022.findings-acl.163 Richer Countries and Richer Representations @@ -2326,6 +2489,7 @@ 2022.findings-acl.164 zhou-etal-2022-richer katezhou/country_distortions + 10.18653/v1/2022.findings-acl.164 <fixed-case>BBQ</fixed-case>: A hand-built bias benchmark for question answering @@ -2344,6 +2508,7 @@ nyu-mll/bbq BBQ RACE + 10.18653/v1/2022.findings-acl.165 Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble @@ -2357,6 +2522,7 @@ 2022.findings-acl.166 li-etal-2022-zero xinjli/transphone + 10.18653/v1/2022.findings-acl.166 Dim Wihl Gat Tun: <fixed-case>T</fixed-case>he Case for Linguistic Expertise in <fixed-case>NLP</fixed-case> for Under-Documented Languages @@ -2371,6 +2537,7 @@ Recent progress in NLP is driven by pretrained models leveraging massive datasets and has predominantly benefited the world’s political and economic superpowers. Technologically underserved languages are left behind because they lack such resources. Hundreds of underserved languages, nevertheless, have available data sources in the form of interlinear glossed text (IGT) from language documentation efforts. IGT remains underutilized in NLP work, perhaps because its annotations are only semi-structured and often language-specific. With this paper, we make the case that IGT data can be leveraged successfully provided that target language expertise is available. We specifically advocate for collaboration with documentary linguists. Our paper provides a roadmap for successful projects utilizing IGT data: (1) It is essential to define which NLP tasks can be accomplished with the given IGT data and how these will benefit the speech community. (2) Great care and target language expertise is required when converting the data into structured formats commonly employed in NLP. (3) Task-specific and user-specific evaluation can help to ascertain that the tools which are created benefit the target language speech community. We illustrate each step through a case study on developing a morphological reinflection system for the Tsimchianic language Gitksan. 
2022.findings-acl.167 forbes-etal-2022-dim + 10.18653/v1/2022.findings-acl.167 Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask @@ -2385,6 +2552,7 @@ ghanem-etal-2022-question CosmosQA SQuAD + 10.18653/v1/2022.findings-acl.168 <fixed-case>TAB</fixed-case>i: <fixed-case>T</fixed-case>ype-Aware Bi-Encoders for Open-Domain Entity Retrieval @@ -2400,6 +2568,7 @@ FIGER KILT Natural Questions + 10.18653/v1/2022.findings-acl.169 Hierarchical Recurrent Aggregative Generation for Few-Shot <fixed-case>NLG</fixed-case> @@ -2411,6 +2580,7 @@ 2022.findings-acl.170 zhou-etal-2022-hierarchical SGD + 10.18653/v1/2022.findings-acl.170 Training Text-to-Text Transformers with Privacy Guarantees @@ -2424,6 +2594,7 @@ C4 GLUE QNLI + 10.18653/v1/2022.findings-acl.171 Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers @@ -2439,6 +2610,7 @@ MR SUBJ TREC-10 + 10.18653/v1/2022.findings-acl.172 The impact of lexical and grammatical processing on generating code from natural language @@ -2451,6 +2623,7 @@ codegenfact/BertranX CoNaLa Django + 10.18653/v1/2022.findings-acl.173 <fixed-case>S</fixed-case>eq2<fixed-case>P</fixed-case>ath: Generating Sentiment Tuples as Paths of a Tree @@ -2464,6 +2637,7 @@ 2022.findings-acl.174 2022.findings-acl.174.software.zip mao-etal-2022-seq2path + 10.18653/v1/2022.findings-acl.174 Mitigating the Inconsistency Between Word Saliency and Model Confidence with Pathological Contrastive Training @@ -2479,6 +2653,7 @@ zhan-etal-2022-mitigating AG News IMDb Movie Reviews + 10.18653/v1/2022.findings-acl.175 Your fairness may vary: Pretrained language model fairness in toxic text classification @@ -2492,6 +2667,7 @@ 2022.findings-acl.176 baldini-etal-2022-fairness HateXplain + 10.18653/v1/2022.findings-acl.176 <fixed-case>C</fixed-case>hart<fixed-case>QA</fixed-case>: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning @@ -2509,6 +2685,7 @@ FigureQA LEAF-QA PlotQA + 10.18653/v1/2022.findings-acl.177 A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification @@ -2520,6 +2697,7 @@ 2022.findings-acl.178 liu-etal-2022-novel MIND + 10.18653/v1/2022.findings-acl.178 Learn and Review: Enhancing Continual Named Entity Recognition via Reviewing Synthetic Samples @@ -2535,6 +2713,7 @@ 2022.findings-acl.179 xia-etal-2022-learn CoNLL-2003 + 10.18653/v1/2022.findings-acl.179 Phoneme transcription of endangered languages: an evaluation of recent <fixed-case>ASR</fixed-case> architectures in the single speaker scenario @@ -2543,6 +2722,7 @@ Transcription is often reported as the bottleneck in endangered language documentation, requiring large efforts from scarce speakers and transcribers. In general, automatic speech recognition (ASR) can be accurate enough to accelerate transcription only if trained on large amounts of transcribed data. However, when a single speaker is involved, several studies have reported encouraging results for phonetic transcription even with small amounts of training. Here we expand this body of work on speaker-dependent transcription by comparing four ASR approaches, notably recent transformer and pretrained multilingual models, on a common dataset of 11 languages. 
To automate data preparation, training and evaluation steps, we also developed a phoneme recognition setup which handles morphologically complex languages and writing systems for which no pronunciation dictionary exists. We find that fine-tuning a multilingual pretrained model yields an average phoneme error rate (PER) of 15% for 6 languages with 99 minutes or less of transcribed data for training. For the 5 languages with between 100 and 192 minutes of training, we achieved a PER of 8.4% or less. These results on a number of varied languages suggest that ASR can now significantly reduce transcription efforts in the speaker-dependent situation common in endangered language work. 2022.findings-acl.180 boulianne-2022-phoneme + 10.18653/v1/2022.findings-acl.180 Does <fixed-case>BERT</fixed-case> really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task @@ -2553,6 +2733,7 @@ Although transformer-based Neural Language Models demonstrate impressive performance on a variety of tasks, their generalization abilities are not well understood. They have been shown to perform strongly on subject-verb number agreement in a wide array of settings, suggesting that they learned to track syntactic dependencies during their training even without explicit supervision. In this paper, we examine the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates. To do so, we disrupt the lexical patterns found in naturally occurring stimuli for each targeted structure in a novel fine-grained analysis of BERT’s behavior. Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present. 2022.findings-acl.181 lasri-etal-2022-bert + 10.18653/v1/2022.findings-acl.181 Combining Static and Contextualised Multilingual Embeddings @@ -2567,6 +2748,7 @@ kathyhaem/combining-static-contextual TyDi QA XQuAD + 10.18653/v1/2022.findings-acl.182 An Accurate Unsupervised Method for Joint Entity Alignment and Dangling Entity Detection @@ -2578,6 +2760,7 @@ 2022.findings-acl.183.software.zip luo-yu-2022-accurate luosx18/ued + 10.18653/v1/2022.findings-acl.183 Square One Bias in <fixed-case>NLP</fixed-case>: Towards a Multi-Dimensional Exploration of the Research Manifold @@ -2588,6 +2771,7 @@ The prototypical NLP experiment trains a standard architecture on labeled English data and optimizes for accuracy, without accounting for other dimensions such as fairness, interpretability, or computational efficiency. We show through a manual classification of recent NLP research papers that this is indeed the case and refer to it as the square one experimental setup. We observe that NLP research often goes beyond the square one setup, e.g., focusing not only on accuracy, but also on fairness or interpretability, but typically only along a single dimension. Most work targeting multilinguality, for example, considers only accuracy; most work on fairness or interpretability considers only English; and so on. Such one-dimensionality of most research means we are only exploring a fraction of the NLP research search space. We provide historical and recent examples of how the square one bias has led researchers to draw false conclusions or make unwise choices, point to promising yet unexplored directions on the research manifold, and make practical recommendations to enable more multi-dimensional research.
We open-source the results of our annotations to enable further analysis. 2022.findings-acl.184 ruder-etal-2022-square + 10.18653/v1/2022.findings-acl.184 Systematicity, Compositionality and Transitivity of Deep <fixed-case>NLP</fixed-case> Models: a Metamorphic Testing Perspective @@ -2601,6 +2785,7 @@ 2022.findings-acl.185 2022.findings-acl.185.software.zip manino-etal-2022-systematicity + 10.18653/v1/2022.findings-acl.185 Improving Neural Political Statement Classification with Class Hierarchical Information @@ -2616,6 +2801,7 @@ 2022.findings-acl.186 2022.findings-acl.186.software.zip dayanik-etal-2022-improving + 10.18653/v1/2022.findings-acl.186 Enabling Multimodal Generation on <fixed-case>CLIP</fixed-case> via Vision-Language Knowledge Distillation @@ -2633,6 +2819,7 @@ GLUE OK-VQA nocaps + 10.18653/v1/2022.findings-acl.187 Co-<fixed-case>VQA</fixed-case> : Answering by Interactive Sub Question Sequence @@ -2649,6 +2836,7 @@ Visual Genome Visual Question Answering Visual Question Answering v2.0 + 10.18653/v1/2022.findings-acl.188 A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation @@ -2671,6 +2859,7 @@ MRPC SNLI SST + 10.18653/v1/2022.findings-acl.189 Auxiliary tasks to boost Biaffine Semantic Dependency Parsing @@ -2681,6 +2870,7 @@ 2022.findings-acl.190.software.tgz candito-2022-auxiliary mcandito/aux-tasks-biaffine-graph-parser-findingsacl22 + 10.18653/v1/2022.findings-acl.190 Syntax-guided Contrastive Learning for Pre-trained Language Model @@ -2699,6 +2889,7 @@ GLUE Open Entity QNLI + 10.18653/v1/2022.findings-acl.191 Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting @@ -2711,6 +2902,7 @@ chalkidis-sogaard-2022-improved coastalcph/lw-robust BioASQ + 10.18653/v1/2022.findings-acl.192 <fixed-case>ASCM</fixed-case>: An Answer Space Clustered Prompting Method without Answer Engineering @@ -2726,6 +2918,7 @@ 2022.findings-acl.193 wang-etal-2022-ascm miaomiao1215/ascm + 10.18653/v1/2022.findings-acl.193 Why don’t people use character-level machine translation? 
@@ -2737,6 +2930,7 @@ 2022.findings-acl.194 2022.findings-acl.194.software.tgz libovicky-etal-2022-dont + 10.18653/v1/2022.findings-acl.194 Seeking Patterns, Not just Memorizing Procedures: Contrastive Learning for Solving Math Word Problems @@ -2754,6 +2948,7 @@ zwx980624/mwp-cl Math23K MathQA + 10.18653/v1/2022.findings-acl.195 x<fixed-case>GQA</fixed-case>: Cross-Lingual Visual Question Answering @@ -2772,6 +2967,7 @@ GQA IGLUE MultiSubs + 10.18653/v1/2022.findings-acl.196 Automatic Speech Recognition and Query By Example for Creole Languages Documentation @@ -2784,6 +2980,7 @@ 2022.findings-acl.197 macaire-etal-2022-automatic macairececile/asr-qbe-creole + 10.18653/v1/2022.findings-acl.197 <fixed-case>MR</fixed-case>e<fixed-case>D</fixed-case>: A Meta-Review Dataset for Structure-Controllable Text Generation @@ -2799,6 +2996,7 @@ shen-etal-2022-mred shen-chenhui/mred CNN/Daily Mail + 10.18653/v1/2022.findings-acl.198 Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation @@ -2809,6 +3007,7 @@ Subword regularizations use multiple subword segmentations during training to improve the robustness of neural machine translation models. In previous subword regularizations, we use multiple segmentations in the training process but use only one segmentation in the inference. In this study, we propose an inference strategy to address this discrepancy. The proposed strategy approximates the marginalized likelihood by using multiple segmentations including the most plausible segmentation and several sampled segmentations. Because the proposed strategy aggregates predictions from several segmentations, we can regard it as a single model ensemble that does not require any additional cost for training. Experimental results show that the proposed strategy improves the performance of models trained with subword regularization in low-resource machine translation tasks. 2022.findings-acl.199 takase-etal-2022-single + 10.18653/v1/2022.findings-acl.199 Detecting Various Types of Noise for Neural Machine Translation @@ -2820,6 +3019,7 @@ The filtering and/or selection of training data is one of the core aspects to be considered when building a strong machine translation system. In their influential work, Khayrallah and Koehn (2018) investigated the impact of different types of noise on the performance of machine translation systems. In the same year the WMT introduced a shared task on parallel corpus filtering, which went on to be repeated in the following years, and resulted in many different filtering approaches being proposed. In this work we aim to combine the recent achievements in data filtering with the original analysis of Khayrallah and Koehn (2018) and investigate whether state-of-the-art filtering systems are capable of removing all the suggested noise types. We observe that most of these types of noise can be detected with an accuracy of over 90% by modern filtering systems when operating in a well studied high resource setting. However, we also find that when confronted with more refined noise categories or when working with a less common language pair, the performance of the filtering systems is far from optimal, showing that there is still room for improvement in this area of research.
2022.findings-acl.200 herold-etal-2022-detecting + 10.18653/v1/2022.findings-acl.200 <fixed-case>DU</fixed-case>-<fixed-case>VLG</fixed-case>: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training @@ -2833,6 +3033,7 @@ 2022.findings-acl.201 huang-etal-2022-du COCO + 10.18653/v1/2022.findings-acl.201 <fixed-case>H</fixed-case>i<fixed-case>CLRE</fixed-case>: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction @@ -2846,6 +3047,7 @@ 2022.findings-acl.202 li-etal-2022-hiclre matnlp/hiclre + 10.18653/v1/2022.findings-acl.202 Prompt-Driven Neural Machine Translation @@ -2858,6 +3060,7 @@ 2022.findings-acl.203 li-etal-2022-prompt yafuly/promptnmt + 10.18653/v1/2022.findings-acl.203 On Controlling Fallback Responses for Grounded Dialogue Generation @@ -2870,6 +3073,7 @@ 2022.findings-acl.204 2022.findings-acl.204.software.zip lu-etal-2022-controlling + 10.18653/v1/2022.findings-acl.204 <fixed-case>CRAFT</fixed-case>: A Benchmark for Causal Reasoning About Forces and in<fixed-case>T</fixed-case>eractions @@ -2891,6 +3095,7 @@ PHYRE TVQA TVQA+ + 10.18653/v1/2022.findings-acl.205 A Graph Enhanced <fixed-case>BERT</fixed-case> Model for Event Prediction @@ -2905,6 +3110,7 @@ 2022.findings-acl.206.software.zip du-etal-2022-graph ROCStories + 10.18653/v1/2022.findings-acl.206 Long Time No See! Open-Domain Conversation with Long-Term Persona Memory @@ -2921,6 +3127,7 @@ xu-etal-2022-long PaddlePaddle/Research DuLeMon + 10.18653/v1/2022.findings-acl.207 Lacking the Embedding of a Word? Look it up into a Traditional Dictionary @@ -2935,6 +3142,7 @@ 2022.findings-acl.208 2022.findings-acl.208.software.zip ruzzetti-etal-2022-lacking + 10.18653/v1/2022.findings-acl.208 <fixed-case>MTR</fixed-case>ec: Multi-Task Learning over <fixed-case>BERT</fixed-case> for News Recommendation @@ -2949,6 +3157,7 @@ 2022.findings-acl.209 bi-etal-2022-mtrec MIND + 10.18653/v1/2022.findings-acl.209 Cross-domain Named Entity Recognition via Graph Matching @@ -2961,6 +3170,7 @@ 2022.findings-acl.210.software.zip zheng-etal-2022-cross CrossNER + 10.18653/v1/2022.findings-acl.210 Assessing Multilingual Fairness in Pre-trained Multimodal Representations @@ -2973,6 +3183,7 @@ 2022.findings-acl.211.software.tgz wang-etal-2022-assessing FairFace + 10.18653/v1/2022.findings-acl.211 More Than Words: Collocation Retokenization for <fixed-case>L</fixed-case>atent <fixed-case>D</fixed-case>irichlet <fixed-case>A</fixed-case>llocation Models @@ -2983,6 +3194,7 @@ Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. Previous studies show that representing bigrams collocations in the input can improve topic coherence in English. However, it is unclear how to achieve the best results for languages without marked word boundaries such as Chinese and Thai. Here, we explore the use of retokenization based on chi-squared measures, t-statistics, and raw frequency to merge frequent token ngrams into collocations when preparing input to the LDA model. Based on the goodness of fit and the coherence metric, we show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models. 
2022.findings-acl.212 cheevaprawatdomrong-etal-2022-words + 10.18653/v1/2022.findings-acl.212 <i>Generalized but not Robust?</i> Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness @@ -3001,6 +3213,7 @@ Natural Questions SNLI SVHN + 10.18653/v1/2022.findings-acl.213 <fixed-case>ASSIST</fixed-case>: Towards Label Noise-Robust Dialogue State Tracking @@ -3015,6 +3228,7 @@ smartyfh/dst-assist MultiWOZ SGD + 10.18653/v1/2022.findings-acl.214 Graph Refinement for Coreference Resolution @@ -3024,6 +3238,7 @@ The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves coreference resolution. 2022.findings-acl.215 miculicich-henderson-2022-graph + 10.18653/v1/2022.findings-acl.215 <fixed-case>ECO</fixed-case> v1: Towards Event-Centric Opinion Mining @@ -3040,6 +3255,7 @@ 2022.findings-acl.216 2022.findings-acl.216.software.zip xu-etal-2022-eco + 10.18653/v1/2022.findings-acl.216 Deep Reinforcement Learning for Entity Alignment @@ -3053,6 +3269,7 @@ 2022.findings-acl.217.software.zip guo-etal-2022-deep guolingbing/rlea + 10.18653/v1/2022.findings-acl.217 Breaking Down Multilingual Machine Translation @@ -3064,6 +3281,7 @@ While multilingual training is now an essential ingredient in machine translation (MT) systems, recent work has demonstrated that it has different effects in different multilingual settings, such as many-to-one, one-to-many, and many-to-many learning. These training settings expose the encoder and the decoder of a machine translation model to different data distributions. In this paper, we examine how different varieties of multilingual training contribute to learning these two components of the MT model. Specifically, we compare bilingual models with encoders and/or decoders initialized by multilingual training. We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs). We further find the important attention heads for each language pair and compare their correlations during inference. Our analysis sheds light on how multilingual translation models work and also enables us to propose methods to improve performance by training with highly related languages. Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al. (2019).
2022.findings-acl.218 chiang-etal-2022-breaking + 10.18653/v1/2022.findings-acl.218 Mitigating Contradictions in Dialogue Based on Contrastive Learning @@ -3076,6 +3294,7 @@ 2022.findings-acl.219 2022.findings-acl.219.software.zip li-etal-2022-mitigating + 10.18653/v1/2022.findings-acl.219 <fixed-case>ELLE</fixed-case>: Efficient Lifelong Pre-training for Emerging Data @@ -3092,6 +3311,7 @@ 2022.findings-acl.220.software.zip qin-etal-2022-elle thunlp/elle + 10.18653/v1/2022.findings-acl.220 <fixed-case>E</fixed-case>n<fixed-case>CBP</fixed-case>: A New Benchmark Dataset for Finer-Grained Cultural Background Prediction in <fixed-case>E</fixed-case>nglish @@ -3109,6 +3329,7 @@ GoEmotions QNLI SST + 10.18653/v1/2022.findings-acl.221 Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models @@ -3126,6 +3347,7 @@ ucinlp/null-prompts GLUE QNLI + 10.18653/v1/2022.findings-acl.222 u<fixed-case>FACT</fixed-case>: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation @@ -3137,6 +3359,7 @@ 2022.findings-acl.223 anders-etal-2022-ufact ViGGO + 10.18653/v1/2022.findings-acl.223 Good Night at 4 pm?! Time Expressions in Different Cultures @@ -3146,6 +3369,7 @@ 2022.findings-acl.224 shwartz-2022-good vered1986/time_expressions + 10.18653/v1/2022.findings-acl.224 Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking @@ -3158,6 +3382,7 @@ 2022.findings-acl.225 li-etal-2022-extracting WNUT 2017 + 10.18653/v1/2022.findings-acl.225 <fixed-case>O</fixed-case>ne<fixed-case>A</fixed-case>ligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval @@ -3171,6 +3396,7 @@ niu-etal-2022-onealigner CC100 WikiMatrix + 10.18653/v1/2022.findings-acl.226 Suum Cuique: Studying Bias in Taboo Detection with a Community Perspective @@ -3184,6 +3410,7 @@ khalid-etal-2022-suum jonrusert/suumcuique OLID + 10.18653/v1/2022.findings-acl.227 Modeling Intensification for Sign Language Generation: A Computational Approach @@ -3199,6 +3426,7 @@ inan-etal-2022-modeling merterm/modeling-intensification-for-slg PHOENIX14T + 10.18653/v1/2022.findings-acl.228 Controllable Natural Language Generation with Contrastive Prefixes @@ -3212,6 +3440,7 @@ 2022.findings-acl.229 qian-etal-2022-controllable AG News + 10.18653/v1/2022.findings-acl.229 Revisiting the Effects of Leakage on Dependency Parsing @@ -3223,6 +3452,7 @@ 2022.findings-acl.230 krasner-etal-2022-revisiting miriamwanner/reu-nlp-project + 10.18653/v1/2022.findings-acl.230 Learning to Describe Solutions for Bug Reports Based on Developer Discussions @@ -3235,6 +3465,7 @@ 2022.findings-acl.231 panthaplackel-etal-2022-learning panthap2/describing-bug-report-solutions + 10.18653/v1/2022.findings-acl.231 Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense @@ -3248,6 +3479,7 @@ 2022.findings-acl.232 le-etal-2022-perturbations lethaiq/perturbations-in-the-wild + 10.18653/v1/2022.findings-acl.232 Improving <fixed-case>C</fixed-case>hinese Grammatical Error Detection via Data augmentation by Conditional Error Generation @@ -3261,6 +3493,7 @@ Chinese Grammatical Error Detection (CGED) aims at detecting grammatical errors in Chinese texts. One of the main challenges for CGED is the lack of annotated data.
To alleviate this problem, previous studies proposed various methods to automatically generate more training samples, which can be roughly categorized into rule-based methods and model-based methods. The rule-based methods construct erroneous sentences by directly introducing noises into original sentences. However, the introduced noises are usually context-independent, which are quite different from those made by humans. The model-based methods utilize generative models to imitate human errors. The generative model may bring too many changes to the original sentences and generate semantically ambiguous sentences, so it is difficult to detect grammatical errors in these generated sentences. In addition, generated sentences may be error-free and thus become noisy data. To handle these problems, we propose CNEG, a novel Conditional Non-Autoregressive Error Generation model for generating Chinese grammatical errors. Specifically, in order to generate a context-dependent error, we first mask a span in a correct text, then predict an erroneous span conditioned on both the masked text and the correct span. Furthermore, we filter out error-free spans by measuring their perplexities in the original sentences. Experimental results show that our proposed method achieves better performance than all compared data augmentation methods on the CGED-2018 and CGED-2020 benchmarks. 2022.findings-acl.233 yue-etal-2022-improving + 10.18653/v1/2022.findings-acl.233 Modular and Parameter-Efficient Multimodal Fusion with Prompting @@ -3272,6 +3505,7 @@ 2022.findings-acl.234 2022.findings-acl.234.software.zip liang-etal-2022-modular + 10.18653/v1/2022.findings-acl.234 Synchronous Refinement for Neural Machine Translation @@ -3284,6 +3518,7 @@ Machine translation typically adopts an encoder-to-decoder framework, in which the decoder generates the target sentence word-by-word in an auto-regressive manner. However, the auto-regressive decoder faces a deep-rooted one-pass issue whereby each generated word is considered as one element of the final output regardless of whether it is correct or not. These generated wrong words further constitute the target historical context to affect the generation of subsequent target words. This paper proposes a novel synchronous refinement method to revise potential errors in the generated words by considering part of the target future context. Particularly, the proposed approach allows the auto-regressive decoder to refine the previously generated target words and generate the next target word synchronously. The experimental results on three widely-used machine translation tasks demonstrated the effectiveness of the proposed approach. 
2022.findings-acl.235 chen-etal-2022-synchronous + 10.18653/v1/2022.findings-acl.235 <fixed-case>HIE</fixed-case>-<fixed-case>SQL</fixed-case>: History Information Enhanced Network for Context-Dependent Text-to-<fixed-case>SQL</fixed-case> Semantic Parsing @@ -3297,6 +3532,7 @@ 2022.findings-acl.236 zheng-etal-2022-hie CoSQL + 10.18653/v1/2022.findings-acl.236 <fixed-case>CRAS</fixed-case>pell: A Contextual Typo Robust Approach to Improve <fixed-case>C</fixed-case>hinese Spelling Correction @@ -3313,6 +3549,7 @@ 2022.findings-acl.237.software.zip liu-etal-2022-craspell liushulinle/craspell + 10.18653/v1/2022.findings-acl.237 <fixed-case>G</fixed-case>aussian Multi-head Attention for Simultaneous Machine Translation @@ -3323,6 +3560,7 @@ 2022.findings-acl.238 zhang-feng-2022-gaussian ictnlp/gma + 10.18653/v1/2022.findings-acl.238 Composing Structure-Aware Batches for Pairwise Sentence Classification @@ -3336,6 +3574,7 @@ ukplab/acl2022-structure-batches GLUE QNLI + 10.18653/v1/2022.findings-acl.239 Factual Consistency of Multilingual Pretrained Language Models @@ -3348,6 +3587,7 @@ fierro-sogaard-2022-factual coastalcph/mpararel LAMA + 10.18653/v1/2022.findings-acl.240 Selecting Stickers in Open-Domain Dialogue through Multitask Learning @@ -3362,6 +3602,7 @@ 2022.findings-acl.241.software.zip zhang-etal-2022-selecting nonstopfor/sticker-selection + 10.18653/v1/2022.findings-acl.241 <fixed-case>Z</fixed-case>i<fixed-case>N</fixed-case>et: <fixed-case>L</fixed-case>inking <fixed-case>C</fixed-case>hinese Characters Spanning Three Thousand Years @@ -3377,6 +3618,7 @@ 2022.findings-acl.242.software.zip chi-etal-2022-zinet yangchijlu/ancientchinesecharsim + 10.18653/v1/2022.findings-acl.242 How Can Cross-lingual Knowledge Contribute Better to Fine-Grained Entity Typing? @@ -3392,6 +3634,7 @@ 2022.findings-acl.243 jin-etal-2022-cross FIGER + 10.18653/v1/2022.findings-acl.243 <fixed-case>AMR-DA</fixed-case>: <fixed-case>D</fixed-case>ata Augmentation by <fixed-case>A</fixed-case>bstract <fixed-case>M</fixed-case>eaning <fixed-case>R</fixed-case>epresentation @@ -3403,6 +3646,7 @@ 2022.findings-acl.244 shou-etal-2022-amr zzshou/amr-data-augmentation + 10.18653/v1/2022.findings-acl.244 Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study @@ -3414,6 +3658,7 @@ In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that autoregressive models combined with stochastic decodings are the most promising. We then investigate how an LM performs in generating a CN with regard to an unseen target of hate. We find that a key element for successful ‘out of target’ experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e., a target that shares some commonalities with the test target that can be defined a priori. We finally introduce the idea of a pipeline based on the addition of an automatic post-editing step to refine generated CNs.
2022.findings-acl.245 tekiroglu-etal-2022-using + 10.18653/v1/2022.findings-acl.245 Improving Robustness of Language Models from a Geometry-aware Perspective @@ -3429,6 +3674,7 @@ zhu-etal-2022-improving IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.246 Task-guided Disentangled Tuning for Pretrained Language Models @@ -3444,6 +3690,7 @@ lemon0830/tdt CLUE GLUE + 10.18653/v1/2022.findings-acl.247 Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding @@ -3459,6 +3706,7 @@ 2022.findings-acl.248 cao-etal-2022-exploring xbdxwyh/mocose + 10.18653/v1/2022.findings-acl.248 The Inefficiency of Language Models in Scholarly Retrieval: An Experimental Walk-through @@ -3469,6 +3717,7 @@ 2022.findings-acl.249 singh-singh-2022-inefficiency shruti-singh/scilm_exp + 10.18653/v1/2022.findings-acl.249 Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition @@ -3485,6 +3734,7 @@ ACE 2004 ACE 2005 GENIA + 10.18653/v1/2022.findings-acl.250 <fixed-case>UNIMO</fixed-case>-2: End-to-End Unified Vision-Language Grounded Learning @@ -3505,6 +3755,7 @@ SNLI-VE SST Visual Genome + 10.18653/v1/2022.findings-acl.251 The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for <fixed-case>C</fixed-case>hinese Spell Checking @@ -3523,6 +3774,7 @@ 2022.findings-acl.252 2022.findings-acl.252.software.zip li-etal-2022-past + 10.18653/v1/2022.findings-acl.252 <fixed-case>XFUND</fixed-case>: A Benchmark Dataset for Multilingual Visually Rich Form Understanding @@ -3540,6 +3792,7 @@ 2022.findings-acl.253.software.zip xu-etal-2022-xfund FUNSD + 10.18653/v1/2022.findings-acl.253 Type-Driven Multi-Turn Corrections for Grammatical Error Correction @@ -3558,6 +3811,7 @@ deeplearnxmu/tmtc FCE WI-LOCNESS + 10.18653/v1/2022.findings-acl.254 Leveraging Knowledge in Multilingual Commonsense Reasoning @@ -3577,6 +3831,7 @@ ConceptNet X-CSQA XCOPA + 10.18653/v1/2022.findings-acl.255 Encoding and Fusing Semantic Connection and Linguistic Evidence for Implicit Discourse Relation Recognition @@ -3589,6 +3844,7 @@ 2022.findings-acl.256 xiang-etal-2022-encoding hustminslab/manf + 10.18653/v1/2022.findings-acl.256 One Agent To Rule Them All: Towards Multi-agent Conversational <fixed-case>AI</fixed-case> @@ -3607,6 +3863,7 @@ clarke-etal-2022-one ChrisIsKing/black-box-multi-agent-integation BBAI Dataset + 10.18653/v1/2022.findings-acl.257 Word-level Perturbation Considering Word Length and Compositional Subwords @@ -3620,6 +3877,7 @@ 2022.findings-acl.258 hiraoka-etal-2022-word tathi/cwr + 10.18653/v1/2022.findings-acl.258 Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised <fixed-case>POS</fixed-case> Tagging @@ -3635,6 +3893,7 @@ Jacob-Zhou/FeatureCRFAE Penn Treebank Universal Dependencies + 10.18653/v1/2022.findings-acl.259 Controlling the Focus of Pretrained Language Generation Models @@ -3649,6 +3908,7 @@ question406/learningtofocus CNN/Daily Mail PERSONA-CHAT + 10.18653/v1/2022.findings-acl.260 Comparative Opinion Summarization via Collaborative Decoding @@ -3661,6 +3921,7 @@ 2022.findings-acl.261 iso-etal-2022-comparative megagonlabs/cocosum + 10.18653/v1/2022.findings-acl.261 <fixed-case>I</fixed-case>so<fixed-case>S</fixed-case>core: Measuring the Uniformity of Embedding Space Utilization @@ -3674,6 +3935,7 @@ rudman-etal-2022-isoscore bcbi-edu/p_eickhoff_isoscore WikiText-2 + 10.18653/v1/2022.findings-acl.262 A Natural Diet: Towards Improving Naturalness of Machine Translation Output @@ 
-3686,6 +3948,7 @@ Machine translation (MT) evaluation often focuses on accuracy and fluency, without paying much attention to translation style. This means that, even when considered accurate and fluent, MT output can still sound less natural than high quality human translations or text originally written in the target language. Machine translation output notably exhibits lower lexical diversity, and employs constructs that mirror those in the source sentence. In this work we propose a method for training MT systems to achieve a more natural style, i.e. mirroring the style of text originally written in the target language. Our method tags parallel training data according to the naturalness of the target side by contrasting language models trained on natural and translated data. Tagging data allows us to put greater emphasis on target sentences originally written in the target language. Automatic metrics show that the resulting models achieve lexical richness on par with human translations, mimicking a style much closer to sentences originally written in the target language. Furthermore, we find that their output is preferred by human experts when compared to the baseline translations. 2022.findings-acl.263 freitag-etal-2022-natural + 10.18653/v1/2022.findings-acl.263 From Stance to Concern: Adaptation of Propositional Analysis to New Tasks and Domains @@ -3700,6 +3963,7 @@ 2022.findings-acl.264 mather-etal-2022-stance ihmc/findings-of-acl-2022-concern-detection + 10.18653/v1/2022.findings-acl.264 <fixed-case>CUE</fixed-case> Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals @@ -3711,6 +3975,7 @@ We propose a framework to modularize the training of neural language models that use diverse forms of context by eliminating the need to jointly train context and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one type of contextual data and adapts to novel context types. The model consists of a pretrained neural sentence LM, a BERT-based contextual encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and contextual evidence. When contextually annotated data is unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real context data can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder’s embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the metadata during training retains 85% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.
2022.findings-acl.265 novotney-etal-2022-cue + 10.18653/v1/2022.findings-acl.265 Cross-Lingual <fixed-case>UMLS</fixed-case> Named Entity Linking using <fixed-case>UMLS</fixed-case> Dictionary Fine-Tuning @@ -3725,6 +3990,7 @@ rinagalperin/biomedical_nel BC5CDR MedMentions + 10.18653/v1/2022.findings-acl.266 Aligned Weight Regularizers for Pruning Pretrained Neural Networks @@ -3735,6 +4001,7 @@ Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel self-distillation based pruning strategy, whereby the representational similarity between the pruned and unpruned versions of the same network is maximized. Unlike previous approaches that treat distillation and pruning separately, we use distillation to inform the pruning criteria, without requiring a separate student network as in knowledge distillation. We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions, naturally complementing magnitude-based pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that self-distilled pruning increases mono- and cross-lingual language model performance. Self-distilled pruned models also outperform smaller Transformers with an equal number of parameters and are competitive against (6 times) larger distilled networks. We also observe that self-distillation (1) maximizes class separability, (2) increases the signal-to-noise ratio, and (3) converges faster after pruning steps, providing further insights into why self-distilled pruning improves generalization. 2022.findings-acl.267 o-neill-etal-2022-aligned + 10.18653/v1/2022.findings-acl.267 Consistent Representation Learning for Continual Relation Extraction @@ -3749,6 +4016,7 @@ thuiar/CRL FewRel TACRED + 10.18653/v1/2022.findings-acl.268 Event Transition Planning for Open-ended Text Generation @@ -3763,6 +4031,7 @@ 2022.findings-acl.269 li-etal-2022-event ATOMIC + 10.18653/v1/2022.findings-acl.269 Comprehensive Multi-Modal Interactions for Referring Image Segmentation @@ -3776,6 +4045,7 @@ COCO Google Refexp RefCOCO + 10.18653/v1/2022.findings-acl.270 <fixed-case>M</fixed-case>eta<fixed-case>W</fixed-case>eighting: Learning to Weight Tasks in Multi-Task Learning @@ -3789,6 +4059,7 @@ 2022.findings-acl.271 2022.findings-acl.271.software.zip mao-etal-2022-metaweighting + 10.18653/v1/2022.findings-acl.271 Improving Controllable Text Generation with Position-Aware Weighted Decoding @@ -3804,6 +4075,7 @@ gu-etal-2022-improving IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.272 Prompt Tuning for Discriminative Pre-trained Language Models @@ -3825,6 +4097,7 @@ AG News Quoref SST + 10.18653/v1/2022.findings-acl.273 Two Birds with One Stone: Unified Model Learning for Both Recall and Ranking in News Recommendation @@ -3837,6 +4110,7 @@ 2022.findings-acl.274 wu-etal-2022-two MIND + 10.18653/v1/2022.findings-acl.274 What does it take to bake a cake? 
The <fixed-case>R</fixed-case>ecipe<fixed-case>R</fixed-case>ef corpus and anaphora resolution in procedural text @@ -3848,6 +4122,7 @@ 2022.findings-acl.275 fang-etal-2022-take biaoyanf/reciperef + 10.18653/v1/2022.findings-acl.275 <fixed-case>MERI</fixed-case>t: <fixed-case>M</fixed-case>eta-<fixed-case>P</fixed-case>ath <fixed-case>G</fixed-case>uided <fixed-case>C</fixed-case>ontrastive <fixed-case>L</fixed-case>earning for <fixed-case>L</fixed-case>ogical <fixed-case>R</fixed-case>easoning @@ -3862,6 +4137,7 @@ sparkjiao/merit LogiQA ReClor + 10.18653/v1/2022.findings-acl.276 <fixed-case>THE</fixed-case>-<fixed-case>X</fixed-case>: Privacy-Preserving Transformer Inference with Homomorphic Encryption @@ -3884,6 +4160,7 @@ MRPC QNLI SST + 10.18653/v1/2022.findings-acl.277 <fixed-case>HLDC</fixed-case>: <fixed-case>H</fixed-case>indi Legal Documents Corpus @@ -3903,6 +4180,7 @@ 2022.findings-acl.278.software.zip kapoor-etal-2022-hldc exploration-lab/hldc + 10.18653/v1/2022.findings-acl.278 Rethinking Document-level Neural Machine Translation @@ -3918,6 +4196,7 @@ 2022.findings-acl.279 sun-etal-2022-rethinking sunzewei2715/Doc2Doc_NMT + 10.18653/v1/2022.findings-acl.279 Incremental Intent Detection for Medical Domain with Contrast Replay Networks @@ -3930,6 +4209,7 @@ 2022.findings-acl.280 bai-etal-2022-incremental KUAKE-QIC + 10.18653/v1/2022.findings-acl.280 <fixed-case>L</fixed-case>a<fixed-case>P</fixed-case>ra<fixed-case>D</fixed-case>o<fixed-case>R</fixed-case>: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval @@ -3950,6 +4230,7 @@ MS MARCO Natural Questions SciFact + 10.18653/v1/2022.findings-acl.281 Do Pre-trained Models Benefit Knowledge Graph Completion? A Reliable Evaluation and a Reasonable Approach @@ -3967,6 +4248,7 @@ lv-etal-2022-pre InferWiki LAMA + 10.18653/v1/2022.findings-acl.282 <fixed-case>EICO</fixed-case>: Improving Few-Shot Text Classification via Explicit and Implicit Consistency Regularization @@ -3978,6 +4260,7 @@ zhao-yao-2022-eico MPQA Opinion Corpus SST + 10.18653/v1/2022.findings-acl.283 Improving the Adversarial Robustness of <fixed-case>NLP</fixed-case> Models by Information Bottleneck @@ -3994,6 +4277,7 @@ zhang-etal-2022-improving IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.284 Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis @@ -4008,6 +4292,7 @@ Aspect-based sentiment analysis (ABSA) predicts sentiment polarity towards a specific aspect in the given sentence. While pre-trained language models such as BERT have achieved great success, incorporating dynamic semantic changes into ABSA remains challenging. To this end, in this paper, we propose to address this problem by Dynamic Re-weighting BERT (DR-BERT), a novel method designed to learn dynamic aspect-oriented semantics for ABSA. Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantics of the sentence and then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA). Note that the DRA can pay close attention to a small region of the sentences at each step and re-weight the vitally important words for better aspect-aware sentiment understanding. Finally, experimental results on three benchmark datasets demonstrate the effectiveness and the rationality of our proposed model and provide good interpretable insights for future semantic modeling.
2022.findings-acl.285 zhang-etal-2022-incorporating + 10.18653/v1/2022.findings-acl.285 <fixed-case>DARER</fixed-case>: Dual-task Temporal Relational Recurrent Reasoning Network for Joint Dialog Sentiment Classification and Act Recognition @@ -4019,6 +4304,7 @@ xing-tsang-2022-darer xingbowen714/darer DailyDialog + 10.18653/v1/2022.findings-acl.286 Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents @@ -4037,6 +4323,7 @@ rowitzou/dc-match GLUE MRPC + 10.18653/v1/2022.findings-acl.287 Modular Domain Adaptation @@ -4050,6 +4337,7 @@ jkvc/modular-domain-adaptation IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.288 Detection of Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation @@ -4066,6 +4354,7 @@ AG News IMDb Movie Reviews SST + 10.18653/v1/2022.findings-acl.289 <fixed-case>P</fixed-case>latt-Bin: Efficient Posterior Calibrated Training for <fixed-case>NLP</fixed-case> Classifiers @@ -4076,6 +4365,7 @@ 2022.findings-acl.290 2022.findings-acl.290.software.zip singh-goshtasbpour-2022-platt + 10.18653/v1/2022.findings-acl.290 Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation @@ -4091,6 +4381,7 @@ yang-etal-2022-addressing ATIS BREAK + 10.18653/v1/2022.findings-acl.291 Improving Candidate Retrieval with Entity Profile Generation for <fixed-case>W</fixed-case>ikidata Entity Linking @@ -4102,6 +4393,7 @@ 2022.findings-acl.292 lai-etal-2022-improving laituan245/el-dockers + 10.18653/v1/2022.findings-acl.292 Local Structure Matters Most: Perturbation Study in <fixed-case>NLU</fixed-case> @@ -4115,6 +4407,7 @@ 2022.findings-acl.293.software.zip clouatre-etal-2022-local GLUE + 10.18653/v1/2022.findings-acl.293 Probing Factually Grounded Content Transfer with Factual Ablation @@ -4126,6 +4419,7 @@ Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality–it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified–to factual consistency, testing whether the generation agrees with the grounding, rather than all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding. Particularly, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: this captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents, and the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods improving over strong baselines. 
2022.findings-acl.294 west-etal-2022-probing + 10.18653/v1/2022.findings-acl.294 <fixed-case>ED</fixed-case>2<fixed-case>LM</fixed-case>: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference @@ -4146,6 +4440,7 @@ hui-etal-2022-ed2lm MS MARCO Natural Questions + 10.18653/v1/2022.findings-acl.295 Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics @@ -4155,6 +4450,7 @@ Question answering-based summarization evaluation metrics must automatically determine whether the QA model’s prediction is correct or not, a task known as answer verification. In this work, we benchmark the lexical answer verification methods which have been used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: In some scenarios, using a worse verification method — or using none at all — has comparable performance to using the best verification method, a result that we attribute to properties of the datasets. 2022.findings-acl.296 deutsch-roth-2022-benchmarking + 10.18653/v1/2022.findings-acl.296 Prior Knowledge and Memory Enriched Transformer for Sign Language Translation @@ -4167,6 +4463,7 @@ 2022.findings-acl.297 jin-etal-2022-prior PHOENIX14T + 10.18653/v1/2022.findings-acl.297 Discontinuous Constituency and <fixed-case>BERT</fixed-case>: A Case Study of <fixed-case>D</fixed-case>utch @@ -4177,6 +4474,7 @@ 2022.findings-acl.298 2022.findings-acl.298.software.zip kogkalidis-wijnholds-2022-discontinuous + 10.18653/v1/2022.findings-acl.298 Probing Multilingual Cognate Prediction Models @@ -4186,6 +4484,7 @@ Character-based neural machine translation models have become the reference models for cognate prediction, a historical linguistics task. So far, all linguistic interpretations about latent information captured by such models have been based on external analysis (accuracy, raw results, errors). In this paper, we investigate what probing can tell us about both models and previous interpretations, and learn that though our models store linguistic and diachronic information, they do not do so in the ways previously assumed.
2022.findings-acl.299 fourrier-sagot-2022-probing + 10.18653/v1/2022.findings-acl.299 A Neural Pairwise Ranking Model for Readability Assessment @@ -4198,6 +4497,7 @@ lee-vajjala-2022-neural jlee118/nprm Newsela + 10.18653/v1/2022.findings-acl.300 First the Worst: Finding Better Gender Translations During Beam Search @@ -4210,6 +4510,7 @@ 2022.findings-acl.301.software.zip saunders-etal-2022-first dcsaunders/nmt-gender-rerank + 10.18653/v1/2022.findings-acl.301 Dialogue Summaries as Dialogue States (<fixed-case>DS</fixed-case>2), Template-Guided Summarization for Few-shot Dialogue State Tracking @@ -4226,6 +4527,7 @@ jshin49/ds2 MultiWOZ SAMSum Corpus + 10.18653/v1/2022.findings-acl.302 Unsupervised Preference-Aware Language Identification @@ -4242,6 +4544,7 @@ 2022.findings-acl.303.software.zip ren-etal-2022-unsupervised xzhren/preferenceawarelid + 10.18653/v1/2022.findings-acl.303 Using <fixed-case>NLP</fixed-case> to quantify the environmental cost and diversity benefits of in-person <fixed-case>NLP</fixed-case> conferences @@ -4252,6 +4555,7 @@ 2022.findings-acl.304 przybyla-shardlow-2022-using piotrmp/nlp_geography + 10.18653/v1/2022.findings-acl.304 Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking @@ -4265,6 +4569,7 @@ 2022.findings-acl.305.software.zip luo-etal-2022-interpretable ECHR + 10.18653/v1/2022.findings-acl.305 <fixed-case>C</fixed-case>hinese Synesthesia Detection: New Dataset and Models @@ -4276,6 +4581,7 @@ In this paper, we introduce a new task called synesthesia detection, which aims to extract the sensory word of a sentence, and to predict the original and synesthetic sensory modalities of the corresponding sensory word. Synesthesia refers to the description of perceptions in one sensory modality through concepts from other modalities. It involves not only a linguistic phenomenon, but also a cognitive phenomenon structuring human thought and action, which makes it a bridge between figurative linguistic phenomena and abstract cognition, and thus helpful for understanding deep semantics. To address this, we construct a large-scale human-annotated Chinese synesthesia dataset, which contains 7,217 annotated sentences accompanied by 187 sensory words. Based on this dataset, we propose a family of strong and representative baseline models. Upon these baselines, we further propose a radical-based neural network model to identify the boundary of the sensory word, and to jointly detect the original and synesthetic sensory modalities for the word. Through extensive experiments, we observe that the importance of the proposed task and dataset is confirmed by the corpus statistics and the progressive performance of the models. In addition, our proposed model achieves state-of-the-art results on the synesthesia dataset.
2022.findings-acl.306 jiang-etal-2022-chinese + 10.18653/v1/2022.findings-acl.306 Rethinking Offensive Text Detection as a Multi-Hop Reasoning Problem @@ -4288,6 +4594,7 @@ zhang-etal-2022-rethinking qzx7/slight OLID + 10.18653/v1/2022.findings-acl.307 On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark @@ -4306,6 +4613,7 @@ 2022.findings-acl.308.software.zip sun-etal-2022-safety thu-coai/diasafety + 10.18653/v1/2022.findings-acl.308 Word Segmentation by Separation Inference for <fixed-case>E</fixed-case>ast <fixed-case>A</fixed-case>sian Languages @@ -4319,6 +4627,7 @@ 2022.findings-acl.309 tong-etal-2022-word um-nlper/spin-ws + 10.18653/v1/2022.findings-acl.309 Unsupervised <fixed-case>C</fixed-case>hinese Word Segmentation with <fixed-case>BERT</fixed-case> Oriented Probing and Transformation @@ -4332,6 +4641,7 @@ 2022.findings-acl.310.software.zip li-etal-2022-unsupervised liweitj47/bert_unsupervised_word_segmentation + 10.18653/v1/2022.findings-acl.310 <fixed-case>E</fixed-case>-<fixed-case>KAR</fixed-case>: A Benchmark for Rationalizing Natural Language Analogical Reasoning @@ -4350,6 +4660,7 @@ 2022.findings-acl.311 chen-etal-2022-e E-KAR + 10.18653/v1/2022.findings-acl.311 Implicit Relation Linking for Question Answering over Knowledge Graph @@ -4367,6 +4678,7 @@ zhao-etal-2022-implicit DBpedia SimpleQuestions + 10.18653/v1/2022.findings-acl.312 Attention Mechanism with Energy-Friendly Operations @@ -4383,6 +4695,7 @@ 2022.findings-acl.313 wan-etal-2022-attention nlp2ct/e-att + 10.18653/v1/2022.findings-acl.313 Probing <fixed-case>BERT</fixed-case>’s priors with serial reproduction chains @@ -4393,6 +4706,7 @@ Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT’s priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than other methods in a large corpus of naturalness judgments. Our findings establish a firmer theoretical foundation for bottom-up probing and highlight richer deviations from human priors. 2022.findings-acl.314 yamakoshi-etal-2022-probing + 10.18653/v1/2022.findings-acl.314 Interpreting the Robustness of Neural <fixed-case>NLP</fixed-case> Models to Textual Perturbations @@ -4404,6 +4718,7 @@ Modern Natural Language Processing (NLP) models are known to be sensitive to input perturbations and their performance can decrease when applied to real-world, noisy data. However, it is still unclear why models are less robust to some perturbations than others. In this work, we test the hypothesis that the extent to which a model is affected by an unseen textual perturbation (robustness) can be explained by the learnability of the perturbation (defined as how well the model learns to identify the perturbation with a small amount of evidence). 
We further give a causal justification for the learnability metric. We conduct extensive experiments with four prominent NLP models — TextRNN, BERT, RoBERTa and XLNet — over eight types of textual perturbations on three datasets. We show that a model which is better at identifying a perturbation (higher learnability) becomes worse at ignoring such a perturbation at test time (lower robustness), providing empirical support for our hypothesis. 2022.findings-acl.315 zhang-etal-2022-interpreting + 10.18653/v1/2022.findings-acl.315 Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations @@ -4419,6 +4734,7 @@ xin-etal-2022-zero BEIR Natural Questions + 10.18653/v1/2022.findings-acl.316 A Few-Shot Semantic Parser for <fixed-case>W</fixed-case>izard-of-<fixed-case>O</fixed-case>z Dialogues with the Precise <fixed-case>T</fixed-case>hing<fixed-case>T</fixed-case>alk Representation @@ -4432,6 +4748,7 @@ Previous attempts to build effective semantic parsers for Wizard-of-Oz (WOZ) conversations suffer from the difficulty in acquiring a high-quality, manually annotated training set. Approaches based only on dialogue synthesis are insufficient, as dialogues generated from state-machine based models are poor approximations of real-life conversations. Furthermore, previously proposed dialogue state representations are ambiguous and lack the precision necessary for building an effective agent. This paper proposes a new dialogue representation and a sample-efficient methodology that can predict precise dialogue states in WOZ conversations. We extended the ThingTalk representation to capture all information an agent needs to respond properly. Our training strategy is sample-efficient: we combine (1) few-shot data sparsely sampling the full dialogue space and (2) synthesized data covering a subset space of dialogues generated by a succinct state-based dialogue model. The completeness of the extended ThingTalk language is demonstrated with a fully operational agent, which is also used in training data synthesis. We demonstrate the effectiveness of our methodology on MultiWOZ 3.0, a reannotation of the MultiWOZ 2.1 dataset in ThingTalk. ThingTalk can represent 98% of the test turns, while the simulator can emulate 85% of the validation set. We train a contextual semantic parser using our strategy, and obtain 79% turn-by-turn exact match accuracy on the reannotated test set.
2022.findings-acl.317 campagna-etal-2022-shot + 10.18653/v1/2022.findings-acl.317 <fixed-case>GCPG</fixed-case>: A General Framework for Controllable Paraphrase Generation @@ -4448,6 +4765,7 @@ 2022.findings-acl.318 2022.findings-acl.318.software.zip yang-etal-2022-gcpg + 10.18653/v1/2022.findings-acl.318 <fixed-case>C</fixed-case>ross<fixed-case>A</fixed-case>ligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding @@ -4460,6 +4778,7 @@ gritta-etal-2022-crossaligner huawei-noah/noah-research MTOP + 10.18653/v1/2022.findings-acl.319 Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer @@ -4471,6 +4790,7 @@ ilinykh-dobnik-2022-attention gu-clasp/attention-as-grounding Image Description Sequences + 10.18653/v1/2022.findings-acl.320 Improving Zero-Shot Cross-lingual Transfer Between Closely Related Languages by Injecting Character-Level Noise @@ -4481,6 +4801,7 @@ 2022.findings-acl.321 aepli-sennrich-2022-improving Universal Dependencies + 10.18653/v1/2022.findings-acl.321 Structural Supervision for Word Alignment and Machine Translation @@ -4492,6 +4813,7 @@ Syntactic structure has long been argued to be potentially useful for enforcing accurate word alignment and improving generalization performance of machine translation. Unfortunately, existing wisdom demonstrates its significance by considering only the syntactic structure of source tokens, neglecting the rich structural information from target tokens and the structural similarity between the source and target sentences. In this work, we propose to incorporate the syntactic structure of both source and target tokens into the encoder-decoder framework, tightly correlating the internal logic of word alignment and machine translation for multi-task learning. Particularly, we do not leverage any annotated syntactic graph of the target side during training, so we introduce Dynamic Graph Convolution Networks (DGCN) on observed target tokens to sequentially and simultaneously generate the target tokens and the corresponding syntactic graphs, and further guide the word alignment. On this basis, Hierarchical Graph Random Walks (HGRW) are performed on the syntactic graphs of both source and target sides, for incorporating structured constraints on machine translation outputs. Experiments on four publicly available language pairs verify that our method is highly effective in capturing syntactic structure in different languages, consistently outperforming baselines in alignment accuracy and demonstrating promising results in translation quality. 2022.findings-acl.322 li-etal-2022-structural + 10.18653/v1/2022.findings-acl.322 Focus on the Action: Learning to Highlight and Summarize Jointly for Email To-Do Items Summarization @@ -4502,6 +4824,7 @@ Automatic email to-do item generation is the task of generating to-do items from a given email to help people overview emails and schedule daily work. Different from prior research on email summarization, to-do item generation focuses on generating action mentions to provide more structured summaries of email text. Prior work either requires a large amount of annotation for key sentences with potential actions or fails to pay attention to nuanced actions from these unstructured emails, and thus often leads to unfaithful summaries.
To fill these gaps, we propose a simple and effective learning to highlight and summarize framework (LHS) to learn to identify the most salient text and actions, and incorporate these structured representations to generate more faithful to-do items. Experiments show that our LHS model outperforms the baselines and achieves state-of-the-art performance in terms of both quantitative evaluation and human judgement. We also discuss specific challenges that current models face with email to-do summarization. 2022.findings-acl.323 zhang-etal-2022-focus + 10.18653/v1/2022.findings-acl.323 Exploring the Capacity of a Large-scale Masked Language Model to Recognize Grammatical Errors @@ -4512,6 +4835,7 @@ In this paper, we explore the capacity of a language model-based method for grammatical error detection in detail. We first show that 5 to 10% of training data are enough for a BERT-based error detection method to achieve performance equivalent to what a non-language model-based method can achieve with the full training data; recall improves much faster with respect to training data size in the BERT-based method than in the non-language model method. This suggests that (i) the BERT-based method should have a good knowledge of the grammar required to recognize certain types of error and that (ii) it can transform the knowledge into error detection rules by fine-tuning with few training samples, which explains its high generalization ability in grammatical error detection. We further show with pseudo error data that it actually exhibits such nice properties in learning rules for recognizing various types of error. Finally, based on these findings, we discuss a cost-effective method for detecting grammatical errors with feedback comments explaining relevant grammatical rules to learners. 2022.findings-acl.324 nagata-etal-2022-exploring + 10.18653/v1/2022.findings-acl.324 Should We Trust This Summary?
<fixed-case>B</fixed-case>ayesian Abstractive Summarization to The Rescue @@ -4522,6 +4846,7 @@ 2022.findings-acl.325 gidiotis-tsoumakas-2022-trust AESLC + 10.18653/v1/2022.findings-acl.325 On the data requirements of probing @@ -4536,6 +4861,7 @@ zhu-etal-2022-data spoclab-ca/probing_dataset SentEval + 10.18653/v1/2022.findings-acl.326 Translation Error Detection as Rationale Extraction @@ -4547,6 +4873,7 @@ 2022.findings-acl.327 fomicheva-etal-2022-translation MLQE-PE + 10.18653/v1/2022.findings-acl.327 Towards Collaborative Neural-Symbolic Graph Semantic Parsing via Uncertainty @@ -4558,6 +4885,7 @@ 2022.findings-acl.328 lin-etal-2022-towards SCAN + 10.18653/v1/2022.findings-acl.328 Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework @@ -4568,6 +4896,7 @@ 2022.findings-acl.329 wang-shang-2022-towards FUNSD + 10.18653/v1/2022.findings-acl.329 On Length Divergence Bias in Textual Matching Models @@ -4582,6 +4911,7 @@ 2022.findings-acl.330 jiang-etal-2022-length TrecQA + 10.18653/v1/2022.findings-acl.330 What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation @@ -4596,6 +4926,7 @@ ghazarian-etal-2022-wrong alexa/conture FED + 10.18653/v1/2022.findings-acl.331 diff --git a/data/xml/2022.fl4nlp.xml b/data/xml/2022.fl4nlp.xml index e3ab30f053..f672f77425 100644 --- a/data/xml/2022.fl4nlp.xml +++ b/data/xml/2022.fl4nlp.xml @@ -34,6 +34,7 @@ In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchical models, we develop ActPerFL, a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global model that implicitly contributes to other clients’ training. Such a balance is derived from the inter-client and intra-client uncertainty quantification. Consequently, ActPerFL can adapt to the underlying clients’ heterogeneity with uncertainty-driven local training and model aggregation. With experimental studies on Sent140 and Amazon Alexa audio data, we show that ActPerFL can achieve superior personalization performance compared with the existing counterparts. 2022.fl4nlp-1.1 chen-etal-2022-actperfl + 10.18653/v1/2022.fl4nlp-1.1 Scaling Language Model Size in Cross-Device Federated Learning @@ -49,6 +50,7 @@ 2022.fl4nlp-1.2 ro-etal-2022-scaling Billion Word Benchmark + 10.18653/v1/2022.fl4nlp-1.2 Adaptive Differential Privacy for Language Model Training @@ -61,6 +63,7 @@ wu-etal-2022-adaptive WikiText-103 WikiText-2 + 10.18653/v1/2022.fl4nlp-1.3 Intrinsic Gradient Compression for Scalable and Efficient Federated Learning @@ -72,6 +75,7 @@ melas-kyriazi-wang-2022-intrinsic PERSONA-CHAT SST + 10.18653/v1/2022.fl4nlp-1.4 diff --git a/data/xml/2022.humeval.xml b/data/xml/2022.humeval.xml index 2e42a427fd..55a10d5b0c 100644 --- a/data/xml/2022.humeval.xml +++ b/data/xml/2022.humeval.xml @@ -25,6 +25,7 @@ SacreBLEU, by incorporating a text normalizing step in the pipeline, has become a rising automatic evaluation metric in recent MT studies. With agglutinative languages such as Korean, however, the lexical-level metric cannot provide a conceivable result without a customized pre-tokenization. 
This paper endeavors to examine the influence of diversified tokenization schemes – word, morpheme, subword, character, and consonants & vowels (CV) – on the metric after its protective layer is peeled off. By performing meta-evaluation with manually-constructed into-Korean resources, our empirical study demonstrates that the human correlation of the surface-based metric and other homogeneous ones (as an extension) vacillates greatly by the token type. Moreover, the human correlation of the metric often deteriorates due to some tokenization, with CV one of its culprits. Guiding through the proper usage of tokenizers for the given metric, we discover i) the feasibility of the character tokens and ii) the deficit of CV in the Korean MT evaluation. 2022.humeval-1.1 kim-kim-2022-vacillating + 10.18653/v1/2022.humeval-1.1 A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution @@ -37,6 +38,7 @@ 2022.humeval-1.2 borovikova-etal-2022-methodology CoNLL-2012 + 10.18653/v1/2022.humeval-1.2 Perceptual Quality Dimensions of Machine-Generated Text with a Focus on Machine Translation @@ -49,6 +51,7 @@ 2022.humeval-1.3 macketanz-etal-2022-perceptual dfki-nlp/textq + 10.18653/v1/2022.humeval-1.3 Human evaluation of web-crawled parallel corpora for machine translation @@ -61,6 +64,7 @@ 2022.humeval-1.4 ramirez-sanchez-etal-2022-human ParaCrawl + 10.18653/v1/2022.humeval-1.4 Beyond calories: evaluating how tailored communication reduces emotional load in diet-coaching @@ -70,6 +74,7 @@ Dieting is a behaviour change task that is difficult for many people to conduct successfully. This is due to many factors, including stress and cost. Mobile applications offer an alternative to traditional coaching. However, previous work on app evaluation focused only on dietary outcomes, ignoring users’ emotional state despite its influence on eating habits. In this work, we introduce a novel evaluation of the effects that tailored communication can have on the emotional load of dieting. We implement this by augmenting a traditional diet-app with affective NLG, text-tailoring and persuasive communication techniques. We then run a short 2-week experiment and check dietary outcomes, user feedback on the produced text and, most importantly, its impact on emotional state, through the PANAS questionnaire. Results show that tailored communication significantly improved users’ emotional state, compared to an app-only control group.
2022.humeval-1.5 balloccu-reiter-2022-beyond + 10.18653/v1/2022.humeval-1.5 The Human Evaluation Datasheet: A Template for Recording Details of Human Evaluation Experiments in <fixed-case>NLP</fixed-case> @@ -80,6 +85,7 @@ 2022.humeval-1.6 shimorina-belz-2022-human Shimorina/human-evaluation-datasheet + 10.18653/v1/2022.humeval-1.6 Toward More Effective Human Evaluation for Machine Translation @@ -92,6 +98,7 @@ 2022.humeval-1.7 saldias-fuentes-etal-2022-toward WMT 2020 + 10.18653/v1/2022.humeval-1.7 A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification @@ -107,6 +114,7 @@ 2022.humeval-1.8 logacheva-etal-2022-study CoLA + 10.18653/v1/2022.humeval-1.8 Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer @@ -120,6 +128,7 @@ lai-etal-2022-human laihuiyuan/eval-formality-transfer GYAFC + 10.18653/v1/2022.humeval-1.9 Towards Human Evaluation of Mutual Understanding in Human-Computer Spontaneous Conversation: An Empirical Study of Word Sense Disambiguation for Naturalistic Social Dialogs in <fixed-case>A</fixed-case>merican <fixed-case>E</fixed-case>nglish @@ -128,6 +137,7 @@ Current evaluation practices for social dialog systems, dedicated to human-computer spontaneous conversation, exclusively focus on the quality of system-generated surface text, but not human-verifiable aspects of mutual understanding between the systems and their interlocutors. This work proposes Word Sense Disambiguation (WSD) as an essential component of a valid and reliable human evaluation framework, whose long-term goal is to radically improve the usability of dialog systems in real-life human-computer collaboration. The practicality of this proposal is demonstrated by experimentally investigating (1) the WordNet 3.0 sense inventory coverage of lexical meanings in spontaneous conversation between humans in American English, assumed as an upper bound of lexical diversity of human-computer communication, and (2) the effectiveness of state-of-the-art WSD models and pretrained transformer-based contextual embeddings on this type of data. 2022.humeval-1.10 luu-2022-towards + 10.18653/v1/2022.humeval-1.10 diff --git a/data/xml/2022.in2writing.xml b/data/xml/2022.in2writing.xml index de00494aeb..58c50f12dd 100644 --- a/data/xml/2022.in2writing.xml +++ b/data/xml/2022.in2writing.xml @@ -31,6 +31,7 @@ Today, data-to-text systems are used as commercial solutions for automated text production of large quantities of text. Therefore, they already represent a new technology of writing. This new technology requires the author, as an act of writing, both to configure a system that then takes over the transformation into a real text, and to maintain strategies of traditional writing. What should an environment look like, where a human guides a machine to write texts? Based on a comparison of the NLG pipeline architecture with the results of the research on the human writing process, this paper attempts to take an overview of which tasks need to be solved and which strategies are necessary to produce good texts in this environment. From this synopsis, principles for the design of data-to-text systems as a functioning writing environment are then derived. 2022.in2writing-1.1 schneider-etal-2022-data + 10.18653/v1/2022.in2writing-1.1 A Design Space for Writing Support Tools Using a Cognitive Process Model of Writing @@ -42,6 +43,7 @@ Improvements in language technology have led to an increasing interest in writing support tools.
In this paper we propose a design space for such tools based on a cognitive process model of writing. We conduct a systematic review of recent computer science papers that present and/or study such tools, analyzing 30 papers from the last five years using the design space. Tools are plotted according to three distinct cognitive processes–planning, translating, and reviewing–and the level of constraint each process entails. Analyzing recent work with the design space shows that highly constrained planning and reviewing are under-studied areas that recent technology improvements may now be able to serve. Finally, we propose shared evaluation methodologies and tasks that may help the field mature. 2022.in2writing-1.2 gero-etal-2022-design + 10.18653/v1/2022.in2writing-1.2 A Selective Summary of Where to Hide a Stolen Elephant: Leaps in Creative Writing with Multimodal Machine Intelligence @@ -53,6 +55,7 @@ While developing a story, novices and published writers alike have had to look outside themselves for inspiration. Language models have recently been able to generate text fluently, producing new stochastic narratives upon request. However, effectively integrating such capabilities with human cognitive faculties and creative processes remains challenging. We propose to investigate this integration with a multimodal writing support interface that offers writing suggestions textually, visually, and aurally. We conduct an extensive study that combines elicitation of prior expectations before writing, observation and semi-structured interviews during writing, and outcome evaluations after writing. Our results illustrate individual and situational variation in machine-in-the-loop writing approaches, suggestion acceptance, and ways the system is helpful. Centrally, we report how participants perform integrative leaps, by which they do cognitive work to integrate suggestions of varying semantic relevance into their developing stories. We interpret these findings, offering modeling and design recommendations for future creative writing support technologies. 2022.in2writing-1.3 singh-etal-2022-selective + 10.18653/v1/2022.in2writing-1.3 A text-writing system for Easy-to-Read <fixed-case>G</fixed-case>erman evaluated with low-literate users with cognitive impairment @@ -63,6 +66,7 @@ 2022.in2writing-1.4 steinmetz-harbusch-2022-text CELEX + 10.18653/v1/2022.in2writing-1.4 Language Models as Context-sensitive Word Search Engines @@ -78,6 +82,7 @@ CLOTH WikiText-103 WikiText-2 + 10.18653/v1/2022.in2writing-1.5 Plug-and-Play Controller for Story Completion: A Pilot Study toward Emotion-aware Story Writing Assistance @@ -90,6 +95,7 @@ 2022.in2writing-1.6 mori-etal-2022-plug ROCStories + 10.18653/v1/2022.in2writing-1.6 Text Revision by On-the-Fly Representation Optimization @@ -105,6 +111,7 @@ jingjingli01/oreo GYAFC Newsela + 10.18653/v1/2022.in2writing-1.7 The Pure Poet: How Good is the Subjective Credibility and Stylistic Quality of Literary Short Texts Written with an Artificial Intelligence Tool as Compared to Texts Written by Human Authors? @@ -118,6 +125,7 @@ The application of artificial intelligence (AI) for text generation in creative domains raises questions regarding the credibility of AI-generated content. In two studies, we explored if readers can differentiate between AI-based and human-written texts (generated based on the first line of texts and poems of classic authors) and how the stylistic qualities of these texts are rated. 
Participants read 9 AI-based continuations and either 9 human-written continuations (Study 1, N=120) or 9 original continuations (Study 2, N=302). Participants’ task was to decide whether a continuation was written with an AI-tool or not, to indicate their confidence in each decision, and to assess the stylistic text quality. Results showed that participants generally had low accuracy for differentiating between text types but were overconfident in their decisions. Regarding the assessment of stylistic quality, AI-continuations were perceived as less well-written, inspiring, fascinating, interesting, and aesthetic than both human-written and original continuations. 2022.in2writing-1.8 gunser-etal-2022-pure + 10.18653/v1/2022.in2writing-1.8 Interactive Children’s Story Rewriting Through Parent-Children Interaction @@ -129,6 +137,7 @@ Storytelling in early childhood provides significant benefits in language and literacy development, relationship building, and entertainment. To maximize these benefits, it is important to empower children with more agency. Interactive story rewriting through parent-children interaction can boost children’s agency and help build the relationship between parent and child as they collaboratively create changes to an original story. However, for children with limited proficiency in reading and writing, parents must carry out multiple tasks to guide the rewriting process, which can incur a high cognitive load. In this work, we introduce an interface design that aims to support children and parents to rewrite stories together with the help of AI techniques. We describe three design goals determined by a review of prior literature in interactive storytelling and existing educational activities. We also propose a preliminary prompt-based pipeline that uses GPT-3 to realize the design goals and enable the interface. 2022.in2writing-1.9 lee-etal-2022-interactive + 10.18653/v1/2022.in2writing-1.9 News Article Retrieval in Context for Event-centric Narrative Creation @@ -141,6 +150,7 @@ 2022.in2writing-1.10 voskarides-etal-2022-news nickvosk/ictir2021-news-retrieval-in-context + 10.18653/v1/2022.in2writing-1.10 Unmet Creativity Support Needs in Computationally Supported Creative Writing @@ -150,6 +160,7 @@ Large language models (LLMs) enabled by the datasets and computing power of the last decade have recently gained popularity for their capacity to generate plausible natural language text from human-provided prompts. This ability makes them appealing to fiction writers as prospective co-creative agents, addressing the common challenge of writer’s block, or getting unstuck. However, creative writers face additional challenges, including maintaining narrative consistency, developing plot structure, architecting reader experience, and refining their expressive intent, which are not well-addressed by current LLM-backed tools. In this paper, we define these needs by grounding them in cognitive and theoretical literature, then survey previous computational narrative research that holds promise for supporting each of them in a co-creative setting. 2022.in2writing-1.11 kreminski-martens-2022-unmet + 10.18653/v1/2022.in2writing-1.11 Sparks: Inspiration for Science Writing using Language Models @@ -160,6 +171,7 @@ Large-scale language models are rapidly improving, performing well on a variety of tasks with little to no customization. 
In this work we investigate how language models can support science writing, a challenging writing task that is both open-ended and highly constrained. We present a system for generating “sparks”, sentences related to a scientific concept intended to inspire writers. We run a user study with 13 STEM graduate students and find three main use cases of sparks—inspiration, translation, and perspective—each of which correlates with a unique interaction pattern. We also find that while participants were more likely to select higher quality sparks, the overall quality of sparks seen by a given participant did not correlate with their satisfaction with the tool. 2022.in2writing-1.12 gero-etal-2022-sparks + 10.18653/v1/2022.in2writing-1.12 <fixed-case>C</fixed-case>hip<fixed-case>S</fixed-case>ong: A Controllable Lyric Generation System for <fixed-case>C</fixed-case>hinese Popular Song @@ -175,6 +187,7 @@ 2022.in2writing-1.13 liu-etal-2022-chipsong korokes/chipsong + 10.18653/v1/2022.in2writing-1.13 Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision @@ -188,6 +201,7 @@ 2022.in2writing-1.14 du-etal-2022-read vipulraheja/iterater + 10.18653/v1/2022.in2writing-1.14 diff --git a/data/xml/2022.insights.xml b/data/xml/2022.insights.xml index 1ace04d59f..cc861d753d 100644 --- a/data/xml/2022.insights.xml +++ b/data/xml/2022.insights.xml @@ -31,6 +31,7 @@ 2022.insights-1.1 ding-etal-2022-isotropy GLUE + 10.18653/v1/2022.insights-1.1 Do Dependency Relations Help in the Task of Stance Detection? @@ -41,6 +42,7 @@ In this paper we present a set of multilingual experiments tackling the task of Stance Detection in five different languages: English, Spanish, Catalan, French and Italian. Furthermore, we study the phenomenon of stance with respect to six different targets – one per language, and two different for Italian – employing a variety of machine learning algorithms that primarily exploit morphological and syntactic knowledge as features, represented in the format of Universal Dependencies. Results seem to suggest that the methodology employed is not beneficial per se, but that the same features might be useful with a different methodology. 2022.insights-1.2 cignarella-etal-2022-dependency + 10.18653/v1/2022.insights-1.2 Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification @@ -50,6 +52,7 @@ Open-world classification in dialog systems requires models to detect open intents, while ensuring the quality of in-domain (ID) intent classification. In this work, we revisit methods that leverage distance-based statistics for unsupervised out-of-domain (OOD) detection. We show that despite their superior performance on threshold-independent metrics like AUROC on test-set, threshold values chosen based on the performance on a validation-set do not generalize well to the test-set, thus resulting in substantially lower performance on ID or OOD detection accuracy and F1-scores. Our analysis shows that this lack of generalizability can be successfully mitigated by setting aside a hold-out set from validation data for threshold selection (sometimes achieving relative gains as high as 100%). Extensive experiments on seven benchmark datasets show that this fix puts the performance of these methods on par with, or sometimes even better than, the current state-of-the-art OOD detection techniques.
2022.insights-1.3 khosla-gangadharaiah-2022-evaluating + 10.18653/v1/2022.insights-1.3 Extending the Scope of Out-of-Domain: Examining <fixed-case>QA</fixed-case> models in multiple subdomains @@ -63,6 +66,7 @@ lyuchenyang/analysing-question-answering-data NewsQA SQuAD + 10.18653/v1/2022.insights-1.4 What Do You Get When You Cross Beam Search with Nucleus Sampling? @@ -72,6 +76,7 @@ We combine beam search with the probabilistic pruning technique of nucleus sampling to create two deterministic nucleus search algorithms for natural language generation. The first algorithm, p-exact search, locally prunes the next-token distribution and performs an exact search over the remaining space. The second algorithm, dynamic beam search, shrinks and expands the beam size according to the entropy of the candidate’s probability distribution. Despite the probabilistic intuition behind nucleus search, experiments on machine translation and summarization benchmarks show that both algorithms reach the same performance levels as standard beam search. 2022.insights-1.5 shaham-levy-2022-get + 10.18653/v1/2022.insights-1.5 How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge? @@ -83,6 +88,7 @@ 2022.insights-1.6 sun-etal-2022-much BLiMP + 10.18653/v1/2022.insights-1.6 Cross-lingual Inflection as a Data Augmentation Method for Parsing @@ -93,6 +99,7 @@ We propose a morphology-based method for low-resource (LR) dependency parsing. We train a morphological inflector for target LR languages, and apply it to related rich-resource (RR) treebanks to create cross-lingual (x-inflected) treebanks that resemble the target LR language. We use such inflected treebanks to train parsers in zero- (training on x-inflected treebanks) and few-shot (training on x-inflected and target language treebanks) setups. The results show that the method sometimes improves the baselines, but not consistently. 2022.insights-1.7 munoz-ortiz-etal-2022-cross + 10.18653/v1/2022.insights-1.7 Is <fixed-case>BERT</fixed-case> Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification @@ -108,6 +115,7 @@ uds-lsv/bert-lnl AG News IMDb Movie Reviews + 10.18653/v1/2022.insights-1.8 Ancestor-to-Creole Transfer is Not a Walk in the Park @@ -118,6 +126,7 @@ We aim to learn language models for Creole languages for which large volumes of data are not readily available, and therefore explore the potential transfer from ancestor languages (the ‘Ancestry Transfer Hypothesis’). We find that standard transfer methods do not facilitate ancestry transfer. Surprisingly, different from other non-Creole languages, a very distinct two-phase pattern emerges for Creoles: As our training losses plateau, and language models begin to overfit on their source languages, perplexity on the Creoles drops. We explore if this compression phase can lead to practically useful language models (the ‘Ancestry Bottleneck Hypothesis’), but also falsify this. Moreover, we show that Creoles exhibit this two-phase pattern even when training on random, unrelated languages. Thus Creoles seem to be typological outliers and we speculate whether there is a link between the two observations.
2022.insights-1.9 lent-etal-2022-ancestor + 10.18653/v1/2022.insights-1.9 What <fixed-case>GPT</fixed-case> Knows About Who is Who @@ -133,6 +142,7 @@ yang-etal-2022-gpt awesomecoref/prompt-coref WSC + 10.18653/v1/2022.insights-1.10 Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the <fixed-case>UMLS</fixed-case> <fixed-case>M</fixed-case>etathesaurus Using <fixed-case>S</fixed-case>iamese Networks @@ -148,6 +158,7 @@ Recent work uses a Siamese Network, initialized with BioWordVec embeddings (distributed word embeddings), for predicting synonymy among biomedical terms to automate a part of the UMLS (Unified Medical Language System) Metathesaurus construction process. We evaluate the use of contextualized word embeddings extracted from nine different biomedical BERT-based models for synonym prediction in the UMLS by replacing BioWordVec embeddings with embeddings extracted from each biomedical BERT model using different feature extraction methods. Finally, we conduct a thorough grid search, which prior work lacks, to find the best set of hyperparameters. Surprisingly, we find that Siamese Networks initialized with BioWordVec embeddings still outperform the Siamese Networks initialized with embeddings extracted from biomedical BERT models. 2022.insights-1.11 bajaj-etal-2022-evaluating + 10.18653/v1/2022.insights-1.11 On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing @@ -160,6 +171,7 @@ 2022.insights-1.12 okimura-etal-2022-impact SST + 10.18653/v1/2022.insights-1.12 Can Question Rewriting Help Conversational Question Answering? @@ -176,6 +188,7 @@ CoQA QReCC QuAC + 10.18653/v1/2022.insights-1.13 Clustering Examples in Multi-Dataset Benchmarks with Item Response Theory @@ -191,6 +204,7 @@ MRQA SST SuperGLUE + 10.18653/v1/2022.insights-1.14 On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets @@ -205,6 +219,7 @@ kim-etal-2022-limits AI2-THOR ALFRED + 10.18653/v1/2022.insights-1.15 Do Data-based Curricula Work? @@ -215,6 +230,7 @@ Current state-of-the-art NLP systems use large neural networks that require extensive computational resources for training. Inspired by human knowledge acquisition, researchers have proposed curriculum learning - sequencing tasks (task-based curricula) or ordering and sampling the datasets (data-based curricula) that facilitate training. This work investigates the benefits of data-based curriculum learning for large language models such as BERT and T5. We experiment with various curricula based on complexity measures and different sampling strategies. Extensive experiments on several NLP tasks show that curricula based on various complexity measures rarely have any benefits, while random sampling performs either as well or better than curricula. 2022.insights-1.16 surkov-etal-2022-data + 10.18653/v1/2022.insights-1.16 The Document Vectors Using Cosine Similarity Revisited @@ -226,6 +242,7 @@ bingyu-arefyev-2022-document bgzh/dv_cosine_revisited IMDb Movie Reviews + 10.18653/v1/2022.insights-1.17 Challenges in including extra-linguistic context in pre-trained language models @@ -236,6 +253,7 @@ To successfully account for language, computational models need to take into account both the linguistic context (the content of the utterances) and the extra-linguistic context (for instance, the participants in a dialogue).
We focus on a referential task that asks models to link entity mentions in a TV show to the corresponding characters, and design an architecture that attempts to account for both kinds of context. In particular, our architecture combines a previously proposed specialized module (an “entity library”) for character representation with transfer learning from a pre-trained language model. We find that, although the model does improve linguistic contextualization, it fails to successfully integrate extra-linguistic information about the participants in the dialogue. Our work shows that it is very challenging to incorporate extra-linguistic information into pre-trained language models. 2022.insights-1.18 sorodoc-etal-2022-challenges + 10.18653/v1/2022.insights-1.18 Label Errors in <fixed-case>BANKING</fixed-case>77 @@ -245,6 +263,7 @@ We investigate potential label errors present in the popular BANKING77 dataset and the associated negative impacts on intent classification methods. Motivated by our own negative results when constructing an intent classifier, we applied two automated approaches to identify potential label errors in the dataset. We found that over 1,400 (14%) of the 10,003 training utterances may have been incorrectly labelled. In a simple experiment, we found that by removing the utterances with potential errors, our intent classifier saw an increase of 4.5% and 8% for the F1-Score and Adjusted Rand Index, respectively, in supervised and unsupervised classification. This paper serves as a warning of the potential of noisy labels in popular NLP datasets. Further study is needed to fully identify the breadth and depth of label errors in BANKING77 and other datasets. 2022.insights-1.19 ying-thomas-2022-label + 10.18653/v1/2022.insights-1.19 Pathologies of Pre-trained Language Models in Few-shot Fine-tuning @@ -259,6 +278,7 @@ chen-etal-2022-pathologies IMDb Movie Reviews SNLI + 10.18653/v1/2022.insights-1.20 An Empirical study to understand the Compositional Prowess of Neural Dialog Models @@ -273,6 +293,7 @@ vinayshekharcmu/ComposionalityOfDialogModels DailyDialog MutualFriends + 10.18653/v1/2022.insights-1.21 Combining Extraction and Generation for Constructing Belief-Consequence Causal Links @@ -283,6 +304,7 @@ In this paper, we introduce and justify a new task—causal link extraction based on beliefs—and do a qualitative analysis of the ability of a large language model—InstructGPT-3—to generate implicit consequences of beliefs. With the language model-generated consequences being promising, but not consistent, we propose directions of future work, including data collection, explicit consequence extraction using rule-based and language modeling-based approaches, and using explicitly stated consequences of beliefs to fine-tune or prompt the language model to produce outputs suitable for the task. 2022.insights-1.22 alexeeva-etal-2022-combining + 10.18653/v1/2022.insights-1.22 Replicability under Near-Perfect Conditions – A Case-Study from Automatic Summarization @@ -291,6 +313,7 @@ Replication of research results has become more and more important in Natural Language Processing. Nevertheless, we still rely on results reported in the literature for comparison. Additionally, elements of an experimental setup are not always completely reported. This includes, but is not limited to, reporting specific parameters used or omitting an implementational detail.
In our experiment based on two frequently used data sets from the domain of automatic summarization and the seemingly full disclosure of research artifacts, we examine how well results reported are replicable and what elements influence the success or failure of replication. Our results indicate that publishing research artifacts is far from sufficient, and that publishing all relevant parameters in all possible detail is crucial. 2022.insights-1.23 mieskes-2022-replicability + 10.18653/v1/2022.insights-1.23 <fixed-case>BPE</fixed-case> beyond Word Boundary: How <fixed-case>NOT</fixed-case> to use Multi Word Expressions in Neural Machine Translation @@ -303,6 +326,7 @@ 2022.insights-1.24.OptionalSupplementaryData.zip kumar-thawani-2022-bpe pegasus-lynx/mwe-bpe + 10.18653/v1/2022.insights-1.24 Pre-trained language models evaluating themselves - A comparative study @@ -314,6 +338,7 @@ 2022.insights-1.25 koch-etal-2022-pre lazerlambda/metricscomparison + 10.18653/v1/2022.insights-1.25 diff --git a/data/xml/2022.iwslt.xml b/data/xml/2022.iwslt.xml index 953f53be1a..ac23672461 100644 --- a/data/xml/2022.iwslt.xml +++ b/data/xml/2022.iwslt.xml @@ -25,6 +25,7 @@ This paper addresses the problem of evaluating the quality of automatically generated subtitles, which includes not only the quality of the machine-transcribed or translated speech, but also the quality of line segmentation and subtitle timing. We propose SubER - a single novel metric based on edit distance with shifts that takes all of these subtitle properties into account. We compare it to existing metrics for evaluating transcription, translation, and subtitle quality. A careful human evaluation in a post-editing scenario shows that the new metric has a high correlation with the post-editing effort and direct human assessment scores, outperforming baseline metrics considering only the subtitle text, such as WER and BLEU, and existing methods to integrate segmentation and timing features. 2022.iwslt-1.1 wilken-etal-2022-suber + 10.18653/v1/2022.iwslt-1.1 Improving <fixed-case>A</fixed-case>rabic Diacritization by Learning to Diacritize and Translate @@ -35,6 +36,7 @@ 2022.iwslt-1.2 thompson-alshehri-2022-improving WikiMatrix + 10.18653/v1/2022.iwslt-1.2 Simultaneous Neural Machine Translation with Prefix Alignment @@ -45,6 +47,7 @@ Simultaneous translation is a task that requires starting translation before the speaker has finished speaking, so we face a trade-off between latency and accuracy. In this work, we focus on prefix-to-prefix translation and propose a method to extract alignment between bilingual prefix pairs. We use the alignment to segment a streaming input and fine-tune a translation model. The proposed method demonstrated higher BLEU than the baselines in low latency ranges in our experiments on the IWSLT simultaneous translation benchmark. 2022.iwslt-1.3 kano-etal-2022-simultaneous + 10.18653/v1/2022.iwslt-1.3 Locality-Sensitive Hashing for Long Context Neural Machine Translation @@ -56,6 +59,7 @@ After its introduction the Transformer architecture quickly became the gold standard for the task of neural machine translation. A major advantage of the Transformer compared to previous architectures is the faster training speed achieved by complete parallelization across timesteps due to the use of attention over recurrent layers. However, this also leads to one of the biggest problems of the Transformer, namely the quadratic time and memory complexity with respect to the input length.
In this work, we adapt the locality-sensitive hashing approach of Kitaev et al. (2020) to self-attention in the Transformer, extend it to cross-attention, and apply this memory-efficient framework to sentence- and document-level machine translation. Our experiments show that the LSH attention scheme at the sentence level comes at the cost of slightly reduced translation quality. For document-level NMT, we are able to include much larger context sizes than is possible with the baseline Transformer. However, more context neither improves translation quality nor scores on targeted test suites. 2022.iwslt-1.4 petrick-etal-2022-locality + 10.18653/v1/2022.iwslt-1.4 Anticipation-Free Training for Simultaneous Machine Translation @@ -67,6 +71,7 @@ 2022.iwslt-1.5 chang-etal-2022-anticipation george0828zhang/sinkhorn-simultrans + 10.18653/v1/2022.iwslt-1.5 Who Are We Talking About? Handling Person Names in Speech Translation @@ -79,6 +84,7 @@ gaido-etal-2022-talking hlt-mt/fbk-fairseq Europarl-ST + 10.18653/v1/2022.iwslt-1.6 Joint Generation of Captions and Subtitles with Dual Decoding @@ -93,6 +99,7 @@ xu-etal-2022-joint jitao-xu/dual-decoding MuST-Cinema + 10.18653/v1/2022.iwslt-1.7 <fixed-case>M</fixed-case>irror<fixed-case>A</fixed-case>lign: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning @@ -104,6 +111,7 @@ Word alignment is essential for downstream cross-lingual language understanding and generation tasks. Recently, the performance of neural word alignment models has exceeded that of statistical models. However, they heavily rely on sophisticated translation models. In this study, we propose a super lightweight unsupervised word alignment model named MirrorAlign, in which bidirectional symmetric attention trained with a contrastive learning objective is introduced, and an agreement loss is employed to bind the attention maps, such that the alignments follow the mirror-like symmetry hypothesis. Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in word alignment while significantly reducing the training and decoding time on average. Further ablation analysis and case studies show the superiority of our proposed MirrorAlign. Notably, we recognize our model as a pioneering attempt to unify bilingual word embedding and word alignments. Encouragingly, our approach achieves a 16.4X speedup against GIZA++, and 50X parameter compression compared with Transformer-based alignment methods. We release our code to facilitate the community: https://github.com/moore3930/MirrorAlign.
2022.iwslt-1.8 wu-etal-2022-mirroralign + 10.18653/v1/2022.iwslt-1.8 On the Impact of Noises in Crowd-Sourced Data for Speech Translation @@ -116,6 +124,7 @@ ouyang-etal-2022-impact owaski/must-c-clean MuST-C + 10.18653/v1/2022.iwslt-1.9 Findings of the <fixed-case>IWSLT</fixed-case> 2022 Evaluation Campaign @@ -172,6 +181,7 @@ LibriSpeech MuST-C VoxPopuli + 10.18653/v1/2022.iwslt-1.10 The <fixed-case>Y</fixed-case>i<fixed-case>T</fixed-case>rans Speech Translation System for <fixed-case>IWSLT</fixed-case> 2022 Offline Shared Task @@ -185,6 +195,7 @@ MuST-C OpenSubtitles VoxPopuli + 10.18653/v1/2022.iwslt-1.11 <fixed-case>A</fixed-case>mazon <fixed-case>A</fixed-case>lexa <fixed-case>AI</fixed-case>’s System for <fixed-case>IWSLT</fixed-case> 2022 Offline Speech Translation Shared Task @@ -199,6 +210,7 @@ Europarl-ST LibriSpeech MuST-C + 10.18653/v1/2022.iwslt-1.12 Efficient yet Competitive Speech Translation: <fixed-case>FBK</fixed-case>@<fixed-case>IWSLT</fixed-case>2022 @@ -213,6 +225,7 @@ 2022.iwslt-1.13 gaido-etal-2022-efficient hlt-mt/fbk-fairseq + 10.18653/v1/2022.iwslt-1.13 Effective combination of pretrained models - <fixed-case>KIT</fixed-case>@<fixed-case>IWSLT</fixed-case>2022 @@ -230,6 +243,7 @@ How2 LibriSpeech MuST-C + 10.18653/v1/2022.iwslt-1.14 The <fixed-case>USTC</fixed-case>-<fixed-case>NELSLIP</fixed-case> Offline Speech Translation Systems for <fixed-case>IWSLT</fixed-case> 2022 @@ -250,6 +264,7 @@ This paper describes USTC-NELSLIP’s submissions to the IWSLT 2022 Offline Speech Translation task, including speech translation of talks from English to German, English to Chinese and English to Japanese. We describe both cascaded architectures and end-to-end models which can directly translate source speech into target text. In the cascaded condition, we investigate the effectiveness of different model architectures with robust training and achieve a 2.72 BLEU improvement over last year’s optimal system on the MuST-C English-German test set. In the end-to-end condition, we build models based on Transformer and Conformer architectures, achieving a 2.26 BLEU improvement over last year’s optimal end-to-end system. The end-to-end system has obtained promising results, but it still lags behind our cascaded models. 2022.iwslt-1.15 zhang-etal-2022-ustc + 10.18653/v1/2022.iwslt-1.15 The <fixed-case>AISP</fixed-case>-<fixed-case>SJTU</fixed-case> Simultaneous Translation System for <fixed-case>IWSLT</fixed-case> 2022 @@ -266,6 +281,7 @@ This paper describes AISP-SJTU’s submissions for the IWSLT 2022 Simultaneous Translation task. We participate in the text-to-text and speech-to-text simultaneous translation from English to Mandarin Chinese. The training of the CAAT is improved by training across multiple values of the right-context window size, which achieves good online performance without fixing a right-context window size in advance. For the speech-to-text task, the best model we submitted achieves 25.87, 26.21, 26.45 BLEU in the low, medium and high regimes on tst-COMMON, corresponding to 27.94, 28.31, 28.43 BLEU in the text-to-text task. 2022.iwslt-1.16 zhu-etal-2022-aisp + 10.18653/v1/2022.iwslt-1.16 The Xiaomi Text-to-Text Simultaneous Speech Translation System for <fixed-case>IWSLT</fixed-case> 2022 @@ -282,6 +298,7 @@ This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (noted as SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (noted as T2T) track.
Our system is built on the Transformer model with novel techniques borrowed from our recent research work. For data filtering, language-model-based and rule-based methods are applied to obtain high-quality bilingual parallel corpora. We also strengthen our system with several established data augmentation techniques, such as knowledge distillation, tagged back-translation, and iterative back-translation. We also incorporate novel training techniques such as R-Drop, deep models, and large-batch training, which have been shown to be beneficial to the naive Transformer model. In the SST scenario, several variations of wait-k strategies are explored. Furthermore, in terms of robustness, both data-based and model-based methods are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. We finally design inference algorithms and use an adaptive-ensemble method based on multiple model variants to further improve the performance of the system. Compared with strong baselines, fusing all techniques can improve our system by 2~3 BLEU under different latency regimes. 2022.iwslt-1.17 guo-etal-2022-xiaomi + 10.18653/v1/2022.iwslt-1.17 <fixed-case>NVIDIA</fixed-case> <fixed-case>N</fixed-case>e<fixed-case>M</fixed-case>o Offline Speech Translation Systems for <fixed-case>IWSLT</fixed-case> 2022 @@ -299,6 +316,7 @@ Europarl-ST LibriSpeech VoxPopuli + 10.18653/v1/2022.iwslt-1.18 The <fixed-case>N</fixed-case>iu<fixed-case>T</fixed-case>rans’s Submission to the <fixed-case>IWSLT</fixed-case>22 <fixed-case>E</fixed-case>nglish-to-<fixed-case>C</fixed-case>hinese Offline Speech Translation Task @@ -314,6 +332,7 @@ This paper describes NiuTrans’s submission to the IWSLT22 English-to-Chinese (En-Zh) offline speech translation task. The end-to-end and bilingual system is built with constrained English and Chinese data and translates the English speech to Chinese text without intermediate transcription. Our speech translation models are composed of different pre-trained acoustic models and machine translation models, connected by two kinds of adapters. We compare the effect of the standard speech feature (e.g. log Mel-filterbank) and the pre-trained speech feature and try to make them interact. The final submission is an ensemble of three potential speech translation models. Our single best and ensemble models achieve 18.66 BLEU and 19.35 BLEU, respectively, on the MuST-C En-Zh tst-COMMON set.
2022.iwslt-1.19 zhang-etal-2022-niutranss + 10.18653/v1/2022.iwslt-1.19 The <fixed-case>HW</fixed-case>-<fixed-case>TSC</fixed-case>’s Offline Speech Translation System for <fixed-case>IWSLT</fixed-case> 2022 Evaluation @@ -335,6 +354,7 @@ wang-etal-2022-hw LibriSpeech TED-LIUM 3 + 10.18653/v1/2022.iwslt-1.20 The <fixed-case>HW</fixed-case>-<fixed-case>TSC</fixed-case>’s Simultaneous Speech Translation System for <fixed-case>IWSLT</fixed-case> 2022 Evaluation @@ -356,6 +376,7 @@ wang-etal-2022-hw-tscs LibriSpeech TED-LIUM 3 + 10.18653/v1/2022.iwslt-1.21 <fixed-case>MLLP</fixed-case>-<fixed-case>VRAIN</fixed-case> <fixed-case>UPV</fixed-case> systems for the <fixed-case>IWSLT</fixed-case> 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks @@ -376,6 +397,7 @@ Europarl-ST MuST-C OpenSubtitles + 10.18653/v1/2022.iwslt-1.22 Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: <fixed-case>UPC</fixed-case> at <fixed-case>IWSLT</fixed-case> 2022 @@ -390,6 +412,7 @@ tsiamas-etal-2022-pretrained Europarl-ST MuST-C + 10.18653/v1/2022.iwslt-1.23 <fixed-case>CUNI</fixed-case>-<fixed-case>KIT</fixed-case> System for Simultaneous Speech Translation Task at <fixed-case>IWSLT</fixed-case> 2022 @@ -405,6 +428,7 @@ In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3x faster than the offline model in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in the medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available. 2022.iwslt-1.24 polak-etal-2022-cuni + 10.18653/v1/2022.iwslt-1.24 <fixed-case>NAIST</fixed-case> Simultaneous Speech-to-Text Translation System for <fixed-case>IWSLT</fixed-case> 2022 @@ -421,6 +445,7 @@ 2022.iwslt-1.25 fukuda-etal-2022-naist MuST-C + 10.18653/v1/2022.iwslt-1.25 The <fixed-case>HW</fixed-case>-<fixed-case>TSC</fixed-case>’s Speech to Speech Translation System for <fixed-case>IWSLT</fixed-case> 2022 Evaluation @@ -442,6 +467,7 @@ guo-etal-2022-hw LibriSpeech TED-LIUM 3 + 10.18653/v1/2022.iwslt-1.26 <fixed-case>CMU</fixed-case>’s <fixed-case>IWSLT</fixed-case> 2022 Dialect Speech Translation System @@ -458,6 +484,7 @@ This paper describes CMU’s submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional paired Modern Standard Arabic (MSA) data to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems. We also augment the paired ASR data with pseudo translations via sequence-level knowledge distillation from an MT model and use these artificial triplet ST data to improve our end-to-end (E2E) systems. Our E2E models are based on the Multi-Decoder architecture with searchable hidden intermediates. We extend the Multi-Decoder by orienting the speech encoder towards the target language, applying ST supervision as a hierarchical connectionist temporal classification (CTC) multi-task objective. During inference, we apply joint decoding of the ST CTC and ST autoregressive decoder branches of our modified Multi-Decoder.
Finally, we apply ROVER voting, posterior combination, and minimum Bayes-risk decoding with combined N-best lists to ensemble our various cascaded and E2E systems. Our best systems reached 20.8 and 19.5 BLEU on test2 (blind) and test1, respectively. Without any additional MSA data, we reached 20.4 and 19.2 on the same test sets. 2022.iwslt-1.27 yan-etal-2022-cmus + 10.18653/v1/2022.iwslt-1.27 <fixed-case>ON</fixed-case>-<fixed-case>TRAC</fixed-case> Consortium Systems for the <fixed-case>IWSLT</fixed-case> 2022 Dialect and Low-resource Speech Translation Tasks @@ -476,6 +503,7 @@ This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR. Our results show that, in our settings, pipeline approaches are still very competitive, and that with the use of transfer learning, they can outperform end-to-end models for speech translation (ST). For the Tamasheq-French dataset (low-resource track), our primary submission leverages intermediate representations from a wav2vec 2.0 model trained on 234 hours of Tamasheq audio, while our contrastive model uses a French phonetic transcription of the Tamasheq audio as input in a Conformer speech translation architecture jointly trained on automatic speech recognition, ST and machine translation losses. Our results highlight that self-supervised models trained on smaller sets of target data are more effective for low-resource end-to-end ST fine-tuning, compared to large off-the-shelf models. Results also illustrate that even approximate phonetic transcriptions can improve ST scores. 2022.iwslt-1.28 zanon-boito-etal-2022-trac + 10.18653/v1/2022.iwslt-1.28 <fixed-case>JHU</fixed-case> <fixed-case>IWSLT</fixed-case> 2022 Dialect Speech Translation System Description @@ -487,6 +515,7 @@ This paper details the Johns Hopkins speech translation (ST) system used in the IWSLT2022 dialect speech translation task. Our system uses a cascade of automatic speech recognition (ASR) and machine translation (MT). We use a Conformer model for ASR systems and a Transformer model for machine translation. Surprisingly, we found that while using additional ASR training data resulted in only a negligible change in performance as measured by BLEU or word error rate (WER), aggressive text normalization improved BLEU more significantly. We also describe an approach, similar to back-translation, for improving performance using synthetic dialectal source text produced from source sentences in mismatched dialects.
2022.iwslt-1.29 yang-etal-2022-jhu + 10.18653/v1/2022.iwslt-1.29 Controlling Translation Formality Using Pre-trained Multilingual Language Models @@ -499,6 +528,7 @@ rippeth-etal-2022-controlling CCMatrix ParaCrawl + 10.18653/v1/2022.iwslt-1.30 Controlling Formality in Low-Resource <fixed-case>NMT</fixed-case> with Domain Adaptation and Re-Ranking: <fixed-case>SLT</fixed-case>-<fixed-case>CDT</fixed-case>-<fixed-case>U</fixed-case>o<fixed-case>S</fixed-case> at <fixed-case>IWSLT</fixed-case>2022 @@ -512,6 +542,7 @@ MuST-C ParaCrawl WikiMatrix + 10.18653/v1/2022.iwslt-1.31 Improving Machine Translation Formality Control with Weakly-Labelled Data Augmentation and Post Editing Strategies @@ -524,6 +555,7 @@ This paper describes Amazon Alexa AI’s implementation for the IWSLT 2022 shared task on formality control. We focus on the unconstrained and supervised task for en→hi (Hindi) and en→ja (Japanese) pairs, where very limited formality-annotated data is available. We propose three simple yet effective post-editing strategies, namely T-V conversion, utilizing a verb conjugator, and seq2seq models, in order to rewrite the translated phrases into formal or informal language. Considering nuances of formality and informality in different languages, our analysis shows that a language-specific post-editing strategy achieves the best performance. To address the unique challenge of limited formality annotations, we further develop a formality classifier to perform weakly labelled data augmentation, which automatically generates synthetic formality labels from a large parallel corpus. Empirical results on the IWSLT formality testset have shown that the proposed system achieves significant improvements in formality accuracy while retaining a BLEU score on par with the baseline. 2022.iwslt-1.32 zhang-etal-2022-improving-machine + 10.18653/v1/2022.iwslt-1.32 <fixed-case>HW</fixed-case>-<fixed-case>TSC</fixed-case>’s Participation in the <fixed-case>IWSLT</fixed-case> 2022 Isometric Spoken Language Translation @@ -543,6 +575,7 @@ This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task. We participate in all three language pairs (English-German, English-French, English-Spanish) under the constrained setting, and submit an English-German result under the unconstrained setting. We use the standard Transformer model as the baseline and obtain the best performance via one of its variants that shares the decoder input and output embedding. We perform detailed pre-processing and filtering on the provided bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, R-Drop, Average Checkpoint, and Ensemble. We investigate three methods for biasing the output length: i) conditioning the output on a given target-source length-ratio class; ii) enriching the Transformer positional embedding with length information; and iii) length-control decoding for non-autoregressive translation. Our submissions achieve 30.7, 41.6 and 36.7 BLEU, respectively, on the tst-COMMON test sets for the English-German, English-French, and English-Spanish tasks, and 100% comply with the length requirements.
2022.iwslt-1.33 li-etal-2022-hw + 10.18653/v1/2022.iwslt-1.33 <fixed-case>A</fixed-case>pp<fixed-case>T</fixed-case>ek’s Submission to the <fixed-case>IWSLT</fixed-case> 2022 Isometric Spoken Language Translation Task @@ -553,6 +586,7 @@ 2022.iwslt-1.34 wilken-matusov-2022-appteks MuST-C + 10.18653/v1/2022.iwslt-1.34 Hierarchical Multi-task learning framework for Isometric-Speech Language Translation @@ -567,6 +601,7 @@ aakash0017/machine-translation-iswlt MuST-C PAWS-X + 10.18653/v1/2022.iwslt-1.35 diff --git a/data/xml/2022.lchange.xml b/data/xml/2022.lchange.xml index a9ccee8d26..1853ca666f 100644 --- a/data/xml/2022.lchange.xml +++ b/data/xml/2022.lchange.xml @@ -44,6 +44,7 @@ We present a benchmark in six European languages containing manually annotated information about olfactory situations and events following a FrameNet-like approach. The document selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 and 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell descriptions over time. 2022.lchange-1.1 menini-etal-2022-multilingual + 10.18653/v1/2022.lchange-1.1 Language Acquisition, Neutral Change, and Diachronic Trends in Noun Classifiers @@ -54,6 +55,7 @@ 2022.lchange-1.2 kali-kodner-2022-language an-k45/classifier-change + 10.18653/v1/2022.lchange-1.2 Deconstructing destruction: A Cognitive Linguistics perspective on a computational analysis of diachronic change @@ -64,6 +66,7 @@ In this paper, we aim to introduce a Cognitive Linguistics perspective into a computational analysis of near-synonyms. We focus on a single set of Dutch near-synonyms, vernielen and vernietigen, roughly translated as ‘to destroy’, replicating the analysis from Geeraerts (1997) with distributional models. Our analysis, which tracks the meaning of both words in a corpus of 16th-20th century prose data, shows that both lexical items have undergone semantic change, led by differences in their prototypical semantic core. 2022.lchange-1.3 franco-etal-2022-deconstructing + 10.18653/v1/2022.lchange-1.3 What is Done is Done: an Incremental Approach to Semantic Shift Detection @@ -75,6 +78,7 @@ Contextual word embedding techniques for semantic shift detection are receiving more and more attention. In this paper, we present What is Done is Done (WiDiD), an incremental approach to semantic shift detection based on incremental clustering techniques and contextual embedding methods to capture the changes in the meanings of a target word across a diachronic corpus. In WiDiD, the word contexts observed in the past are consolidated as a set of clusters that constitute the “memory” of the word meanings observed so far. Such a memory is exploited as a basis for subsequent word observations, so that the meanings observed in the present are stratified over the past ones. 2022.lchange-1.4 periti-etal-2022-done + 10.18653/v1/2022.lchange-1.4 From qualifiers to quantifiers: semantic shift at the paradigm level @@ -83,6 +87,7 @@ Language change has often been conceived as a competition between linguistic variants. However, language units may be complex organizations in themselves, e.g. in the case of schematic constructions, featuring a free slot. Such a slot is filled by words forming a set or ‘paradigm’ and engaging in inter-related dynamics within this constructional environment.
To tackle this complexity, a simple computational method is offered to automatically characterize their interactions, and visualize them through networks of cooperation and competition. Applying this method to the French paradigm of quantifiers, I show that this method efficiently captures phenomena regarding the evolving organization of constructional paradigms, in particular the constitution of competing clusters of fillers that promote different semantic strategies overall. 2022.lchange-1.5 feltgen-2022-qualifiers + 10.18653/v1/2022.lchange-1.5 Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change @@ -93,6 +98,7 @@ Morphological and syntactic changes in word usage — as captured, e.g., by grammatical profiles — have been shown to be good predictors of a word’s meaning change. In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes. To this end, we first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages, and then combine the two approaches in ensembles to assess their complementarity. Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages. This indicates that language models do not fully cover the fine-grained morphological and syntactic signals that are explicitly represented in grammatical profiles. Interesting exceptions are the test sets where the time spans under analysis are much longer than the time gap between them (for example, century-long spans with a one-year gap between them). Morphosyntactic change is slow, so grammatical profiles fail to detect it in such cases. In contrast, language models, thanks to their access to lexical information, are able to detect fast topical changes. 2022.lchange-1.6 giulianelli-etal-2022-fire + 10.18653/v1/2022.lchange-1.6 Explainable Publication Year Prediction of Eighteenth Century Texts with the <fixed-case>BERT</fixed-case> Model @@ -109,6 +115,7 @@ In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works, on average, with less than 7 years absolute error. 2022.lchange-1.7 rastas-etal-2022-explainable + 10.18653/v1/2022.lchange-1.7 Using Cross-Lingual Part of Speech Tagging for Partially Reconstructing the Classic Language Family Tree Model @@ -119,6 +126,7 @@ The tree model is well known for expressing the historic evolution of languages. This model has been considered a method of describing genetic relationships between languages. Nevertheless, some researchers question the model’s ability to predict the proximity between two languages, since it represents genetic relatedness rather than linguistic resemblance. Defining other language proximity models has been an active research area for many years.
In this paper, we explore a part-of-speech model for defining proximity between languages using a multilingual language model that was fine-tuned on the task of cross-lingual part-of-speech tagging. We train the model on one language and evaluate it on another; the measured performance is then used to define the proximity between the two languages. By further developing the model, we show that it can reconstruct some parts of the tree model. 2022.lchange-1.8 samohi-etal-2022-using + 10.18653/v1/2022.lchange-1.8 A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns @@ -130,6 +138,7 @@ 2022.lchange-1.9 list-etal-2022-new lingpy/supervised-reconstruction-paper + 10.18653/v1/2022.lchange-1.9 Caveats of Measuring Semantic Change of Cognates and Borrowings using Multilingual Word Embeddings @@ -140,6 +149,7 @@ 2022.lchange-1.10 fourrier-montariol-2022-caveats clefourrier/historical-semantic-change + 10.18653/v1/2022.lchange-1.10 Lexicon of Changes: Towards the Evaluation of Diachronic Semantic Shift in <fixed-case>C</fixed-case>hinese @@ -150,6 +160,7 @@ Recent research has brought a wave of computational approaches to the classic topic of semantic change, aiming to tackle one of the most challenging issues in the evolution of human language. While several methods for detecting semantic change have been proposed, such studies are limited to a few languages, where evaluation datasets are available. This paper presents the first dataset for evaluating Chinese semantic change in contexts preceding and following the Reform and Opening-up, covering a 50-year period in Modern Chinese. Following the DURel framework, we collected 6,000 human judgments for the dataset. We also report the performance of alignment-based word embedding models on this evaluation dataset, achieving high and significant correlation scores. 2022.lchange-1.11 chen-etal-2022-lexicon + 10.18653/v1/2022.lchange-1.11 Low <fixed-case>S</fixed-case>axon dialect distances at the orthographic and syntactic level @@ -160,6 +171,7 @@ We compare five Low Saxon dialects from the 19th and 21st century from Germany and the Netherlands with each other as well as with modern Standard Dutch and Standard German. Our comparison is based on character n-grams on the one hand and PoS n-grams on the other, and we show that these two lead to different distances. Particularly in the PoS-based distances, one can observe all of the 21st century Low Saxon dialects shifting towards the modern majority languages. 2022.lchange-1.12 siewert-etal-2022-low + 10.18653/v1/2022.lchange-1.12 “Vaderland”, “Volk” and “Natie”: Semantic Change Related to Nationalism in <fixed-case>D</fixed-case>utch Literature Between 1700 and 1880 Captured with Dynamic <fixed-case>B</fixed-case>ernoulli Word Embeddings @@ -170,6 +182,7 @@ Languages can respond to external events in various ways - new words or named entities may be created, additional senses might develop for already existing words, or the valence of words can change. In this work, we explore the semantic shift of the Dutch words “natie” (“nation”), “volk” (“people”) and “vaderland” (“fatherland”) over a period that is known for the rise of nationalism in Europe: 1700-1880. The semantic change is measured by means of Dynamic Bernoulli Word Embeddings which allow for comparison between word embeddings over different time slices. The word embeddings were generated based on Dutch fiction literature divided over different decades.
From the analysis of the absolute drifts, it appears that the word “natie” underwent a relatively small drift. However, the drifts of “vaderland” and “volk” show multiple peaks, culminating around the turn of the nineteenth century. To verify whether this semantic change can indeed be attributed to nationalistic movements, a detailed analysis of the nearest neighbours of the target words is provided. From the analysis, it appears that “natie”, “volk” and “vaderland” became more nationalistically-loaded over time. 2022.lchange-1.13 timmermans-etal-2022-vaderland + 10.18653/v1/2022.lchange-1.13 Using neural topic models to track context shifts of words: a case study of <fixed-case>COVID</fixed-case>-related terms before and after the lockdown in <fixed-case>A</fixed-case>pril 2020 @@ -179,6 +192,7 @@ This paper explores lexical meaning changes in a new dataset, which includes tweets from before and after the COVID-related lockdown in April 2020. We use this dataset to evaluate traditional and more recent unsupervised approaches to lexical semantic change that make use of contextualized word representations based on the BERT neural language model to obtain representations of word usages. We argue that previous models that encode local representations of words cannot capture global context shifts such as the context shift of face masks since the pandemic outbreak. We experiment with neural topic models to track context shifts of words. We show that this approach can reveal textual associations of words that go beyond their lexical meaning representation. We discuss future work and how to proceed in capturing the pragmatic aspect of meaning change as opposed to lexical semantic change. 2022.lchange-1.14 kellert-mahmud-uz-zaman-2022-using + 10.18653/v1/2022.lchange-1.14 Roadblocks in Gender Bias Measurement for Diachronic Corpora @@ -192,6 +206,7 @@ 2022.lchange-1.15 alshahrani-etal-2022-roadblocks clarkson-accountability-transparency/gbiasroadblocks + 10.18653/v1/2022.lchange-1.15 <fixed-case>LSCD</fixed-case>iscovery: A shared task on semantic change discovery and detection in <fixed-case>S</fixed-case>panish @@ -202,6 +217,7 @@ We present the first shared task on semantic change discovery and detection in Spanish. We create the first dataset of Spanish words manually annotated for semantic change using the DURel framework (Schlechtweg et al., 2018). The task is divided into two phases: 1) graded change discovery, and 2) binary change detection. In addition to introducing a new language for this task, the main novelty with respect to the previous tasks consists in predicting and evaluating changes for all vocabulary words in the corpus. Six teams participated in phase 1 and seven teams in phase 2 of the shared task, and the best system obtained a Spearman rank correlation of 0.735 for phase 1 and an F1 score of 0.735 for phase 2. We describe the systems developed by the competing teams, highlighting the techniques that were particularly useful. 2022.lchange-1.16 d-zamora-reina-etal-2022-black + 10.18653/v1/2022.lchange-1.16 <fixed-case>BOS</fixed-case> at <fixed-case>LSCD</fixed-case>iscovery: Lexical Substitution for Interpretable Lexical Semantic Change Detection @@ -211,6 +227,7 @@ We propose a solution for the LSCDiscovery shared task on Lexical Semantic Change Detection in Spanish. Our approach is based on generating lexical substitutes that describe old and new senses of a given word. This approach achieves the second best result in the sense loss and sense gain detection subtasks.
By observing those substitutes that are specific to only one time period, one can understand which senses were gained or lost. This allows us to provide more detailed information about semantic change to the user and makes our method interpretable. 2022.lchange-1.17 kudisov-arefyev-2022-black + 10.18653/v1/2022.lchange-1.17 <fixed-case>D</fixed-case>eep<fixed-case>M</fixed-case>istake at <fixed-case>LSCD</fixed-case>iscovery: Can a Multilingual Word-in-Context Model Replace Human Annotators? @@ -220,6 +237,7 @@ In this paper, we describe our solution for the LSCDiscovery shared task on Lexical Semantic Change Discovery (LSCD) in Spanish. Our solution employs a Word-in-Context (WiC) model, which is trained to determine if a particular word has the same meaning in two given contexts. We essentially try to replicate the annotation of the dataset for the shared task, but with human annotators replaced by a neural network. In the graded change discovery subtask, our solution has achieved the 2nd best result according to all metrics. In the main binary change detection subtask, our F1-score is 0.655 compared to 0.716 for the best submission, corresponding to the 5th place. However, in the optional sense gain detection subtask we have outperformed all other participants. During the post-evaluation experiments, we compared different ways to prepare WiC data in Spanish for fine-tuning. We have found that it helps to keep only examples annotated as 1 (unrelated senses) and 4 (identical senses), rather than using twice as many examples including intermediate annotations. 2022.lchange-1.18 homskiy-arefyev-2022-black + 10.18653/v1/2022.lchange-1.18 <fixed-case>UA</fixed-case>lberta at <fixed-case>LSCD</fixed-case>iscovery: Lexical Semantic Change Detection via Word Sense Disambiguation @@ -230,6 +248,7 @@ We describe our two systems for the shared task on Lexical Semantic Change Discovery in Spanish. For binary change detection, we frame the task as a word sense disambiguation (WSD) problem. We derive sense frequency distributions for target words in both old and modern corpora. We assume that the word semantics have changed if a sense is observed in only one of the two corpora, or the relative change for any sense exceeds a tuned threshold. For graded change discovery, we follow the design of CIRCE (Pömsl and Lyapin, 2020) by combining both static and contextual embeddings. For contextual embeddings, we use XLM-RoBERTa instead of BERT, and train the model to predict a masked token instead of the time period. Our language-independent methods achieve results that are close to the best-performing systems in the shared task. 2022.lchange-1.19 teodorescu-etal-2022-black + 10.18653/v1/2022.lchange-1.19 <fixed-case>C</fixed-case>o<fixed-case>T</fixed-case>o<fixed-case>H</fixed-case>i<fixed-case>L</fixed-case>i at <fixed-case>LSCD</fixed-case>iscovery: the Role of Linguistic Features in Predicting Semantic Change @@ -243,6 +262,7 @@ This paper presents the contributions of the CoToHiLi team to the LSCDiscovery shared task on semantic change in the Spanish language. We participated in both tasks (graded discovery and binary change, including sense gain and sense loss) and proposed models based on word embedding distances combined with hand-crafted linguistic features, including polysemy, number of neological synonyms, and relation to cognates in English. We find that models that include linguistically informed features combined using weights assigned manually by experts lead to promising results.
2022.lchange-1.20 sabina-uban-etal-2022-black + 10.18653/v1/2022.lchange-1.20 <fixed-case>HSE</fixed-case> at <fixed-case>LSCD</fixed-case>iscovery in <fixed-case>S</fixed-case>panish: Clustering and Profiling for Lexical Semantic Change Discovery @@ -256,6 +276,7 @@ kashleva-etal-2022-black Various fixes throughout the paper. + 10.18653/v1/2022.lchange-1.21 <fixed-case>G</fixed-case>loss<fixed-case>R</fixed-case>eader at <fixed-case>LSCD</fixed-case>iscovery: Train to Select a Proper Gloss in <fixed-case>E</fixed-case>nglish – Discover Lexical Semantic Change in <fixed-case>S</fixed-case>panish @@ -265,6 +286,7 @@ The contextualized embeddings obtained from neural networks pre-trained as Language Models (LM) or Masked Language Models (MLM) are not well suited to solving the Lexical Semantic Change Detection (LSCD) task because they are more sensitive to changes in word forms than in word meaning, a property previously known as the word form bias or orthographic bias. Unlike many other NLP tasks, it is also not obvious how to fine-tune such models for LSCD. In order to conclude if there are any differences between senses of a particular word in two corpora, a human annotator or a system must analyze many examples containing this word from both corpora. This makes annotation of LSCD datasets very labour-intensive. The existing LSCD datasets contain up to 100 words that are labeled according to their semantic change, which is hardly enough for fine-tuning. To solve these problems, we fine-tune the XLM-R MLM as part of a gloss-based WSD system on a large WSD dataset in English. Then we employ the zero-shot cross-lingual transferability of XLM-R to build the contextualized embeddings for examples in Spanish. In order to obtain the graded change score for each word, we calculate the average distance between our improved contextualized embeddings of its old and new occurrences. For the binary change detection subtask, we apply thresholding to the same scores. Our solution has shown the best results among all participants in all subtasks except for the optional sense gain detection subtask. 2022.lchange-1.22 rachinskiy-arefyev-2022-black + 10.18653/v1/2022.lchange-1.22 diff --git a/data/xml/2022.lnls.xml b/data/xml/2022.lnls.xml index d02d8b1635..da6435ffd7 100644 --- a/data/xml/2022.lnls.xml +++ b/data/xml/2022.lnls.xml @@ -27,6 +27,7 @@ 2022.lnls-1.1 ri-etal-2022-finding ALFRED + 10.18653/v1/2022.lnls-1.1 <fixed-case>G</fixed-case>rammar<fixed-case>SHAP</fixed-case>: An Efficient Model-Agnostic and Structure-Aware <fixed-case>NLP</fixed-case> Explainer @@ -41,6 +42,7 @@ mosca-etal-2022-grammarshap IMDb Movie Reviews SST + 10.18653/v1/2022.lnls-1.2 Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions @@ -55,6 +57,7 @@ Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence for the generated answer, which is especially problematic when humans cannot readily check the model’s answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations where difficult questions are answered by considering opposing sides (see Irving et al., 2018). For multiple-choice QA examples, we build a dataset of single arguments for both a correct and incorrect answer option in a debate-style set-up as an initial step in training models to produce explanations for two candidate answers.
We use long contexts: humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers, and we test if those explanations allow humans who have not read the full context to more accurately determine the correct answer. We do not find that explanations in our set-up improve human accuracy, but a baseline condition shows that providing human-selected text snippets does improve accuracy. We use these findings to suggest ways of improving the debate set-up for future data collection efforts. 2022.lnls-1.3 parrish-etal-2022-single + 10.18653/v1/2022.lnls-1.3 When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data @@ -68,6 +71,7 @@ SNLI TACRED e-SNLI + 10.18653/v1/2022.lnls-1.4 A survey on improving <fixed-case>NLP</fixed-case> models with human explanations @@ -78,6 +82,7 @@ 2022.lnls-1.5 hartmann-sonntag-2022-survey e-SNLI + 10.18653/v1/2022.lnls-1.5 diff --git a/data/xml/2022.ltedi.xml b/data/xml/2022.ltedi.xml index 9ccbdc425b..ba44e72a92 100644 --- a/data/xml/2022.ltedi.xml +++ b/data/xml/2022.ltedi.xml @@ -27,6 +27,7 @@ 2022.ltedi-1.1 markl-2022-mind Common Voice + 10.18653/v1/2022.ltedi-1.1 Regex in a Time of Deep Learning: The Role of an Old Technology in Age Discrimination Detection in Job Advertisements @@ -37,6 +38,7 @@ Deep learning holds great promise for detecting discriminatory language in the public sphere. However, for the detection of illegal age discrimination in job advertisements, regex approaches are still strong performers. In this paper, we investigate job advertisements in the Netherlands. We present a qualitative analysis of the benefits of the ‘old’ approach based on regexes and investigate how neural embeddings could address its limitations. 2022.ltedi-1.2 pillar-etal-2022-regex + 10.18653/v1/2022.ltedi-1.2 Doing not Being: Concrete Language as a Bridge from Language Technology to Ethnically Inclusive Job Ads @@ -48,6 +50,7 @@ This paper makes the case for studying concreteness in language as a bridge that will allow language technology to support the understanding and improvement of ethnic inclusivity in job advertisements. We propose an annotation scheme that guides the assignment of sentences in job ads to classes that reflect concrete actions, i.e., what the employer needs people to do, and abstract dispositions, i.e., who the employer expects people to be. Using an annotated dataset of Dutch-language job ads, we demonstrate that machine learning technology is effectively able to distinguish these classes. 2022.ltedi-1.3 adams-etal-2022-concrete + 10.18653/v1/2022.ltedi-1.3 Measuring Harmful Sentence Completion in Language Models for <fixed-case>LGBTQIA</fixed-case>+ Individuals @@ -61,6 +64,7 @@ nozza-etal-2022-measuring milanlproc/honest HONEST + 10.18653/v1/2022.ltedi-1.4 Using <fixed-case>BERT</fixed-case> Embeddings to Model Word Importance in Conversational Transcripts for Deaf and Hard of Hearing Users @@ -72,6 +76,7 @@ Deaf and hard of hearing individuals regularly rely on captioning while watching live TV. Live TV captioning is evaluated by regulatory agencies using various caption evaluation metrics. However, caption evaluation metrics are often not informed by preferences of DHH users or how meaningful the captions are. There is a need to construct caption evaluation metrics that take the relative importance of words in a transcript into account.
We conducted a correlation analysis between two types of word embeddings and human-annotated word-importance scores in an existing corpus. We found that normalized contextualized word embeddings generated using BERT correlated better with manually annotated importance scores than word2vec-based word embeddings. We make available a pairing of word embeddings and their human-annotated importance scores. We also provide proof-of-concept utility by training word importance models, achieving an F1-score of 0.57 in the 6-class word importance classification task. 2022.ltedi-1.5 amin-etal-2022-using + 10.18653/v1/2022.ltedi-1.5 Detoxifying Language Models with a Toxic Corpus @@ -82,6 +87,7 @@ 2022.ltedi-1.6 park-rudzicz-2022-detoxifying WebText + 10.18653/v1/2022.ltedi-1.6 Inferring Gender: A Scalable Methodology for Gender Detection with Online Lexical Databases @@ -92,6 +98,7 @@ 2022.ltedi-1.7 bartl-leavy-2022-inferring marionbartl/lexical-gender + 10.18653/v1/2022.ltedi-1.7 Debiasing Pre-Trained Language Models via Efficient Fine-Tuning @@ -106,6 +113,7 @@ CrowS-Pairs StereoSet WinoBias + 10.18653/v1/2022.ltedi-1.8 Disambiguation of morpho-syntactic features of <fixed-case>A</fixed-case>frican <fixed-case>A</fixed-case>merican <fixed-case>E</fixed-case>nglish – the case of habitual be @@ -117,6 +125,7 @@ Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. These errors are often caused by poor representation of linguistic features unique to African American English (AAE), which is due to the relatively low probability of occurrence for many such features. We present a workflow to overcome this issue in the case of habitual “be”. Habitual “be” is isomorphic, and therefore ambiguous, with other forms of uninflected “be” found in both AAE and General American English (GAE). This creates a clear bias challenge for NLP technologies. To overcome this scarcity, we employ a combination of rule-based filters and data augmentation that generates a corpus balanced between habitual and non-habitual instances. This balanced corpus trains unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving an F1 score of 0.65 at classifying habitual “be”. 2022.ltedi-1.9 santiago-etal-2022-disambiguation + 10.18653/v1/2022.ltedi-1.9 Behind the Mask: Demographic bias in name detection for <fixed-case>PII</fixed-case> masking @@ -128,6 +137,7 @@ 2022.ltedi-1.10 mansfield-etal-2022-behind csmansfield/pii-masking-bias + 10.18653/v1/2022.ltedi-1.10 Mapping the Multilingual Margins: Intersectional Biases of Sentiment Analysis Systems in <fixed-case>E</fixed-case>nglish, <fixed-case>S</fixed-case>panish, and <fixed-case>A</fixed-case>rabic @@ -140,6 +150,7 @@ As natural language processing systems become more widespread, it is necessary to address fairness issues in their implementation and deployment to ensure that their negative impacts on society are understood and minimized. However, there is limited work that studies fairness using a multilingual and intersectional framework or on downstream tasks. In this paper, we introduce four multilingual Equity Evaluation Corpora, supplementary test sets designed to measure social biases, and a novel statistical framework for studying unisectional and intersectional social biases in natural language processing.
We use these tools to measure gender, racial, ethnic, and intersectional social biases across five models trained on emotion regression tasks in English, Spanish, and Arabic. We find that many systems demonstrate statistically significant unisectional and intersectional social biases. We make our code and datasets available for download. 2022.ltedi-1.11 camara-etal-2022-mapping + 10.18653/v1/2022.ltedi-1.11 <fixed-case>M</fixed-case>onte <fixed-case>C</fixed-case>arlo Tree Search for Interpreting Stress in Natural Language @@ -151,6 +162,7 @@ 2022.ltedi-1.12 swanson-etal-2022-monte swansonk14/mcts_interpretability + 10.18653/v1/2022.ltedi-1.12 <fixed-case>IIITS</fixed-case>urat@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Hope Speech Detection using Machine Learning @@ -162,6 +174,7 @@ This paper addresses the issue of Hope Speech detection using machine learning techniques. Designing a robust model that helps in predicting the target class with higher accuracy is a challenging task in machine learning, especially when the distribution of the class labels is highly imbalanced. This study applies and compares the experimental outcomes of different oversampling techniques. Many models are implemented to classify the comments into Hope and Non-Hope speech, and it was found that machine learning algorithms perform better than deep learning models. The English language dataset used in this research was developed by collecting YouTube comments and is part of the task “ACL-2022: Hope Speech Detection for Equality, Diversity, and Inclusion”. The proposed model achieved a weighted F1-score of 0.55 on the test dataset and secured the first rank among the participating teams. 2022.ltedi-1.13 roy-etal-2022-iiitsurat + 10.18653/v1/2022.ltedi-1.13 The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced <fixed-case>K</fixed-case>annada @@ -175,6 +188,7 @@ 2022.ltedi-1.14 hande-etal-2022-best KanHope + 10.18653/v1/2022.ltedi-1.14 <fixed-case>NYCU</fixed-case>_<fixed-case>TWD</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Ensemble Models with <fixed-case>VADER</fixed-case> and Contrastive Learning for Detecting Signs of Depression from Social Media @@ -186,6 +200,7 @@ This paper presents a state-of-the-art solution to the LT-EDI-ACL 2022 Task 4: Detecting Signs of Depression from Social Media Text. The goal of this task is to detect the severity levels of depression of people from social media posts, where people often share their feelings on a daily basis. To detect the signs of depression, we propose a framework with pre-trained language models using rich information instead of training from scratch, gradient boosting and deep learning models for modeling various aspects, and supervised contrastive learning for the generalization ability. Moreover, ensemble techniques are also employed in consideration of the different advantages of each method. Experiments show that our framework achieves a 2nd prize ranking with a macro F1-score of 0.552, showing the effectiveness and robustness of our approach.
2022.ltedi-1.15 wang-etal-2022-nycu + 10.18653/v1/2022.ltedi-1.15 <fixed-case>UMUT</fixed-case>eam@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detecting homophobic and transphobic comments in <fixed-case>T</fixed-case>amil @@ -196,6 +211,7 @@ These working notes describe the participation of the UMUTeam in the LT-EDI shared task concerning the identification of homophobic and transphobic comments on YouTube. These comments are written in English, which has a high availability of machine-learning resources; Tamil, which has fewer resources; and a transliteration from Tamil to Roman script combined with English sentences. To carry out this shared task, we train a neural network that combines several feature sets applying a knowledge integration strategy. These features are linguistic features extracted from a tool developed by our research group and contextual and non-contextual sentence embeddings. We ranked 7th for the English subtask (macro f1-score of 45%), 3rd for the Tamil subtask (macro f1-score of 82%), and 2nd for the Tamil-English subtask (macro f1-score of 58%). 2022.ltedi-1.16 garcia-diaz-etal-2022-umuteam-lt + 10.18653/v1/2022.ltedi-1.16 <fixed-case>UMUT</fixed-case>eam@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detecting Signs of Depression from text @@ -205,6 +221,7 @@ Depression is a mental condition related to sadness and the lack of interest in common daily tasks. In these working notes, we describe the proposal of the UMUTeam in the LT-EDI shared task (ACL 2022) concerning the identification of signs of depression in social network posts. This task is somehow related to other relevant Natural Language Processing tasks such as Emotion Analysis. In this shared task, the organisers challenged the participants to distinguish between moderate and severe signs of depression (or no signs of depression at all) in a set of social posts written in English. Our proposal is based on the combination of linguistic features and several sentence embeddings using a knowledge integration strategy. Our proposal achieved the 6th position, with a macro f1-score of 53.82 in the official leaderboard. 2022.ltedi-1.17 garcia-diaz-valencia-garcia-2022-umuteam + 10.18653/v1/2022.ltedi-1.17 bitsa_nlp@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments @@ -215,6 +232,7 @@ 2022.ltedi-1.18 bhandari-goyal-2022-bitsa vitthal-bhandari/homophobia-transphobia-detection + 10.18653/v1/2022.ltedi-1.18 <fixed-case>ABLIMET</fixed-case> @<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: A Roberta based Approach for Homophobia/Transphobia Detection in Social Media @@ -223,6 +241,7 @@ This paper describes our system that participated in LT-EDI-ACL2022 Homophobia/Transphobia Detection in Social Media. Sexual minorities face a lot of unfair treatment and discrimination in our world. This creates enormous stress and many psychological problems for sexual minorities. There is a lot of hate speech on the internet, and Homophobia/Transphobia is the kind directed against sexual minorities. Identifying and processing Homophobia/Transphobia through natural language processing technology can improve the efficiency of handling such content and can quickly screen out Homophobia/Transphobia on the Internet.
The organizers of LT-EDI-ACL2022 Homophobia/Transphobia Detection in Social Media constructed a Homophobia/Transphobia detection dataset based on YouTube comments in English and Tamil. We use a RoBERTa-based approach to conduct Homophobia/Transphobia detection experiments on the competition dataset and achieve good results. 2022.ltedi-1.19 maimaitituoheti-2022-ablimet + 10.18653/v1/2022.ltedi-1.19 <fixed-case>MUCIC</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Hope Speech Detection using Data Re-Sampling and 1<fixed-case>D</fixed-case> Conv-<fixed-case>LSTM</fixed-case> @@ -234,6 +253,7 @@ Spreading positive vibes or hope content on social media may help many people to get motivated in their life. To address Hope Speech detection in YouTube comments, this paper describes the models submitted by our team, MUCIC, to the Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI) shared task at the Association for Computational Linguistics (ACL) 2022. This shared task consists of texts in five languages, namely: English, Spanish (in Latin scripts), and Tamil, Malayalam, and Kannada (in code-mixed native and Roman scripts), with the aim of classifying the YouTube comment into “Hope”, “Not-Hope” or “Not-Intended” categories. The proposed methodology uses a re-sampling technique to deal with imbalanced data in the corpus and obtained 1st rank for the English language with a macro-averaged F1-score of 0.550 and a weighted-averaged F1-score of 0.860. The code to reproduce this work is available on GitHub. 2022.ltedi-1.20 gowda-etal-2022-mucic + 10.18653/v1/2022.ltedi-1.20 <fixed-case>D</fixed-case>eep<fixed-case>B</fixed-case>lues@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Depression level detection modelling through domain specific <fixed-case>BERT</fixed-case> and short text Depression classifiers @@ -245,6 +265,7 @@ We discuss a variety of approaches to build a robust Depression level detection model from longer social media posts (i.e., Reddit Depression forum posts) using a mental health text pre-trained BERT model. Further, we report our experimental results based on a strategy to select excerpts from long text and then fine-tune the BERT model to combat the issue of memory constraints while processing such texts. We show that, with domain-specific BERT, we can achieve reasonable accuracy with a fixed text size (in this case, 200 tokens) for this task. In addition, we can use short-text classifiers to extract relevant text from the long text and achieve slightly better accuracy, albeit at the cost of the processing time needed to extract such excerpts. 2022.ltedi-1.21 farruque-etal-2022-deepblues + 10.18653/v1/2022.ltedi-1.21 <fixed-case>SSN</fixed-case>_<fixed-case>ARMM</fixed-case>@ <fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case> -<fixed-case>ACL</fixed-case>2022: Hope Speech Detection for Equality, Diversity, and Inclusion Using <fixed-case>ALBERT</fixed-case> model @@ -259,6 +280,7 @@ In recent years, social media has become one of the major forums for expressing human views and emotions. With the help of smartphones and high-speed internet, anyone can express their views on social media. However, this can also lead to the spread of hatred and violence in society. Therefore, it is necessary to build a method to find and support helpful social media content.
In this paper, we studied a Natural Language Processing approach for detecting Hope speech in a given sentence. The task was to classify the sentences into ‘Hope speech’ and ‘Non-hope speech’. The dataset was provided by the LT-EDI organizers with text from YouTube comments. Based on the task description, we developed a system using the pre-trained language model BERT to complete this task. Our model achieved 1st rank in the Kannada language with a weighted average F1 score of 0.750, 2nd rank in the Malayalam language with a weighted average F1 score of 0.740, 3rd rank in the Tamil language with a weighted average F1 score of 0.390, and 6th rank in the English language with a weighted average F1 score of 0.880. 2022.ltedi-1.22 vijayakumar-etal-2022-ssn + 10.18653/v1/2022.ltedi-1.22 <fixed-case>SUH</fixed-case>_<fixed-case>ASR</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Transformer based Approach for Speech Recognition for Vulnerable Individuals in <fixed-case>T</fixed-case>amil @@ -268,6 +290,7 @@ An Automatic Speech Recognition system is developed for addressing the Tamil conversational speech data of elderly and transgender people. The speech corpus used in this system is collected from people who communicate in Tamil at primary places such as banks, hospitals, and vegetable markets. Our ASR system is designed with a pre-trained model that is used to recognize the speech data. A WER (Word Error Rate) calculation is used to analyse the performance of the ASR system. This evaluation could help to make a comparison of utterances between elderly people and others. Similarly, the comparison between transgender and other people is also done. Our proposed ASR system achieves a word error rate of 39.65%. 2022.ltedi-1.23 s-b-2022-suh + 10.18653/v1/2022.ltedi-1.23 <fixed-case>LPS</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: An Ensemble Approach about Hope Speech Detection @@ -276,6 +299,7 @@ This paper describes our work on the shared task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI-ACL-2022. The goal of this task is to identify whether a given comment contains hope speech or not, and hope is considered significant for the well-being, recuperation and restoration of human life. Our work aims to change the prevalent way of thinking by moving away from a preoccupation with discrimination, loneliness or the worst things in life to building confidence, support and good qualities based on comments by individuals. In response to the need to detect equality, diversity and inclusion of hope speech in a multilingual environment, we built an integration model and achieved good performance on multiple datasets presented by the organisers; the specific results can be found in the experimental results section. 2022.ltedi-1.24 zhu-2022-lps + 10.18653/v1/2022.ltedi-1.24 <fixed-case>CURAJ</fixed-case>_<fixed-case>IIITDWD</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case> 2022: Hope Speech Detection in <fixed-case>E</fixed-case>nglish <fixed-case>Y</fixed-case>ou<fixed-case>T</fixed-case>ube Comments using Deep Learning Techniques @@ -286,6 +310,7 @@ Hope Speech consists of positive terms that help to promote or criticise a point of view without hurting the user’s or community’s feelings. Non-Hope Speech, on the other hand, includes expressions that are harsh, ridiculing, or demotivating.
The goal of this article is to find hope speech comments in a YouTube dataset. The datasets were created as part of the “LT-EDI-ACL 2022: Hope Speech Detection for Equality, Diversity, and Inclusion” shared task. The shared task dataset was proposed in the Malayalam, Tamil, English, Spanish, and Kannada languages. In this paper, we worked on English-language YouTube comments. We employed several deep learning based models such as DNN (dense or fully connected neural network), CNN (Convolutional Neural Network), Bi-LSTM (Bidirectional Long Short Term Memory Network), and GRU (Gated Recurrent Unit) to identify the hopeful comments. We also used Stacked LSTM-CNN and Stacked LSTM-LSTM networks to train the model. The best macro average F1-score of 0.67 on the development dataset was obtained using the DNN model. A macro average F1-score of 0.67 was achieved for the classification done on the test data as well. 2022.ltedi-1.25 jha-etal-2022-curaj + 10.18653/v1/2022.ltedi-1.25 <fixed-case>SSN</fixed-case>_<fixed-case>MLRG</fixed-case>3@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022-Depression Detection System from Social Media Text using Transformer Models @@ -299,6 +324,7 @@ Depression is a common mental illness that involves sadness and lack of interest in all day-to-day activities. The task is to classify the social media text as signs of depression into three labels namely “not depressed”, “moderately depressed”, and “severely depressed”. We have built a system using the Deep Learning Model “Transformers”. Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. The multi-class classification model used in our system is based on the ALBERT model. In the ACL 2022 shared task, our team SSN_MLRG3 obtained a Macro F1 score of 0.473. 2022.ltedi-1.26 esackimuthu-etal-2022-ssn + 10.18653/v1/2022.ltedi-1.26 <fixed-case>BERT</fixed-case> 4<fixed-case>EVER</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022-Detecting signs of Depression from Social Media: Detecting Depression in Social Media using Prompt-Learning and Word-Emotion Cluster @@ -311,6 +337,7 @@ In this paper, we report the solution of the team BERT 4EVER for the LT-EDI-2022 shared task 2: Homophobia/Transphobia Detection in social media comments in ACL 2022, which aims to classify YouTube comments into one of the following categories: no, moderate, or severe depression. We model the problem as a text classification task and a text generation task and respectively propose two different models for the tasks. To combine the knowledge learned from these two different models, we softly fuse the predicted probabilities of the models above and then select the label with the highest probability as the final output. In addition, multiple augmentation strategies are leveraged to improve the model generalization capability, such as back translation and adversarial training. Experimental results demonstrate the effectiveness of the proposed models and the two augmentation strategies. 2022.ltedi-1.27 lin-etal-2022-bert + 10.18653/v1/2022.ltedi-1.27 <fixed-case>CIC</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Are transformers the only hope? Hope speech detection for <fixed-case>S</fixed-case>panish and <fixed-case>E</fixed-case>nglish comments @@ -322,6 +349,7 @@ Hope is an inherent part of human life and essential for improving the quality of life.
Hope increases happiness and reduces stress and feelings of helplessness. Hope speech expresses the desire for a better outcome and can be studied using text from various online sources where people express their desires and outcomes. In this paper, we address a deep-learning approach with a combination of linguistic and psycho-linguistic features for hope-speech detection. We report our best results submitted to LT-EDI-2022, which ranked 2nd and 3rd in English and Spanish respectively. 2022.ltedi-1.28 balouchzahi-etal-2022-cic + 10.18653/v1/2022.ltedi-1.28 scube<fixed-case>MSEC</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detection of Depression using Transformer Models @@ -334,6 +362,7 @@ Social media platforms play a major role in our day-to-day life and are considered a virtual friend by many users, who use social media to share their feelings all day. Many a time, the content which is shared by users on social media replicates their internal life. Nowadays people love to share their daily life incidents, like happy or unhappy moments, and their feelings on social media; it makes them feel complete, and it has become a habit for many users. Social media provides a new chance to identify the feelings of a person through their posts. The aim of the shared task is to develop a model in which the system is capable of analyzing the grammatical markers related to onset and permanent symptoms of depression. We as a team participated in the shared task Detecting Signs of Depression from Social Media Text at LT-EDI 2022 - ACL 2022, and we have proposed a model which predicts depression from English social media posts using the data set shared for the task. The prediction is done based on the labels Moderate, Severe and Not Depressed. We have implemented this using different transformer models like DistilBERT, RoBERTa and ALBERT, with which we were able to achieve macro F1 scores of 0.337, 0.457 and 0.387 respectively. Our code is publicly available on GitHub. 2022.ltedi-1.29 s-etal-2022-scubemsec + 10.18653/v1/2022.ltedi-1.29 <fixed-case>SSNCSE</fixed-case>_<fixed-case>NLP</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Hope Speech Detection for Equality, Diversity and Inclusion using sentence transformers @@ -346,6 +375,7 @@ In recent times, applications have been developed to regulate and control the spread of negativity and toxicity on online platforms. The world is filled with serious problems like political & religious conflicts, wars and pandemics, and offensive hate speech is the last thing we desire. Our task was to classify a text into ‘Hope Speech’ and ‘Non-Hope Speech’. We searched for datasets acquired from YouTube comments that offer support, reassurance, inspiration, and insight, and the ones that don’t. The datasets were provided to us by the LT-EDI organizers in English, Tamil, Spanish, Kannada, and Malayalam. To successfully identify and classify them, we employed several machine learning transformer models such as m-BERT, MLNet, BERT, XLMRoberta, and XLM_MLM. The observed results indicate that BERT and m-BERT have obtained the best results among all the other techniques, gaining weighted F1-scores of 0.92, 0.71, 0.76, 0.87, and 0.83 for English, Tamil, Spanish, Kannada, and Malayalam respectively. This paper depicts our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2022.
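Most of the system papers above report macro- and weighted-averaged F1-scores on the shared-task labels. As a point of reference for readers re-deriving such numbers, here is a minimal sketch with scikit-learn; the gold and predicted labels are invented placeholders, not data from any of these shared tasks.

from sklearn.metrics import f1_score

# Invented placeholder labels, not shared-task data.
y_true = ["Hope", "Not-Hope", "Hope", "Not-Hope", "Not-Hope", "Hope"]
y_pred = ["Hope", "Not-Hope", "Not-Hope", "Not-Hope", "Hope", "Hope"]

macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by class support
print(f"macro-F1 = {macro:.3f}, weighted-F1 = {weighted:.3f}")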
2022.ltedi-1.30 b-etal-2022-ssncse + 10.18653/v1/2022.ltedi-1.30 <fixed-case>SOA</fixed-case>_<fixed-case>NLP</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: An Ensemble Model for Hope Speech Detection from <fixed-case>Y</fixed-case>ou<fixed-case>T</fixed-case>ube Comments @@ -356,6 +386,7 @@ Language should be accommodating of equality and diversity as a fundamental aspect of communication. The language of internet users has a big impact on peer users all over the world. On virtual platforms such as Facebook, Twitter, and YouTube, people express their opinions in different languages. People respect others’ accomplishments, pray for their well-being, and cheer them on when they fail. Such motivational remarks are hope speech remarks. Simultaneously, a group of users encourages discrimination against women, people of color, people with disabilities, and other minorities based on gender, race, sexual orientation, and other factors. To recognize hope speech from YouTube comments, the current study offers an ensemble approach that combines support vector machine, logistic regression, and random forest classifiers. Extensive testing was carried out to discover the best features for the aforementioned classifiers. In the support vector machine and logistic regression classifiers, char-level TF-IDF features were used, whereas in the random forest classifier, word-level features were used. The proposed ensemble model performed significantly well on English, Spanish, Tamil, Malayalam, and Kannada YouTube comments. 2022.ltedi-1.31 kumar-etal-2022-soa + 10.18653/v1/2022.ltedi-1.31 <fixed-case>IIT</fixed-case> Dhanbad @<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022- Hope Speech Detection for Equality, Diversity, and Inclusion @@ -366,6 +397,7 @@ Hope is considered significant for the wellbeing, recuperation and restoration of human life by health professionals. Hope speech reflects the belief that one can discover pathways to their desired objectives and become roused to utilise those pathways. Hope speech offers support, reassurance, suggestions, inspiration and insight. Hate speech is a prevalent practice that society has to struggle with every day. The freedom of speech and ease of anonymity granted by social media has also resulted in incitement to hatred. In this paper, we work to identify and promote positive and supportive content on these platforms. We work with several machine learning models to classify social media comments as hope speech or non-hope speech in English. This paper portrays our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI-ACL 2022. 2022.ltedi-1.32 gupta-etal-2022-iit + 10.18653/v1/2022.ltedi-1.32 <fixed-case>IISERB</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: A Bag of Words and Document Embeddings Based Framework to Identify Severity of Depression Over Social Media @@ -374,6 +406,7 @@ The DepSign-LT-EDI-ACL2022 shared task focuses on early prediction of the severity of depression over social media posts. The BioNLP group at the Department of Data Science and Engineering at the Indian Institute of Science Education and Research Bhopal (IISERB) has participated in this challenge and submitted three runs based on three different text mining models.
The severity of depression was categorized into three classes, viz., no depression, moderate, and severe, and the data to build models were released as part of this shared task. The objective of this work is to identify relevant features from the given social media texts for effective text classification. As part of our investigation, we explored features derived from text data using a document embeddings technique and a simple bag-of-words model following different weighting schemes. Subsequently, adaptive boosting, logistic regression, random forest and support vector machine (SVM) classifiers were used to identify the scale of depression from the given texts. The experimental analysis on the given validation data shows that the SVM classifier using the bag-of-words model following the term frequency and inverse document frequency weighting scheme outperforms the other models for identifying depression. However, this framework could not achieve a place among the top ten runs of the shared task. This paper describes the potential of the proposed framework as well as the possible reasons behind its mediocre performance on the given data. 2022.ltedi-1.33 basu-2022-iiserb + 10.18653/v1/2022.ltedi-1.33 <fixed-case>SSNCSE</fixed-case>_<fixed-case>NLP</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Homophobia/Transphobia Detection in Multiple Languages using <fixed-case>SVM</fixed-case> Classifiers and <fixed-case>BERT</fixed-case>-based Transformers @@ -385,6 +418,7 @@ Over the years, there has been a slow but steady change in the attitude of society towards different kinds of sexuality. However, on social media platforms, where people have the license to be anonymous, toxic comments targeted at homosexuals, transgenders and the LGBTQ+ community are not uncommon. Detection of homophobic comments on social media can be useful in making the internet a safer place for everyone. For this task, we used a combination of word embeddings and SVM classifiers as well as some BERT-based transformers. We achieved a weighted F1-score of 0.93 on the English dataset, 0.75 on the Tamil dataset and 0.87 on the Tamil-English code-mixed dataset. 2022.ltedi-1.34 swaminathan-etal-2022-ssncse + 10.18653/v1/2022.ltedi-1.34 <fixed-case>KUCST</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detecting Signs of Depression from Social Media Text @@ -394,6 +428,7 @@ In this paper we present our approach for detecting signs of depression from social media text. Our model relies on word unigrams, part-of-speech tags, readability measures and the use of first, second or third person and the number of words. Our best model obtained a macro F1-score of 0.439 and ranked 25th out of 31 teams. We further take advantage of the interpretability of the Logistic Regression model and we make an attempt to interpret the model coefficients with the hope that these will be useful for further research on the topic. 2022.ltedi-1.35 agirrezabal-amann-2022-kucst + 10.18653/v1/2022.ltedi-1.35 E8-<fixed-case>IJS</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022 - <fixed-case>BERT</fixed-case>, <fixed-case>A</fixed-case>uto<fixed-case>ML</fixed-case> and Knowledge-graph backed Detection of Depression @@ -405,6 +440,7 @@ Depression is a mental illness that negatively affects a person’s well-being and can, if left untreated, lead to serious consequences such as suicide.
Therefore, it is important to recognize the signs of depression early. In the last decade, social media has become one of the most common places to express one’s feelings. Hence, it is possible to apply text processing and machine learning techniques to detect possible signs of depression. In this paper, we present our approaches to solving the shared task titled Detecting Signs of Depression from Social Media Text. We explore three different approaches to solve the challenge: fine-tuning a BERT model; leveraging AutoML for feature construction and classifier selection; and finally, exploring latent spaces derived from the combination of textual and knowledge-based representations. We ranked 9th out of 31 teams in the competition. Our best solution, based on knowledge graph and textual representations, was 4.9% behind the best model in terms of Macro F1, and only 1.9% behind in terms of Recall. 2022.ltedi-1.36 tavchioski-etal-2022-e8 + 10.18653/v1/2022.ltedi-1.36 Nozza@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Ensemble Modeling for Homophobia and Transphobia Detection @@ -413,6 +449,7 @@ In this paper, we describe our approach for the task of homophobia and transphobia detection in English social media comments. The dataset consists of YouTube comments, and it has been released for the shared task on Homophobia/Transphobia Detection in social media comments. Given the high class imbalance, we propose a solution based on data augmentation and ensemble modeling. We fine-tuned different large language models (BERT, RoBERTa, and HateBERT) and used the weighted majority vote on their predictions. Our proposed model obtained 0.48 and 0.94 for macro and weighted F1-score, respectively, ranking in third position. 2022.ltedi-1.37 nozza-2022-nozza + 10.18653/v1/2022.ltedi-1.37 <fixed-case>KADO</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: <fixed-case>BERT</fixed-case>-based Ensembles for Detecting Signs of Depression from Social Media Text @@ -423,6 +460,7 @@ Depression is a common and serious mental illness; early detection can improve the patient’s symptoms and make depression easier to treat. This paper mainly introduces the relevant content of the task “Detecting Signs of Depression from Social Media Text at DepSign-LT-EDI@ACL-2022”. The goal of DepSign is to classify the signs of depression into three labels namely “not depressed”, “moderately depressed”, and “severely depressed” based on social media posts. In this paper, we propose a predictive ensemble model that utilizes fine-tuned contextualized word embeddings from ALBERT, DistilBERT, RoBERTa, and the BERT base model. We show that our model outperforms the baseline models in all considered metrics and achieves an F1 score of 54% and an accuracy of 61%, ranking 5th on the leaderboard for the DepSign task.
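BERT 4EVER (above) softly fuses the predicted probabilities of two models and takes the highest-probability label, and KADO similarly ensembles several fine-tuned transformers. A model-agnostic sketch of that fusion step follows; the probability arrays and label set are hypothetical stand-ins for real model outputs, not values from either system.

import numpy as np

# Hypothetical softmax outputs of two fine-tuned models (2 examples x 3 classes).
probs_a = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
probs_b = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])

labels = ["not depressed", "moderately depressed", "severely depressed"]
fused = (probs_a + probs_b) / 2                        # soft fusion: average the probabilities
predicted = [labels[i] for i in fused.argmax(axis=1)]  # pick the highest-probability label
print(predicted)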
2022.ltedi-1.38 janatdoust-etal-2022-kado + 10.18653/v1/2022.ltedi-1.38 Sammaan@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Ensembled Transformers Against Homophobia and Transphobia @@ -434,6 +472,7 @@ 2022.ltedi-1.39 upadhyay-etal-2022-sammaan GLUE + 10.18653/v1/2022.ltedi-1.39 <fixed-case>OPI</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detecting Signs of Depression from Social Media Text using <fixed-case>R</fixed-case>o<fixed-case>BERT</fixed-case>a Pre-trained Language Models @@ -443,6 +482,7 @@ This paper presents our winning solution for the Shared Task on Detecting Signs of Depression from Social Media Text at LT-EDI-ACL2022. The task was to create a system that, given social media posts in English, should detect the level of depression as ‘not depressed’, ‘moderately depressed’ or ‘severely depressed’. We based our solution on transformer-based language models. We fine-tuned selected models: BERT, RoBERTa, XLNet, of which the best results were obtained for RoBERTa. Then, using the prepared corpus, we trained our own language model called DepRoBERTa (RoBERTa for Depression Detection). Fine-tuning of this model improved the results. The third solution was to use ensemble averaging, which turned out to be the best solution. It achieved a macro-averaged F1-score of 0.583. The source code of the prepared solution is available at https://github.com/rafalposwiata/depression-detection-lt-edi-2022. 2022.ltedi-1.40 poswiata-perelkiewicz-2022-opi + 10.18653/v1/2022.ltedi-1.40 <fixed-case>F</fixed-case>ilip<fixed-case>N</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022-Detecting signs of Depression from Social Media: Examining the use of summarization methods as data augmentation for text classification @@ -454,6 +494,7 @@ nilsson-kovacs-2022-filipn flippe3/dsdsm_augmentation C4 + 10.18653/v1/2022.ltedi-1.41 <fixed-case>NAYEL</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Homophobia/Transphobia Detection for Equality, Diversity, and Inclusion using <fixed-case>SVM</fixed-case> @@ -465,6 +506,7 @@ Analysing the contents of social media platforms such as YouTube, Facebook and Twitter has gained interest due to the vast number of users. One of the important tasks is homophobia/transphobia detection. This paper illustrates the system submitted by our team for the homophobia/transphobia detection in social media comments shared task. A machine learning-based model has been designed and various classification algorithms have been implemented for automatic detection of homophobia in YouTube comments. TF/IDF with a bigram model has been used for vectorization of comments. A Support Vector Machine has been used to develop the proposed model, and our submission reported weighted F1-scores of 0.91, 0.92 and 0.88 for the English, Tamil and Tamil-English datasets, respectively.
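The NAYEL system above pairs TF/IDF n-gram vectorization with a Support Vector Machine. Here is a minimal scikit-learn sketch of that kind of pipeline, assuming unigram-plus-bigram features; the training comments are invented toy examples, not shared-task data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy comments standing in for the YouTube data.
texts = ["wishing you strength and support", "a hateful slur-filled comment"]
labels = ["Non-anti-LGBT+ content", "Homophobic"]

# TF-IDF over unigrams and bigrams feeding a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["sending you support"]))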
2022.ltedi-1.42 ashraf-etal-2022-nayel + 10.18653/v1/2022.ltedi-1.42 gini<fixed-case>U</fixed-case>s @<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Aasha: Transformers based Hope-<fixed-case>EDI</fixed-case> @@ -475,6 +517,7 @@ 2022.ltedi-1.43 surana-chinagundi-2022-ginius HopeEDI + 10.18653/v1/2022.ltedi-1.43 <fixed-case>SSN</fixed-case>_<fixed-case>MLRG</fixed-case>1@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Multi-Class Classification using <fixed-case>BERT</fixed-case> models for Detecting Depression Signs from Social Media Text @@ -487,6 +530,7 @@ DepSign-LT-EDI@ACL-2022 aims to ascertain the signs of depression of a person from their messages and posts on social media wherein people share their feelings and emotions. Given social media postings in English, the system should classify the signs of depression into three labels namely “not depressed”, “moderately depressed”, and “severely depressed”. To achieve this objective, we have adopted a fine-tuned BERT model. This solution from team SSN_MLRG1 achieves 58.5% accuracy on the DepSign-LT-EDI@ACL-2022 test set. 2022.ltedi-1.44 anantharaman-etal-2022-ssn + 10.18653/v1/2022.ltedi-1.44 <fixed-case>D</fixed-case>epression<fixed-case>O</fixed-case>ne@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Using Machine Learning with <fixed-case>SMOTE</fixed-case> and Random <fixed-case>U</fixed-case>nder<fixed-case>S</fixed-case>ampling to Detect Signs of Depression on Social Media Text. @@ -496,6 +540,7 @@ Depression is a common and serious medical illness that negatively affects how you feel, the way you think, and how you act. Detecting depression is essential as it must be treated early to avoid painful consequences. Nowadays, people are broadcasting how they feel via posts and comments. Using social media, we can extract many comments related to depression and use NLP techniques to train and detect depression. This work presents the submission of the DepressionOne team at LT-EDI-2022 for the shared task, detecting signs of depression from social media text. The depression data is small and unbalanced. Thus, we have used oversampling and undersampling methods such as SMOTE and RandomUnderSampler to represent the data. Later, we used machine learning methods to train and detect the signs of depression. 2022.ltedi-1.45 dowlagar-mamidi-2022-depressionone + 10.18653/v1/2022.ltedi-1.45 <fixed-case>L</fixed-case>eaning<fixed-case>T</fixed-case>ower@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: When Hope and Hate Collide @@ -507,6 +552,7 @@ The 2022 edition of LT-EDI proposed two tasks in various languages. Task Hope Speech Detection required models for the automatic identification of hopeful comments for equality, diversity, and inclusion. Task Homophobia/Transphobia Detection focused on the identification of homophobic and transphobic comments. We targeted both tasks in English by using reinforced BERT-based approaches. Our core strategy aimed at exploiting the data available for each given task to augment the amount of supervised instances in the other. On the basis of an active learning process, we trained a model on the dataset for Task i and applied it to the dataset for Task j to iteratively integrate new silver data for Task i.
Our official submissions to the shared task obtained a macro-averaged F_1 score of 0.53 for Hope Speech and 0.46 for Homo/Transphobia, placing our team in the third and fourth positions out of 11 and 12 participating teams, respectively. 2022.ltedi-1.46 muti-etal-2022-leaningtower + 10.18653/v1/2022.ltedi-1.46 <fixed-case>MUCS</fixed-case>@Text-<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>@<fixed-case>ACL</fixed-case> 2022: Detecting Sign of Depression from Social Media Text using Supervised Learning Approach @@ -518,6 +564,7 @@ Social media has seen enormous growth in its users recently, and knowingly or unknowingly the behavior of a person will be reflected in the comments she/he posts on social media. Users having signs of depression may post negative or disturbing content seeking the attention of other users. Hence, social media data can be analysed to check whether users show signs of depression and to help them get through the situation if required. However, as analyzing the increasing amount of social media data manually is laborious and error-prone, automated tools have to be developed for the same. To address the issue of detecting signs of depression in social media content, in this paper, we, team MUCS, describe an Ensemble of Machine Learning (ML) models and a Transfer Learning (TL) model submitted to the “Detecting Signs of Depression from Social Media Text-LT-EDI@ACL 2022” (DepSign-LT-EDI@ACL-2022) shared task at Association for Computational Linguistics (ACL) 2022. Both frequency and text based features are used to train the Ensemble model, and Bidirectional Encoder Representations from Transformers (BERT) fine-tuned with raw text is used to train the TL model. Among the two models, the TL model performed better, with a macro averaged F-score of 0.479, and placed 18th in the shared task. The code to reproduce the proposed models is available on GitHub. 2022.ltedi-1.47 hegde-etal-2022-mucs-text + 10.18653/v1/2022.ltedi-1.47 <fixed-case>SSNCSE</fixed-case>_<fixed-case>NLP</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Speech Recognition for Vulnerable Individuals in <fixed-case>T</fixed-case>amil using pre-trained <fixed-case>XLSR</fixed-case> models @@ -529,6 +576,7 @@ Automatic speech recognition is a tool used to transform human speech into a written form. It is used in a variety of avenues, such as voice commands, customer service and more. It has emerged as an essential tool in the digitisation of daily life. It has been known to be of vital importance in making the lives of elderly and disabled people much easier. In this paper we describe an automatic speech recognition model, determined by using three pre-trained models, fine-tuned from the Facebook XLSR Wav2Vec2 model, which was trained using the Common Voice Dataset. The best model for speech recognition in Tamil is determined by finding the word error rate of the data. This work explains the submission made by SSNCSE_NLP to the shared task organized by LT-EDI at ACL 2022. A word error rate of 39.4512 is achieved.
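The ASR submissions above are all evaluated with WER (Word Error Rate): the word-level edit distance between hypothesis and reference divided by the reference length. A self-contained sketch of that computation follows; the example sentences are made up.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Made-up example: one substitution in a four-word reference -> WER 0.25.
print(wer("the weather is nice", "the weather was nice"))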
2022.ltedi-1.48 srinivasan-etal-2022-ssncse + 10.18653/v1/2022.ltedi-1.48 <fixed-case>IDIAP</fixed-case>_<fixed-case>TIET</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022 : Hope Speech Detection in Social Media using Contextualized <fixed-case>BERT</fixed-case> with Attention Mechanism @@ -540,6 +588,7 @@ 2022.ltedi-1.49 khanna-etal-2022-idiap deepanshu-beep/hope-speech-attention + 10.18653/v1/2022.ltedi-1.49 <fixed-case>SSN</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Transfer Learning using <fixed-case>BERT</fixed-case> for Detecting Signs of Depression from Social Media Texts @@ -549,6 +598,7 @@ Depression is one of the most common mental issues faced by people. Detecting signs of depression early on can help in the treatment and prevention of extreme outcomes like suicide. Since the advent of the internet, people have felt more comfortable discussing topics like depression online due to the anonymity it provides. This shared task has used data scraped from various social media sites and aims to develop models that detect signs and the severity of depression effectively. In this paper, we employ transfer learning by applying an enhanced BERT model trained on a Wikipedia dataset to the social media text and perform text classification. The model gives an F1-score of 63.8%, which was reasonably better than the other competing models. 2022.ltedi-1.50 s-antony-2022-ssn + 10.18653/v1/2022.ltedi-1.50 Findings of the Shared Task on Detecting Signs of Depression from Social Media @@ -560,6 +610,7 @@ Social media is considered as a platform where users express themselves. The rise of social media as one of humanity’s most important public communication platforms presents a potential prospect for early identification and management of mental illness. Depression is one such illness that can lead to a variety of emotional and physical problems. It is necessary to measure the level of depression from the social media text to treat them and to avoid the negative consequences. Detecting levels of depression is a challenging task since it involves the mindset of the people, which can change periodically. The aim of the DepSign-LT-EDI@ACL-2022 shared task is to classify the social media text into three levels of depression namely “Not Depressed”, “Moderately Depressed”, and “Severely Depressed”. This overview presents a description of the task, the data set, the methodologies used and an analysis of the results of the submissions. The models that were submitted as a part of the shared task had used a variety of technologies from traditional machine learning algorithms to deep learning models. It could be observed from the results that the transformer based models have outperformed the other models. Among the 31 teams who had submitted their results for the shared task, the best macro F1-score of 0.583 was obtained using a transformer based model. 2022.ltedi-1.51 s-etal-2022-findings + 10.18653/v1/2022.ltedi-1.51 Findings of the Shared Task on Speech Recognition for Vulnerable Individuals in <fixed-case>T</fixed-case>amil @@ -573,6 +624,7 @@ This paper illustrates the overview of the shared task on automatic speech recognition in the Tamil language. In the shared task, spontaneous Tamil speech data gathered from elderly and transgender people was given for recognition and evaluation. These utterances were collected from people when they communicated in public locations such as hospitals, markets, vegetable shops, etc.
The speech corpus includes utterances of male, female, and transgender speakers and was split into training and testing data. The given task was evaluated using WER (Word Error Rate). The participants used transformer-based models for automatic speech recognition. Different results using different pre-trained transformer models are discussed in this overview paper. 2022.ltedi-1.52 b-etal-2022-findings-shared + 10.18653/v1/2022.ltedi-1.52 <fixed-case>DLRG</fixed-case>@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detecting signs of Depression from Social Media using <fixed-case>XGB</fixed-case>oost Method @@ -582,6 +634,7 @@ Depression is linked to the development of dementia. Cognitive functions such as thinking and remembering generally deteriorate in dementia patients. Social media usage has increased among people in recent days. The technology advancements help the community to express their views publicly. Analysing the signs of depression from texts has become an important area of research now, as it helps to identify this kind of mental disorder among people from their social media posts. As part of the shared task on detecting signs of depression from social media text, a dataset has been provided by the organizers (Sampath et al.). We applied different machine learning techniques such as Support Vector Machine, Random Forest and XGBoost classifiers to classify the signs of depression. Experimental results revealed that the XGBoost model outperformed other models with the highest classification accuracy of 0.61 and a macro F1 score of 0.54. 2022.ltedi-1.53 sharen-rajalakshmi-2022-dlrg + 10.18653/v1/2022.ltedi-1.53 <fixed-case>IDIAP</fixed-case> Submission@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022 : Hope Speech Detection for Equality, Diversity and Inclusion @@ -593,6 +646,7 @@ singh-motlicek-2022-idiap muskaan-singh/hate-speech-detection HopeEDI + 10.18653/v1/2022.ltedi-1.54 <fixed-case>IDIAP</fixed-case> Submission@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Homophobia/Transphobia Detection in social media comments @@ -603,6 +657,7 @@ 2022.ltedi-1.55 singh-motlicek-2022-idiap-submission muskaan-singh/homophobia-and-transphobia-acl-submission + 10.18653/v1/2022.ltedi-1.55 <fixed-case>IDIAP</fixed-case> Submission@<fixed-case>LT</fixed-case>-<fixed-case>EDI</fixed-case>-<fixed-case>ACL</fixed-case>2022: Detecting Signs of Depression from Social Media Text @@ -612,6 +667,7 @@ Depression is a common illness involving sadness and lack of interest in all day-to-day activities. It is important to detect depression at an early stage, as treating it early avoids serious consequences. In this paper, we present our system submission of ARGUABLY for DepSign-LT-EDI@ACL-2022. We aim to detect the signs of depression of a person from their social media postings wherein people share their feelings and emotions. The proposed system is an ensembled voting model with fine-tuned BERT, RoBERTa, and XLNet. Given social media postings in English, the submitted system classifies the signs of depression into three labels, namely “not depressed,” “moderately depressed,” and “severely depressed.” Our best model is ranked in 3^{rd} position with an accuracy of 0.54. We make our codebase accessible here.
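DepressionOne (above) counters the small, unbalanced depression data with SMOTE oversampling and random undersampling. Below is a minimal sketch with the imbalanced-learn package on synthetic features; the generated data is a stand-in for the shared-task corpus, not the real posts.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Synthetic, imbalanced stand-in for vectorized social media posts (3 classes).
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           weights=[0.7, 0.2, 0.1], random_state=0)
print("original:", Counter(y))

X_os, y_os = SMOTE(random_state=0).fit_resample(X, y)               # synthesize minority samples
print("after SMOTE:", Counter(y_os))

X_us, y_us = RandomUnderSampler(random_state=0).fit_resample(X, y)  # drop majority samples
print("after undersampling:", Counter(y_us))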
2022.ltedi-1.56 singh-motlicek-2022-idiap-submission-lt + 10.18653/v1/2022.ltedi-1.56 Overview of The Shared Task on Homophobia and Transphobia Detection in Social Media Comments @@ -626,6 +682,7 @@ Homophobia and Transphobia Detection is the task of identifying homophobia, transphobia, and non-anti-LGBT+ content from the given corpus. Homophobia and transphobia are both toxic language directed at LGBTQ+ individuals that is described as hate speech. This paper summarizes our findings on the “Homophobia and Transphobia Detection in social media comments” shared task held at LT-EDI 2022 - ACL 2022. This shared task focused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages. It received 10 systems for Tamil, 13 systems for English, and 11 systems for Tamil-English. The best systems for Tamil, English, and Tamil-English scored 0.570, 0.870, and 0.610, respectively, on average macro F1-score. 2022.ltedi-1.57 chakravarthi-etal-2022-overview + 10.18653/v1/2022.ltedi-1.57 Overview of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion @@ -645,6 +702,7 @@ Hope Speech detection is the task of classifying a sentence as hope speech or non-hope speech given a corpus of sentences. Hope speech is any message or content that is positive, encouraging, reassuring, inclusive and supportive that inspires and engenders optimism in the minds of people. In contrast to identifying and censoring negative speech patterns, hope speech detection is focussed on recognising and promoting positive speech patterns online. In this paper, we report an overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion (LT-EDI-2022) organised as a part of ACL 2022. The participants were provided with annotated training & development datasets and unlabelled test datasets in all the five languages. The goal of the shared task is to classify the given sentences into one of the two hope speech classes. The performances of the systems submitted by the participants were evaluated in terms of micro-F1 score and weighted-F1 score. The datasets for this challenge are openly available. 2022.ltedi-1.58 chakravarthi-etal-2022-overview-shared + 10.18653/v1/2022.ltedi-1.58 diff --git a/data/xml/2022.mml.xml b/data/xml/2022.mml.xml index e8107572d4..aa74f4da47 100644 --- a/data/xml/2022.mml.xml +++ b/data/xml/2022.mml.xml @@ -38,6 +38,7 @@ jung-etal-2022-language COCO COCO-CN + 10.18653/v1/2022.mml-1.1 diff --git a/data/xml/2022.nlp4convai.xml b/data/xml/2022.nlp4convai.xml index aa965f8ace..4cc5a1de09 100644 --- a/data/xml/2022.nlp4convai.xml +++ b/data/xml/2022.nlp4convai.xml @@ -31,6 +31,7 @@ 2022.nlp4convai-1.1 lee-etal-2022-randomized DailyDialog + 10.18653/v1/2022.nlp4convai-1.1 Are Pre-trained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection @@ -45,6 +46,7 @@ Pre-trained Transformer-based models were reported to be robust in intent classification. In this work, we first point out the importance of in-domain out-of-scope detection in few-shot intent recognition tasks and then illustrate the vulnerability of pre-trained Transformer-based models against samples that are in-domain but out-of-scope (ID-OOS).
We construct two new datasets, and empirically show that pre-trained models do not perform well on both ID-OOS examples and general out-of-scope examples, especially on fine-grained few-shot intent detection tasks. 2022.nlp4convai-1.2 zhang-etal-2022-pre-trained + 10.18653/v1/2022.nlp4convai-1.2 Conversational <fixed-case>AI</fixed-case> for Positive-sum Retailing under Falsehood Control @@ -56,6 +58,7 @@ Retailing combines complicated communication skills and strategies to reach an agreement between buyer and seller with identical or different goals. In each transaction a good seller finds an optimal solution by considering his/her own profits while simultaneously considering whether the buyer’s needs have been met. In this paper, we manage the retailing problem by mixing cooperation and competition. We present a rich dataset of buyer-seller bargaining in a simulated marketplace in which each agent values goods and utility separately. Various attributes (preference, quality, and profit) are initially hidden from one agent with respect to its role; during the conversation, both sides may reveal, fake, or retain the information uncovered to come to a final decision through natural language. Using this dataset, we leverage transfer learning techniques on a pretrained, end-to-end model and enhance its decision-making ability toward the best choice in terms of utility by means of multi-agent reinforcement learning. An automatic evaluation shows that our approach results in more optimal transactions than humans do. We also show that our framework controls the falsehoods generated by seller agents. 2022.nlp4convai-1.3 liao-etal-2022-conversational + 10.18653/v1/2022.nlp4convai-1.3 <fixed-case>D</fixed-case>-<fixed-case>REX</fixed-case>: Dialogue Relation Extraction with Explanations @@ -70,6 +73,7 @@ albalak-etal-2022-rex alon-albalak/D-REX DialogRE + 10.18653/v1/2022.nlp4convai-1.4 Data Augmentation for Intent Classification with Off-the-shelf Large Language Models @@ -85,6 +89,7 @@ sahu-etal-2022-data elementai/data-augmentation-with-llms CLINC150 + 10.18653/v1/2022.nlp4convai-1.5 Extracting and Inferring Personal Attributes from Dialogue @@ -101,6 +106,7 @@ ConceptNet PERSONA-CHAT Universal Dependencies + 10.18653/v1/2022.nlp4convai-1.6 From Rewriting to Remembering: Common Ground for Conversational <fixed-case>QA</fixed-case> Models @@ -114,6 +120,7 @@ 2022.nlp4convai-1.7 tredici-etal-2022-rewriting QReCC + 10.18653/v1/2022.nlp4convai-1.7 Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents @@ -128,6 +135,7 @@ 2022.nlp4convai-1.8 smith-etal-2022-human PERSONA-CHAT + 10.18653/v1/2022.nlp4convai-1.8 <fixed-case>KG</fixed-case>-<fixed-case>CR</fixed-case>u<fixed-case>SE</fixed-case>: Recurrent Walks over Knowledge Graph for Explainable Conversation Reasoning using Semantic Embeddings @@ -140,6 +148,7 @@ sarkar-etal-2022-kg rajbsk/kg-cruse OpenDialKG + 10.18653/v1/2022.nlp4convai-1.9 Knowledge Distillation Meets Few-Shot Learning: An Approach for Few-Shot Intent Classification Within and Across Domains @@ -151,6 +160,7 @@ 2022.nlp4convai-1.10 sauer-etal-2022-knowledge ATIS + 10.18653/v1/2022.nlp4convai-1.10 <fixed-case>MTL</fixed-case>-<fixed-case>SLT</fixed-case>: Multi-Task Learning for Spoken Language Tasks @@ -169,6 +179,7 @@ LibriSpeech SLURP Spoken-SQuAD + 10.18653/v1/2022.nlp4convai-1.11 Multimodal Conversational <fixed-case>AI</fixed-case>: A Survey of Datasets and Approaches @@ -193,6 +204,7 @@ Visual Question
Answering Visual7W YouCook2 + 10.18653/v1/2022.nlp4convai-1.12 Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next @@ -207,6 +219,7 @@ kann-etal-2022-open PERSONA-CHAT Wizard of Wikipedia + 10.18653/v1/2022.nlp4convai-1.13 Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric @@ -219,6 +232,7 @@ ikb-a/idk-dialogue-relevance FED Topical-Chat + 10.18653/v1/2022.nlp4convai-1.14 <fixed-case>R</fixed-case>etro<fixed-case>NLU</fixed-case>: Retrieval Augmented Task-Oriented Semantic Parsing @@ -232,6 +246,7 @@ 2022.nlp4convai-1.15 gupta-etal-2022-retronlu TOPv2 + 10.18653/v1/2022.nlp4convai-1.15 Stylistic Response Generation by Controlling Personality Traits and Intent @@ -246,6 +261,7 @@ PANDORA Topical-Chat Wizard of Wikipedia + 10.18653/v1/2022.nlp4convai-1.16 Toward Knowledge-Enriched Conversational Recommendation Systems @@ -262,6 +278,7 @@ zhang-etal-2022-toward ConceptNet ReDial + 10.18653/v1/2022.nlp4convai-1.17 Understanding and Improving the Exemplar-based Generation for Open-domain Conversation @@ -274,6 +291,7 @@ Exemplar-based generative models for open-domain conversation produce responses based on the exemplars provided by the retriever, taking advantage of generative models and retrieval models. However, due to the one-to-many problem of the open-domain conversation, they often ignore the retrieved exemplars while generating responses or produce responses over-fitted to the retrieved exemplars. To address these drawbacks, we introduce a training method selecting exemplars that are semantically relevant to the gold response but lexically distanced from the gold response. In the training phase, our training method first uses the gold response instead of dialogue context as a query to select exemplars that are semantically relevant to the gold response. Then, it eliminates the exemplars that lexically resemble the gold responses to alleviate the dependency of the generative models on those exemplars. The remaining exemplars could be irrelevant to the given context since they are searched depending on the gold response. Thus, our training method further utilizes the relevance scores between the given context and the exemplars to penalize the irrelevant exemplars. Extensive experiments demonstrate that our proposed training method alleviates the drawbacks of the existing exemplar-based generative models and significantly improves the performance in terms of appropriateness and informativeness. 2022.nlp4convai-1.18 han-etal-2022-understanding + 10.18653/v1/2022.nlp4convai-1.18 diff --git a/data/xml/2022.nlppower.xml b/data/xml/2022.nlppower.xml index d704c35a00..7aaf0b6864 100644 --- a/data/xml/2022.nlppower.xml +++ b/data/xml/2022.nlppower.xml @@ -30,6 +30,7 @@ GLUE SQuAD SuperGLUE + 10.18653/v1/2022.nlppower-1.1 Towards Stronger Adversarial Baselines Through Human-<fixed-case>AI</fixed-case> Collaboration @@ -40,6 +41,7 @@ 2022.nlppower-1.2 you-lowd-2022-towards SST + 10.18653/v1/2022.nlppower-1.2 Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model @@ -54,6 +56,7 @@ naseem-etal-2022-benchmarking Dreaddit PUBHEALTH + 10.18653/v1/2022.nlppower-1.3 Why only Micro-F1?
Class Weighting of Measures for Relation Classification @@ -67,6 +70,7 @@ harbecke-etal-2022-micro dfki-nlp/weighting-schemes-report DocRED + 10.18653/v1/2022.nlppower-1.4 Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization @@ -81,6 +85,7 @@ keleg-etal-2022-automatically CNN/Daily Mail NEWSROOM + 10.18653/v1/2022.nlppower-1.5 A global analysis of metrics used for measuring performance in natural language processing @@ -94,6 +99,7 @@ 2022.nlppower-1.6 blagec-etal-2022-global OpenBioLink/ITO + 10.18653/v1/2022.nlppower-1.6 Beyond Static models and test sets: Benchmarking the potential of pre-trained models across tasks and languages @@ -112,6 +118,7 @@ XCOPA XNLI XQuAD + 10.18653/v1/2022.nlppower-1.7 Checking <fixed-case>H</fixed-case>ate<fixed-case>C</fixed-case>heck: a cross-functional analysis of behaviour-aware learning for hate speech detection @@ -122,6 +129,7 @@ 2022.nlppower-1.8 henrique-luz-de-araujo-roth-2022-checking peluz/checking-hatecheck-code + 10.18653/v1/2022.nlppower-1.8 Language Invariant Properties in Natural Language Processing @@ -133,6 +141,7 @@ 2022.nlppower-1.9 bianchi-etal-2022-language milanlproc/language-invariant-properties + 10.18653/v1/2022.nlppower-1.9 <fixed-case>DACT</fixed-case>-<fixed-case>BERT</fixed-case>: Differentiable Adaptive Computation Time for an Efficient <fixed-case>BERT</fixed-case> Inference @@ -145,6 +154,7 @@ 2022.nlppower-1.10 eyzaguirre-etal-2022-dact GLUE + 10.18653/v1/2022.nlppower-1.10 Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection @@ -157,6 +167,7 @@ 2022.nlppower-1.11 attanasio-etal-2022-benchmarking milanlproc/benchmarking-xai-misogyny + 10.18653/v1/2022.nlppower-1.11 Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context <fixed-case>NLP</fixed-case> Models @@ -172,6 +183,7 @@ LRA QASPER SCROLLS + 10.18653/v1/2022.nlppower-1.12 diff --git a/data/xml/2022.repl4nlp.xml b/data/xml/2022.repl4nlp.xml index 3091fb389c..96a47c208a 100644 --- a/data/xml/2022.repl4nlp.xml +++ b/data/xml/2022.repl4nlp.xml @@ -38,6 +38,7 @@ 2022.repl4nlp-1.1 valerio-miceli-barone-etal-2022-distributionally MTNT + 10.18653/v1/2022.repl4nlp-1.1 <fixed-case>Q</fixed-case>-Learning Scheduler for Multi Task Learning Through the use of Histogram of Task Uncertainty @@ -50,6 +51,7 @@ meshgi-etal-2022-q IMDb Movie Reviews Penn Treebank + 10.18653/v1/2022.repl4nlp-1.2 When does <fixed-case>CLIP</fixed-case> generalize better than unimodal models? When judging human-centric concepts @@ -62,6 +64,7 @@ 2022.repl4nlp-1.4 bielawski-etal-2022-clip Book Cover Dataset + 10.18653/v1/2022.repl4nlp-1.4 From Hyperbolic Geometry Back to Word Embeddings @@ -74,6 +77,7 @@ 2022.repl4nlp-1.5 assylbekov-etal-2022-hyperbolic soltustik/rhg + 10.18653/v1/2022.repl4nlp-1.5 A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition @@ -90,6 +94,7 @@ CoNLL-2003 Few-NERD WNUT 2017 + 10.18653/v1/2022.repl4nlp-1.6 Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension @@ -105,6 +110,7 @@ 2022.repl4nlp-1.7 lovenia-etal-2022-clozer ReCAM + 10.18653/v1/2022.repl4nlp-1.7 Analyzing Gender Representation in Multilingual Models @@ -115,6 +121,7 @@ Multilingual language models were shown to allow for nontrivial transfer across scripts and languages. In this work, we study the structure of the internal representations that enable this transfer.
We focus on the representations of gender distinctions as a practical case study, and examine the extent to which the gender concept is encoded in shared subspaces across different languages. Our analysis shows that gender representations consist of several prominent components that are shared across languages, alongside language-specific components. The existence of language-independent and language-specific components provides an explanation for an intriguing empirical observation we make: while gender classification transfers well across languages, interventions for gender removal trained on a single language do not transfer easily to others. 2022.repl4nlp-1.8 gonen-etal-2022-analyzing + 10.18653/v1/2022.repl4nlp-1.8 Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations @@ -130,6 +137,7 @@ MultiNLI WikiText-103 WikiText-2 + 10.18653/v1/2022.repl4nlp-1.9 A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning @@ -143,6 +151,7 @@ Subword tokenization is a commonly used input pre-processing step in most recent NLP models. However, it limits the models’ ability to leverage end-to-end task learning. Its frequency-based vocabulary creation compromises tokenization in low-resource languages, leading models to produce suboptimal representations. Additionally, the dependency on a fixed vocabulary limits the subword models’ adaptability across languages and domains. In this work, we propose a vocabulary-free neural tokenizer by distilling segmentation information from heuristic-based subword tokenization. We pre-train our character-based tokenizer by processing unique words from a multilingual corpus, thereby extensively increasing word diversity across languages. Unlike the predefined and fixed vocabularies in subword methods, our tokenizer allows end-to-end task learning, resulting in optimal task-specific tokenization. The experimental results show that replacing the subword tokenizer with our neural tokenizer consistently improves performance on multilingual (NLI) and code-switching (sentiment analysis) tasks, with larger gains in low-resource languages. Additionally, our neural tokenizer exhibits robust performance on downstream tasks when adversarial noise is present (typos and misspelling), further increasing the initial improvements over statistical subword tokenizers.
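The gender-representation analysis above looks for gender components shared across languages in embedding space. As a generic illustration of how such a component can be estimated (not the paper's exact procedure), the sketch below runs a PCA, via SVD, over difference vectors of gendered word pairs; the random vectors are stand-ins for real embeddings.

import numpy as np

rng = np.random.default_rng(0)
dim = 32
# Random stand-ins for embeddings of gendered word pairs ("he"/"she", "king"/"queen", ...).
pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(10)]

# Principal components of the pair differences approximate a gender subspace.
diffs = np.stack([a - b for a, b in pairs])
diffs -= diffs.mean(axis=0)
_, s, vt = np.linalg.svd(diffs, full_matrices=False)
gender_components = vt[:3]             # top-3 candidate gender directions
explained = (s**2 / (s**2).sum())[:3]  # share of variance per component
print("variance explained by top components:", explained.round(3))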
2022.repl4nlp-1.10 mofijul-islam-etal-2022-vocabulary + 10.18653/v1/2022.repl4nlp-1.10 Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models @@ -160,6 +169,7 @@ QNLI SNLI SST + 10.18653/v1/2022.repl4nlp-1.11 Temporal Knowledge Graph Reasoning with Low-rank and Model-agnostic Representations @@ -172,6 +182,7 @@ dikeoulias-etal-2022-temporal iodike/chronokge ICEWS + 10.18653/v1/2022.repl4nlp-1.12 <fixed-case>ANNA</fixed-case>: Enhanced Language Representation for Question Answering @@ -189,6 +200,7 @@ C4 GLUE SQuAD + 10.18653/v1/2022.repl4nlp-1.13 Video Language Co-Attention with Multimodal Fast-Learning Feature Fusion for <fixed-case>V</fixed-case>ideo<fixed-case>QA</fixed-case> @@ -201,6 +213,7 @@ abdessaied-etal-2022-video MSR-VTT MSVD + 10.18653/v1/2022.repl4nlp-1.15 Detecting Word-Level Adversarial Text Attacks via <fixed-case>SH</fixed-case>apley Additive ex<fixed-case>P</fixed-case>lanations @@ -215,6 +228,7 @@ AG News IMDb Movie Reviews SST + 10.18653/v1/2022.repl4nlp-1.16 Binary Encoded Word Mover’s Distance @@ -223,6 +237,7 @@ Word Mover’s Distance is a textual distance metric which calculates the minimum transport cost between two sets of word embeddings. This metric achieves impressive results on semantic similarity tasks, but is slow and difficult to scale due to the large number of floating point calculations. This paper demonstrates that by combining pre-existing lower bounds with binary encoded word vectors, the metric can be rendered highly efficient in terms of computation time and memory while still maintaining accuracy on several textual similarity tasks. 2022.repl4nlp-1.17 johnson-2022-binary + 10.18653/v1/2022.repl4nlp-1.17 Unsupervised Geometric and Topological Approaches for Cross-Lingual Sentence Representation and Comparison @@ -232,6 +247,7 @@ We propose novel structural-based approaches for the generation and comparison of cross-lingual sentence representations. We do so by applying geometric and topological methods to analyze the structure of sentences, as captured by their word embeddings. The key properties of our methods are: (a) They are designed to be isometric invariant, in order to provide language-agnostic representations. (b) They are fully unsupervised, and use no cross-lingual signal. The quality of our representations, and their preservation across languages, are evaluated in similarity comparison tasks, achieving competitive results. Furthermore, we show that our structural-based representations can be combined with existing methods for improved results. 2022.repl4nlp-1.18 haim-meirom-bobrowski-2022-unsupervised + 10.18653/v1/2022.repl4nlp-1.18 A Study on Entity Linking Across Domains: Which Data is Best for Fine-Tuning? @@ -244,6 +260,7 @@ Entity linking disambiguates mentions by mapping them to entities in a knowledge graph (KG). One important question in today’s research is how to extend neural entity linking systems to new domains. In this paper, we aim at a system that enables linking mentions to entities from a general-domain KG and a domain-specific KG at the same time. In particular, we represent the entities of different KGs in a joint vector space and address the questions of which data is best suited for creating and fine-tuning that space, and whether fine-tuning harms performance on the general domain. We find that a combination of data from both the general and the special domain is most helpful. The first is especially necessary for avoiding performance loss on the general domain.
While additional supervision on entities that appear in both KGs performs best in an intrinsic evaluation of the vector space, it has less impact on the downstream task of entity linking. 2022.repl4nlp-1.19 soliman-etal-2022-study + 10.18653/v1/2022.repl4nlp-1.19 <fixed-case>TRA</fixed-case>ttack: Text Rewriting Attack Against Text Retrieval @@ -256,6 +273,7 @@ Text retrieval has been widely-used in many online applications to help users find relevant information from a text collection. In this paper, we study a new attack scenario against text retrieval to evaluate its robustness to adversarial attacks under the black-box setting, in which attackers want their own texts to always get high relevance scores with different users’ input queries and thus be retrieved frequently and can receive large amounts of impressions for profits. Considering that most current attack methods only simply follow certain fixed optimization rules, we propose a novel text rewriting attack (TRAttack) method with learning ability from the multi-armed bandit mechanism. Extensive experiments conducted on simulated victim environments demonstrate that TRAttack can yield texts that have higher relevance scores with different given users’ queries than those generated by current state-of-the-art attack methods. We also evaluate TRAttack on Tencent Cloud’s and Baidu Cloud’s commercially-available text retrieval APIs, and the rewritten adversarial texts successfully get high relevance scores with different user queries, which shows the practical potential of our method and the risk of text retrieval systems. 2022.repl4nlp-1.20 song-etal-2022-trattack + 10.18653/v1/2022.repl4nlp-1.20 On the Geometry of Concreteness @@ -264,6 +282,7 @@ In this paper we investigate how concreteness and abstractness are represented in word embedding spaces. We use data for English and German, and show that concreteness and abstractness can be determined independently and turn out to be completely opposite directions in the embedding space. Various methods can be used to determine the direction of concreteness, always resulting in roughly the same vector. Though concreteness is a central aspect of the meaning of words and can be detected clearly in embedding spaces, it seems not as easy to subtract or add concreteness to words to obtain other words or word senses like e.g. can be done with a semantic property like gender. 2022.repl4nlp-1.21 wartena-2022-geometry + 10.18653/v1/2022.repl4nlp-1.21 Towards Improving Selective Prediction Ability of <fixed-case>NLP</fixed-case> Systems @@ -276,6 +295,7 @@ varshney-etal-2022-towards MRPC SNLI + 10.18653/v1/2022.repl4nlp-1.23 On Target Representation in Continuous-output Neural Machine Translation @@ -285,6 +305,7 @@ Continuous generative models proved their usefulness in high-dimensional data, such as image and audio generation. However, continuous models for text generation have received limited attention from the community. In this work, we study continuous text generation using Transformers for neural machine translation (NMT). We argue that the choice of embeddings is crucial for such models, so we aim to focus on one particular aspect: target representation via embeddings. We explore pretrained embeddings and also introduce knowledge transfer from the discrete Transformer model using embeddings in Euclidean and non-Euclidean spaces. Our results on the WMT Romanian-English and English-Turkish benchmarks show such transfer leads to the best-performing continuous model.
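TRAttack (above) chooses rewriting operations "with learning ability from the multi-armed bandit mechanism". The abstract does not spell out the algorithm, so the sketch below shows a generic epsilon-greedy bandit over hypothetical rewrite arms, with a random stub in place of the victim retriever's relevance reward; it illustrates the bandit ingredient only, not the paper's method.

import random

random.seed(0)
arms = ["synonym_swap", "word_insert", "word_reorder"]  # hypothetical rewrite operations
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}
true_payoff = {"synonym_swap": 0.5, "word_insert": 0.2, "word_reorder": 0.1}

def reward(arm: str) -> float:
    # Stub standing in for the change in the retriever's relevance score.
    return random.gauss(true_payoff[arm], 0.1)

epsilon = 0.1
for _ in range(1000):
    # Explore a random arm with probability epsilon, otherwise exploit the best estimate.
    arm = random.choice(arms) if random.random() < epsilon else max(values, key=values.get)
    counts[arm] += 1
    values[arm] += (reward(arm) - values[arm]) / counts[arm]  # incremental mean update

print({a: round(v, 2) for a, v in values.items()})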
2022.repl4nlp-1.24 tokarchuk-niculae-2022-target + 10.18653/v1/2022.repl4nlp-1.24 Zero-shot Cross-lingual Transfer is Under-specified Optimization @@ -296,6 +317,7 @@ 2022.repl4nlp-1.25 wu-etal-2022-zero XNLI + 10.18653/v1/2022.repl4nlp-1.25 Same Author or Just Same Topic? Towards Content-Independent Style Representations @@ -306,6 +328,7 @@ Linguistic style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The assumption underlying the AV training task (same author approximates same writing style) enables self-supervised and, thus, extensive training. However, a good performance on the AV task does not ensure good “general-purpose” style representations. For example, as the same author might typically write about certain topics, representations trained on AV might also encode content information instead of style alone. We introduce a variation of the AV training task that controls for content using conversation or domain labels. We evaluate whether known style dimensions are represented and preferred over content information through an original variation to the recently proposed STEL framework. We find that representations trained by controlling for conversation are better than representations trained with domain or no content control at representing style independent from content. 2022.repl4nlp-1.26 wegmann-etal-2022-author + 10.18653/v1/2022.repl4nlp-1.26 <fixed-case>W</fixed-case>ea<fixed-case>NF</fixed-case>: Weak Supervision with Normalizing Flows @@ -316,6 +339,7 @@ 2022.repl4nlp-1.27 stephan-roth-2022-weanf IMDb Movie Reviews + 10.18653/v1/2022.repl4nlp-1.27 diff --git a/data/xml/2022.slpat.xml b/data/xml/2022.slpat.xml index d1ddf15bae..cdf2b36f1d 100644 --- a/data/xml/2022.slpat.xml +++ b/data/xml/2022.slpat.xml @@ -24,6 +24,7 @@ We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library. The intent of the library is to allow the ensembling of multiple diverse language models without requiring the clients (user interface designers, system users or speech-language pathologists) to attend to the formats of the models. Issues around privacy, security, dynamic versus static models, and methods of model combination are explored and specific design choices motivated. Some simulation experiments demonstrating the benefits of personalized language model ensembling via the library are presented. 2022.slpat-1.1 roark-gutkin-2022-design + 10.18653/v1/2022.slpat-1.1 <fixed-case>C</fixed-case>olor<fixed-case>C</fixed-case>ode: A <fixed-case>B</fixed-case>ayesian Approach to Augmentative and Alternative Communication with Two Buttons @@ -33,6 +34,7 @@ 2022.slpat-1.2 daly-2022-colorcode mrdaly/colorcode + 10.18653/v1/2022.slpat-1.2 A glimpse of assistive technology in daily life @@ -48,6 +50,7 @@ Robitaille (2010) wrote ‘if all technology companies have accessibility in their mind then people with disabilities won’t be left behind.’ Current technology has come a long way from where it stood decades ago; however, researchers and manufacturers often do not include people with disabilities in the design process and tend to accommodate them after the fact. In this paper we share feedback from four assistive technology users who rely on one or more assistive technology devices in their everyday lives.
We believe end users should be part of the design process and that by bringing together experts and users, we can bridge the research/practice gap. 2022.slpat-1.3 vaidyanathan-etal-2022-glimpse + 10.18653/v1/2022.slpat-1.3 A comparison study on patient-psychologist voice diarization @@ -64,6 +67,7 @@ Conversations between a clinician and a patient, in natural conditions, are valuable sources of information for medical follow-up. The automatic analysis of these dialogues could help extract new language markers and speed up the clinicians’ reports. Yet, it is not clear which model is the most efficient to detect and identify the speaker turns, especially for individuals with speech disorders. Here, we proposed a split of the data that allows conducting a comparative evaluation of different diarization methods. We designed and trained end-to-end neural network architectures to directly tackle this task from the raw signal and evaluate each approach under the same metric. We also studied the effect of fine-tuning models to find the best performance. Experimental results are reported on naturalistic clinical conversations between Psychologists and Interviewees, at different stages of Huntington’s disease, displaying a large panel of speech disorders. We found out that our best end-to-end model achieved 19.5% IER on the test set, compared to 23.6% achieved by the finetuning of the X-vector architecture. Finally, we observed that we could extract clinical markers directly from the automatic systems, highlighting the clinical relevance of our methods. 2022.slpat-1.4 riad-etal-2022-comparison + 10.18653/v1/2022.slpat-1.4 Producing Standard <fixed-case>G</fixed-case>erman Subtitles for <fixed-case>S</fixed-case>wiss <fixed-case>G</fixed-case>erman <fixed-case>TV</fixed-case> Content @@ -74,6 +78,7 @@ In this study we compare two approaches (neural machine translation and edit-based) and the use of synthetic data for the task of translating normalised Swiss German ASR output into correct written Standard German for subtitles, with a special focus on syntactic differences. Results suggest that NMT is better suited to this task and that relatively simple rule-based generation of training data could be a valuable approach for cases where little training data is available and transformations are simple. 2022.slpat-1.5 gerlach-etal-2022-producing + 10.18653/v1/2022.slpat-1.5 Investigating the Medical Coverage of a Translation System into Pictographs for Patients with an Intellectual Disability @@ -85,6 +90,7 @@ Communication between physician and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one of the solutions that could help to overcome this issue. In this preliminary study, we present the French version of a translation system using the Arasaac pictographs and we investigate the strategies used by speech therapists to translate into pictographs. We also evaluate the medical coverage of this tool for translating physician questions and patient instructions. 
2022.slpat-1.6 norre-etal-2022-investigating + 10.18653/v1/2022.slpat-1.6 On the Ethical Considerations of Text Simplification @@ -94,6 +100,7 @@ 2022.slpat-1.7 gooding-2022-ethical Newsela + 10.18653/v1/2022.slpat-1.7 Applying the Stereotype Content Model to assess disability bias in popular pre-trained <fixed-case>NLP</fixed-case> models underlying <fixed-case>AI</fixed-case>-based assistive technologies @@ -104,6 +111,7 @@ Stereotypes are a positive or negative, generalized, and often widely shared belief about the attributes of certain groups of people, such as people with sensory disabilities. If stereotypes manifest in assistive technologies used by deaf or blind people, they can harm the user in a number of ways, especially considering the vulnerable nature of the target population. AI models underlying assistive technologies have been shown to contain biased stereotypes, including racial, gender, and disability biases. We build on this work to present a psychology-based stereotype assessment of the representation of disability, deafness, and blindness in BERT using the Stereotype Content Model. We show that BERT contains disability bias, and that this bias differs along established stereotype dimensions. 2022.slpat-1.8 herold-etal-2022-applying + 10.18653/v1/2022.slpat-1.8 <fixed-case>C</fixed-case>ue<fixed-case>B</fixed-case>ot: Cue-Controlled Response Generation for Assistive Interaction Usages @@ -119,6 +127,7 @@ 2022.slpat-1.9 h-kumar-etal-2022-cuebot DailyDialog + 10.18653/v1/2022.slpat-1.9 Challenges in assistive technology development for an endangered language: an <fixed-case>I</fixed-case>rish (<fixed-case>G</fixed-case>aelic) perspective @@ -133,6 +142,7 @@ This paper describes three areas of assistive technology development which deploy the resources and speech technology for Irish (Gaelic), newly emerging from the ABAIR initiative. These include (i) a screenreading facility for visually impaired people, (ii) an application to help develop phonological awareness and early literacy for dyslexic people (iii) a speech-enabled AAC system for non-speaking people. Each of these is at a different stage of development and poses unique challenges: these are discussed along with the approaches adopted to address them. Three guiding principles underlie development. Firstly, the sociolinguistic context and the needs of the community are essential considerations in setting priorities. Secondly, development needs to be language sensitive. The need for skilled researchers with a deep knowledge of Irish structure is illustrated in the case of (ii) and (iii), where aspects of Irish linguistic structure (phonological, morphological and grammatical) and the striking differences from English pose challenges for systems aimed at bilingual Irish-English users. Thirdly, and most importantly, the users and their support networks are central – not as passive recipients of ready-made technologies, but as active partners at every stage of development, from design to implementation, evaluation and dissemination. 
2022.slpat-1.10 ni-chasaide-etal-2022-challenges + 10.18653/v1/2022.slpat-1.10 diff --git a/data/xml/2022.spanlp.xml b/data/xml/2022.spanlp.xml index 14fe7903a5..9f1d51557c 100644 --- a/data/xml/2022.spanlp.xml +++ b/data/xml/2022.spanlp.xml @@ -30,6 +30,7 @@ tran-etal-2022-improving FewRel Wiki-ZSL + 10.18653/v1/2022.spanlp-1.1 Choose Your <fixed-case>QA</fixed-case> Model Wisely: A Systematic Study of Generative and Extractive Readers for Question Answering @@ -46,6 +47,7 @@ MRQA Natural Questions SQuAD + 10.18653/v1/2022.spanlp-1.2 Efficient Machine Translation Domain Adaptation @@ -57,6 +59,7 @@ 2022.spanlp-1.3 martins-etal-2022-efficient deep-spin/efficient_knn_mt + 10.18653/v1/2022.spanlp-1.3 Field Extraction from Forms with Unlabeled Data @@ -71,6 +74,7 @@ 2022.spanlp-1.4 gao-etal-2022-field salesforce/inv-cdip + 10.18653/v1/2022.spanlp-1.4 Knowledge Base Index Compression via Dimensionality and Precision Reduction @@ -84,6 +88,7 @@ zouhar-etal-2022-knowledge HotpotQA Natural Questions + 10.18653/v1/2022.spanlp-1.5 diff --git a/data/xml/2022.spnlp.xml b/data/xml/2022.spnlp.xml index f5ef4c8df4..c71a8f4e7d 100644 --- a/data/xml/2022.spnlp.xml +++ b/data/xml/2022.spnlp.xml @@ -29,6 +29,7 @@ kando-etal-2022-multilingual CLAMS Universal Dependencies + 10.18653/v1/2022.spnlp-1.1 Joint Entity and Relation Extraction Based on Table Labeling Using Convolutional Neural Networks @@ -40,6 +41,7 @@ 2022.spnlp-1.2 ma-etal-2022-joint youmima/tablert-cnn + 10.18653/v1/2022.spnlp-1.2 <fixed-case>T</fixed-case>emp<fixed-case>C</fixed-case>aps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion @@ -57,6 +59,7 @@ fu-etal-2022-tempcaps fuguigui/tempcaps ICEWS + 10.18653/v1/2022.spnlp-1.3 <fixed-case>S</fixed-case>lot<fixed-case>GAN</fixed-case>: Detecting Mentions in Text via Adversarial Distant Learning @@ -68,6 +71,7 @@ 2022.spnlp-1.4 daza-etal-2022-slotgan CoNLL-2003 + 10.18653/v1/2022.spnlp-1.4 A Joint Learning Approach for Semi-supervised Neural Topic Modeling @@ -80,6 +84,7 @@ Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies. 
2022.spnlp-1.5 chiu-etal-2022-joint + 10.18653/v1/2022.spnlp-1.5 Neural String Edit Distance @@ -90,6 +95,7 @@ 2022.spnlp-1.6 libovicky-fraser-2022-neural jlibovicky/neural-string-edit-distance + 10.18653/v1/2022.spnlp-1.6 Predicting Attention Sparsity in Transformers @@ -104,6 +110,7 @@ treviso-etal-2022-predicting WikiText-103 WikiText-2 + 10.18653/v1/2022.spnlp-1.7 diff --git a/data/xml/2022.wassa.xml b/data/xml/2022.wassa.xml index 803f7975ab..781cc1cf20 100644 --- a/data/xml/2022.wassa.xml +++ b/data/xml/2022.wassa.xml @@ -30,6 +30,7 @@ Authors of posts in social media communicate their emotions and what causes them with text and images. While there is work on emotion and stimulus detection for each modality separately, it is yet unknown if the modalities contain complementary emotion information in social media. We aim at filling this research gap and contribute a novel, annotated corpus of English multimodal Reddit posts. On this resource, we develop models to automatically detect the relation between image and text, an emotion stimulus category and the emotion class. We evaluate if these tasks require both modalities and find for the image–text relations, that text alone is sufficient for most categories (complementary, illustrative, opposing): the information in the text allows to predict if an image is required for emotion understanding. The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise. Stimuli depicted by objects, animals, food, or a person are best predicted by image-only models, while multimodal models are most effective on art, events, memes, places, or screenshots. 2022.wassa-1.1 khlyzova-etal-2022-complementarity + 10.18653/v1/2022.wassa-1.1 Multiplex Anti-<fixed-case>A</fixed-case>sian Sentiment before and during the Pandemic: Introducing New Datasets from <fixed-case>T</fixed-case>witter Mining @@ -42,6 +43,7 @@ COVID-19 has disproportionately threatened minority communities in the U.S., not only in health but also in societal impact. However, social scientists and policymakers lack critical data to capture the dynamics of the anti-Asian hate trend and to evaluate its scale and scope. We introduce new datasets from Twitter related to anti-Asian hate sentiment before and during the pandemic. Relying on Twitter’s academic API, we retrieve hateful and counter-hate tweets from the Twitter Historical Database. To build contextual understanding and collect related racial cues, we also collect instances of heated arguments, often political, but not necessarily hateful, discussing Chinese issues. We then use the state-of-the-art hate speech classifiers to discern whether these tweets express hatred. These datasets can be used to study hate speech, general anti-Asian or Chinese sentiment, and hate linguistics by social scientists as well as to evaluate and build hate speech or sentiment analysis classifiers by computational scholars. 2022.wassa-1.2 lin-etal-2022-multiplex + 10.18653/v1/2022.wassa-1.2 Domain-Aware Contrastive Knowledge Transfer for Multi-domain Imbalanced Data @@ -53,6 +55,7 @@ 2022.wassa-1.3 ke-etal-2022-domain LIAR + 10.18653/v1/2022.wassa-1.3 “splink” is happy and “phrouth” is scary: Emotion Intensity Analysis for Nonsense Words @@ -64,6 +67,7 @@ People associate affective meanings to words - “death” is scary and sad while “party” is connotated with surprise and joy. 
This raises the question if the association is purely a product of the learned affective imports inherent to semantic meanings, or is also an effect of other features of words, e.g., morphological and phonological patterns. We approach this question with an annotation-based analysis leveraging nonsense words. Specifically, we conduct a best-worst scaling crowdsourcing study in which participants assign intensity scores for joy, sadness, anger, disgust, fear, and surprise to 272 nonsense words and, for comparison of the results to previous work, to 68 real words. Based on this resource, we develop character-level and phonology-based intensity regressors. We evaluate them on both nonsense words and real words (making use of the NRC emotion intensity lexicon of 7493 words), across six emotion categories. The analysis of our data reveals that some phonetic patterns show clear differences between emotion intensities. For instance, s as a first phoneme contributes to joy, sh to surprise, p as last phoneme more to disgust than to anger and fear. In the modelling experiments, a regressor trained on real words from the NRC emotion intensity lexicon shows a higher performance (r = 0.17) than regressors that aim at learning the emotion connotation purely from nonsense words. We conclude that humans do associate affective meaning to words based on surface patterns, but also based on similarities to existing words (“juy” to “joy”, or “flike” to “like”). 2022.wassa-1.4 sabbatino-etal-2022-splink + 10.18653/v1/2022.wassa-1.4 <fixed-case>S</fixed-case>ent<fixed-case>EMO</fixed-case>: A Multilingual Adaptive Platform for Aspect-based Sentiment and Emotion Analysis @@ -78,6 +82,7 @@ In this paper, we present the SentEMO platform, a tool that provides aspect-based sentiment analysis and emotion detection of unstructured text data such as reviews, emails and customer care conversations. Currently, models have been trained for five domains and one general domain and are implemented in a pipeline approach, where the output of one model serves as the input for the next. The results are presented in three dashboards, allowing companies to gain more insights into what stakeholders think of their products and services. The SentEMO platform is available at https://sentemo.ugent.be 2022.wassa-1.5 de-geyndt-etal-2022-sentemo + 10.18653/v1/2022.wassa-1.5 Can Emotion Carriers Explain Automatic Sentiment Prediction? A Study on Personal Narratives @@ -91,6 +96,7 @@ 2022.wassa-1.6 mousavi-etal-2022-emotion sislab/pns_val-ec_annotation + 10.18653/v1/2022.wassa-1.6 Infusing Knowledge from <fixed-case>W</fixed-case>ikipedia to Enhance Stance Detection @@ -102,6 +108,7 @@ 2022.wassa-1.7 he-etal-2022-infusing zihaohe123/wiki-enhanced-stance-detection + 10.18653/v1/2022.wassa-1.7 Uncertainty Regularized Multi-Task Learning @@ -113,6 +120,7 @@ 2022.wassa-1.8 meshgi-etal-2022-uncertainty IMDb Movie Reviews + 10.18653/v1/2022.wassa-1.8 Evaluating Contextual Embeddings and their Extraction Layers for Depression Assessment @@ -123,6 +131,7 @@ Many recent works in natural language processing have demonstrated ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP but little is known empirically on how to best apply them for mental health assessment. 
Using degree of depression as a case study, we do an empirical analysis on which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level NLP tasks. Notably, we find RoBERTa most effective and, despite the standard in past work suggesting the second-to-last or concatenation of the last 4 layers, we find layer 19 (sixth-to-last) is at least as good as layer 23 when using 1 layer. Further, when using multiple layers, distributing them across the second half (i.e. Layers 12+), rather than last 4, of the 24 layers yielded the most accurate results. 2022.wassa-1.9 matero-etal-2022-understanding + 10.18653/v1/2022.wassa-1.9 Emotion Analysis of Writers and Readers of <fixed-case>J</fixed-case>apanese Tweets on Vaccinations @@ -136,6 +145,7 @@ 2022.wassa-1.10 ramos-etal-2022-emotion patrickjohnramos/bert-japan-vaccination + 10.18653/v1/2022.wassa-1.10 Opinion-based Relational Pivoting for Cross-domain Aspect Term Extraction @@ -149,6 +159,7 @@ Domain adaptation methods often exploit domain-transferable input features, a.k.a. pivots. The task of Aspect and Opinion Term Extraction presents a special challenge for domain transfer: while opinion terms largely transfer across domains, aspects change drastically from one domain to another (e.g. from restaurants to laptops). In this paper, we investigate and establish empirically a prior conjecture, which suggests that the linguistic relations connecting opinion terms to their aspects transfer well across domains and therefore can be leveraged for cross-domain aspect term extraction. We present several analyses supporting this conjecture, via experiments with four linguistic dependency formalisms to represent relation patterns. Subsequently, we present an aspect term extraction method that drives models to consider opinion–aspect relations via explicit multitask objectives. This method provides significant performance gains, even on top of a prior state-of-the-art linguistically-informed model, which are shown in analysis to stem from the relational pivoting signal. 2022.wassa-1.11 klein-etal-2022-opinion + 10.18653/v1/2022.wassa-1.11 <fixed-case>E</fixed-case>nglish-<fixed-case>M</fixed-case>alay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network @@ -158,6 +169,7 @@ The main challenge in English-Malay cross-lingual emotion classification is that there are no Malay training emotion corpora. Given that machine translation could fall short in contextually complex tweets, we only limited machine translation to the word level. In this paper, we bridge the language gap between English and Malay through cross-lingual word embeddings constructed using singular value decomposition. We pre-trained our hierarchical attention model using English tweets and fine-tuned it using a set of gold standard Malay tweets. Our model uses significantly less computational resources compared to the language models. Experimental results show that the performance of our model is better than mBERT in zero-shot learning by 2.4% and Malay BERT by 0.8% when a limited number of Malay tweets is available. In exchange for 6 – 7 times less in computational time, our model only lags behind mBERT and XLM-RoBERTa by a margin of 0.9 – 4.3 % in few-shot learning. Also, the word-level attention could be transferred to the Malay tweets accurately using the cross-lingual word embeddings. 
2022.wassa-1.12 lim-liew-2022-english-malay + 10.18653/v1/2022.wassa-1.12 Assessment of Massively Multilingual Sentiment Classifiers @@ -171,6 +183,7 @@ Models are increasing in size and complexity in the hunt for SOTA. But what if those 2% increase in performance does not make a difference in a production use case? Maybe benefits from a smaller, faster model outweigh those slight performance gains. Also, equally good performance across languages in multilingual tasks is more important than SOTA results on a single one. We present the biggest, unified, multilingual collection of sentiment analysis datasets. We use these to assess 11 models and 80 high-quality sentiment datasets (out of 342 raw datasets collected) in 27 languages and included results on the internally annotated datasets. We deeply evaluate multiple setups, including fine-tuning transformer-based models for measuring performance. We compare results in numerous dimensions addressing the imbalance in both languages coverage and dataset sizes. Finally, we present some best practices for working with such a massive collection of datasets and models for a multi-lingual perspective. 2022.wassa-1.13 rajda-etal-2022-assessment + 10.18653/v1/2022.wassa-1.13 Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning @@ -180,6 +193,7 @@ Masked language models (MLMs) are pre-trained with a denoising objective that is in a mismatch with the objective of downstream fine-tuning. We propose pragmatic masking and surrogate fine-tuning as two complementing strategies that exploit social cues to drive pre-trained representations toward a broad set of concepts useful for a wide class of social meaning tasks. We test our models on 15 different Twitter datasets for social meaning detection. Our methods achieve 2.34% F1 over a competitive baseline, while outperforming domain-specific language models pre-trained on large datasets. Our methods also excel in few-shot learning: with only 5% of training data (severely few-shot), our methods enable an impressive 68.54% average F1. The methods are also language agnostic, as we show in a zero-shot setting involving six datasets from three different languages. 2022.wassa-1.14 zhang-abdul-mageed-2022-improving + 10.18653/v1/2022.wassa-1.14 Distinguishing In-Groups and Onlookers by Language Use @@ -195,6 +209,7 @@ Inferring group membership of social media users is of high interest in many domains. Group membership is typically inferred via network interactions with other members, or by the usage of in-group language. However, network information is incomplete when users or groups move between platforms, and in-group keywords lose significance as public discussion about a group increases. Similarly, using keywords to filter content and users can fail to distinguish between the various groups that discuss a topic—perhaps confounding research on public opinion and narrative trends. We present a classifier intended to distinguish members of groups from users discussing a group based on contextual usage of keywords. We demonstrate the classifier on a sample of community pairs from Reddit and focus on results related to the COVID-19 pandemic. 
2022.wassa-1.15 minot-etal-2022-distinguishing + 10.18653/v1/2022.wassa-1.15 Irony Detection for <fixed-case>D</fixed-case>utch: a Venture into the Implicit @@ -206,6 +221,7 @@ This paper presents the results of a replication experiment for automatic irony detection in Dutch social media text, investigating both a feature-based SVM classifier, as was done by Van Hee et al. (2017), and a transformer-based approach. In addition to building a baseline model, an important goal of this research is to explore the implementation of common-sense knowledge in the form of implicit sentiment, as we strongly believe that common-sense and connotative knowledge are essential to the identification of irony and implicit meaning in tweets. We show promising results and the presented approach can provide a solid baseline and serve as a staging ground to build on in future experiments for irony detection in Dutch. 2022.wassa-1.16 maladry-etal-2022-irony + 10.18653/v1/2022.wassa-1.16 Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features @@ -217,6 +233,7 @@ Research at the intersection of personality psychology, computer science, and linguistics has recently focused increasingly on modeling and predicting personality from language use. We report two major improvements in predicting personality traits from text data: (1) to our knowledge, the most comprehensive set of theory-based psycholinguistic features and (2) hybrid models that integrate a pre-trained Transformer Language Model BERT and Bidirectional Long Short-Term Memory (BLSTM) networks trained on within-text distributions (‘text contours’) of psycholinguistic features. We experiment with BLSTM models (with and without Attention) and with two techniques for applying pre-trained language representations from the transformer model - ‘feature-based’ and ‘fine-tuning’. We evaluate the performance of the models we built on two benchmark datasets that target the two dominant theoretical models of personality: the Big Five Essay dataset (Pennebaker and King, 1999) and the MBTI Kaggle dataset (Li et al., 2018). Our results are encouraging as our models outperform existing work on the same datasets. More specifically, our models achieve improvement in classification accuracy by 2.9% on the Essay dataset and 8.28% on the Kaggle MBTI dataset. In addition, we perform ablation experiments to quantify the impact of different categories of psycholinguistic features in the respective personality prediction models. 2022.wassa-1.17 kerz-etal-2022-pushing + 10.18653/v1/2022.wassa-1.17 <fixed-case>XLM</fixed-case>-<fixed-case>EMO</fixed-case>: Multilingual Emotion Prediction in Social Media Text @@ -228,6 +245,7 @@ 2022.wassa-1.18 bianchi-etal-2022-xlm milanlproc/xlm-emo + 10.18653/v1/2022.wassa-1.18 Evaluating Content Features and Classification Methods for Helpfulness Prediction of Online Reviews: Establishing a Benchmark for <fixed-case>P</fixed-case>ortuguese @@ -237,6 +255,7 @@ Over the years, the review helpfulness prediction task has been the subject of several works, but remains being a challenging issue in Natural Language Processing, as results vary a lot depending on the domain, on the adopted features and on the chosen classification strategy. This paper attempts to evaluate the impact of content features and classification methods for two different domains. In particular, we run our experiments for a low resource language – Portuguese –, trying to establish a benchmark for this language. 
We show that simple features and classical classification methods are powerful for the task of helpfulness prediction, but are largely outperformed by a convolutional neural network-based solution. 2022.wassa-1.19 sousa-pardo-2022-evaluating + 10.18653/v1/2022.wassa-1.19 <fixed-case>WASSA</fixed-case> 2022 Shared Task: Predicting Empathy, Emotion and Personality in Reaction to News Stories @@ -249,6 +268,7 @@ 2022.wassa-1.20 barriere-etal-2022-wassa GoEmotions + 10.18653/v1/2022.wassa-1.20 <fixed-case>IUCL</fixed-case> at <fixed-case>WASSA</fixed-case> 2022 Shared Task: A Text-only Approach to Empathy and Emotion Detection @@ -259,6 +279,7 @@ Our system, IUCL, participated in the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification. Our main goal in building this system is to investigate how the use of demographic attributes influences performance. Our (official) results show that our text-only systems perform very competitively, ranking first in the empathy detection task, reaching an average Pearson correlation of 0.54, and second in the emotion classification task, reaching a Macro-F of 0.572. Our systems that use both text and demographic data are less competitive. 2022.wassa-1.21 chen-etal-2022-iucl + 10.18653/v1/2022.wassa-1.21 Continuing Pre-trained Model with Multiple Training Strategies for Emotional Classification @@ -271,6 +292,7 @@ Emotion is the essential attribute of human beings. Perceiving and understanding emotions in a human-like manner is the most central part of developing emotional intelligence. This paper describes the contribution of the LingJing team’s method to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Emotion Classification. The participants are required to predict seven emotions from empathic responses to news or stories that caused harm to individuals, groups, or others. This paper describes the continual pre-training method for the masked language model (MLM) to enhance the DeBERTa pre-trained language model. Several training strategies are designed to further improve the final downstream performance including the data augmentation with the supervised transfer, child-tuning training, and the late fusion method. Extensive experiments on the emotional classification dataset show that the proposed method outperforms other state-of-the-art methods, demonstrating our method’s effectiveness. Moreover, our submission ranked Top-1 with all metrics in the evaluation phase for the Emotion Classification task. 2022.wassa-1.22 li-etal-2022-continuing + 10.18653/v1/2022.wassa-1.22 Empathy and Distress Prediction using Transformer Multi-output Regression and Emotion Analysis with an Ensemble of Supervised and Zero-Shot Learning Models @@ -283,6 +305,7 @@ 2022.wassa-1.23 del-arco-etal-2022-empathy CARER + 10.18653/v1/2022.wassa-1.23 Leveraging Emotion-Specific features to improve Transformer performance for Emotion Classification @@ -295,6 +318,7 @@ This paper describes team PVG’s AI Club’s approach to the Emotion Classification shared task held at WASSA 2022. This Track 2 sub-task focuses on building models which can predict a multi-class emotion label based on essays from news articles where a person, group or another entity is affected. 
Baseline transformer models have been demonstrating good results on sequence classification tasks, and we aim to improve this performance with the help of ensembling techniques, and by leveraging two variations of emotion-specific representations. We observe better results than our baseline models and achieve an accuracy of 0.619 and a macro F1 score of 0.520 on the emotion classification task. 2022.wassa-1.24 desai-etal-2022-leveraging + 10.18653/v1/2022.wassa-1.24 Transformer based ensemble for emotion detection @@ -307,6 +331,7 @@ 2022.wassa-1.25 kane-etal-2022-transformer GoEmotions + 10.18653/v1/2022.wassa-1.25 Team <fixed-case>IITP</fixed-case>-<fixed-case>AINLPML</fixed-case> at <fixed-case>WASSA</fixed-case> 2022: Empathy Detection, Emotion Classification and Personality Detection @@ -318,6 +343,7 @@ Computational comprehension and identifying emotional components in language have been critical in enhancing human-computer connection in recent years. The WASSA 2022 Shared Task introduced four tracks and released a dataset of news stories: Track-1 for Empathy and Distress Prediction, Track-2 for Emotion classification, Track-3 for Personality prediction, and Track-4 for Interpersonal Reactivity Index prediction at the essay level. This paper describes our participation in the WASSA 2022 shared task on the tasks mentioned above. We developed multi-task deep learning methods to address Tracks 1 and 2 and machine learning models for Track 3 and 4. Our developed systems achieved average Pearson scores of 0.483, 0.05, and 0.08 for Track 1, 3, and 4, respectively, and a macro F1 score of 0.524 for Track 2 on the test set. We ranked 8th, 11th, 2nd and 2nd for tracks 1, 2, 3, and 4 respectively. 2022.wassa-1.26 ghosh-etal-2022-team + 10.18653/v1/2022.wassa-1.26 Transformer-based Architecture for Empathy Prediction and Emotion Classification @@ -329,6 +355,7 @@ This paper describes the contribution of team PHG to the WASSA 2022 shared task on Empathy Prediction and Emotion Classification. The broad goal of this task was to model an empathy score, a distress score and the type of emotion associated with the person who had reacted to the essay written in response to a newspaper article. We have used the RoBERTa model for training, on top of which a few layers are added to finetune the transformer. We also use a few machine learning techniques to augment as well as upsample the data. Our system achieves a Pearson Correlation Coefficient of 0.488 on Task 1 (Empathy - 0.470 and Distress - 0.506) and Macro F1-score of 0.531 on Task 2. 2022.wassa-1.27 vasava-etal-2022-transformer + 10.18653/v1/2022.wassa-1.27 Prompt-based Pre-trained Model for Personality and Interpersonal Reactivity Prediction @@ -342,6 +369,7 @@ This paper describes the LingJing team’s method to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Personality Prediction (PER) and Reactivity Index Prediction (IRI). In this paper, we adopt the prompt-based method with the pre-trained language model to accomplish these tasks. Specifically, the prompt is designed to provide knowledge of the extra personalized information for enhancing the pre-trained model. Data augmentation and model ensemble are adopted for obtaining better results. Extensive experiments are performed, which shows the effectiveness of the proposed method. On the final submission, our system achieves a Pearson Correlation Coefficient of 0.2301 and 0.2546 on Track 3 and Track 4 respectively. 
We ranked 1st on both sub-tasks. 2022.wassa-1.28 li-etal-2022-prompt-based + 10.18653/v1/2022.wassa-1.28 <fixed-case>SURREY</fixed-case>-<fixed-case>CTS</fixed-case>-<fixed-case>NLP</fixed-case> at <fixed-case>WASSA</fixed-case>2022: An Experiment of Discourse and Sentiment Analysis for the Prediction of Empathy, Distress and Emotion @@ -355,6 +383,7 @@ 2022.wassa-1.29 qian-etal-2022-surrey GoEmotions + 10.18653/v1/2022.wassa-1.29 An Ensemble Approach to Detect Emotions at an Essay Level @@ -366,6 +395,7 @@ maheshwari-varma-2022-ensemble him-mah10/an-ensemble-approach-to-detect-emotions-at-an-essay-level GoEmotions + 10.18653/v1/2022.wassa-1.30 <fixed-case>CAISA</fixed-case> at <fixed-case>WASSA</fixed-case> 2022: Adapter-Tuning for Empathy Prediction @@ -378,6 +408,7 @@ lahnala-etal-2022-caisa caisa-lab/wassa-empathy-adapters CARER + 10.18653/v1/2022.wassa-1.31 <fixed-case>NLPOP</fixed-case>: a Dataset for Popularity Prediction of Promoted <fixed-case>NLP</fixed-case> Research on <fixed-case>T</fixed-case>witter @@ -389,6 +420,7 @@ 2022.wassa-1.32 obadic-etal-2022-nlpop lobadic/nlpop + 10.18653/v1/2022.wassa-1.32 Tagging Without Rewriting: A Probabilistic Model for Unpaired Sentiment and Style Transfer @@ -399,6 +431,7 @@ shuo-2022-tagging GYAFC IMDb Movie Reviews + 10.18653/v1/2022.wassa-1.33 Polite Task-oriented Dialog Agents: To Generate or to Rewrite? @@ -410,6 +443,7 @@ 2022.wassa-1.34 silva-etal-2022-polite MMD + 10.18653/v1/2022.wassa-1.34 Items from Psychometric Tests as Training Data for Personality Profiling Models of <fixed-case>T</fixed-case>witter Users @@ -420,6 +454,7 @@ Machine-learned models for author profiling in social media often rely on data acquired via self-reporting-based psychometric tests (questionnaires) filled out by social media users. This is an expensive but accurate data collection strategy. Another, less costly alternative, which leads to potentially more noisy and biased data, is to rely on labels inferred from publicly available information in the profiles of the users, for instance self-reported diagnoses or test results. In this paper, we explore a third strategy, namely to directly use a corpus of items from validated psychometric tests as training data. Items from psychometric tests often consist of sentences from an I-perspective (e.g., ‘I make friends easily.’). Such corpora of test items constitute ‘small data’, but their availability for many concepts is a rich resource. We investigate this approach for personality profiling, and evaluate BERT classifiers fine-tuned on such psychometric test items for the big five personality traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) and analyze various augmentation strategies regarding their potential to address the challenges coming with such a small corpus. Our evaluation on a publicly available Twitter corpus shows a comparable performance to in-domain training for 4/5 personality traits with T5-based data augmentation. 
2022.wassa-1.35 kreuter-etal-2022-items + 10.18653/v1/2022.wassa-1.35 diff --git a/data/xml/2022.wit.xml b/data/xml/2022.wit.xml index b3d18dbec0..d3ae7690df 100644 --- a/data/xml/2022.wit.xml +++ b/data/xml/2022.wit.xml @@ -27,6 +27,7 @@ 2022.wit-1.1 park-lee-2022-unsupervised seongminp/graph-dialogue-summary + 10.18653/v1/2022.wit-1.1 An Interactive Analysis of User-reported Long <fixed-case>COVID</fixed-case> Symptoms using <fixed-case>T</fixed-case>witter Data @@ -37,6 +38,7 @@ With millions of documented recoveries from COVID-19 worldwide, various long-term sequelae have been observed in a large group of survivors. This paper is aimed at systematically analyzing user-generated conversations on Twitter that are related to long-term COVID symptoms for a better understanding of the Long COVID health consequences. Using an interactive information extraction tool built especially for this purpose, we extracted key information from the relevant tweets and analyzed the user-reported Long COVID symptoms with respect to their demographic and geographical characteristics. The results of our analysis are expected to improve the public awareness on long-term COVID-19 sequelae and provide important insights to public health authorities. 2022.wit-1.2 miao-etal-2022-interactive + 10.18653/v1/2022.wit-1.2 Bi-Directional Recurrent Neural Ordinary Differential Equations for Social Media Text Classification @@ -47,6 +49,7 @@ Classification of posts in social media such as Twitter is difficult due to the noisy and short nature of texts. Sequence classification models based on recurrent neural networks (RNN) are popular for classifying posts that are sequential in nature. RNNs assume the hidden representation dynamics to evolve in a discrete manner and do not consider the exact time of the posting. In this work, we propose to use recurrent neural ordinary differential equations (RNODE) for social media post classification which consider the time of posting and allow the computation of hidden representation to evolve in a time-sensitive continuous manner. In addition, we propose a novel model, Bi-directional RNODE (Bi-RNODE), which can consider the information flow in both the forward and backward directions of posting times to predict the post label. Our experiments demonstrate that RNODE and Bi-RNODE are effective for the problem of stance classification of rumours in social media. 2022.wit-1.3 tamire-etal-2022-bi + 10.18653/v1/2022.wit-1.3