diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml
index b81c16095d..57f536bd80 100644
--- a/data/xml/2022.acl.xml
+++ b/data/xml/2022.acl.xml
@@ -34,6 +34,7 @@
MultiRC
QNLI
SST
+ 10.18653/v1/2022.acl-long.1
Quantified Reproducibility Assessment of NLP Results
@@ -44,6 +45,7 @@
This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 different system and evaluation measure combinations (involving diverse NLP tasks and types of evaluation), for each of which we have the original results and one to seven reproduction results. The proposed QRA method produces degree-of-reproducibility scores that are comparable across multiple reproductions not only of the same, but also of different, original studies. We find that the proposed method facilitates insights into causes of variation between reproductions, and as a result, allows conclusions to be drawn about what aspects of system and/or evaluation design need to be changed in order to improve reproducibility.
2022.acl-long.2
belz-etal-2022-quantified
+ 10.18653/v1/2022.acl-long.2
Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
@@ -59,6 +61,7 @@
yu-etal-2022-rare
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.3
AlephBERT: Language Model Pre-training and Evaluation from Sub-Word to Sentence Level
@@ -73,6 +76,7 @@
2022.acl-long.4
seker-etal-2022-alephbert
OSCAR
+ 10.18653/v1/2022.acl-long.4
Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning
@@ -88,6 +92,7 @@
li-etal-2022-learning
DROP
HybridQA
+ 10.18653/v1/2022.acl-long.5
Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification
@@ -99,6 +104,7 @@
Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, while its difficulty is compounded by the scarcity of available datasets, which vary greatly in terms of domains and languages. As such, it becomes increasingly difficult to develop a robust model that generalizes across a wide array of input examples. In this paper, we propose a novel training technique for the CWI task based on domain adaptation to improve the target character and context representations. This technique addresses the problem of working with multiple domains, inasmuch as it creates a way of smoothing the differences between the explored datasets. Moreover, we also propose a similar auxiliary task, namely text simplification, that can be used to complement lexical complexity prediction. Our model obtains a boost of up to 2.42% in terms of Pearson Correlation Coefficients compared to vanilla training techniques, when considering the CompLex dataset from the Lexical Complexity Prediction 2021 shared task. At the same time, we obtain an increase of 3% in Pearson scores when considering a cross-lingual setup relying on the Complex Word Identification 2018 dataset. In addition, our model yields state-of-the-art results in terms of Mean Absolute Error.
2022.acl-long.6
zaharia-etal-2022-domain
+ 10.18653/v1/2022.acl-long.6
JointCL: A Joint Contrastive Learning Framework for Zero-Shot Stance Detection
@@ -115,6 +121,7 @@
2022.acl-long.7.software.zip
liang-etal-2022-jointcl
hitsz-hlt/jointcl
+ 10.18653/v1/2022.acl-long.7
[CASPI] Causal-aware Safe Policy Improvement for Task-oriented Dialogue
@@ -126,6 +133,7 @@
2022.acl-long.8
ramachandran-etal-2022-caspi
MultiWOZ
+ 10.18653/v1/2022.acl-long.8
UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System
@@ -139,6 +147,7 @@
2022.acl-long.9.software.zip
ma-etal-2022-unitranser
MMD
+ 10.18653/v1/2022.acl-long.9
Dynamic Schema Graph Fusion Network for Multi-Domain Dialogue State Tracking
@@ -152,6 +161,7 @@
2022.acl-long.10
feng-etal-2022-dynamic
SGD
+ 10.18653/v1/2022.acl-long.10
Attention Temperature Matters in Abstractive Summarization Distillation
@@ -165,6 +175,7 @@
2022.acl-long.11.software.zip
zhang-etal-2022-attention
shengqiang-zhang/plate
+ 10.18653/v1/2022.acl-long.11
Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation
@@ -182,6 +193,7 @@
ghchen18/acl22-sixtp
FLORES-101
FLoRes
+ 10.18653/v1/2022.acl-long.12
TopWORDS-Seg: Simultaneous Text Segmentation and Word Discovery for Open-Domain Chinese Texts via Bayesian Inference
@@ -192,6 +204,7 @@
Processing open-domain Chinese texts has been a critical bottleneck in computational linguistics for decades, partially because text segmentation and word discovery often entangle with each other in this challenging scenario. No existing method can yet achieve effective text segmentation and word discovery simultaneously in the open domain. This study fills this gap by proposing a novel method called TopWORDS-Seg based on Bayesian inference, which enjoys robust performance and transparent interpretation when no training corpus and domain vocabulary are available. Advantages of TopWORDS-Seg are demonstrated by a series of experimental studies.
2022.acl-long.13
pan-etal-2022-topwords
+ 10.18653/v1/2022.acl-long.13
An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition
@@ -207,6 +220,7 @@
2022.acl-long.14.software.zip
li-etal-2022-unsupervised-multiple
CoNLL-2003
+ 10.18653/v1/2022.acl-long.14
Discriminative Marginalized Probabilistic Neural Method for Multi-Document Summarization of Medical Literature
@@ -218,6 +232,7 @@
Although current state-of-the-art Transformer-based solutions have succeeded in a wide range of single-document NLP tasks, they still struggle to address multi-input tasks such as multi-document summarization. Many solutions truncate the inputs, thus ignoring potentially summary-relevant content, which is unacceptable in the medical domain where every piece of information can be vital. Others leverage linear model approximations to apply multi-input concatenation, worsening the results because all information is considered, even if it is conflicting or noisy with respect to a shared background. Despite the importance and social impact of medicine, there are no ad-hoc solutions for multi-document summarization. For this reason, we propose a novel discriminative marginalized probabilistic method (DAMEN) trained to discriminate critical information from a cluster of topic-related medical documents and generate a multi-document summary via token probability marginalization. Results show that we outperform the previous state-of-the-art on a biomedical dataset for multi-document summarization of systematic literature reviews. Moreover, we perform extensive ablation studies to motivate the design choices and prove the importance of each module of our method.
2022.acl-long.15
moro-etal-2022-discriminative
+ 10.18653/v1/2022.acl-long.15
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
@@ -239,6 +254,7 @@
huang-etal-2022-sparse
GLUE
QNLI
+ 10.18653/v1/2022.acl-long.16
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
@@ -250,6 +266,7 @@
2022.acl-long.17
kambhatla-etal-2022-cipherdaug
protonish/cipherdaug-nmt
+ 10.18653/v1/2022.acl-long.17
Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages
@@ -262,6 +279,7 @@
patil-etal-2022-overlap
vaidehi99/obpe
XNLI
+ 10.18653/v1/2022.acl-long.18
Long-range Sequence Modeling with Predictable Sparse Attention
@@ -273,6 +291,7 @@
2022.acl-long.19
zhuang-etal-2022-long
LRA
+ 10.18653/v1/2022.acl-long.19
Improving Personalized Explanation Generation through Visualization
@@ -286,6 +305,7 @@
In modern recommender systems, there are usually comments or reviews from users that justify their ratings for different items. Trained on such textual corpus, explainable recommendation models learn to discover user interests and generate personalized explanations. Though able to provide plausible explanations, existing models tend to generate repeated sentences for different items or empty sentences with insufficient details. This begs an interesting question: can we immerse the models in a multimodal environment to gain proper awareness of real-world concepts and alleviate the above shortcomings? To this end, we propose a visually-enhanced approach named METER with the help of visualization generation and text–image matching discrimination: the explainable recommendation model is encouraged to visualize what it refers to while incurring a penalty if the visualization is incongruent with the textual explanation. Experimental results and a manual assessment demonstrate that our approach can improve not only the text quality but also the diversity and explainability of the generated explanations.
2022.acl-long.20
geng-etal-2022-improving
+ 10.18653/v1/2022.acl-long.20
New Intent Discovery with Pre-training and Contrastive Learning
@@ -300,6 +320,7 @@
zhang-etal-2022-new
zhang-yu-wei/mtp-clnn
CLINC150
+ 10.18653/v1/2022.acl-long.21
Modeling U.S. State-Level Policies by Extracting Winners and Losers from Legislative Texts
@@ -310,6 +331,7 @@
Decisions on state-level policies have a deep effect on many aspects of our everyday life, such as health-care and education access. However, there is little understanding of how these policies and decisions are being formed in the legislative process. We take a data-driven approach by decoding the impact of legislation on relevant stakeholders (e.g., teachers in education bills) to understand legislators’ decision-making process and votes. We build a new dataset for multiple US states that interconnects multiple sources of data including bills, stakeholders, legislators, and money donors. Next, we develop a textual graph-based model to embed and analyze state bills. Our model predicts winners/losers of bills and then utilizes them to better determine the legislative body’s vote breakdown according to demographic/ideological criteria, e.g., gender.
2022.acl-long.22
davoodi-etal-2022-modeling
+ 10.18653/v1/2022.acl-long.22
Structural Characterization for Dialogue Disentanglement
@@ -323,6 +345,7 @@
ma-etal-2022-structural
xbmxb/structurecharacterization4dd
Molweni
+ 10.18653/v1/2022.acl-long.23
Multi-Party Empathetic Dialogue Generation: A New Task for Dialog Systems
@@ -338,6 +361,7 @@
zhu-etal-2022-multi
MELD
PEC
+ 10.18653/v1/2022.acl-long.24
MISC: A Mixed Strategy-Aware Model integrating COMET for Emotional Support Conversation
@@ -354,6 +378,7 @@
morecry/misc
ATOMIC
ConceptNet
+ 10.18653/v1/2022.acl-long.25
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
@@ -375,6 +400,7 @@
SuperGLUE
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.26
QuoteR: A Benchmark of Quote Recommendation for Writing
@@ -391,6 +417,7 @@
qi-etal-2022-quoter
thunlp/quoter
BookCorpus
+ 10.18653/v1/2022.acl-long.27
Towards Comprehensive Patent Approval Predictions: Beyond Traditional Document Classification
@@ -405,6 +432,7 @@
Predicting the approval chance of a patent application is a challenging problem involving multiple facets. The most crucial facet is arguably the novelty — 35 U.S. Code § 102 rejects more recent applications that have very similar prior arts. Such novelty evaluations differentiate patent approval prediction from conventional document classification — successful patent applications may share similar writing patterns; however, too-similar newer applications would receive the opposite label, thus confusing standard document classifiers (e.g., BERT). To address this issue, we propose a novel framework that unifies the document classifier with handcrafted features, particularly time-dependent novelty scores. Specifically, we formulate the novelty scores by comparing each application with millions of prior arts using a hybrid of efficient filters and a neural bi-encoder. Moreover, we impose a new regularization term into the classification objective to enforce the monotonic change of approval prediction w.r.t. novelty scores. From extensive experiments on a large-scale USPTO dataset, we find that standard BERT fine-tuning can partially learn the correct relationship between novelty and approvals from inconsistent data. However, our time-dependent novelty features offer a boost on top of it. Also, our monotonic regularization, while shrinking the search space, can drive the optimizer to better local optima, yielding a further small performance gain.
2022.acl-long.28
gao-etal-2022-towards
+ 10.18653/v1/2022.acl-long.28
Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
@@ -420,6 +448,7 @@
yujungheo/kbvqa-public
DBpedia
Visual Question Answering
+ 10.18653/v1/2022.acl-long.29
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
@@ -438,6 +467,7 @@
li-etal-2022-cross-utterance
neurowave-ai/cucvae-tts
LJSpeech
+ 10.18653/v1/2022.acl-long.30
Mix and Match: Learning-free Controllable Text Generation using Energy Language Models
@@ -450,6 +480,7 @@
mireshghallah-etal-2022-mix
mireshghallah/mixmatch
GYAFC
+ 10.18653/v1/2022.acl-long.31
So Different Yet So Alike! Constrained Unsupervised Text Style Transfer
@@ -463,6 +494,7 @@
2022.acl-long.32
ramesh-kashyap-etal-2022-different
abhinavkashyap/dct
+ 10.18653/v1/2022.acl-long.32
e-CARE: a New Dataset for Exploring Explainable Causal Reasoning
@@ -479,6 +511,7 @@
COPA
CommonsenseQA
GenericsKB
+ 10.18653/v1/2022.acl-long.33
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension
@@ -508,6 +541,7 @@
CLOTH
NarrativeQA
RACE
+ 10.18653/v1/2022.acl-long.34
KaFSP: Knowledge-Aware Fuzzy Semantic Parsing for Conversational Question Answering over a Large-Scale Knowledge Base
@@ -519,6 +553,7 @@
2022.acl-long.35.software.zip
li-xiong-2022-kafsp
CSQA
+ 10.18653/v1/2022.acl-long.35
Multilingual Knowledge Graph Completion with Self-Supervised Adaptive Graph Alignment
@@ -536,6 +571,7 @@
2022.acl-long.36
huang-etal-2022-multilingual
amzn/ss-aga-kgc
+ 10.18653/v1/2022.acl-long.36
Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization
@@ -548,6 +584,7 @@
Automatic code summarization, which aims to describe the source code in natural language, has become an essential task in software maintenance. Our fellow researchers have attempted to achieve such a purpose through various machine learning-based approaches. One key challenge keeping these approaches from being practical is their failure to retain the semantic structure of source code, which has unfortunately been overlooked by the state of the art. Existing approaches resort to representing the syntax structure of code by modeling the Abstract Syntax Trees (ASTs). However, the hierarchical structures of ASTs have not been well explored. In this paper, we propose CODESCRIBE to model the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages the graph neural network and Transformer to preserve the structural and sequential information of code, respectively. In addition, we propose a pointer-generator network that pays attention to both the structure and sequential tokens of code for a better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our proposed approach when compared with several state-of-the-art baselines.
2022.acl-long.37
guo-etal-2022-modeling
+ 10.18653/v1/2022.acl-long.37
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding
@@ -575,6 +612,7 @@
MultiRC
SuperGLUE
WSC
+ 10.18653/v1/2022.acl-long.38
Learn to Adapt for Generalized Zero-Shot Text Classification
@@ -590,6 +628,7 @@
zhang-etal-2022-learn
quareia/lta
ATIS
+ 10.18653/v1/2022.acl-long.39
TableFormer: Robust Transformer Modeling for Table-Text Encoding
@@ -606,6 +645,7 @@
google-research/tapas
SQA
TabFact
+ 10.18653/v1/2022.acl-long.40
Perceiving the World: Question-guided Reinforcement Learning for Text-based Games
@@ -620,6 +660,7 @@
2022.acl-long.41
xu-etal-2022-perceiving
yunqiuxu/qwa
+ 10.18653/v1/2022.acl-long.41
Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization
@@ -635,6 +676,7 @@
jia-etal-2022-neural
MLSUM
WikiLingua
+ 10.18653/v1/2022.acl-long.42
Few-Shot Class-Incremental Learning for Named Entity Recognition
@@ -649,6 +691,7 @@
Previous work on class-incremental learning for Named Entity Recognition (NER) relies on the assumption that there exists an abundance of labeled data for the training of new classes. In this work, we study a more challenging but practical problem, i.e., few-shot class-incremental learning for NER, where an NER model is trained with only a few labeled samples of the new classes, without forgetting knowledge of the old ones. To alleviate the problem of catastrophic forgetting in few-shot class-incremental learning, we reconstruct synthetic training data of the old classes using the trained NER model, augmenting the training of new classes. We further develop a framework that distills from the existing model with both synthetic data and real data from the current training set. Experimental results show that our approach achieves significant improvements over existing baselines.
2022.acl-long.43
wang-etal-2022-shot
+ 10.18653/v1/2022.acl-long.43
Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation
@@ -666,6 +709,7 @@
2022.acl-long.44.software.zip
zhao-etal-2022-improving
PERSONA-CHAT
+ 10.18653/v1/2022.acl-long.44
Quality Controlled Paraphrase Generation
@@ -681,6 +725,7 @@
bandel-etal-2022-quality
ibm/quality-controlled-paraphrase-generation
COCO
+ 10.18653/v1/2022.acl-long.45
Controllable Dictionary Example Generation: Generating Example Sentences for Specific Targeted Audiences
@@ -691,6 +736,7 @@
2022.acl-long.46
2022.acl-long.46.software.zip
he-yiu-2022-controllable
+ 10.18653/v1/2022.acl-long.46
AraT5: Text-to-Text Transformers for Arabic Language Generation
@@ -703,6 +749,7 @@
nagoudi-etal-2022-arat5
C4
mC4
+ 10.18653/v1/2022.acl-long.47
Legal Judgment Prediction via Event Extraction with Constraints
@@ -715,6 +762,7 @@
2022.acl-long.48.software.zip
feng-etal-2022-legal
wapay/epm
+ 10.18653/v1/2022.acl-long.48
Answer-level Calibration for Free-form Multiple Choice Question Answering
@@ -733,6 +781,7 @@
SWAG
Social IQA
WinoGrande
+ 10.18653/v1/2022.acl-long.49
Learning When to Translate for Streaming Speech
@@ -747,6 +796,7 @@
dong-etal-2022-learning
dqqcasia/mosst
MuST-C
+ 10.18653/v1/2022.acl-long.50
Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking
@@ -759,6 +809,7 @@
yang-etal-2022-compact
yingrui-yang/ContextualQuantizer
MS MARCO
+ 10.18653/v1/2022.acl-long.51
Early Stopping Based on Unlabeled Samples in Text Classification
@@ -774,6 +825,7 @@
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-long.52
Meta-learning via Language Model In-context Tuning
@@ -788,6 +840,7 @@
chen-etal-2022-meta
yandachen/in-context-tuning
LAMA
+ 10.18653/v1/2022.acl-long.53
It is AI’s Turn to Ask Humans a Question: Question-Answer Pair Generation for Children’s Story Books
@@ -806,6 +859,7 @@
MS MARCO
NarrativeQA
PAQ
+ 10.18653/v1/2022.acl-long.54
Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning
@@ -820,6 +874,7 @@
zhang-etal-2022-prompt
rz-zhang/prboost
AG News
+ 10.18653/v1/2022.acl-long.55
Constrained Multi-Task Learning for Bridging Resolution
@@ -831,6 +886,7 @@
2022.acl-long.56
kobayashi-etal-2022-constrained
juntaoy/dali-bridging
+ 10.18653/v1/2022.acl-long.56
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
@@ -846,6 +902,7 @@
FED
PERSONA-CHAT
Topical-Chat
+ 10.18653/v1/2022.acl-long.57
HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization
@@ -855,6 +912,7 @@
Document structure is critical for efficient information consumption. However, it is challenging to encode it efficiently into the modern Transformer architecture. In this work, we present HIBRIDS, which injects Hierarchical Biases foR Incorporating Document Structure into attention score calculation. We further present a new task, hierarchical question-summary generation, for summarizing salient content in the source document into a hierarchy of questions and summaries, where each follow-up question inquires about the content of its parent question-summary pair. We also annotate a new dataset with 6,153 question-summary hierarchies labeled on government reports. Experimental results show that our model produces better question-summary hierarchies than comparison systems on both hierarchy quality and content coverage, a finding also echoed by human judges. Additionally, our model improves the generation of long-form summaries from long government reports and Wikipedia articles, as measured by ROUGE scores.
2022.acl-long.58
cao-wang-2022-hibrids
+ 10.18653/v1/2022.acl-long.58
De-Bias for Generative Extraction in Unified NER Task
@@ -868,6 +926,7 @@
2022.acl-long.59
zhang-etal-2022-de
GENIA
+ 10.18653/v1/2022.acl-long.59
An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels
@@ -890,6 +949,7 @@
CommonsenseQA
IMDb Movie Reviews
LAMBADA
+ 10.18653/v1/2022.acl-long.60
Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation
@@ -902,6 +962,7 @@
wang-etal-2022-expanding
cindyxinyiwang/expand-via-lexicon-based-adaptation
MasakhaNER
+ 10.18653/v1/2022.acl-long.61
Language-agnostic BERT Sentence Embedding
@@ -918,6 +979,7 @@
MPQA Opinion Corpus
SST
SentEval
+ 10.18653/v1/2022.acl-long.62
Nested Named Entity Recognition with Span-level Graphs
@@ -930,6 +992,7 @@
2022.acl-long.63
wan-etal-2022-nested
GENIA
+ 10.18653/v1/2022.acl-long.63
CogTaskonomy: Cognitively Inspired Task Taxonomy Is Beneficial to Transfer Learning in NLP
@@ -944,6 +1007,7 @@
GLUE
QNLI
Taskonomy
+ 10.18653/v1/2022.acl-long.64
RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining
@@ -959,6 +1023,7 @@
2022.acl-long.65
2022.acl-long.65.software.zip
su-etal-2022-rocbert
+ 10.18653/v1/2022.acl-long.65
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
@@ -982,6 +1047,7 @@
SNLI-VE
VCR
Visual Question Answering
+ 10.18653/v1/2022.acl-long.66
Parallel Instance Query Network for Named Entity Recognition
@@ -1006,6 +1072,7 @@
GENIA
NNE
OntoNotes 5.0
+ 10.18653/v1/2022.acl-long.67
ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation
@@ -1022,6 +1089,7 @@
liu-etal-2022-prophetchat
DailyDialog
PERSONA-CHAT
+ 10.18653/v1/2022.acl-long.68
Modeling Multi-hop Question Answering as Single Sequence Prediction
@@ -1037,6 +1105,7 @@
HotpotQA
IIRC
SQuAD
+ 10.18653/v1/2022.acl-long.69
Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension
@@ -1057,6 +1126,7 @@
TyDi QA
TyDiQA-GoldP
XQuAD
+ 10.18653/v1/2022.acl-long.70
Multi-Granularity Structural Knowledge Distillation for Language Model Compression
@@ -1074,6 +1144,7 @@
MRPC
QNLI
SST
+ 10.18653/v1/2022.acl-long.71
Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts
@@ -1086,6 +1157,7 @@
guo-etal-2022-auto
CrowS-Pairs
GLUE
+ 10.18653/v1/2022.acl-long.72
Where to Go for the Holidays: Towards Mixed-Type Dialogs for Clarification of User Goals
@@ -1103,6 +1175,7 @@
DuRecDial
KdConv
MultiWOZ
+ 10.18653/v1/2022.acl-long.73
Semi-supervised Domain Adaptation for Dependency Parsing with Dynamic Matching Network
@@ -1113,6 +1186,7 @@
Supervised parsing models have achieved impressive results on in-domain texts. However, their performance drops drastically on out-of-domain texts due to the data distribution shift. The shared-private model has shown promising advantages for alleviating this problem via feature separation, whereas prior work pays more attention to enhancing shared features while neglecting the in-depth relevance of specific ones. To address this issue, we apply, for the first time, a dynamic matching network to the shared-private model for semi-supervised cross-domain dependency parsing. Meanwhile, considering the scarcity of target-domain labeled data, we leverage unlabeled data from two aspects, i.e., designing a new training strategy to improve the capability of the dynamic matching network and fine-tuning BERT to obtain domain-related contextualized representations. Experiments on benchmark datasets show that our proposed model consistently outperforms various baselines, leading to new state-of-the-art results on all domains. Detailed analysis of different matching strategies demonstrates that it is essential to learn suitable matching weights to emphasize useful features and ignore useless or even harmful ones. Besides, our proposed model can be directly extended to multi-source domain adaptation and achieves the best performance among various baselines, further verifying its effectiveness and robustness.
2022.acl-long.74
li-etal-2022-semi
+ 10.18653/v1/2022.acl-long.74
A Closer Look at How Fine-tuning Changes BERT
@@ -1123,6 +1197,7 @@
2022.acl-long.75
zhou-srikumar-2022-closer
utahnlp/BERT-fine-tuning-analysis
+ 10.18653/v1/2022.acl-long.75
Sentence-aware Contrastive Learning for Open-Domain Passage Retrieval
@@ -1137,6 +1212,7 @@
Natural Questions
SQuAD
TriviaQA
+ 10.18653/v1/2022.acl-long.76
FaiRR: Faithful and Robust Deductive Reasoning over Natural Language
@@ -1150,6 +1226,7 @@
sanyal-etal-2022-fairr
ink-usc/fairr
ProofWriter
+ 10.18653/v1/2022.acl-long.77
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
@@ -1171,6 +1248,7 @@
FinQA
ToTTo
WikiSQL
+ 10.18653/v1/2022.acl-long.78
Doctor Recommendation in Online Health Forums via Expertise Learning
@@ -1183,6 +1261,7 @@
2022.acl-long.79
lu-etal-2022-doctor
polyusmart/doctor-recommendation
+ 10.18653/v1/2022.acl-long.79
Continual Prompt Tuning for Dialog State Tracking
@@ -1197,6 +1276,7 @@
2022.acl-long.80.software.zip
zhu-etal-2022-continual
thu-coai/cpt4dst
+ 10.18653/v1/2022.acl-long.80
There’s a Time and Place for Reasoning Beyond the Image
@@ -1211,6 +1291,7 @@
fu-etal-2022-theres
zeyofu/tara
WIT
+ 10.18653/v1/2022.acl-long.81
FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining
@@ -1227,6 +1308,7 @@
2022.acl-long.82.software.zip
cheng-etal-2022-fortap
microsoft/TUTA_table_understanding
+ 10.18653/v1/2022.acl-long.82
Multimodal fusion via cortical network inspired losses
@@ -1235,6 +1317,7 @@
Information integration from different modalities is an active area of research. Human beings and, in general, biological neural systems are quite adept at using a multitude of signals from different sensory perceptive fields to interact with the environment and each other. Recent work in deep fusion models via neural networks has led to substantial improvements over unimodal approaches in areas like speech recognition, emotion recognition and analysis, captioning and image description. However, such research has mostly focused on architectural changes allowing for fusion of different modalities while keeping the model complexity manageable. Inspired by neuroscientific ideas about multisensory integration and processing, we investigate the effect of introducing neural dependencies in the loss functions. Experiments on multimodal sentiment analysis tasks with different models show that our approach provides a consistent performance boost.
2022.acl-long.83
shankar-2022-multimodal
+ 10.18653/v1/2022.acl-long.83
Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension
@@ -1252,6 +1335,7 @@
zhang-etal-2022-modeling
RecipeQA
Visual Question Answering
+ 10.18653/v1/2022.acl-long.84
Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning
@@ -1264,6 +1348,7 @@
2022.acl-long.85.software.zip
saha-etal-2022-explanation
swarnahub/explagraphgen
+ 10.18653/v1/2022.acl-long.85
Unsupervised Extractive Opinion Summarization Using Sparse Coding
@@ -1276,6 +1361,7 @@
2022.acl-long.86.software.zip
basu-roy-chowdhury-etal-2022-unsupervised
brcsomnath/semae
+ 10.18653/v1/2022.acl-long.86
LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution
@@ -1289,6 +1375,7 @@
2022.acl-long.87.software.zip
michalopoulos-etal-2022-lexsubcon
gmichalo/lexsubcon
+ 10.18653/v1/2022.acl-long.87
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
@@ -1307,6 +1394,7 @@
zhou-etal-2022-think
ConceptNet
MuTual
+ 10.18653/v1/2022.acl-long.88
Flow-Adapter Architecture for Unsupervised Machine Translation
@@ -1317,6 +1405,7 @@
In this work, we propose a flow-adapter architecture for unsupervised NMT. It leverages normalizing flows to explicitly model the distributions of sentence-level latent representations, which are subsequently used in conjunction with the attention mechanism for the translation task. The primary novelties of our model are: (a) capturing language-specific sentence representations separately for each language using normalizing flows and (b) using a simple transformation of these latent representations for translating from one language to another. This architecture allows for unsupervised training of each language independently. While there is prior work on latent variables for supervised MT, to the best of our knowledge, this is the first work that uses latent variables and normalizing flows for unsupervised MT. We obtain competitive results on several unsupervised MT benchmarks.
2022.acl-long.89
liu-etal-2022-flow
+ 10.18653/v1/2022.acl-long.89
Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
@@ -1330,6 +1419,7 @@
complementizer/rl-sentence-compression
NEWSROOM
Sentence Compression
+ 10.18653/v1/2022.acl-long.90
Tracing Origins: Coreference-aware Machine Reading Comprehension
@@ -1346,6 +1436,7 @@
Quoref
SQuAD
SearchQA
+ 10.18653/v1/2022.acl-long.91
WatClaimCheck: A new Dataset for Claim Entailment and Inference
@@ -1357,6 +1448,7 @@
2022.acl-long.92
khan-etal-2022-watclaimcheck
PUBHEALTH
+ 10.18653/v1/2022.acl-long.92
FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation
@@ -1369,6 +1461,7 @@
2022.acl-long.93
kamal-eddine-etal-2022-frugalscore
CNN/Daily Mail
+ 10.18653/v1/2022.acl-long.93
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
@@ -1385,6 +1478,7 @@
narayan-etal-2022-well
google-research/language
SQuAD
+ 10.18653/v1/2022.acl-long.94
Synthetic Question Value Estimation for Domain Adaptation of Question Answering
@@ -1400,6 +1494,7 @@
Natural Questions
NewsQA
TriviaQA
+ 10.18653/v1/2022.acl-long.95
Better Language Model with Hypernym Class Prediction
@@ -1415,6 +1510,7 @@
richardbaihe/robustlm
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.96
Tackling Fake News Detection by Continually Improving Social Context Representations using Graph Neural Networks
@@ -1426,6 +1522,7 @@
2022.acl-long.97
mehta-etal-2022-tackling
hockeybro12/fakenews_inference_operators
+ 10.18653/v1/2022.acl-long.97
Understanding Gender Bias in Knowledge Base Embeddings
@@ -1439,6 +1536,7 @@
Knowledge base (KB) embeddings have been shown to contain gender biases. In this paper, we study two questions regarding these biases: how to quantify them, and how to trace their origins in the KB. Specifically, first, we develop two novel bias measures, respectively for a group of person entities and for an individual person entity. Evidence of their validity is observed by comparison with real-world census data. Second, we use the influence function to inspect the contribution of each triple in the KB to the overall group bias. To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings.
2022.acl-long.98
du-etal-2022-understanding
+ 10.18653/v1/2022.acl-long.98
Computational Historical Linguistics and Language Diversity in South Asia
@@ -1451,6 +1549,7 @@
2022.acl-long.99
arora-etal-2022-computational
Universal Dependencies
+ 10.18653/v1/2022.acl-long.99
Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
@@ -1465,6 +1564,7 @@
ladhak-etal-2022-faithful
fladhak/effective-faithfulness
WikiHow
+ 10.18653/v1/2022.acl-long.100
Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang
@@ -1478,6 +1578,7 @@
2022.acl-long.101.software.zip
keidar-etal-2022-slangvolution
andreasopedal/slangvolution
+ 10.18653/v1/2022.acl-long.101
Spurious Correlations in Reference-Free Evaluation of Text Generation
@@ -1491,6 +1592,7 @@
esdurmus/adversarial_eval
DailyDialog
PERSONA-CHAT
+ 10.18653/v1/2022.acl-long.102
On The Ingredients of an Effective Zero-shot Semantic Parser
@@ -1502,6 +1604,7 @@
Semantic parsers map natural language utterances into meaning representations (e.g., programs). Such models are typically bottlenecked by the paucity of training data due to the laborious annotation efforts required. Recent studies have performed zero-shot learning by synthesizing training examples of canonical utterances and programs from a grammar, and further paraphrasing these utterances to improve linguistic diversity. However, such synthetic examples cannot fully capture patterns in real data. In this paper we analyze zero-shot parsers through the lenses of the language and logical gaps (Herzig and Berant, 2019), which quantify the discrepancy in language and programmatic patterns between the canonical examples and real-world user-issued ones. We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods using canonical examples that most likely reflect real user intents. Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
2022.acl-long.103
yin-etal-2022-ingredients
+ 10.18653/v1/2022.acl-long.103
Bias Mitigation in Machine Translation Quality Estimation
@@ -1516,6 +1619,7 @@
agesb/transquest
MLQE-PE
WikiMatrix
+ 10.18653/v1/2022.acl-long.104
Unified Speech-Text Pre-training for Speech Translation and Recognition
@@ -1537,6 +1641,7 @@
Libri-Light
LibriSpeech
MuST-C
+ 10.18653/v1/2022.acl-long.105
Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability
@@ -1548,6 +1653,7 @@
2022.acl-long.106
fujinuma-etal-2022-match
XNLI
+ 10.18653/v1/2022.acl-long.106
Structured Pruning Learns Compact and Accurate Models
@@ -1566,6 +1672,7 @@
QNLI
SQuAD
SST
+ 10.18653/v1/2022.acl-long.107
How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language
@@ -1577,6 +1684,7 @@
2022.acl-long.108
zhang-etal-2022-nlp
zhangshiyue/revitalizecherokee
+ 10.18653/v1/2022.acl-long.108
Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization
@@ -1588,6 +1696,7 @@
The IMPRESSIONS section of a radiology report about an imaging study is a summary of the radiologist’s reasoning and conclusions, and it also aids the referring physician in confirming or excluding certain diagnoses. A cascade of tasks is required to automatically generate an abstractive summary of the typical information-rich radiology report. These tasks include acquisition of salient content from the report and generation of a concise, easily consumable IMPRESSIONS section. Prior research on radiology report summarization has focused on single-step end-to-end models, which subsume the task of salient content acquisition. To fully explore the cascade structure and explainability of radiology report summarization, we introduce two innovations. First, we design a two-step approach: extractive summarization followed by abstractive summarization. Second, we additionally break down the extractive part into two independent tasks: extraction of salient (1) sentences and (2) keywords. Experiments on English radiology reports from two clinical sites show our novel approach leads to a more precise summary compared to single-step and to two-step-with-single-extractive-process baselines, with an overall improvement in F1 score of 3-4%.
2022.acl-long.109
karn-etal-2022-differentiable
+ 10.18653/v1/2022.acl-long.109
Online Semantic Parsing for Latency Reduction in Task-Oriented Dialogue
@@ -1601,6 +1710,7 @@
Standard conversational semantic parsing maps a complete user utterance into an executable program, after which the program is executed to respond to the user. This could be slow when the program contains expensive function calls. We investigate the opportunity to reduce latency by predicting and executing function calls while the user is still speaking. We introduce the task of online semantic parsing for this purpose, with a formal latency reduction metric inspired by simultaneous machine translation. We propose a general framework with first a learned prefix-to-program prediction module, and then a simple yet effective thresholding heuristic for subprogram selection for early execution. Experiments on the SMCalFlow and TreeDST datasets show our approach achieves large latency reduction with good parsing quality, with a 30%–65% latency reduction depending on function execution time and allowed cost.
2022.acl-long.110
zhou-etal-2022-online
+ 10.18653/v1/2022.acl-long.110
Few-Shot Tabular Data Enrichment Using Fine-Tuned Transformer Architectures
@@ -1611,6 +1721,7 @@
2022.acl-long.111
2022.acl-long.111.software.zip
harari-katz-2022-shot
+ 10.18653/v1/2022.acl-long.111
Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
@@ -1630,6 +1741,7 @@
psunlpgroup/summ-n
GovReport
QMSum
+ 10.18653/v1/2022.acl-long.112
Open Domain Question Answering with A Unified Knowledge Interface
@@ -1648,6 +1760,7 @@
Natural Questions
OTT-QA
WebQuestions
+ 10.18653/v1/2022.acl-long.113
Principled Paraphrase Generation with Parallel Corpora
@@ -1661,6 +1774,7 @@
2022.acl-long.114
ormazabal-etal-2022-principled
aitorormazabal/paraphrasing-from-parallel
+ 10.18653/v1/2022.acl-long.114
GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems
@@ -1677,6 +1791,7 @@
2022.acl-long.115.software.zip
ding-etal-2022-globalwoz
MultiWOZ
+ 10.18653/v1/2022.acl-long.115
Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation
@@ -1691,6 +1806,7 @@
dmcb-gist/doktra
BLUE
HOC
+ 10.18653/v1/2022.acl-long.116
Retrieval-guided Counterfactual Generation for QA
@@ -1709,6 +1825,7 @@
Quoref
SQuAD
TriviaQA
+ 10.18653/v1/2022.acl-long.117
DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
@@ -1729,6 +1846,7 @@
yale-lily/dyle
GovReport
QMSum
+ 10.18653/v1/2022.acl-long.118
Searching for fingerspelled content in American Sign Language
@@ -1740,6 +1858,7 @@
Natural language processing for sign language video—including tasks like recognition, translation, and search—is crucial for making artificial intelligence technologies accessible to deaf individuals, and has been gaining research interest in recent years. In this paper, we address the problem of searching for fingerspelled keywords or key phrases in raw sign language videos. This is an important task since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before. We propose an end-to-end model for this task, FSS-Net, that jointly detects fingerspelling and matches it to a text sequence. Our experiments, done on a large public dataset of ASL fingerspelling in the wild, show the importance of fingerspelling detection as a component of a search and retrieval model. Our model significantly outperforms baseline methods adapted from prior work on related tasks.
2022.acl-long.119
shi-etal-2022-searching
+ 10.18653/v1/2022.acl-long.119
Skill Induction and Planning with Latent Language
@@ -1751,6 +1870,7 @@
2022.acl-long.120
sharma-etal-2022-skill
ALFRED
+ 10.18653/v1/2022.acl-long.120
Fully-Semantic Parsing and Generation: the BabelNet Meaning Representation
@@ -1762,6 +1882,7 @@
2022.acl-long.121
martinez-lorenzo-etal-2022-fully
sapienzanlp/bmr
+ 10.18653/v1/2022.acl-long.121
Leveraging Similar Users for Personalized Language Modeling with Limited Data
@@ -1774,6 +1895,7 @@
Personalized language models are designed and trained to capture language patterns specific to individual users. This makes them more accurate at predicting what a user will write. However, when a new user joins a platform and not enough text is available, it is harder to build effective personalized language models. We propose a solution for this problem, using a model trained on users that are similar to a new user. In this paper, we explore strategies for finding the similarity between new users and existing ones and methods for using the data from existing users who are a good match. We further explore the trade-off between available data for new users and how well their language can be modeled.
2022.acl-long.122
welch-etal-2022-leveraging
+ 10.18653/v1/2022.acl-long.122
DEEP: DEnoising Entity Pre-training for Neural Machine Translation
@@ -1786,6 +1908,7 @@
2022.acl-long.123
hu-etal-2022-deep
ParaCrawl
+ 10.18653/v1/2022.acl-long.123
Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network
@@ -1802,6 +1925,7 @@
2022.acl-long.124
2022.acl-long.124.software.zip
liang-etal-2022-multi
+ 10.18653/v1/2022.acl-long.124
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
@@ -1817,6 +1941,7 @@
CoNLL-2003
GLUE
MasakhaNER
+ 10.18653/v1/2022.acl-long.125
Toward Annotator Group Bias in Crowdsourcing
@@ -1832,6 +1957,7 @@
Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though a few works have investigated individual annotator bias, the group effects among annotators are largely overlooked. In this work, we reveal that annotators within the same demographic group tend to show consistent group bias in annotation tasks, and thus we conduct an initial study on annotator group bias. We first empirically verify the existence of annotator group bias in various real-world crowdsourcing datasets. Then, we develop a novel probabilistic graphical framework GroupAnno to capture annotator group bias with an extended Expectation Maximization (EM) algorithm. We conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the effectiveness of our model in modeling annotator group bias in label aggregation and model learning over competitive baselines.
2022.acl-long.126
liu-etal-2022-toward
+ 10.18653/v1/2022.acl-long.126
Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation
@@ -1847,6 +1973,7 @@
mgaido91/FBK-fairseq-ST
Europarl-ST
WinoBias
+ 10.18653/v1/2022.acl-long.127
Answering Open-Domain Multi-Answer Questions via a Recall-then-Verify Framework
@@ -1858,6 +1985,7 @@
shao-huang-2022-answering
zhihongshao/rectify
Natural Questions
+ 10.18653/v1/2022.acl-long.128
Probing as Quantifying Inductive Bias
@@ -1871,6 +1999,7 @@
immer-etal-2022-probing
BoolQ
SuperGLUE
+ 10.18653/v1/2022.acl-long.129
Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
@@ -1892,6 +2021,7 @@
TyDi QA
XNLI
XQuAD
+ 10.18653/v1/2022.acl-long.130
GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models
@@ -1906,6 +2036,7 @@
2022.acl-long.131.software.zip
li-etal-2022-gpt
linguisticanomalies/hammer-nets
+ 10.18653/v1/2022.acl-long.131
An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models
@@ -1921,6 +2052,7 @@
CrowS-Pairs
StereoSet
WikiText-2
+ 10.18653/v1/2022.acl-long.132
Exploring and Adapting Chinese GPT to Pinyin Input Method
@@ -1937,6 +2069,7 @@
2022.acl-long.133
tan-etal-2022-exploring
VisualJoyce/Transformers4IME
+ 10.18653/v1/2022.acl-long.133
Enhancing Cross-lingual Natural Language Inference by Prompt-learning from Cross-lingual Templates
@@ -1951,6 +2084,7 @@
qi-etal-2022-enhancing
qikunxun/pct
PAWS-X
+ 10.18653/v1/2022.acl-long.134
Sense Embeddings are also Biased – Evaluating Social Biases in Static and Contextualised Sense Embeddings
@@ -1964,6 +2098,7 @@
zhou-etal-2022-sense
CrowS-Pairs
StereoSet
+ 10.18653/v1/2022.acl-long.135
Hybrid Semantics for Goal-Directed Natural Language Generation
@@ -1973,6 +2108,7 @@
We consider the problem of generating natural language given a communicative goal and a world description. We ask the question: is it possible to combine complementary meaning representations to scale a goal-directed NLG system without losing expressiveness? In particular, we consider using two meaning representations, one based on logical semantics and the other based on distributional semantics. We build upon an existing goal-directed generation system, S-STRUCT, which models sentence generation as planning in a Markov decision process. We develop a hybrid approach, which uses distributional semantics to quickly and imprecisely add the main elements of the sentence and then uses first-order logic based semantics to more slowly add the precise details. We find that our hybrid method allows S-STRUCT’s generation to scale significantly better in early phases of generation and that the hybrid can often generate sentences with the same quality as S-STRUCT in substantially less time. However, we also observe and give insight into cases where the imprecision in distributional semantics leads to generation that is not as good as using pure logical semantics.
2022.acl-long.136
baumler-ray-2022-hybrid
+ 10.18653/v1/2022.acl-long.136
Predicting Intervention Approval in Clinical Trials through Multi-Document Summarization
@@ -1982,6 +2118,7 @@
Clinical trials offer a fundamental opportunity to discover new treatments and advance medical knowledge. However, the uncertainty of the outcome of a trial can lead to unforeseen costs and setbacks. In this study, we propose a new method to predict the effectiveness of an intervention in a clinical trial. Our method relies on generating an informative summary from multiple documents available in the literature about the intervention under study. Specifically, our method first gathers all the abstracts of PubMed articles related to the intervention. Then, an evidence sentence, which conveys information about the effectiveness of the intervention, is extracted automatically from each abstract. Based on the set of evidence sentences extracted from the abstracts, a short summary about the intervention is constructed. Finally, the produced summaries are used to train a BERT-based classifier, in order to infer the effectiveness of an intervention. To evaluate our proposed method, we introduce a new dataset which is a collection of clinical trials together with their associated PubMed articles. Our experiments demonstrate the effectiveness of producing short informative summaries and using them to predict the effectiveness of an intervention.
2022.acl-long.137
katsimpras-paliouras-2022-predicting
+ 10.18653/v1/2022.acl-long.137
BiTIIMT: A Bilingual Text-infilling Method for Interactive Machine Translation
@@ -1997,6 +2134,7 @@
2022.acl-long.138
xiao-etal-2022-bitiimt
WMT 2014
+ 10.18653/v1/2022.acl-long.138
Distributionally Robust Finetuning BERT for Covariate Drift in Spoken Language Understanding
@@ -2007,6 +2145,7 @@
In this study, we investigate robustness against covariate drift in spoken language understanding (SLU). Covariate drift can occur in SLU when there is a drift between training and testing regarding what users request or how they request it. To study this, we propose a method that exploits natural variations in data to create a covariate drift in SLU datasets. Experiments show that a state-of-the-art BERT-based model suffers performance loss under this drift. To mitigate the performance loss, we investigate distributionally robust optimization (DRO) for finetuning BERT-based models. We discuss some recent DRO methods, propose two new variants and empirically show that DRO improves robustness under drift.
2022.acl-long.139
broscheit-etal-2022-distributionally
+ 10.18653/v1/2022.acl-long.139
Enhancing Chinese Pre-trained Language Model via Heterogeneous Linguistics Graph
@@ -2026,6 +2165,7 @@
CMRC
CMRC 2018
DRCD
+ 10.18653/v1/2022.acl-long.140
Divide and Denoise: Learning from Noisy Labels in Fine-Grained Entity Typing with Cluster-Wise Loss Correction
@@ -2037,6 +2177,7 @@
Fine-grained Entity Typing (FET) has made great progress based on distant supervision but still suffers from label noise. Existing FET noise learning methods rely on prediction distributions in an instance-independent manner, which causes the problem of confirmation bias. In this work, we propose a clustering-based loss correction framework named Feature Cluster Loss Correction (FCLC) to address these two problems. FCLC first trains a coarse backbone model as a feature extractor and noise estimator. Loss correction is then applied to each feature cluster, learning directly from the noisy labels. Experimental results on three public datasets show that FCLC achieves the best performance over existing competitive systems. Auxiliary experiments further demonstrate that FCLC is stable with respect to hyperparameters and that it does help mitigate confirmation bias. We also find that in the extreme case of no clean data, the FCLC framework still achieves competitive performance.
2022.acl-long.141
pang-etal-2022-divide
+ 10.18653/v1/2022.acl-long.141
Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation
@@ -2055,6 +2196,7 @@
ConceptNet
SParC
WikiSQL
+ 10.18653/v1/2022.acl-long.142
Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation
@@ -2067,6 +2209,7 @@
ictnlp/cokd
CIFAR-10
CIFAR-100
+ 10.18653/v1/2022.acl-long.143
Metaphors in Pre-Trained Language Models: Probing and Generalization Across Datasets and Languages
@@ -2079,6 +2222,7 @@
2022.acl-long.144.software.zip
aghazadeh-etal-2022-metaphors
ehsanaghazadeh/metaphors_in_plms
+ 10.18653/v1/2022.acl-long.144
Discrete Opinion Tree Induction for Aspect-based Sentiment Analysis
@@ -2092,6 +2236,7 @@
2022.acl-long.145.software.zip
chen-etal-2022-discrete
MAMS
+ 10.18653/v1/2022.acl-long.145
Investigating Non-local Features for Neural Constituency Parsing
@@ -2104,6 +2249,7 @@
2022.acl-long.146.software.zip
cui-etal-2022-investigating
ringos/nfc-parser
+ 10.18653/v1/2022.acl-long.146
Learning from Sibling Mentions with Scalable Graph Inference in Fine-Grained Entity Typing
@@ -2119,6 +2265,7 @@
2022.acl-long.147
2022.acl-long.147.software.zip
chen-etal-2022-learning-sibling
+ 10.18653/v1/2022.acl-long.147
A Variational Hierarchical Model for Neural Cross-Lingual Summarization
@@ -2136,6 +2283,7 @@
liang-etal-2022-variational
xl2248/vhm
LCSTS
+ 10.18653/v1/2022.acl-long.148
On the Robustness of Question Rewriting Systems to Questions of Varying Hardness
@@ -2149,6 +2297,7 @@
ye-etal-2022-robustness
CANARD
QuAC
+ 10.18653/v1/2022.acl-long.149
OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages
@@ -2165,6 +2314,7 @@
AUTSL
GSL
WLASL
+ 10.18653/v1/2022.acl-long.150
bert2BERT: Towards Reusable Pretrained Language Models
@@ -2185,6 +2335,7 @@
BookCorpus
CoLA
GLUE
+ 10.18653/v1/2022.acl-long.151
Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
@@ -2196,6 +2347,7 @@
2022.acl-long.152
ling-etal-2022-vision
nustm/vlp-mabsa
+ 10.18653/v1/2022.acl-long.152
"You might think about slightly revising the title”: Identifying Hedges in Peer-tutoring Interactions
@@ -2206,6 +2358,7 @@
Hedges have an important role in the management of rapport. In peer-tutoring, they are notably used by tutors in dyads experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with teenagers in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features as well as the benefits of such a hybrid approach.
2022.acl-long.153
raphalen-etal-2022-might
+ 10.18653/v1/2022.acl-long.153
Efficient Cluster-Based k-Nearest-Neighbor Machine Translation
@@ -2221,6 +2374,7 @@
wang-etal-2022-efficient
tjunlp-lab/pckmt
WikiMatrix
+ 10.18653/v1/2022.acl-long.154
Headed-Span-Based Projective Dependency Parsing
@@ -2232,6 +2386,7 @@
yang-tu-2022-headed
sustcsonglin/span-based-dependency-parsing
Penn Treebank
+ 10.18653/v1/2022.acl-long.155
Decoding Part-of-Speech from Human EEG Signals
@@ -2243,6 +2398,7 @@
This work explores techniques to predict Part-of-Speech (PoS) tags from neural signals measured at millisecond resolution with electroencephalography (EEG) during text reading. We first show that information about word length, frequency and word class is encoded by the brain at different post-stimulus latencies. We then demonstrate that pre-training on averaged EEG data and data augmentation techniques boost PoS decoding accuracy for single EEG trials. Finally, applying optimised temporally-resolved decoding techniques we show that Transformers substantially outperform linear-SVMs on PoS tagging of unigram and bigram data.
2022.acl-long.156
murphy-etal-2022-decoding
+ 10.18653/v1/2022.acl-long.156
Robust Lottery Tickets for Pre-trained Language Models
@@ -2263,6 +2419,7 @@
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-long.157
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification
@@ -2282,6 +2439,7 @@
thunlp/knowledgeableprompttuning
C4
IMDb Movie Reviews
+ 10.18653/v1/2022.acl-long.158
Cross-Lingual Contrastive Learning for Fine-Grained Entity Typing for Low-Resource Languages
@@ -2301,6 +2459,7 @@
thunlp/crosset
Few-NERD
Open Entity
+ 10.18653/v1/2022.acl-long.159
MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER
@@ -2317,6 +2476,7 @@
2022.acl-long.160.software.zip
zhou-etal-2022-melm
randyzhouran/melm
+ 10.18653/v1/2022.acl-long.160
Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings
@@ -2332,6 +2492,7 @@
2022.acl-long.161
2022.acl-long.161.software.zip
dasgupta-etal-2022-word2box
+ 10.18653/v1/2022.acl-long.161
IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks
@@ -2348,6 +2509,7 @@
cheng-etal-2022-iam
liyingcheng95/iam
IAM Dataset
+ 10.18653/v1/2022.acl-long.162
PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation
@@ -2361,6 +2523,7 @@
Despite recent progress of pre-trained language models on generating fluent text, existing methods still suffer from incoherence problems in long-form text generation tasks that require proper content control and planning to form a coherent high-level logical flow. In this work, we propose PLANET, a novel generation framework leveraging an autoregressive self-attention mechanism to conduct content planning and surface realization dynamically. To guide the generation of output sentences, our framework enriches the Transformer decoder with latent representations to maintain sentence-level semantic plans grounded by bag-of-words. Moreover, we introduce a new coherence-based contrastive learning objective to further improve the coherence of the output. Extensive experiments are conducted on two challenging long-form text generation tasks, including counterargument generation and opinion article generation. Both automatic and human evaluations show that our method significantly outperforms strong baselines and generates more coherent texts with richer content.
2022.acl-long.163
hu-etal-2022-planet
+ 10.18653/v1/2022.acl-long.163
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
@@ -2376,6 +2539,7 @@
2022.acl-long.164
2022.acl-long.164.software.zip
ke-etal-2022-ctrleval
+ 10.18653/v1/2022.acl-long.164
Beyond the Granularity: Multi-Perspective Dialogue Collaborative Selection for Dialogue State Tracking
@@ -2390,6 +2554,7 @@
2022.acl-long.165.software.zip
guo-etal-2022-beyond
guojinyu88/dicos-master
+ 10.18653/v1/2022.acl-long.165
Are Prompt-based Models Clueless?
@@ -2403,6 +2568,7 @@
GLUE
SNLI
SuperGLUE
+ 10.18653/v1/2022.acl-long.166
Learning Confidence for Transformer-based Neural Machine Translation
@@ -2417,6 +2583,7 @@
2022.acl-long.167.software.zip
lu-etal-2022-learning
yulu-dada/learned-conf-nmt
+ 10.18653/v1/2022.acl-long.167
Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
@@ -2432,6 +2599,7 @@
xxxiaol/spatial-commonsense
COCO
Relative Size
+ 10.18653/v1/2022.acl-long.168
Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
@@ -2447,6 +2615,7 @@
2022.acl-long.169
zhang-etal-2022-conditional
songmzhang/cbmi
+ 10.18653/v1/2022.acl-long.169
ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer
@@ -2465,6 +2634,7 @@
MPQA Opinion Corpus
SNLI
WikiQA
+ 10.18653/v1/2022.acl-long.170
Bottom-Up Constituency Parsing and Nested Named Entity Recognition with Pointer Networks
@@ -2477,6 +2647,7 @@
sustcsonglin/pointer-net-for-nested
GENIA
Penn Treebank
+ 10.18653/v1/2022.acl-long.171
Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation
@@ -2489,6 +2660,7 @@
Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models, which eases the training of NAT models at the cost of losing important information for translating low-frequency words. In this work, we provide an appealing alternative for NAT – monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data. Monolingual KD is able to transfer both the knowledge of the original bilingual data (implicitly encoded in the trained AT teacher model) and that of the new monolingual data to the NAT student model. Extensive experiments on eight WMT benchmarks over two advanced NAT models show that monolingual KD consistently outperforms standard KD by improving low-frequency word translation, without introducing any additional computational cost. Monolingual KD enjoys desirable expandability: it can be further enhanced (given more computational budget) by combining it with standard KD, a reverse monolingual KD, or by enlarging the scale of the monolingual data. Extensive analyses demonstrate that these techniques can be used together profitably to further recall the useful information lost in standard KD. Encouragingly, combined with standard KD, our approach achieves 30.4 and 34.1 BLEU points on the WMT14 English-German and German-English datasets, respectively. Our code and trained models are freely available at https://github.com/alphadl/RLFW-NAT.mono.
2022.acl-long.172
ding-etal-2022-redistributing
+ 10.18653/v1/2022.acl-long.172
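A schematic of the monolingual-KD data flow described above. The `train_at`, `train_nat`, and teacher-translate helpers are hypothetical placeholders for whatever NMT toolkit is in use; only the wiring of the datasets follows the abstract.

```python
from typing import Callable, List, Tuple

Bitext = List[Tuple[str, str]]

def monolingual_kd(
    bilingual: Bitext,                 # original parallel data
    monolingual_src: List[str],        # external source-side monolingual data
    train_at: Callable[[Bitext], Callable[[str], str]],
    train_nat: Callable[[Bitext], object],
):
    # 1. Train an autoregressive (AT) teacher on the original bitext.
    at_teacher = train_at(bilingual)
    # 2. Distill: let the teacher translate the external monolingual data.
    distilled = [(src, at_teacher(src)) for src in monolingual_src]
    # 3. Train the NAT student on the distilled pairs; the teacher's
    #    knowledge of the bitext is carried implicitly in its outputs.
    return train_nat(distilled)
```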
Dependency Parsing as MRC-based Span-Span Prediction
@@ -2507,6 +2679,7 @@
ShannonAI/mrc-for-dependency-parsing
Penn Treebank
Universal Dependencies
+ 10.18653/v1/2022.acl-long.173
Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis
@@ -2516,6 +2689,7 @@
Cross-domain sentiment analysis has achieved promising results with the help of pre-trained language models. Since the advent of GPT-3, prompt tuning has been widely explored to enable better semantic modeling in many natural language processing tasks. However, directly using a fixed predefined template for cross-domain research cannot model the different distributions of the [MASK] token in different domains, thus underusing the prompt tuning technique. In this paper, we propose a novel Adversarial Soft Prompt Tuning method (AdSPT) to better model cross-domain sentiment analysis. On the one hand, AdSPT adopts separate soft prompts instead of hard templates to learn different vectors for different domains, thus alleviating the domain discrepancy of the [MASK] token in the masked language modeling task. On the other hand, AdSPT uses a novel domain adversarial training strategy to learn domain-invariant representations between each source domain and the target domain. Experiments on a publicly available sentiment analysis dataset show that our model achieves new state-of-the-art results for both single-source and multi-source domain adaptation.
2022.acl-long.174
wu-shi-2022-adversarial
+ 10.18653/v1/2022.acl-long.174
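A minimal PyTorch sketch of the two ingredients the abstract above names: per-domain soft prompts (learned vectors prepended to the input embeddings) and adversarial training via gradient reversal, so the encoder's [MASK] representation becomes domain-invariant. Module names and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # reverse gradients into the encoder

class SoftPrompts(nn.Module):
    def __init__(self, n_domains, prompt_len, hidden):
        super().__init__()
        # One soft prompt (prompt_len x hidden) per domain, instead of a
        # single hard template shared across domains.
        self.prompts = nn.Parameter(torch.randn(n_domains, prompt_len, hidden) * 0.02)

    def forward(self, input_embeds, domain_ids):
        prompt = self.prompts[domain_ids]              # (B, P, H)
        return torch.cat([prompt, input_embeds], dim=1)

class DomainDiscriminator(nn.Module):
    def __init__(self, hidden, n_domains, lam=1.0):
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_domains))

    def forward(self, mask_repr):
        # Gradient reversal trains the encoder to *fool* this classifier,
        # encouraging domain-invariant [MASK] representations.
        return self.clf(GradReverse.apply(mask_repr, self.lam))
```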
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
@@ -2533,6 +2707,7 @@
allenai/scientific-claim-generation
FEVER
SciFact
+ 10.18653/v1/2022.acl-long.175
Modeling Dual Read/Write Paths for Simultaneous Machine Translation
@@ -2543,6 +2718,7 @@
2022.acl-long.176
zhang-feng-2022-modeling
ictnlp/dual-paths
+ 10.18653/v1/2022.acl-long.176
ExtEnD: Extractive Entity Disambiguation
@@ -2555,6 +2731,7 @@
barba-etal-2022-extend
sapienzanlp/extend
AIDA CoNLL-YAGO
+ 10.18653/v1/2022.acl-long.177
Hierarchical Sketch Induction for Paraphrase Generation
@@ -2570,6 +2747,7 @@
GLUE
Paralex
Quora Question Pairs
+ 10.18653/v1/2022.acl-long.178
Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction
@@ -2585,6 +2763,7 @@
kolluru-etal-2022-alignment
dair-iitd/moie
X-SRL
+ 10.18653/v1/2022.acl-long.179
Text-to-Table: A New Way of Information Extraction
@@ -2598,6 +2777,7 @@
shirley-wu/text_to_table
RotoWire
WikiBio
+ 10.18653/v1/2022.acl-long.180
Accelerating Code Search with Deep Hashing and Code Classification
@@ -2613,6 +2793,7 @@
2022.acl-long.181
gu-etal-2022-accelerating
CodeSearchNet
+ 10.18653/v1/2022.acl-long.181
Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions
@@ -2628,6 +2809,7 @@
2022.acl-long.182.software.zip
lin-etal-2022-roles
xiaolinandy/rods
+ 10.18653/v1/2022.acl-long.182
ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification
@@ -2644,6 +2826,7 @@
yczhou001/ClarET
GLUE
ROCStories
+ 10.18653/v1/2022.acl-long.183
Measuring and Mitigating Name Biases in Neural Machine Translation
@@ -2654,6 +2837,7 @@
Neural Machine Translation (NMT) systems exhibit problematic biases, such as stereotypical gender bias in the translation of occupation terms into languages with grammatical gender. In this paper we describe a new source of bias prevalent in NMT systems, relating to translations of sentences containing person names. To correctly translate such sentences, an NMT system needs to determine the gender of the name. We show that leading systems are particularly poor at this task, especially for female given names. This bias is deeper than given-name gender: we show that the translation of terms with ambiguous sentiment can also be affected by person names, and the same holds true for proper nouns denoting race. To mitigate these biases we propose a simple but effective data augmentation method based on randomly switching entities during translation, which effectively eliminates the problem without any effect on translation quality.
2022.acl-long.184
wang-etal-2022-measuring
+ 10.18653/v1/2022.acl-long.184
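A toy sketch of the entity-switching augmentation described above: person names are swapped consistently on both sides of a parallel sentence pair, so the model cannot tie gender or sentiment to any particular name. The name inventory and the exact string-match replacement are simplifying assumptions.

```python
import random

NAMES = ["Maria", "John", "Aisha", "Wei", "Fatima", "Carlos"]  # toy inventory

def switch_entities(src: str, tgt: str, names=NAMES, rng=random):
    # Find names present on both sides, then replace each with a random
    # other name, applied identically to source and target.
    present = [n for n in names if n in src and n in tgt]
    for name in present:
        new = rng.choice([n for n in names if n != name])
        src = src.replace(name, new)
        tgt = tgt.replace(name, new)
    return src, tgt

print(switch_entities("Maria is a doctor.", "Maria ist Ärztin."))
```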
Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation
@@ -2668,6 +2852,7 @@
In this paper, we take a substantial step towards better understanding state-of-the-art sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on the one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach can consistently improve both translation performance and model robustness upon Seq2Seq pretraining.
2022.acl-long.185
wang-etal-2022-understanding
+ 10.18653/v1/2022.acl-long.185
MSCTD: A Multimodal Sentiment Chat Translation Dataset
@@ -2684,6 +2869,7 @@
BMELD
MELD
OpenViDial
+ 10.18653/v1/2022.acl-long.186
Learning Disentangled Textual Representations via Statistical Measures of Similarity
@@ -2696,6 +2882,7 @@
2022.acl-long.187
2022.acl-long.187.software.zip
colombo-etal-2022-learning
+ 10.18653/v1/2022.acl-long.187
On the Sensitivity and Stability of Model Interpretations in NLP
@@ -2710,6 +2897,7 @@
uclanlp/nlp-interpretation-faithfulness
AG News
SST
+ 10.18653/v1/2022.acl-long.188
Down and Across: Introducing Crossword-Solving as a New NLP Benchmark
@@ -2721,6 +2909,7 @@
Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle. In this work, we introduce solving crossword puzzles as a new natural language understanding task. We release a corpus of crossword puzzles collected from the New York Times daily crossword spanning 25 years and comprising around nine thousand puzzles in total. These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual, as well as clues that depend on the answers to other clues. We separately release the clue-answer pairs from these puzzles as an open-domain question answering dataset containing over half a million unique clue-answer pairs. For the question answering task, our baselines include several sequence-to-sequence and retrieval-based generative models. We also introduce a non-parametric constraint-satisfaction baseline for solving the entire crossword puzzle. Finally, we propose an evaluation framework consisting of several complementary performance metrics.
2022.acl-long.189
kulshreshtha-etal-2022-across
+ 10.18653/v1/2022.acl-long.189
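A minimal constraint-satisfaction baseline in the spirit described above: each slot carries a ranked candidate list (e.g. from a QA model), and backtracking search fills the grid while honoring crossing-letter constraints. The two-slot puzzle is purely illustrative, and slot-length checks are omitted for brevity.

```python
def solve(slots, candidates, crossings, assignment=None):
    """slots: list of slot ids; candidates: slot -> ranked [words];
    crossings: (slot_a, i, slot_b, j) meaning slot_a[i] == slot_b[j]."""
    assignment = assignment or {}
    if len(assignment) == len(slots):
        return assignment
    slot = next(s for s in slots if s not in assignment)
    for word in candidates[slot]:
        # Check every crossing constraint touching an already-filled slot.
        ok = all(
            word[i] == assignment[b][j]
            for (a, i, b, j) in crossings
            if a == slot and b in assignment
        ) and all(
            assignment[a][i] == word[j]
            for (a, i, b, j) in crossings
            if b == slot and a in assignment
        )
        if ok:
            result = solve(slots, candidates, crossings,
                           {**assignment, slot: word})
            if result:
                return result
    return None  # backtrack

# 1-Across crosses 1-Down at their first letters.
print(solve(["1A", "1D"],
            {"1A": ["CAT", "DOG"], "1D": ["DIVE", "CODE"]},
            [("1A", 0, "1D", 0)]))   # -> {'1A': 'CAT', '1D': 'CODE'}
```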
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
@@ -2737,6 +2926,7 @@
HANS
MultiNLI
SNLI
+ 10.18653/v1/2022.acl-long.190
GL-CLeF: A Global–Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding
@@ -2753,6 +2943,7 @@
2022.acl-long.191.software.zip
qin-etal-2022-gl
lightchen233/gl-clef
+ 10.18653/v1/2022.acl-long.191
Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
@@ -2773,6 +2964,7 @@
lee-etal-2022-good
ink-usc/fewner
BC5CDR
+ 10.18653/v1/2022.acl-long.192
Contextual Representation Learning beyond Masked Language Modeling
@@ -2790,6 +2982,7 @@
MRPC
QNLI
SST
+ 10.18653/v1/2022.acl-long.193
Efficient Hyper-parameter Search for Knowledge Graph Embedding
@@ -2804,6 +2997,7 @@
automl-research/kgtuner
FB15k-237
OGB
+ 10.18653/v1/2022.acl-long.194
A Meta-framework for Spatiotemporal Quantity Extraction from Text
@@ -2817,6 +3011,7 @@
News events are often associated with quantities (e.g., the number of COVID-19 patients or the number of arrests in a protest), and it is often important to extract their type, time, and location from unstructured text in order to analyze these quantity events. This paper thus formulates the NLP problem of spatiotemporal quantity extraction, and proposes the first meta-framework for solving it. This meta-framework contains a formalism that decomposes the problem into several information extraction tasks, a shareable crowdsourcing pipeline, and transformer-based baseline models. We demonstrate the meta-framework in three domains—the COVID-19 pandemic, Black Lives Matter protests, and 2020 California wildfires—to show that the formalism is general and extensible, the crowdsourcing pipeline facilitates fast and high-quality data annotation, and the baseline system can handle spatiotemporal quantity extraction well enough to be practically useful. We release all resources for future research on this topic at https://github.com/steqe.
2022.acl-long.195
ning-etal-2022-meta
+ 10.18653/v1/2022.acl-long.195
Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-Modal Knowledge Transfer
@@ -2838,6 +3033,7 @@
PIQA
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.196
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
@@ -2858,6 +3054,7 @@
OK-VQA
Visual Genome
nocaps
+ 10.18653/v1/2022.acl-long.197
Continual Few-shot Relation Learning via Embedding Space Regularization and Data Augmentation
@@ -2870,6 +3067,7 @@
qin-joty-2022-continual
qcwthu/continual_fewshot_relation_learning
FewRel
+ 10.18653/v1/2022.acl-long.198
Variational Graph Autoencoding as Cheap Supervision for AMR Coreference Resolution
@@ -2882,6 +3080,7 @@
2022.acl-long.199
li-etal-2022-variational
AMR Bank
+ 10.18653/v1/2022.acl-long.199
Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations
@@ -2897,6 +3096,7 @@
2022.acl-long.200.software.zip
zhang-etal-2022-identifying
MPQA Opinion Corpus
+ 10.18653/v1/2022.acl-long.200
Sequence-to-Sequence Knowledge Graph Completion and Question Answering
@@ -2915,6 +3115,7 @@
WebQuestions
WebQuestionsSP
WikiMovies
+ 10.18653/v1/2022.acl-long.201
Learning to Mediate Disparities Towards Pragmatic Communication
@@ -2926,6 +3127,7 @@
2022.acl-long.202
bao-etal-2022-learning
sled-group/pragmatic-rational-speaker
+ 10.18653/v1/2022.acl-long.202
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
@@ -2939,6 +3141,7 @@
MS MARCO
Natural Questions
TriviaQA
+ 10.18653/v1/2022.acl-long.203
Multimodal Dialogue Response Generation
@@ -2958,6 +3161,7 @@
2022.acl-long.204.software.zip
sun-etal-2022-multimodal
ImageNet
+ 10.18653/v1/2022.acl-long.204
CAKE: A Scalable Commonsense-Aware Framework For Multi-View Knowledge Graph Completion
@@ -2973,6 +3177,7 @@
ConceptNet
FB15k-237
NELL-995
+ 10.18653/v1/2022.acl-long.205
Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation
@@ -2987,6 +3192,7 @@
2022.acl-long.206
2022.acl-long.206.software.zip
zhou-etal-2022-confidence
+ 10.18653/v1/2022.acl-long.206
BRIO: Bringing Order to Abstractive Summarization
@@ -3001,6 +3207,7 @@
yixinl7/brio
CNN/Daily Mail
XSum
+ 10.18653/v1/2022.acl-long.207
Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling
@@ -3012,6 +3219,7 @@
2022.acl-long.208.software.zip
ai-fang-2022-leveraging
LAMBADA
+ 10.18653/v1/2022.acl-long.208
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework
@@ -3031,6 +3239,7 @@
ActivityNet Captions
VATEX
Visual Question Answering
+ 10.18653/v1/2022.acl-long.209
KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling
@@ -3042,6 +3251,7 @@
2022.acl-long.210
wang-etal-2022-kenmesh
xdwang0726/kenmesh
+ 10.18653/v1/2022.acl-long.210
A Taxonomy of Empathetic Questions in Social Dialogs
@@ -3055,6 +3265,7 @@
2022.acl-long.211.software.zip
svikhnushina-etal-2022-taxonomy
sea94/eqt
+ 10.18653/v1/2022.acl-long.211
Enhanced Multi-Channel Graph Convolutional Network for Aspect Sentiment Triplet Extraction
@@ -3069,6 +3280,7 @@
2022.acl-long.212.software.zip
chen-etal-2022-enhanced
ccchenhao997/emcgcn-aste
+ 10.18653/v1/2022.acl-long.212
ProtoTEx: Explaining Model Decisions with Prototype Tensors
@@ -3082,6 +3294,7 @@
2022.acl-long.213
das-etal-2022-prototex
anubrata/prototex
+ 10.18653/v1/2022.acl-long.213
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
@@ -3099,6 +3312,7 @@
zhou-etal-2022-show
shuyanzhou/wikihow_hierarchy
HowTo100M
+ 10.18653/v1/2022.acl-long.214
Cross-Modal Discrete Representation Learning
@@ -3115,6 +3329,7 @@
ImageNet
MSR-VTT
Places205
+ 10.18653/v1/2022.acl-long.215
Improving Event Representation via Simultaneous Weakly Supervised Contrastive Learning and Clustering
@@ -3130,6 +3345,7 @@
2022.acl-long.216.software.zip
gao-etal-2022-improving
gaojun4ever/swcc4event
+ 10.18653/v1/2022.acl-long.216
Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations
@@ -3140,6 +3356,7 @@
2022.acl-long.217
2022.acl-long.217.software.zip
wolfe-caliskan-2022-contrastive
+ 10.18653/v1/2022.acl-long.217
ConTinTin: Continual Learning from Task Instructions
@@ -3150,6 +3367,7 @@
The mainstream machine learning paradigms for NLP often work with two underlying presumptions. First, the target task is predefined and static; a system merely needs to learn to solve it exclusively. Second, the supervision of a task mainly comes from a set of labeled examples. A question arises: how can we build a system that keeps learning new tasks from their instructions? This work defines a new learning paradigm ConTinTin (Continual Learning from Task Instructions), in which a system should learn a sequence of new tasks one by one, with each task explained by a piece of textual instruction. The system is required to (i) generate the expected outputs of a new task by learning from its instruction, (ii) transfer the knowledge acquired from upstream tasks to help solve downstream tasks (i.e., forward-transfer), and (iii) retain or even improve performance on earlier tasks after learning new tasks (i.e., backward-transfer). This new problem is studied on a stream of more than 60 tasks, each equipped with an instruction. Technically, our method InstructionSpeak contains two strategies that make full use of task instructions to improve forward-transfer and backward-transfer: one is to learn from negative outputs, the other is to revisit instructions of previous tasks. To our knowledge, this is the first study of ConTinTin in NLP. In addition to the problem formulation and our promising approach, this work also contributes rich analyses to help the community better understand this novel learning problem.
2022.acl-long.218
yin-etal-2022-contintin
+ 10.18653/v1/2022.acl-long.218
Automated Crossword Solving
@@ -3166,6 +3384,7 @@
2022.acl-long.219.software.zip
wallace-etal-2022-automated
albertkx/berkeley-crossword-solver
+ 10.18653/v1/2022.acl-long.219
Learned Incremental Representations for Parsing
@@ -3179,6 +3398,7 @@
kitaev-etal-2022-learned
thomaslu2000/incremental-parsing-representations
Penn Treebank
+ 10.18653/v1/2022.acl-long.220
Knowledge Enhanced Reflection Generation for Counseling Dialogues
@@ -3192,6 +3412,7 @@
2022.acl-long.221
shen-etal-2022-knowledge
ConceptNet
+ 10.18653/v1/2022.acl-long.221
Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines
@@ -3209,6 +3430,7 @@
skgabriel/mrf-modeling
CoAID
RealNews
+ 10.18653/v1/2022.acl-long.222
On Continual Model Refinement in Out-of-Distribution Data Streams
@@ -3227,6 +3449,7 @@
Natural Questions
SQuAD
SearchQA
+ 10.18653/v1/2022.acl-long.223
Achieving Conversational Goals with Unsupervised Post-hoc Knowledge Injection
@@ -3240,6 +3463,7 @@
2022.acl-long.224.software.zip
majumder-etal-2022-achieving
majumderb/poki
+ 10.18653/v1/2022.acl-long.224
Generated Knowledge Prompting for Commonsense Reasoning
@@ -3261,6 +3485,7 @@
ConceptNet
NumerSense
QASC
+ 10.18653/v1/2022.acl-long.225
Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data
@@ -3287,6 +3512,7 @@
WikiHow
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.226
Life after BERT: What do Other Muppets Understand about Language?
@@ -3300,6 +3526,7 @@
lialin-etal-2022-life
kev-zhao/life-after-bert
WebText
+ 10.18653/v1/2022.acl-long.227
Tailor: Generating and Perturbing Text with Semantic Controls
@@ -3317,6 +3544,7 @@
SNLI
StylePTB
Universal Dependencies
+ 10.18653/v1/2022.acl-long.228
TruthfulQA: Measuring How Models Mimic Human Falsehoods
@@ -3330,6 +3558,7 @@
lin-etal-2022-truthfulqa
sylinrl/truthfulqa
TruthfulQA
+ 10.18653/v1/2022.acl-long.229
Adaptive Testing and Debugging of NLP Models
@@ -3340,6 +3569,7 @@
2022.acl-long.230
ribeiro-lundberg-2022-adaptive
PAWS
+ 10.18653/v1/2022.acl-long.230
Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning
@@ -3354,6 +3584,7 @@
2022.acl-long.231
gupta-etal-2022-right
TabFact
+ 10.18653/v1/2022.acl-long.231
Interactive Word Completion for Plains Cree
@@ -3364,6 +3595,7 @@
The composition of richly-inflected words in morphologically complex languages can be a challenge for language learners developing literacy. Accordingly, Lane and Bird (2020) proposed a finite state approach which maps prefixes in a language to a set of possible completions up to the next morpheme boundary, for the incremental building of complex words. In this work, we develop an approach to morph-based auto-completion based on a finite state morphological analyzer of Plains Cree (nêhiyawêwin), showing the portability of the concept to a much larger, more complete morphological transducer. Additionally, we propose and compare various novel ranking strategies on the morph auto-complete output. The best weighting scheme ranks the target completion in the top 10 results in 64.9% of queries, and in the top 50 in 73.9% of queries.
2022.acl-long.232
lane-etal-2022-interactive
+ 10.18653/v1/2022.acl-long.232
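A toy sketch of morph-aware auto-completion with weighted ranking, as described in the abstract above: candidate continuations up to the next morpheme boundary are ranked by corpus frequency. The actual system uses a finite-state morphological transducer of Plains Cree; the lexicon and counts below are made up for illustration.

```python
from collections import Counter

# (morph sequence) -> corpus count, purely illustrative
LEXICON = Counter({
    ("ni", "wāpam"): 50, ("ni", "wāpaht"): 30,
    ("ki", "wāpam"): 40, ("ni", "pēht"): 10,
})

def complete(prefix_morphs, lexicon=LEXICON, k=3):
    """Rank next-morph completions of a morph-sequence prefix by count."""
    n = len(prefix_morphs)
    cands = Counter()
    for morphs, count in lexicon.items():
        if morphs[:n] == tuple(prefix_morphs) and len(morphs) > n:
            cands[morphs[n]] += count
    return [m for m, _ in cands.most_common(k)]

print(complete(["ni"]))  # -> ['wāpam', 'wāpaht', 'pēht']
```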
LAGr: Label Aligned Graphs for Better Systematic Generalization in Semantic Parsing
@@ -3374,6 +3606,7 @@
2022.acl-long.233
jambor-bahdanau-2022-lagr
CFQ
+ 10.18653/v1/2022.acl-long.233
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
@@ -3391,6 +3624,7 @@
ToxiGen
Hate Speech
Implicit Hate
+ 10.18653/v1/2022.acl-long.234
Direct Speech-to-Speech Translation With Discrete Units
@@ -3411,6 +3645,7 @@
2022.acl-long.235
lee-etal-2022-direct
LibriSpeech
+ 10.18653/v1/2022.acl-long.235
Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization
@@ -3423,6 +3658,7 @@
2022.acl-long.236.software.zip
cao-etal-2022-hallucinated
mcao516/entfa
+ 10.18653/v1/2022.acl-long.236
EntSUM: A Data Set for Entity-Centric Extractive Summarization
@@ -3434,6 +3670,7 @@
2022.acl-long.237
maddela-etal-2022-entsum
bloomberg/entsum
+ 10.18653/v1/2022.acl-long.237
Sentence-level Privacy for Document Embeddings
@@ -3445,6 +3682,7 @@
2022.acl-long.238
meehan-etal-2022-sentence
IMDb Movie Reviews
+ 10.18653/v1/2022.acl-long.238
Dataset Geography: Mapping Language Data to Language Users
@@ -3460,6 +3698,7 @@
Natural Questions
SQuAD
TyDi QA
+ 10.18653/v1/2022.acl-long.239
ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
@@ -3479,6 +3718,7 @@
SNLI
SWAG
WinoGrande
+ 10.18653/v1/2022.acl-long.240
Image Retrieval from Contextual Descriptions
@@ -3496,6 +3736,7 @@
Spot-the-diff
Video Storytelling
YouCook
+ 10.18653/v1/2022.acl-long.241
Multilingual Molecular Representation Learning via Contrastive Pre-training
@@ -3509,6 +3750,7 @@
2022.acl-long.242
guo-etal-2022-multilingual
MoleculeNet
+ 10.18653/v1/2022.acl-long.242
Investigating Failures of Automatic Translation
@@ -3521,6 +3763,7 @@ in the Case of Unambiguous Gender
2022.acl-long.243
2022.acl-long.243.software.zip
renduchintala-williams-2022-investigating
+ 10.18653/v1/2022.acl-long.243
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
@@ -3540,6 +3783,7 @@ in the Case of Unambiguous Gender
QASC
Quoref
WinoGrande
+ 10.18653/v1/2022.acl-long.244
Imputing Out-of-Vocabulary Embeddings with LOVE Makes LanguageModels Robust with Little Cost
@@ -3553,6 +3797,7 @@ in the Case of Unambiguous Gender
chen-etal-2022-imputing
tigerchen52/love
SST
+ 10.18653/v1/2022.acl-long.245
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
@@ -3570,6 +3815,7 @@ in the Case of Unambiguous Gender
GLUE
MATH
SuperGLUE
+ 10.18653/v1/2022.acl-long.246
Upstream Mitigation Is Not All You Need: Testing the Bias Transfer Hypothesis in Pre-Trained Language Models
@@ -3584,6 +3830,7 @@ in the Case of Unambiguous Gender
2022.acl-long.247
2022.acl-long.247.software.zip
steed-etal-2022-upstream
+ 10.18653/v1/2022.acl-long.247
Improving Multi-label Malevolence Detection in Dialogues through Multi-faceted Label Correlation Enhancement
@@ -3598,6 +3845,7 @@ in the Case of Unambiguous Gender
2022.acl-long.248.software.zip
zhang-etal-2022-improving-multi
repozhang/malevolent_dialogue
+ 10.18653/v1/2022.acl-long.248
How Do We Answer Complex Questions: Discourse Structure of Long-form Answers
@@ -3611,6 +3859,7 @@ in the Case of Unambiguous Gender
utcsnlp/lfqa_discourse
ELI5
Natural Questions
+ 10.18653/v1/2022.acl-long.249
Understanding Iterative Revision from Human-Written Text
@@ -3625,6 +3874,7 @@ in the Case of Unambiguous Gender
2022.acl-long.250
du-etal-2022-understanding-iterative
vipulraheja/iterater
+ 10.18653/v1/2022.acl-long.250
Making Transformers Solve Compositional Tasks
@@ -3640,6 +3890,7 @@ in the Case of Unambiguous Gender
google-research/google-research
CFQ
SCAN
+ 10.18653/v1/2022.acl-long.251
Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
@@ -3651,6 +3902,7 @@ in the Case of Unambiguous Gender
2022.acl-long.252
2022.acl-long.252.software.zip
dankers-etal-2022-transformer
+ 10.18653/v1/2022.acl-long.252
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
@@ -3666,6 +3918,7 @@ in the Case of Unambiguous Gender
PolicyQA
QASPER
ShARC
+ 10.18653/v1/2022.acl-long.253
Prompt-free and Efficient Few-shot Learning with Language Models
@@ -3688,6 +3941,7 @@ in the Case of Unambiguous Gender
SST
SuperGLUE
WiC
+ 10.18653/v1/2022.acl-long.254
Continual Sequence Generation with Adaptive Compositional Modules
@@ -3702,6 +3956,7 @@ in the Case of Unambiguous Gender
GT-SALT/Adaptive-Compositional-Modules
MultiWOZ
WikiSQL
+ 10.18653/v1/2022.acl-long.255
An Investigation of the (In)effectiveness of Counterfactually Augmented Data
@@ -3714,6 +3969,7 @@ in the Case of Unambiguous Gender
joshi-he-2022-investigation
joshinh/investigation-cad
BoolQ
+ 10.18653/v1/2022.acl-long.256
Inducing Positive Perspectives with Text Reframing
@@ -3727,6 +3983,7 @@ in the Case of Unambiguous Gender
2022.acl-long.257
ziems-etal-2022-inducing
gt-salt/positive-frames
+ 10.18653/v1/2022.acl-long.257
VALUE: Understanding Dialect Disparity in NLU
@@ -3742,6 +3999,7 @@ in the Case of Unambiguous Gender
CoLA
GLUE
QNLI
+ 10.18653/v1/2022.acl-long.258
From the Detection of Toxic Spans in Online Discussions to the Analysis of Toxic-to-Civil Transfer
@@ -3755,6 +4013,7 @@ in the Case of Unambiguous Gender
2022.acl-long.259
pavlopoulos-etal-2022-detection
ipavlopoulos/toxic_spans
+ 10.18653/v1/2022.acl-long.259
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
@@ -3773,6 +4032,7 @@ in the Case of Unambiguous Gender
2022.acl-long.260
lee-etal-2022-formnet
FUNSD
+ 10.18653/v1/2022.acl-long.260
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
@@ -3787,6 +4047,7 @@ in the Case of Unambiguous Gender
ziems-etal-2022-moral
gt-salt/mic
ETHICS
+ 10.18653/v1/2022.acl-long.261
Token Dropping for Efficient BERT Pretraining
@@ -3805,6 +4066,7 @@ in the Case of Unambiguous Gender
QNLI
SQuAD
SST
+ 10.18653/v1/2022.acl-long.262
DialFact: A Benchmark for Fact-Checking in Dialogue
@@ -3821,6 +4083,7 @@ in the Case of Unambiguous Gender
FEVER
VitaminC
Wizard of Wikipedia
+ 10.18653/v1/2022.acl-long.263
The Trade-offs of Domain Adaptation for Neural Language Models
@@ -3830,6 +4093,7 @@ in the Case of Unambiguous Gender
This work connects language model adaptation with concepts from machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. We derive how the benefit of training a model on either set depends on the size of the sets and the distance between their underlying distributions. We analyze how out-of-domain pre-training before in-domain fine-tuning achieves better generalization than either solution independently. Finally, we show how adaptation techniques based on data selection, such as importance sampling, intelligent data selection and influence functions, can be cast in a common framework that highlights their similarities as well as their subtle differences.
2022.acl-long.264
grangier-iter-2022-trade
+ 10.18653/v1/2022.acl-long.264
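As a hedged reference point for the trade-off described above, bounds of the kind such analyses build on (in the style of Ben-David et al.'s domain adaptation theory) take the following shape; the notation is illustrative, not the paper's.

```latex
% In-domain risk is controlled by out-of-domain risk plus a divergence
% term between the two distributions:
\[
  L_{\mathrm{in}}(h) \;\le\; L_{\mathrm{out}}(h)
    + d\!\left(D_{\mathrm{in}}, D_{\mathrm{out}}\right) + \lambda^{*},
\]
% while the empirical-to-true gap shrinks with the training set size n:
\[
  L(h) \;\le\; \widehat{L}_{n}(h)
    + O\!\left(\sqrt{\mathrm{complexity}(H)/n}\right),
\]
% so a large out-of-domain set buys small estimation error at the cost
% of the divergence term, and a small in-domain set the reverse.
```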
Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go
@@ -3839,6 +4103,7 @@ in the Case of Unambiguous Gender
Aligning with the ACL 2022 Special Theme on “Language Diversity: from Low Resource to Endangered Languages”, we discuss the major linguistic and sociopolitical challenges facing the development of NLP technologies for African languages. Situating African languages in a typological framework, we discuss how the particulars of these languages can be harnessed. To facilitate future research, we also highlight current efforts, communities, venues, datasets, and tools. Our main objective is to motivate and advocate for an Afrocentric approach to technology development. With this in mind, we recommend what technologies to build and how to build, evaluate, and deploy them based on the needs of local African communities.
2022.acl-long.265
adebara-abdul-mageed-2022-towards
+ 10.18653/v1/2022.acl-long.265
Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
@@ -3853,6 +4118,7 @@ in the Case of Unambiguous Gender
makstarnavskyi/gector-large
FCE
WI-LOCNESS
+ 10.18653/v1/2022.acl-long.266
Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching
@@ -3866,6 +4132,7 @@ in the Case of Unambiguous Gender
2022.acl-long.267.software.zip
ostapenko-etal-2022-speaker
ostapen/switch-and-explain
+ 10.18653/v1/2022.acl-long.267
Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling
@@ -3877,6 +4144,7 @@ in the Case of Unambiguous Gender
2022.acl-long.268.software.zip
alvarez-mellado-lignos-2022-detecting
lirondos/coalas
+ 10.18653/v1/2022.acl-long.268
Is Attention Explanation? An Introduction to the Debate
@@ -3891,6 +4159,7 @@ in the Case of Unambiguous Gender
The performance of deep learning models in NLP and other fields of machine learning has led to a rise in their popularity, and so the need for explanations of these models becomes paramount. Attention has been seen as a solution to increase performance, while providing some explanations. However, a debate has started to cast doubt on the explanatory power of attention in neural networks. Although the debate has created a vast literature thanks to contributions from various areas, the lack of communication is becoming more and more tangible. In this paper, we provide a clear overview of the insights on the debate by critically confronting works from these different areas. This holistic vision can be of great interest for future works in all the communities concerned by this debate. We sum up the main challenges spotted in these areas, and we conclude by discussing the most promising future avenues on attention as an explanation.
2022.acl-long.269
bibal-etal-2022-attention
+ 10.18653/v1/2022.acl-long.269
There Are a Thousand Hamlets in a Thousand People’s Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory
@@ -3904,6 +4173,7 @@ in the Case of Unambiguous Gender
2022.acl-long.270
2022.acl-long.270.software.zip
fu-etal-2022-thousand
+ 10.18653/v1/2022.acl-long.270
Neural Pipeline for Zero-Shot Data-to-Text Generation
@@ -3915,6 +4185,7 @@ in the Case of Unambiguous Gender
kasner-dusek-2022-neural
kasnerz/zeroshot-d2t-pipeline
WikiSplit
+ 10.18653/v1/2022.acl-long.271
Not always about you: Prioritizing community needs when developing endangered language technology
@@ -3926,6 +4197,7 @@ in the Case of Unambiguous Gender
Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and Seneca, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely-spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.
2022.acl-long.272
liu-etal-2022-always
+ 10.18653/v1/2022.acl-long.272
Automatic Identification and Classification of Bragging in Social Media
@@ -3937,6 +4209,7 @@ in the Case of Unambiguous Gender
Bragging is a speech act employed with the goal of constructing a favorable self-image through positive statements about oneself. It is widespread in daily communication and especially popular in social media, where users aim to build a positive image of their persona directly or indirectly. In this paper, we present the first large scale study of bragging in computational linguistics, building on previous research in linguistics and pragmatics. To facilitate this, we introduce a new publicly available data set of tweets annotated for bragging and their types. We empirically evaluate different transformer-based models injected with linguistic information in (a) binary bragging classification, i.e., if tweets contain bragging statements or not; and (b) multi-class bragging type prediction including not bragging. Our results show that our models can predict bragging with macro F1 up to 72.42 and 35.95 in the binary and multi-class classification tasks respectively. Finally, we present an extensive linguistic and error analysis of bragging prediction to guide future research on this topic.
2022.acl-long.273
jin-etal-2022-automatic
+ 10.18653/v1/2022.acl-long.273
Automatic Error Analysis for Document-level Information Extraction
@@ -3954,6 +4227,7 @@ in the Case of Unambiguous Gender
das-etal-2022-automatic
icejinx33/auto-err-template-fill
SciREX
+ 10.18653/v1/2022.acl-long.274
Learning Functional Distributional Semantics with Visual Data
@@ -3964,6 +4238,7 @@ in the Case of Unambiguous Gender
2022.acl-long.275
liu-emerson-2022-learning
Visual Question Answering
+ 10.18653/v1/2022.acl-long.275
ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding
@@ -3976,6 +4251,7 @@ in the Case of Unambiguous Gender
ghosh-srivastava-2022-epic
sgdgp/epic
GLUE
+ 10.18653/v1/2022.acl-long.276
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
@@ -3993,6 +4269,7 @@ in the Case of Unambiguous Gender
Chart-to-text
Chart2Text
+ 10.18653/v1/2022.acl-long.277
Characterizing Idioms: Conventionality and Contingency
@@ -4004,6 +4281,7 @@ in the Case of Unambiguous Gender
Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted.
2022.acl-long.278
socolof-etal-2022-characterizing
+ 10.18653/v1/2022.acl-long.278
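The paper's two measures are not reproduced here; as one plausible ingredient of a "contingency"-type measure (how strongly an idiom's words depend on each other's presence), a PMI-style association score can be computed from corpus counts. The counts below are toy values.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw co-occurrence counts."""
    p_xy = count_xy / total
    return math.log2(p_xy / ((count_x / total) * (count_y / total)))

# "kick" and "bucket" co-occur far more often than chance would predict
# in a corpus where the idiom is frequent:
print(round(pmi(count_xy=50, count_x=500, count_y=200, total=1_000_000), 2))
```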
Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP
@@ -4015,6 +4293,7 @@ in the Case of Unambiguous Gender
2022.acl-long.279.software.zip
galke-scherp-2022-bag
lgalke/text-clf-baselines
+ 10.18653/v1/2022.acl-long.279
Generative Pretraining for Paraphrase Evaluation
@@ -4032,6 +4311,7 @@ in the Case of Unambiguous Gender
PARANMT-50M
PAWS
SNLI
+ 10.18653/v1/2022.acl-long.280
Incorporating Stock Market Signals for Twitter Stance Detection
@@ -4047,6 +4327,7 @@ in the Case of Unambiguous Gender
2022.acl-long.281.software.zip
conforti-etal-2022-incorporating
cambridge-wtwt/acl2022-wtwt-stocks
+ 10.18653/v1/2022.acl-long.281
Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation
@@ -4060,6 +4341,7 @@ in the Case of Unambiguous Gender
Multilingual neural machine translation models are trained to maximize the likelihood of a mix of examples drawn from multiple language pairs. The dominant inductive bias applied to these models is a shared vocabulary and a shared set of parameters across languages; the inputs and labels corresponding to examples drawn from different language pairs might still reside in distinct sub-spaces. In this paper, we introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level. Our approach interpolates instances from different language pairs into joint ‘crossover examples’ in order to encourage sharing input and output spaces across languages. To ensure better fusion of examples in multilingual settings, we propose several techniques to improve example interpolation across dissimilar languages under heavy data imbalance. Experiments on a large-scale WMT multilingual dataset demonstrate that our approach significantly improves quality on English-to-Many, Many-to-English and zero-shot translation tasks (from +0.5 BLEU up to +5.5 BLEU points). Results on code-switching sets demonstrate the capability of our approach to improve model generalization to out-of-distribution multilingual examples. We also conduct qualitative and quantitative representation comparisons to analyze the advantages of our approach at the representation level.
2022.acl-long.282
cheng-etal-2022-multilingual
+ 10.18653/v1/2022.acl-long.282
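The crossover idea above can be pictured with a mixup-style interpolation: two examples from different language pairs are fused by convexly combining their source embeddings and mixing their target-side losses. This is only a generic sketch of instance-level fusion, not mXEncDec's actual crossover construction; shapes are toy assumptions.

```python
import torch

def crossover(src_a, src_b, alpha=0.3):
    """src_a, src_b: (seq_len, hidden) source embeddings, same padded length."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * src_a + (1 - lam) * src_b, lam

def mixed_loss(loss_a, loss_b, lam):
    # Target-side mixing: weight each pair's translation loss by lam.
    return lam * loss_a + (1 - lam) * loss_b

src_de = torch.randn(12, 512)   # e.g. a German source, embedded
src_fr = torch.randn(12, 512)   # e.g. a French source, embedded
fused, lam = crossover(src_de, src_fr)
print(fused.shape, float(lam))
```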
Word Segmentation as Unsupervised Constituency Parsing
@@ -4069,6 +4351,7 @@ in the Case of Unambiguous Gender
2022.acl-long.283
alhama-2022-word
OpenSubtitles
+ 10.18653/v1/2022.acl-long.283
SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems
@@ -4085,6 +4368,7 @@ in the Case of Unambiguous Gender
dinan-etal-2022-safetykit
Blended Skill Talk
HONEST
+ 10.18653/v1/2022.acl-long.284
Zero-Shot Cross-lingual Semantic Parsing
@@ -4099,6 +4383,7 @@ in the Case of Unambiguous Gender
ATIS
MKQA
ParaCrawl
+ 10.18653/v1/2022.acl-long.285
The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study
@@ -4111,6 +4396,7 @@ in the Case of Unambiguous Gender
2022.acl-long.286.software.zip
dankers-etal-2022-paradox
i-machine-think/compositionality_paradox_mt
+ 10.18653/v1/2022.acl-long.286
Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents
@@ -4124,6 +4410,7 @@ in the Case of Unambiguous Gender
Document-level neural machine translation (DocNMT) achieves coherent translations by incorporating cross-sentence context. However, for most language pairs there’s a shortage of parallel documents, although parallel sentences are readily available. In this paper, we study whether and how contextual modeling in DocNMT is transferable via multilingual modeling. We focus on the scenario of zero-shot transfer from teacher languages with document level data to student languages with no documents but sentence level data, and for the first time treat document-level translation as a transfer learning problem. Using simple concatenation-based DocNMT, we explore the effect of 3 factors on the transfer: the number of teacher languages with document level data, the balance between document and sentence level data at training, and the data condition of parallel documents (genuine vs. back-translated). Our experiments on Europarl-7 and IWSLT-10 show the feasibility of multilingual transfer for DocNMT, particularly on document-specific metrics. We observe that more teacher languages and adequate data balance both contribute to better transfer quality. Surprisingly, the transfer is less sensitive to the data condition, where multilingual DocNMT delivers decent performance with either back-translated or genuine document pairs.
2022.acl-long.287
zhang-etal-2022-multilingual
+ 10.18653/v1/2022.acl-long.287
Cross-Lingual Phrase Retrieval
@@ -4139,6 +4426,7 @@ in the Case of Unambiguous Gender
Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability, enabling the model to perform retrieval on a language pair unseen during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
2022.acl-long.288
zheng-etal-2022-cross-lingual
+ 10.18653/v1/2022.acl-long.288
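A generic InfoNCE sketch of contrastive phrase retrieval as described above: row-aligned phrase embeddings from two languages are pulled together, with the other in-batch phrases serving as negatives. XPR's actual encoder and training details are not reproduced; shapes and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(src_phr, tgt_phr, temperature=0.05):
    """src_phr, tgt_phr: (B, H) L2-normalised phrase embeddings, row-aligned."""
    logits = src_phr @ tgt_phr.T / temperature      # (B, B) similarity matrix
    labels = torch.arange(src_phr.size(0))          # diagonal pairs match
    return F.cross_entropy(logits, labels)

src = F.normalize(torch.randn(8, 256), dim=-1)
tgt = F.normalize(torch.randn(8, 256), dim=-1)
print(float(info_nce(src, tgt)))
```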
Improving Compositional Generalization with Self-Training for Data-to-Text Generation
@@ -4154,6 +4442,7 @@ in the Case of Unambiguous Gender
mehta-etal-2022-improving
google-research/google-research
SGD
+ 10.18653/v1/2022.acl-long.289
MMCoQA: Conversational Question Answering over Text, Tables, and Images
@@ -4168,6 +4457,7 @@ in the Case of Unambiguous Gender
liyongqi67/mmcoqa
ManyModalQA
ORConvQA
+ 10.18653/v1/2022.acl-long.290
Effective Token Graph Modeling using a Novel Labeling Strategy for Structured Sentiment Analysis
@@ -4183,6 +4473,7 @@ in the Case of Unambiguous Gender
xgswlg/tgls
MPQA Opinion Corpus
NoReC_fine
+ 10.18653/v1/2022.acl-long.291
PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks
@@ -4201,6 +4492,7 @@ in the Case of Unambiguous Gender
garyyufei/promda
CoNLL-2003
SST
+ 10.18653/v1/2022.acl-long.292
Disentangled Sequence to Sequence Learning for Compositional Generalization
@@ -4212,6 +4504,7 @@ in the Case of Unambiguous Gender
zheng-lapata-2022-disentangled
mswellhao/dangle
CFQ
+ 10.18653/v1/2022.acl-long.293
RST Discourse Parsing with Second-Stage EDU-Level Pre-training
@@ -4224,6 +4517,7 @@ in the Case of Unambiguous Gender
2022.acl-long.294
2022.acl-long.294.software.zip
yu-etal-2022-rst
+ 10.18653/v1/2022.acl-long.294
SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models
@@ -4237,6 +4531,7 @@ in the Case of Unambiguous Gender
2022.acl-long.295.software.zip
wang-etal-2022-simkgc
intfloat/simkgc
+ 10.18653/v1/2022.acl-long.295
Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze?
@@ -4250,6 +4545,7 @@ in the Case of Unambiguous Gender
eberle-etal-2022-transformer
oeberle/task_gaze_transformers
SST
+ 10.18653/v1/2022.acl-long.296
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
@@ -4272,6 +4568,7 @@ in the Case of Unambiguous Gender
ECtHR
GLUE
SuperGLUE
+ 10.18653/v1/2022.acl-long.297
DiBiMT: A Novel Benchmark for Measuring Word Sense Disambiguation Biases in Machine Translation
@@ -4286,6 +4583,7 @@ in the Case of Unambiguous Gender
campolungo-etal-2022-dibimt
Various fixes throughout the paper.
+ 10.18653/v1/2022.acl-long.298
Improving Word Translation via Two-Stage Contrastive Learning
@@ -4302,6 +4600,7 @@ in the Case of Unambiguous Gender
cambridgeltl/contrastivebli
PanLex-BLI
XLING
+ 10.18653/v1/2022.acl-long.299
Scheduled Multi-task Learning for Neural Chat Translation
@@ -4316,6 +4615,7 @@ in the Case of Unambiguous Gender
liang-etal-2022-scheduled
xl2248/sml
BMELD
+ 10.18653/v1/2022.acl-long.300
FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
@@ -4332,6 +4632,7 @@ in the Case of Unambiguous Gender
chalkidis-etal-2022-fairlex
coastalcph/fairlex
ECtHR
+ 10.18653/v1/2022.acl-long.301
Towards Abstractive Grounded Summarization of Podcast Transcripts
@@ -4345,6 +4646,7 @@ in the Case of Unambiguous Gender
2022.acl-long.302
song-etal-2022-towards
tencent-ailab/grndpodcastsum
+ 10.18653/v1/2022.acl-long.302
FiNER: Financial Numeric Entity Recognition for XBRL Tagging
@@ -4361,6 +4663,7 @@ in the Case of Unambiguous Gender
loukas-etal-2022-finer
nlpaueb/finer
FiNER-139
+ 10.18653/v1/2022.acl-long.303
Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation
@@ -4381,6 +4684,7 @@ in the Case of Unambiguous Gender
2022.acl-long.304.software.zip
li-etal-2022-keywords
ROCStories
+ 10.18653/v1/2022.acl-long.304
EPT-X: An Expression-Pointer Transformer model that generates eXplanations for numbers
@@ -4393,6 +4697,7 @@ in the Case of Unambiguous Gender
2022.acl-long.305
2022.acl-long.305.software.tgz
kim-etal-2022-ept
+ 10.18653/v1/2022.acl-long.305
Identifying the Human Values behind Arguments
@@ -4407,6 +4712,7 @@ in the Case of Unambiguous Gender
2022.acl-long.306
kiesel-etal-2022-identifying
webis-de/acl-22
+ 10.18653/v1/2022.acl-long.306
BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation
@@ -4423,6 +4729,7 @@ in the Case of Unambiguous Gender
gashteovski-etal-2022-benchie
gkiril/benchie
BenchIE
+ 10.18653/v1/2022.acl-long.307
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
@@ -4443,6 +4750,7 @@ in the Case of Unambiguous Gender
LRW
Libri-Light
LibriSpeech
+ 10.18653/v1/2022.acl-long.308
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
@@ -4457,6 +4765,7 @@ in the Case of Unambiguous Gender
ntunlp/summareranker
CNN/Daily Mail
Reddit TIFU
+ 10.18653/v1/2022.acl-long.309
Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
@@ -4473,6 +4782,7 @@ in the Case of Unambiguous Gender
wu-etal-2022-understanding
RecipeQA
WikiHow
+ 10.18653/v1/2022.acl-long.310
Zoom Out and Observe: News Environment Perception for Fake News Detection
@@ -4487,6 +4797,7 @@ in the Case of Unambiguous Gender
2022.acl-long.311
sheng-etal-2022-zoom
ictmcg/news-environment-perception
+ 10.18653/v1/2022.acl-long.311
Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models
@@ -4502,6 +4813,7 @@ in the Case of Unambiguous Gender
IWSLT 2017
OpenSubtitles
WMT 2014
+ 10.18653/v1/2022.acl-long.312
Saliency as Evidence: Event Detection with Trigger Saliency Attribution
@@ -4514,6 +4826,7 @@ in the Case of Unambiguous Gender
2022.acl-long.313.software.zip
liu-etal-2022-saliency
MAVEN
+ 10.18653/v1/2022.acl-long.313
SRL4E – Semantic Role Labeling for Emotions: A Unified Evaluation Framework
@@ -4525,6 +4838,7 @@ in the Case of Unambiguous Gender
2022.acl-long.314
campagnano-etal-2022-srl4e
sapienzanlp/srl4e
+ 10.18653/v1/2022.acl-long.314
Context Matters: A Pragmatic Study of PLMs’ Negation Understanding
@@ -4536,6 +4850,7 @@ in the Case of Unambiguous Gender
gubelmann-handschuh-2022-context
GLUE
SuperGLUE
+ 10.18653/v1/2022.acl-long.315
Probing for Predicate Argument Structures in Pretrained Language Models
@@ -4546,6 +4861,7 @@ in the Case of Unambiguous Gender
2022.acl-long.316
conia-navigli-2022-probing
sapienzanlp/srl-pas-probing
+ 10.18653/v1/2022.acl-long.316
Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction
@@ -4560,6 +4876,7 @@ in the Case of Unambiguous Gender
2022.acl-long.317.software.zip
huang-etal-2022-multilingual-generative
pluslabnlp/x-gear
+ 10.18653/v1/2022.acl-long.317
Identifying Moments of Change from Longitudinal User Text
@@ -4573,6 +4890,7 @@ in the Case of Unambiguous Gender
Identifying changes in individuals’ behaviour and mood, as observed via content shared on online platforms, is increasingly gaining importance. Most research to date on this topic focuses on either: (a) identifying individuals at risk or with a certain mental health condition given a batch of posts, or (b) providing equivalent labels at the post level. A disadvantage of such work is the lack of a strong temporal component and the inability to make longitudinal assessments following an individual’s trajectory and allowing timely interventions. Here we define a new task: identifying moments of change in individuals on the basis of their shared content online. The changes we consider are sudden shifts in mood (switches) or gradual mood progression (escalations). We have created detailed guidelines for capturing moments of change and a corpus of 500 manually annotated user timelines (18.7K posts). We have developed a variety of baseline models drawing inspiration from related tasks and show that the best performance is obtained through context-aware sequential modelling. We also introduce new metrics for capturing rare events in temporal windows.
2022.acl-long.318
tsakalidis-etal-2022-identifying
+ 10.18653/v1/2022.acl-long.318
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System
@@ -4589,6 +4907,7 @@ in the Case of Unambiguous Gender
2022.acl-long.319.software.zip
su-etal-2022-multi
awslabs/pptod
+ 10.18653/v1/2022.acl-long.319
Graph Enhanced Contrastive Learning for Radiology Findings Summarization
@@ -4604,6 +4923,7 @@ in the Case of Unambiguous Gender
2022.acl-long.320.software.zip
hu-etal-2022-graph
jinpeng01/aig_cl
+ 10.18653/v1/2022.acl-long.320
Semi-Supervised Formality Style Transfer with Consistency Training
@@ -4616,6 +4936,7 @@ in the Case of Unambiguous Gender
liu-etal-2022-semi
aolius/semi-fst
GYAFC
+ 10.18653/v1/2022.acl-long.321
Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure
@@ -4627,6 +4948,7 @@ in the Case of Unambiguous Gender
2022.acl-long.322
chai-etal-2022-cross
XNLI
+ 10.18653/v1/2022.acl-long.322
Rare and Zero-shot Word Sense Disambiguation using Z-Reweighting
@@ -4641,6 +4963,7 @@ in the Case of Unambiguous Gender
su-etal-2022-rare
suytingwan/wsd-z-reweighting
Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison
+ 10.18653/v1/2022.acl-long.323
Nibbling at the Hard Core of Word Sense Disambiguation
@@ -4654,6 +4977,7 @@ in the Case of Unambiguous Gender
maru-etal-2022-nibbling
sapienzanlp/wsd-hard-benchmark
Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison
+ 10.18653/v1/2022.acl-long.324
Large Scale Substitution-based Word Sense Induction
@@ -4667,6 +4991,7 @@ in the Case of Unambiguous Gender
eyal-etal-2022-large
CoarseWSD-20
WiC
+ 10.18653/v1/2022.acl-long.325
Can Synthetic Translations Improve Bitext Quality?
@@ -4677,6 +5002,7 @@ in the Case of Unambiguous Gender
2022.acl-long.326
briakou-carpuat-2022-synthetic
WikiMatrix
+ 10.18653/v1/2022.acl-long.326
Unsupervised Dependency Graph Network
@@ -4692,6 +5018,7 @@ in the Case of Unambiguous Gender
shen-etal-2022-unsupervised
yikangshen/udgn
Penn Treebank
+ 10.18653/v1/2022.acl-long.327
WikiDiverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types
@@ -4710,6 +5037,7 @@ in the Case of Unambiguous Gender
wang-etal-2022-wikidiverse
wangxw5/wikidiverse
ZESHEL
+ 10.18653/v1/2022.acl-long.328
Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models
@@ -4728,6 +5056,7 @@ in the Case of Unambiguous Gender
BLUE
BioLAMA
LAMA
+ 10.18653/v1/2022.acl-long.329
Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient BERT
@@ -4745,6 +5074,7 @@ in the Case of Unambiguous Gender
GLUE
QNLI
RACE
+ 10.18653/v1/2022.acl-long.330
Compression of Generative Pre-trained Language Models via Quantization
@@ -4764,6 +5094,7 @@ in the Case of Unambiguous Gender
PERSONA-CHAT
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.331
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
@@ -4780,6 +5111,7 @@ in the Case of Unambiguous Gender
Conceptual Captions
Objects365
Places
+ 10.18653/v1/2022.acl-long.332
DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation
@@ -4803,6 +5135,7 @@ in the Case of Unambiguous Gender
DSTC7 Task 2
DailyDialog
PERSONA-CHAT
+ 10.18653/v1/2022.acl-long.333
Contextual Fine-to-Coarse Distillation for Coarse-grained Response Selection in Open-Domain Conversations
@@ -4823,6 +5156,7 @@ in the Case of Unambiguous Gender
2022.acl-long.334
chen-etal-2022-contextual
lemuria-wchen/CFC
+ 10.18653/v1/2022.acl-long.334
Textomics: A Dataset for Genomics Data Summary Generation
@@ -4835,6 +5169,7 @@ in the Case of Unambiguous Gender
2022.acl-long.335.software.zip
wang-etal-2022-textomics
amos814/textomics
+ 10.18653/v1/2022.acl-long.335
A Contrastive Framework for Learning Sentence Representations from Pairwise and Triple-wise Perspective in Angular Space
@@ -4852,6 +5187,7 @@ in the Case of Unambiguous Gender
MRPC
SST
SentEval
+ 10.18653/v1/2022.acl-long.336
Packed Levitated Marker for Entity and Relation Extraction
@@ -4871,6 +5207,7 @@ in the Case of Unambiguous Gender
Few-NERD
OntoNotes 5.0
SciERC
+ 10.18653/v1/2022.acl-long.337
An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation
@@ -4884,6 +5221,7 @@ in the Case of Unambiguous Gender
2022.acl-long.338.software.zip
yang-etal-2022-interpretable
shiquanyang/ns-dial
+ 10.18653/v1/2022.acl-long.338
Impact of Evaluation Methodologies on Code Summarization
@@ -4897,6 +5235,7 @@ in the Case of Unambiguous Gender
2022.acl-long.339
nie-etal-2022-impact
engineeringsoftware/time-segmented-evaluation
+ 10.18653/v1/2022.acl-long.339
KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering
@@ -4915,6 +5254,7 @@ in the Case of Unambiguous Gender
yu-etal-2022-kg
Natural Questions
TriviaQA
+ 10.18653/v1/2022.acl-long.340
Which side are you on? Insider-Outsider classification in conspiracy-theoretic social media
@@ -4928,6 +5268,7 @@ in the Case of Unambiguous Gender
2022.acl-long.341
2022.acl-long.341.software.zip
holur-etal-2022-side
+ 10.18653/v1/2022.acl-long.341
Learning From Failure: Data Capture in an Australian Aboriginal Community
@@ -4938,6 +5279,7 @@ in the Case of Unambiguous Gender
Most low-resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” It is therefore worth exploring new ways of engaging with speakers that generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.
2022.acl-long.342
le-ferrand-etal-2022-learning
+ 10.18653/v1/2022.acl-long.342
Deep Inductive Logic Reasoning for Multi-Hop Reading Comprehension
@@ -4949,6 +5291,7 @@ in the Case of Unambiguous Gender
wang-pan-2022-deep
MedHop
WikiHop
+ 10.18653/v1/2022.acl-long.343
CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues
@@ -4966,6 +5309,7 @@ in the Case of Unambiguous Gender
DREAM
DailyDialog
MuTual
+ 10.18653/v1/2022.acl-long.344
A Comparative Study of Faithfulness Metrics for Model Interpretability Methods
@@ -4978,6 +5322,7 @@ in the Case of Unambiguous Gender
chan-etal-2022-comparative
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-long.345
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
@@ -5013,6 +5358,7 @@ in the Case of Unambiguous Gender
WSC
WiC
WinoGrande
+ 10.18653/v1/2022.acl-long.346
Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models
@@ -5034,6 +5380,7 @@ in the Case of Unambiguous Gender
OLID
QNLI
SST
+ 10.18653/v1/2022.acl-long.347
Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-centric Summarization
@@ -5050,6 +5397,7 @@ in the Case of Unambiguous Gender
zhao-etal-2022-educational
zhaozj89/Educational-Question-Generation
FairytaleQA
+ 10.18653/v1/2022.acl-long.348
HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations
@@ -5065,6 +5413,7 @@ in the Case of Unambiguous Gender
2022.acl-long.349
gu-etal-2022-hetermpc
lxchtan/hetermpc
+ 10.18653/v1/2022.acl-long.349
The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature
@@ -5076,6 +5425,7 @@ in the Case of Unambiguous Gender
Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency. In this paper, we examine the summaries generated by two current models in order to understand the deficiencies of existing evaluation approaches in the context of the challenges that arise in the MDS task. Based on this analysis, we propose a new approach to human evaluation and identify several challenges that must be overcome to develop effective biomedical MDS systems.
2022.acl-long.350
otmakhova-etal-2022-patient
+ 10.18653/v1/2022.acl-long.350
A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
@@ -5089,6 +5439,7 @@ in the Case of Unambiguous Gender
jacob-parnell-rozetta/longformer_coverage
Multi-News
WCEP
+ 10.18653/v1/2022.acl-long.351
KNN-Contrastive Learning for Out-of-Domain Intent Classification
@@ -5099,6 +5450,7 @@ in the Case of Unambiguous Gender
Out-of-Domain (OOD) intent classification is a basic and challenging task for dialogue systems. Previous methods commonly learn discriminative semantic features by implicitly restricting the region (in feature space) of In-Domain (IND) intent features to be compact or simply connected, which assumes that no OOD intents reside there. The distribution of the IND intent features is then often assumed to obey a hypothetical distribution (mostly Gaussian), and samples outside this distribution are regarded as OOD samples. In this paper, we start from the nature of OOD intent classification and explore its optimization objective. We further propose a simple yet effective method, named KNN-contrastive learning. Our approach utilizes the k-nearest neighbors (KNN) of IND intents to learn discriminative semantic features that are more conducive to OOD detection. Notably, the density-based novelty detection algorithm is so well-grounded in the essence of our method that it is reasonable to use it as the OOD detection algorithm without imposing any requirements on the feature distribution. Extensive experiments on four public datasets show that our approach can not only enhance OOD detection performance substantially but also improve IND intent classification, while requiring no restrictions on the feature distribution.
2022.acl-long.352
zhou-etal-2022-knn
+ 10.18653/v1/2022.acl-long.352
A Neural Network Architecture for Program Understanding Inspired by Human Behaviors
@@ -5115,6 +5467,7 @@ in the Case of Unambiguous Gender
recklessronan/pgnn-ek
CodeSearchNet
CodeXGLUE
+ 10.18653/v1/2022.acl-long.353
FaVIQ: FAct Verification from Information-seeking Questions
@@ -5134,6 +5487,7 @@ in the Case of Unambiguous Gender
FM2
KILT
Natural Questions
+ 10.18653/v1/2022.acl-long.354
Simulating Bandit Learning from User Feedback for Extractive Question Answering
@@ -5152,6 +5506,7 @@ in the Case of Unambiguous Gender
SQuAD
SearchQA
TriviaQA
+ 10.18653/v1/2022.acl-long.355
Beyond Goldfish Memory: Long-Term Open-Domain Conversation
@@ -5163,6 +5518,7 @@ in the Case of Unambiguous Gender
2022.acl-long.356
xu-etal-2022-beyond
PERSONA-CHAT
+ 10.18653/v1/2022.acl-long.356
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
@@ -5180,6 +5536,7 @@ in the Case of Unambiguous Gender
CLEVR
COCO
RefCOCO
+ 10.18653/v1/2022.acl-long.357
Dynamic Prefix-Tuning for Generative Template-based Event Extraction
@@ -5191,6 +5548,7 @@ in the Case of Unambiguous Gender
We consider event extraction in a generative manner with template-based conditional generation. Although there is a rising trend of casting event extraction as a sequence generation problem with prompts, these generation-based methods face two significant challenges: suboptimal prompts and static event type information. In this paper, we propose a generative template-based event extraction method with dynamic prefix (GTEE-DynPref) that integrates context information with type-specific prefixes to learn a context-specific prefix for each context. Experimental results show that our model achieves results competitive with the state-of-the-art classification-based model OneIE on ACE 2005 and achieves the best performance on ERE. Additionally, our model proves to be effectively portable to new types of events.
2022.acl-long.358
liu-etal-2022-dynamic
+ 10.18653/v1/2022.acl-long.358
E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models
@@ -5204,6 +5562,7 @@ in the Case of Unambiguous Gender
GLUE
QNLI
SuperGLUE
+ 10.18653/v1/2022.acl-long.359
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
@@ -5223,6 +5582,7 @@ in the Case of Unambiguous Gender
WikiSum
arXiv
arXiv Summarization Dataset
+ 10.18653/v1/2022.acl-long.360
Dynamic Global Memory for Document-level Argument Extraction
@@ -5235,6 +5595,7 @@ in the Case of Unambiguous Gender
2022.acl-long.361.software.zip
du-etal-2022-dynamic
xinyadu/memory_docie
+ 10.18653/v1/2022.acl-long.361
Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns
@@ -5246,6 +5607,7 @@ in the Case of Unambiguous Gender
There is a growing interest in the combined use of NLP and machine learning methods to predict gaze patterns during naturalistic reading. While promising results have been obtained through the use of transformer-based language models, little work has been undertaken to relate the performance of such models to general text characteristics. In this paper, we report on experiments with two eye-tracking corpora of naturalistic reading and two language models (BERT and GPT-2). In all experiments, we test the effects of a broad spectrum of features for predicting human reading behavior that fall into five categories (syntactic complexity, lexical richness, register-based multiword combinations, readability and psycholinguistic word properties). Our experiments show that both the features included and the architecture of the transformer-based language models play a role in predicting multiple eye-tracking measures during naturalistic reading. We also report the results of experiments aimed at determining the relative importance of features from different groups using SP-LIME.
2022.acl-long.362
wiechmann-kerz-2022-measuring
+ 10.18653/v1/2022.acl-long.362
Alternative Input Signals Ease Transfer in Multilingual Machine Translation
@@ -5260,6 +5622,7 @@ in the Case of Unambiguous Gender
Recent work in multilingual machine translation (MMT) has focused on the potential of positive transfer between languages, particularly cases where higher-resourced languages can benefit lower-resourced ones. While training an MMT model, the supervision signals learned from one language pair can be transferred to the other via the tokens shared by multiple source languages. However, the transfer is inhibited when the token overlap among source languages is small, which manifests naturally when languages use different writing systems. In this paper, we tackle inhibited transfer by augmenting the training data with alternative signals that unify different writing systems, such as phonetic, romanized, and transliterated input. We test these signals on Indic and Turkic languages, two language families where the writing systems differ but the languages still share common features. Our results indicate that a straightforward multi-source self-ensemble (training a model on a mixture of various signals and ensembling the outputs of the same model fed with different signals during inference) outperforms strong ensemble baselines by 1.3 BLEU points on both language families. Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when the training set is small, leading to +5 BLEU when only 5% of the total training data is accessible. Finally, our analysis demonstrates that including alternative signals yields more consistency and translates named entities more accurately, which is crucial for increased factuality of automated systems.
2022.acl-long.363
sun-etal-2022-alternative
+ 10.18653/v1/2022.acl-long.363
Phone-ing it in: Towards Flexible Multi-Modal Language Model Training by Phonetic Representations of Data
@@ -5271,6 +5634,7 @@ in the Case of Unambiguous Gender
leong-whitenack-2022-phone
sil-ai/phone-it-in
MasakhaNER
+ 10.18653/v1/2022.acl-long.364
Noisy Channel Language Model Prompting for Few-Shot Text Classification
@@ -5285,6 +5649,7 @@ in the Case of Unambiguous Gender
shmsw25/Channel-LM-Prompting
AG News
SST
+ 10.18653/v1/2022.acl-long.365
Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages
@@ -5297,6 +5662,7 @@ in the Case of Unambiguous Gender
2022.acl-long.366
downey-etal-2022-multilingual
cmdowney88/xlslm
+ 10.18653/v1/2022.acl-long.366
KinyaBERT: a Morphology-aware Kinyarwanda Language Model
@@ -5310,6 +5676,7 @@ in the Case of Unambiguous Gender
anzeyimana/kinyabert-acl2022
GLUE
QNLI
+ 10.18653/v1/2022.acl-long.367
On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency
@@ -5321,6 +5688,7 @@ in the Case of Unambiguous Gender
park-caragea-2022-calibration
SNLI
SWAG
+ 10.18653/v1/2022.acl-long.368
IMPLI: Investigating NLI Models’ Performance on Figurative Language
@@ -5332,6 +5700,7 @@ in the Case of Unambiguous Gender
2022.acl-long.369
stowe-etal-2022-impli
ukplab/acl2022-impli
+ 10.18653/v1/2022.acl-long.369
QAConv: Question Answering on Informative Conversations
@@ -5353,6 +5722,7 @@ in the Case of Unambiguous Gender
Molweni
QuAC
SQuAD
+ 10.18653/v1/2022.acl-long.370
Prix-LM: Pretraining for Multilingual Knowledge Base Construction
@@ -5369,6 +5739,7 @@ in the Case of Unambiguous Gender
DBpedia
LAMA
XL-BEL
+ 10.18653/v1/2022.acl-long.371
Semantic Composition with PSHRG for Derivation Tree Reconstruction from Graph-Based Meaning Representations
@@ -5379,6 +5750,7 @@ in the Case of Unambiguous Gender
We introduce a data-driven approach to generating derivation trees from meaning representation graphs with probabilistic synchronous hyperedge replacement grammar (PSHRG). SHRG has been used to produce meaning representation graphs from texts and syntax trees, but little is known about its viability in the reverse direction. In particular, we experiment on Dependency Minimal Recursion Semantics (DMRS) and adapt PSHRG as a formalism that approximates the semantic composition of DMRS graphs and simultaneously recovers the derivations that license the DMRS graphs. Consistent results are obtained when evaluating on a collection of annotated corpora. This work reveals the ability of PSHRG to formalize a syntax–semantics interface, model compositional graph-to-tree translations, and channel explainability to surface realization.
2022.acl-long.372
lo-etal-2022-semantic
+ 10.18653/v1/2022.acl-long.372
HOLM: Hallucinating Objects with Language Models for Referring Expression Recognition in Partially-Observed Scenes
@@ -5390,6 +5762,7 @@ in the Case of Unambiguous Gender
2022.acl-long.373
cirik-etal-2022-holm
Visual Genome
+ 10.18653/v1/2022.acl-long.373
Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models
@@ -5409,6 +5782,7 @@ in the Case of Unambiguous Gender
XCOPA
XNLI
XQuAD
+ 10.18653/v1/2022.acl-long.374
\infty-former: Infinite Memory Transformer
@@ -5423,6 +5797,7 @@ in the Case of Unambiguous Gender
PG-19
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.375
Systematic Inequalities in Language Technology Performance across the World’s Languages
@@ -5435,6 +5810,7 @@ in the Case of Unambiguous Gender
2022.acl-long.376.software.zip
blasi-etal-2022-systematic
neubig/globalutility
+ 10.18653/v1/2022.acl-long.376
CaMEL: Case Marker Extraction without Labels
@@ -5448,6 +5824,7 @@ in the Case of Unambiguous Gender
2022.acl-long.377.software.zip
weissweiler-etal-2022-camel
leonieweissweiler/camel
+ 10.18653/v1/2022.acl-long.377
Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors
@@ -5460,6 +5837,7 @@ in the Case of Unambiguous Gender
2022.acl-long.378.software.zip
nejadgholi-etal-2022-improving
isarnejad/tcav-for-text-classifiers
+ 10.18653/v1/2022.acl-long.378
Reports of personal experiences and stories in argumentation: datasets and analysis
@@ -5469,6 +5847,7 @@ in the Case of Unambiguous Gender
Reports of personal experiences or stories can play a crucial role in argumentation, as they represent an immediate and (often) relatable way to back up one’s position with respect to a given topic. They are easy to understand and increase empathy: this makes them powerful in argumentation. The impact of personal reports and stories in argumentation has been studied in the Social Sciences, but it is still largely underexplored in NLP. Our work is the first step towards filling this gap: our goal is to develop robust classifiers to identify documents containing personal experiences and reports. The main challenge is the scarcity of annotated data: our solution is to leverage existing annotations to be able to scale up the analysis. Our contribution is twofold. First, we conduct a set of in-domain and cross-domain experiments involving three datasets (two from Argument Mining, one from the Social Sciences), with modeling architectures, training setups, and fine-tuning options tailored to the domains involved. We show that despite the differences among datasets and annotations, robust cross-domain classification is possible. Second, we employ linear regression for performance mining, identifying performance trends both for overall classification performance and for individual classifier predictions.
2022.acl-long.379
falk-lapesa-2022-reports
+ 10.18653/v1/2022.acl-long.379
Non-neural Models Matter: a Re-evaluation of Neural Referring Expression Generation Systems
@@ -5481,6 +5860,7 @@ in the Case of Unambiguous Gender
2022.acl-long.380.software.zip
same-etal-2022-non
WebNLG
+ 10.18653/v1/2022.acl-long.380
Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
@@ -5492,6 +5872,7 @@ in the Case of Unambiguous Gender
Text-to-SQL parsers map natural language questions to programs that are executable over tables to generate answers, and are typically evaluated on large-scale datasets like Spider (Yu et al., 2018). We argue that existing benchmarks fail to capture a certain out-of-domain generalization problem that is of significant practical importance: matching domain-specific phrases to composite operations over columns. To study this problem, we first propose a synthetic dataset along with a re-purposed train/test split of the Squall dataset (Shi et al., 2020) as new benchmarks to quantify domain generalization over column operations, and find that existing state-of-the-art parsers struggle on these benchmarks. We propose to address this problem by incorporating prior domain knowledge through preprocessing of table schemas, and design a method that consists of two components: schema expansion and schema pruning. This method can be easily applied to multiple existing base parsers, and we show that it significantly outperforms baseline parsers on this domain generalization problem, boosting the underlying parsers’ overall performance by up to 13.8% relative accuracy gain (5.1% absolute) on the new Squall data split.
2022.acl-long.381
zhao-etal-2022-bridging
+ 10.18653/v1/2022.acl-long.381
Predicate-Argument Based Bi-Encoder for Paraphrase Identification
@@ -5505,6 +5886,7 @@ in the Case of Unambiguous Gender
peng-etal-2022-predicate
GLUE
PIT
+ 10.18653/v1/2022.acl-long.382
MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective
@@ -5523,6 +5905,7 @@ in the Case of Unambiguous Gender
wang-etal-2022-miner
beyonderxx/miner
WNUT 2017
+ 10.18653/v1/2022.acl-long.383
Leveraging Wikipedia article evolution for promotional tone detection
@@ -5533,6 +5916,7 @@ in the Case of Unambiguous Gender
2022.acl-long.384
de-kock-vlachos-2022-leveraging
christinedekock11/wiki-evolve
+ 10.18653/v1/2022.acl-long.384
From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology
@@ -5542,6 +5926,7 @@ in the Case of Unambiguous Gender
Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future.
2022.acl-long.385
dingemanse-liesenfeld-2022-text
+ 10.18653/v1/2022.acl-long.385
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning
@@ -5562,6 +5947,7 @@ in the Case of Unambiguous Gender
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-long.386
RoMe: A Robust Metric for Evaluating Natural Language Generation
@@ -5577,6 +5963,7 @@ in the Case of Unambiguous Gender
rashad101/rome
CoLA
KELM
+ 10.18653/v1/2022.acl-long.387
Finding Structural Knowledge in Multimodal-BERT
@@ -5590,6 +5977,7 @@ in the Case of Unambiguous Gender
vsjmilewski/multimodal-probes
Flickr30k
Visual Genome
+ 10.18653/v1/2022.acl-long.388
Fully Hyperbolic Neural Networks
@@ -5607,6 +5995,7 @@ in the Case of Unambiguous Gender
chen-etal-2022-fully
chenweize1998/fully-hyperbolic-nn
FB15k-237
+ 10.18653/v1/2022.acl-long.389
Neural Machine Translation with Phrase-Level Universal Visual Representations
@@ -5617,6 +6006,7 @@ in the Case of Unambiguous Gender
2022.acl-long.390
fang-feng-2022-neural
ictnlp/pluvr
+ 10.18653/v1/2022.acl-long.390
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
@@ -5639,6 +6029,7 @@ in the Case of Unambiguous Gender
EmotionLines
IEMOCAP
MELD
+ 10.18653/v1/2022.acl-long.391
Few-shot Named Entity Recognition with Self-describing Networks
@@ -5654,6 +6045,7 @@ in the Case of Unambiguous Gender
chen-etal-2022-shot
chen700564/sdnet
WNUT 2017
+ 10.18653/v1/2022.acl-long.392
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
@@ -5681,6 +6073,7 @@ in the Case of Unambiguous Gender
MuST-C
VoxCeleb1
WHAM!
+ 10.18653/v1/2022.acl-long.393
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
@@ -5697,6 +6090,7 @@ in the Case of Unambiguous Gender
2022.acl-long.394
moramarco-etal-2022-human
CNN/Daily Mail
+ 10.18653/v1/2022.acl-long.394
Unified Structure Generation for Universal Information Extraction
@@ -5714,6 +6108,7 @@ in the Case of Unambiguous Gender
lu-etal-2022-unified
CoNLL-2003
SciERC
+ 10.18653/v1/2022.acl-long.395
Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering
@@ -5729,6 +6124,7 @@ in the Case of Unambiguous Gender
2022.acl-long.396
zhang-etal-2022-subgraph
ruckbreasoning/subgraphretrievalkbqa
+ 10.18653/v1/2022.acl-long.396
Pre-training to Match for Unified Low-shot Relation Extraction
@@ -5743,6 +6139,7 @@ in the Case of Unambiguous Gender
liu-etal-2022-pre
fc-liu/mcmn
FewRel
+ 10.18653/v1/2022.acl-long.397
Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
@@ -5760,6 +6157,7 @@ in the Case of Unambiguous Gender
BioLAMA
LAMA
WebText
+ 10.18653/v1/2022.acl-long.398
Evaluating Extreme Hierarchical Multi-label Classification
@@ -5769,6 +6167,7 @@ in the Case of Unambiguous Gender
Several natural language processing (NLP) tasks are defined as classification problems in their most complex form: Multi-label Hierarchical Extreme classification, in which items may be associated with multiple classes from a set of thousands of possible classes organized in a hierarchy, and with a highly unbalanced distribution both in terms of class frequency and the number of labels per item. We analyze the state of the art in evaluation metrics based on a set of formal properties, and we define an information-theoretic metric inspired by the Information Contrast Model (ICM). Experiments on synthetic data and a case study on real data show the suitability of the ICM for such scenarios.
2022.acl-long.399
amigo-delgado-2022-evaluating
+ 10.18653/v1/2022.acl-long.399
What does the sea say to the shore? A BERT based DST style approach for speaker to dialogue attribution in novels
@@ -5779,6 +6178,7 @@ in the Case of Unambiguous Gender
We present a complete pipeline to extract characters in a novel and link them to their direct-speech utterances. Our model is divided into three independent components: extracting direct speech, compiling a list of characters, and attributing those characters to their utterances. Although we find that existing systems can perform the first two tasks accurately, attributing characters to direct speech is a challenging problem due to the narrator’s lack of explicit character mentions, and the frequent use of nominal and pronominal coreference when such explicit mentions are made. We adapt the progress made on Dialogue State Tracking to tackle a new problem: attributing speakers to dialogues. This is the first application of deep learning to speaker attribution, and it shows that it is possible to overcome the need for the hand-crafted features and rules used in the past. Our full pipeline improves the performance of state-of-the-art models by a relative 50% in F1-score.
2022.acl-long.400
cuesta-lazaro-etal-2022-sea
+ 10.18653/v1/2022.acl-long.400
Measuring Fairness of Text Classifiers via Prediction Sensitivity
@@ -5792,6 +6192,7 @@ in the Case of Unambiguous Gender
With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation, accumulated prediction sensitivity, which measures fairness in machine learning models based on the model’s prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness. It also correlates well with humans’ perception of fairness. We conduct experiments on two text classification datasets, Jigsaw Toxicity and Bias in Bios, and evaluate the correlations between metrics and manual annotations on whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
2022.acl-long.401
krishna-etal-2022-measuring
+ 10.18653/v1/2022.acl-long.401
RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion
@@ -5806,6 +6207,7 @@ in the Case of Unambiguous Gender
chen-etal-2022-rotateqvs
ICEWS
YAGO
+ 10.18653/v1/2022.acl-long.402
Feeding What You Need by Understanding What You Learned
@@ -5823,6 +6225,7 @@ in the Case of Unambiguous Gender
HotpotQA
RACE
SQuAD
+ 10.18653/v1/2022.acl-long.403
Probing Simile Knowledge from Pre-trained Language Models
@@ -5842,6 +6245,7 @@ in the Case of Unambiguous Gender
chen-etal-2022-probing
nairoj/Probing-Simile-from-PLM
BookCorpus
+ 10.18653/v1/2022.acl-long.404
An Effective and Efficient Entity Alignment Decoding Algorithm via Third-Order Tensor Isomorphism
@@ -5858,6 +6262,7 @@ in the Case of Unambiguous Gender
2022.acl-long.405
2022.acl-long.405.software.zip
mao-etal-2022-effective
+ 10.18653/v1/2022.acl-long.405
Entailment Graph Learning with Textual Entailment and Soft Transitivity
@@ -5870,6 +6275,7 @@ in the Case of Unambiguous Gender
chen-etal-2022-entailment
zacharychenpk/egt2
FIGER
+ 10.18653/v1/2022.acl-long.406
Logic Traps in Evaluating Attribution Scores
@@ -5886,6 +6292,7 @@ in the Case of Unambiguous Gender
GLUE
RACE
SST
+ 10.18653/v1/2022.acl-long.407
Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network
@@ -5900,6 +6307,7 @@ in the Case of Unambiguous Gender
2022.acl-long.408
gong-etal-2022-continual
MATH
+ 10.18653/v1/2022.acl-long.408
Multitasking Framework for Unsupervised Simple Definition Generation
@@ -5914,6 +6322,7 @@ in the Case of Unambiguous Gender
2022.acl-long.409.software.zip
kong-etal-2022-multitasking
blcuicall/simpdefiner
+ 10.18653/v1/2022.acl-long.409
Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction
@@ -5929,6 +6338,7 @@ in the Case of Unambiguous Gender
Math23K
MathQA
SVAMP
+ 10.18653/v1/2022.acl-long.410
When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues
@@ -5943,6 +6353,7 @@ in the Case of Unambiguous Gender
kumar-etal-2022-become
lcs2-iiitd/maf
WITS
+ 10.18653/v1/2022.acl-long.411
Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning
@@ -5957,6 +6368,7 @@ in the Case of Unambiguous Gender
lee-etal-2022-toward
sh0416/clrcmd
SNLI
+ 10.18653/v1/2022.acl-long.412
Pre-training and Fine-tuning Neural Topic Model: A Simple yet Effective Approach to Incorporating External Knowledge
@@ -5972,6 +6384,7 @@ in the Case of Unambiguous Gender
zhang-etal-2022-pre
OpenWebText
WebText
+ 10.18653/v1/2022.acl-long.413
Multi-View Document Representation Learning for Open-Domain Dense Retrieval
@@ -5987,6 +6400,7 @@ in the Case of Unambiguous Gender
Natural Questions
SQuAD
TriviaQA
+ 10.18653/v1/2022.acl-long.414
Graph Pre-training for AMR Parsing and Generation
@@ -6004,6 +6418,7 @@ in the Case of Unambiguous Gender
LDC2020T02
New3
The Little Prince
+ 10.18653/v1/2022.acl-long.415
Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
@@ -6018,6 +6433,7 @@ in the Case of Unambiguous Gender
oriyor/turning_tables
DROP
IIRC
+ 10.18653/v1/2022.acl-long.416
RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering
@@ -6032,6 +6448,7 @@ in the Case of Unambiguous Gender
2022.acl-long.417.software.zip
ye-etal-2022-rng
salesforce/rng-kbqa
+ 10.18653/v1/2022.acl-long.417
Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling
@@ -6043,6 +6460,7 @@ in the Case of Unambiguous Gender
2022.acl-long.418
2022.acl-long.418.software.zip
jwalapuram-etal-2022-rethinking
+ 10.18653/v1/2022.acl-long.418
Just Rank: Rethinking Evaluation with Word and Sentence Similarities
@@ -6059,6 +6477,7 @@ in the Case of Unambiguous Gender
SST
SciCite
SentEval
+ 10.18653/v1/2022.acl-long.419
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding
@@ -6070,6 +6489,7 @@ in the Case of Unambiguous Gender
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images. However, there are still a large number of digital documents where the layout information is not fixed and needs to be rendered interactively and dynamically for visualization, making existing layout-based pre-training approaches difficult to apply. In this paper, we propose MarkupLM for document understanding tasks with markup languages as the backbone, such as HTML/XML-based documents, where text and markup information is jointly pre-trained. Experimental results show that the pre-trained MarkupLM significantly outperforms the existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available at https://aka.ms/markuplm.
2022.acl-long.420
li-etal-2022-markuplm
+ 10.18653/v1/2022.acl-long.420
CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment
@@ -6085,6 +6505,7 @@ in the Case of Unambiguous Gender
song-etal-2022-clip
SNLI-VE
Visual Question Answering
+ 10.18653/v1/2022.acl-long.421
KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base
@@ -6109,6 +6530,7 @@ in the Case of Unambiguous Gender
ComplexWebQuestions
MetaQA
WebQuestions
+ 10.18653/v1/2022.acl-long.422
Debiased Contrastive Learning of Unsupervised Sentence Representations
@@ -6121,6 +6543,7 @@ in the Case of Unambiguous Gender
2022.acl-long.423
zhou-etal-2022-debiased
rucaibox/dclr
+ 10.18653/v1/2022.acl-long.423
MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
@@ -6133,6 +6556,7 @@ in the Case of Unambiguous Gender
2022.acl-long.424
tan-etal-2022-msp
thunlp-mt/plm4mt
+ 10.18653/v1/2022.acl-long.424
SalesBot: Transitioning from Chit-Chat to Task-Oriented Dialogues
@@ -6148,6 +6572,7 @@ in the Case of Unambiguous Gender
CommonsenseQA
SGD
SWAG
+ 10.18653/v1/2022.acl-long.425
UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining
@@ -6165,6 +6590,7 @@ in the Case of Unambiguous Gender
KP20k
KPTimes
WNUT 2017
+ 10.18653/v1/2022.acl-long.426
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
@@ -6191,6 +6617,7 @@ in the Case of Unambiguous Gender
XNLI
XQuAD
XTREME
+ 10.18653/v1/2022.acl-long.427
Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing
@@ -6203,6 +6630,7 @@ in the Case of Unambiguous Gender
lou-etal-2022-nested
louchao98/nner_as_parsing
NNE
+ 10.18653/v1/2022.acl-long.428
Can Explanations Be Useful for Calibrating Black Box Models?
@@ -6218,6 +6646,7 @@ in the Case of Unambiguous Gender
MRPC
QNLI
SQuAD
+ 10.18653/v1/2022.acl-long.429
OIE@OIA: an Adaptable and Efficient Open Information Extraction Framework
@@ -6229,6 +6658,7 @@ in the Case of Unambiguous Gender
Different Open Information Extraction (OIE) tasks require different types of information, so the OIE field requires strong adaptability of OIE algorithms to meet different task requirements. This paper discusses the adaptability problem in existing OIE systems and designs a new adaptable and efficient OIE system, OIE@OIA, as a solution. OIE@OIA follows the methodology of Open Information eXpression (OIX): parsing a sentence to an Open Information Annotation (OIA) Graph and then adapting the OIA graph to different OIE tasks with simple rules. As the core of our OIE@OIA system, we implement an end-to-end OIA generator by annotating a dataset (which we make openly available) and designing an efficient learning algorithm for the complex OIA graph. We easily adapt the OIE@OIA system to accomplish three popular OIE tasks. The experimental results show that OIE@OIA achieves new SOTA performance on these tasks, demonstrating the great adaptability of our system. Furthermore, compared to other end-to-end OIE baselines that need millions of samples for training, OIE@OIA needs much fewer training samples (12K), showing a significant advantage in terms of efficiency.
2022.acl-long.430
wang-etal-2022-oie
+ 10.18653/v1/2022.acl-long.430
ReACC: A Retrieval-Augmented Code Completion Framework
@@ -6245,6 +6675,7 @@ in the Case of Unambiguous Gender
celbree/reacc
CodeSearchNet
CodeXGLUE
+ 10.18653/v1/2022.acl-long.431
Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED
@@ -6260,6 +6691,7 @@ in the Case of Unambiguous Gender
huang-etal-2022-recommend
andrewzhe/revisit-docred
DocRED
+ 10.18653/v1/2022.acl-long.432
UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
@@ -6278,6 +6710,7 @@ in the Case of Unambiguous Gender
morningmoni/unipelt
GLUE
QNLI
+ 10.18653/v1/2022.acl-long.433
An Empirical Study of Memorization in NLP
@@ -6290,6 +6723,7 @@ in the Case of Unambiguous Gender
xszheng2020/memorization
CIFAR-10
SST
+ 10.18653/v1/2022.acl-long.434
AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
@@ -6317,6 +6751,7 @@ in the Case of Unambiguous Gender
AmericasNLP/americasnlp2021
SNLI
SuperGLUE
+ 10.18653/v1/2022.acl-long.435
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts
@@ -6331,6 +6766,7 @@ in the Case of Unambiguous Gender
2022.acl-long.436
ding-etal-2022-towards
CodeXGLUE
+ 10.18653/v1/2022.acl-long.436
Guided Attention Multimodal Multitask Financial Forecasting with Inter-Company Relationships and Global and Local News
@@ -6341,6 +6777,7 @@ in the Case of Unambiguous Gender
2022.acl-long.437
2022.acl-long.437.software.zip
ang-lim-2022-guided
+ 10.18653/v1/2022.acl-long.437
On Vision Features in Multimodal Machine Translation
@@ -6356,6 +6793,7 @@ in the Case of Unambiguous Gender
2022.acl-long.438
li-etal-2022-vision
libeineu/fairseq_mmt
+ 10.18653/v1/2022.acl-long.438
CONTaiNER: Few-Shot Named Entity Recognition via Contrastive Learning
@@ -6370,6 +6808,7 @@ in the Case of Unambiguous Gender
psunlpgroup/container
Few-NERD
WNUT 2017
+ 10.18653/v1/2022.acl-long.439
Cree Corpus: A Collection of nêhiyawêwin Resources
@@ -6382,6 +6821,7 @@ in the Case of Unambiguous Gender
Plains Cree (nêhiyawêwin) is an Indigenous language that is spoken in Canada and the USA. It is the most widely spoken dialect of Cree and a morphologically complex language that is polysynthetic, highly inflective, and agglutinative. It is an extremely low resource language, with no existing corpus that is both available and prepared for supporting the development of language technologies. To support nêhiyawêwin revitalization and preservation, we developed a corpus covering diverse genres, time periods, and texts for a variety of intended audiences. The data has been verified and cleaned; it is ready for use in developing language technologies for nêhiyawêwin. The corpus includes the corresponding English phrases or audio files where available. We demonstrate the utility of the corpus through its community use and its use to build language technologies that can provide the types of support that community members have expressed are desirable. The corpus is available for public use.
2022.acl-long.440
teodorescu-etal-2022-cree
+ 10.18653/v1/2022.acl-long.440
Learning to Rank Visual Stories From Human Ranking Data
@@ -6399,6 +6839,7 @@ in the Case of Unambiguous Gender
academiasinicanlplab/vhed
VIST
VIST-Edit
+ 10.18653/v1/2022.acl-long.441
Universal Conditional Masked Language Pre-training for Neural Machine Translation
@@ -6412,6 +6853,7 @@ in the Case of Unambiguous Gender
2022.acl-long.442
li-etal-2022-universal
huawei-noah/Pretrained-Language-Model
+ 10.18653/v1/2022.acl-long.442
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
@@ -6427,6 +6869,7 @@ in the Case of Unambiguous Gender
GQA
Visual Genome
Visual Question Answering
+ 10.18653/v1/2022.acl-long.443
Phrase-aware Unsupervised Constituency Parsing
@@ -6439,6 +6882,7 @@ in the Case of Unambiguous Gender
Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task. Despite their high accuracy in identifying low-level structures, prior approaches tend to struggle to capture high-level structures like clauses, since the MLM task usually requires information only from the local context. In this work, we revisit LM-based constituency parsing from a phrase-centered perspective. Inspired by the natural reading process of humans, we propose to regularize the parser with phrases extracted by an unsupervised phrase tagger to help the LM quickly capture low-level structures. For a better understanding of high-level structures, we propose a phrase-guided masking strategy for the LM that places more emphasis on reconstructing non-phrase words. We show that the initial phrase regularization serves as an effective bootstrap, and phrase-guided masking improves the identification of high-level structures. Experiments on the public benchmark with two different backbone models demonstrate the effectiveness and generality of our method.
2022.acl-long.444
gu-etal-2022-phrase
+ 10.18653/v1/2022.acl-long.444
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
@@ -6454,6 +6898,7 @@ in the Case of Unambiguous Gender
tianboji/dialogue-eval
ConvAI2
FED
+ 10.18653/v1/2022.acl-long.445
Updated Headline Generation: Creating Updated Summaries for Evolving News Stories
@@ -6464,6 +6909,7 @@ in the Case of Unambiguous Gender
We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline. The system must identify the novel information in the article update, and modify the existing headline accordingly. We create data for this task using the NewsEdits corpus by automatically identifying contiguous article versions that are likely to require a substantive headline update. We find that models conditioned on the prior headline and body revisions produce headlines judged by humans to be as factual as gold headlines while making fewer unnecessary edits compared to a standard headline generation model. Our experiments establish benchmarks for this new contextual summarization task.
2022.acl-long.446
panthaplackel-etal-2022-updated
+ 10.18653/v1/2022.acl-long.446
SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures
@@ -6474,6 +6920,7 @@ in the Case of Unambiguous Gender
Current open-domain conversational models can easily be made to talk in inadequate ways. Online learning from conversational feedback given by the conversation partner is a promising avenue for a model to improve and adapt, so as to generate fewer of these safety failures. However, current state-of-the-art models tend to react to feedback with defensive or oblivious responses. This makes for an unpleasant experience and may discourage conversation partners from giving feedback in the future. This work proposes SaFeRDialogues, a task and dataset of graceful responses to conversational feedback about safety failures. We collect a dataset of 8k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback. We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil conversation, without sacrificing engagingness or general conversational ability.
2022.acl-long.447
ung-etal-2022-saferdialogues
+ 10.18653/v1/2022.acl-long.447
Compositional Generalization in Dependency Parsing
@@ -6485,6 +6932,7 @@ in the Case of Unambiguous Gender
Compositionality, the ability to combine familiar units like words into novel phrases and sentences, has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behaviour of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find that the performance of the dependency parser does not degrade uniformly relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser’s lower performance on the most challenging splits.
2022.acl-long.448
goodwin-etal-2022-compositional
+ 10.18653/v1/2022.acl-long.448
ASPECTNEWS: Aspect-Oriented Summarization of News Documents
@@ -6499,6 +6947,7 @@ in the Case of Unambiguous Gender
2022.acl-long.449.software.zip
ahuja-etal-2022-aspectnews
oja/aosumm
+ 10.18653/v1/2022.acl-long.449
MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes
@@ -6512,6 +6961,7 @@ in the Case of Unambiguous Gender
gu-etal-2022-memsum
nianlonggu/memsum
GovReport
+ 10.18653/v1/2022.acl-long.450
CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations
@@ -6524,6 +6974,7 @@ in the Case of Unambiguous Gender
2022.acl-long.451.software.zip
menon-etal-2022-clues
CLUES (Classifier Learning Using natural language ExplanationS)
+ 10.18653/v1/2022.acl-long.451
Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing
@@ -6537,6 +6988,7 @@ in the Case of Unambiguous Gender
shi-etal-2022-substructure
Universal Dependencies
WikiMatrix
+ 10.18653/v1/2022.acl-long.452
Multilingual Detection of Personal Employment Status on Twitter
@@ -6550,6 +7002,7 @@ in the Case of Unambiguous Gender
2022.acl-long.453
tonneau-etal-2022-multilingual
manueltonneau/twitter-unemployment
+ 10.18653/v1/2022.acl-long.453
MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data
@@ -6567,6 +7020,7 @@ in the Case of Unambiguous Gender
HybridQA
MATH
MathQA
+ 10.18653/v1/2022.acl-long.454
Transformers in the loop: Polarity in neural models of language
@@ -6581,6 +7035,7 @@ in the Case of Unambiguous Gender
altsoph/transformers-in-the-loop
Natural sentences that contain *any*
Synthetic parallel sentences that contain *any*
+ 10.18653/v1/2022.acl-long.455
Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation
@@ -6595,6 +7050,7 @@ in the Case of Unambiguous Gender
2022.acl-long.456.software.zip
he-etal-2022-bridging
zwhe99/selftraining4unmt
+ 10.18653/v1/2022.acl-long.456
SDR: Efficient Neural Re-ranking using Succinct Document Representation
@@ -6607,6 +7063,7 @@ in the Case of Unambiguous Gender
2022.acl-long.457
cohen-etal-2022-sdr
MS MARCO
+ 10.18653/v1/2022.acl-long.457
The AI Doctor Is In: A Survey of Task-Oriented Dialogue Systems for Healthcare Applications
@@ -6616,6 +7073,7 @@ in the Case of Unambiguous Gender
Task-oriented dialogue systems are increasingly prevalent in healthcare settings, and have been characterized by a diverse range of architectures and objectives. Although these systems have been surveyed in the medical community from a non-technical perspective, a systematic review from a rigorous computational perspective has to date remained noticeably absent. As a result, many important implementation details of healthcare-oriented dialogue systems remain limited or underspecified, slowing the pace of innovation in this area. To fill this gap, we investigated an initial pool of 4070 papers from well-known computer science, natural language processing, and artificial intelligence venues, identifying 70 papers discussing the system-level implementation of task-oriented dialogue systems for healthcare applications. We conducted a comprehensive technical review of these papers, and present our key findings including identified gaps and corresponding recommendations.
2022.acl-long.458
valizadeh-parde-2022-ai
+ 10.18653/v1/2022.acl-long.458
SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher
@@ -6627,6 +7085,7 @@ in the Case of Unambiguous Gender
2022.acl-long.459
le-etal-2022-shield
lethaiq/shield-defend-adversarial-texts
+ 10.18653/v1/2022.acl-long.459
Accurate Online Posterior Alignments for Principled Lexically-Constrained Decoding
@@ -6637,6 +7096,7 @@ in the Case of Unambiguous Gender
Online alignment in machine translation refers to the task of aligning a target word to a source word when the target sequence has only been partially decoded. Good online alignments facilitate important applications such as lexically constrained translation, where user-defined dictionaries are used to inject lexical constraints into the translation model. We propose a novel posterior alignment technique that is truly online in its execution and superior in terms of alignment error rates compared to existing methods. Our proposed inference technique jointly considers alignment and token probabilities in a principled manner and can be seamlessly integrated within existing constrained beam-search decoding algorithms. On five language pairs, including two distant language pairs, we achieve a consistent drop in alignment error rates. When deployed on seven lexically constrained translation tasks, we achieve significant improvements in BLEU, specifically around the constrained positions.
2022.acl-long.460
chatterjee-etal-2022-accurate
+ 10.18653/v1/2022.acl-long.460
Leveraging Task Transferability to Meta-learning for Clinical Section Classification with Limited Data
@@ -6648,6 +7108,7 @@ in the Case of Unambiguous Gender
Identifying sections is one of the critical components of understanding medical information from unstructured clinical notes and developing assistive technologies for clinical note-writing tasks. Most state-of-the-art text classification systems require thousands of in-domain text data to achieve high performance. However, collecting in-domain and recent clinical note data with section labels is challenging given the high level of privacy and sensitivity. The present paper proposes an algorithmic way to improve the task transferability of meta-learning-based text classification in order to address the issue of low-resource target data. Specifically, we explore how to make the best use of the source dataset and propose a unique task transferability measure named Normalized Negative Conditional Entropy (NNCE). Leveraging the NNCE, we develop strategies for selecting clinical categories and sections from source task data to boost cross-domain meta-learning accuracy. Experimental results show that our task selection strategies improve section classification accuracy significantly compared to meta-learning algorithms.
2022.acl-long.461
chen-etal-2022-leveraging
+ 10.18653/v1/2022.acl-long.461
Reinforcement Guided Multi-Task Learning Framework for Low-Resource Stereotype Detection
@@ -6664,6 +7125,7 @@ in the Case of Unambiguous Gender
Hate Speech
Hate Speech and Offensive Language
StereoSet
+ 10.18653/v1/2022.acl-long.462
Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings
@@ -6674,6 +7136,7 @@ in the Case of Unambiguous Gender
2022.acl-long.463
boldsen-paggio-2022-letters
syssel/letters-from-the-past
+ 10.18653/v1/2022.acl-long.463
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
@@ -6690,6 +7153,7 @@ in the Case of Unambiguous Gender
2022.acl-long.464.software.zip
liu-etal-2022-token
microsoft/HaDes
+ 10.18653/v1/2022.acl-long.464
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
@@ -6702,6 +7166,7 @@ in the Case of Unambiguous Gender
2022.acl-long.465.software.zip
grivas-etal-2022-low
andreasgrv/unargmaxable
+ 10.18653/v1/2022.acl-long.465
Prompt for Extraction? PAIE: Prompting Argument Interaction for Event Argument Extraction
@@ -6718,6 +7183,7 @@ in the Case of Unambiguous Gender
2022.acl-long.466.software.zip
ma-etal-2022-prompt
mayubo2333/paie
+ 10.18653/v1/2022.acl-long.466
Reducing Position Bias in Simultaneous Machine Translation with Length-Aware Framework
@@ -6727,6 +7193,7 @@ in the Case of Unambiguous Gender
Simultaneous machine translation (SiMT) starts translating while receiving the streaming source inputs, and hence the source sentence is always incomplete during translation. Unlike full-sentence MT, which uses the conventional seq-to-seq architecture, SiMT often applies a prefix-to-prefix architecture, which forces each target word to align only with a partial source prefix to adapt to the incomplete source in streaming inputs. However, the source words in the front positions are always illusorily considered more important since they appear in more prefixes, resulting in position bias, which makes the model pay more attention to the front source positions at test time. In this paper, we first analyze the phenomenon of position bias in SiMT, and develop a Length-Aware Framework to reduce the position bias by bridging the structural gap between SiMT and full-sentence MT. Specifically, given the streaming inputs, we first predict the full-sentence length and then fill the future source positions with positional encoding, thereby turning the streaming inputs into a pseudo full-sentence. The proposed framework can be integrated into most existing SiMT methods to further improve performance. Experiments on two representative SiMT methods, including the state-of-the-art adaptive policy, show that our method successfully reduces the position bias and thereby achieves better SiMT performance.
2022.acl-long.467
zhang-feng-2022-reducing
+ 10.18653/v1/2022.acl-long.467
A Statutory Article Retrieval Dataset in French
@@ -6739,6 +7206,7 @@ in the Case of Unambiguous Gender
louis-spanakis-2022-statutory
maastrichtlawtech/bsard
BSARD
+ 10.18653/v1/2022.acl-long.468
ParaDetox: Detoxification with Parallel Data
@@ -6755,6 +7223,7 @@ in the Case of Unambiguous Gender
2022.acl-long.469
logacheva-etal-2022-paradetox
skoltech-nlp/paradetox
+ 10.18653/v1/2022.acl-long.469
Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color
@@ -6767,6 +7236,7 @@ in the Case of Unambiguous Gender
2022.acl-long.470.software.zip
boldsen-etal-2022-interpreting
syssel/interpreting-character-embeddings
+ 10.18653/v1/2022.acl-long.470
Fine-Grained Controllable Text Generation Using Non-Residual Prompting
@@ -6783,6 +7253,7 @@ in the Case of Unambiguous Gender
freddefrallan/non-residual-prompting
C4
CommonGen
+ 10.18653/v1/2022.acl-long.471
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
@@ -6794,6 +7265,7 @@ in the Case of Unambiguous Gender
lux-vu-2022-language
digitalphonetics/ims-toucan
CSS10
+ 10.18653/v1/2022.acl-long.472
TwittIrish: A Universal Dependencies Treebank of Tweets in Modern Irish
@@ -6805,6 +7277,7 @@ in the Case of Unambiguous Gender
Modern Irish is a minority language lacking sufficient computational resources for the task of accurate automatic syntactic parsing of user-generated content such as tweets. Although language technology for the Irish language has been developing in recent years, these tools tend to perform poorly on user-generated content. As with other languages, the linguistic style observed in Irish tweets differs, in terms of orthography, lexicon, and syntax, from that of standard texts more commonly used for the development of language models and parsers. We release the first Universal Dependencies treebank of Irish tweets, facilitating natural language processing of user-generated content in Irish. In this paper, we explore the differences between Irish tweets and standard Irish text, and the challenges associated with dependency parsing of Irish tweets. We describe our bootstrapping method of treebank development and report on preliminary parsing experiments.
2022.acl-long.473
cassidy-etal-2022-twittirish
+ 10.18653/v1/2022.acl-long.473
Length Control in Abstractive Summarization by Pretraining Information Selection
@@ -6817,6 +7290,7 @@ in the Case of Unambiguous Gender
2022.acl-long.474.software.zip
liu-etal-2022-length
yizhuliu/lengthcontrol
+ 10.18653/v1/2022.acl-long.474
CQG: A Simple and Effective Controlled Generation Framework for Multi-hop Question Generation
@@ -6833,6 +7307,7 @@ in the Case of Unambiguous Gender
fei-etal-2022-cqg
sion-zcfei/cqg
HotpotQA
+ 10.18653/v1/2022.acl-long.475
Word Order Does Matter and Shuffled Language Models Know It
@@ -6852,6 +7327,7 @@ in the Case of Unambiguous Gender
ReCoRD
SuperGLUE
WinoGrande
+ 10.18653/v1/2022.acl-long.476
An Empirical Study on Explanations in Out-of-Domain Settings
@@ -6865,6 +7341,7 @@ in the Case of Unambiguous Gender
gchrysostomou/ood_faith
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-long.477
MILIE: Modular & Iterative Multilingual Open Information Extraction
@@ -6881,6 +7358,7 @@ in the Case of Unambiguous Gender
2022.acl-long.478
2022.acl-long.478.software.zip
kotnis-etal-2022-milie
+ 10.18653/v1/2022.acl-long.478
What Makes Reading Comprehension Questions Difficult?
@@ -6896,6 +7374,7 @@ in the Case of Unambiguous Gender
MCTest
RACE
ReClor
+ 10.18653/v1/2022.acl-long.479
From Simultaneous to Streaming Machine Translation by Leveraging Streaming History
@@ -6908,6 +7387,7 @@ in the Case of Unambiguous Gender
2022.acl-long.480.software.zip
iranzo-sanchez-etal-2022-simultaneous
MuST-C
+ 10.18653/v1/2022.acl-long.480
A Rationale-Centric Framework for Human-in-the-loop Machine Learning
@@ -6922,6 +7402,7 @@ in the Case of Unambiguous Gender
GeorgeLuImmortal/RDL-Rationales-centric-Double-robustness-Learning
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-long.481
Challenges and Strategies in Cross-Cultural NLP
@@ -6944,6 +7425,7 @@ in the Case of Unambiguous Gender
2022.acl-long.482
hershcovich-etal-2022-challenges
MaRVL
+ 10.18653/v1/2022.acl-long.482
Prototypical Verbalizer for Prompt-based Few-shot Tuning
@@ -6958,6 +7440,7 @@ in the Case of Unambiguous Gender
cui-etal-2022-prototypical
thunlp/OpenPrompt
Few-NERD
+ 10.18653/v1/2022.acl-long.483
Clickbait Spoiling via Question Answering and Passage Retrieval
@@ -6974,6 +7457,7 @@ in the Case of Unambiguous Gender
MS MARCO
SQuAD
TriviaQA
+ 10.18653/v1/2022.acl-long.484
BERT Learns to Teach: Knowledge Distillation with Meta Learning
@@ -6990,6 +7474,7 @@ in the Case of Unambiguous Gender
MRPC
QNLI
SST
+ 10.18653/v1/2022.acl-long.485
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
@@ -7004,6 +7489,7 @@ in the Case of Unambiguous Gender
fang-etal-2022-stemm
ictnlp/stemm
MuST-C
+ 10.18653/v1/2022.acl-long.486
Integrating Vectorized Lexical Constraints for Neural Machine Translation
@@ -7015,6 +7501,7 @@ in the Case of Unambiguous Gender
2022.acl-long.487
wang-etal-2022-integrating
shuo-git/vecconstnmt
+ 10.18653/v1/2022.acl-long.487
MPII: Multi-Level Mutual Promotion for Inference and Interpretation
@@ -7030,6 +7517,7 @@ in the Case of Unambiguous Gender
MultiNLI
SNLI
e-SNLI
+ 10.18653/v1/2022.acl-long.488
StableMoE: Stable Routing Strategy for Mixture of Experts
@@ -7047,6 +7535,7 @@ in the Case of Unambiguous Gender
dai-etal-2022-stablemoe
hunter-ddm/stablemoe
CC100
+ 10.18653/v1/2022.acl-long.489
Boundary Smoothing for Named Entity Recognition
@@ -7061,6 +7550,7 @@ in the Case of Unambiguous Gender
CoNLL++
Resume NER
Weibo NER
+ 10.18653/v1/2022.acl-long.490
Incorporating Hierarchy into Text Encoder: a Contrastive Learning Approach for Hierarchical Text Classification
@@ -7076,6 +7566,7 @@ in the Case of Unambiguous Gender
wzh9969/contrastive-htc
RCV1
WOS
+ 10.18653/v1/2022.acl-long.491
Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models
@@ -7091,6 +7582,7 @@ in the Case of Unambiguous Gender
2022.acl-long.492.software.zip
chu-etal-2022-signal
comp-syn/garble
+ 10.18653/v1/2022.acl-long.492
Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering
@@ -7117,6 +7609,7 @@ in the Case of Unambiguous Gender
MS MARCO
Natural Questions
TriviaQA
+ 10.18653/v1/2022.acl-long.493
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension
@@ -7133,6 +7626,7 @@ in the Case of Unambiguous Gender
nju-websoft/adalogn
LogiQA
ReClor
+ 10.18653/v1/2022.acl-long.494
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
@@ -7151,6 +7645,7 @@ in the Case of Unambiguous Gender
MRPC
QNLI
SST
+ 10.18653/v1/2022.acl-long.495
Interpretability for Language Learners Using Example-Based Grammatical Error Correction
@@ -7166,6 +7661,7 @@ in the Case of Unambiguous Gender
kanekomasahiro/eb-gec
FCE
JFLEG
+ 10.18653/v1/2022.acl-long.496
Rethinking Negative Sampling for Handling Missing Entity Annotations
@@ -7176,6 +7672,7 @@ in the Case of Unambiguous Gender
Negative sampling is highly effective in handling missing annotations for named entity recognition (NER). One of our contributions is an analysis of why it works, introducing two insightful concepts: missampling and uncertainty. Empirical studies show that a low missampling rate and high uncertainty are both essential for achieving promising performance with negative sampling. Based on the sparsity of named entities, we also theoretically derive a lower bound for the probability of zero missampling rate, which depends only on sentence length. The other contribution is an adaptive and weighted sampling distribution that further improves negative sampling, informed by our former analysis. Experiments on synthetic datasets and well-annotated datasets (e.g., CoNLL-2003) show that our proposed approach benefits negative sampling in terms of F1 score and loss convergence. Moreover, models with improved negative sampling have achieved new state-of-the-art results on real-world datasets (e.g., EC).
2022.acl-long.497
li-etal-2022-rethinking
+ 10.18653/v1/2022.acl-long.497
Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning
@@ -7187,6 +7684,7 @@ in the Case of Unambiguous Gender
2022.acl-long.498
2022.acl-long.498.software.zip
zhou-etal-2022-distantly
+ 10.18653/v1/2022.acl-long.498
UniXcoder: Unified Cross-Modal Pre-training for Code Representation
@@ -7204,6 +7702,7 @@ in the Case of Unambiguous Gender
CoSQA
CodeSearchNet
CodeXGLUE
+ 10.18653/v1/2022.acl-long.499
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
@@ -7223,6 +7722,7 @@ in the Case of Unambiguous Gender
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation in the world, we provide an overview of the current state of NLP research for Indonesia’s 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for the languages of Indonesia but also for other underrepresented languages.
2022.acl-long.500
aji-etal-2022-one
+ 10.18653/v1/2022.acl-long.500
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
@@ -7237,6 +7737,7 @@ in the Case of Unambiguous Gender
2022.acl-long.501.software.zip
dou-etal-2022-gpt
WebText
+ 10.18653/v1/2022.acl-long.501
Transkimmer: Transformer Learns to Layer-wise Skim
@@ -7252,6 +7753,7 @@ in the Case of Unambiguous Gender
GLUE
IMDb Movie Reviews
QNLI
+ 10.18653/v1/2022.acl-long.502
SkipBERT: Efficient Inference with Shallow Layer Skipping
@@ -7269,6 +7771,7 @@ in the Case of Unambiguous Gender
MRPC
SQuAD
SST
+ 10.18653/v1/2022.acl-long.503
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models
@@ -7279,6 +7782,7 @@ in the Case of Unambiguous Gender
2022.acl-long.504
ri-tsuruoka-2022-pretraining
Penn Treebank
+ 10.18653/v1/2022.acl-long.504
mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models
@@ -7298,6 +7802,7 @@ in the Case of Unambiguous Gender
RELX
SQuAD
XQuAD
+ 10.18653/v1/2022.acl-long.505
Evaluating Factuality in Text Simplification
@@ -7313,6 +7818,7 @@ in the Case of Unambiguous Gender
ashologn/evaluating-factuality-in-text-simplification
Newsela
WikiLarge
+ 10.18653/v1/2022.acl-long.506
Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization
@@ -7326,6 +7832,7 @@ in the Case of Unambiguous Gender
This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization. By building speech synthesis systems for three Indigenous languages spoken in Canada, Kanien’kéha, Gitksan & SENĆOŦEN, we re-evaluate the question of how much data is required to build low-resource speech synthesis systems featuring state-of-the-art neural models. For example, preliminary results with English data show that a FastSpeech2 model trained with 1 hour of training data can produce speech with comparable naturalness to a Tacotron2 model trained with 10 hours of data. Finally, we motivate future research in evaluation and classroom integration in the field of speech synthesis for language revitalization.
2022.acl-long.507
pine-etal-2022-requirements
+ 10.18653/v1/2022.acl-long.507
Sharpness-Aware Minimization Improves Language Model Generalization
@@ -7343,6 +7850,7 @@ in the Case of Unambiguous Gender
TyDi QA
TyDiQA-GoldP
WebQuestions
+ 10.18653/v1/2022.acl-long.508
Adversarial Authorship Attribution for Deobfuscation
@@ -7354,6 +7862,7 @@ in the Case of Unambiguous Gender
Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially trained authorship attributors that are aware of potential obfuscation. To fill this gap, we investigate the problem of adversarial authorship attribution for deobfuscation. We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate the effectiveness of adversarial training when the attributor makes incorrect assumptions about whether and which obfuscator was used. While there is a clear degradation in attribution accuracy, it is noteworthy that this degradation is still at or above the attribution accuracy of the attributor that is not adversarially trained at all. Our results motivate the need to develop authorship obfuscation approaches that are resistant to deobfuscation.
2022.acl-long.509
zhai-etal-2022-adversarial
+ 10.18653/v1/2022.acl-long.509
Weakly Supervised Word Segmentation for Computational Language Documentation
@@ -7365,6 +7874,7 @@ in the Case of Unambiguous Gender
2022.acl-long.510
okabe-etal-2022-weakly
shuokabe/pyseg
+ 10.18653/v1/2022.acl-long.510
SciNLI: A Corpus for Natural Language Inference on Scientific Text
@@ -7381,6 +7891,7 @@ in the Case of Unambiguous Gender
SNLI
SWAG
SuperGLUE
+ 10.18653/v1/2022.acl-long.511
Neural reality of argument structure constructions
@@ -7395,6 +7906,7 @@ in the Case of Unambiguous Gender
2022.acl-long.512.software.zip
li-etal-2022-neural
spoclab-ca/neural-reality-constructions
+ 10.18653/v1/2022.acl-long.512
On the Robustness of Offensive Language Classifiers
@@ -7408,6 +7920,7 @@ in the Case of Unambiguous Gender
rusert-etal-2022-robustness
jonrusert/robustnessofoffensiveclassifiers
OLID
+ 10.18653/v1/2022.acl-long.513
Few-shot Controllable Style Transfer for Low-Resource Multilingual Settings
@@ -7423,6 +7936,7 @@ in the Case of Unambiguous Gender
Samanantar
XFORMAL
mC4
+ 10.18653/v1/2022.acl-long.514
ABC: Attention with Bounded-memory Control
@@ -7443,6 +7957,7 @@ in the Case of Unambiguous Gender
WMT 2014
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-long.515
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
@@ -7453,6 +7968,7 @@ in the Case of Unambiguous Gender
bowman-2022-dangers
SQuAD
SuperGLUE
+ 10.18653/v1/2022.acl-long.516
RELiC: Retrieving Evidence for Literary Claims
@@ -7467,6 +7983,7 @@ in the Case of Unambiguous Gender
martiansideofthemoon/relic-retrieval
RELiC
BEIR
+ 10.18653/v1/2022.acl-long.517
Analyzing Generalization of Vision and Language Navigation to Unseen Outdoor Areas
@@ -7480,6 +7997,7 @@ in the Case of Unambiguous Gender
raphael-sch/map2seq_vln
Touchdown Dataset
map2seq
+ 10.18653/v1/2022.acl-long.518
Adapting Coreference Resolution Models through Active Learning
@@ -7493,6 +8011,7 @@ in the Case of Unambiguous Gender
2022.acl-long.519
yuan-etal-2022-adapting
forest-snow/incremental-coref
+ 10.18653/v1/2022.acl-long.519
An Imitation Learning Curriculum for Text Editing with Non-Autoregressive Models
@@ -7503,6 +8022,7 @@ in the Case of Unambiguous Gender
2022.acl-long.520
agrawal-carpuat-2022-imitation
Newsela
+ 10.18653/v1/2022.acl-long.520
Memorisation versus Generalisation in Pre-trained Language Models
@@ -7517,6 +8037,7 @@ in the Case of Unambiguous Gender
CoNLL++
CoNLL-2003
WNUT 2017
+ 10.18653/v1/2022.acl-long.521
ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments
@@ -7530,6 +8051,7 @@ in the Case of Unambiguous Gender
2022.acl-long.522.software.zip
yang-etal-2022-chatmatch
ruolanyang/chatmatch
+ 10.18653/v1/2022.acl-long.522
Do self-supervised speech models develop human-like perception biases?
@@ -7541,6 +8063,7 @@ in the Case of Unambiguous Gender
millet-dunbar-2022-self
AudioSet
LibriSpeech
+ 10.18653/v1/2022.acl-long.523
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
@@ -7559,6 +8082,7 @@ in the Case of Unambiguous Gender
RxR
StreetLearn
Talk the Walk
+ 10.18653/v1/2022.acl-long.524
Learning to Generate Programs for Table Fact Verification via Structure-Aware Semantic Parsing
@@ -7570,6 +8094,7 @@ in the Case of Unambiguous Gender
ou-liu-2022-learning
ousuixin/sasp
TabFact
+ 10.18653/v1/2022.acl-long.525
Cluster & Tune: Boost Cold Start Performance in Text Classification
@@ -7585,6 +8110,7 @@ in the Case of Unambiguous Gender
2022.acl-long.526
shnarch-etal-2022-cluster
ibm/intermediate-training-using-clustering
+ 10.18653/v1/2022.acl-long.526
Overcoming a Theoretical Limitation of Self-Attention
@@ -7595,6 +8121,7 @@ in the Case of Unambiguous Gender
2022.acl-long.527
chiang-cholak-2022-overcoming
ndnlp/parity
+ 10.18653/v1/2022.acl-long.527
Prediction Difference Regularization against Perturbation for Neural Machine Translation
@@ -7606,6 +8133,7 @@ in the Case of Unambiguous Gender
Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years. Despite their simplicity and effectiveness, we argue that these methods are limited by the under-fitting of training data. In this paper, we utilize prediction difference for ground-truth tokens to analyze the fitting of token-level samples and find that under-fitting is almost as common as over-fitting. We introduce prediction difference regularization (PD-R), a simple and effective method that can reduce over-fitting and under-fitting at the same time. For all token-level samples, PD-R minimizes the prediction difference between the original pass and the input-perturbed pass, making the model less sensitive to small input changes, thus more robust to both perturbations and under-fitted training data. Experiments on three widely used WMT translation tasks show that our approach can significantly improve over existing perturbation regularization methods. On the WMT16 En-De task, our model achieves a 1.80 SacreBLEU improvement over the vanilla Transformer.
2022.acl-long.528
guo-etal-2022-prediction
+ 10.18653/v1/2022.acl-long.528
Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
@@ -7617,6 +8145,7 @@ in the Case of Unambiguous Gender
2022.acl-long.529
de-vries-etal-2022-make
wietsedv/xpos
+ 10.18653/v1/2022.acl-long.529
Should a Chatbot be Sarcastic? Understanding User Preferences Towards Sarcasm Generation
@@ -7627,6 +8156,7 @@ in the Case of Unambiguous Gender
Previous sarcasm generation research has focused on how to generate text that people perceive as sarcastic to create more human-like interactions. In this paper, we argue that we should first turn our attention to the question of when sarcasm should be generated, finding that humans consider sarcastic responses inappropriate to many input utterances. Next, we use a theory-driven framework for generating sarcastic responses, which allows us to control the linguistic devices included during generation. For each device, we investigate how much humans associate it with sarcasm, finding that pragmatic insincerity and emotional markers are devices crucial for making sarcasm recognisable.
2022.acl-long.530
oprea-etal-2022-chatbot
+ 10.18653/v1/2022.acl-long.530
How Do Seq2Seq Models Perform on End-to-End Data-to-Text Generation?
@@ -7639,6 +8169,7 @@ in the Case of Unambiguous Gender
xunjianyin/seq2seqondata2text
ToTTo
WikiBio
+ 10.18653/v1/2022.acl-long.531
Probing for Labeled Dependency Trees
@@ -7652,6 +8183,7 @@ in the Case of Unambiguous Gender
muller-eberstein-etal-2022-probing
personads/depprobe
Universal Dependencies
+ 10.18653/v1/2022.acl-long.532
DoCoGen: Domain Counterfactual Generation for Low Resource Domain Adaptation
@@ -7664,6 +8196,7 @@ in the Case of Unambiguous Gender
2022.acl-long.533
calderon-etal-2022-docogen
nitaytech/docogen
+ 10.18653/v1/2022.acl-long.533
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
@@ -7680,6 +8213,7 @@ in the Case of Unambiguous Gender
FUNSD
RVL-CDIP
XFUND
+ 10.18653/v1/2022.acl-long.534
Dependency-based Mixture Language Models
@@ -7693,6 +8227,7 @@ in the Case of Unambiguous Gender
fadedcosine/dependency-guided-neural-text-generation
Penn Treebank
ROCStories
+ 10.18653/v1/2022.acl-long.535
Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?
@@ -7706,6 +8241,7 @@ in the Case of Unambiguous Gender
2022.acl-long.536.software.zip
dutta-etal-2022-unsupervised
jeevesh8/arg_mining
+ 10.18653/v1/2022.acl-long.536
Entity-based Neural Local Coherence Modeling
@@ -7717,6 +8253,7 @@ in the Case of Unambiguous Gender
jeon-strube-2022-entity
sdeva14/acl22-entity-neural-local-cohe
GCDC
+ 10.18653/v1/2022.acl-long.537
“That Is a Suspicious Reaction!”: Interpreting Logits Variation to Detect NLP Adversarial Attacks
@@ -7731,6 +8268,7 @@ in the Case of Unambiguous Gender
mosca-etal-2022-suspicious
AG News
IMDb Movie Reviews
+ 10.18653/v1/2022.acl-long.538
Local Languages, Third Spaces, and other High-Resource Scenarios
@@ -7739,6 +8277,7 @@ in the Case of Unambiguous Gender
How can language technology address the diverse situations of the world’s languages? In one view, languages exist on a resource continuum and the challenge is to scale existing solutions, bringing under-resourced languages into the high-resource world. In another view, presented here, the world’s language ecology includes standardised languages, local languages, and contact languages. These are often subsumed under the label of “under-resourced languages” even though they have distinct functions and prospects. I explore this position and propose some ecologically-aware language technology agendas.
2022.acl-long.539
bird-2022-local
+ 10.18653/v1/2022.acl-long.539
That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory
@@ -7748,6 +8287,7 @@ in the Case of Unambiguous Gender
The evolution of language follows the rule of gradual change. Grammar, vocabulary, and lexical semantic shifts take place over time, resulting in a diachronic linguistic gap. As a result, a considerable number of texts are written in languages of different eras, which creates obstacles for natural language processing tasks, such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a specific era. Therefore, we propose a cross-era learning framework for Chinese word segmentation (CWS), CROSSWISE, which uses the Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that performance on each corpus improves significantly. Further analyses also demonstrate that the SM can effectively integrate the knowledge of the eras into the neural network.
2022.acl-long.540
tang-su-2022-slepen
+ 10.18653/v1/2022.acl-long.540
Fair and Argumentative Language Modeling for Computational Argumentation
@@ -7760,6 +8300,7 @@ in the Case of Unambiguous Gender
2022.acl-long.541.software.zip
holtermann-etal-2022-fair
umanlp/fairargumentativelm
+ 10.18653/v1/2022.acl-long.541
Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation
@@ -7773,6 +8314,7 @@ in the Case of Unambiguous Gender
zhang-etal-2022-learning
BSTC
MuST-C
+ 10.18653/v1/2022.acl-long.542
Can Pre-trained Language Models Interpret Similes as Smart as Human?
@@ -7786,6 +8328,7 @@ in the Case of Unambiguous Gender
2022.acl-long.543
he-etal-2022-pre
abbey4799/plms-interpret-simile
+ 10.18653/v1/2022.acl-long.543
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
@@ -7828,6 +8371,7 @@ in the Case of Unambiguous Gender
CLUE
CMeIE
SuperGLUE
+ 10.18653/v1/2022.acl-long.544
Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization
@@ -7839,6 +8383,7 @@ in the Case of Unambiguous Gender
2022.acl-long.545
liu-etal-2022-learning
manga-uofa/naus
+ 10.18653/v1/2022.acl-long.545
Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation
@@ -7854,6 +8399,7 @@ in the Case of Unambiguous Gender
2022.acl-long.546
wei-etal-2022-learning
pemywei/csanmt
+ 10.18653/v1/2022.acl-long.546
Lexical Knowledge Internalization for Neural Dialog Generation
@@ -7869,6 +8415,7 @@ in the Case of Unambiguous Gender
lividwo/ki
DailyDialog
Wizard of Wikipedia
+ 10.18653/v1/2022.acl-long.547
Modeling Syntactic-Semantic Dependency Correlations in Semantic Role Labeling Using Mixture Models
@@ -7881,6 +8428,7 @@ in the Case of Unambiguous Gender
2022.acl-long.548.software.zip
chen-etal-2022-modeling
christomartin/syn-sem_dependency_correlation_mixture_model
+ 10.18653/v1/2022.acl-long.548
Learning the Beauty in Songs: Neural Singing Voice Beautifier
@@ -7894,6 +8442,7 @@ in the Case of Unambiguous Gender
2022.acl-long.549
liu-etal-2022-learning-beauty
moonintheriver/neuralsvb
+ 10.18653/v1/2022.acl-long.549
A Model-agnostic Data Manipulation Method for Persona-based Dialogue Generation
@@ -7909,6 +8458,7 @@ in the Case of Unambiguous Gender
cao-etal-2022-model
caoyu-noob/d3
PERSONA-CHAT
+ 10.18653/v1/2022.acl-long.550
LinkBERT: Pretraining Language Models with Document Links
@@ -7943,6 +8493,7 @@ in the Case of Unambiguous Gender
SQuAD
SearchQA
TriviaQA
+ 10.18653/v1/2022.acl-long.551
Improving Time Sensitivity for Question Answering over Temporal Knowledge Graphs
@@ -7955,6 +8506,7 @@ in the Case of Unambiguous Gender
2022.acl-long.552
shang-etal-2022-improving
CronQuestions
+ 10.18653/v1/2022.acl-long.552
Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition
@@ -7967,6 +8519,7 @@ in the Case of Unambiguous Gender
2022.acl-long.553
wang-etal-2022-self
LibriSpeech
+ 10.18653/v1/2022.acl-long.553
Softmax Bottleneck Makes Language Models Unable to Represent Multi-mode Word Distributions
@@ -7979,6 +8532,7 @@ in the Case of Unambiguous Gender
chang-mccallum-2022-softmax
ProtoQA
WebText
+ 10.18653/v1/2022.acl-long.554
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
@@ -7995,6 +8549,7 @@ in the Case of Unambiguous Gender
CANARD
CoQA
QuAC
+ 10.18653/v1/2022.acl-long.555
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
@@ -8011,6 +8566,7 @@ in the Case of Unambiguous Gender
AG News
MPQA Opinion Corpus
SST
+ 10.18653/v1/2022.acl-long.556
Situated Dialogue Learning through Procedural Environment Generation
@@ -8021,6 +8577,7 @@ in the Case of Unambiguous Gender
We teach goal-driven agents to interactively act and speak in situated environments by training on generated curriculums. Our agents operate in LIGHT (Urbanek et al. 2019)—a large-scale crowd-sourced fantasy text adventure game wherein an agent perceives and interacts with the world through textual natural language. Goals in this environment take the form of character-based quests, consisting of personas and motivations. We augment LIGHT by learning to procedurally generate additional novel textual worlds and quests to create a curriculum of steadily increasing difficulty for training agents to achieve such goals. In particular, we measure curriculum difficulty in terms of the rarity of the quest in the original training distribution—an easier environment is one that is more likely to have been found in the unaugmented dataset. An ablation study shows that this method of learning from the tail of a distribution results in significantly higher generalization abilities as measured by zero-shot performance on never-before-seen quests.
2022.acl-long.557
ammanabrolu-etal-2022-situated
+ 10.18653/v1/2022.acl-long.557
UniTE: Unified Translation Evaluation
@@ -8037,6 +8594,7 @@ in the Case of Unambiguous Gender
2022.acl-long.558.software.zip
wan-etal-2022-unite
nlp2ct/unite
+ 10.18653/v1/2022.acl-long.558
Program Transfer for Answering Complex Questions over Knowledge Bases
@@ -8056,6 +8614,7 @@ in the Case of Unambiguous Gender
thu-keg/programtransfer
ComplexWebQuestions
WebQuestions
+ 10.18653/v1/2022.acl-long.559
EAG: Extract and Generate Multi-way Aligned Corpus for Complete Multi-lingual Neural Machine Translation
@@ -8069,6 +8628,7 @@ in the Case of Unambiguous Gender
2022.acl-long.560.software.zip
xu-etal-2022-eag
OPUS-100
+ 10.18653/v1/2022.acl-long.560
Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings
@@ -8084,6 +8644,7 @@ in the Case of Unambiguous Gender
Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; and 2) proposing a post-processing retrofitting method for static embeddings, independent of training, that employs prior synonym knowledge and a weighted vector distribution. Through extrinsic and intrinsic tasks, our methods are shown to outperform the baselines by a large margin.
2022.acl-long.561
zheng-etal-2022-using
+ 10.18653/v1/2022.acl-long.561
Multimodal Sarcasm Target Identification in Tweets
@@ -8098,6 +8659,7 @@ in the Case of Unambiguous Gender
2022.acl-long.562.software.zip
wang-etal-2022-multimodal
wjq-learning/msti
+ 10.18653/v1/2022.acl-long.562
Flexible Generation from Fragmentary Linguistic Input
@@ -8109,6 +8671,7 @@ in the Case of Unambiguous Gender
qian-levy-2022-flexible
pqian11/fragment-completion
New York Times Annotated Corpus
+ 10.18653/v1/2022.acl-long.563
Revisiting Over-Smoothness in Text to Speech
@@ -8122,6 +8685,7 @@ in the Case of Unambiguous Gender
2022.acl-long.564
ren-etal-2022-revisiting
LJSpeech
+ 10.18653/v1/2022.acl-long.564
Coherence boosting: When your pretrained language model is not paying enough attention
@@ -8144,6 +8708,7 @@ in the Case of Unambiguous Gender
PIQA
SST
WebText
+ 10.18653/v1/2022.acl-long.565
Uncertainty Estimation of Transformer Predictions for Misclassification Detection
@@ -8169,6 +8734,7 @@ in the Case of Unambiguous Gender
GLUE
MRPC
SST
+ 10.18653/v1/2022.acl-long.566
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
@@ -8187,6 +8753,7 @@ in the Case of Unambiguous Gender
VisDial
Visual Question Answering
Visual7W
+ 10.18653/v1/2022.acl-long.567
The Grammar-Learning Trajectories of Neural Language Models
@@ -8203,6 +8770,7 @@ in the Case of Unambiguous Gender
OpenSubtitles
OpenWebText
WebText
+ 10.18653/v1/2022.acl-long.568
Generating Scientific Definitions with Controllable Complexity
@@ -8214,6 +8782,7 @@ in the Case of Unambiguous Gender
2022.acl-long.569
august-etal-2022-generating
talaugust/definition-complexity
+ 10.18653/v1/2022.acl-long.569
Label Semantic Aware Pre-training for Few-shot Text Classification
@@ -8231,6 +8800,7 @@ in the Case of Unambiguous Gender
SGD
SNIPS
TOPv2
+ 10.18653/v1/2022.acl-long.570
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
@@ -8250,6 +8820,7 @@ in the Case of Unambiguous Gender
2022.acl-long.571.software.zip
li-etal-2022-ode
libeineu/ode-transformer
+ 10.18653/v1/2022.acl-long.571
A Comparison of Strategies for Source-Free Domain Adaptation
@@ -8261,6 +8832,7 @@ in the Case of Unambiguous Gender
2022.acl-long.572
su-etal-2022-comparison
xinsu626/sourcefreedomainadaptation
+ 10.18653/v1/2022.acl-long.572
Ethics Sheets for AI Tasks
@@ -8269,6 +8841,7 @@ in the Case of Unambiguous Gender
Several high-profile events, such as the mass testing of emotion recognition systems on vulnerable sub-populations and using question answering systems to make moral judgments, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. At issue here are not just individual systems and datasets, but also the AI tasks themselves. In this position paper, I make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. I will also present a template for ethics sheets with 50 ethical considerations, using the task of emotion recognition as a running example. Ethics sheets are a mechanism to engage with and document ethical considerations before building datasets and systems. Similar to survey articles, a small number of carefully created ethics sheets can serve numerous researchers and developers.
2022.acl-long.573
mohammad-2022-ethics
+ 10.18653/v1/2022.acl-long.573
Learning Disentangled Representations of Negation and Uncertainty
@@ -8281,6 +8854,7 @@ in the Case of Unambiguous Gender
2022.acl-long.574
vasilakes-etal-2022-learning
jvasilakes/disentanglement-vae
+ 10.18653/v1/2022.acl-long.574
latent-GLAT: Glancing at Latent Variables for Parallel Text Generation
@@ -8298,6 +8872,7 @@ in the Case of Unambiguous Gender
bao-etal-2022-textit
baoy-nlp/latent-glat
DailyDialog
+ 10.18653/v1/2022.acl-long.575
PPT: Pre-trained Prompt Tuning for Few-shot Learning
@@ -8317,6 +8892,7 @@ in the Case of Unambiguous Gender
OCNLI
SST
SuperGLUE
+ 10.18653/v1/2022.acl-long.576
Deduplicating Training Data Makes Language Models Better
@@ -8335,6 +8911,7 @@ in the Case of Unambiguous Gender
Billion Word Benchmark
RealNews
Wiki-40B
+ 10.18653/v1/2022.acl-long.577
Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires
@@ -8349,6 +8926,7 @@ in the Case of Unambiguous Gender
nguyen-etal-2022-improving
thongnt99/acl22-depression-phq9
SMHD
+ 10.18653/v1/2022.acl-long.578
Internet-Augmented Dialogue Generation
@@ -8362,6 +8940,7 @@ in the Case of Unambiguous Gender
PERSONA-CHAT
Topical-Chat
Wizard of Wikipedia
+ 10.18653/v1/2022.acl-long.579
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
@@ -8391,6 +8970,7 @@ in the Case of Unambiguous Gender
Common Voice
DEMAND
LibriMix
+ 10.18653/v1/2022.acl-long.580
Knowledge Neurons in Pretrained Transformers
@@ -8406,6 +8986,7 @@ in the Case of Unambiguous Gender
2022.acl-long.581.software.zip
dai-etal-2022-knowledge
hunter-ddm/knowledge-neurons
+ 10.18653/v1/2022.acl-long.581
Meta-Learning for Fast Cross-Lingual Adaptation in Dependency Parsing
@@ -8422,6 +9003,7 @@ in the Case of Unambiguous Gender
2022.acl-long.582.software.zip
langedijk-etal-2022-meta
annaproxy/udify-metalearning
+ 10.18653/v1/2022.acl-long.582
French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English
@@ -8434,6 +9016,7 @@ in the Case of Unambiguous Gender
2022.acl-long.583
neveol-etal-2022-french
CrowS-Pairs
+ 10.18653/v1/2022.acl-long.583
Few-Shot Learning with Siamese Networks and Label Tuning
@@ -8451,6 +9034,7 @@ in the Case of Unambiguous Gender
IMDb Movie Reviews
ISEAR
SNLI
+ 10.18653/v1/2022.acl-long.584
Inferring Rewards from Language in Context
@@ -8463,6 +9047,7 @@ in the Case of Unambiguous Gender
2022.acl-long.585
lin-etal-2022-inferring
jlin816/rewards-from-language
+ 10.18653/v1/2022.acl-long.585
Generating Biographies on Wikipedia: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies
@@ -8473,6 +9058,7 @@ in the Case of Unambiguous Gender
2022.acl-long.586
fan-gardent-2022-generating
WikiSum
+ 10.18653/v1/2022.acl-long.586
Your Answer is Incorrect... Would you like to know why? Introducing a Bilingual Short Answer Feedback Dataset
@@ -8488,6 +9074,7 @@ in the Case of Unambiguous Gender
filighera-etal-2022-answer
sebochs/saf
SNLI
+ 10.18653/v1/2022.acl-long.587
Towards Better Characterization of Paraphrases
@@ -8502,6 +9089,7 @@ in the Case of Unambiguous Gender
GLUE
MRPC
PAWS
+ 10.18653/v1/2022.acl-long.588
SummScreen: A Dataset for Abstractive Screenplay Summarization
@@ -8516,6 +9104,7 @@ in the Case of Unambiguous Gender
mingdachen/SummScreen
Multi-News
TVRecap
+ 10.18653/v1/2022.acl-long.589
Sparsifying Transformer Models with Trainable Representation Pooling
@@ -8530,6 +9119,7 @@ in the Case of Unambiguous Gender
applicaai/pyramidions
Pubmed
arXiv Summarization Dataset
+ 10.18653/v1/2022.acl-long.590
Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
@@ -8541,6 +9131,7 @@ in the Case of Unambiguous Gender
2022.acl-long.591
stahlberg-etal-2022-uncertainty
JFLEG
+ 10.18653/v1/2022.acl-long.591
FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
@@ -8562,6 +9153,7 @@ in the Case of Unambiguous Gender
SuperGLUE
WSC
WiC
+ 10.18653/v1/2022.acl-long.592
Text-Free Prosody-Aware Generative Spoken Language Modeling
@@ -8582,6 +9174,7 @@ in the Case of Unambiguous Gender
kharitonov-etal-2022-text
pytorch/fairseq
LibriSpeech
+ 10.18653/v1/2022.acl-long.593
Lite Unified Modeling for Discriminative Reading Comprehension
@@ -8598,6 +9191,7 @@ in the Case of Unambiguous Gender
DREAM
RACE
SQuAD
+ 10.18653/v1/2022.acl-long.594
Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining
@@ -8608,6 +9202,7 @@ in the Case of Unambiguous Gender
2022.acl-long.595
tien-steinert-threlkeld-2022-bilingual
cctien/bimultialign
+ 10.18653/v1/2022.acl-long.595
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
@@ -8627,6 +9222,7 @@ in the Case of Unambiguous Gender
Natural language spatial video grounding aims to detect the relevant objects in video frames with descriptive sentences as the query. In spite of the great advances, most existing methods rely on dense video frame annotations, which require a tremendous amount of human effort. To achieve effective grounding under a limited annotation budget, we investigate one-shot video grounding and learn to ground natural language in all video frames with solely one frame labeled, in an end-to-end manner. One major challenge of end-to-end one-shot video grounding is the existence of video frames that are irrelevant to either the language query or the labeled frame. Another challenge relates to the limited supervision, which might result in ineffective representation learning. To address these challenges, we designed an end-to-end model via Information Tree for One-Shot video grounding (IT-OS). Its key module, the information tree, can eliminate the interference of irrelevant frames based on branch search and branch cropping techniques. In addition, several self-supervised tasks are proposed based on the information tree to improve representation learning under insufficient labeling. Experiments on the benchmark dataset demonstrate the effectiveness of our model.
2022.acl-long.596
li-etal-2022-end
+ 10.18653/v1/2022.acl-long.596
RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization
@@ -8639,6 +9235,7 @@ in the Case of Unambiguous Gender
A release note is a technical document that describes the latest changes to a software product and is crucial in open source software development. However, it remains challenging to generate release notes automatically. In this paper, we present a new dataset called RNSum, which contains approximately 82,000 English release notes and the associated commit messages derived from online repositories on GitHub. Then, we propose classwise extractive-then-abstractive/abstractive summarization approaches to this task, which can employ a modern transformer-based seq2seq network like BART and can be applied to various repositories without specific constraints. The experimental results on the RNSum dataset show that the proposed methods can generate less noisy release notes at higher coverage than the baselines. We also observe that there is a significant gap in the coverage of essential information when compared to human references. Our dataset and the code are publicly available.
2022.acl-long.597
kamezawa-etal-2022-rnsum
+ 10.18653/v1/2022.acl-long.597
Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge
@@ -8654,6 +9251,7 @@ in the Case of Unambiguous Gender
C3
ConceptNet
DialogRE
+ 10.18653/v1/2022.acl-long.598
Modeling Persuasive Discourse to Adaptively Support Students’ Argumentative Writing
@@ -8665,6 +9263,7 @@ in the Case of Unambiguous Gender
2022.acl-long.599.software.zip
wambsganss-niklaus-2022-modeling
thiemowa/-argumentative_business_model_pitches
+ 10.18653/v1/2022.acl-long.599
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons
@@ -8680,6 +9279,7 @@ in the Case of Unambiguous Gender
ParaBank
WMT 2015
WMT 2016
+ 10.18653/v1/2022.acl-long.600
The Moral Debater: A Study on the Computational Generation of Morally Framed Arguments
@@ -8693,6 +9293,7 @@ in the Case of Unambiguous Gender
2022.acl-long.601.software.zip
alshomary-etal-2022-moral
webis-de/acl-22
+ 10.18653/v1/2022.acl-long.601
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
@@ -8708,6 +9309,7 @@ in the Case of Unambiguous Gender
GLUE
LRA
QNLI
+ 10.18653/v1/2022.acl-long.602
Probing for the Usage of Grammatical Number
@@ -8720,6 +9322,7 @@ in the Case of Unambiguous Gender
A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious—i.e., the model might not rely on it when making predictions. In this paper, we try to find an encoding that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model’s representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.
2022.acl-long.603
lasri-etal-2022-probing
+ 10.18653/v1/2022.acl-long.603
@@ -8755,6 +9358,7 @@ in the Case of Unambiguous Gender
QNLI
SQuAD
SST
+ 10.18653/v1/2022.acl-short.1
Are Shortest Rationales the Best Explanations for Human Understanding?
@@ -8767,6 +9371,7 @@ in the Case of Unambiguous Gender
2022.acl-short.2
shen-etal-2022-shortest
huashen218/limitedink
+ 10.18653/v1/2022.acl-short.2
Analyzing Wrap-Up Effects through an Information-Theoretic Lens
@@ -8779,6 +9384,7 @@ in the Case of Unambiguous Gender
Numerous analyses of reading time (RT) data have been undertaken in the effort to learn more about the internal processes that occur during reading comprehension. However, data measured on words at the end of a sentence (or even clause) is often omitted due to the confounding factors introduced by so-called “wrap-up effects,” which manifest as a skewed distribution of RTs for these words. Consequently, the understanding of the cognitive processes that might be involved in these effects is limited. In this work, we attempt to learn more about these processes by looking for the existence (or absence) of a link between wrap-up effects and information-theoretic quantities, such as word and context information content. We find that the information distribution of prior context is often predictive of sentence- and clause-final RTs (while not of sentence-medial RTs), which lends support to several prior hypotheses about the processes involved in wrap-up effects.
2022.acl-short.3
meister-etal-2022-analyzing
+ 10.18653/v1/2022.acl-short.3
Have my arguments been replied to? Argument Pair Extraction as Machine Reading Comprehension
@@ -8791,6 +9397,7 @@ in the Case of Unambiguous Gender
2022.acl-short.4
2022.acl-short.4.software.zip
bao-etal-2022-arguments
+ 10.18653/v1/2022.acl-short.4
On the probability–quality paradox in language generation
@@ -8802,6 +9409,7 @@ in the Case of Unambiguous Gender
When generating natural language from neural probabilistic models, high probability does not always coincide with high quality: It has often been observed that mode-seeking decoding methods, i.e., those that produce high-probability text under the model, lead to unnatural language. On the other hand, the lower-probability text generated by stochastic methods is perceived as more human-like. In this note, we offer an explanation for this phenomenon by analyzing language generation through an information-theoretic lens. Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings. Further, we posit that language with substantially more (or less) information is undesirable. We provide preliminary empirical evidence in favor of this hypothesis; quality ratings of both human and machine-generated text—covering multiple tasks and common decoding strategies—suggest high-quality text has an information content significantly closer to the entropy than we would expect by chance.
2022.acl-short.5
meister-etal-2022-high
+ 10.18653/v1/2022.acl-short.5
Disentangled Knowledge Transfer for OOD Intent Discovery with Unified Contrastive Learning
@@ -8818,6 +9426,7 @@ in the Case of Unambiguous Gender
2022.acl-short.6
mou-etal-2022-disentangled
myt517/dkt
+ 10.18653/v1/2022.acl-short.6
Voxel-informed Language Grounding
@@ -8831,6 +9440,7 @@ in the Case of Unambiguous Gender
corona-etal-2022-voxel
rcorona/voxel_informed_language_grounding
SNARE
+ 10.18653/v1/2022.acl-short.7
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
@@ -8848,6 +9458,7 @@ in the Case of Unambiguous Gender
GLUE
SQuAD
SuperGLUE
+ 10.18653/v1/2022.acl-short.8
On Efficiently Acquiring Annotations for Multilingual Models
@@ -8858,6 +9469,7 @@ in the Case of Unambiguous Gender
When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by zero-shot transfer to the remaining languages. In this work, we show that the strategy of joint learning across multiple languages using a single model performs substantially better than the aforementioned alternatives. We also demonstrate that active learning provides additional, complementary benefits. We show that this simple approach enables the model to be data efficient by allowing it to arbitrate its annotation budget to query languages it is less certain on. We illustrate the effectiveness of our proposed method on a diverse set of tasks: a classification task with 4 languages, a sequence tagging task with 4 languages and a dependency parsing task with 5 languages. Our proposed method, whilst simple, substantially outperforms the other viable alternatives for building a model in a multilingual setting under constrained budgets.
2022.acl-short.9
moniz-etal-2022-efficiently
+ 10.18653/v1/2022.acl-short.9
Automatic Detection of Entity-Manipulated Text using Factual Knowledge
@@ -8869,6 +9481,7 @@ in the Case of Unambiguous Gender
2022.acl-short.10
jawahar-etal-2022-automatic
RealNews
+ 10.18653/v1/2022.acl-short.10
Does BERT Know that the IS-A Relation Is Transitive?
@@ -8880,6 +9493,7 @@ in the Case of Unambiguous Gender
2022.acl-short.11.software.zip
lin-ng-2022-bert
nusnlp/probe-bert-transitivity
+ 10.18653/v1/2022.acl-short.11
Buy Tesla, Sell Ford: Assessing Implicit Stock Market Preference in Pre-trained Language Models
@@ -8889,6 +9503,7 @@ in the Case of Unambiguous Gender
Pretrained language models such as BERT have achieved remarkable success in several NLP tasks. With the wide adoption of BERT in real-world applications, researchers have begun to investigate the implicit biases encoded in BERT. In this paper, we assess the implicit stock market preferences in BERT and its finance domain-specific model FinBERT. We find some interesting patterns. For example, the language models are overall more positive towards the stock market, but there are significant differences in preferences between a pair of industry sectors, or even within a sector. Given the prevalence of NLP models in financial decision making systems, this work raises awareness of their potential implicit preferences in the stock markets. Awareness of such problems can help practitioners improve the robustness and accountability of their financial NLP pipelines.
2022.acl-short.12
chuang-yang-2022-buy
+ 10.18653/v1/2022.acl-short.12
Pixie: Preference in Implicit and Explicit Comparisons
@@ -8901,6 +9516,7 @@ in the Case of Unambiguous Gender
2022.acl-short.13
haque-etal-2022-pixie
ahaque2/pixie
+ 10.18653/v1/2022.acl-short.13
Counterfactual Explanations for Natural Language Interfaces
@@ -8914,6 +9530,7 @@ in the Case of Unambiguous Gender
2022.acl-short.14.software.zip
tolkachev-etal-2022-counterfactual
georgeto20/counterfactual_explanations
+ 10.18653/v1/2022.acl-short.14
Predicting Difficulty and Discrimination of Natural Language Questions
@@ -8925,6 +9542,7 @@ in the Case of Unambiguous Gender
2022.acl-short.15.software.zip
byrd-srivastava-2022-predicting
HotpotQA
+ 10.18653/v1/2022.acl-short.15
How does the pre-training objective affect what large language models learn about linguistic properties?
@@ -8935,6 +9553,7 @@ in the Case of Unambiguous Gender
2022.acl-short.16
alajrami-aletras-2022-pre
GLUE
+ 10.18653/v1/2022.acl-short.16
The Power of Prompt Tuning for Low-Resource Semantic Parsing
@@ -8945,6 +9564,7 @@ in the Case of Unambiguous Gender
Prompt tuning has recently emerged as an effective method for adapting pre-trained language models to a number of language understanding and generation tasks. In this paper, we investigate prompt tuning for semantic parsing—the task of mapping natural language utterances onto formal meaning representations. On the low-resource splits of Overnight and TOPv2, we find that a prompt tuned T5-xl significantly outperforms its fine-tuned counterpart, as well as strong GPT-3 and BART baselines. We also conduct ablation studies across different model scales and target representations, finding that, with increasing model scale, prompt tuned T5 models improve at generating target representations that are far from the pre-training distribution.
2022.acl-short.17
schucher-etal-2022-power
+ 10.18653/v1/2022.acl-short.17
Data Contamination: From Memorization to Exploitation
@@ -8956,6 +9576,7 @@ in the Case of Unambiguous Gender
magar-schwartz-2022-data
schwartz-lab-nlp/data_contamination
SST
+ 10.18653/v1/2022.acl-short.18
Detecting Annotation Errors in Morphological Data with the Transformer
@@ -8965,6 +9586,7 @@ in the Case of Unambiguous Gender
Annotation errors that stem from various sources are usually unavoidable when performing large-scale annotation of linguistic data. In this paper, we evaluate the feasibility of using the Transformer model to detect various types of annotator errors in morphological data sets that contain inflected word forms. We evaluate our error detection model on four languages by introducing three different types of artificial errors into the data: (1) typographic errors, where single characters in the data are inserted, replaced, or deleted; (2) linguistic confusion errors, where two inflected forms are systematically swapped; and (3) self-adversarial errors, where the Transformer model itself is used to generate plausible-looking but erroneous forms by retrieving high-scoring predictions from the search beam. Results show that the Transformer model can detect errors in all three scenarios with perfect or near-perfect recall, even when significant amounts of the annotated data (5%-30%) are corrupted, in all languages tested. Precision varies across the languages and types of errors, but is high enough that the model can be used very effectively to flag suspicious entries in large data sets for further scrutiny by human annotators.
2022.acl-short.19
liu-hulden-2022-detecting
+ 10.18653/v1/2022.acl-short.19
Estimating the Entropy of Linguistic Distributions
@@ -8976,6 +9598,7 @@ in the Case of Unambiguous Gender
2022.acl-short.20
2022.acl-short.20.software.zip
arora-etal-2022-estimating
+ 10.18653/v1/2022.acl-short.20
Morphological Reinflection with Multiple Arguments: An Extended Annotation schema and a Georgian Case Study
@@ -8986,6 +9609,7 @@ in the Case of Unambiguous Gender
In recent years, a flurry of morphological datasets has emerged, most notably UniMorph, a multi-lingual repository of inflection tables. However, the flat structure of the current morphological annotation makes the treatment of some languages quirky, if not impossible, specifically in cases of polypersonal agreement. In this paper we propose a general solution for such cases and expand the UniMorph annotation schema to naturally address this phenomenon, in which verbs agree with multiple arguments using true affixes. We apply this extended schema to one such language, Georgian, and provide a human-verified, accurate, and balanced morphological dataset for Georgian verbs. The dataset has 4 times more tables and 6 times more verb forms compared to the existing UniMorph dataset, covering all possible variants of argument marking and demonstrating the adequacy of our proposed scheme. Experiments on a reinflection task show that generalization is easy when the data is split at the form level, but extremely hard when splitting along lemma lines. Expanding the other languages in UniMorph according to this schema is expected to improve the coverage, consistency, and interpretability of this benchmark.
2022.acl-short.21
guriel-etal-2022-morphological
+ 10.18653/v1/2022.acl-short.21
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
@@ -9004,6 +9628,7 @@ in the Case of Unambiguous Gender
li-etal-2022-dq
CNN/Daily Mail
ELI5
+ 10.18653/v1/2022.acl-short.22
Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension
@@ -9021,6 +9646,7 @@ in the Case of Unambiguous Gender
CRD3
DREAM
MovieNet
+ 10.18653/v1/2022.acl-short.23
Kronecker Decomposition for GPT Compression
@@ -9040,6 +9666,7 @@ in the Case of Unambiguous Gender
WebText
WikiText-103
WikiText-2
+ 10.18653/v1/2022.acl-short.24
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction
@@ -9051,6 +9678,7 @@ in the Case of Unambiguous Gender
A key challenge in attribute value extraction (AVE) from e-commerce sites is how to handle a large number of attributes for diverse products. Although this challenge is partially addressed by a question answering (QA) approach, which finds a value in product data for a given query (attribute), it does not work effectively for rare and ambiguous queries. We thus propose simple knowledge-driven query expansion based on possible answers (values) of a query (attribute) for QA-based AVE. We retrieve values of a query (attribute) from the training data to expand the query. We train a model with two tricks, knowledge dropout and knowledge token mixing, which mimic the imperfection of the value knowledge at test time. Experimental results on our cleaned version of the AliExpress dataset show that our method improves the performance of AVE (+6.08 macro F1), especially for rare and ambiguous attributes (+7.82 and +6.86 macro F1, respectively).
2022.acl-short.25
shinzato-etal-2022-simple
+ 10.18653/v1/2022.acl-short.25
Event-Event Relation Extraction using Probabilistic Box Embedding
@@ -9064,6 +9692,7 @@ in the Case of Unambiguous Gender
To understand a story with multiple events, it is important to capture the proper relations across these events. However, existing event relation extraction (ERE) frameworks regard it as a multi-class classification task and do not guarantee any coherence between different relation types, such as anti-symmetry. If a phone line “died” after a “storm”, then it is obvious that the “storm” happened before the “died”. Current frameworks for event relation extraction do not guarantee this coherence and thus enforce it via a constraint loss function (Wang et al., 2020). In this work, we propose to modify the underlying ERE model to guarantee coherence by representing each event as a box representation (BERE), without applying explicit constraints. In our experiments, BERE also shows stronger conjunctive constraint satisfaction while performing on par with or better than previous models with constraint injection in terms of F1.
2022.acl-short.26
hwang-etal-2022-event
+ 10.18653/v1/2022.acl-short.26
Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
@@ -9076,6 +9705,7 @@ in the Case of Unambiguous Gender
2022.acl-short.27.software.tgz
lam-etal-2022-sample
Europarl-ST
+ 10.18653/v1/2022.acl-short.27
Predicting Sentence Deletions for Text Simplification Using a Functional Discourse Structure
@@ -9087,6 +9717,7 @@ in the Case of Unambiguous Gender
2022.acl-short.28
zhang-etal-2022-predicting
Newsela
+ 10.18653/v1/2022.acl-short.28
Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer
@@ -9100,6 +9731,7 @@ in the Case of Unambiguous Gender
laihuiyuan/multilingual-tst
GYAFC
XFORMAL
+ 10.18653/v1/2022.acl-short.29
When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning
@@ -9110,6 +9742,7 @@ in the Case of Unambiguous Gender
Transfer learning (TL) in natural language processing (NLP) has seen a surge of interest in recent years, as pre-trained models have shown an impressive ability to transfer to novel tasks. Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning: training on an intermediate task before training on the target task (STILTs), using multi-task learning (MTL) to train jointly on a supplementary task and the target task (pairwise MTL), or simply using MTL to train jointly on all available datasets (MTL-ALL). In this work, we compare all three TL methods in a comprehensive analysis on the GLUE dataset suite. We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task and vice versa. We show that this holds true in more than 92% of applicable cases on the GLUE dataset and validate this hypothesis with experiments varying dataset size. The simplicity and effectiveness of this heuristic is surprising and warrants additional exploration by the TL community. Furthermore, we find that MTL-ALL is worse than the pairwise methods in almost every case. We hope this study will aid others as they choose between TL methods for NLP tasks.
2022.acl-short.30
weller-etal-2022-use
+ 10.18653/v1/2022.acl-short.30
Leveraging Explicit Lexico-logical Alignments in Text-to-SQL Parsing
@@ -9125,6 +9758,7 @@ in the Case of Unambiguous Gender
2022.acl-short.31
2022.acl-short.31.software.zip
sun-etal-2022-leveraging
+ 10.18653/v1/2022.acl-short.31
Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning
@@ -9144,6 +9778,7 @@ in the Case of Unambiguous Gender
li-etal-2022-complex
lee-zix/cen
ICEWS
+ 10.18653/v1/2022.acl-short.32
Mismatch between Multi-turn Dialogue and its Evaluation Metric in Dialogue State Tracking
@@ -9157,6 +9792,7 @@ in the Case of Unambiguous Gender
2022.acl-short.33
kim-etal-2022-mismatch
MultiWOZ
+ 10.18653/v1/2022.acl-short.33
LM-BFF-MS: Improving Few-Shot Fine-tuning of Language Models based on Multiple Soft Demonstration Memory
@@ -9175,6 +9811,7 @@ in the Case of Unambiguous Gender
MRPC
SNLI
SST
+ 10.18653/v1/2022.acl-short.34
Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances
@@ -9188,6 +9825,7 @@ in the Case of Unambiguous Gender
dey-etal-2022-towards
suvodipdey/fga
MultiWOZ
+ 10.18653/v1/2022.acl-short.35
Exploiting Language Model Prompts Using Similarity Measures: A Case Study on the Word-in-Context Task
@@ -9202,6 +9840,7 @@ in the Case of Unambiguous Gender
SST
SuperGLUE
WiC
+ 10.18653/v1/2022.acl-short.36
Hierarchical Curriculum Learning for AMR Parsing
@@ -9219,6 +9858,7 @@ in the Case of Unambiguous Gender
wang-etal-2022-hierarchical
wangpeiyi9979/hcl-text2amr
Bio
+ 10.18653/v1/2022.acl-short.37
PARE: A Simple and Strong Baseline for Monolingual and Multilingual Distantly Supervised Relation Extraction
@@ -9233,6 +9873,7 @@ in the Case of Unambiguous Gender
rathore-etal-2022-pare
dair-iitd/dsre
DiS-ReX
+ 10.18653/v1/2022.acl-short.38
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo
@@ -9249,6 +9890,7 @@ in the Case of Unambiguous Gender
COCO
Visual Genome
Who’s Waldo
+ 10.18653/v1/2022.acl-short.39
Translate-Train Embracing Translationese Artifacts
@@ -9261,6 +9903,7 @@ in the Case of Unambiguous Gender
2022.acl-short.40
yu-etal-2022-translate
TyDi QA
+ 10.18653/v1/2022.acl-short.40
C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References
@@ -9277,6 +9920,7 @@ in the Case of Unambiguous Gender
xiangyue9607/c-more
Natural Questions
TriviaQA
+ 10.18653/v1/2022.acl-short.41
k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations
@@ -9286,6 +9930,7 @@ in the Case of Unambiguous Gender
Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many Natural Language Processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, data reliability is under-reported, and the proposed k-rater reliability (kRR) should be used as the correct measure of reliability for aggregated datasets. It is a multi-rater generalization of inter-rater reliability (IRR). We conducted two replications of the WordSim-353 benchmark, and present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353. These methods produce very similar results. We hope this discussion will nudge researchers to report kRR in addition to IRR.
2022.acl-short.42
wong-paritosh-2022-k
+ 10.18653/v1/2022.acl-short.42
An Embarrassingly Simple Method to Mitigate Undesirable Properties of Pretrained Language Model Tokenizers
@@ -9298,6 +9943,7 @@ in the Case of Unambiguous Gender
2022.acl-short.43.software.zip
hofmann-etal-2022-embarrassingly
valentinhofmann/flota
+ 10.18653/v1/2022.acl-short.43
SCD: Self-Contrastive Decorrelation of Sentence Embeddings
@@ -9312,6 +9958,7 @@ in the Case of Unambiguous Gender
MRPC
SST
SentEval
+ 10.18653/v1/2022.acl-short.44
Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
@@ -9325,6 +9972,7 @@ in the Case of Unambiguous Gender
zhou-etal-2022-problems
katezhou/cosine_and_frequency
WiC
+ 10.18653/v1/2022.acl-short.45
Revisiting the Compositional Generalization Abilities of Neural Sequence Models
@@ -9339,6 +9987,7 @@ in the Case of Unambiguous Gender
patel-etal-2022-revisiting
arkilpatel/compositional-generalization-seq2seq
SCAN
+ 10.18653/v1/2022.acl-short.46
A Copy-Augmented Generative Model for Open-Domain Question Answering
@@ -9353,6 +10002,7 @@ in the Case of Unambiguous Gender
liu-etal-2022-copy
Natural Questions
TriviaQA
+ 10.18653/v1/2022.acl-short.47
Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation
@@ -9368,6 +10018,7 @@ in the Case of Unambiguous Gender
starsuzi/dar
Natural Questions
TriviaQA
+ 10.18653/v1/2022.acl-short.48
WLASL-LEX: a Dataset for Recognising Phonological Properties in American Sign Language
@@ -9382,6 +10033,7 @@ in the Case of Unambiguous Gender
2022.acl-short.49.software.zip
tavella-etal-2022-wlasl
WLASL
+ 10.18653/v1/2022.acl-short.49
Investigating person-specific errors in chat-oriented dialogue systems
@@ -9393,6 +10045,7 @@ in the Case of Unambiguous Gender
Creating chatbots that behave like real people is important in terms of believability. Errors in general chatbots and chatbots that follow a rough persona have been studied, but those in chatbots that behave like real people have not been thoroughly investigated. We collected a large number of user interactions with a generation-based chatbot trained on large-scale dialogue data of a specific character, i.e., the target person, and analyzed errors related to that person. We found that person-specific errors can be divided into two types: errors in attributes and those in relations, each of which can be divided into two levels: self and other. The correspondence with an existing taxonomy of errors was also investigated, and person-specific errors that should be addressed in the future were clarified.
2022.acl-short.50
mitsuda-etal-2022-investigating
+ 10.18653/v1/2022.acl-short.50
Direct parsing to sentiment graphs
@@ -9409,6 +10062,7 @@ in the Case of Unambiguous Gender
samuel-etal-2022-direct
jerbarnes/direct_parsing_to_sent_graph
MPQA Opinion Corpus
+ 10.18653/v1/2022.acl-short.51
XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding
@@ -9422,6 +10076,7 @@ in the Case of Unambiguous Gender
hsu-etal-2022-xdbert
GLUE
SWAG
+ 10.18653/v1/2022.acl-short.52
As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning
@@ -9433,6 +10088,7 @@ in the Case of Unambiguous Gender
2022.acl-short.53.software.zip
vamvas-sennrich-2022-little
zurichnlp/coverage-contrastive-conditioning
+ 10.18653/v1/2022.acl-short.53
How Distributed are Distributed Representations? An Observation on the Locality of Syntactic Information in Verb Agreement Tasks
@@ -9443,6 +10099,7 @@ in the Case of Unambiguous Gender
This work addresses the question of the localization of syntactic information encoded in transformer representations. We tackle this question from two perspectives, considering the object-past participle agreement in French, by identifying, first, in which part of the sentence and, second, in which part of the representation the syntactic information is encoded. The results of our experiments, using probing, causal analysis and feature selection methods, show that syntactic information is encoded locally in a way consistent with French grammar.
2022.acl-short.54
li-etal-2022-distributed
+ 10.18653/v1/2022.acl-short.54
Machine Translation for Livonian: Catering to 20 Speakers
@@ -9455,6 +10112,7 @@ in the Case of Unambiguous Gender
Livonian is one of the most endangered languages in Europe with just a tiny handful of speakers and virtually no publicly available corpora. In this paper we tackle the task of developing neural machine translation (NMT) between Livonian and English, with a two-fold aim: on the one hand, preserving the language and, on the other, enabling access to Livonian folklore, life stories and other textual intangible heritage as well as making it easier to create further parallel corpora. We rely on Livonian’s linguistic similarity to Estonian and Latvian and collect parallel and monolingual data for the four languages for translation experiments. We combine different low-resource NMT techniques like zero-shot translation, cross-lingual transfer and synthetic data creation to reach the highest possible translation quality as well as to find which base languages are empirically more helpful for transfer to Livonian. The resulting NMT systems and the collected monolingual and parallel data, including a manually translated and verified translation benchmark, are publicly released via OPUS and Huggingface repositories.
2022.acl-short.55
rikters-etal-2022-machine
+ 10.18653/v1/2022.acl-short.55
Fire Burns, Sword Cuts: Commonsense Inductive Bias for Exploration in Text-based Games
@@ -9470,6 +10128,7 @@ in the Case of Unambiguous Gender
ryu-etal-2022-fire
ktr0921/comm-expl-kg-a2c
Jericho
+ 10.18653/v1/2022.acl-short.56
A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models
@@ -9489,6 +10148,7 @@ in the Case of Unambiguous Gender
LAMA
S2ORC
T-REx
+ 10.18653/v1/2022.acl-short.57
S^4-Tuning: A Simple Cross-lingual Sub-network Tuning Method
@@ -9503,6 +10163,7 @@ in the Case of Unambiguous Gender
xu-etal-2022-s4
PAWS-X
XNLI
+ 10.18653/v1/2022.acl-short.58
Region-dependent temperature scaling for certainty calibration and application to class-imbalanced token classification
@@ -9513,6 +10174,7 @@ in the Case of Unambiguous Gender
2022.acl-short.59
dawkins-nejadgholi-2022-region
Few-NERD
+ 10.18653/v1/2022.acl-short.59
Developmental Negation Processing in Transformer Language Models
@@ -9524,6 +10186,7 @@ in the Case of Unambiguous Gender
2022.acl-short.60.software.zip
laverghetta-jr-licato-2022-developmental
advancing-machine-human-reasoning-lab/negation-processing-acl-2022
+ 10.18653/v1/2022.acl-short.60
Canary Extraction in Natural Language Understanding Models
@@ -9535,6 +10198,7 @@ in the Case of Unambiguous Gender
2022.acl-short.61
parikh-etal-2022-canary
SNIPS
+ 10.18653/v1/2022.acl-short.61
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations
@@ -9550,6 +10214,7 @@ in the Case of Unambiguous Gender
2022.acl-short.62
cao-etal-2022-intrinsic
StereoSet
+ 10.18653/v1/2022.acl-short.62
Sequence-to-sequence AMR Parsing with Ancestor Information
@@ -9559,6 +10224,7 @@ in the Case of Unambiguous Gender
AMR parsing is the task of automatically mapping a sentence to an AMR semantic graph. The difficulty comes from generating the complex graph structure. The previous state-of-the-art method translates the AMR graph into a sequence, then directly fine-tunes a pretrained sequence-to-sequence Transformer model (BART). However, purely treating the graph as a sequence does not take advantage of structural information about the graph. In this paper, we design several strategies to add the important ancestor information into the Transformer Decoder. Our experiments show that we can improve the performance on both the AMR 2.0 and AMR 3.0 datasets and achieve new state-of-the-art results.
2022.acl-short.63
yu-gildea-2022-sequence
+ 10.18653/v1/2022.acl-short.63
Zero-Shot Dependency Parsing with Worst-Case Aware Automated Curriculum Learning
@@ -9570,6 +10236,7 @@ in the Case of Unambiguous Gender
2022.acl-short.64
de-lhoneux-etal-2022-zero
mdelhoneux/machamp-worst_case_acl
+ 10.18653/v1/2022.acl-short.64
PriMock57: A Dataset Of Primary Care Mock Consultations
@@ -9581,6 +10248,7 @@ in the Case of Unambiguous Gender
Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, thus slowing down normal research practices. We detail the development of a public-access, high-quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.
2022.acl-short.65
papadopoulos-korfiatis-etal-2022-primock57
+ 10.18653/v1/2022.acl-short.65
UniGDD: A Unified Generative Framework for Goal-Oriented Document-Grounded Dialogue
@@ -9593,6 +10261,7 @@ in the Case of Unambiguous Gender
gao-etal-2022-unigdd
gao-xiao-bai/UniGDD
Doc2Dial
+ 10.18653/v1/2022.acl-short.66
DMix: Adaptive Distance-aware Interpolative Mixup
@@ -9611,6 +10280,7 @@ in the Case of Unambiguous Gender
CoLA
GLUE
SST
+ 10.18653/v1/2022.acl-short.67
Sub-Word Alignment is Still Useful: A Vest-Pocket Method for Enhancing Low-Resource Machine Translation
@@ -9622,6 +10292,7 @@ in the Case of Unambiguous Gender
2022.acl-short.68.software.zip
xu-hong-2022-sub
Cosmos-Break/transfer-mt-submit
+ 10.18653/v1/2022.acl-short.68
HYPHEN: Hyperbolic Hawkes Attention For Text Streams
@@ -9636,6 +10307,7 @@ in the Case of Unambiguous Gender
2022.acl-short.69.software.zip
agarwal-etal-2022-hyphen
gtfintechlab/hyphen-acl
+ 10.18653/v1/2022.acl-short.69
A Risk-Averse Mechanism for Suicidality Assessment on Social Media
@@ -9646,6 +10318,7 @@ in the Case of Unambiguous Gender
Recent studies have shown that social media has increasingly become a platform for users to express suicidal thoughts outside traditional clinical settings. With advances in Natural Language Processing strategies, it is now possible to design automated systems to assess suicide risk. However, such systems may generate uncertain predictions, leading to severe consequences. We hence reformulate suicide risk assessment as a selective prioritized prediction problem over the Columbia Suicide Severity Risk Scale (C-SSRS). We propose SASI, a risk-averse and self-aware transformer-based hierarchical attention classifier, augmented to refrain from making uncertain predictions. We show that SASI is able to refrain from 83% of incorrect predictions on real-world Reddit data. Furthermore, we discuss the qualitative, practical, and ethical aspects of SASI for suicide risk assessment as a human-in-the-loop framework.
2022.acl-short.70
sawhney-etal-2022-risk
+ 10.18653/v1/2022.acl-short.70
When classifying grammatical role, BERT doesn’t care about word order... except when it matters
@@ -9657,6 +10330,7 @@ in the Case of Unambiguous Gender
2022.acl-short.71
2022.acl-short.71.software.tgz
papadimitriou-etal-2022-classifying-grammatical
+ 10.18653/v1/2022.acl-short.71
Triangular Transfer: Freezing the Pivot for Triangular Machine Translation
@@ -9667,6 +10341,7 @@ in the Case of Unambiguous Gender
Triangular machine translation is a special case of low-resource machine translation where the language pair of interest has limited parallel data, but both languages have abundant parallel data with a pivot language. Naturally, the key to triangular machine translation is the successful exploitation of such auxiliary data. In this work, we propose a transfer-learning-based approach that utilizes all types of auxiliary data. As we train auxiliary source-pivot and pivot-target translation models, we initialize some parameters of the pivot side with a pre-trained language model and freeze them to encourage both translation models to work in the same pivot language space, so that they can be smoothly transferred to the source-target translation model. Experiments show that our approach can outperform previous ones.
2022.acl-short.72
zhang-etal-2022-triangular
+ 10.18653/v1/2022.acl-short.72
Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
@@ -9678,6 +10353,7 @@ in the Case of Unambiguous Gender
madureira-schlangen-2022-visual
COCO
VisDial
+ 10.18653/v1/2022.acl-short.73
Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation
@@ -9690,6 +10366,7 @@ in the Case of Unambiguous Gender
2022.acl-short.74.software.zip
chen-etal-2022-focus
chenllliang/MLS
+ 10.18653/v1/2022.acl-short.74
Contrastive Learning-Enhanced Nearest Neighbor Mechanism for Multi-Label Text Classification
@@ -9701,6 +10378,7 @@ in the Case of Unambiguous Gender
2022.acl-short.75
su-etal-2022-contrastive
RCV1
+ 10.18653/v1/2022.acl-short.75
NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
@@ -9714,6 +10392,7 @@ in the Case of Unambiguous Gender
wu-etal-2022-noisytune
GLUE
XTREME
+ 10.18653/v1/2022.acl-short.76
Adjusting the Precision-Recall Trade-Off with Align-and-Predict Decoding for Grammatical Error Correction
@@ -9724,6 +10403,7 @@ in the Case of Unambiguous Gender
2022.acl-short.77
sun-wang-2022-adjusting
autotemp/align-and-predict
+ 10.18653/v1/2022.acl-short.77
On the Effect of Isotropy on VAE Representations of Text
@@ -9736,6 +10416,7 @@ in the Case of Unambiguous Gender
2022.acl-short.78.software.zip
zhang-etal-2022-effect
lanzhang128/IGPVAE
+ 10.18653/v1/2022.acl-short.78
Efficient Classification of Long Documents Using Transformers
@@ -9747,6 +10428,7 @@ in the Case of Unambiguous Gender
2022.acl-short.79
park-etal-2022-efficient
EURLEX57K
+ 10.18653/v1/2022.acl-short.79
Rewarding Semantic Similarity under Optimized Alignments for AMR-to-Text Generation
@@ -9756,6 +10438,7 @@ in the Case of Unambiguous Gender
A common way to combat exposure bias is by applying scores from evaluation metrics as rewards in reinforcement learning (RL). Metrics leveraging contextualized embeddings appear more flexible than their n-gram matching counterparts and thus ideal as training rewards. However, metrics such as BERTScore greedily align candidate and reference tokens, which can allow system outputs to receive excess credit relative to a reference. Furthermore, past approaches featuring semantic similarity rewards suffer from repetitive outputs and overfitting. We address these issues by proposing metrics that replace the greedy alignments in BERTScore with optimized ones. We compute them on a model’s trained token embeddings to prevent domain mismatch. Our model optimizing discrete alignment metrics consistently outperforms cross-entropy and BLEU reward baselines on AMR-to-text generation. In addition, we find that this approach enjoys stable training compared to a non-RL setting.
2022.acl-short.80
jin-gildea-2022-rewarding
+ 10.18653/v1/2022.acl-short.80
An Analysis of Negation in Natural Language Understanding Corpora
@@ -9777,6 +10460,7 @@ in the Case of Unambiguous Gender
SuperGLUE
WSC
WiC
+ 10.18653/v1/2022.acl-short.81
Primum Non Nocere: Before working with Indigenous data, the ACL must confront ongoing colonialism
@@ -9785,6 +10469,7 @@ in the Case of Unambiguous Gender
In this paper, we challenge the ACL community to reckon with historical and ongoing colonialism by adopting a set of ethical obligations and best practices drawn from the Indigenous studies literature. While the vast majority of NLP research focuses on a very small number of very high-resource languages (English, Chinese, etc.), some work has begun to engage with Indigenous languages. No research involving Indigenous language data can be considered ethical without first acknowledging that Indigenous languages are not merely very low-resource languages. The toxic legacy of colonialism permeates every aspect of interaction between Indigenous communities and outside researchers. To this end, we propose that the ACL draft and adopt an ethical framework for NLP researchers and computational linguists wishing to engage in research involving Indigenous languages.
2022.acl-short.82
schwartz-2022-primum
+ 10.18653/v1/2022.acl-short.82
Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning
@@ -9801,6 +10486,7 @@ in the Case of Unambiguous Gender
QASC
SQuAD
SciQ
+ 10.18653/v1/2022.acl-short.83
Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models
@@ -9811,6 +10497,7 @@ in the Case of Unambiguous Gender
2022.acl-short.84
2022.acl-short.84.software.zip
liu-hulden-2022-transformer
+ 10.18653/v1/2022.acl-short.84
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
@@ -9826,6 +10513,7 @@ in the Case of Unambiguous Gender
jderiu/metric-robustness
DailyDialog
PERSONA-CHAT
+ 10.18653/v1/2022.acl-short.85
Rethinking and Refining the Distinct Metric
@@ -9840,6 +10528,7 @@ in the Case of Unambiguous Gender
2022.acl-short.86
liu-etal-2022-rethinking
DailyDialog
+ 10.18653/v1/2022.acl-short.86
How reparametrization trick broke differentially-private text representation learning
@@ -9850,6 +10539,7 @@ in the Case of Unambiguous Gender
2022.acl-short.87.software.zip
habernal-2022-reparametrization
trusthlt/acl2022-reparametrization-trick-broke-differential-privacy
+ 10.18653/v1/2022.acl-short.87
Towards Consistent Document-level Entity Linking: Joint Models for Entity Linking and Coreference Resolution
@@ -9864,6 +10554,7 @@ in the Case of Unambiguous Gender
zaporojets-etal-2022-towards
klimzaporojets/consistent-el
DWIE
+ 10.18653/v1/2022.acl-short.88
A Flexible Multi-Task Model for BERT Serving
@@ -9879,6 +10570,7 @@ in the Case of Unambiguous Gender
MRPC
QNLI
SST
+ 10.18653/v1/2022.acl-short.89
Understanding Game-Playing Agents with Natural Language Annotations
@@ -9891,6 +10583,7 @@ in the Case of Unambiguous Gender
2022.acl-short.90.software.zip
tomlin-etal-2022-understanding
andrehe02/go-probe
+ 10.18653/v1/2022.acl-short.90
Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding
@@ -9904,6 +10597,7 @@ in the Case of Unambiguous Gender
yuan-etal-2022-code
ganjinzero/icd-msmn
MIMIC-III
+ 10.18653/v1/2022.acl-short.91
CoDA21: Evaluating Language Understanding Capabilities of NLP Models With Context-Definition Alignment
@@ -9915,6 +10609,7 @@ in the Case of Unambiguous Gender
2022.acl-short.92
senel-etal-2022-coda21
lksenel/coda21
+ 10.18653/v1/2022.acl-short.92
On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
@@ -9929,6 +10624,7 @@ in the Case of Unambiguous Gender
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.acl-short.93
A Recipe for Arbitrary Text Style Transfer with Large Language Models
@@ -9943,6 +10639,7 @@ in the Case of Unambiguous Gender
2022.acl-short.94
2022.acl-short.94.software.zip
reif-etal-2022-recipe
+ 10.18653/v1/2022.acl-short.94
DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction
@@ -9957,6 +10654,7 @@ in the Case of Unambiguous Gender
dair-iitd/DiS-ReX
DiS-ReX
RELX
+ 10.18653/v1/2022.acl-short.95
(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models’ Performance
@@ -9967,6 +10665,7 @@ in the Case of Unambiguous Gender
In the domain of Morphology, Inflection is a fundamental and important task that gained a lot of traction in recent years, mostly via SIGMORPHON’s shared tasks. With average accuracy above 0.9 over the scores of all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models. In particular, as opposed to the naïve split-by-form, we propose a split-by-lemma method to challenge the performance on existing benchmarks. Our experiments with the three top-ranked systems on SIGMORPHON’s 2020 shared task show that the lemma-split presents an average drop of 30 percentage points in macro-average for the 90 languages included. The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average. Our results clearly show that generalizing inflection to unseen lemmas is far from being solved, presenting a simple yet effective means to promote more sophisticated models.
2022.acl-short.96
goldman-etal-2022-un
+ 10.18653/v1/2022.acl-short.96
Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks
@@ -9981,6 +10680,7 @@ in the Case of Unambiguous Gender
wu-etal-2022-text
SNIPS
SST
+ 10.18653/v1/2022.acl-short.97
@@ -10006,6 +10706,7 @@ in the Case of Unambiguous Gender
This work presents two experiments with the goal of replicating the transferability of dependency parsers and POS taggers trained on closely related languages within the low-resource language family Tupían. The experiments include both zero-shot settings as well as multilingual models. Previous studies have found that even a comparably small treebank from a closely related language will improve sequence labelling considerably in such cases. Results from both POS tagging and dependency parsing confirm previous evidence that the closer the phylogenetic relation between two languages, the better the predictions for sequence labelling tasks get. In many cases, the results are improved if multiple languages from the same family are combined. This suggests that in addition to leveraging similarity between two related languages, the incorporation of multiple languages of the same family might lead to better results in transfer learning for NLP applications.
2022.acl-srw.1
blum-2022-evaluating
+ 10.18653/v1/2022.acl-srw.1
RFBFN: A Relation-First Blank Filling Network for Joint Relational Triple Extraction
@@ -10019,6 +10720,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.2
li-etal-2022-rfbfn
lizhe2016/rfbfn
+ 10.18653/v1/2022.acl-srw.2
Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions
@@ -10033,6 +10735,7 @@ in the Case of Unambiguous Gender
EmoBank
EmotionLines
Story Commonsense
+ 10.18653/v1/2022.acl-srw.3
Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models
@@ -10041,6 +10744,7 @@ in the Case of Unambiguous Gender
It has become crucial to develop tools for automated hate speech and abuse detection. These tools would help to stop the bullies and the haters and provide a safer environment for individuals, especially those from marginalized groups, to freely express themselves. However, recent research shows that machine learning models are biased and might make the right decisions for the wrong reasons. In this thesis, I set out to understand the performance of hate speech and abuse detection models and the different biases that could influence them. I show that hate speech and abuse detection models are not only subject to social bias but also to other types of bias that have not been explored before. Finally, I investigate the causal effect of the social and intersectional bias on the performance and unfairness of hate speech detection models.
2022.acl-srw.4
elsafoury-2022-darkness
+ 10.18653/v1/2022.acl-srw.4
Ethical Considerations for Low-resourced Machine Translation
@@ -10049,6 +10753,7 @@ in the Case of Unambiguous Gender
This paper considers some ethical implications of machine translation for low-resourced languages. I use Armenian as a case study and investigate specific needs for and concerns arising from the creation and deployment of improved machine translation between English and Armenian. To do this, I conduct stakeholder interviews and construct Value Scenarios (Nathan et al., 2007) from the themes that emerge. These scenarios illustrate some of the potential harms that low-resourced language communities may face due to the deployment of improved machine translation systems. Based on these scenarios, I recommend 1) collaborating with stakeholders in order to create more useful and reliable machine translation tools, and 2) determining which other forms of language technology should be developed alongside efforts to improve machine translation in order to mitigate harms rendered to vulnerable language communities. Both of these goals require treating low-resourced machine translation as a language-specific, rather than language-agnostic, task.
2022.acl-srw.5
haroutunian-2022-ethical
+ 10.18653/v1/2022.acl-srw.5
Integrating Question Rewrites in Conversational Question Answering: A Reinforcement Learning Approach
@@ -10065,6 +10770,7 @@ in the Case of Unambiguous Gender
CoQA
QReCC
QuAC
+ 10.18653/v1/2022.acl-srw.6
What Do You Mean by Relation Extraction? A Survey on Datasets and Study on Scientific Relation Classification
@@ -10079,6 +10785,7 @@ in the Case of Unambiguous Gender
DocRED
FewRel
FewRel 2.0
+ 10.18653/v1/2022.acl-srw.7
Logical Inference for Counting on Semi-structured Tables
@@ -10090,6 +10797,7 @@ in the Case of Unambiguous Gender
kurosawa-yanaka-2022-logical
ynklab/sst_count
InfoTabS
+ 10.18653/v1/2022.acl-srw.8
GNNer: Reducing Overlapping in Span-based NER Using Graph Neural Networks
@@ -10104,6 +10812,7 @@ in the Case of Unambiguous Gender
urchade/gnner
CoNLL-2003
SciERC
+ 10.18653/v1/2022.acl-srw.9
Compositional Semantics and Inference System for Temporal Order based on Japanese CCG
@@ -10114,6 +10823,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.10
sugimoto-yanaka-2022-compositional
ynklab/ccgtemp
+ 10.18653/v1/2022.acl-srw.10
Combine to Describe: Evaluating Compositional Generalization in Image Captioning
@@ -10125,6 +10835,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.11
pantazopoulos-etal-2022-combine
COCO
+ 10.18653/v1/2022.acl-srw.11
Towards Unification of Discourse Annotation Frameworks
@@ -10133,6 +10844,7 @@ in the Case of Unambiguous Gender
Discourse information is difficult to represent and annotate. Among the major frameworks for annotating discourse information, RST, PDTB and SDRT are widely discussed and used, each having its own theoretical foundation and focus. Corpora annotated under different frameworks vary considerably. To make better use of the existing discourse corpora and achieve the possible synergy of different frameworks, it is worthwhile to investigate the systematic relations between different frameworks and devise methods of unifying the frameworks. Although the issue of framework unification has been a topic of discussion for a long time, there is currently no comprehensive approach which considers unifying both discourse structure and discourse relations and evaluates the unified framework intrinsically and extrinsically. We plan to use automatic means for the unification task and evaluate the result with structural complexity and downstream tasks. We will also explore the application of the unified framework in multi-task learning and graphical models.
2022.acl-srw.12
fu-2022-towards
+ 10.18653/v1/2022.acl-srw.12
AMR Alignment for Morphologically-rich and Pro-drop Languages
@@ -10142,6 +10854,7 @@ in the Case of Unambiguous Gender
Alignment between concepts in an abstract meaning representation (AMR) graph and the words within a sentence is one of the important stages of AMR parsing. Although there exist high-performing AMR aligners for English, these are unfortunately not well suited for many languages where many concepts arise from morpho-semantic elements. For the first time in the literature, this paper presents an AMR aligner tailored for morphologically-rich and pro-drop languages by experimenting on Turkish, a prominent example of this language group. Our aligner focuses on the meaning considering the rich Turkish morphology and aligns AMR concepts that emerge from morphemes using a tree traversal approach without additional resources or rules. We evaluate our aligner over a manually annotated gold data set in terms of precision, recall and F1 score. Our aligner outperforms the Turkish adaptations of the previously proposed aligners for English and Portuguese with an F1 score of 0.87 and provides a relative error reduction of up to 76%.
2022.acl-srw.13
oral-eryigit-2022-amr
+ 10.18653/v1/2022.acl-srw.13
Sketching a Linguistically-Driven Reasoning Dialog Model for Social Talk
@@ -10150,6 +10863,7 @@ in the Case of Unambiguous Gender
The capability of holding social talk (or casual conversation) and making sense of conversational content requires context-sensitive natural language understanding and reasoning, which cannot be handled efficiently by the current popular open-domain dialog systems and chatbots. Heavily relying on corpus-based machine learning techniques to encode and decode context-sensitive meanings, these systems focus on fitting a particular training dataset, but not tracking what is actually happening in a conversation, and therefore easily derail in a new context. This work sketches out a more linguistically-informed architecture to handle social talk in English, in which corpus-based methods form the backbone of the relatively context-insensitive components (e.g. part-of-speech tagging, approximation of lexical meaning and constituent chunking), while symbolic modeling is used for reasoning out the context-sensitive components, which do not have any consistent mapping to linguistic forms. All components are fitted into a Bayesian game-theoretic model to address the interactive and rational aspects of conversation.
2022.acl-srw.14
luu-2022-sketching
+ 10.18653/v1/2022.acl-srw.14
Scoping natural language processing in Indonesian and Malay for education applications
@@ -10161,6 +10875,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.15
maxwelll-smith-etal-2022-scoping
IndoNLU Benchmark
+ 10.18653/v1/2022.acl-srw.15
English-Malay Cross-Lingual Embedding Alignment using Bilingual Lexicon Augmentation
@@ -10170,6 +10885,7 @@ in the Case of Unambiguous Gender
As high-quality Malay language resources are still scarce, cross-lingual word embeddings make it possible for richer English resources to be leveraged for downstream Malay text classification tasks. This paper focuses on creating English-Malay cross-lingual word embeddings using embedding alignment by exploiting existing language resources. We augmented the training bilingual lexicons using machine translation with the goal of improving the alignment precision of our cross-lingual word embeddings. We investigated the quality of the current state-of-the-art English-Malay bilingual lexicon and worked on improving its quality using Google Translate. We also examined the effect of Malay word coverage on the quality of cross-lingual word embeddings. Experimental results with a precision of up to 28.17% show that the alignment precision of the cross-lingual word embeddings would inevitably degrade after 1-NN, but a better seed lexicon and cleaner nearest neighbours can reduce the number of word pairs required to achieve satisfactory performance. As the English and Malay monolingual embeddings are pre-trained on informal language corpora, our proposed English-Malay embedding alignment approach is also able to map non-standard Malay translations in the English nearest neighbours.
2022.acl-srw.16
lim-liew-2022-english
+ 10.18653/v1/2022.acl-srw.16
Towards Detecting Political Bias in Hindi News Articles
@@ -10181,6 +10897,7 @@ in the Case of Unambiguous Gender
Political propaganda in recent times has been amplified by media news portals through biased reporting, creating untruthful narratives on serious issues and misinforming public opinion in the interest of siding with and helping a particular political party. This poses a challenging NLP task of detecting political bias in news articles. We propose a transformer-based transfer learning method that fine-tunes a pre-trained network on our data for this bias detection. As the required dataset for this particular task was not available, we created our own dataset comprising 1388 Hindi news articles and their headlines from various Hindi news media outlets. We marked each article as biased towards, against, or neutral to the BJP, a political party and the current ruling party at the centre in India.
2022.acl-srw.17
agrawal-etal-2022-towards
+ 10.18653/v1/2022.acl-srw.17
Restricted or Not: A General Training Framework for Neural Machine Translation
@@ -10193,6 +10910,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.18
li-etal-2022-restricted
ASPEC
+ 10.18653/v1/2022.acl-srw.18
What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge
@@ -10203,6 +10921,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.19
hagstrom-johansson-2022-models
lovhag/measure-visual-commonsense-knowledge
+ 10.18653/v1/2022.acl-srw.19
TeluguNER: Leveraging Multi-Domain Named Entity Recognition with Deep Transformers
@@ -10215,6 +10934,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.20
duggenpudi-etal-2022-teluguner
WikiAnn
+ 10.18653/v1/2022.acl-srw.20
Using Neural Machine Translation Methods for Sign Language Translation
@@ -10226,6 +10946,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.21
angelova-etal-2022-using
PHOENIX14T
+ 10.18653/v1/2022.acl-srw.21
Flexible Visual Grounding
@@ -10242,6 +10963,7 @@ in the Case of Unambiguous Gender
RefCOCO
Visual Genome
Visual7W
+ 10.18653/v1/2022.acl-srw.22
A large-scale computational study of content preservation measures for text style transfer and paraphrase generation
@@ -10254,6 +10976,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.23
babakov-etal-2022-large
skoltech-nlp/mutual_implication_score
+ 10.18653/v1/2022.acl-srw.23
Explicit Object Relation Alignment for Vision and Language Navigation
@@ -10264,6 +10987,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.24
zhang-kordjamshidi-2022-explicit
hlr/object-grounding-for-vln
+ 10.18653/v1/2022.acl-srw.24
Mining Logical Event Schemas From Pre-Trained Language Models
@@ -10274,6 +10998,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.25
lawley-schubert-2022-mining
FrameNet
+ 10.18653/v1/2022.acl-srw.25
Exploring Cross-lingual Text Detoxification with Large Multilingual Language Models.
@@ -10284,6 +11009,7 @@ in the Case of Unambiguous Gender
Detoxification is the task of generating text in a polite style while preserving the meaning and fluency of the original toxic text. Existing detoxification methods are monolingual, i.e., designed to work in one specific language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting. Unlike previous works, we aim to make large language models able to perform detoxification without direct fine-tuning in a given language. Experiments show that multilingual models are capable of performing multilingual style transfer. However, the tested state-of-the-art models are not able to perform cross-lingual detoxification, and direct fine-tuning in the given language currently remains unavoidable, motivating the need for further research in this direction.
2022.acl-srw.26
moskovskiy-etal-2022-exploring
+ 10.18653/v1/2022.acl-srw.26
MEKER: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering
@@ -10298,6 +11024,7 @@ in the Case of Unambiguous Gender
chekalina-etal-2022-meker
FB15k-237
SimpleQuestions
+ 10.18653/v1/2022.acl-srw.27
Discourse on ASR Measurement: Introducing the ARPOCA Assessment Tool
@@ -10307,6 +11034,7 @@ in the Case of Unambiguous Gender
Automatic speech recognition (ASR) has evolved from a pipeline architecture with pronunciation dictionaries, phonetic features and language models to end-to-end systems performing a direct translation from a raw waveform into a word sequence. With the increase in accuracy and the availability of pre-trained models, ASR systems are now omnipresent in our daily applications. On the other hand, the models’ interpretability and their computational cost have become more challenging, particularly when dealing with less-common languages or identifying regional variations of speakers. This research proposal will follow a four-stage process: 1) providing an overview of acoustic features and feature extraction algorithms; 2) exploring current ASR models, tools, and performance assessment techniques; 3) aligning features with interpretable phonetic transcripts; and 4) designing a prototype, ARPOCA, to increase awareness of regional language variation and improve model feedback by developing semi-automatic acoustic feature extraction using PRAAT in conjunction with phonetic transcription.
2022.acl-srw.28
merz-scrivner-2022-discourse
+ 10.18653/v1/2022.acl-srw.28
Pretrained Knowledge Base Embeddings for improved Sentential Relation Extraction
@@ -10319,6 +11047,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.29
papaluca-etal-2022-pretrained
brunoliegibastonliegi/pretrained-kb-embeddings-for-re
+ 10.18653/v1/2022.acl-srw.29
Improving Cross-domain, Cross-lingual and Multi-modal Deception Detection
@@ -10329,6 +11058,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.30
panda-levitan-2022-improving
LIAR
+ 10.18653/v1/2022.acl-srw.30
Automatic Generation of Distractors for Fill-in-the-Blank Exercises with Round-Trip Neural Machine Translation
@@ -10340,6 +11070,7 @@ in the Case of Unambiguous Gender
In a fill-in-the-blank exercise, a student is presented with a carrier sentence with one word hidden, and a multiple-choice list that includes the correct answer and several inappropriate options, called distractors. We propose to automatically generate distractors using round-trip neural machine translation: the carrier sentence is translated from English into another (pivot) language and back, and distractors are produced by aligning the original sentence and its round-trip translation. We show that using hundreds of translations for a given sentence allows us to generate a rich set of challenging distractors. Further, using multiple pivot languages produces a diverse set of candidates. The distractors are evaluated against a real corpus of cloze exercises and checked manually for validity. We demonstrate that the proposed method significantly outperforms two strong baselines.
2022.acl-srw.31
panda-etal-2022-automatic
+ 10.18653/v1/2022.acl-srw.31
On the Locality of Attention in Direct Speech Translation
@@ -10351,6 +11082,7 @@ in the Case of Unambiguous Gender
Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the complexity of the self-attention mechanism scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, as in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contributions in the self-attention of the encoder, unveiling local diagonal patterns. To prove that some attention weights are avoidable, we propose to substitute the standard self-attention with a local efficient one, setting the amount of context used based on the results of the analysis. With this approach, our model matches the baseline performance and improves the efficiency by skipping the computation of those weights that standard attention discards.
2022.acl-srw.32
alastruey-etal-2022-locality
+ 10.18653/v1/2022.acl-srw.32
Extraction of Diagnostic Reasoning Relations for Clinical Knowledge Graphs
@@ -10359,6 +11091,7 @@ in the Case of Unambiguous Gender
Clinical knowledge graphs lack meaningful diagnostic relations (e.g. comorbidities, signs/symptoms), limiting their ability to represent real-world diagnostic processes. Previous methods in biomedical relation extraction have focused on concept relations, such as gene-disease and disease-drug, and largely ignored clinical processes. In this thesis, we leverage a clinical reasoning ontology and propose methods to extract such relations from a physician-facing point-of-care reference wiki and consumer health resource texts. Given the lack of data labeled with diagnostic relations, we also propose new methods of evaluating the correctness of extracted triples in the zero-shot setting. We describe a process for the intrinsic evaluation of new facts by triple confidence filtering and clinician manual review, as well as extrinsic evaluation in the form of a differential diagnosis prediction task.
2022.acl-srw.33
socrates-2022-extraction
+ 10.18653/v1/2022.acl-srw.33
Scene-Text Aware Image and Text Retrieval with Dual-Encoder
@@ -10372,6 +11105,7 @@ in the Case of Unambiguous Gender
2022.acl-srw.34
miyawaki-etal-2022-scene
TextCaps
+ 10.18653/v1/2022.acl-srw.34
Towards Fine-grained Classification of Climate Change related Social Media Text
@@ -10382,6 +11116,7 @@ in the Case of Unambiguous Gender
With climate change becoming a cause of concern worldwide, it becomes essential to gauge people’s reactions. This can help educate and spread awareness about it and help leaders improve decision-making. This work explores fine-grained classification and stance detection of climate change-related social media text. Firstly, we create two datasets, ClimateStance and ClimateEng, consisting of 3777 tweets each, posted during the 2019 United Nations Framework Convention on Climate Change, and comprehensively outline the dataset collection, annotation methodology, and dataset composition. Secondly, we propose the task of climate change stance detection based on our proposed ClimateStance dataset. Thirdly, we propose a fine-grained classification based on the ClimateEng dataset, classifying social media text into five categories: Disaster, Ocean/Water, Agriculture/Forestry, Politics, and General. We benchmark both datasets for climate change stance detection and fine-grained classification using state-of-the-art methods in text classification. We also create a Reddit-based dataset for both tasks, ClimateReddit, consisting of 6262 pseudo-labeled comments along with 329 comments manually annotated for the label. We then perform semi-supervised experiments for both tasks and benchmark their results using the best-performing model from the supervised experiments. Lastly, we provide insights into ClimateStance and ClimateReddit using part-of-speech tagging and named-entity recognition.
2022.acl-srw.35
vaid-etal-2022-towards
+ 10.18653/v1/2022.acl-srw.35
Deep Neural Representations for Multiword Expressions Detection
@@ -10391,6 +11126,7 @@ in the Case of Unambiguous Gender
Effective methods for multiword expressions detection are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme applied to an annotated corpus, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two approaches. We present a novel weakly supervised multiword expressions extraction method which focuses on their behaviour in various contexts. Our method uses a lexicon of English multiword lexical units acquired from The Oxford Dictionary of English as a reference knowledge base and leverages neural language modelling with deep learning architectures. In our approach, we do not need a corpus annotated specifically for the task. The only required components are: a lexicon of multiword units, a large corpus, and a general contextual embeddings model. We propose a method for building a silver dataset by spotting multiword expression occurrences and acquiring statistical collocations as negative samples. Sample representation has been inspired by representations used in Natural Language Inference and relation recognition. Very good results (F1=0.8) were obtained with a CNN network applied to individual occurrences, followed by weighted voting used to combine results from the whole corpus. The proposed method can be quite easily applied to other languages.
2022.acl-srw.36
kanclerz-piasecki-2022-deep
+ 10.18653/v1/2022.acl-srw.36
A Checkpoint on Multilingual Misogyny Identification
@@ -10400,6 +11136,7 @@ in the Case of Unambiguous Gender
We address the problem of identifying misogyny in tweets in mono and multilingual settings in three languages: English, Italian, and Spanish. We explore model variations considering single and multiple languages both in the pre-training of the transformer and in the training of the downstream task, to explore the feasibility of detecting misogyny through a transfer learning approach across multiple languages. That is, we train monolingual transformers with monolingual data, and multilingual transformers with both monolingual and multilingual data. Our models reach state-of-the-art performance on all three languages. The single-language BERT models perform the best, closely followed by different configurations of multilingual BERT models. The performance drops in zero-shot classification across languages. Our error analysis shows that multilingual and monolingual models tend to make the same mistakes.
2022.acl-srw.37
muti-barron-cedeno-2022-checkpoint
+ 10.18653/v1/2022.acl-srw.37
Using dependency parsing for few-shot learning in distributional semantics
@@ -10409,6 +11146,7 @@ in the Case of Unambiguous Gender
In this work, we explore the novel idea of employing dependency parsing information in the context of few-shot learning, the task of learning the meaning of a rare word based on a limited amount of context sentences. Firstly, we use dependency-based word embedding models as background spaces for few-shot learning. Secondly, we introduce two few-shot learning methods which enhance the additive baseline model by using dependencies.
2022.acl-srw.38
preda-emerson-2022-using
+ 10.18653/v1/2022.acl-srw.38
A Dataset and BERT-based Models for Targeted Sentiment Analysis on Turkish Texts
@@ -10418,6 +11156,7 @@ in the Case of Unambiguous Gender
Targeted Sentiment Analysis aims to extract sentiment towards a particular target from a given text. It is a field that is attracting attention due to the increasing accessibility of the Internet, which leads people to generate an enormous amount of data. Sentiment analysis, which in general requires annotated data for training, is a well-researched area for widely studied languages such as English. For low-resource languages such as Turkish, there is a lack of such annotated data. We present an annotated Turkish dataset suitable for targeted sentiment analysis. We also propose BERT-based models with different architectures to accomplish the task of targeted sentiment analysis. The results demonstrate that the proposed models outperform the traditional sentiment analysis models for the targeted sentiment analysis task.
2022.acl-srw.39
mutlu-ozgur-2022-dataset
+ 10.18653/v1/2022.acl-srw.39
@@ -10449,6 +11188,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.1
lin-etal-2022-dotat
fxlp/marktool
+ 10.18653/v1/2022.acl-demo.1
UKP-SQUARE: An Online Platform for Question Answering Research
@@ -10475,6 +11215,7 @@ in the Case of Unambiguous Gender
MS MARCO
Natural Questions
SQuAD
+ 10.18653/v1/2022.acl-demo.2
ViLMedic: a framework for research at the intersection of vision and language in medical AI
@@ -10494,6 +11235,7 @@ in the Case of Unambiguous Gender
jbdel/vilmedic
PadChest
Visual Question Answering
+ 10.18653/v1/2022.acl-demo.3
TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
@@ -10504,6 +11246,7 @@ in the Case of Unambiguous Gender
Pre-trained language models have become prevalent in natural language processing and serve as the backbone of many NLP tasks, but the demands for computational resources have limited their applications. In this paper, we introduce TextPruner, an open-source model pruning toolkit designed for pre-trained language models, targeting fast and easy model compression. TextPruner offers structured post-training pruning methods, including vocabulary pruning and transformer pruning, and can be applied to various models and tasks. We also propose a self-supervised pruning method that can be applied without labeled data. Our experiments with several NLP tasks demonstrate the ability of TextPruner to reduce the model size without re-training the model.
2022.acl-demo.4
yang-etal-2022-textpruner
+ 10.18653/v1/2022.acl-demo.4
AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark
@@ -10519,6 +11262,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.5
friedrich-etal-2022-annie
nfriedri/annie-annotation-platform
+ 10.18653/v1/2022.acl-demo.5
AdapterHub Playground: Simple and Flexible Few-Shot Learning with Adapters
@@ -10539,6 +11283,7 @@ in the Case of Unambiguous Gender
IMDb Movie Reviews
MRPC
SST
+ 10.18653/v1/2022.acl-demo.6
QiuNiu: A Chinese Lyrics Generation System with Passage-Level Input
@@ -10550,6 +11295,7 @@ in the Case of Unambiguous Gender
Lyrics generation has been a very popular application of natural language generation. Previous works mainly focused on generating lyrics based on a couple of attributes or keywords, rendering very limited control over the content of the lyrics. In this paper, we demonstrate QiuNiu, a Chinese lyrics generation system which is conditioned on passage-level text rather than a few attributes or keywords. By using passage-level text as input, the content of the generated lyrics is expected to reflect the nuances of users’ needs. The QiuNiu system supports various forms of passage-level input, such as short stories, essays, and poetry. Its training is conducted under the framework of unsupervised machine translation, due to the lack of an aligned passage-level text-to-lyrics corpus. We initialize the parameters of QiuNiu with a custom pretrained Chinese GPT-2 model and adopt a two-step process to finetune the model for better alignment between passage-level text and lyrics. Additionally, a post-processing module is used to filter and rerank the generated lyrics to select the ones of highest quality. The demo video of the system is available at https://youtu.be/OCQNzahqWgM.
2022.acl-demo.7
zhang-etal-2022-qiuniu
+ 10.18653/v1/2022.acl-demo.7
Automatic Gloss Dictionary for Sign Language Learners
@@ -10563,6 +11309,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.8
xu-etal-2022-automatic
WLASL
+ 10.18653/v1/2022.acl-demo.8
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
@@ -10599,6 +11346,7 @@ in the Case of Unambiguous Gender
bach-etal-2022-promptsource
bigscience-workshop/promptsource
SNLI
+ 10.18653/v1/2022.acl-demo.9
OpenPrompt: An Open-source Framework for Prompt-learning
@@ -10615,6 +11363,7 @@ in the Case of Unambiguous Gender
ding-etal-2022-openprompt
thunlp/OpenPrompt
GLUE
+ 10.18653/v1/2022.acl-demo.10
Guided K-best Selection for Semantic Parsing Annotation
@@ -10632,6 +11381,7 @@ in the Case of Unambiguous Gender
Collecting data for conversational semantic parsing is a time-consuming and demanding process. In this paper we consider, given an incomplete dataset with only a small amount of data, how to build an AI-powered human-in-the-loop process to enable efficient data collection. A guided K-best selection process is proposed, which (i) generates a set of possible valid candidates; (ii) allows users to quickly traverse the set and filter incorrect parses; and (iii) asks users to select the correct parse, with minimal modification when necessary. We investigate how to best support users in efficiently traversing the candidate set and locating the correct parse, in terms of speed and accuracy. In our user study, consisting of five annotators labeling 300 instances each, we find that combining keyword searching, where keywords can be used to query relevant candidates, and keyword suggestion, where representative keywords are automatically generated, enables fast and accurate annotation.
2022.acl-demo.11
belyy-etal-2022-guided
+ 10.18653/v1/2022.acl-demo.11
Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa
@@ -10643,6 +11393,7 @@ in the Case of Unambiguous Gender
The applied nature of Natural Language Processing (NLP) makes it necessary to select the most effective and robust models. Producing slightly higher performance is insufficient; we want to know whether this advantage will carry over to other data sets. Bootstrapped significance tests can indicate that ability. So while necessary, computing the significance of models’ performance differences has many levels of complexity. It can be tedious, especially when the experimental design has many conditions to compare and several runs of experiments. We present BooStSa, a tool that makes it easy to compute significance levels with the BOOtSTrap SAmpling procedure to evaluate models that predict not only standard hard labels but also soft labels (i.e., probability distributions over different classes).
2022.acl-demo.12
fornaciari-etal-2022-hard
+ 10.18653/v1/2022.acl-demo.12
COVID-19 Claim Radar: A Structured Claim Extraction and Tracking System
@@ -10659,6 +11410,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.13
li-etal-2022-covid
uiucnlp/covid-claim-radar
+ 10.18653/v1/2022.acl-demo.13
TS-ANNO: An Annotation Tool to Build, Annotate and Evaluate Text Simplification Corpora
@@ -10670,6 +11422,7 @@ in the Case of Unambiguous Gender
stodden-kallmeyer-2022-ts
ASSET
ASSET Corpus
+ 10.18653/v1/2022.acl-demo.14
Language Diversity: Visible to Humans, Exploitable by Machines
@@ -10683,6 +11436,7 @@ in the Case of Unambiguous Gender
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over two thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the abstract notion of linguistic diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.
2022.acl-demo.15
bella-etal-2022-language
+ 10.18653/v1/2022.acl-demo.15
CogKGE: A Knowledge Graph Embedding Toolkit and Benchmark for Representing Multi-source and Heterogeneous Knowledge
@@ -10703,6 +11457,7 @@ in the Case of Unambiguous Gender
jinzhuoran/cogkge
ConceptNet
FrameNet
+ 10.18653/v1/2022.acl-demo.16
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks
@@ -10724,6 +11479,7 @@ in the Case of Unambiguous Gender
ANLI
AdversarialQA
GLUE
+ 10.18653/v1/2022.acl-demo.17
DataLab: A Platform for Data Analysis and Intervention
@@ -10741,6 +11497,7 @@ in the Case of Unambiguous Gender
xiao-etal-2022-datalab
BeerAdvocate
SNLI
+ 10.18653/v1/2022.acl-demo.18
Cue-bot: A Conversational Agent for Assistive Technology
@@ -10755,6 +11512,7 @@ in the Case of Unambiguous Gender
Intelligent conversational assistants have become an integral part of our lives for performing simple tasks. However, such agents, for example, Google bots, Alexa and others, are yet to have any social impact on minority populations, for example, people with neurological disorders and people with speech, language and social communication disorders, sometimes with locked-in states where speaking or typing is a challenge. Language model technologies can be very powerful tools in enabling these users to carry out daily communication and social interactions. In this work, we present a system that users with varied levels of disabilities can use to interact with the world, supported by eye-tracking, mouse controls and an intelligent agent, Cue-bot, that can represent the user in a conversation. The agent provides relevant controllable ‘cues’ to generate desirable responses quickly for an ongoing dialog context. In the context of usage of such systems for people with degenerative disorders, we present automatic and human evaluation of our cue/keyword predictor and the controllable dialog system, and show that our models perform significantly better than models without control and can also reduce user effort (fewer keystrokes) and speed up communication (typing time) significantly.
2022.acl-demo.19
h-kumar-etal-2022-cue
+ 10.18653/v1/2022.acl-demo.19
M-SENA: An Integrated Platform for Multimodal Sentiment Analysis
@@ -10772,6 +11530,7 @@ in the Case of Unambiguous Gender
CH-SIMS
CMU-MOSEI
Multimodal Opinionlevel Sentiment Intensity
+ 10.18653/v1/2022.acl-demo.20
HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese
@@ -10788,6 +11547,7 @@ in the Case of Unambiguous Gender
zhang-li-etal-2022-hosmel
thudm/hosmel
CLUE
+ 10.18653/v1/2022.acl-demo.21
BMInf: An Efficient Toolkit for Big Model Inference and Tuning
@@ -10805,6 +11565,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.22
han-etal-2022-bminf
openbmb/bminf
+ 10.18653/v1/2022.acl-demo.22
MMEKG: Multi-modal Event Knowledge Graph towards Universal Representation across Modalities
@@ -10824,6 +11585,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.23
ma-etal-2022-mmekg
FrameNet
+ 10.18653/v1/2022.acl-demo.23
SocioFillmore: A Tool for Discovering Perspectives
@@ -10836,6 +11598,7 @@ in the Case of Unambiguous Gender
SOCIOFILLMORE is a multilingual tool which helps to bring to the fore the focus or the perspective that a text expresses in depicting an event. Our tool, whose rationale we also support through a large collection of human judgements, is theoretically grounded in frame semantics and cognitive linguistics, and implemented using the LOME frame semantic parser. We describe SOCIOFILLMORE’s development and functionalities, show how non-NLP researchers can easily interact with the tool, and present some example case studies which are already incorporated in the system, together with the kind of analysis that can be visualised.
2022.acl-demo.24
minnema-etal-2022-sociofillmore
+ 10.18653/v1/2022.acl-demo.24
TimeLMs: Diachronic Language Models from Twitter
@@ -10850,6 +11613,7 @@ in the Case of Unambiguous Gender
loureiro-etal-2022-timelms
cardiffnlp/timelms
TweetEval
+ 10.18653/v1/2022.acl-demo.25
Adaptor: Objective-Centric Adaptation Framework for Language Models
@@ -10862,6 +11626,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.26
stefanik-etal-2022-adaptor
gaussalgo/adaptor
+ 10.18653/v1/2022.acl-demo.26
QuickGraph: A Rapid Annotation Tool for Knowledge Graph Extraction from Technical Text
@@ -10873,6 +11638,7 @@ in the Case of Unambiguous Gender
2022.acl-demo.27
bikaun-etal-2022-quickgraph
nlp-tlp/quickgraph
+ 10.18653/v1/2022.acl-demo.27
@@ -10905,6 +11671,7 @@ in the Case of Unambiguous Gender
2022.acl-tutorials.1
church-etal-2022-gentle
GLUE
+ 10.18653/v1/2022.acl-tutorials.1
Towards Reproducible Machine Learning Research in Natural Language Processing
@@ -10920,6 +11687,7 @@ in the Case of Unambiguous Gender
While recent progress in the field of ML has been significant, the reproducibility of these cutting-edge results is often lacking, with many submissions omitting the information necessary to ensure subsequent reproducibility. Despite proposals such as the Reproducibility Checklist and reproducibility criteria at several major conferences, the habit of carrying out research with reproducibility in mind has yet to take hold in the broader ML community. We propose this tutorial as a gentle introduction to ensuring reproducible research in ML, with a specific emphasis on computational linguistics and NLP. We also provide a framework for using reproducibility as a teaching tool in university-level computer science programs.
2022.acl-tutorials.2
lucic-etal-2022-towards
+ 10.18653/v1/2022.acl-tutorials.2
Knowledge-Augmented Methods for Natural Language Processing
@@ -10936,6 +11704,7 @@ in the Case of Unambiguous Gender
CommonGen
CommonsenseQA
ConceptNet
+ 10.18653/v1/2022.acl-tutorials.3
Non-Autoregressive Sequence Generation
@@ -10945,6 +11714,7 @@ in the Case of Unambiguous Gender
Non-autoregressive sequence generation (NAR) attempts to generate entire or partial output sequences in parallel to speed up the generation process and avoid potential issues (e.g., label bias, exposure bias) in autoregressive generation. While it has received much research attention and has been applied in many sequence generation tasks in natural language and speech, naive NAR models still face many challenges in closing the performance gap with state-of-the-art autoregressive models because of a lack of modeling power. In this tutorial, we will provide a thorough introduction and review of non-autoregressive sequence generation, in four sections: 1) Background, which covers the motivation of NAR generation, the problem definition, the evaluation protocol, and the comparison with standard autoregressive generation approaches. 2) Method, which includes different aspects: model architecture, objective function, training data, learning paradigm, and additional inference tricks. 3) Application, which covers different tasks in text and speech generation, and some advanced topics in applications. 4) Conclusion, in which we describe several research challenges and discuss the potential future research directions. We hope this tutorial can serve both academic researchers and industry practitioners working on non-autoregressive sequence generation.
2022.acl-tutorials.4
gu-tan-2022-non
+ 10.18653/v1/2022.acl-tutorials.4
Learning with Limited Text Data
@@ -10955,6 +11725,7 @@ in the Case of Unambiguous Gender
Natural Language Processing (NLP) has achieved great progress in the past decade on the basis of neural models, which often make use of large amounts of labeled data to achieve state-of-the-art performance. The dependence on labeled data prevents NLP models from being applied to low-resource settings and languages because of the time, money, and expertise that are often required to label massive amounts of textual data. Consequently, the ability to learn with limited labeled data is crucial for deploying neural systems to real-world NLP applications. Recently, numerous approaches have been explored to alleviate the need for labeled data in NLP, such as data augmentation and semi-supervised learning. This tutorial aims to provide a systematic and up-to-date overview of these methods in order to help researchers and practitioners understand the landscape of approaches and the challenges associated with learning from limited labeled data, an emerging topic in the computational linguistics community. We will consider applications to a wide variety of NLP tasks (including text classification, generation, and structured prediction) and will highlight current challenges and future directions.
2022.acl-tutorials.5
yang-etal-2022-learning
+ 10.18653/v1/2022.acl-tutorials.5
Zero- and Few-Shot NLP with Pretrained Language Models
@@ -10967,6 +11738,7 @@ in the Case of Unambiguous Gender
The ability to efficiently learn from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult. This is a challenging setting both academically and practically, particularly because training neural models typically requires large amounts of labeled data. More recently, advances in pretraining on unlabelled data have raised the potential of better zero-shot or few-shot learning (Devlin et al., 2019; Brown et al., 2020). In particular, over the past year, a great deal of research has been conducted to better learn from limited data using large-scale language models. In this tutorial, we aim at bringing interested NLP researchers up to speed about the recent and ongoing techniques for zero- and few-shot learning with pretrained language models. Additionally, our goal is to reveal new research opportunities to the audience, which will hopefully bring us closer to addressing existing challenges in this domain.
2022.acl-tutorials.6
beltagy-etal-2022-zero
+ 10.18653/v1/2022.acl-tutorials.6
Vision-Language Pretraining: Current Trends and the Future
@@ -10978,6 +11750,7 @@ in the Case of Unambiguous Gender
2022.acl-tutorials.7
agrawal-etal-2022-vision
Visual Question Answering
+ 10.18653/v1/2022.acl-tutorials.7
Natural Language Processing for Multilingual Task-Oriented Dialogue
@@ -10990,6 +11763,7 @@ in the Case of Unambiguous Gender
Recent advances in deep learning have also enabled fast progress in the research of task-oriented dialogue (ToD) systems. However, the majority of ToD systems are developed for English and merely a handful of other widely spoken languages, e.g., Chinese and German. This hugely limits the global reach and, consequently, the transformative socioeconomic potential of such systems. In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current research gaps, challenges and initiatives related to multilingual ToD systems, with a particular focus on their connections to current research and challenges in multilingual and low-resource NLP. The tutorial will aim to provide answers to or shed new light on the following questions: a) Why are multilingual dialogue systems so hard to build: what makes multilinguality for dialogue more challenging than for other NLP applications and tasks? b) What are the best existing methods and datasets for multilingual and cross-lingual (task-oriented) dialog systems? How are (multilingual) ToD systems usually evaluated? c) What are the promising future directions for multilingual ToD research: where can one draw inspiration from related NLP areas and tasks?
2022.acl-tutorials.8
razumovskaia-etal-2022-natural
+ 10.18653/v1/2022.acl-tutorials.8
diff --git a/data/xml/2022.bigscience.xml b/data/xml/2022.bigscience.xml
index 4b10218a6d..ae20e9fc45 100644
--- a/data/xml/2022.bigscience.xml
+++ b/data/xml/2022.bigscience.xml
@@ -33,6 +33,7 @@
jin-etal-2022-lifelong
S2ORC
SciERC
+ 10.18653/v1/2022.bigscience-1.1
Using ASR-Generated Text for Spoken Language Modeling
@@ -47,6 +48,7 @@
This paper aims at improving spoken language modeling (LM) using a very large amount of automatically transcribed speech. We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR on 350,000 hours of diverse TV shows. From this, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or by training an LM from scratch. The new models (FlauBERT-Oral) will be shared with the community and are evaluated not only in terms of word prediction accuracy but also on two downstream tasks: classification of TV shows and syntactic parsing of speech. Experimental results show that FlauBERT-Oral is better than its initial FlauBERT version, demonstrating that, despite its inherent noisy nature, ASR-generated text can be useful to improve spoken language modeling.
2022.bigscience-1.2
herve-etal-2022-using
+ 10.18653/v1/2022.bigscience-1.2
You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings
@@ -71,6 +73,7 @@
2022.bigscience-1.3
talat-etal-2022-reap
CrowS-Pairs
+ 10.18653/v1/2022.bigscience-1.3
Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model
@@ -87,6 +90,7 @@
SQuAD
SST
SuperGLUE
+ 10.18653/v1/2022.bigscience-1.4
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
@@ -107,6 +111,7 @@
MultiRC
SST
e-SNLI
+ 10.18653/v1/2022.bigscience-1.5
Pipelines for Social Bias Testing of Large Language Models
@@ -117,6 +122,7 @@
The maturity level of language models is now at a stage in which many companies rely on them to solve various tasks. However, while research has shown how biased and harmful these models are, systematic ways of integrating social bias tests into development pipelines are still lacking. This short paper suggests how to use these verification techniques in development pipelines. We take inspiration from software testing and suggest treating social bias evaluation as software testing. We hope to open a discussion on the best methodologies to handle social bias testing in language models.
2022.bigscience-1.6
nozza-etal-2022-pipelines
+ 10.18653/v1/2022.bigscience-1.6
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
@@ -131,6 +137,7 @@
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in 3 languages as a test-bed, we use prompts to extract possible named entities. Our results show that a naive approach for prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we also find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
2022.bigscience-1.7
de-toni-etal-2022-entities
+ 10.18653/v1/2022.bigscience-1.7
A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model
@@ -144,6 +151,7 @@
2022.bigscience-1.8
lakim-etal-2022-holistic
CCNet
+ 10.18653/v1/2022.bigscience-1.8
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
@@ -179,6 +187,7 @@
PROST
SuperGLUE
The Pile
+ 10.18653/v1/2022.bigscience-1.9
Dataset Debt in Biomedical Language Modeling
@@ -200,6 +209,7 @@
fries-etal-2022-dataset
BLUE
BLURB
+ 10.18653/v1/2022.bigscience-1.10
Emergent Structures and Training Dynamics in Large Language Models
@@ -214,6 +224,7 @@
Large language models have achieved success on a number of downstream tasks, particularly in few- and zero-shot settings. As a consequence, researchers have been investigating both the kind of information these networks learn and how such information can be encoded in the parameters of the model. We survey the literature on changes in the network during training, drawing from work outside of NLP when necessary, and on learned representations of linguistic features in large language models. We note in particular the lack of sufficient research on the emergence of functional units, subsections of the network where related functions are grouped or organised, within large language models, and motivate future work that grounds the study of language models in an analysis of their changing internal structure during training time.
2022.bigscience-1.11
teehan-etal-2022-emergent
+ 10.18653/v1/2022.bigscience-1.11
Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned
@@ -245,6 +256,7 @@
WSC
WebText
WiC
+ 10.18653/v1/2022.bigscience-1.12
diff --git a/data/xml/2022.bionlp.xml b/data/xml/2022.bionlp.xml
index dd540d0613..482668f3aa 100644
--- a/data/xml/2022.bionlp.xml
+++ b/data/xml/2022.bionlp.xml
@@ -27,6 +27,7 @@
The healthcare domain suffers from the spread of poor-quality articles on the Internet. While manual efforts exist to check the quality of online healthcare articles, they are not sufficient to assess all those in circulation. Such quality assessment can be automated as a text classification task; however, explanations for the labels are necessary for the users to trust the model predictions. While current explainable systems tackle explanation generation as summarization, we propose a new approach based on question answering (QA) that allows us to generate explanations for multiple criteria using a single model. We show that this QA-based approach is competitive with the current state-of-the-art, and complements summarization-based models for explainable quality assessment. We also introduce a human evaluation protocol more appropriate than automatic metrics for the evaluation of explanation generation models.
2022.bionlp-1.1
boissonnet-etal-2022-explainable
+ 10.18653/v1/2022.bionlp-1.1
A sequence-to-sequence approach for document-level relation extraction
@@ -41,6 +42,7 @@
BC5CDR
CDR
DocRED
+ 10.18653/v1/2022.bionlp-1.2
Position-based Prompting for Health Outcome Generation
@@ -52,6 +54,7 @@
Probing factual knowledge in Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. This approach has proven effective, especially when these LMs are fine-tuned not just on the data, but also on the style or linguistic pattern of the prompts themselves. We observe that satisfying a particular linguistic pattern in prompts is an unsustainable, time-consuming constraint in the probing task, especially because prompts are often manually designed and the range of possible prompt template patterns can vary depending on the prompting task. To alleviate this constraint, we propose using a position-attention mechanism to capture positional information of each word in a prompt relative to the mask to be filled, hence avoiding the need to re-construct prompts when the prompts’ linguistic pattern changes. Using our approach, we demonstrate the ability to elicit answers (in a case study on health outcome generation) not only for common prompt templates like Cloze and Prefix, but also for rare ones, such as Postfix and Mixed patterns, whose masks are respectively at the start and in multiple random places of the prompt. Moreover, using various biomedical PLMs, our approach consistently outperforms a baseline in which the default PLM representation is used to predict masked tokens.
2022.bionlp-1.3
abaho-etal-2022-position
+ 10.18653/v1/2022.bionlp-1.3
How You Say It Matters: Measuring the Impact of Verbal Disfluency Tags on Automated Dementia Detection
@@ -63,6 +66,7 @@
2022.bionlp-1.4
farzana-etal-2022-say
ashwindeshpande96/measuring_the_impact_of_verbal_disfluency_tags_on_automated_dementia_detection
+ 10.18653/v1/2022.bionlp-1.4
Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training
@@ -76,6 +80,7 @@
soleimani-etal-2022-zero
asoleimanib/zeroshotaspectbased
FacetSum
+ 10.18653/v1/2022.bionlp-1.5
Data Augmentation for Biomedical Factoid Question Answering
@@ -90,6 +95,7 @@
BIOMRC
BioASQ
SQuAD
+ 10.18653/v1/2022.bionlp-1.6
Slot Filling for Biomedical Information Extraction
@@ -104,6 +110,7 @@
ypapanik/biomedical-slot-filling
KILT
Natural Questions
+ 10.18653/v1/2022.bionlp-1.7
Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations
@@ -116,6 +123,7 @@
zeng-etal-2022-automatic
GanjinZero/CODER
BC5CDR
+ 10.18653/v1/2022.bionlp-1.8
BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model
@@ -138,6 +146,7 @@
MeQSum
MedMentions
Semantic Scholar
+ 10.18653/v1/2022.bionlp-1.9
Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation
@@ -150,6 +159,7 @@
Medical dialogue systems have the potential to assist doctors in expanding access to medical care, improving the quality of patient experiences, and lowering medical expenses. Computational methods are still in their early stages and are not ready for widespread application despite their great potential. Existing transformer-based language models have shown promising results but lack domain-specific knowledge. However, to diagnose as doctors do, an automatic medical diagnosis system must meet more stringent requirements for the rationality of the dialogue in the context of relevant knowledge. In this study, we propose a new method that addresses the challenges of medical dialogue generation by incorporating medical knowledge into transformer-based language models. We present a method that leverages an external medical knowledge graph and injects triples as domain knowledge into the utterances. Automatic and human evaluation on a publicly available dataset demonstrates that incorporating medical knowledge outperforms several state-of-the-art baseline methods.
2022.bionlp-1.10
naseem-etal-2022-incorporating
+ 10.18653/v1/2022.bionlp-1.10
Memory-aligned Knowledge Graph for Clinically Accurate Radiology Image Report Generation
@@ -158,6 +168,7 @@
Automatically generating clinically accurate radiology reports from X-ray images is important but challenging. The identification of multi-grained abnormal regions in images and of the corresponding abnormalities is difficult for data-driven neural models. In this work, we introduce a Memory-aligned Knowledge Graph (MaKG) of clinical abnormalities to better learn the visual patterns of abnormalities and their relationships by integrating it into a deep model architecture for report generation. We carry out extensive experiments and show that the proposed MaKG deep model can improve the clinical accuracy of the generated reports.
2022.bionlp-1.11
yan-2022-memory
+ 10.18653/v1/2022.bionlp-1.11
Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts
@@ -167,6 +178,7 @@
Data augmentation is important in addressing data sparsity and low resources in NLP. Unlike data augmentation for other tasks such as sentence-level and sentence-pair ones, data augmentation for named entity recognition (NER) requires preserving the semantics of entities. To that end, in this paper we propose a simple semantic-based data augmentation method for biomedical NER. Our method leverages semantic information from pre-trained language models at both the entity and sentence levels. Experimental results on two datasets, i2b2-2010 (English) and VietBioNER (Vietnamese), showed that the proposed method could improve NER performance.
2022.bionlp-1.12
phan-nguyen-2022-simple
+ 10.18653/v1/2022.bionlp-1.12
Auxiliary Learning for Named Entity Recognition with Multiple Auxiliary Biomedical Training Data
@@ -181,6 +193,7 @@
2022.bionlp-1.13
watanabe-etal-2022-auxiliary
NCBI Disease
+ 10.18653/v1/2022.bionlp-1.13
SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study
@@ -196,6 +209,7 @@
2022.bionlp-1.14
cahyawijaya-etal-2022-snp2vec
hltchkust/snp2vec
+ 10.18653/v1/2022.bionlp-1.14
Biomedical NER using Novel Schema and Distant Supervision
@@ -207,6 +221,7 @@
Biomedical Named Entity Recognition (BMNER) is one of the most important tasks in the field of biomedical text mining. Most work so far on this task has not focused on identification of discontinuous and overlapping entities, even though they are present in significant fractions in real-life biomedical datasets. In this paper, we introduce a novel annotation schema to capture complex entities, and explore the effects of distant supervision on our deep-learning sequence labelling model. For the BMNER task, our annotation schema outperforms other BIO-based annotation schemes on the same model. We also achieve higher F1-scores than state-of-the-art models on multiple corpora without fine-tuning embeddings, highlighting the efficacy of neural feature extraction using our model.
2022.bionlp-1.15
khandelwal-etal-2022-biomedical
+ 10.18653/v1/2022.bionlp-1.15
Improving Supervised Drug-Protein Relation Extraction with Distantly Supervised Models
@@ -217,6 +232,7 @@
This paper proposes novel drug-protein relation extraction models that indirectly utilize distant supervision data. Concretely, instead of adding distant supervision data to the manually annotated training data, our models incorporate distantly supervised models, i.e., relation extraction models trained with distant supervision data. Distantly supervised learning has been proposed to generate a large amount of pseudo-training data at low cost. However, there is still a problem of low prediction performance due to the inclusion of mislabeled data. Therefore, several methods have been proposed to suppress the effects of noisy cases by utilizing some manually annotated training data. However, their performance is lower than that of supervised learning on manually annotated data because mislabeled data that cannot be fully suppressed becomes noise when training the model. To overcome this issue, our methods indirectly utilize distant supervision data together with manually annotated training data. The experimental results on the DrugProt corpus in BioCreative VII Track 1 showed that our proposed model can consistently improve the supervised models in different settings.
2022.bionlp-1.16
iinuma-etal-2022-improving
+ 10.18653/v1/2022.bionlp-1.16
Named Entity Recognition for Cancer Immunology Research Using Distant Supervision
@@ -227,6 +243,7 @@
Cancer immunology research involves several important cell and protein factors. Extracting information about such cells and proteins, and the interactions between them, from text is crucial in text mining for cancer immunology research. However, there are few available datasets for these entities, and the number of annotated documents is not sufficient compared with other major named entity types. In this work, we introduce our automatically annotated dataset of key named entities, i.e., T-cells, cytokines, and transcription factors, which are engaged in recent cancer immunotherapy. The entities are annotated based on the UniProtKB knowledge base using dictionary matching. We build a neural named entity recognition (NER) model to be trained on this dataset and evaluate it on manually annotated data. Experimental results show that we can achieve a promising NER performance even though our data is automatically annotated. Our dataset also enhances NER performance when combined with existing data, especially gaining improvement on less-investigated named entity types such as cytokines and transcription factors.
2022.bionlp-1.17
trieu-etal-2022-named
+ 10.18653/v1/2022.bionlp-1.17
Intra-Template Entity Compatibility based Slot-Filling for Clinical Trial Information Extraction
@@ -236,6 +253,7 @@
We present a deep learning based information extraction system that can extract the design and results of a published abstract describing a Randomized Controlled Trial (RCT). In contrast to other approaches, our system does not regard the PICO elements as flat objects or labels but as structured objects. We thus model the task as the one of filling a set of templates and slots; our two-step approach recognizes relevant slot candidates as a first step and assigns them to a corresponding template as second step, relying on a learned pairwise scoring function that models the compatibility of the different slot values. We evaluate the approach on a dataset of 211 manually annotated abstracts for type 2 Diabetes and Glaucoma, showing the positive impact of modelling intra-template entity compatibility. As main benefit, our approach yields a structured object for every RCT abstract that supports the aggregation and summarization of clinical trial results across published studies and can facilitate the task of creating a systematic review or meta-analysis.
2022.bionlp-1.18
witte-cimiano-2022-intra
+ 10.18653/v1/2022.bionlp-1.18
Pretrained Biomedical Language Models for Clinical NLP in Spanish
@@ -253,6 +271,7 @@
2022.bionlp-1.19
carrino-etal-2022-pretrained
PlanTL-GOB-ES/lm-biomedical-clinical-es
+ 10.18653/v1/2022.bionlp-1.19
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts
@@ -268,6 +287,7 @@
amin-etal-2022-shot
suamin/t2ner
CoNLL 2002
+ 10.18653/v1/2022.bionlp-1.20
VPAI_Lab at MedVidQA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification
@@ -283,6 +303,7 @@
lireanstar/medvidcl
Kinetics
MedVidQA
+ 10.18653/v1/2022.bionlp-1.21
GenCompareSum: a hybrid unsupervised summarization method using salience
@@ -298,6 +319,7 @@
Pubmed
S2ORC
arXiv
+ 10.18653/v1/2022.bionlp-1.22
BioCite: A Deep Learning-based Citation Linkage Framework for Biomedical Research Articles
@@ -307,6 +329,7 @@
Research papers reflect scientific advances. Citations are widely used in research publications to support the new findings and show their benefits, while also regulating the information flow to make the contents clearer for the audience. A citation in a research article refers to the information’s source, but not the specific text span from that source article. In biomedical research articles, this task is challenging as the same chemical or biological component can be represented in multiple ways in different papers from various domains. This paper suggests a mechanism for linking citing sentences in a publication with cited sentences in referenced sources. The framework presented here pairs the citing sentence with all of the sentences in the reference text, and then tries to retrieve the semantically equivalent pairs. These semantically related sentences from the reference paper are chosen as the cited statements. This effort involves designing a citation linkage framework utilizing sequential and tree-structured siamese deep learning models. This paper also provides a method to create a synthetic corpus for such a task.
2022.bionlp-1.23
singha-roy-mercer-2022-biocite
+ 10.18653/v1/2022.bionlp-1.23
Low Resource Causal Event Detection from Biomedical Literature
@@ -318,6 +341,7 @@
Recognizing causal precedence relations among the chemical interactions in biomedical literature is crucial to understanding the underlying biological mechanisms. However, detecting such causal relations can be hard because: (1) many times, such causal relations among events are not explicitly expressed by certain phrases but only implied by very diverse expressions in the text, and (2) annotating such causal relation detection datasets requires considerable expert knowledge and effort. In this paper, we propose a strategy to address both challenges by training neural models with in-domain pre-training and knowledge distillation. We show that, by using a very limited amount of labeled data and a sufficient amount of unlabeled data, the neural models outperform previous baselines on the causal precedence detection task, and are ten times faster at inference compared to the BERT base model.
2022.bionlp-1.24
liang-etal-2022-low
+ 10.18653/v1/2022.bionlp-1.24
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering
@@ -329,6 +353,7 @@
gupta-demner-fushman-2022-overview
HowTo100M
MedVidQA
+ 10.18653/v1/2022.bionlp-1.25
Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations
@@ -339,6 +364,7 @@
It is commonly claimed that inter-annotator agreement (IAA) is the ceiling of machine learning (ML) performance, i.e., that the agreement between an ML system’s predictions and an annotator cannot be higher than the agreement between two annotators. Although Boguslav &amp; Cohen (2017) showed that this claim is falsified by many real-world ML systems, the claim has persisted. As a complement to this real-world evidence, we conducted a comprehensive set of simulations, and show that an ML model can beat IAA even if (and especially if) annotators are noisy and differ in their underlying classification functions, as long as the ML model is reasonably well-specified. Although the latter condition has long been elusive, leading ML models to underperform IAA, we anticipate that this condition will be increasingly met in the era of big data and deep learning. Our work has implications for (1) maximizing the value of machine learning, (2) adherence to ethical standards in computing, and (3) economical use of annotated resources, which is paramount in settings where annotation is especially expensive, like biomedical natural language processing.
2022.bionlp-1.26
richie-etal-2022-inter
+ 10.18653/v1/2022.bionlp-1.26
Conversational Bots for Psychotherapy: A Study of Generative Transformer Models Using Domain-specific Dialogues
@@ -356,6 +382,7 @@
2022.bionlp-1.27
das-etal-2022-conversational
WebText
+ 10.18653/v1/2022.bionlp-1.27
BEEDS: Large-Scale Biomedical Event Extraction using Distant Supervision and Question Answering
@@ -367,6 +394,7 @@
2022.bionlp-1.28
wang-etal-2022-beeds
wangxii/beeds
+ 10.18653/v1/2022.bionlp-1.28
Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection
@@ -376,6 +404,7 @@
We study the problem of entity detection and normalization applied to patient self-reports of symptoms that arise as side-effects of vaccines. Our application domain presents unique challenges that render traditional classification methods ineffective: the number of entity types is large; and many symptoms are rare, resulting in a long-tail distribution of training examples per entity type. We tackle these challenges with an autoregressive model that generates standardized names of symptoms. We introduce a data augmentation technique to increase the number of training examples for rare symptoms. Experiments on real-life patient vaccine symptom self-reports show that our approach outperforms strong baselines, and that additional examples improve performance on the long-tail entities.
2022.bionlp-1.29
kim-nakashole-2022-data
+ 10.18653/v1/2022.bionlp-1.29
Improving Romanian BioNER Using a Biologically Inspired System
@@ -385,6 +414,7 @@
Recognition of named entities present in text is an important step towards information extraction and natural language understanding. This work presents a named entity recognition system for the Romanian biomedical domain. The system makes use of a new and extended version of the SiMoNERo corpus, which is open sourced. The best system is also available for direct use in the RELATE platform.
2022.bionlp-1.30
mitrofan-pais-2022-improving
+ 10.18653/v1/2022.bionlp-1.30
BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali)
@@ -394,6 +424,7 @@
2022.bionlp-1.31
sazzed-2022-banglabiomed
CoWeSe
+ 10.18653/v1/2022.bionlp-1.31
ICDBigBird: A Contextual Embedding Model for ICD Code Classification
@@ -406,6 +437,7 @@
The International Classification of Diseases (ICD) system is the international standard for classifying diseases and procedures during a healthcare encounter and is widely used for healthcare reporting and management purposes. Assigning correct codes for clinical procedures is important for clinical, operational and financial decision-making in healthcare. Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. However, these models have yet to achieve state-of-the-art results in the ICD classification task since one of their main disadvantages is that they can only process documents that contain a small number of tokens, which is rarely the case with real patient notes. In this paper, we introduce ICDBigBird, a BigBird-based model that integrates a Graph Convolutional Network (GCN), which takes advantage of the relations between ICD codes to create ‘enriched’ representations of their embeddings, with a BigBird contextual model that can process larger documents. Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task as it outperforms the previous state-of-the-art models.
2022.bionlp-1.32
michalopoulos-etal-2022-icdbigbird
+ 10.18653/v1/2022.bionlp-1.32
Doctor XAvIer: Explainable Diagnosis on Physician-Patient Dialogues and XAI Evaluation
@@ -416,6 +448,7 @@
2022.bionlp-1.33
ngai-rudzicz-2022-doctor
hillary-ngai/doctor_xavier
+ 10.18653/v1/2022.bionlp-1.33
DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature
@@ -425,6 +458,7 @@
PICO recognition is an information extraction task for identifying participant, intervention, comparator, and outcome information from clinical literature. Manually identifying PICO information is the most time-consuming step in conducting systematic reviews (SR), which is already a labor-intensive process. A lack of large, diversified annotated corpora restricts innovation and adoption of automated PICO recognition systems. The largest available PICO entity/span corpus is manually annotated, which is too expensive for a majority of the scientific community. To break through the bottleneck, we propose DISTANT-CTO, a novel distantly supervised PICO entity extraction approach using the clinical trials literature, to generate a massive weakly-labeled dataset with more than a million ‘Intervention’ and ‘Comparator’ entity annotations. We train distant NER (named-entity recognition) models using this weakly-labeled dataset and demonstrate that it outperforms even the sophisticated models trained on the manually annotated dataset, with a 2% F1 improvement over the Intervention entity of the PICO benchmark and more than 5% improvement when combined with the manually annotated dataset. We investigate the generalizability of our approach and achieve an impressive F1 score on another domain-specific PICO benchmark. The approach is not only zero-cost but also scalable for a constant stream of PICO entity annotations.
2022.bionlp-1.34
dhrangadhariya-muller-2022-distant
+ 10.18653/v1/2022.bionlp-1.34
EchoGen: Generating Conclusions from Echocardiogram Notes
@@ -440,6 +474,7 @@
2022.bionlp-1.35
tang-etal-2022-echogen
MIMIC-III
+ 10.18653/v1/2022.bionlp-1.35
Quantifying Clinical Outcome Measures in Patients with Epilepsy Using the Electronic Health Record
@@ -451,6 +486,7 @@
A wealth of important clinical information lies untouched in the Electronic Health Record, often in the form of unstructured textual documents. For patients with Epilepsy, such information includes outcome measures like Seizure Frequency and Dates of Last Seizure, key parameters that guide all therapy for these patients. Transformer models have been able to extract such outcome measures from unstructured clinical note text as sentences with human-like accuracy; however, these sentences are not yet usable in a quantitative analysis for large-scale studies. In this study, we developed a pipeline to quantify these outcome measures. We used text summarization models to convert unstructured sentences into specific formats, and then employed rules-based quantifiers to calculate seizure frequencies and dates of last seizure. We demonstrated that our pipeline of models does not excessively propagate errors and we analyzed its mistakes. We anticipate that our methods can be generalized outside of epilepsy to other disorders to drive large-scale clinical research.
2022.bionlp-1.36
xie-etal-2022-quantifying
+ 10.18653/v1/2022.bionlp-1.36
Comparing Encoder-Only and Encoder-Decoder Transformers for Relation Extraction from Biomedical Texts: An Empirical Study on Ten Benchmark Datasets
@@ -462,6 +498,7 @@
2022.bionlp-1.37
sarrouti-etal-2022-comparing
DDI
+ 10.18653/v1/2022.bionlp-1.37
Utility Preservation of Clinical Text After De-Identification
@@ -471,6 +508,7 @@
Electronic health records contain valuable information about symptoms, diagnosis, treatment and outcomes of the treatments of individual patients. However, the records may also contain information that can reveal the identity of the patients. Removing these identifiers - the Protected Health Information (PHI) - can protect the identity of the patient. Automatic de-identification is a process which employs machine learning techniques to detect and remove PHI. However, automatic techniques are imperfect in their precision and introduce noise into the data. This study examines the impact of this noise on the utility of Swedish de-identified clinical data by using human evaluators and by training and testing BERT models. Our results indicate that de-identification does not harm the utility for clinical NLP and that human evaluators are less sensitive to noise from de-identification than expected.
2022.bionlp-1.38
vakili-dalianis-2022-utility
+ 10.18653/v1/2022.bionlp-1.38
Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding
@@ -483,6 +521,7 @@
2022.bionlp-1.39
falis-etal-2022-horses
MIMIC-III
+ 10.18653/v1/2022.bionlp-1.39
Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models
@@ -495,6 +534,7 @@
2022.bionlp-1.40
chandak-etal-2022-towards
vt-nlp/sciarg
+ 10.18653/v1/2022.bionlp-1.40
Model Distillation for Faithful Explanations of Medical Code Predictions
@@ -505,6 +545,7 @@
Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. Work in explainable AI must balance competing objectives along two different axes: 1) Models should ideally be both accurate and simple. 2) Explanations must balance faithfulness to the model’s decision-making with their plausibility to a domain expert. We propose to use knowledge distillation, or training a student model that mimics the behavior of a trained teacher model, as a technique to generate faithful and plausible explanations. We evaluate our approach on the task of assigning ICD codes to clinical notes to demonstrate that the student model is faithful to the teacher model’s behavior and produces quality natural language explanations.
2022.bionlp-1.41
wood-doughty-etal-2022-model
+ 10.18653/v1/2022.bionlp-1.41
Towards Generalizable Methods for Automating Risk Score Calculation
@@ -521,6 +562,7 @@
liang-etal-2022-towards
MIMIC-III
emrQA
+ 10.18653/v1/2022.bionlp-1.42
DoSSIER at MedVidQA 2022: Text-based Approaches to Medical Video Answer Localization Problem
@@ -534,6 +576,7 @@
2022.bionlp-1.43
kusa-etal-2022-dossier
MedVidQA
+ 10.18653/v1/2022.bionlp-1.43
diff --git a/data/xml/2022.cmcl.xml b/data/xml/2022.cmcl.xml
index ae2be7a3ad..b5d65c0ebc 100644
--- a/data/xml/2022.cmcl.xml
+++ b/data/xml/2022.cmcl.xml
@@ -31,6 +31,7 @@
DannyMerkx/speech2image
COCO
ImageNet
+ 10.18653/v1/2022.cmcl-1.1
A Neural Model for Compositional Word Embeddings and Sentence Processing
@@ -40,6 +41,7 @@
We propose a new neural model for word embeddings, which uses Unitary Matrices as the primary device for encoding lexical information. It uses simple matrix multiplication to derive matrices for large units, yielding a sentence processing model that is strictly compositional, does not lose information over time steps, and is transparent, in the sense that word embeddings can be analysed regardless of context. This model does not employ activation functions, and so the network is fully accessible to analysis by the methods of linear algebra at each point in its operation on an input sequence. We test it on two NLP agreement tasks and obtain rule-like perfect accuracy, with greater stability than current state-of-the-art systems. Our proposed model goes some way towards offering a class of computationally powerful deep learning systems that can be fully understood and compared to human cognitive processes for natural language learning and representation.
2022.cmcl-1.2
lappin-bernardy-2022-neural
+ 10.18653/v1/2022.cmcl-1.2
Visually Grounded Interpretation of Noun-Noun Compounds in English
@@ -52,6 +54,7 @@
2022.cmcl-1.3
lang-etal-2022-visually
ImageNet
+ 10.18653/v1/2022.cmcl-1.3
Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via CLIP
@@ -63,6 +66,7 @@
2022.cmcl-1.4
takmaz-etal-2022-less
ecekt/clip-desc-disc
+ 10.18653/v1/2022.cmcl-1.4
Codenames as a Game of Co-occurrence Counting
@@ -75,6 +79,7 @@
2022.cmcl-1.5
cserhati-etal-2022-codenames
xerevity/codenamesagent
+ 10.18653/v1/2022.cmcl-1.5
Estimating word co-occurrence probabilities from pretrained static embeddings using a log-bilinear model
@@ -83,6 +88,7 @@
We investigate how to use pretrained static word embeddings to deliver improved estimates of bilexical co-occurrence probabilities: conditional probabilities of one word given a single other word in a specific relationship. Such probabilities play important roles in psycholinguistics, corpus linguistics, and usage-based cognitive modeling of language more generally. We propose a log-bilinear model taking pretrained vector representations of the two words as input, enabling generalization based on the distributional information contained in both vectors. We show that this model outperforms baselines in estimating probabilities of adjectives given nouns that they attributively modify, and probabilities of nominal direct objects given their head verbs, given limited training data in Arabic, English, Korean, and Spanish.
2022.cmcl-1.6
futrell-2022-estimating
+ 10.18653/v1/2022.cmcl-1.6
Modeling the Relationship between Input Distributions and Learning Trajectories with the Tolerance Principle
@@ -91,6 +97,7 @@
Child language learners develop with remarkable uniformity, both in their learning trajectories and ultimate outcomes, despite major differences in their learning environments. In this paper, we explore the role that the frequencies and distributions of irregular lexical items in the input play in driving learning trajectories. We conclude that while the Tolerance Principle, a type-based model of productivity learning, accounts for inter-learner uniformity, it also interacts with input distributions to drive cross-linguistic variation in learning trajectories.
2022.cmcl-1.7
kodner-2022-modeling
+ 10.18653/v1/2022.cmcl-1.7
Predicting scalar diversity with context-driven uncertainty over alternatives
@@ -101,6 +108,7 @@
Scalar implicature (SI) arises when a speaker uses an expression (e.g., “some”) that is semantically compatible with a logically stronger alternative on the same scale (e.g., “all”), leading the listener to infer that they did not intend to convey the stronger meaning. Prior work has demonstrated that SI rates are highly variable across scales, raising the question of what factors determine the SI strength for a particular scale. Here, we test the hypothesis that SI rates depend on the listener’s confidence in the underlying scale, which we operationalize as uncertainty over the distribution of possible alternatives conditioned on the context. We use a T5 model fine-tuned on a text infilling task to estimate this distribution. We find that scale uncertainty predicts human SI rates, measured as entropy over the sampled alternatives and over latent classes among alternatives in sentence embedding space. Furthermore, we do not find a significant effect of the surprisal of the strong scalemate. Our results suggest that pragmatic inferences depend on listeners’ context-driven uncertainty over alternatives.
2022.cmcl-1.8
hu-etal-2022-predicting
+ 10.18653/v1/2022.cmcl-1.8
Eye Gaze and Self-attention: How Humans and Transformers Attend Words in Sentences
@@ -119,6 +127,7 @@
GLUE
MovieQA
SuperGLUE
+ 10.18653/v1/2022.cmcl-1.9
About Time: Do Transformers Learn Temporal Verbal Aspect?
@@ -130,6 +139,7 @@
2022.cmcl-1.10
metheniti-etal-2022-time
lenakmeth/telicity_classification
+ 10.18653/v1/2022.cmcl-1.10
Poirot at CMCL 2022 Shared Task: Zero Shot Crosslingual Eye-Tracking Data Prediction using Multilingual Transformer Models
@@ -138,6 +148,7 @@
Eye-tracking data during reading is a useful source of information for understanding the cognitive processes that take place during language comprehension. Different languages account for different cognitive triggers; however, there seem to be some uniform indicators across languages. In this paper, we describe our submission to the CMCL 2022 shared task on predicting human reading patterns for a multilingual dataset. Our model uses text representations from transformers and some hand-engineered features, with a regression layer on top, to predict the mean and standard deviation of two main eye-tracking features. We train an end-to-end model to extract meaningful information from different languages and test our model on two separate datasets. We compare different transformer models and show ablation studies affecting model performance. Our final submission ranked 4th for SubTask-1 and 1st for SubTask-2 in the shared task.
2022.cmcl-1.11
srivastava-2022-poirot
+ 10.18653/v1/2022.cmcl-1.11
NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space
@@ -147,6 +158,7 @@
2022.cmcl-1.12
imperial-2022-nu
imperialite/cmcl2022-unified-eye-tracking-ipa
+ 10.18653/v1/2022.cmcl-1.12
HkAmsters at CMCL 2022 Shared Task: Predicting Eye-Tracking Data from a Gradient Boosting Framework with Linguistic Features
@@ -157,6 +169,7 @@
Eye movement data are used in psycholinguistic studies to infer information regarding cognitive processes during reading. In this paper, we describe our proposed method for the Shared Task of Cognitive Modeling and Computational Linguistics (CMCL) 2022 - Subtask 1, which involves data from multiple datasets in 6 languages. We compared different regression models, using features of the target word and its previous word, together with target word surprisal, as regression features. Our final system, using a gradient boosting regressor, achieved the lowest mean absolute error (MAE), making it the best system in the competition.
2022.cmcl-1.13
salicchi-etal-2022-hkamsters
+ 10.18653/v1/2022.cmcl-1.13
CMCL 2022 Shared Task on Multilingual and Crosslingual Prediction of Human Reading Behavior
@@ -170,6 +183,7 @@
We present the second shared task on eye-tracking data prediction of the Cognitive Modeling and Computational Linguistics Workshop (CMCL). Differently from the previous edition, participating teams are asked to predict eye-tracking features for multiple languages, including a surprise language for which there were no available training data. Moreover, the task also included the prediction of standard deviations of feature values in order to account for individual differences between readers. A total of six teams registered for the task. For the first subtask on multilingual prediction, the winning team proposed a regression model based on lexical features, while for the second subtask on cross-lingual prediction, the winning team used a hybrid model based on multilingual transformer embeddings as well as statistical features.
2022.cmcl-1.14
hollenstein-etal-2022-cmcl
+ 10.18653/v1/2022.cmcl-1.14
Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models
@@ -180,6 +194,7 @@
Eye-tracking data is a very useful source of information to study cognition and especially language comprehension in humans. In this paper, we describe our systems for the CMCL 2022 shared task on predicting eye-tracking information. We describe our experiments with pretrained models like BERT and XLM and the different ways in which we used those representations to predict four eye-tracking features. Along with analysing the effect of using two different kinds of pretrained multilingual language models and different ways of pooling the token-level representations, we also explore how contextual information affects the performance of the systems. Finally, we also explore whether factors like augmenting linguistic information affect the predictions. Our submissions achieved an average MAE of 5.72 and ranked 5th in the shared task. The average MAE showed a further reduction to 5.25 in the post-task evaluation.
2022.cmcl-1.15
bhattacharya-etal-2022-team
+ 10.18653/v1/2022.cmcl-1.15
Team DMG at CMCL 2022 Shared Task: Transformer Adapters for the Multi- and Cross-Lingual Prediction of Human Reading Behavior
@@ -188,6 +203,7 @@
In this paper, we present the details of our approaches that attained second place in the shared task of the ACL 2022 Cognitive Modeling and Computational Linguistics Workshop. The shared task is focused on multi- and cross-lingual prediction of eye movement features in human reading behavior, which could provide valuable information regarding language processing. To this end, we train ‘adapters’ inserted into the layers of frozen transformer-based pretrained language models. We find that multilingual models equipped with adapters perform well in predicting eye-tracking features. Our results suggest that utilizing language- and task-specific adapters is beneficial and translating test sets into similar languages that exist in the training set could help with zero-shot transferability in the prediction of human reading behavior.
2022.cmcl-1.16
takmaz-2022-team
+ 10.18653/v1/2022.cmcl-1.16
diff --git a/data/xml/2022.computel.xml b/data/xml/2022.computel.xml
index e198a7aca7..6b262d154e 100644
--- a/data/xml/2022.computel.xml
+++ b/data/xml/2022.computel.xml
@@ -31,6 +31,7 @@
In this paper we present the speech corpus for the Siberian Ingrian Finnish language. The speech corpus includes audio data, annotations, software tools for data processing, two databases and a web application. We have published part of the audio data and annotations. The software tool for parsing annotation files and populating a relational database has been developed and published under a free license, and a web application has been developed and made available. At the moment, about 300 words and 200 phrases can be displayed using this web application.
2022.computel-1.1
ubaleht-raudalainen-2022-development
+ 10.18653/v1/2022.computel-1.1
New syntactic insights for automated Wolof Universal Dependency parsing
@@ -39,6 +40,7 @@
A focus on language-specific properties, with insights from formal minimalist syntax, can improve universal dependency (UD) parsing. Such improvements are especially sensitive for low-resource African languages, like Wolof, which have fewer UD treebanks in number and amount of annotations, and fewer contributing annotators. For two different UD parser pipelines, one parser model was trained on the original Wolof treebank, and one was trained on an edited treebank. For each parser pipeline, the accuracy of the edited treebank was higher than that of the original for both the dependency relations and dependency labels. Accuracy for universal dependency relations improved by as much as 2.90%, while accuracy for universal dependency labels increased by as much as 3.38%. An annotation scheme that better fits a language’s distinct syntax results in better parsing accuracy.
2022.computel-1.2
dyer-2022-new
+ 10.18653/v1/2022.computel-1.2
Corpus Development of Kiswahili Speech Recognition Test and Evaluation sets, Preemptively Mitigating Demographic Bias Through Collaboration with Linguists
@@ -53,6 +55,7 @@
Language technologies, particularly speech technologies, are becoming more pervasive for access to digital platforms and resources. This brings to the forefront concerns of their inclusivity, first in terms of language diversity. Additionally, research shows speech recognition to be more accurate for men than for women and more accurate for individuals younger than 30 years of age than those older. In the Global South, where languages are low-resource, these same issues should be taken into consideration in data collection efforts so as not to replicate these mistakes. It is also important to note that in varying contexts within the Global South, this work presents additional nuance and potential for bias based on accents, related dialects and variants of a language. This paper documents i) the design and execution of a Linguists Engagement for the purpose of building an inclusive Kiswahili Speech Recognition dataset, representative of the diversity among speakers of the language, ii) the unexpected yet key learnings in terms of socio-linguistics, which demonstrate the importance of multi-disciplinarity in teams developing datasets and NLP technologies, and iii) the creation of a test dataset intended to be used for evaluating the performance of Speech Recognition models on demographic groups that are likely to be underrepresented.
2022.computel-1.3
siminyu-etal-2022-corpus
+ 10.18653/v1/2022.computel-1.3
CLD² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages
@@ -63,6 +66,7 @@
Language revitalisation should not be understood as a direct outcome of language documentation, which is mainly focused on the creation of language repositories. Natural language processing (NLP) offers the potential to complement and exploit these repositories through the development of language technologies that may contribute to improving the vitality status of endangered languages. In this paper, we discuss the current state of the interaction between language documentation and computational linguistics, present a diagnosis of how the outputs of recent documentation projects for endangered languages are underutilised by the NLP community, and discuss how the situation could change from both the documentary linguistics and NLP perspectives. All this is introduced as a bridging paradigm dubbed Computational Language Documentation and Development (CLD²). CLD² calls for (1) the inclusion of NLP-friendly annotated data as a deliverable of future language documentation projects; and (2) the exploitation of language documentation databases by the NLP community to promote the computerization of endangered languages, as one way to contribute to their revitalization.
2022.computel-1.4
zariquiey-etal-2022-cld2
+ 10.18653/v1/2022.computel-1.4
One Wug, Two Wug+s Transformer Inflection Models Hallucinate Affixes
@@ -72,6 +76,7 @@
Data augmentation strategies are increasingly important in NLP pipelines for low-resourced and endangered languages, and in neural morphological inflection, augmentation by so called data hallucination is a popular technique. This paper presents a detailed analysis of inflection models trained with and without data hallucination for the low-resourced Canadian Indigenous language Gitksan. Our analysis reveals evidence for a concatenative inductive bias in augmented models—in contrast to models trained without hallucination, they strongly prefer affixing inflection patterns over suppletive ones. We find that preference for affixation in general improves inflection performance in “wug test” like settings, where the model is asked to inflect lexemes missing from the training set. However, data hallucination dramatically reduces prediction accuracy for reduplicative forms due to a misanalysis of reduplication as affixation. While the overall impact of data hallucination for unseen lexemes remains positive, our findings call for greater qualitative analysis and more varied evaluation conditions in testing automatic inflection systems. Our results indicate that further innovations in data augmentation for computational morphology are desirable.
2022.computel-1.5
samir-silfverberg-2022-one
+ 10.18653/v1/2022.computel-1.5
Automated speech tools for helping communities process restricted-access corpora for language revival efforts
@@ -88,6 +93,7 @@
Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely-used language such as English for meta-linguistic commentary and questions (e.g. What is the word for ‘tree’?). We integrate voice activity detection (VAD), spoken language identification (SLI), and automatic speech recognition (ASR) to transcribe the metalinguistic content, which an authorised person can quickly scan to triage recordings that can be annotated by people with lower levels of access. We report work-in-progress processing 136 hours of archival audio containing a mix of English and Muruwari. Our collaborative work with the Muruwari custodian of the archival materials shows that this workflow reduces metalanguage transcription time by 20% even given only minimal amounts of annotated training data: 10 utterances per language for SLI and, for ASR, at most 39 minutes, and possibly as little as 39 seconds.
2022.computel-1.6
san-etal-2022-automated
+ 10.18653/v1/2022.computel-1.6
G_i2P_i Rule-based, index-preserving grapheme-to-phoneme transformations
@@ -105,6 +111,7 @@
This paper describes the motivation and implementation details for a rule-based, index-preserving grapheme-to-phoneme engine ‘G_i2P_i' implemented in pure Python and released under the open source MIT license. The engine and interface have been designed to prioritize the developer experience of potential contributors without requiring a high level of programming knowledge. ‘G_i2P_i' already provides mappings for 30 (mostly Indigenous) languages, and the package is accompanied by a web-based interactive development environment, a RESTful API, and extensive documentation to encourage the addition of more mappings in the future. We also present three downstream applications of ‘G_i2P_i' and show results of a preliminary evaluation.
2022.computel-1.7
pine-etal-2022-gi22pi
+ 10.18653/v1/2022.computel-1.7
Shallow Parsing for Nepal Bhasa Complement Clauses
@@ -115,6 +122,7 @@
Accelerating the process of data collection, annotation, and analysis is an urgent need for linguistic fieldwork and documentation of endangered languages (Bird, 2009). Our experiments describe how we maximize the quality of the Nepal Bhasa syntactic complement structure chunking model. Native speaker language consultants were trained to annotate a minimally selected raw data set (Suárez et al., 2019). The embedded clauses, matrix verbs, and embedded verbs are annotated. We apply both statistical training algorithms and transfer learning in our training, including Naive Bayes, MaxEnt, and fine-tuning the pre-trained mBERT model (Devlin et al., 2018). We show that with limited annotated data, the model is already sufficient for the task. The modeling resources we used are largely available for many other endangered languages. The practice is easy to duplicate for training a shallow parser for other endangered languages in general.
2022.computel-1.8
zhang-etal-2022-shallow
+ 10.18653/v1/2022.computel-1.8
Using LARA to create image-based and phonetically annotated multimodal texts for endangered languages
@@ -131,6 +139,7 @@
We describe recent extensions to the open source Learning And Reading Assistant (LARA) supporting image-based and phonetically annotated texts. We motivate the utility of these extensions both in general and specifically in relation to endangered and archaic languages, and illustrate with examples from the revived Australian language Barngarla, Icelandic Sign Language, Irish Gaelic, Old Norse manuscripts and Egyptian hieroglyphics.
2022.computel-1.9
bedi-etal-2022-using
+ 10.18653/v1/2022.computel-1.9
Recovering Text from Endangered Languages Corrupted PDF documents
@@ -139,6 +148,7 @@
In this paper we present an approach to efficiently recover texts from corrupted documents of endangered languages. Textual resources for such languages are scarce, and sometimes the few available resources are corrupted PDF documents. Endangered languages are not supported by standard tools and present the additional difficulty of not possessing any corpus over which to train language models to assist with the recovery. The approach presented is able to fully recover born-digital PDF documents with minimal effort, thereby helping the preservation effort of endangered languages by extending the range of documents usable for corpus building.
2022.computel-1.10
stefanovitch-2022-recovering
+ 10.18653/v1/2022.computel-1.10
Learning Through Transcription
@@ -148,6 +158,7 @@
Transcribing speech for primarily oral, local languages is often a joint effort involving speakers and outsiders. It is commonly motivated by externally-defined scientific goals, alongside local motivations such as language acquisition and access to heritage materials. We explore the task of ‘learning through transcription’ through the design of a system for collaborative speech annotation. We have developed a prototype to support local and remote learner-speaker interactions in remote Aboriginal communities in northern Australia. We show that situated systems design for inclusive non-expert practice is a promising new direction for working with speakers of local languages.
2022.computel-1.11
bettinson-bird-2022-learning
+ 10.18653/v1/2022.computel-1.11
Developing a Part-Of-Speech tagger for te reo Māori
@@ -160,6 +171,7 @@
This paper discusses the development of a Part-of-Speech tagger for te reo Māori, which is the Indigenous language of Aotearoa, also known as New Zealand (see Morrison). Henceforth, Part-of-Speech will be referred to as POS throughout this paper, te reo Māori will be referred to as Māori, and Universal Dependencies will be referred to as UD. Prior to the development of this tagger, there was no POS tagger for Māori from Aotearoa. POS taggers tag words according to their syntactic or grammatical category. However, many traditional syntactic categories, and by consequence POS labels, do not “work for” Māori. By this we mean that, for some of the traditional categories, the definition of, or guidelines for, an existing category is not suitable for Māori; they do not have an existing category for certain word classes of Māori; and they do not reflect a Māori worldview of the Māori language. We wanted a tagset that is usable with industry-wide tools, but we also needed a tagset that would meet the needs of Māori. Therefore, we based our tagset and guidelines on the UD tagset and tagging conventions; however, the categorization of words has been significantly altered to be appropriate for Māori. This is because at the time of development of our POS tagger, the UD conventions had still not been used to tag a Polynesian language such as Māori, nor did they provide any guidelines about how to tag one. To that end, we worked with highly-proficient, specially-selected Māori speakers and linguists who are specialists in Māori. This has ensured that our POS labels and guideline conventions faithfully reflect a Māori speaker’s conceptualization of their language.
2022.computel-1.12
finn-etal-2022-developing
+ 10.18653/v1/2022.computel-1.12
Challenges and Perspectives for Innu-Aimun within Indigenous Language Technologies
@@ -171,6 +183,7 @@
Innu-Aimun is an Algonquian language spoken in Eastern Canada. It is the language of the Innu, an Indigenous people that now lives for the most part in a dozen communities across Quebec and Labrador. Although it is alive, Innu-Aimun sees important preservation and revitalization challenges and issues. The state of its technology is still nascent, with very few existing applications. This paper proposes a first survey of the available linguistic resources and existing technology for Innu-Aimun. Considering the existing linguistic and textual resources, we argue that developing language technology is feasible and propose first steps towards NLP applications like machine translation. The goal of developing such technologies is first and foremost to help efforts in improving language transmission and cultural safety and preservation for Innu-Aimun speakers, as those are considered urgent and vital issues. Finally, we discuss the importance of close collaboration and consultation with the Innu community in order to ensure that language technologies are developed respectfully and in accordance with that goal.
2022.computel-1.13
cadotte-etal-2022-challenges
+ 10.18653/v1/2022.computel-1.13
Using Speech and NLP Resources to build an iCALL platform for a minority language, the story of An Scéalaí, the Irish experience to date
@@ -184,6 +197,7 @@
This paper describes how emerging linguistic resources and technologies can be used to build a language learning platform for Irish, an endangered language. This platform, An Scéalaí, harvests learner corpora - a vital resource both to study the stages of learners’ language acquisition and to guide future platform development. A technical description of the platform is provided, including details of how different speech technologies and linguistic resources are fused to provide a holistic learner experience. The active continuous participation of the community, and platform evaluations by learners and teachers, are discussed.
2022.computel-1.14
ni-chiarain-etal-2022-using
+ 10.18653/v1/2022.computel-1.14
Closing the NLP Gap: Documentary Linguistics and NLP Need a Shared Software Infrastructure
@@ -192,6 +206,7 @@
For decades, researchers in natural language processing and computational linguistics have been developing models and algorithms that aim to serve the needs of language documentation projects. However, these models have seen little use in language documentation despite their great potential for making documentary linguistic artefacts better and easier to produce. In this work, we argue that a major reason for this NLP gap is the lack of a strong foundation of application software which can on the one hand serve the complex needs of language documentation and on the other hand provide effortless integration with NLP models. We further present and describe a work-in-progress system we have developed to serve this need, Glam.
2022.computel-1.15
gessler-2022-closing
+ 10.18653/v1/2022.computel-1.15
Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation?
@@ -203,6 +218,7 @@
2022.computel-1.16
gongora-etal-2022-use
sgongora27/Guarani-embeddings-for-MT
+ 10.18653/v1/2022.computel-1.16
Faoi Gheasa an adaptive game for Irish language learning
@@ -213,6 +229,7 @@
In this paper, we present a game with a purpose (GWAP) (Von Ahn 2006). The aim of the game is to promote language learning and ‘noticing’ (Skehan, 2013). The game has been designed for Irish, but the framework could be used for other languages. Irish is a minority language which means that L2 learners have limited opportunities for exposure to the language, and additionally, there are also limited (digital) learning resources available. This research incorporates game development, language pedagogy and ICALL language materials development. This paper will focus on the language materials development as this is a bottleneck in the teaching and learning of minority and endangered languages.
2022.computel-1.17
xu-etal-2022-faoi
+ 10.18653/v1/2022.computel-1.17
Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages
@@ -224,6 +241,7 @@
Many endangered Uralic languages have multilingual machine readable dictionaries saved in an XML format. However, the dictionaries cover translations very inconsistently between language pairs, for instance, the Livonian dictionary has some translations to Finnish, Latvian and Estonian, and the Komi-Zyrian dictionary has some translations to Finnish, English and Russian. We utilize graph-based approaches to augment such dictionaries by predicting new translations to existing and new languages based on different dictionaries for endangered languages and Wiktionaries. Our study focuses on the lexical resources for Komi-Zyrian (kpv), Erzya (myv) and Livonian (liv). We evaluate our approach by human judges fluent in the three endangered languages in question. Based on the evaluation, the method predicted good or acceptable translations 77% of the time. Furthermore, we train a neural prediction model to predict the quality of the automatically predicted translations with an 81% accuracy. The resulting extensions to the dictionaries are made available on the online dictionary platform used by the speakers of these languages.
2022.computel-1.18
alnajjar-etal-2022-using
+ 10.18653/v1/2022.computel-1.18
Reusing a Multi-lingual Setup to Bootstrap a Grammar Checker for a Very Low Resource Language without Data
@@ -234,6 +252,7 @@
Grammar checkers (GEC) are needed for digital language survival. Very low resource languages like Lule Sámi with less than 3,000 speakers need to hurry to build these tools, but do not have the big corpus data that are required for the construction of machine learning tools. We present a rule-based tool and a workflow where the work done for a related language can speed up the process. We use an existing grammar to infer rules for the new language, and we do not need a large gold corpus of annotated grammar errors, but a smaller corpus of regression tests is built while developing the tool. We present a test case for Lule Sámi reusing resources from North Sámi, show how we achieve a categorisation of the most frequent errors, and present a preliminary evaluation of the system. We hope this serves as an inspiration for small languages that need advanced tools in a limited amount of time, but do not have big data.
2022.computel-1.19
lill-sigga-mikkelsen-etal-2022-reusing
+ 10.18653/v1/2022.computel-1.19
A Word-and-Paradigm Workflow for Fieldwork Annotation
@@ -246,6 +265,7 @@
There are many challenges in morphological fieldwork annotation: it heavily relies on segmentation and feature labeling (which have both practical and theoretical drawbacks), it is time-intensive, and the annotator needs to be linguistically trained and may still annotate things inconsistently. We propose a workflow that relies on unsupervised and active learning grounded in Word-and-Paradigm morphology (WP). Machine learning has the potential to greatly accelerate the annotation process and allow a human annotator to focus on problematic cases, while the WP approach makes for an annotation system that is word-based and relational, removing the need to make decisions about feature labeling and segmentation early in the process and allowing speakers of the language of interest to participate more actively, since linguistic training is not necessary. We present a proof-of-concept for the first step of the workflow; in a realistic fieldwork setting, annotators can process hundreds of forms per hour.
2022.computel-1.20
copot-etal-2022-word
+ 10.18653/v1/2022.computel-1.20
Fine-tuning pre-trained models for Automatic Speech Recognition, experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)
@@ -263,6 +283,7 @@
This is a report on results obtained in the development of speech recognition tools intended to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family. The goal is to reduce the transcription workload of field linguists. The method used is a deep learning approach based on the language-specific tuning of a generic pre-trained representation model, XLS-R, using a Transformer architecture. We note difficulties in implementation, in terms of learning stability. But this approach brings significant improvements nonetheless. The quality of phonemic transcription is improved over earlier experiments; and most significantly, the new approach allows for reaching the stage of automatic word recognition. Subjective evaluation of the tool by the author of the training data confirms the usefulness of this approach.
2022.computel-1.21
guillaume-etal-2022-fine
+ 10.18653/v1/2022.computel-1.21
Morphologically annotated corpora of Pomak
@@ -280,6 +301,7 @@
The project XXXX is developing a platform to enable researchers of living languages to easily create and make available state-of-the-art spoken and textual annotated resources. As a case study we use Greek and Pomak, the latter being an endangered oral Slavic language of the Balkans (including Thrace/Greece). The linguistic documentation of Pomak is an ongoing work by an interdisciplinary team in close cooperation with the Pomak community of Greece. We describe our experience in the development of a Latin-based orthography and morphologically annotated text corpora of Pomak with state-of-the-art NLP technology. These resources will be made openly available on the XXXX site and the gold annotated corpora of Pomak will be made available on the Universal Dependencies treebank repository.
2022.computel-1.22
jusuf-karahoga-etal-2022-morphologically
+ 10.18653/v1/2022.computel-1.22
Enhancing Documentation of Hupa with Automatic Speech Recognition
@@ -290,6 +312,7 @@
This study investigates applications of automatic speech recognition (ASR) techniques to Hupa, a critically endangered Native American language from the Dene (Athabaskan) language family. Using around 9h12m of spoken data produced by one elder who is a first-language Hupa speaker, we experimented with different evaluation schemes and training settings. On average a fully connected deep neural network reached a word error rate of 35.26%. Our overall results illustrate the utility of ASR for making Hupa language documentation more accessible and usable. In addition, we found that when training acoustic models, using recordings with transcripts that were not carefully verified did not necessarily have a negative effect on model performance. This shows promise for speech corpora of indigenous languages that commonly include transcriptions produced by second-language speakers or linguists who have advanced knowledge in the language of interest.
2022.computel-1.23
liu-etal-2022-enhancing
+ 10.18653/v1/2022.computel-1.23
diff --git a/data/xml/2022.constraint.xml b/data/xml/2022.constraint.xml
index 0cbdbf1a75..ca2e33c2ed 100644
--- a/data/xml/2022.constraint.xml
+++ b/data/xml/2022.constraint.xml
@@ -33,6 +33,7 @@
We present the findings of the shared task at the CONSTRAINT 2022 Workshop: Hero, Villain, and Victim: Dissecting harmful memes for Semantic role labeling of entities. The task aims to delve deeper into the domain of meme comprehension by deciphering the connotations behind the entities present in a meme. In more nuanced terms, the shared task focuses on determining the victimizing, glorifying, and vilifying intentions embedded in meme entities to explicate their connotations. To this end, we curate HVVMemes, a novel meme dataset of about 7000 memes spanning the domains of COVID-19 and US Politics, each containing entities and their associated roles: hero, villain, victim, or none. The shared task attracted 105 participants, but eventually only 6 submissions were made. Most of the successful submissions relied on fine-tuning pre-trained language and multimodal models along with ensembles. The best submission achieved an F1-score of 58.67.
2022.constraint-1.1
sharma-etal-2022-findings
+ 10.18653/v1/2022.constraint-1.1
DD-TIG at Constraint@ACL2022: Multimodal Understanding and Reasoning for Role Labeling of Entities in Hateful Memes
@@ -47,6 +48,7 @@
zhou-etal-2022-dd
Hateful Memes
VCR
+ 10.18653/v1/2022.constraint-1.2
Are you a hero or a villain? A semantic role labelling approach for detecting harmful memes.
@@ -60,6 +62,7 @@
Identifying good and evil through representations of victimhood, heroism, and villainy (i.e., role labeling of entities) has recently caught the research community’s interest. Because of the growing popularity of memes, the amount of offensive information published on the internet is expanding at an alarming rate. This has generated a greater need to address this issue and analyze memes for content moderation. Framing is used to show the entities engaged as heroes, villains, victims, or others so that readers may better anticipate and understand their attitudes and behaviors as characters. Positive phrases are used to characterize heroes, whereas negative terms depict victims and villains, and terms that tend to be neutral are mapped to others. In this paper, we propose two approaches to role-label the entities of the meme as hero, villain, victim, or other through Named-Entity Recognition (NER), Sentiment Analysis, etc. With an F1-score of 23.855, our team secured eighth position in the Shared Task @ Constraint 2022.
2022.constraint-1.3
fharook-etal-2022-hero
+ 10.18653/v1/2022.constraint-1.3
Logically at the Constraint 2022: Multimodal role labelling
@@ -70,6 +73,7 @@
This paper describes our system for the Constraint 2022 challenge at ACL 2022, whose goal is to detect which entities are glorified, vilified or victimised within a meme. The task should be done considering the perspective of the meme’s author. In our work, the challenge is treated as a multi-class classification task. For a given pair of a meme and an entity, we need to classify whether the entity is being referenced as a Hero, a Villain, a Victim or Other. Our solution combines (by ensembling) different models based on a Unimodal (text-only) model and a Multimodal model (text + images). We conduct several experiments and benchmark different competitive pre-trained transformers and vision models in this work. Our solution, based on an ensembling method, is ranked first on the leaderboard and obtains a macro F1-score of 0.58 on the test set. The code for the experiments and results is available at https://bitbucket.org/logicallydevs/constraint_2022/src/master/
2022.constraint-1.4
kun-etal-2022-logically
+ 10.18653/v1/2022.constraint-1.4
Combining Language Models and Linguistic Information to Label Entities in Memes
@@ -80,6 +84,7 @@
This paper describes the system we developed for the shared task ‘Hero, Villain and Victim: Dissecting harmful memes for Semantic role labelling of entities’ organised in the framework of the Second Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (Constraint 2022). We present an ensemble approach combining transformer-based models and linguistic information, such as the presence of irony and implicit sentiment associated to the target named entities. The ensemble system obtains promising classification scores, resulting in a third place finish in the competition.
2022.constraint-1.5
singh-etal-2022-combining
+ 10.18653/v1/2022.constraint-1.5
Detecting the Role of an Entity in Harmful Memes: Techniques and their Limitations
@@ -93,6 +98,7 @@
robi56/harmful_memes_block_fusion
Hateful Memes
Hateful Memes Challenge
+ 10.18653/v1/2022.constraint-1.6
Fine-tuning and Sampling Strategies for Multimodal Role Labeling of Entities under Class Imbalance
@@ -104,6 +110,7 @@
We propose our solution to the multimodal semantic role labeling task from the CONSTRAINT’22 workshop. The task aims at classifying entities in memes into classes such as “hero” and “villain”. We use several pre-trained multi-modal models to jointly encode the text and image of the memes, and implement three systems to classify the role of the entities. We propose dynamic sampling strategies to tackle the issue of class imbalance. Finally, we perform qualitative analysis on the representations of the entities.
2022.constraint-1.7
montariol-etal-2022-fine
+ 10.18653/v1/2022.constraint-1.7
Document Retrieval and Claim Verification to Mitigate COVID-19 Misinformation
@@ -119,6 +126,7 @@
sundriyal-etal-2022-document
CORD-19
FEVER
+ 10.18653/v1/2022.constraint-1.8
M-BAD: A Multilabel Dataset for Detecting Aggressive Texts and Their Targets
@@ -129,6 +137,7 @@
Recently, detection and categorization of undesired (e.g., aggressive, abusive, offensive, hate) content from online platforms has grabbed the attention of researchers because of its detrimental impact on society. Several attempts have been made to mitigate the usage and propagation of such content. However, most past studies were conducted primarily for English, where low-resource languages like Bengali remained out of focus. Therefore, to facilitate research in this arena, this paper introduces a novel multilabel Bengali dataset (named M-BAD) containing 15650 texts to detect aggressive texts and their targets. Each text of M-BAD went through rigorous two-level annotation. At the primary level, each text is labelled as either aggressive or non-aggressive. At the secondary level, the aggressive texts have been further annotated into five fine-grained target classes: religion, politics, verbal, gender and race. Baseline experiments are carried out with different machine learning (ML), deep learning (DL) and transformer models, where Bangla-BERT acquired the highest weighted f_1-score in both the detection (0.92) and target identification (0.83) tasks. Error analysis of the models exhibits the difficulty of identifying context-dependent aggression, and this work argues that further research is required to address these issues.
2022.constraint-1.9
sharif-etal-2022-bad
+ 10.18653/v1/2022.constraint-1.9
How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image
@@ -141,6 +150,7 @@
2022.constraint-1.10
choi-etal-2022-fake
ssu-humane/fake-news-thumbnail
+ 10.18653/v1/2022.constraint-1.10
Detecting False Claims in Low-Resource Regions: A Case Study of Caribbean Islands
@@ -153,6 +163,7 @@
2022.constraint-1.11
lucas-etal-2022-detecting
CoAID
+ 10.18653/v1/2022.constraint-1.11
diff --git a/data/xml/2022.csrr.xml b/data/xml/2022.csrr.xml
index fe8dac8214..e0b4e93554 100644
--- a/data/xml/2022.csrr.xml
+++ b/data/xml/2022.csrr.xml
@@ -34,6 +34,7 @@
CommonsenseQA
ConceptNet
OpenBookQA
+ 10.18653/v1/2022.csrr-1.1
Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian
@@ -45,6 +46,7 @@
2022.csrr-1.2
koto-etal-2022-cloze
ROCStories
+ 10.18653/v1/2022.csrr-1.2
Psycholinguistic Diagnosis of Language Models’ Commonsense Reasoning
@@ -55,6 +57,7 @@
cong-2022-psycholinguistic
yancong222/pragamtics-commonsense-lms
SuperGLUE
+ 10.18653/v1/2022.csrr-1.3
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks
@@ -71,6 +74,7 @@
Conceptual Captions
VCR
Visual Question Answering
+ 10.18653/v1/2022.csrr-1.4
Materialized Knowledge Bases from Commonsense Transformers
@@ -82,6 +86,7 @@
nguyen-razniewski-2022-materialized
ConceptNet
WebText
+ 10.18653/v1/2022.csrr-1.5
Knowledge-Augmented Language Models for Cause-Effect Relation Classification
@@ -96,6 +101,7 @@
BCOPA-CE
COPA
TCR
+ 10.18653/v1/2022.csrr-1.6
CURIE: An Iterative Querying Approach for Reasoning About Situations
@@ -114,6 +120,7 @@
QuaRTz
QuaRel
WIQA
+ 10.18653/v1/2022.csrr-1.7
diff --git a/data/xml/2022.deelio.xml b/data/xml/2022.deelio.xml
index 029bb68a5e..516c647df5 100644
--- a/data/xml/2022.deelio.xml
+++ b/data/xml/2022.deelio.xml
@@ -24,6 +24,7 @@
Cross-lingual Transfer Learning typically involves training a model on a high-resource source language and applying it to a low-resource target language. In this work we introduce a lexical database called Valency Patterns Leipzig (ValPal) which provides the argument pattern information about various verb-forms in multiple languages, including low-resource languages. We also provide a framework to integrate the ValPal database knowledge into the state-of-the-art LSTM-based model for cross-lingual semantic role labelling. Experimental results show that integrating such knowledge resulted in an improvement in the performance of the model on all the target languages on which it was evaluated.
2022.deelio-1.1
choudhary-oriordan-2022-cross
+ 10.18653/v1/2022.deelio-1.1
How Do Transformer-Architecture Models Address Polysemy of Korean Adverbial Postpositions?
@@ -34,6 +35,7 @@
2022.deelio-1.2
2022.deelio-1.2.software.zip
mun-desagulier-2022-transformer
+ 10.18653/v1/2022.deelio-1.2
Query Generation with External Knowledge for Dense Retrieval
@@ -52,6 +54,7 @@
SciDocs
SciFact
SimpleQuestions
+ 10.18653/v1/2022.deelio-1.3
Uncovering Values: Detecting Latent Moral Content from Natural Language with Explainable and Non-Trained Methods
@@ -67,6 +70,7 @@
asprino-etal-2022-uncovering
stendoipanni/moraldilemmas
DBpedia
+ 10.18653/v1/2022.deelio-1.4
Jointly Identifying and Fixing Inconsistent Readings from Information Extraction Systems
@@ -79,6 +83,7 @@
padia-etal-2022-jointly
FEVER
TACRED
+ 10.18653/v1/2022.deelio-1.5
KIQA: Knowledge-Infused Question Answering Model for Financial Table-Text Data
@@ -89,6 +94,7 @@
While entity retrieval models continue to advance their capabilities, our understanding of their wide-ranging applications is limited, especially in domain-specific settings. We highlighted this issue by using recent general-domain entity-linking models, LUKE and GENRE, to inject external knowledge into a question-answering (QA) model for a financial QA task with a hybrid tabular-textual dataset. We found that both models improved the baseline model by 1.57% overall and 8.86% on textual data. Nonetheless, the challenge remains, as they still struggle to handle tabular inputs. We subsequently conducted a comprehensive attention-weight analysis, revealing how LUKE utilizes external knowledge supplied by GENRE. The analysis also elaborates on how the injection of symbolic knowledge can be helpful and what needs further improvement, paving the way for future research on this challenging QA task and advancing our understanding of how a language model incorporates external knowledge.
2022.deelio-1.6
nararatwong-etal-2022-kiqa
+ 10.18653/v1/2022.deelio-1.6
Trans-KBLSTM: An External Knowledge Enhanced Transformer BiLSTM Model for Tabular Reasoning
@@ -101,6 +107,7 @@
varun-etal-2022-trans
ConceptNet
GLUE
+ 10.18653/v1/2022.deelio-1.7
Fast Few-shot Debugging for NLU Test Suites
@@ -113,6 +120,7 @@
malon-etal-2022-fast
necla-ml/debug-test-suites
SST
+ 10.18653/v1/2022.deelio-1.8
On Masked Language Models for Contextual Link Prediction
@@ -123,6 +131,7 @@
In the real world, many relational facts require context; for instance, a politician holds a given elected position only for a particular timespan. This context (the timespan) is typically ignored in knowledge graph link prediction tasks, or is leveraged by models designed specifically to make use of it (i.e. n-ary link prediction models). Here, we show that the task of n-ary link prediction is easily performed using language models, applied with a basic method for constructing cloze-style query sentences. We introduce a pre-training methodology based around an auxiliary entity-linked corpus that outperforms other popular pre-trained models like BERT, even with a smaller model. This methodology also enables n-ary link prediction without access to any n-ary training set, which can be invaluable in circumstances where expensive and time-consuming curation of n-ary knowledge graphs is not feasible. We achieve state-of-the-art performance on the primary n-ary link prediction dataset WD50K and on WikiPeople facts that include literals - typically ignored by knowledge graph embedding methods.
2022.deelio-1.9
brayne-etal-2022-masked
+ 10.18653/v1/2022.deelio-1.9
What Makes Good In-Context Examples for GPT-3?
@@ -144,6 +153,7 @@
SNLI
SST
TriviaQA
+ 10.18653/v1/2022.deelio-1.10
diff --git a/data/xml/2022.dialdoc.xml b/data/xml/2022.dialdoc.xml
index 24ca4287e8..8bf38b5cde 100644
--- a/data/xml/2022.dialdoc.xml
+++ b/data/xml/2022.dialdoc.xml
@@ -27,6 +27,7 @@
2022.dialdoc-1.1
feng-etal-2022-msamsum
xcfcode/msamsum
+ 10.18653/v1/2022.dialdoc-1.1
UniDS: A Unified Dialogue System for Chit-Chat and Task-oriented Dialogues
@@ -43,6 +44,7 @@
With the advances in deep learning, tremendous progress has been made with chit-chat dialogue systems and task-oriented dialogue systems. However, these two systems are often tackled separately in current methods. To achieve more natural interaction with humans, dialogue systems need to be capable of both chatting and accomplishing tasks. To this end, we propose a unified dialogue system (UniDS) with the two aforementioned skills. In particular, we design a unified dialogue data schema, compatible with both chit-chat and task-oriented dialogues. Besides, we propose a two-stage training method to train UniDS based on the unified dialogue data schema. UniDS does not need to add extra parameters to existing chit-chat dialogue systems. Experimental results demonstrate that the proposed UniDS works comparably well to state-of-the-art chit-chat dialogue systems and task-oriented dialogue systems. More importantly, UniDS achieves better robustness than pure dialogue systems and a satisfactory ability to switch between the two types of dialogues.
2022.dialdoc-1.2
zhao-etal-2022-unids
+ 10.18653/v1/2022.dialdoc-1.2
Low-Resource Adaptation of Open-Domain Generative Chatbots
@@ -57,6 +59,7 @@
Blended Skill Talk
ConvAI2
QReCC
+ 10.18653/v1/2022.dialdoc-1.3
Pseudo Ambiguous and Clarifying Questions Based on Sentence Structures Toward Clarifying Question Answering System
@@ -70,6 +73,7 @@
2022.dialdoc-1.4
nakano-etal-2022-pseudo
HotpotQA
+ 10.18653/v1/2022.dialdoc-1.4
Parameter-Efficient Abstractive Question Answering over Tables or Text
@@ -82,6 +86,7 @@
pal-etal-2022-parameter
kolk/pea-qa
NarrativeQA
+ 10.18653/v1/2022.dialdoc-1.5
Conversation- and Tree-Structure Losses for Dialogue Disentanglement
@@ -93,6 +98,7 @@
When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately. This task is referred to as dialogue disentanglement. A significant drawback of previous studies on disentanglement lies in that they only focus on pair-wise relationships between utterances while neglecting the conversation structure, which is important for conversation structure modeling. In this paper, we propose a hierarchical model, named Dialogue BERT (DIALBERT), which integrates the local and global semantics in the context range by using BERT to encode each message-pair and using a BiLSTM to aggregate the chronological context information into the output of BERT. In order to integrate the conversation structure information into the model, two types of loss, a conversation-structure loss and a tree-structure loss, are designed. In this way, our model can implicitly learn and leverage the conversation structures without being restricted by the lack of explicit access to such structures during the inference stage. Experimental results on two large datasets show that our method outperforms previous methods by substantial margins, achieving great performance on dialogue disentanglement.
2022.dialdoc-1.6
li-etal-2022-conversation
+ 10.18653/v1/2022.dialdoc-1.6
Conversational Search with Mixed-Initiative - Asking Good Clarification Questions backed-up by Passage Retrieval
@@ -104,6 +110,7 @@
We deal with the scenario of conversational search, where user queries are under-specified or ambiguous. This calls for a mixed-initiative setup, in which the user asks (queries) and the system answers, and the system asks (clarification questions) and the user responds, in order to clarify the user’s information needs. We focus on the task of selecting the next clarification question, given the conversation context. Our method leverages passage retrieval from background content to fine-tune two deep-learning models for ranking candidate clarification questions. We evaluated our method on two different use-cases. The first is open-domain conversational search in a large web collection. The second is a task-oriented customer-support setup. We show that our method performs well on both use-cases.
2022.dialdoc-1.7
mass-etal-2022-conversational
+ 10.18653/v1/2022.dialdoc-1.7
Graph-combined Coreference Resolution Methods on Conversational Machine Reading Comprehension with Pre-trained Language Model
@@ -115,6 +122,7 @@
wang-komatani-2022-graph
CANARD
CoQA
+ 10.18653/v1/2022.dialdoc-1.8
Construction of Hierarchical Structured Knowledge-based Recommendation Dialogue Dataset and Dialogue System
@@ -127,6 +135,7 @@
kodama-etal-2022-construction
KdConv
Wizard of Wikipedia
+ 10.18653/v1/2022.dialdoc-1.9
Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters
@@ -144,6 +153,7 @@
xu-etal-2022-retrieval
hltchkust/knowexpert
Wizard of Wikipedia
+ 10.18653/v1/2022.dialdoc-1.10
G4: Grounding-guided Goal-oriented Dialogues Generation with Multiple Documents
@@ -157,6 +167,7 @@
2022.dialdoc-1.11
zhang-etal-2022-g4
MultiDoc2Dial
+ 10.18653/v1/2022.dialdoc-1.11
UGent-T2K at the 2nd DialDoc Shared Task: A Retrieval-Focused Dialog System Grounded in Multiple Documents
@@ -172,6 +183,7 @@
Doc2Dial
MultiDoc2Dial
doc2dial
+ 10.18653/v1/2022.dialdoc-1.12
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout
@@ -186,6 +198,7 @@
MultiDoc2Dial presents an important challenge in modeling dialogues grounded in multiple documents. This paper proposes a pipeline system of “retrieve, re-rank, and generate”, where each component is individually optimized. This enables the passage re-ranker and response generator to fully exploit training with ground-truth data. Furthermore, we use a deep cross-encoder trained with localized hard negative passages from the retriever. For the response generator, we use grounding span prediction as an auxiliary task to be jointly trained with the main task of response generation. We also adopt a passage dropout and regularization technique to improve response generation performance. Experimental results indicate that the system clearly surpasses the competitive baseline, and our team CPII-NLP ranked 1st among the public submissions on all four leaderboards based on the sum of F1, SacreBLEU, METEOR and RougeL scores.
2022.dialdoc-1.13
li-etal-2022-grounded
+ 10.18653/v1/2022.dialdoc-1.13
A Knowledge storage and semantic space alignment Method for Multi-documents dialogue generation
@@ -200,6 +213,7 @@
CoQA
MultiDoc2Dial
QuAC
+ 10.18653/v1/2022.dialdoc-1.14
Improving Multiple Documents Grounded Goal-Oriented Dialog Systems via Diverse Knowledge Enhanced Pretrained Language Model
@@ -216,6 +230,7 @@
jang-etal-2022-improving
CoQA
MultiDoc2Dial
+ 10.18653/v1/2022.dialdoc-1.15
Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval
@@ -234,6 +249,7 @@
MultiDoc2Dial
QuAC
doc2dial
+ 10.18653/v1/2022.dialdoc-1.16
R3: Refined Retriever-Reader pipeline for Multidoc2dial
@@ -256,6 +272,7 @@
Natural Questions
QuAC
doc2dial
+ 10.18653/v1/2022.dialdoc-1.17
DialDoc 2022 Shared Task: Open-Book Document-grounded Dialogue Modeling
@@ -269,6 +286,7 @@
Doc2Dial
MultiDoc2Dial
doc2dial
+ 10.18653/v1/2022.dialdoc-1.18
TRUE: Re-evaluating Factual Consistency Evaluation
@@ -292,6 +310,7 @@
GLUE
PAWS
VitaminC
+ 10.18653/v1/2022.dialdoc-1.19
Handling Comments in Documents through Interactions
@@ -301,6 +320,7 @@
Comments are widely used by users in collaborative documents every day. The documents’ comments enable collaborative editing and review dynamics, transforming each document into a context-sensitive communication channel. Understanding the role of comments in communication dynamics within documents is the first step towards automating their management. In this paper we propose the first ever taxonomy for different types of in-document comments, based on analysis of a large-scale dataset of public documents from the web. We envision that the next generation of intelligent collaborative document experiences will allow interactive creation and consumption of content; therefore, we also introduce the components necessary for developing novel tools that automate the handling of comments through natural language interaction with the documents. We identify the commands that users would use to respond to various types of comments. We train machine learning algorithms to recognize the different types of comments and assess their feasibility. We conclude by discussing some of the implications for the design of automatic document management tools.
2022.dialdoc-1.20
nouri-toxtli-2022-handling
+ 10.18653/v1/2022.dialdoc-1.20
Task2Dial: A Novel Task and Dataset for Commonsense-enhanced Task-based Dialogue Grounded in Documents
@@ -313,6 +333,7 @@
CoQA
Doc2Dial
doc2dial
+ 10.18653/v1/2022.dialdoc-1.21
diff --git a/data/xml/2022.dravidianlangtech.xml b/data/xml/2022.dravidianlangtech.xml
index 4e385bc9a2..3956b1d9f4 100644
--- a/data/xml/2022.dravidianlangtech.xml
+++ b/data/xml/2022.dravidianlangtech.xml
@@ -31,6 +31,7 @@
2022.dravidianlangtech-1.1
kumar-etal-2022-bert
Universal Dependencies
+ 10.18653/v1/2022.dravidianlangtech-1.1
A Dataset for Detecting Humor in Telugu Social Media Text
@@ -42,6 +43,7 @@
2022.dravidianlangtech-1.2
bellamkonda-etal-2022-dataset
shaswa123/telugu_humour_dataset
+ 10.18653/v1/2022.dravidianlangtech-1.2
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages
@@ -56,6 +58,7 @@
gokulkarthik/mucot
ChAII - Hindi and Tamil Question Answering
SQuAD
+ 10.18653/v1/2022.dravidianlangtech-1.3
TamilATIS: Dataset for Task-Oriented Dialog in Tamil
@@ -67,6 +70,7 @@
2022.dravidianlangtech-1.4
s-etal-2022-tamilatis
ATIS
+ 10.18653/v1/2022.dravidianlangtech-1.4
DE-ABUSE@TamilNLP-ACL 2022: Transliteration as Data Augmentation for Abuse Detection in Tamil
@@ -78,6 +82,7 @@
With the rise of social media and the internet, there is a necessity to provide an inclusive space and prevent abusive topics against any gender, race or community. This paper describes the system submitted to the ACL-2022 shared task on fine-grained abuse detection in Tamil. In our approach we transliterated the code-mixed dataset as an augmentation technique to increase the size of the data. Using this method we were able to rank 3rd on the task with a 0.290 macro-average F1 score and a 0.590 weighted F1 score.
2022.dravidianlangtech-1.5
palanikumar-etal-2022-de
+ 10.18653/v1/2022.dravidianlangtech-1.5
UMUTeam@TamilNLP-ACL2022: Emotional Analysis in Tamil
@@ -88,6 +93,7 @@
These working notes summarise the participation of the UMUTeam in the TamilNLP (ACL 2022) shared task concerning emotion analysis in Tamil. We participated in the two multi-classification challenges proposed, with a neural network that combines linguistic features with different feature sets based on contextual and non-contextual sentence embeddings. Our proposal achieved the 1st result for the second subtask, with an f1-score of 15.1% discerning among 30 different emotions. However, our results for the first subtask were not recorded in the official leaderboard. Accordingly, we report our results for this subtask on the validation split, reaching a macro f1-score of 32.360%.
2022.dravidianlangtech-1.6
garcia-diaz-etal-2022-umuteam
+ 10.18653/v1/2022.dravidianlangtech-1.6
UMUTeam@TamilNLP-ACL2022: Abusive Detection in Tamil using Linguistic Features and Transformers
@@ -98,6 +104,7 @@
Social media has become a dangerous place as bullies take advantage of the anonymity the Internet provides to target and intimidate vulnerable individuals and groups. In the past few years, the research community has focused on developing automatic classification tools for detecting hate-speech, its variants, and other types of abusive behaviour. However, these methods are still at an early stage in low-resource languages. With the aim of reducing this barrier, the TamilNLP shared task has proposed a multi-classification challenge for Tamil written in Tamil script and code-mixed to detect abusive comments and hope-speech. Our participation consists of a knowledge integration strategy that combines sentence embeddings from BERT, RoBERTa, FastText and a subset of language-independent linguistic features. We achieved our best result in code-mixed, reaching 3rd position with a macro-average f1-score of 35%.
2022.dravidianlangtech-1.7
garcia-diaz-etal-2022-umuteam-tamilnlp
+ 10.18653/v1/2022.dravidianlangtech-1.7
hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification
@@ -108,6 +115,7 @@
Social media platforms often act as breeding grounds for various forms of trolling or malicious content targeting users or communities. One way of trolling users is by creating memes, which in most cases unite an image with a short piece of text embedded on top of it. The situation is more complex for multilingual (e.g., Tamil) memes due to the lack of benchmark datasets and models. We explore several models to detect troll memes in Tamil based on the shared task, “Troll Meme Classification in DravidianLangTech2022” at ACL-2022. We observe that while the text-based model MURIL performs better for non-troll meme classification, the image-based model VGG16 performs better for troll-meme classification. Further fusing these two modalities helps us achieve stable outcomes in both classes. Our fusion model achieved a 0.561 weighted average F1 score and ranked second in this task.
2022.dravidianlangtech-1.8
das-etal-2022-hate
+ 10.18653/v1/2022.dravidianlangtech-1.8
JudithJeyafreedaAndrew@TamilNLP-ACL2022:CNN for Emotion Analysis in Tamil
@@ -116,6 +124,7 @@
Using technology for analysis of human emotion is a relatively nascent research area. There are several types of data where emotion recognition can be employed, such as text, images, audio and video. In this paper, the focus is on emotion recognition in text data. Emotion recognition in text can be performed both from written comments and from conversations. In this paper, the dataset used for emotion recognition is a list of comments. While extensive research is being performed in this area, the language of the text plays a very important role. In this work, the focus is on the Dravidian language of Tamil. The language and its script demand extensive pre-processing. The paper contributes to this by adapting various pre-processing methods to the Dravidian language of Tamil. A CNN method has been adopted for the task at hand. The proposed method has achieved a comparable result.
2022.dravidianlangtech-1.9
andrew-2022-judithjeyafreedaandrew
+ 10.18653/v1/2022.dravidianlangtech-1.9
MUCIC@TamilNLP-ACL2022: Abusive Comment Detection in Tamil Language using 1D Conv-LSTM
@@ -128,6 +137,7 @@
2022.dravidianlangtech-1.10
balouchzahi-etal-2022-mucic
anushamdgowda/abusive-detection
+ 10.18653/v1/2022.dravidianlangtech-1.10
CEN-Tamil@DravidianLangTech-ACL2022: Abusive Comment detection in Tamil using TF-IDF and Random Kitchen Sink Algorithm
@@ -140,6 +150,7 @@
This paper describes the approach of team CEN-Tamil for abusive comment detection in Tamil. This task aims to identify whether a given comment contains abusive content. We used TF-IDF with char-wb analyzers and the Random Kitchen Sink (RKS) algorithm to create feature vectors, and the Support Vector Machine (SVM) classifier with a polynomial kernel for classification. We used this method for both the Tamil and Tamil-English datasets and secured first place with an f1-score of 0.32 and seventh place with an f1-score of 0.25, respectively. The code for our approach is shared in the GitHub repository.
2022.dravidianlangtech-1.11
s-n-etal-2022-cen
+ 10.18653/v1/2022.dravidianlangtech-1.11
NITK-IT_NLP@TamilNLP-ACL2022: Transformer based model for Toxic Span Identification in Tamil
@@ -150,6 +161,7 @@
Toxic span identification in Tamil is a shared task that focuses on identifying harmful content, contributing to offensiveness. In this work, we have built a model that can efficiently identify the span of text contributing to offensive content. We have used various transformer-based models to develop the system, out of which the fine-tuned MuRIL model was able to achieve the best overall character F1-score of 0.4489.
2022.dravidianlangtech-1.12
lekshmiammal-etal-2022-nitk
+ 10.18653/v1/2022.dravidianlangtech-1.12
TeamX@DravidianLangTech-ACL2022: A Comparative Analysis for Troll-Based Meme Classification
@@ -162,6 +174,7 @@
nandi-etal-2022-teamx
Hateful Memes
Hateful Memes Challenge
+ 10.18653/v1/2022.dravidianlangtech-1.13
GJG@TamilNLP-ACL2022: Emotion Analysis and Classification in Tamil using Transformers
@@ -172,6 +185,7 @@
This paper describes the systems built by our team for the “Emotion Analysis in Tamil” shared task at the Second Workshop on Speech and Language Technologies for Dravidian Languages at ACL 2022. There were two multi-class classification sub-tasks as a part of this shared task. The dataset for sub-task A contained 11 types of emotions, while sub-task B was more fine-grained with 31 emotions. We fine-tuned an XLM-RoBERTa and a DeBERTa base model for each sub-task. For sub-task A, the XLM-RoBERTa model achieved an accuracy of 0.46 and the DeBERTa model achieved an accuracy of 0.45. We had the best classification performance out of 11 teams for sub-task A. For sub-task B, the XLM-RoBERTa model’s accuracy was 0.33 and the DeBERTa model had an accuracy of 0.26. We ranked 2nd out of 7 teams for sub-task B.
2022.dravidianlangtech-1.14
prasad-etal-2022-gjg
+ 10.18653/v1/2022.dravidianlangtech-1.14
GJG@TamilNLP-ACL2022: Using Transformers for Abusive Comment Classification in Tamil
@@ -182,6 +196,7 @@
This paper presents transformer-based models for the “Abusive Comment Detection” shared task at the Second Workshop on Speech and Language Technologies for Dravidian Languages at ACL 2022. Our team participated in both of the multi-class classification sub-tasks of this shared task. The dataset for sub-task A was Tamil text, while that for B was code-mixed Tamil-English text. Both datasets contained 8 classes of abusive comments. We trained an XLM-RoBERTa and a DeBERTa base model on the training splits for each sub-task. For sub-task A, the XLM-RoBERTa model achieved an accuracy of 0.66 and the DeBERTa model achieved an accuracy of 0.62. For sub-task B, both models achieved a classification accuracy of 0.72; however, the DeBERTa model performed better on other classification metrics. Our team ranked 2nd in the code-mixed classification sub-task and 8th in the Tamil-text sub-task.
2022.dravidianlangtech-1.15
prasad-etal-2022-gjg-tamilnlp
+ 10.18653/v1/2022.dravidianlangtech-1.15
IIITDWD@TamilNLP-ACL2022: Transformer-based approach to classify abusive content in Dravidian Code-mixed text
@@ -191,6 +206,7 @@
Identifying abusive content or hate speech in social media text has raised the research community’s interest in recent times. The major driving force behind this is the widespread use of social media websites. Further, it also leads to identifying abusive content in low-resource regional languages, which is an important research problem in computational linguistics. As part of ACL-2022, the organizers of DravidianLangTech@ACL 2022 released a shared task on abusive category identification in Tamil and Tamil-English code-mixed text to encourage further research on offensive content identification in low-resource Indic languages. This paper presents the working notes for the model submitted by IIITDWD at DravidianLangTech@ACL 2022. Our team competed in Sub-Task B and finished in 9th place among the participating teams. In our proposed approach, we used a pre-trained transformer model such as Indic-BERT for feature extraction, and on top of that an SVM classifier is used for stance detection. Further, our model achieved 62% accuracy on code-mixed Tamil-English text.
2022.dravidianlangtech-1.16
biradar-saumya-2022-iiitdwd
+ 10.18653/v1/2022.dravidianlangtech-1.16
PANDAS@TamilNLP-ACL2022: Emotion Analysis in Tamil Text using Language Agnostic Embeddings
@@ -204,6 +220,7 @@
As the world around us continues to become increasingly digital, it has been acknowledged that there is a growing need for emotion analysis of social media content. The task of identifying the emotion in a given text has many practical applications ranging from screening public health to business and management. In this paper, we propose a language-agnostic model that focuses on emotion analysis in Tamil text. Our experiments yielded an F1-score of 0.010.
2022.dravidianlangtech-1.17
k-etal-2022-pandas
+ 10.18653/v1/2022.dravidianlangtech-1.17
PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE
@@ -216,6 +233,7 @@
Abusive language has lately been prevalent in comments on various social media platforms. The increasing hostility observed on the internet calls for the creation of a system that can identify and flag such acerbic content, to prevent conflict and mental distress. This task becomes more challenging when low-resource languages like Tamil, as well as the often-observed Tamil-English code-mixed text, are involved. The approach used in this paper for the classification model includes different methods of feature extraction and the use of traditional classifiers. We propose a novel method of combining language-agnostic sentence embeddings with the TF-IDF vector representation that uses a curated corpus of words as vocabulary, to create a custom embedding, which is then passed to an SVM classifier. Our experimentation yielded an accuracy of 52% and an F1-score of 0.54.
2022.dravidianlangtech-1.18
swaminathan-etal-2022-pandas
+ 10.18653/v1/2022.dravidianlangtech-1.18
Translation Techies @DravidianLangTech-ACL2022-Machine Translation in Dravidian Languages
@@ -227,6 +245,7 @@
This paper discusses the details of the submission made by team Translation Techies to the Shared Task on Machine Translation in Dravidian languages at ACL 2022. In connection to the task, five language pairs were provided to test the accuracy of the submitted model. A baseline transformer model with the Neural Machine Translation (NMT) technique is used, taken directly from the OpenNMT framework. On this baseline model, tokenization is applied using the IndicNLP library. Finally, the evaluation is performed using the BLEU scoring mechanism.
2022.dravidianlangtech-1.19
goyal-etal-2022-translation
+ 10.18653/v1/2022.dravidianlangtech-1.19
SSNCSE_NLP@TamilNLP-ACL2022: Transformer based approach for Emotion analysis in Tamil language
@@ -236,6 +255,7 @@
Emotion analysis is the process of identifying and analyzing the underlying emotions expressed in textual data. Identifying emotions from a textual conversation is a challenging task due to the absence of gestures, vocal intonation, and facial expressions. Once chatbots and messengers detect and report the emotions of the user, a comfortable conversation can be carried out with no misunderstandings. Our task is to categorize text into a predefined notion of emotion. In this work, we are required to classify text into several emotional labels depending on the task. We have adopted the transformer model approach to identify the emotions present in the text sequence. Our task is to identify whether a given comment contains emotion, and the emotion it stands for. The datasets were provided to us by the LT-EDI organizers (CITATION) for two tasks, in the Tamil language. We evaluated the datasets using pretrained transformer models and obtained micro-averaged F1 scores of 0.19 and 0.12 for Task 1 and Task 2, respectively.
2022.dravidianlangtech-1.20
b-varsha-2022-ssncse
+ 10.18653/v1/2022.dravidianlangtech-1.20
SSN_MLRG1@DravidianLangTech-ACL2022: Troll Meme Classification in Tamil using Transformer Models
@@ -248,6 +268,7 @@
The ACL shared task of DravidianLangTech-2022 for Troll Meme classification is a binary classification task that involves identifying Tamil memes as troll or not-troll. Classification of memes is a challenging task since memes express humour and sarcasm in an implicit way. Team SSN_MLRG1 tested and compared results obtained by using three models namely BERT, ALBERT and XLNET. The XLNet model outperformed the other two models in terms of various performance metrics. The proposed XLNet model obtained the 3rd rank in the shared task with a weighted F1-score of 0.558.
2022.dravidianlangtech-1.21
hariprasad-etal-2022-ssn
+ 10.18653/v1/2022.dravidianlangtech-1.21
BpHigh@TamilNLP-ACL2022: Effects of Data Augmentation on Indic-Transformer based classifier for Abusive Comments Detection in Tamil
@@ -256,6 +277,7 @@
Social media platforms have grown their reach worldwide. As an effect of this growth, many vernacular social media platforms have also emerged, focusing more on the diverse languages of specific regions. Tamil has emerged as a popular language on social media due to the increasing penetration of vernacular platforms like Sharechat and Moj, which focus on local Indian languages rather than English and encourage their users to converse in Indic languages. Abusive language remains a significant challenge in the social media framework, and more so for languages like Tamil, which are low-resource, perform poorly under multilingual models, and lack language-specific models. For the shared task “Abusive Comment detection in Tamil@DravidianLangTech-ACL 2022”, we present an exploration of different NLP data augmentation techniques used to improve the accuracy of our models, and we report the results of these techniques.
2022.dravidianlangtech-1.22
pahwa-2022-bphigh
+ 10.18653/v1/2022.dravidianlangtech-1.22
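As a rough illustration of augmentation of this kind (not necessarily the paper's exact recipe), two cheap token-level transforms often used for low-resource text classification:

    # Sketch: two lightweight augmentations (random swap, random deletion)
    # of the kind explored for low-resource abusive-comment data.
    import random

    def random_swap(tokens, n_swaps=1):
        tokens = tokens[:]
        for _ in range(n_swaps):
            if len(tokens) < 2:
                break
            i, j = random.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
        return tokens

    def random_delete(tokens, p=0.1):
        kept = [t for t in tokens if random.random() > p]
        return kept or tokens  # never return an empty comment

    comment = "intha comment romba mosama irukku".split()  # placeholder text
    augmented = [random_swap(comment), random_delete(comment)]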
MUCS@DravidianLangTech@ACL2022: Ensemble of Logistic Regression Penalties to Identify Emotions in Tamil Text
@@ -266,6 +288,7 @@
Emotion Analysis (EA) is the process of automatically analyzing and categorizing input text into one of a predefined set of emotions. In recent years, people have turned to social media to express their emotions, opinions or feelings about news, movies, products, services, and so on. These users’ emotions may help the public, governments, business organizations, film producers, and others in devising strategies, making decisions, and so on. The increasing number of social media users and the increasing amount of user-generated text containing emotions on social media demand automated tools for the analysis of such data, as handling this data manually is labor intensive and error prone. Further, the characteristics of social media data make EA challenging. Most EA research has focused on the English language, leaving several Indian languages, including Tamil, unexplored for this task. To address the challenges of EA in Tamil texts, in this paper we, team MUCS, describe the model submitted to the shared task on Emotion Analysis in Tamil at DravidianLangTech@ACL 2022. Out of the two subtasks in this shared task, our team submitted a model only for Task A. The proposed model comprises an ensemble of Logistic Regression (LR) classifiers with three penalties, namely L1, L2, and Elasticnet. This ensemble model, trained with Term Frequency - Inverse Document Frequency (TF-IDF) of character bigrams and trigrams, secured 4th rank in Task A with a macro-averaged F1-score of 0.04. The code to reproduce the proposed models is available on GitHub.
2022.dravidianlangtech-1.23
hegde-etal-2022-mucs
+ 10.18653/v1/2022.dravidianlangtech-1.23
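The described ensemble maps directly onto scikit-learn; a minimal sketch, with hyperparameters left at illustrative defaults:

    # Sketch of the described ensemble: three Logistic Regression classifiers
    # (L1, L2, Elasticnet penalties) voting over TF-IDF of character bi/trigrams.
    from sklearn.ensemble import VotingClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
    ensemble = VotingClassifier(
        estimators=[
            ("l1", LogisticRegression(penalty="l1", solver="liblinear")),
            ("l2", LogisticRegression(penalty="l2", solver="lbfgs")),
            ("en", LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, max_iter=1000)),
        ],
        voting="hard",
    )
    model = make_pipeline(tfidf, ensemble)
    # model.fit(train_texts, train_labels); model.predict(test_texts)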
BPHC@DravidianLangTech-ACL2022-A comparative analysis of classical and pre-trained models for troll meme classification in Tamil
@@ -277,6 +300,7 @@
Trolling refers to any user behaviour on the internet intended to provoke or instigate conflict, predominantly on social media. This paper aims to classify troll meme captions in Tamil-English code-mixed form. Embeddings are obtained for the raw code-mixed text and for translated and transliterated versions of the text, and their relative performances are compared. Furthermore, this paper compares the performances of 11 different classification algorithms using accuracy and F1-score. We achieved a weighted F1-score of 0.74 with the MuRIL pretrained model.
2022.dravidianlangtech-1.24
v-etal-2022-bphc
+ 10.18653/v1/2022.dravidianlangtech-1.24
SSNCSE NLP@TamilNLP-ACL2022: Transformer based approach for detection of abusive comment for Tamil language
@@ -286,6 +310,7 @@
Social media platforms, along with many other public forums on the Internet, have shown a significant rise in cases of abusive behavior such as misogyny, misandry, homophobia, and cyberbullying. To tackle these concerns, technologies are being developed and applied, as it is a tedious and time-consuming task to identify, report and block these offenders. Our task was to automate the process of identifying abusive comments and classifying them into appropriate categories. The datasets provided by the DravidianLangTech@ACL2022 organizers were in a code-mixed form of Tamil text. We trained on the datasets using pre-trained transformer models such as BERT, m-BERT, and XLNet, and achieved weighted average F1-scores of 0.96 for Tamil-English code-mixed text and 0.59 for Tamil text.
2022.dravidianlangtech-1.25
b-varsha-2022-ssncse-nlp
+ 10.18653/v1/2022.dravidianlangtech-1.25
Varsini_and_Kirthanna@DravidianLangTech-ACL2022-Emotional Analysis in Tamil
@@ -299,6 +324,7 @@
In this paper, we present our system for the task of emotion analysis in Tamil. Over 3.96 million people use social media platforms to send messages formed using text, images, videos, audio or combinations of these to express their thoughts and feelings. Text communication on social media platforms is quite overwhelming due to its enormous quantity and simplicity, and the data must be processed to understand the general feeling felt by the author. We present a lexicon-based approach for extracting emotion from Tamil texts, using dictionaries of words labelled with their respective emotions: an emotional label is assigned to each text, and the main emotion expressed in it is then captured. Finally, our F1-score on the official test set is 0.0300 and our method ranks 5th.
2022.dravidianlangtech-1.26
s-etal-2022-varsini
+ 10.18653/v1/2022.dravidianlangtech-1.26
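A minimal sketch of such a lexicon-based pipeline; the dictionary entries are invented placeholders, not the authors' lexicon:

    # Sketch: lexicon lookup assigning each text the emotion whose labelled
    # words occur most often.
    from collections import Counter

    emotion_lexicon = {            # word -> emotion, placeholder entries
        "magizhchi": "joy",
        "kovam": "anger",
        "sogam": "sadness",
    }

    def main_emotion(text):
        counts = Counter(emotion_lexicon[w]
                         for w in text.lower().split()
                         if w in emotion_lexicon)
        return counts.most_common(1)[0][0] if counts else "neutral"

    print(main_emotion("avan kovam la irundhan"))  # -> "anger"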
CUET-NLP@DravidianLangTech-ACL2022: Investigating Deep Learning Techniques to Detect Multimodal Troll Memes
@@ -311,6 +337,7 @@
With the substantial rise of internet usage, social media has become a powerful communication medium to convey information, opinions, and feelings on various issues. Recently, memes have become a popular way of sharing information on social media. Usually, memes are visuals with text incorporated into them and quickly disseminate hatred and offensive content. Detecting or classifying memes is challenging due to their region-specific interpretation and multimodal nature. This work presents a meme classification technique in Tamil developed by the CUET NLP team under the shared task (DravidianLangTech-ACL2022). Several computational models have been investigated to perform the classification task. This work also explored visual and textual features using VGG16, ResNet50, VGG19, CNN and CNN+LSTM models. Multimodal features are extracted by combining image (VGG16) and text (CNN, LSTM+CNN) characteristics. Results demonstrate that the textual strategy with CNN+LSTM achieved the highest weighted f_1-score (0.52) and recall (0.57). Moreover, the CNN-Text+VGG16 outperformed the other models concerning the multimodal memes detection by achieving the highest f_1-score of 0.49, but the LSTM+CNN model allowed the team to achieve 4^{th} place in the shared task.
2022.dravidianlangtech-1.27
hasan-etal-2022-cuet
+ 10.18653/v1/2022.dravidianlangtech-1.27
PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages
@@ -325,6 +352,7 @@
vyawahare-etal-2022-pict
IndicCorp
Samanantar
+ 10.18653/v1/2022.dravidianlangtech-1.28
Sentiment Analysis on Code-Switched Dravidian Languages with Kernel Based Extreme Learning Machines
@@ -335,6 +363,7 @@
Code-switching refers to textual or spoken data containing multiple languages. Applying natural language processing (NLP) tasks like sentiment analysis to code-switched languages is a harder problem due to irregularities in sentence structure and ordering. This paper presents experimental results on building kernel-based Extreme Learning Machines (ELM) for sentiment analysis of Dravidian languages code-switched with English. Our results show that ELM performs better than traditional machine learning classifiers on various metrics, and also trains faster than deep learning models. We further show that polynomial kernels perform better than others in the ELM architecture. We were able to achieve a median AUC of 0.79 with a polynomial kernel.
2022.dravidianlangtech-1.29
s-r-etal-2022-sentiment
+ 10.18653/v1/2022.dravidianlangtech-1.29
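A minimal sketch of a kernel ELM with a polynomial kernel, following the standard kernel-ELM closed form; features, labels, and the regularisation constant C are placeholders:

    # Sketch: kernel-based ELM. Output weights solve (I/C + Omega) beta = T,
    # where Omega is the kernel matrix over training points.
    import numpy as np
    from sklearn.metrics.pairwise import polynomial_kernel

    def kelm_fit(X, T, C=1.0, degree=3):
        omega = polynomial_kernel(X, X, degree=degree)
        return np.linalg.solve(np.eye(len(X)) / C + omega, T)

    def kelm_predict(X_train, beta, X_new, degree=3):
        return polynomial_kernel(X_new, X_train, degree=degree) @ beta

    X = np.random.rand(20, 5)                    # placeholder features
    T = np.eye(2)[np.random.randint(0, 2, 20)]   # one-hot sentiment labels
    beta = kelm_fit(X, T)
    scores = kelm_predict(X, beta, X)            # argmax over columns -> class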
CUET-NLP@DravidianLangTech-ACL2022: Exploiting Textual Features to Classify Sentiment of Multimodal Movie Reviews
@@ -348,6 +377,7 @@
With the proliferation of internet usage, a massive growth of consumer-generated content on social media has been witnessed in recent years that provide people’s opinions on diverse issues. Through social media, users can convey their emotions and thoughts in distinctive forms such as text, image, audio, video, and emoji, which leads to the advancement of the multimodality of the content users on social networking sites. This paper presents a technique for classifying multimodal sentiment using the text modality into five categories: highly positive, positive, neutral, negative, and highly negative categories. A shared task was organized to develop models that can identify the sentiments expressed by the videos of movie reviewers in both Malayalam and Tamil languages. This work applied several machine learning techniques (LR, DT, MNB, SVM) and deep learning (BiLSTM, CNN+BiLSTM) to accomplish the task. Results demonstrate that the proposed model with the decision tree (DT) outperformed the other methods and won the competition by acquiring the highest macro f_1-score of 0.24.
2022.dravidianlangtech-1.30
mustakim-etal-2022-cuet
+ 10.18653/v1/2022.dravidianlangtech-1.30
CUET-NLP@TamilNLP-ACL2022: Multi-Class Textual Emotion Detection from Social Media using Transformer
@@ -361,6 +391,7 @@
Recently, emotion analysis has gained increased attention by NLP researchers due to its various applications in opinion mining, e-commerce, comprehensive search, healthcare, personalized recommendations and online education. Developing an intelligent emotion analysis model is challenging in resource-constrained languages like Tamil. Therefore a shared task is organized to identify the underlying emotion of a given comment expressed in the Tamil language. The paper presents our approach to classifying the textual emotion in Tamil into 11 classes: ambiguous, anger, anticipation, disgust, fear, joy, love, neutral, sadness, surprise and trust. We investigated various machine learning (LR, DT, MNB, SVM), deep learning (CNN, LSTM, BiLSTM) and transformer-based models (Multilingual-BERT, XLM-R). Results reveal that the XLM-R model outdoes all other models by acquiring the highest macro f_1-score (0.33).
2022.dravidianlangtech-1.31
mustakim-etal-2022-cuet-nlp
+ 10.18653/v1/2022.dravidianlangtech-1.31
DLRG@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil using Multilingual Transformer Models
@@ -371,6 +402,7 @@
Online social networks let people connect and interact with each other. They do, however, also provide a platform for online abusers to propagate abusive content. The vast majority of abusive remarks are written in a multilingual style, which allows them to easily slip past internet inspection. This paper presents a system developed for the Shared Task on Abusive Comment Detection (Misogyny, Misandry, Homophobia, Transphobia, Xenophobia, CounterSpeech, Hope Speech) in Tamil at DravidianLangTech@ACL 2022 to detect the abusive category of each comment. We approach the task with three methodologies - machine learning, deep learning and transformer-based modeling - for two sets of data: Tamil and Tamil+English language datasets. The dataset used in our system can be accessed from the competition on CodaLab. For machine learning, eight algorithms were implemented, among which Random Forest gave the best result on the Tamil+English dataset, with a weighted average F1-score of 0.78. For deep learning, bi-directional LSTM gave the best result with pre-trained word embeddings. In transformer-based modeling, we used IndicBERT and mBERT with fine-tuning, among which mBERT gave the best result for the Tamil dataset with a weighted average F1-score of 0.7.
2022.dravidianlangtech-1.32
rajalakshmi-etal-2022-dlrg
+ 10.18653/v1/2022.dravidianlangtech-1.32
Aanisha@TamilNLP-ACL2022:Abusive Detection in Tamil
@@ -379,6 +411,7 @@
On social media, there are instances where people present their opinions in strong language, resorting to abusive/toxic comments. There are instances of communal hatred, hate speech, toxicity and bullying. In this age of social media, it is very important to find means to keep these toxic comments in check, so as to preserve the mental peace of people on social media. While there are tools and models to detect and potentially filter this kind of content, developing such models for the low-resource language space is an open research issue. In this paper, the task of abusive comment identification in the Tamil language is treated as a multi-class classification problem. Different pre-processing as well as modelling approaches are discussed in this paper and compared on the basis of weighted average accuracy.
2022.dravidianlangtech-1.33
bhattacharyya-2022-aanisha
+ 10.18653/v1/2022.dravidianlangtech-1.33
COMBATANT@TamilNLP-ACL2022: Fine-grained Categorization of Abusive Comments using Logistic Regression
@@ -391,6 +424,7 @@
With the widespread usage of social media and effortless internet access, millions of posts and comments are generated every minute. Unfortunately, with this substantial rise, the usage of abusive language has increased significantly in these mediums. This proliferation leads to many hazards such as cyber-bullying, vulgarity, online harassment and abuse. Therefore, it becomes a crucial issue to detect and mitigate the usage of abusive language. This work presents our system developed as part of the shared task to detect abusive language in Tamil. We employed three machine learning models (LR, DT, SVM), two deep learning models (CNN+BiLSTM, CNN+BiLSTM with FastText) and a transformer-based model (Indic-BERT). The experimental results show that the Logistic Regression (LR) and CNN+BiLSTM models outperformed the others. Both LR and CNN+BiLSTM with FastText achieved a weighted F_1-score of 0.39. However, LR obtained a higher recall value (0.44) than CNN+BiLSTM (0.36). This led us to secure the 2^{nd} rank in the shared task competition.
2022.dravidianlangtech-1.34
hossain-etal-2022-combatant
+ 10.18653/v1/2022.dravidianlangtech-1.34
Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil
@@ -403,6 +437,7 @@
This paper aims to perform an emotion analysis of social media comments in Tamil. Emotion analysis is the process of identifying the emotional context of the text. In this paper, we present the findings obtained by Team Optimize_Prime in the ACL 2022 shared task “Emotion Analysis in Tamil.” The task aimed to classify social media comments into categories of emotion like Joy, Anger, Trust, Disgust, etc. The task was further divided into two subtasks, one with 11 broad categories of emotions and the other with 31 specific categories of emotion. We implemented three different approaches to tackle this problem: transformer-based models, Recurrent Neural Networks (RNNs), and Ensemble models. XLM-RoBERTa performed the best on the first task with a macro-averaged f1 score of 0.27, while MuRIL provided the best results on the second task with a macro-averaged f1 score of 0.13.
2022.dravidianlangtech-1.35
gokhale-etal-2022-optimize
+ 10.18653/v1/2022.dravidianlangtech-1.35
Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil
@@ -415,6 +450,7 @@
This paper addresses the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people, targeted toward individuals belonging to specific ethnicities, genders, castes, races, sexualities, etc. Abusive comment detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task “Abusive Comment Detection in Tamil.” The task is to detect and classify YouTube comments in Tamil and Tamil-English code-mixed format into multiple categories. We have used three methods to optimize our results: ensemble models, Recurrent Neural Networks, and transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best-performing models with a macro-averaged f1 score of 0.43. Furthermore, for the code-mixed data, MuRIL and M-BERT provided sublime results, with a macro-averaged f1 score of 0.45.
2022.dravidianlangtech-1.36
patankar-etal-2022-optimize
+ 10.18653/v1/2022.dravidianlangtech-1.36
Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction
@@ -426,6 +462,7 @@
2022.dravidianlangtech-1.37
ravikiran-chakravarthi-2022-zero
manikandan-ravikiran/zero-shot-offensive-span
+ 10.18653/v1/2022.dravidianlangtech-1.37
DLRG@TamilNLP-ACL2022: Offensive Span Identification in Tamil usingBiLSTM-CRF approach
@@ -439,6 +476,7 @@
Identifying offensive speech is an exciting and essential area of research, with ample traction in recent times. This paper presents our system submission to subtask 1, focusing on using supervised approaches for extracting offensive spans from code-mixed Tamil-English comments. To identify offensive spans, we developed a Bidirectional Long Short-Term Memory (BiLSTM) model with GloVe embeddings. To this end, the developed system achieved an overall F1 of 0.1728. Additionally, for comments with less than 30 characters, the developed system shows an F1 of 0.3890, competitive with other submissions.
2022.dravidianlangtech-1.38
rajalakshmi-etal-2022-dlrg-tamilnlp
+ 10.18653/v1/2022.dravidianlangtech-1.38
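A minimal sketch of a BiLSTM-CRF tagger of the kind described, using the pytorch-crf package; dimensions, the tag set, and the GloVe loading step are placeholders:

    # Sketch: BiLSTM-CRF for offensive-span tagging (e.g. B/I/O tags).
    import torch
    import torch.nn as nn
    from torchcrf import CRF

    class BiLSTMCRF(nn.Module):
        def __init__(self, vocab_size, n_tags, emb_dim=100, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)  # init from GloVe here
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                batch_first=True)
            self.proj = nn.Linear(2 * hidden, n_tags)
            self.crf = CRF(n_tags, batch_first=True)

        def forward(self, tokens, tags=None):
            emissions = self.proj(self.lstm(self.emb(tokens))[0])
            if tags is not None:
                return -self.crf(emissions, tags)  # negative log-likelihood
            return self.crf.decode(emissions)      # best tag sequences

    model = BiLSTMCRF(vocab_size=10000, n_tags=3)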
Findings of the Shared Task on Multimodal Sentiment Analysis and Troll Meme Classification in Dravidian Languages
@@ -455,6 +493,7 @@
This paper presents the findings of the shared task on Multimodal Sentiment Analysis and Troll meme classification in Dravidian languages held at ACL 2022. Multimodal sentiment analysis deals with the identification of sentiment from video. In addition to video data, the task requires the analysis of corresponding text and audio features for the classification of movie reviews into five classes. We created a dataset for this task in Malayalam and Tamil. The Troll meme classification task aims to classify multimodal Troll memes into two categories. This task assumes the analysis of both text and image features for making better predictions. The performance of the participating teams was analysed using the F1-score. Only one team submitted their results in the Multimodal Sentiment Analysis task, whereas we received six submissions in the Troll meme classification task. The only team that participated in the Multimodal Sentiment Analysis shared task obtained an F1-score of 0.24. In the Troll meme classification task, the winning team achieved an F1-score of 0.596.
2022.dravidianlangtech-1.39
b-etal-2022-findings
+ 10.18653/v1/2022.dravidianlangtech-1.39
Findings of the Shared Task on Offensive Span Identification fromCode-Mixed Tamil-English Comments
@@ -470,6 +509,7 @@
Offensive content moderation is vital on social media platforms to support healthy online discussions. However, prior work in code-mixed Dravidian languages is limited to classifying whole comments, without identifying the part of a comment that contributes to the offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social media comments annotated with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.
2022.dravidianlangtech-1.40
ravikiran-etal-2022-findings
+ 10.18653/v1/2022.dravidianlangtech-1.40
Overview of the Shared Task on Machine Translation in Dravidian Languages
@@ -485,6 +525,7 @@
2022.dravidianlangtech-1.41
madasamy-etal-2022-overview
Samanantar
+ 10.18653/v1/2022.dravidianlangtech-1.41
Findings of the Shared Task on Emotion Analysis in Tamil
@@ -505,6 +546,7 @@
This paper presents an overview of the shared task on emotion analysis in Tamil, the results of which are presented at the workshop. The paper describes the dataset used in the shared task, the task itself, the methodologies used by the participants, and the evaluation results of the submissions. The shared task is organized as two subtasks: Task A uses data annotated with 11 emotions for social media comments in Tamil, and Task B uses data annotated with 31 fine-grained emotions for social media comments in Tamil. For conducting experiments, training and development datasets were provided to the participants, and results were evaluated on unseen data. In total, we received around 24 submissions from 13 teams. For evaluating the models, precision, recall and micro-averaged metrics were used.
2022.dravidianlangtech-1.42
sampath-etal-2022-findings
+ 10.18653/v1/2022.dravidianlangtech-1.42
Findings of the Shared Task on Multi-task Learning in Dravidian Languages
@@ -523,6 +565,7 @@
We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian languages is to be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian languages: Kannada, Malayalam, and Tamil. It is one of the first shared tasks that focuses on multi-task learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, and, due to the intricate nature of the task, especially in its first iteration, 3 submissions were received.
2022.dravidianlangtech-1.43
chakravarthi-etal-2022-findings
+ 10.18653/v1/2022.dravidianlangtech-1.43
Overview of Abusive Comment Detection in Tamil-ACL 2022
@@ -538,6 +581,7 @@
Social media is one of the significant digital platforms that create a huge impact on people at all levels. The comments posted on social media are powerful enough to change even political and business scenarios within a very few hours. They also tend to attack a particular individual or a group of individuals. This shared task aims at detecting abusive comments involving Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia and Transphobia; hope speech is also identified. A dataset collected from social media, tagged with the above categories in Tamil and Tamil-English code-mixed languages, is given to the participants. The participants used different machine learning and deep learning algorithms. This paper presents an overview of this task, comprising the dataset details and the results of the participants.
2022.dravidianlangtech-1.44
priyadharshini-etal-2022-overview
+ 10.18653/v1/2022.dravidianlangtech-1.44
diff --git a/data/xml/2022.ecnlp.xml b/data/xml/2022.ecnlp.xml
index c6eaea6589..4e4ec7c166 100644
--- a/data/xml/2022.ecnlp.xml
+++ b/data/xml/2022.ecnlp.xml
@@ -26,6 +26,7 @@
Defect Triage is a time-sensitive and critical process in a large-scale agile software development lifecycle for e-commerce. Inefficiencies arising from human and process dependencies in this domain have motivated research into automated approaches that use machine learning to accurately assign defects to qualified teams. This work proposes a novel framework for automated defect triage (DEFTri) using state-of-the-art pre-trained BERT fine-tuned on label-fused text embeddings to improve contextual representations of human-generated product defects. For our multi-label text classification defect triage task, we also introduce a Walmart proprietary dataset of product defects using weak supervision and adversarial learning, in a few-shot setting.
2022.ecnlp-1.1
mohanty-2022-deftri
+ 10.18653/v1/2022.ecnlp-1.1
Interactive Latent Knowledge Selection for E-Commerce Product Copywriting Generation
@@ -40,6 +41,7 @@
As multi-modal e-commerce is thriving, high-quality advertising product copywriting has gained more attention; it plays a crucial role in e-commerce recommender, advertising and even search platforms. Advertising product copywriting can enhance the user experience by highlighting the product’s characteristics with textual descriptions, and thus improve the likelihood of user clicks and purchases. Automatically generating product copywriting has attracted noticeable interest from both academic and industrial communities, where existing solutions merely make use of a product’s title and attribute information to generate its corresponding description. However, in addition to the product title and attributes, we observe that there are various auxiliary descriptions created by shoppers or marketers on e-commerce platforms (namely human knowledge), which contain valuable information for product copywriting generation, yet always come with a lot of noise. In this work, we propose a novel solution to automatically generate product copywriting that involves the title, attributes and denoised auxiliary knowledge. To be specific, we design an end-to-end generation framework equipped with two variational autoencoders that work interactively to select informative human knowledge and generate diverse copywriting.
2022.ecnlp-1.2
wang-etal-2022-interactive
+ 10.18653/v1/2022.ecnlp-1.2
Leveraging Seq2seq Language Generation for Multi-level Product Issue Identification
@@ -54,6 +56,7 @@
In a leading e-commerce business, we receive hundreds of millions of pieces of customer feedback through different text communication channels such as product reviews. This feedback can contain rich information regarding customers’ dissatisfaction with the quality of goods and services. To harness such information to better serve customers, in this paper we created a machine learning approach to automatically identify product issues and uncover root causes from customer feedback text. We identify issues at two levels: coarse grained (L-Coarse) and fine grained (L-Granular). We formulate this multi-level product issue identification problem as a seq2seq language generation problem. Specifically, we utilize transformer-based seq2seq models due to their versatility and strong transfer-learning capability. We demonstrate that our approach is label efficient and outperforms traditional approaches such as the multi-class multi-label classification formulation. Based on human evaluation, our fine-tuned model achieves 82.1% and 95.4% of human-level performance for L-Coarse and L-Granular issue identification, respectively. Furthermore, our experiments illustrate that the model can generalize to identify unseen L-Granular issues.
2022.ecnlp-1.3
liu-etal-2022-leveraging
+ 10.18653/v1/2022.ecnlp-1.3
Data Quality Estimation Framework for Faster Tax Code Classification
@@ -64,6 +67,7 @@
This paper describes a novel framework to estimate the data quality of a collection of product descriptions to identify required relevant information for accurate product listing classification for tax-code assignment. Our Data Quality Estimation (DQE) framework consists of a Question Answering (QA) based attribute value extraction model to identify missing attributes and a classification model to identify bad quality records. We show that our framework can accurately predict the quality of product descriptions. In addition to identifying low-quality product listings, our framework can also generate a detailed report at a category level showing missing product information resulting in a better customer experience.
2022.ecnlp-1.4
kondadadi-etal-2022-data
+ 10.18653/v1/2022.ecnlp-1.4
CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost
@@ -77,6 +81,7 @@
Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive and meta learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on deidentified commercial voice assistant datasets and demonstrate that our model outperforms several SOTA approaches.
2022.ecnlp-1.5
dong-etal-2022-cml
+ 10.18653/v1/2022.ecnlp-1.5
Improving Relevance Quality in Product Search using High-Precision Query-Product Semantic Similarity
@@ -92,6 +97,7 @@
Ensuring relevance quality in product search is a critical task as it impacts the customer’s ability to find intended products in the short-term as well as the general perception and trust of the e-commerce system in the long term. In this work we leverage a high-precision cross-encoder BERT model for semantic similarity between customer query and products and survey its effectiveness for three ranking applications where offline-generated scores could be used: (1) as an offline metric for estimating relevance quality impact, (2) as a re-ranking feature covering head/torso queries, and (3) as a training objective for optimization. We present results on effectiveness of this strategy for the large e-commerce setting, which has general applicability for choice of other high-precision models and tasks in ranking.
2022.ecnlp-1.6
bagheri-garakani-etal-2022-improving
+ 10.18653/v1/2022.ecnlp-1.6
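A minimal sketch of offline query-product scoring with a cross-encoder; the public checkpoint named here is illustrative, not the authors' production model:

    # Sketch: score query-product pairs with a cross-encoder so the scores
    # can be used offline as a relevance metric, re-ranking feature, or
    # training objective.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/stsb-roberta-base")
    pairs = [("wireless earbuds", "Bluetooth 5.0 in-ear headphones"),
             ("wireless earbuds", "USB-C charging cable")]
    scores = model.predict(pairs)   # higher = more semantically similar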
Comparative Snippet Generation
@@ -103,6 +109,7 @@
2022.ecnlp-1.7
jain-etal-2022-comparative
wing-nus/comparative-snippet-generation-dataset
+ 10.18653/v1/2022.ecnlp-1.7
Textual Content Moderation in C2C Marketplace
@@ -113,6 +120,7 @@
Automatic monitoring systems for inappropriate user-generated messages have been found to be effective in reducing human operation costs in Consumer to Consumer (C2C) marketplace services, in which customers send messages directly to other customers. We propose a lightweight neural network that takes a conversation as input, which we deployed to a production service. Our results show that the system reduced the human operation costs to less than one-sixth compared to the conventional rule-based monitoring at Mercari.
2022.ecnlp-1.8
shido-etal-2022-textual
+ 10.18653/v1/2022.ecnlp-1.8
Spelling Correction using Phonetics in E-commerce Search
@@ -127,6 +135,7 @@
In E-commerce search, spelling correction plays an important role to find desired products for customers in processing user-typed search queries. However, resolving phonetic errors is a critical but much overlooked area. The query with phonetic spelling errors tends to appear correct based on pronunciation but is nonetheless inaccurate in spelling (e.g., “bluetooth sound system” vs. “blutut sant sistam”) with numerous noisy forms and sparse occurrences. In this work, we propose a generalized spelling correction system integrating phonetics to address phonetic errors in E-commerce search without additional latency cost. Using India (IN) E-commerce market for illustration, the experiment shows that our proposed phonetic solution significantly improves the F1 score by 9%+ and recall of phonetic errors by 8%+. This phonetic spelling correction system has been deployed to production, currently serving hundreds of millions of customers.
2022.ecnlp-1.9
yang-etal-2022-spelling
+ 10.18653/v1/2022.ecnlp-1.9
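A minimal sketch of phonetic-key candidate lookup, using Metaphone from the jellyfish package as a stand-in for the system's phonetic encoding; the vocabulary is a placeholder:

    # Sketch: bucket catalogue terms by phonetic key, then retrieve
    # correction candidates for a possibly misspelled query token.
    from collections import defaultdict
    import jellyfish

    vocabulary = ["bluetooth", "sound", "system"]   # placeholder terms
    phonetic_index = defaultdict(list)
    for word in vocabulary:
        phonetic_index[jellyfish.metaphone(word)].append(word)

    def phonetic_candidates(token):
        return phonetic_index.get(jellyfish.metaphone(token), [])

    print([phonetic_candidates(t) for t in "blutut sant sistam".split()])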
Logical Reasoning for Task Oriented Dialogue Systems
@@ -139,6 +148,7 @@
In recent years, large pretrained models have been used in dialogue systems to improve task completion rates. However, the lack of reasoning capabilities in dialogue platforms makes it difficult to provide relevant and fluent responses, unless the designers of a conversational experience spend considerable time implementing these capabilities in external rule-based modules. In this work, we propose a novel method to fine-tune pretrained transformer models such as RoBERTa and T5 to reason over a set of facts in a given dialogue context. Our method includes a synthetic data generation mechanism which helps the model learn logical relations, such as comparisons between lists of numerical values, inverse relations (and negation), inclusion and exclusion for categorical attributes, the application of combinations of attributes over both numerical and categorical values, and spoken forms of numerical values, without the need for additional training data. We show that the transformer-based model can perform logical reasoning to answer questions when the dialogue context contains all the required information; otherwise, it is able to extract appropriate constraints to pass to downstream components (e.g. a knowledge base) when partial information is available. We observe that transformer-based models such as UnifiedQA-T5 can be fine-tuned to perform logical reasoning (such as comparisons of numerical and categorical attributes) over attributes seen at training time (e.g., accuracy of 90%+ for comparison of smaller than kmax=5 values over a held-out test dataset).
2022.ecnlp-1.10
beygi-etal-2022-logical
+ 10.18653/v1/2022.ecnlp-1.10
CoVA: Context-aware Visual Attention for Webpage Information Extraction
@@ -153,6 +163,7 @@
kumar-etal-2022-cova
kevalmorabia97/cova-web-object-detection
CoVA
+ 10.18653/v1/2022.ecnlp-1.11
Product Titles-to-Attributes As a Text-to-Text Task
@@ -162,6 +173,7 @@
Online marketplaces use attribute-value pairs, such as brand, size, size type, color, etc., to help define important and relevant facts about a listing. These help buyers to curate their search results using attribute filtering and overall create a richer experience. Despite their critical importance for listings’ discoverability, getting sellers to input tens of different attribute-value pairs per listing is costly and often results in missing information. This can later translate into the unnecessary removal of relevant listings from the search results when buyers filter by attribute values. In this paper we demonstrate using a Text-to-Text hierarchical multi-label ranking model framework to predict the most relevant attributes per listing, along with their expected values, using historic user behavioral data. This solution helps sellers by allowing them to focus on verifying information for attributes that are likely to be used by buyers, and thus increase the expected recall for their listings. Specifically for eBay’s case, we show that using this model can improve the relevancy of the attribute extraction process by 33.2% compared to the current highly-optimized production system. Apart from the empirical contribution, the highly generalized nature of the framework presented in this paper makes it relevant for many high-volume search-driven websites.
2022.ecnlp-1.12
fuchs-acriche-2022-product
+ 10.18653/v1/2022.ecnlp-1.12
Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices
@@ -176,6 +188,7 @@
2022.ecnlp-1.13
shen-etal-2022-product
AmazonQA
+ 10.18653/v1/2022.ecnlp-1.13
semiPQA: A Study on Product Question Answering over Semi-structured Data
@@ -192,6 +205,7 @@
Natural Questions
NewsQA
SQuAD
+ 10.18653/v1/2022.ecnlp-1.14
Improving Specificity in Review Response Generation with Data-Driven Data Filtering
@@ -201,6 +215,7 @@
Responding to online customer reviews has become an essential part of successfully managing and growing a business both in e-commerce and the hospitality and tourism sectors. Recently, neural text generation methods intended to assist authors in composing responses have been shown to deliver highly fluent and natural looking texts. However, they also tend to learn a strong, undesirable bias towards generating overly generic, one-size-fits-all outputs to a wide range of inputs. While this often results in ‘safe’, high-probability responses, there are many practical settings in which greater specificity is preferable. In this work we examine the task of generating more specific responses for online reviews in the hospitality domain by identifying generic responses in the training data, filtering them and fine-tuning the generation model. We experiment with a range of data-driven filtering methods and show through automatic and human evaluation that, despite a 60% reduction in the amount of training data, filtering helps to derive models that are capable of generating more specific, useful responses.
2022.ecnlp-1.15
kew-volk-2022-improving
+ 10.18653/v1/2022.ecnlp-1.15
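One plausible instance of such data-driven filtering (an assumption, not necessarily the authors' best-performing method): drop training responses whose mean TF-IDF cosine similarity to the rest of the corpus is high, i.e. the overly generic ones; the threshold is illustrative:

    # Sketch: filter near-duplicate, generic responses before fine-tuning.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    responses = ["Thank you for your feedback!",
                 "Thanks so much for your feedback!",
                 "We are sorry the sauna was closed during your stay."]

    X = TfidfVectorizer().fit_transform(responses)
    sim = cosine_similarity(X)
    np.fill_diagonal(sim, 0.0)
    genericness = sim.mean(axis=1)          # high = similar to many others
    keep = [r for r, g in zip(responses, genericness) if g < 0.3]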
Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
@@ -211,6 +226,7 @@
Although most studies have treated attribute value extraction (AVE) as named entity recognition, these approaches are not practical on real-world e-commerce platforms because they perform poorly and require canonicalization of the extracted values. Furthermore, since the values needed for actual services are static for many attributes, extraction of new values is not always necessary. Given the above, we formalize AVE as extreme multi-label classification (XMC). A major problem in solving AVE as XMC is that the distribution between positive and negative labels for products is heavily imbalanced. To mitigate the negative impact of such a biased distribution, we propose label masking, a simple and effective method to reduce the number of negative labels in training. We exploit the attribute taxonomy designed for e-commerce platforms to determine which labels are negative for products. Experimental results using a dataset collected from a Japanese e-commerce platform demonstrate that label masking improves micro and macro F_1 scores by 3.38 and 23.20 points, respectively.
2022.ecnlp-1.16
chen-etal-2022-extreme
+ 10.18653/v1/2022.ecnlp-1.16
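A minimal sketch of the label-masking idea: negative labels that the attribute taxonomy rules out are simply excluded from the binary cross-entropy. Shapes and the mask itself are placeholders:

    # Sketch: masked multi-label loss for XMC-style attribute value extraction.
    import torch
    import torch.nn.functional as F

    logits = torch.randn(2, 6)              # (products, attribute-value labels)
    targets = torch.zeros(2, 6)
    targets[0, 1] = 1.0                     # gold values

    # 1 = keep in loss (positives + taxonomy-valid negatives), 0 = masked out
    loss_mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                              [0, 0, 1, 1, 1, 0]], dtype=torch.float)

    per_label = F.binary_cross_entropy_with_logits(logits, targets,
                                                   reduction="none")
    loss = (per_label * loss_mask).sum() / loss_mask.sum()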
Enhanced Representation with Contrastive Loss for Long-Tail Query Classification in e-commerce
@@ -222,6 +238,7 @@
Query classification is a fundamental task in an e-commerce search engine, which assigns one or multiple predefined product categories in response to each search query. Taking click-through logs as training data in deep learning methods is a common and effective approach for query classification. However, the frequency distribution of queries typically has long-tail property, which means that there are few logs for most of the queries. The lack of reliable user feedback information results in worse performance of long-tail queries compared with frequent queries. To solve the above problem, we propose a novel method that leverages an auxiliary module to enhance the representations of long-tail queries by taking advantage of reliable supervised information of variant frequent queries. The long-tail queries are guided by the contrastive loss to obtain category-aligned representations in the auxiliary module, where the variant frequent queries serve as anchors in the representation space. We train our model with real-world click data from AliExpress and conduct evaluation on both offline labeled data and online AB test. The results and further analysis demonstrate the effectiveness of our proposed method.
2022.ecnlp-1.17
zhu-etal-2022-enhanced
+ 10.18653/v1/2022.ecnlp-1.17
Domain-specific knowledge distillation yields smaller and better models for conversational commerce
@@ -239,6 +256,7 @@
We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average a 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.
2022.ecnlp-1.18
howell-etal-2022-domain
+ 10.18653/v1/2022.ecnlp-1.18
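A minimal sketch of the standard soft-target distillation loss (Hinton et al., 2015) that this kind of setup builds on; the logits and temperature are placeholders:

    # Sketch: KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as is conventional.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher,
                        reduction="batchmean") * (T * T)

    loss = distillation_loss(torch.randn(4, 30522), torch.randn(4, 30522))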
OpenBrand: Open Brand Value Extraction from Product Descriptions
@@ -250,6 +268,7 @@
2022.ecnlp-1.19
sabeh-etal-2022-openbrand
kassemsabeh/open-brand
+ 10.18653/v1/2022.ecnlp-1.19
Robust Product Classification with Instance-Dependent Noise
@@ -259,6 +278,7 @@
Noisy labels in large e-commerce product data (i.e., product items placed into incorrect categories) are a critical issue for the product categorization task because they are unavoidable, non-trivial to remove and degrade prediction performance significantly. Training a product title classification model that is robust to noisy labels in the data is very important for making product classification applications more practical. In this paper, we study the impact of instance-dependent noise on the performance of product title classification by comparing our data denoising algorithm and different noise-resistant training algorithms which were designed to prevent a classifier model from over-fitting to noise. We develop a simple yet effective Deep Neural Network for product title classification to use as a base classifier. Along with recent methods for simulating instance-dependent noise, we propose a novel noise simulation algorithm based on product title similarity. Our experiments cover multiple datasets, various noise methods and different training solutions. The results uncover the limits of the classification task when the noise rate is not negligible and the data distribution is highly skewed.
2022.ecnlp-1.20
nguyen-khatwani-2022-robust
+ 10.18653/v1/2022.ecnlp-1.20
Structured Extraction of Terms and Conditions from German and English Online Shops
@@ -270,6 +290,7 @@
2022.ecnlp-1.21
schamel-etal-2022-structured
sebischair/lowestcommonancestorextractor
+ 10.18653/v1/2022.ecnlp-1.21
“Does it come in black?” CLIP-like models are zero-shot recommenders
@@ -282,6 +303,7 @@
Product discovery is a crucial component of online shopping. However, item-to-item recommendations today do not allow users to explore changes along selected dimensions: given a query item, can a model suggest something similar but in a different color? We consider item recommendations of a comparative nature (e.g. “something darker”) and show how CLIP-based models can support this use case in a zero-shot manner. Leveraging a large model built for fashion, we introduce GradREC and its industry potential, and offer a first rounded assessment of its strengths and weaknesses.
2022.ecnlp-1.22
chia-etal-2022-come
+ 10.18653/v1/2022.ecnlp-1.22
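A speculative sketch of a comparative query in a shared image-text space, assuming a GradREC-style traversal along a text-difference direction; the model name, step size, and catalogue embeddings are all placeholders, not the paper's exact procedure:

    # Sketch: move a product's image embedding along the "darker minus lighter"
    # text direction and rank the catalogue by similarity to the moved point.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    clip = SentenceTransformer("clip-ViT-B-32")
    t = clip.encode(["a dark product photo", "a light product photo"])
    direction = t[0] - t[1]

    catalog_emb = np.random.randn(100, 512).astype("float32")  # image embeddings
    query_emb = catalog_emb[0] + 0.5 * direction               # "something darker"
    scores = catalog_emb @ query_emb                           # rank by similarity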
Clause Topic Classification in German and English Standard Form Contracts
@@ -291,6 +313,7 @@
So-called standard form contracts, i.e. contracts that are drafted unilaterally by one party, like the terms and conditions of online shops or the terms of service of social networks, are cornerstones of our modern economy, and their processing is therefore of significant practical value. Often, the sheer size of these contracts allows the drafting party to hide unfavourable terms from the other party. In this paper, we compare different approaches for automatically classifying the topics of clauses in standard form contracts, based on a dataset of more than 6,000 clauses from more than 170 contracts, which we collected from German and English online shops and annotated based on a taxonomy of clause topics that we developed together with legal experts. We show that, in our comparison of seven approaches, from simple keyword matching to transformer language models, BERT performed best, with an F1-score of up to 0.91; however, much simpler and computationally cheaper models like logistic regression also achieved similarly good results of up to 0.87.
2022.ecnlp-1.23
braun-matthes-2022-clause
+ 10.18653/v1/2022.ecnlp-1.23
Investigating the Generative Approach for Question Answering in E-Commerce
@@ -302,6 +325,7 @@
Many e-commerce websites provide Product-related Question Answering (PQA) platforms where potential customers can ask questions related to a product, and other consumers can post an answer to that question based on their experience. Recently, there has been growing interest in providing automated responses to product questions. In this paper, we investigate the suitability of the generative approach for PQA. We use state-of-the-art generative models proposed by Deng et al. (2020) and Lu et al. (2020) for this purpose. On closer examination, we find several drawbacks in this approach: (1) input reviews are not always utilized significantly for answer generation, (2) the performance of the models is abysmal when answering numerical questions, and (3) many of the generated answers contain phrases like “I do not know”, which are taken from the reference answers in the training data and do not convey any information to the customer. Although these approaches achieve a high ROUGE score, it does not reflect these shortcomings of the generated answers. We hope that our analysis will lead to more rigorous PQA approaches, and that future research will focus on addressing these shortcomings in PQA.
2022.ecnlp-1.24
roy-etal-2022-investigating
+ 10.18653/v1/2022.ecnlp-1.24
Utilizing Cross-Modal Contrastive Learning to Improve Item Categorization BERT Model
@@ -311,6 +335,7 @@
Item categorization (IC) is a core natural language processing (NLP) task in e-commerce. As a special text classification task, fine-tuning pre-trained models, e.g., BERT, has become a mainstream solution. To improve IC performance further, other product metadata, e.g., product images, have been used. Although multimodal IC (MIC) systems show higher performance, expanding from processing text to more resource-demanding images has a large engineering impact and hinders the deployment of such dual-input MIC systems. In this paper, we propose a new way of using product images to improve a text-only IC model: leveraging cross-modal signals between products’ titles and associated images to adapt BERT models in a self-supervised learning (SSL) way. Our experiments on the three genres in the public Amazon product dataset show that the proposed method yields better prediction accuracy and macro-F1 values than simply using the original BERT. Moreover, the proposed method can keep using existing text-only IC inference implementations and shows a resource advantage over the deployment of a dual-input MIC system.
2022.ecnlp-1.25
chen-chou-2022-utilizing
+ 10.18653/v1/2022.ecnlp-1.25
Towards Generalizeable Semantic Product Search by Text Similarity Pre-training on Search Click Logs
@@ -324,6 +349,7 @@
Recently, semantic search has been successfully applied to e-commerce product search, and the learned semantic space for query and product encoding is expected to generalize well to unseen queries or products. Yet whether generalization can conveniently emerge has not been thoroughly studied in this domain so far. In this paper, we examine several general-domain and domain-specific pre-trained RoBERTa variants and discover that general-domain fine-tuning does not really help generalization, which aligns with the findings of prior art; yet proper domain-specific fine-tuning with clickstream data can lead to better model generalization, based on a bucketed analysis of manually annotated query-product relevance data.
2022.ecnlp-1.26
liu-etal-2022-towards
+ 10.18653/v1/2022.ecnlp-1.26
Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions?
@@ -334,6 +360,7 @@
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales. While not all sellers are capable of providing such interesting descriptions, a language generation system can be a source of such descriptions at scale, and can potentially assist sellers in improving their product descriptions. Most previous work has addressed this task with statistical approaches (Wang et al., 2017), limited attributes such as titles (Chen et al., 2019; Chan et al., 2020), and a focus on only one product type (Wang et al., 2017; Munigala et al., 2018; Hong et al., 2021). In this paper, we jointly train on image features and 10 text attributes across 23 diverse product types, with two different target text types with different writing styles: bullet points and paragraph descriptions. Our findings suggest that multimodal training with modern pretrained language models can generate fluent and persuasive advertisements, but these are less faithful and informative, especially out of domain.
2022.ecnlp-1.27
koto-etal-2022-pretrained
+ 10.18653/v1/2022.ecnlp-1.27
A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data
@@ -343,6 +370,7 @@
Automatic Speech Recognition (ASR) has been dominated by deep learning-based end-to-end speech recognition models. These approaches require large amounts of labeled data in the form of audio-text pairs. Moreover, these models are more susceptible to domain shift than traditional models. It is common practice to train generic ASR models and then adapt them to target domains using comparatively smaller datasets. We consider a more extreme case of domain adaptation where only a text corpus is available. In this work, we propose a simple baseline technique for domain adaptation in end-to-end speech recognition models. We convert the text-only corpus to audio data using a single-speaker Text to Speech (TTS) engine. The parallel data in the target domain is then used to fine-tune the final dense layer of generic ASR models. We show that single-speaker synthetic TTS data coupled with fine-tuning only the final dense layer provides reasonable improvements in word error rates. We use text data from the address and e-commerce search domains to show the effectiveness of our low-cost baseline approach on CTC and attention-based models.
2022.ecnlp-1.28
joshi-singh-2022-simple
+ 10.18653/v1/2022.ecnlp-1.28
Lot or Not: Identifying Multi-Quantity Offerings in E-Commerce
@@ -352,6 +380,7 @@
The term lot is defined to mean an offering that contains a collection of multiple identical items for sale. In a large online marketplace, lot offerings play an important role, allowing buyers and sellers to set price levels to optimally balance supply and demand needs. In spite of their central role, platforms often struggle to identify lot offerings, since explicit lot status identification is frequently not provided by sellers. The ability to identify lot offerings plays a key role in many fundamental tasks, from matching offerings to catalog products, through ranking search results, to providing effective pricing guidance. In this work, we seek to determine the lot status (and lot size) of each offering, in order to facilitate an improved buyer experience, while reducing the friction for sellers posting new offerings. We demonstrate experimentally the ability to accurately classify offerings as lots and predict their lot size using only the offer title, by adapting state-of-the-art natural language techniques to the lot identification problem.
2022.ecnlp-1.29
lavee-guy-2022-lot
+ 10.18653/v1/2022.ecnlp-1.29
diff --git a/data/xml/2022.fever.xml b/data/xml/2022.fever.xml
index 43438c9852..be78c35d52 100644
--- a/data/xml/2022.fever.xml
+++ b/data/xml/2022.fever.xml
@@ -34,6 +34,7 @@
IIRC
QASC
eQASC
+ 10.18653/v1/2022.fever-1.1
Heterogeneous-Graph Reasoning and Fine-Grained Aggregation for Fact Checking
@@ -44,6 +45,7 @@
2022.fever-1.2
lin-fu-2022-heterogeneous
FEVER
+ 10.18653/v1/2022.fever-1.2
Distilling Salient Reviews with Zero Labels
@@ -57,6 +59,7 @@
Many people read online reviews to learn about real-world entities of interest. However, the majority of reviews only describe general experiences and opinions of the customers, and may not reveal facts that are specific to the entity being reviewed. In this work, we focus on a novel task of mining, from a review corpus, sentences that are unique to each entity. We refer to this task as Salient Fact Extraction. Salient facts are extremely scarce due to their very nature. Consequently, collecting labeled examples for training supervised models is tedious and cost-prohibitive. To alleviate this scarcity problem, we develop an unsupervised method, ZL-Distiller, which leverages contextual language representations of the reviews and their distributional patterns to identify salient sentences about entities. Our experiments on multiple domains (hotels, products, and restaurants) show that ZL-Distiller achieves state-of-the-art performance and further boosts the performance of other supervised/unsupervised algorithms for the task. Furthermore, we show that salient sentences mined by ZL-Distiller provide unique and detailed information about entities, which benefits downstream NLP applications including question answering and summarization.
2022.fever-1.3
huang-etal-2022-distilling
+ 10.18653/v1/2022.fever-1.3
Automatic Fake News Detection: Are current models “fact-checking” or“gut-checking”?
@@ -71,6 +74,7 @@
kelk-etal-2022-automatic
PolitiFact
Snopes
+ 10.18653/v1/2022.fever-1.4
A Semantics-Aware Approach to Automated Claim Verification
@@ -82,6 +86,7 @@
2022.fever-1.5
calvo-figueras-etal-2022-semantics
FEVER
+ 10.18653/v1/2022.fever-1.5
PHEMEPlus: Enriching Social Media Rumour Verification with External Evidence
@@ -95,6 +100,7 @@
2022.fever-1.6
dougrez-lewis-etal-2022-phemeplus
FEVER
+ 10.18653/v1/2022.fever-1.6
XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference
@@ -108,6 +114,7 @@
2022.fever-1.7
minhas-etal-2022-xinfotabs
TabFact
+ 10.18653/v1/2022.fever-1.7
Neural Machine Translation for Fact-checking Temporal Claims
@@ -119,6 +126,7 @@
Computational fact-checking aims at supporting the verification process of textual claims by exploiting trustworthy sources. However, there are large classes of complex claims that cannot be automatically verified, for instance those related to temporal reasoning. To this end, in this work, we focus on the verification of economic claims against time series sources. Starting from given textual claims in natural language, we propose a neural machine translation approach to produce corresponding queries expressed in a recently proposed temporal fragment of the Datalog language. The adopted deep neural approach shows promising preliminary results for the translation of 10 categories of claims extracted from real use cases.
2022.fever-1.8
mori-etal-2022-neural
+ 10.18653/v1/2022.fever-1.8
diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml
index 11f3729ec8..d88f428e60 100644
--- a/data/xml/2022.findings.xml
+++ b/data/xml/2022.findings.xml
@@ -30,6 +30,7 @@
Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes for a better English BERT model. For the Chinese language, however, there are no subwords because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. This difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT. To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably.
2022.findings-acl.1
dai-etal-2022-whole
+ 10.18653/v1/2022.findings-acl.1
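A minimal sketch of whole word masking for Chinese, assuming word boundaries come from an external segmenter; the masking rate and example words are placeholders:

    # Sketch: WWM masks every character of a sampled word together, in
    # contrast to CLM, which samples characters independently.
    import random

    def whole_word_mask(words, mask_rate=0.15, mask_token="[MASK]"):
        chars, labels = [], []
        for word in words:                 # e.g. ["word1", "word2"] from a segmenter
            masked = random.random() < mask_rate
            for ch in word:                # mask all characters of the word at once
                labels.append(ch if masked else None)
                chars.append(mask_token if masked else ch)
        return chars, labels

    tokens, mlm_labels = whole_word_mask(["语言", "模型", "很", "强大"])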
Compilable Neural Code Generation with Compiler Feedback
@@ -48,6 +49,7 @@
2022.findings-acl.2
wang-etal-2022-compilable
CodeSearchNet
+ 10.18653/v1/2022.findings-acl.2
Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis
@@ -59,6 +61,7 @@
Aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to determine the sentiment polarity towards targeted aspect terms occurring in a sentence. The development of the ABSA task is greatly hindered by the lack of annotated data. To tackle this, prior works have studied the possibility of utilizing sentiment analysis (SA) datasets to assist in training the ABSA model, primarily via pretraining or multi-task learning. In this article, we follow this line and, for the first time, apply the Pseudo-Label (PL) method to merge the two homogeneous tasks. While it seems straightforward to use generated pseudo labels to handle this case of label granularity unification for two highly related tasks, we identify its major challenge in this paper and propose a novel framework, dubbed Dual-granularity Pseudo Labeling (DPL). Further, similar to PL, we regard DPL as a general framework capable of combining other prior methods in the literature. Through extensive experiments, DPL achieves state-of-the-art performance on standard benchmarks, significantly surpassing prior work.
2022.findings-acl.3
zhang-etal-2022-towards
+ 10.18653/v1/2022.findings-acl.3
Input-specific Attention Subnetworks for Adversarial Detection
@@ -78,6 +81,7 @@
QNLI
SNLI
SST
+ 10.18653/v1/2022.findings-acl.4
RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction
@@ -92,6 +96,7 @@
declare-lab/relationprompt
FewRel
Wiki-ZSL
+ 10.18653/v1/2022.findings-acl.5
Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?
@@ -108,6 +113,7 @@
2022.findings-acl.6.software.zip
lee-etal-2022-pre
PMIndia
+ 10.18653/v1/2022.findings-acl.6
Multi-Scale Distribution Deep Variational Autoencoder for Explanation Generation
@@ -120,6 +126,7 @@
Generating explanations for recommender systems is essential for improving their transparency, as users often wish to understand the reason for receiving a specified recommendation. Previous methods mainly focus on improving the generation quality, but often produce generic explanations that fail to incorporate user- and item-specific details. To resolve this problem, we present Multi-Scale Distribution Deep Variational Autoencoders (MVAE). These are deep hierarchical VAEs with a prior network that eliminates noise while retaining meaningful signals in the input, coupled with a recognition network serving as the source of information to guide the learning of the prior network. Further, the Multi-scale distribution Learning Framework (MLF), along with a Target Tracking Kullback-Leibler divergence (TKL) mechanism, is proposed to employ multiple KL divergences at different scales for more effective learning. Extensive empirical experiments demonstrate that our methods can generate explanations with concrete, input-specific contents.
2022.findings-acl.7
cai-etal-2022-multi
+ 10.18653/v1/2022.findings-acl.7
Dual Context-Guided Continuous Prompt Tuning for Few-Shot Learning
@@ -133,6 +140,7 @@
The prompt-based paradigm has shown competitive performance in many NLP tasks. However, its success heavily depends on prompt design, and its effectiveness varies with the model and training data. In this paper, we propose a novel dual context-guided continuous prompt (DCCP) tuning method. To explore the rich contextual information in language structure and close the gap between discrete prompt tuning and continuous prompt tuning, DCCP introduces two auxiliary training objectives and constructs input in a pair-wise fashion. Experimental results demonstrate that our method is applicable to many NLP tasks, and can often outperform existing prompt tuning methods by a large margin in the few-shot setting.
2022.findings-acl.8
zhou-etal-2022-dual
+ 10.18653/v1/2022.findings-acl.8
Extract-Select: A Span Selection Framework for Nested Named Entity Recognition with Generative Adversarial Training
@@ -146,6 +154,7 @@
Nested named entity recognition (NER) is a task in which named entities may overlap with each other. Span-based approaches regard nested NER as a two-stage span enumeration and classification task, and thus have an innate ability to handle it. However, they face the problems of error propagation, ignorance of span boundaries, difficulty in recognizing long entities, and a requirement for large-scale annotated data. In this paper, we propose Extract-Select, a span selection framework for nested NER, to tackle these problems. Firstly, we introduce a span selection framework in which nested entities with different input categories are separately extracted by an extractor, naturally avoiding the error propagation of two-stage span-based approaches. In the inference phase, the trained extractor selects final results specific to the given entity category. Secondly, we propose a hybrid selection strategy in the extractor, which not only makes full use of span boundaries but also improves the recognition of long entities. Thirdly, we design a discriminator to evaluate the extraction result, and train both the extractor and the discriminator with generative adversarial training (GAT). The use of GAT greatly reduces the dependence on dataset size. Experimental results on four benchmark datasets demonstrate that Extract-Select outperforms competitive nested NER models, obtaining state-of-the-art results. The proposed model also performs well when less labeled data is available, proving the effectiveness of GAT.
2022.findings-acl.9
huang-etal-2022-extract
+ 10.18653/v1/2022.findings-acl.9
Controlled Text Generation Using Dictionary Prior in Variational Autoencoders
@@ -161,6 +170,7 @@
fang-etal-2022-controlled
Penn Treebank
SNLI
+ 10.18653/v1/2022.findings-acl.10
Challenges to Open-Domain Constituency Parsing
@@ -175,6 +185,7 @@
yang-etal-2022-challenges
ringos/multi-domain-parsing-analysis
Penn Treebank
+ 10.18653/v1/2022.findings-acl.11
Going “Deeper”: Structured Sememe Prediction via Transformer with Tree Attention
@@ -188,6 +199,7 @@
2022.findings-acl.12.software.zip
ye-etal-2022-going
thunlp/stg
+ 10.18653/v1/2022.findings-acl.12
Table-based Fact Verification with Self-adaptive Mixture of Experts
@@ -201,6 +213,7 @@
zhou-etal-2022-table
thumlp/samoe
TabFact
+ 10.18653/v1/2022.findings-acl.13
Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics
@@ -215,6 +228,7 @@
Current practices in metric evaluation focus on a single dataset, e.g., the Newstest dataset in each year’s WMT Metrics Shared Task. However, in this paper, we qualitatively and quantitatively show that the performance of metrics is sensitive to data: the ranking of metrics varies when the evaluation is conducted on different datasets. We then investigate two potential hypotheses, i.e., insignificant data points and deviation from the i.i.d. assumption, which may account for this data variance. In conclusion, our findings suggest that when evaluating automatic translation metrics, researchers should take data variance into account and be cautious about reporting results on unreliable datasets, as they may be inconsistent with results on most other datasets.
2022.findings-acl.14
xiang-etal-2022-investigating
+ 10.18653/v1/2022.findings-acl.14
Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information
@@ -231,6 +245,7 @@
qi-etal-2022-sememe
thunlp/msgi
ImageNet
+ 10.18653/v1/2022.findings-acl.15
Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding
@@ -244,6 +259,7 @@
2022.findings-acl.16
wang-etal-2022-query
MAVEN
+ 10.18653/v1/2022.findings-acl.16
LEVEN: A Large-Scale Chinese Legal Event Detection Dataset
@@ -264,6 +280,7 @@
yao-etal-2022-leven
thunlp/leven
MAVEN
+ 10.18653/v1/2022.findings-acl.17
Analyzing Dynamic Adversarial Training Data in the Limit
@@ -278,6 +295,7 @@
facebookresearch/dadc-limit
FEVER
SNLI
+ 10.18653/v1/2022.findings-acl.18
AbductionRules: Training Transformers to Explain Unexpected Inputs
@@ -291,6 +309,7 @@
young-etal-2022-abductionrules
strong-ai-lab/abductionrules
ProofWriter
+ 10.18653/v1/2022.findings-acl.19
On the Importance of Data Size in Probing Fine-tuned Models
@@ -306,6 +325,7 @@
GLUE
MRPC
SST
+ 10.18653/v1/2022.findings-acl.20
RuCCoN: Clinical Concept Normalization in Russian
@@ -324,6 +344,7 @@
2022.findings-acl.21
nesterov-etal-2022-ruccon
XL-BEL
+ 10.18653/v1/2022.findings-acl.21
A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
@@ -337,6 +358,7 @@
2022.findings-acl.22
tan-etal-2022-sentence
namco0816/pt-bert
+ 10.18653/v1/2022.findings-acl.22
Eider: Empowering Document-level Relation Extraction with Efficient Evidence Extraction and Inference-stage Fusion
@@ -351,6 +373,7 @@
xie-etal-2022-eider
veronicium/eider
DocRED
+ 10.18653/v1/2022.findings-acl.23
Meta-X_{NLG}: A Meta-Learning Approach Based on Language Clustering for Zero-Shot Cross-Lingual Transfer and Generation
@@ -366,6 +389,7 @@
TyDi QA
WikiLingua
XQuAD
+ 10.18653/v1/2022.findings-acl.24
MR-P: A Parallel Decoding Algorithm for Iterative Refinement Non-Autoregressive Translation
@@ -375,6 +399,7 @@
Non-autoregressive translation (NAT) predicts all the target tokens in parallel and significantly speeds up the inference process. The Conditional Masked Language Model (CMLM) is a strong NAT baseline. It decodes with the Mask-Predict algorithm, which iteratively refines the output. Most work on CMLM focuses on the model structure and the training objective; however, the decoding algorithm is equally important. We propose a simple, effective, and easy-to-implement decoding algorithm that we call MaskRepeat-Predict (MR-P). The MR-P algorithm gives higher priority to consecutive repeated tokens when selecting tokens to mask for the next iteration, and stops iterating once the target tokens converge. We conduct extensive experiments on six translation directions with varying data sizes. The results show that MR-P significantly improves performance with the same model parameters. Specifically, we achieve a BLEU increase of 1.39 points on the WMT’14 En-De translation task.
2022.findings-acl.25
cheng-zhang-2022-mr
+ 10.18653/v1/2022.findings-acl.25
Open Relation Modeling: Learning to Define Relations between Entities
@@ -388,6 +413,7 @@
2022.findings-acl.26.software.zip
huang-etal-2022-open
jeffhj/open-relation-modeling
+ 10.18653/v1/2022.findings-acl.26
A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots
@@ -409,6 +435,7 @@
SSD_NAME
SSD_PHONE
SSD_PLATE
+ 10.18653/v1/2022.findings-acl.27
Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction
@@ -424,6 +451,7 @@
BREAK
GEM
SPLASH
+ 10.18653/v1/2022.findings-acl.28
MINER: Multi-Interest Matching Network for News Recommendation
@@ -440,6 +468,7 @@
2022.findings-acl.29
li-etal-2022-miner
MIND
+ 10.18653/v1/2022.findings-acl.29
KSAM: Infusing Multi-Source Knowledge into Dialogue Generation via Knowledge Source Aware Multi-Head Decoding
@@ -451,6 +480,7 @@
Knowledge-enhanced methods have bridged the gap between human beings and machines in generating dialogue responses. However, most previous works seek knowledge from only a single source, and thus often fail to obtain available knowledge because of the insufficient coverage of a single knowledge source. To this end, infusing knowledge from multiple sources has become a trend. This paper proposes a novel approach, Knowledge Source Aware Multi-Head Decoding (KSAM), to infuse multi-source knowledge into dialogue generation more efficiently. Rather than following the traditional single-decoder paradigm, KSAM uses multiple independent source-aware decoder heads to alleviate three challenging problems in infusing multi-source knowledge, namely, the diversity among different knowledge sources, the indefinite knowledge alignment issue, and the insufficient flexibility/scalability in knowledge usage. Experiments on a Chinese multi-source knowledge-aligned dataset demonstrate the superior performance of KSAM against various competitive approaches.
2022.findings-acl.30
wu-etal-2022-ksam
+ 10.18653/v1/2022.findings-acl.30
Towards Responsible Natural Language Annotation for the Varieties of Arabic
@@ -460,6 +490,7 @@
When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance. In this position paper, we make the case for care and attention to such nuances, particularly in dataset annotation, as well as the inclusion of cultural and linguistic expertise in the process. We present a playbook for responsible dataset creation for polyglossic, multidialectal languages. This work is informed by a study on Arabic annotation of social media content.
2022.findings-acl.31
bergman-diab-2022-towards
+ 10.18653/v1/2022.findings-acl.31
Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection
@@ -472,6 +503,7 @@
2022.findings-acl.32
bose-etal-2022-dynamically
tbose20/d-ref
+ 10.18653/v1/2022.findings-acl.32
Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems
@@ -486,6 +518,7 @@
2022.findings-acl.33
tuan-etal-2022-towards
OpenDialKG
+ 10.18653/v1/2022.findings-acl.33
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
@@ -502,6 +535,7 @@
2022.findings-acl.34
zhang-etal-2022-mderank
linhanz/mderank
+ 10.18653/v1/2022.findings-acl.34
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance
@@ -515,6 +549,7 @@
Probing is a popular technique for analyzing whether linguistic information can be captured by a well-trained deep neural model, but it is hard to answer how changes in the encoded linguistic information affect task performance. To this end, we study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality. The key idea is to obtain a set of models that are Pareto-optimal in terms of both objectives. From this viewpoint, we propose a method to obtain such Pareto-optimal models by formalizing the problem as multi-objective optimization. We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performance. Experimental results demonstrate that the proposed method is better than a baseline method. Our empirical findings suggest that some syntactic information is helpful for NLP tasks, whereas encoding more syntactic information does not necessarily lead to better performance, because model architecture is also an important factor.
2022.findings-acl.35
xiang-etal-2022-visualizing
+ 10.18653/v1/2022.findings-acl.35
Efficient Argument Structure Extraction with Transfer Learning and Active Learning
@@ -525,6 +560,7 @@
2022.findings-acl.36
hua-wang-2022-efficient
CDCP
+ 10.18653/v1/2022.findings-acl.36
Plug-and-Play Adaptation for Continuously-updated QA
@@ -541,6 +577,7 @@
lee-etal-2022-plug
Natural Questions
SituatedQA
+ 10.18653/v1/2022.findings-acl.37
Reinforced Cross-modal Alignment for Radiology Report Generation
@@ -552,6 +589,7 @@
2022.findings-acl.38.software.zip
qin-song-2022-reinforced
CheXpert
+ 10.18653/v1/2022.findings-acl.38
What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation
@@ -565,6 +603,7 @@
Deep learning has demonstrated performance advantages in a wide range of natural language processing tasks, including neural machine translation (NMT). Transformer NMT models are typically strengthened by deeper encoder layers, but deepening their decoder layers usually results in failure. In this paper, we first identify the cause of the failure of the deep decoder in the Transformer model. Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT. Specifically, with respect to model structure, we propose a cross-attention drop mechanism to allow the decoder layers to perform their own different roles, to reduce the difficulty of deep-decoder learning. For model training, we propose a collapse reducing training approach to improve the stability and effectiveness of deep-decoder training. We experimentally evaluated our proposed Transformer NMT model structure modification and novel training methods on several popular machine translation benchmarks. The results showed that deepening the NMT model by increasing the number of decoder layers successfully prevented the deepened decoder from degrading to an unconditional language model. In contrast to prior work on deepening an NMT model on the encoder, our method can deepen the model on both the encoder and decoder at the same time, resulting in a deeper model and improved performance.
2022.findings-acl.39
li-etal-2022-works
+ 10.18653/v1/2022.findings-acl.39
SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing
@@ -578,6 +617,7 @@
2022.findings-acl.40
kodali-etal-2022-symcom
LinCE
+ 10.18653/v1/2022.findings-acl.40
HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data
@@ -598,6 +638,7 @@
RecipeQA
SQA
ShARC
+ 10.18653/v1/2022.findings-acl.41
NEWTS: A Corpus for News Topic-Focused Summarization
@@ -608,6 +649,7 @@
Text summarization models are approaching human levels of fidelity. Existing benchmarking corpora provide concordant pairs of full and abridged versions of Web, news or professional content. To date, all summarization datasets operate under a one-size-fits-all paradigm that may not reflect the full range of organic summarization needs. Several recently proposed models (e.g., plug and play language models) have the capacity to condition the generated summaries on a desired range of themes. These capacities remain largely unused and unevaluated as there is no dedicated dataset that would support the task of topic-focused summarization. This paper introduces NEWTS, the first topical summarization corpus, based on the well-known CNN/DailyMail dataset and annotated via online crowd-sourcing. Each source article is paired with two reference summaries, each focusing on a different theme of the source document. We evaluate a representative range of existing techniques and analyze the effectiveness of different prompting methods.
2022.findings-acl.42
bahrainian-etal-2022-newts
+ 10.18653/v1/2022.findings-acl.42
Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis
@@ -618,6 +660,7 @@
Reddit is home to a broad spectrum of political activity, and users signal their political affiliations in multiple ways—from self-declarations to community participation. Frequently, computational studies have treated political users as a single bloc, both in developing models to infer political leaning and in studying political behavior. Here, we test this assumption of political users and show that commonly-used political-inference models do not generalize, indicating heterogeneous types of political users. The models remain imprecise at best for most users, regardless of which sources of data or methods are used. Across a 14-year longitudinal analysis, we demonstrate that the choice in definition of a political user has significant implications for behavioral analysis. Controlling for multiple factors, political users are more toxic on the platform and inter-party interactions are even more toxic—but not all political users behave this way. Last, we identify a subset of political users who repeatedly flip affiliations, showing that these users are the most controversial of all, acting as provocateurs by more frequently bringing up politics, and are more likely to be banned, suspended, or deleted.
2022.findings-acl.43
alkiek-etal-2022-classification
+ 10.18653/v1/2022.findings-acl.43
Toward More Meaningful Resources for Lower-resourced Languages
@@ -631,6 +674,7 @@
lignos-etal-2022-toward
MasakhaNER
WikiAnn
+ 10.18653/v1/2022.findings-acl.44
Better Quality Estimation for Low Resource Corpus Mining
@@ -643,6 +687,7 @@
2022.findings-acl.45.software.zip
kocyigit-etal-2022-better
MLQE
+ 10.18653/v1/2022.findings-acl.45
End-to-End Segmentation-based News Summarization
@@ -654,6 +699,7 @@
2022.findings-acl.46
liu-etal-2022-end
CNN/Daily Mail
+ 10.18653/v1/2022.findings-acl.46
Fast Nearest Neighbor Machine Translation
@@ -669,6 +715,7 @@
2022.findings-acl.47
meng-etal-2022-fast
ShannonAI/fast-knn-nmt
+ 10.18653/v1/2022.findings-acl.47
Extracting Latent Steering Vectors from Pretrained Language Models
@@ -681,6 +728,7 @@
subramani-etal-2022-extracting
nishantsubramani/steering_vectors
StylePTB
+ 10.18653/v1/2022.findings-acl.48
Domain Generalisation of NMT: Fusing Adapters with Leave-One-Domain-Out Training
@@ -692,6 +740,7 @@
Generalising to unseen domains is under-explored and remains a challenge in neural machine translation. Inspired by recent research in parameter-efficient transfer learning from pretrained models, this paper proposes a fusion-based generalisation method that learns to combine domain-specific parameters. To address the challenge of not knowing the test domain at training time, we propose a leave-one-domain-out training strategy that avoids information leakage. Empirical results on three language pairs show that our proposed fusion method outperforms other baselines by up to +0.8 BLEU on average.
2022.findings-acl.49
vu-etal-2022-domain
+ 10.18653/v1/2022.findings-acl.49
Reframing Instructional Prompts to GPTk’s Language
@@ -709,6 +758,7 @@
MC-TACO
QASC
WinoGrande
+ 10.18653/v1/2022.findings-acl.50
Read Top News First: A Document Reordering Approach for Multi-Document News Summarization
@@ -725,6 +775,7 @@
zhaochaocs/mds-dr
CNN/Daily Mail
Multi-News
+ 10.18653/v1/2022.findings-acl.51
Human Language Modeling
@@ -737,6 +788,7 @@
2022.findings-acl.52
soni-etal-2022-human
humanlab/hart
+ 10.18653/v1/2022.findings-acl.52
Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging
@@ -750,6 +802,7 @@
2022.findings-acl.53
hou-etal-2022-inverse
atmahou/promptslottagging
+ 10.18653/v1/2022.findings-acl.53
Cross-Modal Cloze Task: A New Task to Brain-to-Word Decoding
@@ -762,6 +815,7 @@
2022.findings-acl.54
zou-etal-2022-cross
littletreezou/cross-modal-cloze-task
+ 10.18653/v1/2022.findings-acl.54
Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal
@@ -780,6 +834,7 @@
2022.findings-acl.55
gupta-etal-2022-mitigating
WebText
+ 10.18653/v1/2022.findings-acl.55
Domain Representative Keywords Selection: A Probabilistic Approach
@@ -795,6 +850,7 @@
akash-etal-2022-domain
pritomsaha/keyword-selection
AMiner
+ 10.18653/v1/2022.findings-acl.56
Hierarchical Inductive Transfer for Continual Dialogue Learning
@@ -806,6 +862,7 @@
Pre-trained models have achieved excellent performance on the dialogue task. However, as online chit-chat scenarios continually increase, directly fine-tuning these models for each new task not only explodes the capacity of the dialogue system on embedded devices but also causes knowledge forgetting in pre-trained models and knowledge interference among diverse dialogue tasks. In this work, we propose a hierarchical inductive transfer framework to learn and deploy dialogue skills continually and efficiently. First, we introduce the adapter module into pre-trained models for learning new dialogue tasks. As the only trainable module, it allows the dialogue system on embedded devices to acquire new dialogue skills with negligible additional parameters. Then, to alleviate knowledge interference between tasks while still benefiting from the regularization between them, we further design a hierarchical inductive transfer scheme that enables new tasks to use the general knowledge in the base adapter without being misled by the diverse knowledge in task-specific adapters. Empirical evaluation and analysis indicate that our framework obtains comparable performance under a deployment-friendly model capacity.
2022.findings-acl.57
feng-etal-2022-hierarchical
+ 10.18653/v1/2022.findings-acl.57
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
@@ -820,6 +877,7 @@
kushalarora/quantifying_exposure_bias
WikiText-103
WikiText-2
+ 10.18653/v1/2022.findings-acl.58
Question Answering Infused Pre-training of General-Purpose Contextualized Representations
@@ -840,6 +898,7 @@
SQuAD
SST
SearchQA
+ 10.18653/v1/2022.findings-acl.59
Automatic Song Translation for Tonal Languages
@@ -854,6 +913,7 @@
This paper develops automatic song translation (AST) for tonal languages and addresses the unique challenge of aligning words’ tones with melody of a song in addition to conveying the original meaning. We propose three criteria for effective AST—preserving meaning, singability and intelligibility—and design metrics for these criteria. We develop a new benchmark for English–Mandarin song translation and develop an unsupervised AST system, Guided AliGnment for Automatic Song Translation (GagaST), which combines pre-training with three decoding constraints. Both automatic and human evaluations show GagaST successfully balances semantics and singability.
2022.findings-acl.60
guo-etal-2022-automatic
+ 10.18653/v1/2022.findings-acl.60
Read before Generate! Faithful Long Form Question Answering with Machine Reading
@@ -873,6 +933,7 @@
KILT
MS MARCO
Natural Questions
+ 10.18653/v1/2022.findings-acl.61
A Simple yet Effective Relation Information Guided Approach for Few-Shot Relation Extraction
@@ -887,6 +948,7 @@
liu-etal-2022-simple
lylylylylyly/simplefsre
FewRel
+ 10.18653/v1/2022.findings-acl.62
MIMICause: Representation and automatic extraction of causal relation types from clinical notes
@@ -902,6 +964,7 @@
khetan-etal-2022-mimicause
MIMIC-III
ROCStories
+ 10.18653/v1/2022.findings-acl.63
Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation
@@ -915,6 +978,7 @@
zhao-etal-2022-compressing
xuandongzhao/hpd
SNLI
+ 10.18653/v1/2022.findings-acl.64
Debiasing Event Understanding for Visual Commonsense Tasks
@@ -929,6 +993,7 @@
2022.findings-acl.65.software.zip
seo-etal-2022-debiasing
VCR
+ 10.18653/v1/2022.findings-acl.65
Fact-Tree Reasoning for N-ary Question Answering over Knowledge Graphs
@@ -941,6 +1006,7 @@
The current Question Answering over Knowledge Graphs (KGQA) task mainly focuses on performing answer reasoning over KGs with binary facts. However, it neglects n-ary facts, which contain more than two entities. In this work, we highlight a more challenging but under-explored task: n-ary KGQA, i.e., answering questions about n-ary facts over n-ary KGs. Nevertheless, the multi-hop reasoning framework popular in the binary KGQA task is not directly applicable to n-ary KGQA. We propose two feasible improvements: 1) upgrading the basic reasoning unit from an entity or relation to a fact, and 2) upgrading the reasoning structure from a chain to a tree. We therefore propose a novel fact-tree reasoning framework, FacTree, which integrates the above two upgrades. FacTree transforms the question into a fact tree and performs iterative fact reasoning on it to infer the correct answer. Experimental results on the n-ary KGQA dataset we constructed and on two binary KGQA benchmarks demonstrate the effectiveness of FacTree compared with state-of-the-art methods.
2022.findings-acl.66
zhang-etal-2022-fact
+ 10.18653/v1/2022.findings-acl.66
DeepStruct: Pretraining of Language Models for Structure Prediction
@@ -965,6 +1031,7 @@
OPIEC
T-REx
TekGen
+ 10.18653/v1/2022.findings-acl.67
The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error
@@ -977,6 +1044,7 @@
2022.findings-acl.68
atwell-etal-2022-change
anthonysicilia/change-that-matters-acl2022
+ 10.18653/v1/2022.findings-acl.68
Mukayese: Turkish NLP Strikes Back
@@ -990,6 +1058,7 @@
safaya-etal-2022-mukayese
alisafaya/mukayese
GLUE
+ 10.18653/v1/2022.findings-acl.69
Virtual Augmentation Supported Contrastive Learning of Sentence Representations
@@ -1003,6 +1072,7 @@
2022.findings-acl.70
zhang-etal-2022-virtual
amazon-research/sentence-representations
+ 10.18653/v1/2022.findings-acl.70
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
@@ -1020,6 +1090,7 @@
GLUE
RACE
SST
+ 10.18653/v1/2022.findings-acl.71
DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog
@@ -1034,6 +1105,7 @@
hung-etal-2022-ds
umanlp/ds-tod
CCNet
+ 10.18653/v1/2022.findings-acl.72
Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model
@@ -1048,6 +1120,7 @@
lilynlp/distinguishing-non-natural
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.73
Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns
@@ -1067,6 +1140,7 @@
GLUE
LRA
QNLI
+ 10.18653/v1/2022.findings-acl.74
Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment
@@ -1079,6 +1153,7 @@
Most research on question answering focuses on the pre-deployment stage, i.e., building an accurate model for deployment. In this paper, we ask the question: can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system’s performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data not only improves the accuracy of the deployed QA system but also that of other, stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers.
2022.findings-acl.75
li-etal-2022-using
+ 10.18653/v1/2022.findings-acl.75
To be or not to be an Integer? Encoding Variables for Mathematical Text
@@ -1092,6 +1167,7 @@
2022.findings-acl.76
2022.findings-acl.76.software.zip
ferreira-etal-2022-integer
+ 10.18653/v1/2022.findings-acl.76
GRS: Combining Generation and Revision in Unsupervised Sentence Simplification
@@ -1106,6 +1182,7 @@
ASSET
CoLA
Newsela
+ 10.18653/v1/2022.findings-acl.77
BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages
@@ -1118,6 +1195,7 @@
Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and Wixarika. Then, we compare the morphologically inspired segmentation methods against Byte-Pair Encodings (BPEs) as inputs for machine translation (MT) when translating to and from Spanish. We show that for all language pairs except for Nahuatl, an unsupervised morphological segmentation algorithm outperforms BPEs consistently and that, although supervised methods achieve better segmentation scores, they under-perform in MT challenges. Finally, we contribute two new morphological segmentation datasets for Raramuri and Shipibo-Konibo, and a parallel corpus for Raramuri–Spanish.
2022.findings-acl.78
mager-etal-2022-bpe
+ 10.18653/v1/2022.findings-acl.78
Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning
@@ -1132,6 +1210,7 @@
easonnie/ChaosNLI
ChaosNLI
SNLI
+ 10.18653/v1/2022.findings-acl.79
Morphological Processing of Low-Resource Languages: Where We Are and What’s Next
@@ -1146,6 +1225,7 @@
Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language’s morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models, bridged by two newly proposed models we devise, perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by orders of magnitude.
2022.findings-acl.80
wiemerslage-etal-2022-morphological
+ 10.18653/v1/2022.findings-acl.80
Learning and Evaluating Character Representations in Novels
@@ -1158,6 +1238,7 @@
2022.findings-acl.81
inoue-etal-2022-learning
naoya-i/charembench
+ 10.18653/v1/2022.findings-acl.81
Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension
@@ -1169,6 +1250,7 @@
raina-gales-2022-answer
RACE
ReClor
+ 10.18653/v1/2022.findings-acl.82
Measuring the Language of Self-Disclosure across Corpora
@@ -1181,6 +1263,7 @@
Being able to reliably estimate self-disclosure – a key component of friendship and intimacy – from language is important for many psychology studies. We build single-task models on five self-disclosure corpora, but find that these models generalize poorly; the within-domain accuracy of predicted message-level self-disclosure of the best-performing model (mean Pearson’s r=0.69) is much higher than the respective across data set accuracy (mean Pearson’s r=0.32), due to both variations in the corpora (e.g., medical vs. general topics) and labeling instructions (target variables: self-disclosure, emotional disclosure, intimacy). However, some lexical features, such as expression of negative emotions and use of first person personal pronouns such as ‘I’ reliably predict self-disclosure across corpora. We develop a multi-task model that yields better results, with an average Pearson’s r of 0.37 for out-of-corpora prediction.
2022.findings-acl.83
reuel-etal-2022-measuring
+ 10.18653/v1/2022.findings-acl.83
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
@@ -1199,6 +1282,7 @@
QNLI
SICK
SQuAD
+ 10.18653/v1/2022.findings-acl.84
Explaining Classes through Stable Word Attributions
@@ -1213,6 +1297,7 @@
2022.findings-acl.85.software.tgz
ronnqvist-etal-2022-explaining
turkunlp/class-explainer
+ 10.18653/v1/2022.findings-acl.85
What to Learn, and How: Toward Effective Learning from Rationales
@@ -1227,6 +1312,7 @@
FEVER
MultiRC
e-SNLI
+ 10.18653/v1/2022.findings-acl.86
Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments
@@ -1241,6 +1327,7 @@
2022.findings-acl.87
maronikolakis-etal-2022-listening
antmarakis/xtremespeech
+ 10.18653/v1/2022.findings-acl.87
Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists
@@ -1255,6 +1342,7 @@
attanasio-etal-2022-entropy
g8a9/ear
MLMA Hate Speech
+ 10.18653/v1/2022.findings-acl.88
From BERT’s Point of View: Revealing the Prevailing Contextual Differences
@@ -1265,6 +1353,7 @@
2022.findings-acl.89
2022.findings-acl.89.software.zip
schuster-hegelich-2022-berts
+ 10.18653/v1/2022.findings-acl.89
Learning Bias-reduced Word Embeddings Using Dictionary Definitions
@@ -1276,6 +1365,7 @@
2022.findings-acl.90
an-etal-2022-learning
haozhe-an/dd-glove
+ 10.18653/v1/2022.findings-acl.90
Knowledge Graph Embedding by Adaptive Limit Scoring Loss Using Dynamic Weighting Strategy
@@ -1291,6 +1381,7 @@
2022.findings-acl.91
yang-etal-2022-knowledge
FB15k-237
+ 10.18653/v1/2022.findings-acl.91
OCR Improves Machine Translation for Low-Resource Languages
@@ -1302,6 +1393,7 @@
We aim to investigate the performance of current OCR systems on low-resource languages and low-resource scripts. We introduce and make publicly available a novel benchmark, OCR4MT, consisting of real and synthetic data, enriched with noise, for 60 low-resource languages in low-resource scripts. We evaluate state-of-the-art OCR systems on our benchmark and analyse the most common errors. We show that OCR monolingual data is a valuable resource that can increase the performance of Machine Translation models when used in backtranslation. We then perform an ablation study to investigate how OCR errors impact Machine Translation performance and determine the minimum level of OCR quality needed for the monolingual data to be useful for Machine Translation.
2022.findings-acl.92
ignat-etal-2022-ocr
+ 10.18653/v1/2022.findings-acl.92
CoCoLM: Complex Commonsense Enhanced Language Model with Discourse Relations
@@ -1319,6 +1411,7 @@
LAMA
ROCStories
SuperGLUE
+ 10.18653/v1/2022.findings-acl.93
Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming
@@ -1334,6 +1427,7 @@
2022.findings-acl.94.software.zip
maheshwari-etal-2022-learning
SST
+ 10.18653/v1/2022.findings-acl.94
Multi-Granularity Semantic Aware Graph Model for Reducing Position Bias in Emotion Cause Pair Extraction
@@ -1346,6 +1440,7 @@
The emotion cause pair extraction (ECPE) task aims to extract emotions and causes as pairs from documents. We observe that the relative distance distribution of emotions and causes is extremely imbalanced in the typical ECPE dataset. Existing methods set a fixed-size window to capture relations between neighboring clauses. However, they neglect the effective semantic connections between distant clauses, leading to poor generalization on position-insensitive data. To alleviate this problem, we propose a novel Multi-Granularity Semantic Aware Graph model (MGSAG) that jointly incorporates fine-grained and coarse-grained semantic features, without regard to distance limitations. In particular, we first explore the semantic dependencies between clauses and keywords extracted from the document that convey fine-grained semantic features, obtaining keyword-enhanced clause representations. Besides, a clause graph is also established to model coarse-grained semantic relations between clauses. Experimental results indicate that MGSAG surpasses existing state-of-the-art ECPE models. In particular, MGSAG significantly outperforms other models on position-insensitive data.
2022.findings-acl.95
bao-etal-2022-multi
+ 10.18653/v1/2022.findings-acl.95
Cross-lingual Inference with A Chinese Entailment Graph
@@ -1362,6 +1457,7 @@
teddy-li/chineseentgraph
CLUE
FIGER
+ 10.18653/v1/2022.findings-acl.96
Multi-task Learning for Paraphrase Generation With Keyword and Part-of-Speech Reconstruction
@@ -1374,6 +1470,7 @@
2022.findings-acl.97.software.zip
xie-etal-2022-multi
COCO
+ 10.18653/v1/2022.findings-acl.97
MDCSpell: A Multi-task Detector-Corrector Framework for Chinese Spelling Correction
@@ -1385,6 +1482,7 @@
Chinese Spelling Correction (CSC) is a task to detect and correct misspelled characters in Chinese texts. CSC is challenging since many Chinese characters are visually or phonologically similar yet have quite different meanings. Many recent works use BERT-based language models to directly correct each character of the input sentence. However, these methods can be sub-optimal, since they correct every character of the sentence based only on context, which is easily corrupted by the misspelled characters. Some other works propose using an error detector to guide the correction by masking the detected errors. Nevertheless, these methods dampen the visual or phonological features of the misspelled characters, which could be critical for correction. In this work, we propose a novel general detector-corrector multi-task framework in which the corrector uses BERT to capture the visual and phonological features of each character in the raw sentence, and uses a late fusion strategy to fuse the hidden states of the corrector with those of the detector to minimize the negative impact of the misspelled characters. Comprehensive experiments on benchmarks demonstrate that our proposed method significantly outperforms state-of-the-art methods on the CSC task.
2022.findings-acl.98
zhu-etal-2022-mdcspell
+ 10.18653/v1/2022.findings-acl.98
S^2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers
@@ -1402,6 +1500,7 @@
2022.findings-acl.99.software.zip
hui-etal-2022-s2sql
SPIDER
+ 10.18653/v1/2022.findings-acl.99
Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers
@@ -1412,6 +1511,7 @@
This paper presents the first multi-objective transformer model for generating open cloze tests that exploits generation and discrimination capabilities to improve performance. Our model is further enhanced by tweaking its loss function and applying a post-processing re-ranking algorithm that improves overall test structure. Experiments using automatic and human evaluation show that our approach can achieve up to 82% accuracy according to experts, outperforming previous work and baselines. We also release a collection of high-quality open cloze tests along with sample system output and human annotations that can serve as a future benchmark.
2022.findings-acl.100
felice-etal-2022-constructing
+ 10.18653/v1/2022.findings-acl.100
Co-training an Unsupervised Constituency Parser with Weak Supervision
@@ -1424,6 +1524,7 @@
Nickil21/weakly-supervised-parsing
Chinese Treebank
Penn Treebank
+ 10.18653/v1/2022.findings-acl.101
HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information
@@ -1438,6 +1539,7 @@
QianRuan/histruct
Pubmed
arXiv
+ 10.18653/v1/2022.findings-acl.102
An Isotropy Analysis in the Multilingual BERT Embedding Space
@@ -1448,6 +1550,7 @@
2022.findings-acl.103
rajaee-pilehvar-2022-isotropy
sara-rajaee/multilingual-isotropy
+ 10.18653/v1/2022.findings-acl.103
Multi-Stage Prompting for Knowledgeable Dialogue Generation
@@ -1464,6 +1567,7 @@
liu-etal-2022-multi
NVIDIA/Megatron-LM
Wizard of Wikipedia
+ 10.18653/v1/2022.findings-acl.104
\textrm{DuReader}_{\textrm{vis}}: A Chinese Dataset for Open-domain Document Visual Question Answering
@@ -1486,6 +1590,7 @@
InfographicVQA
Natural Questions
VisualMRC
+ 10.18653/v1/2022.findings-acl.105
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
@@ -1500,6 +1605,7 @@
mueller-etal-2022-coloring
sebschu/multilingual-transformations
mC4
+ 10.18653/v1/2022.findings-acl.106
C^3KG: A Chinese Commonsense Conversation Knowledge Graph
@@ -1517,6 +1623,7 @@
ATOMIC
ConceptNet
MOD
+ 10.18653/v1/2022.findings-acl.107
Graph Neural Networks for Multiparallel Word Alignment
@@ -1529,6 +1636,7 @@
After a period of decrease, interest in word alignments is increasing again for their usefulness in domains such as typological research, cross-lingual annotation projection and machine translation. Generally, alignment algorithms only use bitext and do not make use of the fact that many parallel corpora are multiparallel. Here, we compute high-quality word alignments between multiple language pairs by considering all language pairs together. First, we create a multiparallel word alignment graph, joining all bilingual word alignment pairs in one graph. Next, we use graph neural networks (GNNs) to exploit the graph structure. Our GNN approach (i) utilizes information about the meaning, position and language of the input words, (ii) incorporates information from multiple parallel sentences, (iii) adds and removes edges from the initial alignments, and (iv) yields a prediction model that can generalize beyond the training sentences. We show that community detection algorithms can provide valuable information for multiparallel word alignment. Our method outperforms previous work on three word alignment datasets and on a downstream task.
2022.findings-acl.108
imani-etal-2022-graph
+ 10.18653/v1/2022.findings-acl.108
Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors
@@ -1545,6 +1653,7 @@
wu-etal-2022-sentiment
albertwy/SWRM
Multimodal Opinion-level Sentiment Intensity
+ 10.18653/v1/2022.findings-acl.109
A Novel Framework Based on Medical Concept Driven Attention for Explainable Medical Code Prediction via External Knowledge
@@ -1557,6 +1666,7 @@
Medical code prediction from clinical notes aims at automatically associating medical codes with clinical notes. The rare code problem, i.e., medical codes with low occurrences, is prominent in medical code prediction. Recent studies employ deep neural networks and external knowledge to tackle it. However, such approaches lack interpretability, which is a vital issue in medical applications. Moreover, due to the lengthy and noisy clinical notes, such approaches fail to achieve satisfactory results. Therefore, in this paper, we propose a novel framework based on medical-concept-driven attention to incorporate external knowledge for explainable medical code prediction. Specifically, both the clinical notes and Wikipedia documents are aligned into a topic space to extract medical concepts using topic modeling. Then, the medical-concept-driven attention mechanism is applied to uncover the concepts related to medical codes, which provide explanations for medical code prediction. Experimental results on the benchmark dataset show the superiority of the proposed framework over several state-of-the-art baselines.
2022.findings-acl.110
wang-etal-2022-novel
+ 10.18653/v1/2022.findings-acl.110
Effective Unsupervised Constrained Text Generation based on Perturbed Masking
@@ -1568,6 +1678,7 @@
Unsupervised constrained text generation aims to generate text under a given set of constraints without any supervised data. Current state-of-the-art methods stochastically sample edit positions and actions, which may cause unnecessary search steps. In this paper, we propose PMCTG to improve effectiveness by searching for the best edit position and action at each step. Specifically, PMCTG extends the perturbed masking technique to effectively search for the most incongruent token to edit. It then introduces four multi-aspect scoring functions to select the edit action, further reducing search difficulty. Since PMCTG does not require supervised data, it can be applied to different generation tasks. We show that, in the unsupervised setting, PMCTG achieves new state-of-the-art results on two representative tasks, namely keywords-to-sentence generation and paraphrasing.
2022.findings-acl.111
fu-etal-2022-effective
+ 10.18653/v1/2022.findings-acl.111
Combining (Second-Order) Graph-Based and Headed-Span-Based Projective Dependency Parsing
@@ -1579,6 +1690,7 @@
yang-tu-2022-combining
sustcsonglin/span-based-dependency-parsing
Penn Treebank
+ 10.18653/v1/2022.findings-acl.112
End-to-End Speech Translation for Code Switched Speech
@@ -1595,6 +1707,7 @@
weller-etal-2022-end
apple/ml-code-switched-speech-translation
CoVoST
+ 10.18653/v1/2022.findings-acl.113
A Transformational Biencoder with In-Domain Negative Sampling for Zero-Shot Entity Linking
@@ -1609,6 +1722,7 @@
2022.findings-acl.114.software.zip
sun-etal-2022-transformational
ZESHEL
+ 10.18653/v1/2022.findings-acl.114
Finding the Dominant Winning Ticket in Pre-Trained Language Models
@@ -1626,6 +1740,7 @@
gong-etal-2022-finding
GLUE
QNLI
+ 10.18653/v1/2022.findings-acl.115
Thai Nested Named Entity Recognition Corpus
@@ -1642,6 +1757,7 @@
CoNLL-2003
DaN+
NNE
+ 10.18653/v1/2022.findings-acl.116
Two-Step Question Retrieval for Open-Domain QA
@@ -1660,6 +1776,7 @@
Natural Questions
PAQ
TriviaQA
+ 10.18653/v1/2022.findings-acl.117
Semantically Distributed Robust Optimization for Vision-and-Language Inference
@@ -1674,6 +1791,7 @@
gokhale-etal-2022-semantically
asu-apg/vli_sdro
Violin
+ 10.18653/v1/2022.findings-acl.118
Learning from Missing Relations: Contrastive Learning with Commonsense Knowledge Graphs for Commonsense Inference
@@ -1691,6 +1809,7 @@
yongho94/solar-framework_commonsense-inference
ConceptNet
Event2Mind
+ 10.18653/v1/2022.findings-acl.119
Capture Human Disagreement Distributions by Calibrated Networks for Natural Language Inference
@@ -1708,6 +1827,7 @@
wang-etal-2022-capture
ChaosNLI
MultiNLI
+ 10.18653/v1/2022.findings-acl.120
Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers
@@ -1720,6 +1840,7 @@
andersen-maalej-2022-efficient
jsandersen/cmt
IMDb Movie Reviews
+ 10.18653/v1/2022.findings-acl.121
Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE?
@@ -1730,6 +1851,7 @@
It has long been the norm to evaluate automated summarization tasks using the popular ROUGE metric. Although several studies in the past have highlighted the limitations of ROUGE, researchers have struggled to reach a consensus on a better alternative until today. One major limitation of the traditional ROUGE metric is the lack of semantic understanding (it relies on direct overlap of n-grams). In this paper, we exclusively focus on the extractive summarization task and propose a semantic-aware nCG (normalized cumulative gain)-based evaluation metric (called Sem-nCG) for evaluating this task. One fundamental contribution of the paper is that it demonstrates how we can generate more reliable semantic-aware ground truths for evaluating extractive summarization tasks without any additional human intervention. To the best of our knowledge, this work is the first of its kind. We have conducted extensive experiments with this new metric using the widely used CNN/DailyMail dataset. Experimental results show that the new Sem-nCG metric is indeed semantic-aware, shows higher correlation with human judgement (i.e., is more reliable), and yields a large number of disagreements with the original ROUGE metric (suggesting that ROUGE often leads to inaccurate conclusions, which was also verified by humans).
2022.findings-acl.122
akter-etal-2022-revisiting
+ 10.18653/v1/2022.findings-acl.122
Open Vocabulary Extreme Classification Using Generative Models
@@ -1744,6 +1866,7 @@
The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However, in real-world scenarios this label set, although large, is often incomplete, and experts frequently need to refine it. To develop systems that simplify this process, we introduce the task of open vocabulary XMC (OXMC): given a piece of content, predict a set of labels, some of which may be outside of the known tag set. Hence, in addition to not having training data for some labels, as is the case in zero-shot classification, models need to invent some labels on the fly. We propose GROOV, a fine-tuned seq2seq model for OXMC that generates the set of labels as a flat sequence and is trained using a novel loss independent of predicted label order. We show the efficacy of the approach, experimenting with popular XMC datasets for which GROOV is able to predict meaningful labels outside the given vocabulary while performing on par with state-of-the-art solutions for known labels.
2022.findings-acl.123
simig-etal-2022-open
+ 10.18653/v1/2022.findings-acl.123
Decomposed Meta-Learning for Few-Shot Named Entity Recognition
@@ -1761,6 +1884,7 @@
CoNLL 2002
Few-NERD
WNUT 2017
+ 10.18653/v1/2022.findings-acl.124
TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge
@@ -1777,6 +1901,7 @@
2022.findings-acl.125
tan-etal-2022-tegtok
lxchtan/tegtok
+ 10.18653/v1/2022.findings-acl.125
EmoCaps: Emotion Capsule based Model for Conversational Emotion Recognition
@@ -1791,6 +1916,7 @@
li-etal-2022-emocaps
IEMOCAP
MELD
+ 10.18653/v1/2022.findings-acl.126
Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text
@@ -1809,6 +1935,7 @@
wang-etal-2022-logic
WangsyGit/LReasoner
ReClor
+ 10.18653/v1/2022.findings-acl.127
Transfer Learning and Prediction Consistency for Detecting Offensive Spans of Text
@@ -1823,6 +1950,7 @@
2022.findings-acl.128
2022.findings-acl.128.software.zip
pouran-ben-veyseh-etal-2022-transfer
+ 10.18653/v1/2022.findings-acl.128
Learning Reasoning Patterns for Relational Triple Extraction with Mutual Generation of Text and Graph
@@ -1833,6 +1961,7 @@
Relational triple extraction is a critical task for constructing knowledge graphs. Existing methods focus on learning text patterns from explicit relational mentions. However, they usually ignore relational reasoning patterns, and thus fail to extract implicitly implied triples. Fortunately, the graph structure of a sentence’s relational triples can help find multi-hop reasoning paths. Moreover, the type inference logic along these paths can be captured with the sentence’s supplementary relational expressions, which represent the real-world conceptual meanings of the paths’ composite relations. In this paper, we propose a unified framework to learn the relational reasoning patterns for this task. To identify multi-hop reasoning paths, we construct a relational graph from the sentence (text-to-graph generation) and apply multi-layer graph convolutions to it. To capture the relation type inference logic of the paths, we propose to understand the unlabeled conceptual expressions by reconstructing the sentence from the relational graph (graph-to-text generation) in a self-supervised manner. Experimental results on several benchmark datasets demonstrate the effectiveness of our method.
2022.findings-acl.129
chen-etal-2022-learning
+ 10.18653/v1/2022.findings-acl.129
Document-Level Event Argument Extraction via Optimal Transport
@@ -1846,6 +1975,7 @@
2022.findings-acl.130
2022.findings-acl.130.software.zip
pouran-ben-veyseh-etal-2022-document
+ 10.18653/v1/2022.findings-acl.130
N-Shot Learning for Augmenting Task-Oriented Dialogue State Tracking
@@ -1858,6 +1988,7 @@
2022.findings-acl.131
aksu-etal-2022-n
MultiWOZ
+ 10.18653/v1/2022.findings-acl.131
Document-Level Relation Extraction with Adaptive Focal Loss and Knowledge Distillation
@@ -1872,6 +2003,7 @@
tan-etal-2022-document
tonytan48/kd-docre
DocRED
+ 10.18653/v1/2022.findings-acl.132
Calibration of Machine Reading Systems at Scale
@@ -1884,6 +2016,7 @@
2022.findings-acl.133
dhuliawala-etal-2022-calibration
Natural Questions
+ 10.18653/v1/2022.findings-acl.133
Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples
@@ -1901,6 +2034,7 @@
xu-etal-2022-towards
AG News
SST
+ 10.18653/v1/2022.findings-acl.134
Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects
@@ -1912,6 +2046,7 @@
2022.findings-acl.135
inoue-etal-2022-morphosyntactic
camel-lab/camelbert_morphosyntactic_tagger
+ 10.18653/v1/2022.findings-acl.135
How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis
@@ -1929,6 +2064,7 @@
2022.findings-acl.136
li-etal-2022-pre
LAMA
+ 10.18653/v1/2022.findings-acl.136
Metadata Shaping: A Simple Approach for Knowledge-Enhanced Language Models
@@ -1944,6 +2080,7 @@
FewRel
Open Entity
TACRED
+ 10.18653/v1/2022.findings-acl.137
Enhancing Natural Language Representation with Large-Scale Out-of-Domain Commonsense
@@ -1959,6 +2096,7 @@
QNLI
WSC
WinoGrande
+ 10.18653/v1/2022.findings-acl.138
Weighted self Distillation for Chinese word segmentation
@@ -1972,6 +2110,7 @@
2022.findings-acl.139.software.zip
he-etal-2022-weighted
anzi20/weidc
+ 10.18653/v1/2022.findings-acl.139
Sibylvariant Transformations for Robust Text Classification
@@ -1987,6 +2126,7 @@
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.140
DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation
@@ -1999,6 +2139,7 @@
Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model which is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the amount of parallel samples it would require. This is, however, a desirable functionality that could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. Our model relies on NMT encoder representations combined with various instance- and corpus-level features. We demonstrate that instance-level features are better able to distinguish between different domains than the corpus-level frameworks proposed in previous studies. Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research.
2022.findings-acl.141
park-etal-2022-dalc
+ 10.18653/v1/2022.findings-acl.141
Hey AI, Can You Solve Complex Tasks by Talking to Agents?
@@ -2013,6 +2154,7 @@
allenai/commaqa
DROP
MathQA
+ 10.18653/v1/2022.findings-acl.142
Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion
@@ -2023,6 +2165,7 @@
2022.findings-acl.143
yao-mihalcea-2022-modality
MELD
+ 10.18653/v1/2022.findings-acl.143
BiSyn-GAT+: Bi-Syntax Aware Graph Attention Network for Aspect-based Sentiment Analysis
@@ -2037,6 +2180,7 @@
liang-etal-2022-bisyn
CCIIPLab/BiSyn_GAT_plus
MAMS
+ 10.18653/v1/2022.findings-acl.144
IndicBART: A Pre-trained Model for Indic Natural Language Generation
@@ -2054,6 +2198,7 @@
FLoRes
IndicCorp
Samanantar
+ 10.18653/v1/2022.findings-acl.145
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
@@ -2073,6 +2218,7 @@
ReQA
SentEval
SuperGLUE
+ 10.18653/v1/2022.findings-acl.146
Improving Relation Extraction through Syntax-induced Pre-training with Dependency Masking
@@ -2088,6 +2234,7 @@
Updated code link in footnote.
Penn Treebank
SemEval-2010 Task 8
+ 10.18653/v1/2022.findings-acl.147
Striking a Balance: Alleviating Inconsistency in Pre-trained Models for Symmetric Classification Tasks
@@ -2103,6 +2250,7 @@
PAWS
QNLI
SST
+ 10.18653/v1/2022.findings-acl.148
Diversifying Content Generation for Commonsense Reasoning with Mixture of Knowledge Graph Experts
@@ -2117,6 +2265,7 @@
2022.findings-acl.149
yu-etal-2022-diversifying
DM2-ND/MoKGE
+ 10.18653/v1/2022.findings-acl.149
Dict-BERT: Enhancing Language Model Pre-training with Dictionary
@@ -2135,6 +2284,7 @@
GLUE
QNLI
WNLaMPro
+ 10.18653/v1/2022.findings-acl.150
A Feasibility Study of Answer-Unaware Question Generation for Education
@@ -2151,6 +2301,7 @@
2022.findings-acl.151
dugan-etal-2022-feasibility
liamdugan/summary-qg
+ 10.18653/v1/2022.findings-acl.151
Relevant CommonSense Subgraphs for “What if...” Procedural Reasoning
@@ -2162,6 +2313,7 @@
zheng-kordjamshidi-2022-relevant
ConceptNet
WIQA
+ 10.18653/v1/2022.findings-acl.152
Combining Feature and Instance Attribution to Detect Artifacts
@@ -2176,6 +2328,7 @@
BoolQ
IMDb Movie Reviews
SuperGLUE
+ 10.18653/v1/2022.findings-acl.153
Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition
@@ -2191,6 +2344,7 @@
reich-etal-2022-leveraging
gt-salt/guided-adversarial-augmentation
CoNLL-2003
+ 10.18653/v1/2022.findings-acl.154
Label Semantics for Few Shot Named Entity Recognition
@@ -2208,6 +2362,7 @@
CoNLL-2003
NCBI Disease
WNUT 2017
+ 10.18653/v1/2022.findings-acl.155
Detection, Disambiguation, Re-ranking: Autoregressive Entity Linking as a Multi-Task Problem
@@ -2223,6 +2378,7 @@
mrini-etal-2022-detection
AIDA CoNLL-YAGO
COMETA
+ 10.18653/v1/2022.findings-acl.156
VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator
@@ -2240,6 +2396,7 @@
alexa/visitron
Matterport3D
RxR
+ 10.18653/v1/2022.findings-acl.157
Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings
@@ -2252,6 +2409,7 @@
2022.findings-acl.158.software.zip
varshney-etal-2022-investigating
SNLI
+ 10.18653/v1/2022.findings-acl.158
Unsupervised Natural Language Inference Using PHL Triplet Generation
@@ -2269,6 +2427,7 @@
ConceptNet
MultiNLI
SNLI
+ 10.18653/v1/2022.findings-acl.159
Data Augmentation and Learned Layer Aggregation for Improved Multilingual Language Understanding in Dialogue
@@ -2281,6 +2440,7 @@
razumovskaia-etal-2022-data
CC100
xSID
+ 10.18653/v1/2022.findings-acl.160
Ranking-Constrained Learning with Rationales for Text Classification
@@ -2291,6 +2451,7 @@
We propose a novel approach that jointly utilizes the labels and elicited rationales for text classification to speed up the training of deep learning models with limited training data. We define and optimize a ranking-constrained loss function that combines cross-entropy loss with ranking losses as rationale constraints. We evaluate our proposed rationale-augmented learning approach on three human-annotated datasets, and show that our approach provides significant improvements over both classification approaches that do not utilize rationales and other state-of-the-art rationale-augmented baselines.
2022.findings-acl.161
wang-etal-2022-ranking
+ 10.18653/v1/2022.findings-acl.161
CaM-Gen: Causally Aware Metric-Guided Text Generation
@@ -2304,6 +2465,7 @@
Content is created for a well-defined purpose, often described by a metric or signal represented in the form of structured information. The relationship between the goal (metrics) of target content and the content itself is non-trivial. While large-scale language models show promising text generation capabilities, guiding the generated text with external metrics is challenging. These metrics and content tend to have inherent relationships, and not all of them may be of consequence. We introduce CaM-Gen: Causally aware Generative Networks guided by user-defined target metrics incorporating the causal relationships between the metric and content features. We leverage causal inference techniques to identify causally significant aspects of a text that lead to the target metric and then explicitly guide generative models towards these via a feedback mechanism. We propose this mechanism for variational autoencoder and Transformer-based generative models. The proposed models outperform baselines in terms of control over the target metric while maintaining the fluency and language quality of the generated text. To the best of our knowledge, this is one of the early attempts at controlled generation incorporating a metric guide using causal inference.
2022.findings-acl.162
goyal-etal-2022-cam
+ 10.18653/v1/2022.findings-acl.162
Training Dynamics for Text Summarization Models
@@ -2315,6 +2477,7 @@
Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training time or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that a propensity to copy the input is learned early in the training process consistently across all datasets studied. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, though this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn and second, disregarding low-loss tokens that are learnt very quickly in the latter stages of the training process. We show that these simple training modifications allow us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness.
2022.findings-acl.163
goyal-etal-2022-training
+ 10.18653/v1/2022.findings-acl.163
Richer Countries and Richer Representations
@@ -2326,6 +2489,7 @@
2022.findings-acl.164
zhou-etal-2022-richer
katezhou/country_distortions
+ 10.18653/v1/2022.findings-acl.164
BBQ: A hand-built bias benchmark for question answering
@@ -2344,6 +2508,7 @@
nyu-mll/bbq
BBQ
RACE
+ 10.18653/v1/2022.findings-acl.165
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
@@ -2357,6 +2522,7 @@
2022.findings-acl.166
li-etal-2022-zero
xinjli/transphone
+ 10.18653/v1/2022.findings-acl.166
Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Under-Documented Languages
@@ -2371,6 +2537,7 @@
Recent progress in NLP is driven by pretrained models leveraging massive datasets and has predominantly benefited the world’s political and economic superpowers. Technologically underserved languages are left behind because they lack such resources. Hundreds of underserved languages, nevertheless, have available data sources in the form of interlinear glossed text (IGT) from language documentation efforts. IGT remains underutilized in NLP work, perhaps because its annotations are only semi-structured and often language-specific. With this paper, we make the case that IGT data can be leveraged successfully provided that target language expertise is available. We specifically advocate for collaboration with documentary linguists. Our paper provides a roadmap for successful projects utilizing IGT data: (1) It is essential to define which NLP tasks can be accomplished with the given IGT data and how these will benefit the speech community. (2) Great care and target language expertise are required when converting the data into structured formats commonly employed in NLP. (3) Task-specific and user-specific evaluation can help to ascertain that the tools which are created benefit the target language speech community. We illustrate each step through a case study on developing a morphological reinflection system for the Tsimshianic language Gitksan.
2022.findings-acl.167
forbes-etal-2022-dim
+ 10.18653/v1/2022.findings-acl.167
Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask
@@ -2385,6 +2552,7 @@
ghanem-etal-2022-question
CosmosQA
SQuAD
+ 10.18653/v1/2022.findings-acl.168
TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval
@@ -2400,6 +2568,7 @@
FIGER
KILT
Natural Questions
+ 10.18653/v1/2022.findings-acl.169
Hierarchical Recurrent Aggregative Generation for Few-Shot NLG
@@ -2411,6 +2580,7 @@
2022.findings-acl.170
zhou-etal-2022-hierarchical
SGD
+ 10.18653/v1/2022.findings-acl.170
Training Text-to-Text Transformers with Privacy Guarantees
@@ -2424,6 +2594,7 @@
C4
GLUE
QNLI
+ 10.18653/v1/2022.findings-acl.171
Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers
@@ -2439,6 +2610,7 @@
MR
SUBJ
TREC-10
+ 10.18653/v1/2022.findings-acl.172
The impact of lexical and grammatical processing on generating code from natural language
@@ -2451,6 +2623,7 @@
codegenfact/BertranX
CoNaLa
Django
+ 10.18653/v1/2022.findings-acl.173
Seq2Path: Generating Sentiment Tuples as Paths of a Tree
@@ -2464,6 +2637,7 @@
2022.findings-acl.174
2022.findings-acl.174.software.zip
mao-etal-2022-seq2path
+ 10.18653/v1/2022.findings-acl.174
Mitigating the Inconsistency Between Word Saliency and Model Confidence with Pathological Contrastive Training
@@ -2479,6 +2653,7 @@
zhan-etal-2022-mitigating
AG News
IMDb Movie Reviews
+ 10.18653/v1/2022.findings-acl.175
Your fairness may vary: Pretrained language model fairness in toxic text classification
@@ -2492,6 +2667,7 @@
2022.findings-acl.176
baldini-etal-2022-fairness
HateXplain
+ 10.18653/v1/2022.findings-acl.176
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
@@ -2509,6 +2685,7 @@
FigureQA
LEAF-QA
PlotQA
+ 10.18653/v1/2022.findings-acl.177
A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification
@@ -2520,6 +2697,7 @@
2022.findings-acl.178
liu-etal-2022-novel
MIND
+ 10.18653/v1/2022.findings-acl.178
Learn and Review: Enhancing Continual Named Entity Recognition via Reviewing Synthetic Samples
@@ -2535,6 +2713,7 @@
2022.findings-acl.179
xia-etal-2022-learn
CoNLL-2003
+ 10.18653/v1/2022.findings-acl.179
Phoneme transcription of endangered languages: an evaluation of recent ASR architectures in the single speaker scenario
@@ -2543,6 +2722,7 @@
Transcription is often reported as the bottleneck in endangered language documentation, requiring large efforts from scarce speakers and transcribers. In general, automatic speech recognition (ASR) can be accurate enough to accelerate transcription only if trained on large amounts of transcribed data. However, when a single speaker is involved, several studies have reported encouraging results for phonetic transcription even with small amounts of training. Here we expand this body of work on speaker-dependent transcription by comparing four ASR approaches, notably recent transformer and pretrained multilingual models, on a common dataset of 11 languages. To automate data preparation, training and evaluation steps, we also developed a phoneme recognition setup which handles morphologically complex languages and writing systems for which no pronunciation dictionary exists. We find that fine-tuning a multilingual pretrained model yields an average phoneme error rate (PER) of 15% for 6 languages with 99 minutes or less of transcribed data for training. For the 5 languages with between 100 and 192 minutes of training, we achieved a PER of 8.4% or less. These results on a number of varied languages suggest that ASR can now significantly reduce transcription efforts in the speaker-dependent situation common in endangered language work.
2022.findings-acl.180
boulianne-2022-phoneme
+ 10.18653/v1/2022.findings-acl.180
Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task
@@ -2553,6 +2733,7 @@
Although transformer-based Neural Language Models demonstrate impressive performance on a variety of tasks, their generalization abilities are not well understood. They have been shown to perform strongly on subject-verb number agreement in a wide array of settings, suggesting that they learned to track syntactic dependencies during their training even without explicit supervision. In this paper, we examine the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates. To do so, we disrupt the lexical patterns found in naturally occurring stimuli for each targeted structure in a novel fine-grained analysis of BERT’s behavior. Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when even a single attractor is present.
2022.findings-acl.181
lasri-etal-2022-bert
+ 10.18653/v1/2022.findings-acl.181
Combining Static and Contextualised Multilingual Embeddings
@@ -2567,6 +2748,7 @@
kathyhaem/combining-static-contextual
TyDi QA
XQuAD
+ 10.18653/v1/2022.findings-acl.182
An Accurate Unsupervised Method for Joint Entity Alignment and Dangling Entity Detection
@@ -2578,6 +2760,7 @@
2022.findings-acl.183.software.zip
luo-yu-2022-accurate
luosx18/ued
+ 10.18653/v1/2022.findings-acl.183
Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold
@@ -2588,6 +2771,7 @@
The prototypical NLP experiment trains a standard architecture on labeled English data and optimizes for accuracy, without accounting for other dimensions such as fairness, interpretability, or computational efficiency. We show through a manual classification of recent NLP research papers that this is indeed the case and refer to it as the square one experimental setup. We observe that NLP research often goes beyond the square one setup, e.g., focusing not only on accuracy but also on fairness or interpretability, yet typically only along a single dimension. Most work targeting multilinguality, for example, considers only accuracy; most work on fairness or interpretability considers only English; and so on. Such one-dimensionality of most research means we are only exploring a fraction of the NLP research search space. We provide historical and recent examples of how the square one bias has led researchers to draw false conclusions or make unwise choices, point to promising yet unexplored directions on the research manifold, and make practical recommendations to enable more multi-dimensional research. We open-source the results of our annotations to enable further analysis.
2022.findings-acl.184
ruder-etal-2022-square
+ 10.18653/v1/2022.findings-acl.184
Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective
@@ -2601,6 +2785,7 @@
2022.findings-acl.185
2022.findings-acl.185.software.zip
manino-etal-2022-systematicity
+ 10.18653/v1/2022.findings-acl.185
Improving Neural Political Statement Classification with Class Hierarchical Information
@@ -2616,6 +2801,7 @@
2022.findings-acl.186
2022.findings-acl.186.software.zip
dayanik-etal-2022-improving
+ 10.18653/v1/2022.findings-acl.186
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
@@ -2633,6 +2819,7 @@
GLUE
OK-VQA
nocaps
+ 10.18653/v1/2022.findings-acl.187
Co-VQA : Answering by Interactive Sub Question Sequence
@@ -2649,6 +2836,7 @@
Visual Genome
Visual Question Answering
Visual Question Answering v2.0
+ 10.18653/v1/2022.findings-acl.188
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
@@ -2671,6 +2859,7 @@
MRPC
SNLI
SST
+ 10.18653/v1/2022.findings-acl.189
Auxiliary tasks to boost Biaffine Semantic Dependency Parsing
@@ -2681,6 +2870,7 @@
2022.findings-acl.190.software.tgz
candito-2022-auxiliary
mcandito/aux-tasks-biaffine-graph-parser-findingsacl22
+ 10.18653/v1/2022.findings-acl.190
Syntax-guided Contrastive Learning for Pre-trained Language Model
@@ -2699,6 +2889,7 @@
GLUE
Open Entity
QNLI
+ 10.18653/v1/2022.findings-acl.191
Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting
@@ -2711,6 +2902,7 @@
chalkidis-sogaard-2022-improved
coastalcph/lw-robust
BioASQ
+ 10.18653/v1/2022.findings-acl.192
ASCM: An Answer Space Clustered Prompting Method without Answer Engineering
@@ -2726,6 +2918,7 @@
2022.findings-acl.193
wang-etal-2022-ascm
miaomiao1215/ascm
+ 10.18653/v1/2022.findings-acl.193
Why don’t people use character-level machine translation?
@@ -2737,6 +2930,7 @@
2022.findings-acl.194
2022.findings-acl.194.software.tgz
libovicky-etal-2022-dont
+ 10.18653/v1/2022.findings-acl.194
Seeking Patterns, Not just Memorizing Procedures: Contrastive Learning for Solving Math Word Problems
@@ -2754,6 +2948,7 @@
zwx980624/mwp-cl
Math23K
MathQA
+ 10.18653/v1/2022.findings-acl.195
xGQA: Cross-Lingual Visual Question Answering
@@ -2772,6 +2967,7 @@
GQA
IGLUE
MultiSubs
+ 10.18653/v1/2022.findings-acl.196
Automatic Speech Recognition and Query By Example for Creole Languages Documentation
@@ -2784,6 +2980,7 @@
2022.findings-acl.197
macaire-etal-2022-automatic
macairececile/asr-qbe-creole
+ 10.18653/v1/2022.findings-acl.197
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation
@@ -2799,6 +2996,7 @@
shen-etal-2022-mred
shen-chenhui/mred
CNN/Daily Mail
+ 10.18653/v1/2022.findings-acl.198
Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation
@@ -2809,6 +3007,7 @@
Subword regularizations use multiple subword segmentations during training to improve the robustness of neural machine translation models. Previous subword regularizations use multiple segmentations in the training process but only one segmentation at inference time. In this study, we propose an inference strategy to address this discrepancy. The proposed strategy approximates the marginalized likelihood by using multiple segmentations, including the most plausible segmentation and several sampled segmentations. Because the proposed strategy aggregates predictions from several segmentations, we can regard it as a single model ensemble that does not require any additional cost for training. Experimental results show that the proposed strategy improves the performance of models trained with subword regularization in low-resource machine translation tasks.
2022.findings-acl.199
takase-etal-2022-single
+ 10.18653/v1/2022.findings-acl.199
Detecting Various Types of Noise for Neural Machine Translation
@@ -2820,6 +3019,7 @@
The filtering and/or selection of training data is one of the core aspects to be considered when building a strong machine translation system. In their influential work, Khayrallah and Koehn (2018) investigated the impact of different types of noise on the performance of machine translation systems. In the same year, the WMT introduced a shared task on parallel corpus filtering, which went on to be repeated in the following years and resulted in many different filtering approaches being proposed. In this work we aim to combine the recent achievements in data filtering with the original analysis of Khayrallah and Koehn (2018) and investigate whether state-of-the-art filtering systems are capable of removing all the suggested noise types. We observe that most of these types of noise can be detected with an accuracy of over 90% by modern filtering systems when operating in a well-studied high-resource setting. However, we also find that when confronted with more refined noise categories or when working with a less common language pair, the performance of the filtering systems is far from optimal, showing that there is still room for improvement in this area of research.
2022.findings-acl.200
herold-etal-2022-detecting
+ 10.18653/v1/2022.findings-acl.200
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
@@ -2833,6 +3033,7 @@
2022.findings-acl.201
huang-etal-2022-du
COCO
+ 10.18653/v1/2022.findings-acl.201
HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction
@@ -2846,6 +3047,7 @@
2022.findings-acl.202
li-etal-2022-hiclre
matnlp/hiclre
+ 10.18653/v1/2022.findings-acl.202
Prompt-Driven Neural Machine Translation
@@ -2858,6 +3060,7 @@
2022.findings-acl.203
li-etal-2022-prompt
yafuly/promptnmt
+ 10.18653/v1/2022.findings-acl.203
On Controlling Fallback Responses for Grounded Dialogue Generation
@@ -2870,6 +3073,7 @@
2022.findings-acl.204
2022.findings-acl.204.software.zip
lu-etal-2022-controlling
+ 10.18653/v1/2022.findings-acl.204
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
@@ -2891,6 +3095,7 @@
PHYRE
TVQA
TVQA+
+ 10.18653/v1/2022.findings-acl.205
A Graph Enhanced BERT Model for Event Prediction
@@ -2905,6 +3110,7 @@
2022.findings-acl.206.software.zip
du-etal-2022-graph
ROCStories
+ 10.18653/v1/2022.findings-acl.206
Long Time No See! Open-Domain Conversation with Long-Term Persona Memory
@@ -2921,6 +3127,7 @@
xu-etal-2022-long
PaddlePaddle/Research
DuLeMon
+ 10.18653/v1/2022.findings-acl.207
Lacking the Embedding of a Word? Look it up into a Traditional Dictionary
@@ -2935,6 +3142,7 @@
2022.findings-acl.208
2022.findings-acl.208.software.zip
ruzzetti-etal-2022-lacking
+ 10.18653/v1/2022.findings-acl.208
MTRec: Multi-Task Learning over BERT for News Recommendation
@@ -2949,6 +3157,7 @@
2022.findings-acl.209
bi-etal-2022-mtrec
MIND
+ 10.18653/v1/2022.findings-acl.209
Cross-domain Named Entity Recognition via Graph Matching
@@ -2961,6 +3170,7 @@
2022.findings-acl.210.software.zip
zheng-etal-2022-cross
CrossNER
+ 10.18653/v1/2022.findings-acl.210
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
@@ -2973,6 +3183,7 @@
2022.findings-acl.211.software.tgz
wang-etal-2022-assessing
FairFace
+ 10.18653/v1/2022.findings-acl.211
More Than Words: Collocation Retokenization for Latent Dirichlet Allocation Models
@@ -2983,6 +3194,7 @@
Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. Previous studies show that representing bigram collocations in the input can improve topic coherence in English. However, it is unclear how to achieve the best results for languages without marked word boundaries such as Chinese and Thai. Here, we explore the use of retokenization based on chi-squared measures, t-statistics, and raw frequency to merge frequent token n-grams into collocations when preparing input to the LDA model. Based on the goodness of fit and the coherence metric, we show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
2022.findings-acl.212
cheevaprawatdomrong-etal-2022-words
+ 10.18653/v1/2022.findings-acl.212
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
@@ -3001,6 +3213,7 @@
Natural Questions
SNLI
SVHN
+ 10.18653/v1/2022.findings-acl.213
ASSIST: Towards Label Noise-Robust Dialogue State Tracking
@@ -3015,6 +3228,7 @@
smartyfh/dst-assist
MultiWOZ
SGD
+ 10.18653/v1/2022.findings-acl.214
Graph Refinement for Coreference Resolution
@@ -3024,6 +3238,7 @@
The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves coreference resolution.
2022.findings-acl.215
miculicich-henderson-2022-graph
+ 10.18653/v1/2022.findings-acl.215
ECO v1: Towards Event-Centric Opinion Mining
@@ -3040,6 +3255,7 @@
2022.findings-acl.216
2022.findings-acl.216.software.zip
xu-etal-2022-eco
+ 10.18653/v1/2022.findings-acl.216
Deep Reinforcement Learning for Entity Alignment
@@ -3053,6 +3269,7 @@
2022.findings-acl.217.software.zip
guo-etal-2022-deep
guolingbing/rlea
+ 10.18653/v1/2022.findings-acl.217
Breaking Down Multilingual Machine Translation
@@ -3064,6 +3281,7 @@
While multilingual training is now an essential ingredient in machine translation (MT) systems, recent work has demonstrated that it has different effects in different multilingual settings, such as many-to-one, one-to-many, and many-to-many learning. These training settings expose the encoder and the decoder of a machine translation model to different data distributions. In this paper, we examine how different varieties of multilingual training contribute to learning these two components of the MT model. Specifically, we compare bilingual models with encoders and/or decoders initialized by multilingual training. We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs). We further find the important attention heads for each language pair and compare their correlations during inference. Our analysis sheds light on how multilingual translation models work and also enables us to propose methods to improve performance by training with highly related languages. Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al. (2019).
2022.findings-acl.218
chiang-etal-2022-breaking
+ 10.18653/v1/2022.findings-acl.218
Mitigating Contradictions in Dialogue Based on Contrastive Learning
@@ -3076,6 +3294,7 @@
2022.findings-acl.219
2022.findings-acl.219.software.zip
li-etal-2022-mitigating
+ 10.18653/v1/2022.findings-acl.219
ELLE: Efficient Lifelong Pre-training for Emerging Data
@@ -3092,6 +3311,7 @@
2022.findings-acl.220.software.zip
qin-etal-2022-elle
thunlp/elle
+ 10.18653/v1/2022.findings-acl.220
EnCBP: A New Benchmark Dataset for Finer-Grained Cultural Background Prediction in English
@@ -3109,6 +3329,7 @@
GoEmotions
QNLI
SST
+ 10.18653/v1/2022.findings-acl.221
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
@@ -3126,6 +3347,7 @@
ucinlp/null-prompts
GLUE
QNLI
+ 10.18653/v1/2022.findings-acl.222
uFACT: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation
@@ -3137,6 +3359,7 @@
2022.findings-acl.223
anders-etal-2022-ufact
ViGGO
+ 10.18653/v1/2022.findings-acl.223
Good Night at 4 pm?! Time Expressions in Different Cultures
@@ -3146,6 +3369,7 @@
2022.findings-acl.224
shwartz-2022-good
vered1986/time_expressions
+ 10.18653/v1/2022.findings-acl.224
Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking
@@ -3158,6 +3382,7 @@
2022.findings-acl.225
li-etal-2022-extracting
WNUT 2017
+ 10.18653/v1/2022.findings-acl.225
OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval
@@ -3171,6 +3396,7 @@
niu-etal-2022-onealigner
CC100
WikiMatrix
+ 10.18653/v1/2022.findings-acl.226
Suum Cuique: Studying Bias in Taboo Detection with a Community Perspective
@@ -3184,6 +3410,7 @@
khalid-etal-2022-suum
jonrusert/suumcuique
OLID
+ 10.18653/v1/2022.findings-acl.227
Modeling Intensification for Sign Language Generation: A Computational Approach
@@ -3199,6 +3426,7 @@
inan-etal-2022-modeling
merterm/modeling-intensification-for-slg
PHOENIX14T
+ 10.18653/v1/2022.findings-acl.228
Controllable Natural Language Generation with Contrastive Prefixes
@@ -3212,6 +3440,7 @@
2022.findings-acl.229
qian-etal-2022-controllable
AG News
+ 10.18653/v1/2022.findings-acl.229
Revisiting the Effects of Leakage on Dependency Parsing
@@ -3223,6 +3452,7 @@
2022.findings-acl.230
krasner-etal-2022-revisiting
miriamwanner/reu-nlp-project
+ 10.18653/v1/2022.findings-acl.230
Learning to Describe Solutions for Bug Reports Based on Developer Discussions
@@ -3235,6 +3465,7 @@
2022.findings-acl.231
panthaplackel-etal-2022-learning
panthap2/describing-bug-report-solutions
+ 10.18653/v1/2022.findings-acl.231
Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense
@@ -3248,6 +3479,7 @@
2022.findings-acl.232
le-etal-2022-perturbations
lethaiq/perturbations-in-the-wild
+ 10.18653/v1/2022.findings-acl.232
Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation
@@ -3261,6 +3493,7 @@
Chinese Grammatical Error Detection (CGED) aims at detecting grammatical errors in Chinese texts. One of the main challenges for CGED is the lack of annotated data. To alleviate this problem, previous studies proposed various methods to automatically generate more training samples, which can be roughly categorized into rule-based methods and model-based methods. The rule-based methods construct erroneous sentences by directly introducing noise into original sentences. However, the introduced noise is usually context-independent and thus quite different from errors made by humans. The model-based methods utilize generative models to imitate human errors. The generative model may bring too many changes to the original sentences and generate semantically ambiguous sentences, so it is difficult to detect grammatical errors in these generated sentences. In addition, generated sentences may be error-free and thus become noisy data. To handle these problems, we propose CNEG, a novel Conditional Non-Autoregressive Error Generation model for generating Chinese grammatical errors. Specifically, in order to generate a context-dependent error, we first mask a span in a correct text, then predict an erroneous span conditioned on both the masked text and the correct span. Furthermore, we filter out error-free spans by measuring their perplexities in the original sentences. Experimental results show that our proposed method achieves better performance than all compared data augmentation methods on the CGED-2018 and CGED-2020 benchmarks.
2022.findings-acl.233
yue-etal-2022-improving
+ 10.18653/v1/2022.findings-acl.233
Modular and Parameter-Efficient Multimodal Fusion with Prompting
@@ -3272,6 +3505,7 @@
2022.findings-acl.234
2022.findings-acl.234.software.zip
liang-etal-2022-modular
+ 10.18653/v1/2022.findings-acl.234
Synchronous Refinement for Neural Machine Translation
@@ -3284,6 +3518,7 @@
Machine translation typically adopts an encoder-decoder framework, in which the decoder generates the target sentence word by word in an auto-regressive manner. However, the auto-regressive decoder faces a deep-rooted one-pass issue whereby each generated word is treated as one element of the final output regardless of whether it is correct or not. These wrongly generated words further constitute the target historical context and affect the generation of subsequent target words. This paper proposes a novel synchronous refinement method to revise potential errors in the generated words by considering part of the target future context. Particularly, the proposed approach allows the auto-regressive decoder to refine the previously generated target words and generate the next target word synchronously. The experimental results on three widely used machine translation tasks demonstrate the effectiveness of the proposed approach.
2022.findings-acl.235
chen-etal-2022-synchronous
+ 10.18653/v1/2022.findings-acl.235
HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing
@@ -3297,6 +3532,7 @@
2022.findings-acl.236
zheng-etal-2022-hie
CoSQL
+ 10.18653/v1/2022.findings-acl.236
CRASpell: A Contextual Typo Robust Approach to Improve Chinese Spelling Correction
@@ -3313,6 +3549,7 @@
2022.findings-acl.237.software.zip
liu-etal-2022-craspell
liushulinle/craspell
+ 10.18653/v1/2022.findings-acl.237
Gaussian Multi-head Attention for Simultaneous Machine Translation
@@ -3323,6 +3560,7 @@
2022.findings-acl.238
zhang-feng-2022-gaussian
ictnlp/gma
+ 10.18653/v1/2022.findings-acl.238
Composing Structure-Aware Batches for Pairwise Sentence Classification
@@ -3336,6 +3574,7 @@
ukplab/acl2022-structure-batches
GLUE
QNLI
+ 10.18653/v1/2022.findings-acl.239
Factual Consistency of Multilingual Pretrained Language Models
@@ -3348,6 +3587,7 @@
fierro-sogaard-2022-factual
coastalcph/mpararel
LAMA
+ 10.18653/v1/2022.findings-acl.240
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
@@ -3362,6 +3602,7 @@
2022.findings-acl.241.software.zip
zhang-etal-2022-selecting
nonstopfor/sticker-selection
+ 10.18653/v1/2022.findings-acl.241
ZiNet: Linking Chinese Characters Spanning Three Thousand Years
@@ -3377,6 +3618,7 @@
2022.findings-acl.242.software.zip
chi-etal-2022-zinet
yangchijlu/ancientchinesecharsim
+ 10.18653/v1/2022.findings-acl.242
How Can Cross-lingual Knowledge Contribute Better to Fine-Grained Entity Typing?
@@ -3392,6 +3634,7 @@
2022.findings-acl.243
jin-etal-2022-cross
FIGER
+ 10.18653/v1/2022.findings-acl.243
AMR-DA: Data Augmentation by Abstract Meaning Representation
@@ -3403,6 +3646,7 @@
2022.findings-acl.244
shou-etal-2022-amr
zzshou/amr-data-augmentation
+ 10.18653/v1/2022.findings-acl.244
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
@@ -3414,6 +3658,7 @@
In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that autoregressive models combined with stochastic decodings are the most promising. We then investigate how an LM performs in generating a CN with regard to an unseen target of hate. We find that a key element for successful ‘out of target’ experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e., a target that shares some commonalities with the test target and that can be defined a priori. We finally introduce the idea of a pipeline based on the addition of an automatic post-editing step to refine generated CNs.
2022.findings-acl.245
tekiroglu-etal-2022-using
+ 10.18653/v1/2022.findings-acl.245
Improving Robustness of Language Models from a Geometry-aware Perspective
@@ -3429,6 +3674,7 @@
zhu-etal-2022-improving
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.246
Task-guided Disentangled Tuning for Pretrained Language Models
@@ -3444,6 +3690,7 @@
lemon0830/tdt
CLUE
GLUE
+ 10.18653/v1/2022.findings-acl.247
Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding
@@ -3459,6 +3706,7 @@
2022.findings-acl.248
cao-etal-2022-exploring
xbdxwyh/mocose
+ 10.18653/v1/2022.findings-acl.248
The Inefficiency of Language Models in Scholarly Retrieval: An Experimental Walk-through
@@ -3469,6 +3717,7 @@
2022.findings-acl.249
singh-singh-2022-inefficiency
shruti-singh/scilm_exp
+ 10.18653/v1/2022.findings-acl.249
Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition
@@ -3485,6 +3734,7 @@
ACE 2004
ACE 2005
GENIA
+ 10.18653/v1/2022.findings-acl.250
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
@@ -3505,6 +3755,7 @@
SNLI-VE
SST
Visual Genome
+ 10.18653/v1/2022.findings-acl.251
The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking
@@ -3523,6 +3774,7 @@
2022.findings-acl.252
2022.findings-acl.252.software.zip
li-etal-2022-past
+ 10.18653/v1/2022.findings-acl.252
XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding
@@ -3540,6 +3792,7 @@
2022.findings-acl.253.software.zip
xu-etal-2022-xfund
FUNSD
+ 10.18653/v1/2022.findings-acl.253
Type-Driven Multi-Turn Corrections for Grammatical Error Correction
@@ -3558,6 +3811,7 @@
deeplearnxmu/tmtc
FCE
WI-LOCNESS
+ 10.18653/v1/2022.findings-acl.254
Leveraging Knowledge in Multilingual Commonsense Reasoning
@@ -3577,6 +3831,7 @@
ConceptNet
X-CSQA
XCOPA
+ 10.18653/v1/2022.findings-acl.255
Encoding and Fusing Semantic Connection and Linguistic Evidence for Implicit Discourse Relation Recognition
@@ -3589,6 +3844,7 @@
2022.findings-acl.256
xiang-etal-2022-encoding
hustminslab/manf
+ 10.18653/v1/2022.findings-acl.256
One Agent To Rule Them All: Towards Multi-agent Conversational AI
@@ -3607,6 +3863,7 @@
clarke-etal-2022-one
ChrisIsKing/black-box-multi-agent-integation
BBAI Dataset
+ 10.18653/v1/2022.findings-acl.257
Word-level Perturbation Considering Word Length and Compositional Subwords
@@ -3620,6 +3877,7 @@
2022.findings-acl.258
hiraoka-etal-2022-word
tathi/cwr
+ 10.18653/v1/2022.findings-acl.258
Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging
@@ -3635,6 +3893,7 @@
Jacob-Zhou/FeatureCRFAE
Penn Treebank
Universal Dependencies
+ 10.18653/v1/2022.findings-acl.259
Controlling the Focus of Pretrained Language Generation Models
@@ -3649,6 +3908,7 @@
question406/learningtofocus
CNN/Daily Mail
PERSONA-CHAT
+ 10.18653/v1/2022.findings-acl.260
Comparative Opinion Summarization via Collaborative Decoding
@@ -3661,6 +3921,7 @@
2022.findings-acl.261
iso-etal-2022-comparative
megagonlabs/cocosum
+ 10.18653/v1/2022.findings-acl.261
IsoScore: Measuring the Uniformity of Embedding Space Utilization
@@ -3674,6 +3935,7 @@
rudman-etal-2022-isoscore
bcbi-edu/p_eickhoff_isoscore
WikiText-2
+ 10.18653/v1/2022.findings-acl.262
A Natural Diet: Towards Improving Naturalness of Machine Translation Output
@@ -3686,6 +3948,7 @@
Machine translation (MT) evaluation often focuses on accuracy and fluency, without paying much attention to translation style. This means that, even when considered accurate and fluent, MT output can still sound less natural than high quality human translations or text originally written in the target language. Machine translation output notably exhibits lower lexical diversity, and employs constructs that mirror those in the source sentence. In this work we propose a method for training MT systems to achieve a more natural style, i.e., mirroring the style of text originally written in the target language. Our method tags parallel training data according to the naturalness of the target side by contrasting language models trained on natural and translated data. Tagging data allows us to put greater emphasis on target sentences originally written in the target language. Automatic metrics show that the resulting models achieve lexical richness on par with human translations, mimicking a style much closer to sentences originally written in the target language. Furthermore, we find that their output is preferred by human experts when compared to the baseline translations.
2022.findings-acl.263
freitag-etal-2022-natural
+ 10.18653/v1/2022.findings-acl.263
From Stance to Concern: Adaptation of Propositional Analysis to New Tasks and Domains
@@ -3700,6 +3963,7 @@
2022.findings-acl.264
mather-etal-2022-stance
ihmc/findings-of-acl-2022-concern-detection
+ 10.18653/v1/2022.findings-acl.264
CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals
@@ -3711,6 +3975,7 @@
We propose a framework to modularize the training of neural language models that use diverse forms of context by eliminating the need to jointly train context and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one type of contextual data and adapts to novel context types. The model consists of a pretrained neural sentence LM, a BERT-based contextual encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and contextual evidence. When contextually annotated data is unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real context data can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder’s embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the metadata during training retains 85% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.
2022.findings-acl.265
novotney-etal-2022-cue
+ 10.18653/v1/2022.findings-acl.265
Cross-Lingual UMLS Named Entity Linking using UMLS Dictionary Fine-Tuning
@@ -3725,6 +3990,7 @@
rinagalperin/biomedical_nel
BC5CDR
MedMentions
+ 10.18653/v1/2022.findings-acl.266
Aligned Weight Regularizers for Pruning Pretrained Neural Networks
@@ -3735,6 +4001,7 @@
Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel self-distillation based pruning strategy, whereby the representational similarity between the pruned and unpruned versions of the same network is maximized. Unlike previous approaches that treat distillation and pruning separately, we use distillation to inform the pruning criteria, without requiring a separate student network as in knowledge distillation. We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions, naturally complementing magnitude-based pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that self-distilled pruning increases mono- and cross-lingual language model performance. Self-distilled pruned models also outperform smaller Transformers with an equal number of parameters and are competitive against distilled networks that are six times larger. We also observe that self-distillation (1) maximizes class separability, (2) increases the signal-to-noise ratio, and (3) converges faster after pruning steps, providing further insights into why self-distilled pruning improves generalization.
2022.findings-acl.267
o-neill-etal-2022-aligned
+ 10.18653/v1/2022.findings-acl.267
Consistent Representation Learning for Continual Relation Extraction
@@ -3749,6 +4016,7 @@
thuiar/CRL
FewRel
TACRED
+ 10.18653/v1/2022.findings-acl.268
Event Transition Planning for Open-ended Text Generation
@@ -3763,6 +4031,7 @@
2022.findings-acl.269
li-etal-2022-event
ATOMIC
+ 10.18653/v1/2022.findings-acl.269
Comprehensive Multi-Modal Interactions for Referring Image Segmentation
@@ -3776,6 +4045,7 @@
COCO
Google Refexp
RefCOCO
+ 10.18653/v1/2022.findings-acl.270
MetaWeighting: Learning to Weight Tasks in Multi-Task Learning
@@ -3789,6 +4059,7 @@
2022.findings-acl.271
2022.findings-acl.271.software.zip
mao-etal-2022-metaweighting
+ 10.18653/v1/2022.findings-acl.271
Improving Controllable Text Generation with Position-Aware Weighted Decoding
@@ -3804,6 +4075,7 @@
gu-etal-2022-improving
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.272
Prompt Tuning for Discriminative Pre-trained Language Models
@@ -3825,6 +4097,7 @@
AG News
Quoref
SST
+ 10.18653/v1/2022.findings-acl.273
Two Birds with One Stone: Unified Model Learning for Both Recall and Ranking in News Recommendation
@@ -3837,6 +4110,7 @@
2022.findings-acl.274
wu-etal-2022-two
MIND
+ 10.18653/v1/2022.findings-acl.274
What does it take to bake a cake? The RecipeRef corpus and anaphora resolution in procedural text
@@ -3848,6 +4122,7 @@
2022.findings-acl.275
fang-etal-2022-take
biaoyanf/reciperef
+ 10.18653/v1/2022.findings-acl.275
MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning
@@ -3862,6 +4137,7 @@
sparkjiao/merit
LogiQA
ReClor
+ 10.18653/v1/2022.findings-acl.276
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption
@@ -3884,6 +4160,7 @@
MRPC
QNLI
SST
+ 10.18653/v1/2022.findings-acl.277
HLDC: Hindi Legal Documents Corpus
@@ -3903,6 +4180,7 @@
2022.findings-acl.278.software.zip
kapoor-etal-2022-hldc
exploration-lab/hldc
+ 10.18653/v1/2022.findings-acl.278
Rethinking Document-level Neural Machine Translation
@@ -3918,6 +4196,7 @@
2022.findings-acl.279
sun-etal-2022-rethinking
sunzewei2715/Doc2Doc_NMT
+ 10.18653/v1/2022.findings-acl.279
Incremental Intent Detection for Medical Domain with Contrast Replay Networks
@@ -3930,6 +4209,7 @@
2022.findings-acl.280
bai-etal-2022-incremental
KUAKE-QIC
+ 10.18653/v1/2022.findings-acl.280
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
@@ -3950,6 +4230,7 @@
MS MARCO
Natural Questions
SciFact
+ 10.18653/v1/2022.findings-acl.281
Do Pre-trained Models Benefit Knowledge Graph Completion? A Reliable Evaluation and a Reasonable Approach
@@ -3967,6 +4248,7 @@
lv-etal-2022-pre
InferWiki
LAMA
+ 10.18653/v1/2022.findings-acl.282
EICO: Improving Few-Shot Text Classification via Explicit and Implicit Consistency Regularization
@@ -3978,6 +4260,7 @@
zhao-yao-2022-eico
MPQA Opinion Corpus
SST
+ 10.18653/v1/2022.findings-acl.283
Improving the Adversarial Robustness of NLP Models by Information Bottleneck
@@ -3994,6 +4277,7 @@
zhang-etal-2022-improving
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.284
Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis
@@ -4008,6 +4292,7 @@
Aspect-based sentiment analysis (ABSA) predicts sentiment polarity towards a specific aspect in the given sentence. While pre-trained language models such as BERT have achieved great success, incorporating dynamic semantic changes into ABSA remains challenging. To this end, in this paper, we propose to address this problem with Dynamic Re-weighting BERT (DR-BERT), a novel method designed to learn dynamic aspect-oriented semantics for ABSA. Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantics of the sentence and then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA). Note that the DRA can pay close attention to a small region of the sentences at each step and re-weight the vitally important words for better aspect-aware sentiment understanding. Finally, experimental results on three benchmark datasets demonstrate the effectiveness and the rationality of our proposed model and provide good interpretable insights for future semantic modeling.
2022.findings-acl.285
zhang-etal-2022-incorporating
+ 10.18653/v1/2022.findings-acl.285
DARER: Dual-task Temporal Relational Recurrent Reasoning Network for Joint Dialog Sentiment Classification and Act Recognition
@@ -4019,6 +4304,7 @@
xing-tsang-2022-darer
xingbowen714/darer
DailyDialog
+ 10.18653/v1/2022.findings-acl.286
Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents
@@ -4037,6 +4323,7 @@
rowitzou/dc-match
GLUE
MRPC
+ 10.18653/v1/2022.findings-acl.287
Modular Domain Adaptation
@@ -4050,6 +4337,7 @@
jkvc/modular-domain-adaptation
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.288
Detection of Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation
@@ -4066,6 +4354,7 @@
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.findings-acl.289
Platt-Bin: Efficient Posterior Calibrated Training for NLP Classifiers
@@ -4076,6 +4365,7 @@
2022.findings-acl.290
2022.findings-acl.290.software.zip
singh-goshtasbpour-2022-platt
+ 10.18653/v1/2022.findings-acl.290
Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation
@@ -4091,6 +4381,7 @@
yang-etal-2022-addressing
ATIS
BREAK
+ 10.18653/v1/2022.findings-acl.291
Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking
@@ -4102,6 +4393,7 @@
2022.findings-acl.292
lai-etal-2022-improving
laituan245/el-dockers
+ 10.18653/v1/2022.findings-acl.292
Local Structure Matters Most: Perturbation Study in NLU
@@ -4115,6 +4407,7 @@
2022.findings-acl.293.software.zip
clouatre-etal-2022-local
GLUE
+ 10.18653/v1/2022.findings-acl.293
Probing Factually Grounded Content Transfer with Factual Ablation
@@ -4126,6 +4419,7 @@
Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality: it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified, to factual consistency: testing whether the generation agrees with the grounding, rather than all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding. Particularly, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: this captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents, and the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods improving over strong baselines.
2022.findings-acl.294
west-etal-2022-probing
+ 10.18653/v1/2022.findings-acl.294
ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
@@ -4146,6 +4440,7 @@
hui-etal-2022-ed2lm
MS MARCO
Natural Questions
+ 10.18653/v1/2022.findings-acl.295
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
@@ -4155,6 +4450,7 @@
Question answering-based summarization evaluation metrics must automatically determine whether the QA model’s prediction is correct or not, a task known as answer verification. In this work, we benchmark the lexical answer verification methods which have been used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: in some scenarios, using a worse verification method, or using none at all, has comparable performance to using the best verification method, a result that we attribute to properties of the datasets.
2022.findings-acl.296
deutsch-roth-2022-benchmarking
+ 10.18653/v1/2022.findings-acl.296
Prior Knowledge and Memory Enriched Transformer for Sign Language Translation
@@ -4167,6 +4463,7 @@
2022.findings-acl.297
jin-etal-2022-prior
PHOENIX14T
+ 10.18653/v1/2022.findings-acl.297
Discontinuous Constituency and BERT: A Case Study of Dutch
@@ -4177,6 +4474,7 @@
2022.findings-acl.298
2022.findings-acl.298.software.zip
kogkalidis-wijnholds-2022-discontinuous
+ 10.18653/v1/2022.findings-acl.298
Probing Multilingual Cognate Prediction Models
@@ -4186,6 +4484,7 @@
Character-based neural machine translation models have become the reference models for cognate prediction, a historical linguistics task. So far, all linguistic interpretations about latent information captured by such models have been based on external analysis (accuracy, raw results, errors). In this paper, we investigate what probing can tell us about both models and previous interpretations, and learn that, though our models store linguistic and diachronic information, they do not capture it in the previously assumed ways.
2022.findings-acl.299
fourrier-sagot-2022-probing
+ 10.18653/v1/2022.findings-acl.299
A Neural Pairwise Ranking Model for Readability Assessment
@@ -4198,6 +4497,7 @@
lee-vajjala-2022-neural
jlee118/nprm
Newsela
+ 10.18653/v1/2022.findings-acl.300
First the Worst: Finding Better Gender Translations During Beam Search
@@ -4210,6 +4510,7 @@
2022.findings-acl.301.software.zip
saunders-etal-2022-first
dcsaunders/nmt-gender-rerank
+ 10.18653/v1/2022.findings-acl.301
Dialogue Summaries as Dialogue States (DS2), Template-Guided Summarization for Few-shot Dialogue State Tracking
@@ -4226,6 +4527,7 @@
jshin49/ds2
MultiWOZ
SAMSum Corpus
+ 10.18653/v1/2022.findings-acl.302
Unsupervised Preference-Aware Language Identification
@@ -4242,6 +4544,7 @@
2022.findings-acl.303.software.zip
ren-etal-2022-unsupervised
xzhren/preferenceawarelid
+ 10.18653/v1/2022.findings-acl.303
Using NLP to quantify the environmental cost and diversity benefits of in-person NLP conferences
@@ -4252,6 +4555,7 @@
2022.findings-acl.304
przybyla-shardlow-2022-using
piotrmp/nlp_geography
+ 10.18653/v1/2022.findings-acl.304
Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking
@@ -4265,6 +4569,7 @@
2022.findings-acl.305.software.zip
luo-etal-2022-interpretable
ECHR
+ 10.18653/v1/2022.findings-acl.305
Chinese Synesthesia Detection: New Dataset and Models
@@ -4276,6 +4581,7 @@
In this paper, we introduce a new task called synesthesia detection, which aims to extract the sensory word of a sentence and to predict the original and synesthetic sensory modalities of that word. Synesthesia refers to the description of perceptions in one sensory modality through concepts from other modalities. It is not only a linguistic phenomenon but also a cognitive phenomenon structuring human thought and action, which makes it a bridge between figurative language and abstract cognition, and thus helpful for understanding deep semantics. To address this task, we construct a large-scale human-annotated Chinese synesthesia dataset, which contains 7,217 annotated sentences accompanied by 187 sensory words. Based on this dataset, we propose a family of strong and representative baseline models. Upon these baselines, we further propose a radical-based neural network model to identify the boundary of the sensory word and to jointly detect its original and synesthetic sensory modalities. Through extensive experiments, we observe that the importance of the proposed task and dataset is verified by the statistics and the progressive performance of the baselines. In addition, our proposed model achieves state-of-the-art results on the synesthesia dataset.
2022.findings-acl.306
jiang-etal-2022-chinese
+ 10.18653/v1/2022.findings-acl.306
Rethinking Offensive Text Detection as a Multi-Hop Reasoning Problem
@@ -4288,6 +4594,7 @@
zhang-etal-2022-rethinking
qzx7/slight
OLID
+ 10.18653/v1/2022.findings-acl.307
On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
@@ -4306,6 +4613,7 @@
2022.findings-acl.308.software.zip
sun-etal-2022-safety
thu-coai/diasafety
+ 10.18653/v1/2022.findings-acl.308
Word Segmentation by Separation Inference for East Asian Languages
@@ -4319,6 +4627,7 @@
2022.findings-acl.309
tong-etal-2022-word
um-nlper/spin-ws
+ 10.18653/v1/2022.findings-acl.309
Unsupervised Chinese Word Segmentation with BERT Oriented Probing and Transformation
@@ -4332,6 +4641,7 @@
2022.findings-acl.310.software.zip
li-etal-2022-unsupervised
liweitj47/bert_unsupervised_word_segmentation
+ 10.18653/v1/2022.findings-acl.310
E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning
@@ -4350,6 +4660,7 @@
2022.findings-acl.311
chen-etal-2022-e
E-KAR
+ 10.18653/v1/2022.findings-acl.311
Implicit Relation Linking for Question Answering over Knowledge Graph
@@ -4367,6 +4678,7 @@
zhao-etal-2022-implicit
DBpedia
SimpleQuestions
+ 10.18653/v1/2022.findings-acl.312
Attention Mechanism with Energy-Friendly Operations
@@ -4383,6 +4695,7 @@
2022.findings-acl.313
wan-etal-2022-attention
nlp2ct/e-att
+ 10.18653/v1/2022.findings-acl.313
Probing BERT’s priors with serial reproduction chains
@@ -4393,6 +4706,7 @@
Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT’s priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than other methods in a large corpus of naturalness judgments. Our findings establish a firmer theoretical foundation for bottom-up probing and highlight richer deviations from human priors.
2022.findings-acl.314
yamakoshi-etal-2022-probing
+ 10.18653/v1/2022.findings-acl.314
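The GSN chain described in the entry above is concrete enough to sketch. Below is a minimal, illustrative Python sketch (not the authors' released code) of one chain step over a HuggingFace masked language model: pick a random position, mask it, and resample it from BERT's conditional distribution. The model name, chain length, and burn-in handling are assumptions.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def gsn_step(ids: torch.Tensor) -> torch.Tensor:
    # mask one randomly chosen position (skipping [CLS]/[SEP]) and
    # resample it from the model's conditional distribution
    pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()
    corrupted = ids.clone()
    corrupted[0, pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(corrupted).logits[0, pos]
    corrupted[0, pos] = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
    return corrupted

ids = tokenizer("the cat sat on the mat", return_tensors="pt").input_ids
for _ in range(500):  # run the chain; early samples would be discarded as burn-in
    ids = gsn_step(ids)
print(tokenizer.decode(ids[0], skip_special_tokens=True))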
Interpreting the Robustness of Neural NLP Models to Textual Perturbations
@@ -4404,6 +4718,7 @@
Modern Natural Language Processing (NLP) models are known to be sensitive to input perturbations and their performance can decrease when applied to real-world, noisy data. However, it is still unclear why models are less robust to some perturbations than others. In this work, we test the hypothesis that the extent to which a model is affected by an unseen textual perturbation (robustness) can be explained by the learnability of the perturbation (defined as how well the model learns to identify the perturbation with a small amount of evidence). We further give a causal justification for the learnability metric. We conduct extensive experiments with four prominent NLP models — TextRNN, BERT, RoBERTa and XLNet — over eight types of textual perturbations on three datasets. We show that a model which is better at identifying a perturbation (higher learnability) becomes worse at ignoring such a perturbation at test time (lower robustness), providing empirical support for our hypothesis.
2022.findings-acl.315
zhang-etal-2022-interpreting
+ 10.18653/v1/2022.findings-acl.315
Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations
@@ -4419,6 +4734,7 @@
xin-etal-2022-zero
BEIR
Natural Questions
+ 10.18653/v1/2022.findings-acl.316
A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation
@@ -4432,6 +4748,7 @@
Previous attempts to build effective semantic parsers for Wizard-of-Oz (WOZ) conversations suffer from the difficulty in acquiring a high-quality, manually annotated training set. Approaches based only on dialogue synthesis are insufficient, as dialogues generated from state-machine based models are poor approximations of real-life conversations. Furthermore, previously proposed dialogue state representations are ambiguous and lack the precision necessary for building an effective agent. This paper proposes a new dialogue representation and a sample-efficient methodology that can predict precise dialogue states in WOZ conversations. We extend the ThingTalk representation to capture all information an agent needs to respond properly. Our training strategy is sample-efficient: we combine (1) few-shot data sparsely sampling the full dialogue space and (2) synthesized data covering a subset space of dialogues generated by a succinct state-based dialogue model. The completeness of the extended ThingTalk language is demonstrated with a fully operational agent, which is also used in training data synthesis. We demonstrate the effectiveness of our methodology on MultiWOZ 3.0, a reannotation of the MultiWOZ 2.1 dataset in ThingTalk. ThingTalk can represent 98% of the test turns, while the simulator can emulate 85% of the validation set. We train a contextual semantic parser using our strategy, and obtain 79% turn-by-turn exact match accuracy on the reannotated test set.
2022.findings-acl.317
campagna-etal-2022-shot
+ 10.18653/v1/2022.findings-acl.317
GCPG: A General Framework for Controllable Paraphrase Generation
@@ -4448,6 +4765,7 @@
2022.findings-acl.318
2022.findings-acl.318.software.zip
yang-etal-2022-gcpg
+ 10.18653/v1/2022.findings-acl.318
CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding
@@ -4460,6 +4778,7 @@
gritta-etal-2022-crossaligner
huawei-noah/noah-research
MTOP
+ 10.18653/v1/2022.findings-acl.319
Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer
@@ -4471,6 +4790,7 @@
ilinykh-dobnik-2022-attention
gu-clasp/attention-as-grounding
Image Description Sequences
+ 10.18653/v1/2022.findings-acl.320
Improving Zero-Shot Cross-lingual Transfer Between Closely Related Languages by Injecting Character-Level Noise
@@ -4481,6 +4801,7 @@
2022.findings-acl.321
aepli-sennrich-2022-improving
Universal Dependencies
+ 10.18653/v1/2022.findings-acl.321
Structural Supervision for Word Alignment and Machine Translation
@@ -4492,6 +4813,7 @@
Syntactic structure has long been argued to be potentially useful for enforcing accurate word alignment and improving generalization performance of machine translation. Unfortunately, existing wisdom demonstrates its significance by considering only the syntactic structure of source tokens, neglecting the rich structural information from target tokens and the structural similarity between the source and target sentences. In this work, we propose to incorporate the syntactic structure of both source and target tokens into the encoder-decoder framework, tightly correlating the internal logic of word alignment and machine translation for multi-task learning. In particular, we do not leverage any annotated syntactic graph of the target side during training; instead, we introduce Dynamic Graph Convolution Networks (DGCN) on observed target tokens to sequentially and simultaneously generate the target tokens and the corresponding syntactic graphs, and further guide the word alignment. On this basis, Hierarchical Graph Random Walks (HGRW) are performed on the syntactic graphs of both source and target sides, for incorporating structured constraints on machine translation outputs. Experiments on four publicly available language pairs verify that our method is highly effective in capturing syntactic structure in different languages, consistently outperforming baselines in alignment accuracy and demonstrating promising results in translation quality.
2022.findings-acl.322
li-etal-2022-structural
+ 10.18653/v1/2022.findings-acl.322
Focus on the Action: Learning to Highlight and Summarize Jointly for Email To-Do Items Summarization
@@ -4502,6 +4824,7 @@
Automatic email to-do item generation is the task of generating to-do items from a given email to help people overview emails and schedule daily work. Different from prior research on email summarization, to-do item generation focuses on generating action mentions to provide more structured summaries of email text. Prior work either requires a large amount of annotation for key sentences with potential actions or fails to pay attention to nuanced actions in these unstructured emails, and thus often leads to unfaithful summaries. To fill these gaps, we propose a simple and effective learning to highlight and summarize framework (LHS) to learn to identify the most salient text and actions, and incorporate these structured representations to generate more faithful to-do items. Experiments show that our LHS model outperforms the baselines and achieves state-of-the-art performance in terms of both quantitative evaluation and human judgement. We also discuss specific challenges that current models face in email to-do summarization.
2022.findings-acl.323
zhang-etal-2022-focus
+ 10.18653/v1/2022.findings-acl.323
Exploring the Capacity of a Large-scale Masked Language Model to Recognize Grammatical Errors
@@ -4512,6 +4835,7 @@
In this paper, we explore the capacity of a language model-based method for grammatical error detection in detail. We first show that 5 to 10% of training data are enough for a BERT-based error detection method to achieve performance equivalent to what a non-language model-based method can achieve with the full training data; recall improves much faster with respect to training data size in the BERT-based method than in the non-language model method. This suggests that (i) the BERT-based method should have a good knowledge of the grammar required to recognize certain types of error and that (ii) it can transform the knowledge into error detection rules by fine-tuning with few training samples, which explains its high generalization ability in grammatical error detection. We further show with pseudo error data that it actually exhibits such nice properties in learning rules for recognizing various types of error. Finally, based on these findings, we discuss a cost-effective method for detecting grammatical errors with feedback comments explaining relevant grammatical rules to learners.
2022.findings-acl.324
nagata-etal-2022-exploring
+ 10.18653/v1/2022.findings-acl.324
Should We Trust This Summary? Bayesian Abstractive Summarization to The Rescue
@@ -4522,6 +4846,7 @@
2022.findings-acl.325
gidiotis-tsoumakas-2022-trust
AESLC
+ 10.18653/v1/2022.findings-acl.325
On the data requirements of probing
@@ -4536,6 +4861,7 @@
zhu-etal-2022-data
spoclab-ca/probing_dataset
SentEval
+ 10.18653/v1/2022.findings-acl.326
Translation Error Detection as Rationale Extraction
@@ -4547,6 +4873,7 @@
2022.findings-acl.327
fomicheva-etal-2022-translation
MLQE-PE
+ 10.18653/v1/2022.findings-acl.327
Towards Collaborative Neural-Symbolic Graph Semantic Parsing via Uncertainty
@@ -4558,6 +4885,7 @@
2022.findings-acl.328
lin-etal-2022-towards
SCAN
+ 10.18653/v1/2022.findings-acl.328
Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework
@@ -4568,6 +4896,7 @@
2022.findings-acl.329
wang-shang-2022-towards
FUNSD
+ 10.18653/v1/2022.findings-acl.329
On Length Divergence Bias in Textual Matching Models
@@ -4582,6 +4911,7 @@
2022.findings-acl.330
jiang-etal-2022-length
TrecQA
+ 10.18653/v1/2022.findings-acl.330
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
@@ -4596,6 +4926,7 @@
ghazarian-etal-2022-wrong
alexa/conture
FED
+ 10.18653/v1/2022.findings-acl.331
diff --git a/data/xml/2022.fl4nlp.xml b/data/xml/2022.fl4nlp.xml
index e3ab30f053..f672f77425 100644
--- a/data/xml/2022.fl4nlp.xml
+++ b/data/xml/2022.fl4nlp.xml
@@ -34,6 +34,7 @@
In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchical models, we develop ActPerFL, a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global model that implicitly contributes to other clients’ training. Such a balance is derived from the inter-client and intra-client uncertainty quantification. Consequently, ActPerFL can adapt to the underlying clients’ heterogeneity with uncertainty-driven local training and model aggregation. With experimental studies on Sent140 and Amazon Alexa audio data, we show that ActPerFL can achieve superior personalization performance compared with the existing counterparts.
2022.fl4nlp-1.1
chen-etal-2022-actperfl
+ 10.18653/v1/2022.fl4nlp-1.1
Scaling Language Model Size in Cross-Device Federated Learning
@@ -49,6 +50,7 @@
2022.fl4nlp-1.2
ro-etal-2022-scaling
Billion Word Benchmark
+ 10.18653/v1/2022.fl4nlp-1.2
Adaptive Differential Privacy for Language Model Training
@@ -61,6 +63,7 @@
wu-etal-2022-adaptive
WikiText-103
WikiText-2
+ 10.18653/v1/2022.fl4nlp-1.3
Intrinsic Gradient Compression for Scalable and Efficient Federated Learning
@@ -72,6 +75,7 @@
melas-kyriazi-wang-2022-intrinsic
PERSONA-CHAT
SST
+ 10.18653/v1/2022.fl4nlp-1.4
diff --git a/data/xml/2022.humeval.xml b/data/xml/2022.humeval.xml
index 2e42a427fd..55a10d5b0c 100644
--- a/data/xml/2022.humeval.xml
+++ b/data/xml/2022.humeval.xml
@@ -25,6 +25,7 @@
SacreBLEU, by incorporating a text normalizing step in the pipeline, has become a rising automatic evaluation metric in recent MT studies. With agglutinative languages such as Korean, however, the lexical-level metric cannot provide a conceivable result without a customized pre-tokenization. This paper endeavors to examine the influence of diversified tokenization schemes –word, morpheme, subword, character, and consonants & vowels (CV)– on the metric after its protective layer is peeled off. By performing meta-evaluation with manually-constructed into-Korean resources, our empirical study demonstrates that the human correlation of the surface-based metric and other homogeneous ones (as an extension) vacillates greatly by the token type. Moreover, the human correlation of the metric often deteriorates due to some tokenization, with CV one of its culprits. Guiding through the proper usage of tokenizers for the given metric, we discover i) the feasibility of the character tokens and ii) the deficit of CV in the Korean MT evaluation.
2022.humeval-1.1
kim-kim-2022-vacillating
+ 10.18653/v1/2022.humeval-1.1
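The comparison run in the entry above is easy to reproduce in miniature with the sacrebleu package: score the same Korean hypotheses under different built-in tokenization schemes and observe how BLEU moves. A hedged sketch; the sentences are placeholders, and the paper's full scheme list (morpheme, CV, etc.) requires external tokenizers not shown here.

import sacrebleu

hyps = ["나는 학교에 간다"]    # placeholder system output
refs = [["나는 학교에 갔다"]]  # one reference stream
for tok in ("13a", "char"):   # surface-word-ish vs. character tokenization
    bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize=tok)
    print(tok, round(bleu.score, 2))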
A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution
@@ -37,6 +38,7 @@
2022.humeval-1.2
borovikova-etal-2022-methodology
CoNLL-2012
+ 10.18653/v1/2022.humeval-1.2
Perceptual Quality Dimensions of Machine-Generated Text with a Focus on Machine Translation
@@ -49,6 +51,7 @@
2022.humeval-1.3
macketanz-etal-2022-perceptual
dfki-nlp/textq
+ 10.18653/v1/2022.humeval-1.3
Human evaluation of web-crawled parallel corpora for machine translation
@@ -61,6 +64,7 @@
2022.humeval-1.4
ramirez-sanchez-etal-2022-human
ParaCrawl
+ 10.18653/v1/2022.humeval-1.4
Beyond calories: evaluating how tailored communication reduces emotional load in diet-coaching
@@ -70,6 +74,7 @@
Dieting is a behaviour change task that is difficult for many people to conduct successfully. This is due to many factors, including stress and cost. Mobile applications offer an alternative to traditional coaching. However, previous work on app evaluation only focused on dietary outcomes, ignoring users’ emotional state despite its influence on eating habits. In this work, we introduce a novel evaluation of the effects that tailored communication can have on the emotional load of dieting. We implement this by augmenting a traditional diet app with affective NLG, text-tailoring and persuasive communication techniques. We then run a short 2-week experiment and check dietary outcomes, user feedback on the produced text and, most importantly, its impact on emotional state, through the PANAS questionnaire. Results show that tailored communication significantly improved users’ emotional state, compared to an app-only control group.
2022.humeval-1.5
balloccu-reiter-2022-beyond
+ 10.18653/v1/2022.humeval-1.5
The Human Evaluation Datasheet: A Template for Recording Details of Human Evaluation Experiments in NLP
@@ -80,6 +85,7 @@
2022.humeval-1.6
shimorina-belz-2022-human
Shimorina/human-evaluation-datasheet
+ 10.18653/v1/2022.humeval-1.6
Toward More Effective Human Evaluation for Machine Translation
@@ -92,6 +98,7 @@
2022.humeval-1.7
saldias-fuentes-etal-2022-toward
WMT 2020
+ 10.18653/v1/2022.humeval-1.7
A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification
@@ -107,6 +114,7 @@
2022.humeval-1.8
logacheva-etal-2022-study
CoLA
+ 10.18653/v1/2022.humeval-1.8
Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer
@@ -120,6 +128,7 @@
lai-etal-2022-human
laihuiyuan/eval-formality-transfer
GYAFC
+ 10.18653/v1/2022.humeval-1.9
Towards Human Evaluation of Mutual Understanding in Human-Computer Spontaneous Conversation: An Empirical Study of Word Sense Disambiguation for Naturalistic Social Dialogs in American English
@@ -128,6 +137,7 @@
Current evaluation practices for social dialog systems, dedicated to human-computer spontaneous conversation, exclusively focus on the quality of system-generated surface text, but not human-verifiable aspects of mutual understanding between the systems and their interlocutors. This work proposes Word Sense Disambiguation (WSD) as an essential component of a valid and reliable human evaluation framework, whose long-term goal is to radically improve the usability of dialog systems in real-life human-computer collaboration. The practicality of this proposal is demonstrated by experimentally investigating (1) the WordNet 3.0 sense inventory coverage of lexical meanings in spontaneous conversation between humans in American English, assumed as an upper bound of lexical diversity of human-computer communication, and (2) the effectiveness of state-of-the-art WSD models and pretrained transformer-based contextual embeddings on this type of data.
2022.humeval-1.10
luu-2022-towards
+ 10.18653/v1/2022.humeval-1.10
diff --git a/data/xml/2022.in2writing.xml b/data/xml/2022.in2writing.xml
index de00494aeb..58c50f12dd 100644
--- a/data/xml/2022.in2writing.xml
+++ b/data/xml/2022.in2writing.xml
@@ -31,6 +31,7 @@
Today, data-to-text systems are used as commercial solutions for automated text production of large quantities of text. Therefore, they already represent a new technology of writing. This new technology requires the author, as an act of writing, both to configure a system that then takes over the transformation into a real text, and to maintain strategies of traditional writing. What should an environment look like where a human guides a machine to write texts? Based on a comparison of the NLG pipeline architecture with the results of the research on the human writing process, this paper attempts to give an overview of which tasks need to be solved and which strategies are necessary to produce good texts in this environment. From this synopsis, principles for the design of data-to-text systems as a functioning writing environment are then derived.
2022.in2writing-1.1
schneider-etal-2022-data
+ 10.18653/v1/2022.in2writing-1.1
A Design Space for Writing Support Tools Using a Cognitive Process Model of Writing
@@ -42,6 +43,7 @@
Improvements in language technology have led to an increasing interest in writing support tools. In this paper we propose a design space for such tools based on a cognitive process model of writing. We conduct a systematic review of recent computer science papers that present and/or study such tools, analyzing 30 papers from the last five years using the design space. Tools are plotted according to three distinct cognitive processes–planning, translating, and reviewing–and the level of constraint each process entails. Analyzing recent work with the design space shows that highly constrained planning and reviewing are under-studied areas that recent technology improvements may now be able to serve. Finally, we propose shared evaluation methodologies and tasks that may help the field mature.
2022.in2writing-1.2
gero-etal-2022-design
+ 10.18653/v1/2022.in2writing-1.2
A Selective Summary of Where to Hide a Stolen Elephant: Leaps in Creative Writing with Multimodal Machine Intelligence
@@ -53,6 +55,7 @@
While developing a story, novices and published writers alike have had to look outside themselves for inspiration. Language models have recently been able to generate text fluently, producing new stochastic narratives upon request. However, effectively integrating such capabilities with human cognitive faculties and creative processes remains challenging. We propose to investigate this integration with a multimodal writing support interface that offers writing suggestions textually, visually, and aurally. We conduct an extensive study that combines elicitation of prior expectations before writing, observation and semi-structured interviews during writing, and outcome evaluations after writing. Our results illustrate individual and situational variation in machine-in-the-loop writing approaches, suggestion acceptance, and ways the system is helpful. Centrally, we report how participants perform integrative leaps, by which they do cognitive work to integrate suggestions of varying semantic relevance into their developing stories. We interpret these findings, offering modeling and design recommendations for future creative writing support technologies.
2022.in2writing-1.3
singh-etal-2022-selective
+ 10.18653/v1/2022.in2writing-1.3
A text-writing system for Easy-to-Read German evaluated with low-literate users with cognitive impairment
@@ -63,6 +66,7 @@
2022.in2writing-1.4
steinmetz-harbusch-2022-text
CELEX
+ 10.18653/v1/2022.in2writing-1.4
Language Models as Context-sensitive Word Search Engines
@@ -78,6 +82,7 @@
CLOTH
WikiText-103
WikiText-2
+ 10.18653/v1/2022.in2writing-1.5
Plug-and-Play Controller for Story Completion: A Pilot Study toward Emotion-aware Story Writing Assistance
@@ -90,6 +95,7 @@
2022.in2writing-1.6
mori-etal-2022-plug
ROCStories
+ 10.18653/v1/2022.in2writing-1.6
Text Revision by On-the-Fly Representation Optimization
@@ -105,6 +111,7 @@
jingjingli01/oreo
GYAFC
Newsela
+ 10.18653/v1/2022.in2writing-1.7
The Pure Poet: How Good is the Subjective Credibility and Stylistic Quality of Literary Short Texts Written with an Artificial Intelligence Tool as Compared to Texts Written by Human Authors?
@@ -118,6 +125,7 @@
The application of artificial intelligence (AI) for text generation in creative domains raises questions regarding the credibility of AI-generated content. In two studies, we explored if readers can differentiate between AI-based and human-written texts (generated based on the first line of texts and poems of classic authors) and how the stylistic qualities of these texts are rated. Participants read 9 AI-based continuations and either 9 human-written continuations (Study 1, N=120) or 9 original continuations (Study 2, N=302). Participants’ task was to decide whether a continuation was written with an AI-tool or not, to indicate their confidence in each decision, and to assess the stylistic text quality. Results showed that participants generally had low accuracy for differentiating between text types but were overconfident in their decisions. Regarding the assessment of stylistic quality, AI-continuations were perceived as less well-written, inspiring, fascinating, interesting, and aesthetic than both human-written and original continuations.
2022.in2writing-1.8
gunser-etal-2022-pure
+ 10.18653/v1/2022.in2writing-1.8
Interactive Children’s Story Rewriting Through Parent-Children Interaction
@@ -129,6 +137,7 @@
Storytelling in early childhood provides significant benefits in language and literacy development, relationship building, and entertainment. To maximize these benefits, it is important to empower children with more agency. Interactive story rewriting through parent-children interaction can boost children’s agency and help build the relationship between parent and child as they collaboratively create changes to an original story. However, for children with limited proficiency in reading and writing, parents must carry out multiple tasks to guide the rewriting process, which can incur a high cognitive load. In this work, we introduce an interface design that aims to support children and parents to rewrite stories together with the help of AI techniques. We describe three design goals determined by a review of prior literature in interactive storytelling and existing educational activities. We also propose a preliminary prompt-based pipeline that uses GPT-3 to realize the design goals and enable the interface.
2022.in2writing-1.9
lee-etal-2022-interactive
+ 10.18653/v1/2022.in2writing-1.9
News Article Retrieval in Context for Event-centric Narrative Creation
@@ -141,6 +150,7 @@
2022.in2writing-1.10
voskarides-etal-2022-news
nickvosk/ictir2021-news-retrieval-in-context
+ 10.18653/v1/2022.in2writing-1.10
Unmet Creativity Support Needs in Computationally Supported Creative Writing
@@ -150,6 +160,7 @@
Large language models (LLMs) enabled by the datasets and computing power of the last decade have recently gained popularity for their capacity to generate plausible natural language text from human-provided prompts. This ability makes them appealing to fiction writers as prospective co-creative agents, addressing the common challenge of writer’s block, or getting unstuck. However, creative writers face additional challenges, including maintaining narrative consistency, developing plot structure, architecting reader experience, and refining their expressive intent, which are not well-addressed by current LLM-backed tools. In this paper, we define these needs by grounding them in cognitive and theoretical literature, then survey previous computational narrative research that holds promise for supporting each of them in a co-creative setting.
2022.in2writing-1.11
kreminski-martens-2022-unmet
+ 10.18653/v1/2022.in2writing-1.11
Sparks: Inspiration for Science Writing using Language Models
@@ -160,6 +171,7 @@
Large-scale language models are rapidly improving, performing well on a variety of tasks with little to no customization. In this work we investigate how language models can support science writing, a challenging writing task that is both open-ended and highly constrained. We present a system for generating “sparks”, sentences related to a scientific concept intended to inspire writers. We run a user study with 13 STEM graduate students and find three main use cases of sparks—inspiration, translation, and perspective—each of which correlates with a unique interaction pattern. We also find that while participants were more likely to select higher quality sparks, the overall quality of sparks seen by a given participant did not correlate with their satisfaction with the tool.
2022.in2writing-1.12
gero-etal-2022-sparks
+ 10.18653/v1/2022.in2writing-1.12
ChipSong: A Controllable Lyric Generation System for Chinese Popular Song
@@ -175,6 +187,7 @@
2022.in2writing-1.13
liu-etal-2022-chipsong
korokes/chipsong
+ 10.18653/v1/2022.in2writing-1.13
Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
@@ -188,6 +201,7 @@
2022.in2writing-1.14
du-etal-2022-read
vipulraheja/iterater
+ 10.18653/v1/2022.in2writing-1.14
diff --git a/data/xml/2022.insights.xml b/data/xml/2022.insights.xml
index 1ace04d59f..cc861d753d 100644
--- a/data/xml/2022.insights.xml
+++ b/data/xml/2022.insights.xml
@@ -31,6 +31,7 @@
2022.insights-1.1
ding-etal-2022-isotropy
GLUE
+ 10.18653/v1/2022.insights-1.1
Do Dependency Relations Help in the Task of Stance Detection?
@@ -41,6 +42,7 @@
In this paper we present a set of multilingual experiments tackling the task of Stance Detection in five different languages: English, Spanish, Catalan, French and Italian. Furthermore, we study the phenomenon of stance with respect to six different targets – one per language, and two for Italian – employing a variety of machine learning algorithms that primarily exploit morphological and syntactic knowledge as features, represented in the format of Universal Dependencies. Results seem to suggest that the methodology employed is not beneficial per se, but that it might be useful to exploit the same features with a different methodology.
2022.insights-1.2
cignarella-etal-2022-dependency
+ 10.18653/v1/2022.insights-1.2
Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification
@@ -50,6 +52,7 @@
Open-world classification in dialog systems requires models to detect open intents while ensuring the quality of in-domain (ID) intent classification. In this work, we revisit methods that leverage distance-based statistics for unsupervised out-of-domain (OOD) detection. We show that despite their superior performance on threshold-independent metrics like AUROC on the test set, threshold values chosen based on performance on a validation set do not generalize well to the test set, resulting in substantially lower ID or OOD detection accuracy and F1-scores. Our analysis shows that this lack of generalizability can be successfully mitigated by setting aside a hold-out set from the validation data for threshold selection (sometimes achieving relative gains as high as 100%). Extensive experiments on seven benchmark datasets show that this fix puts the performance of these methods at par with, or sometimes even better than, the current state-of-the-art OOD detection techniques.
2022.insights-1.3
khosla-gangadharaiah-2022-evaluating
+ 10.18653/v1/2022.insights-1.3
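The fix proposed in the entry above, choosing the OOD threshold on a hold-out split carved out of the validation data, reduces to a simple sweep. A minimal sketch, assuming higher distance-based scores indicate OOD; the F1 criterion and the score arrays are illustrative, not the paper's exact recipe.

import numpy as np

def pick_threshold(holdout_id: np.ndarray, holdout_ood: np.ndarray) -> float:
    # sweep candidate thresholds on the hold-out split and keep the one
    # maximizing F1 for OOD detection (OOD = positive class)
    candidates = np.unique(np.concatenate([holdout_id, holdout_ood]))
    best_t, best_f1 = float(candidates[0]), -1.0
    for t in candidates:
        tp = (holdout_ood >= t).sum()  # OOD correctly flagged
        fp = (holdout_id >= t).sum()   # ID wrongly flagged
        fn = (holdout_ood < t).sum()   # OOD missed
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t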
Extending the Scope of Out-of-Domain: Examining QA models in multiple subdomains
@@ -63,6 +66,7 @@
lyuchenyang/analysing-question-answering-data
NewsQA
SQuAD
+ 10.18653/v1/2022.insights-1.4
What Do You Get When You Cross Beam Search with Nucleus Sampling?
@@ -72,6 +76,7 @@
We combine beam search with the probabilistic pruning technique of nucleus sampling to create two deterministic nucleus search algorithms for natural language generation. The first algorithm, p-exact search, locally prunes the next-token distribution and performs an exact search over the remaining space. The second algorithm, dynamic beam search, shrinks and expands the beam size according to the entropy of the candidate’s probability distribution. Despite the probabilistic intuition behind nucleus search, experiments on machine translation and summarization benchmarks show that both algorithms reach the same performance levels as standard beam search.
2022.insights-1.5
shaham-levy-2022-get
+ 10.18653/v1/2022.insights-1.5
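Both algorithms in the entry above hinge on the nucleus-pruning step from nucleus sampling. A hedged sketch of that step, plus one plausible way to size a dynamic beam from the next-token entropy; the perplexity-sized beam rule is an assumption for illustration, not necessarily the paper's exact schedule.

import torch

def nucleus_prune(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    # keep the smallest set of tokens whose cumulative probability exceeds p;
    # all other logits are masked to -inf, as in nucleus (top-p) sampling
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, idx = probs.sort(descending=True)
    keep = sorted_probs.cumsum(-1) - sorted_probs < p  # top token always kept
    pruned = torch.full_like(logits, float("-inf"))
    pruned[idx[keep]] = logits[idx[keep]]
    return pruned

def dynamic_beam_size(logits: torch.Tensor, k_max: int = 10) -> int:
    # one plausible entropy-based rule: beam width is roughly the perplexity
    # of the next-token distribution, clamped to [1, k_max]
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return max(1, min(k_max, int(entropy.exp().round().item())))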
How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge?
@@ -83,6 +88,7 @@
2022.insights-1.6
sun-etal-2022-much
BLiMP
+ 10.18653/v1/2022.insights-1.6
Cross-lingual Inflection as a Data Augmentation Method for Parsing
@@ -93,6 +99,7 @@
We propose a morphology-based method for low-resource (LR) dependency parsing. We train a morphological inflector for target LR languages, and apply it to related rich-resource (RR) treebanks to create cross-lingual (x-inflected) treebanks that resemble the target LR language. We use such inflected treebanks to train parsers in zero- (training on x-inflected treebanks) and few-shot (training on x-inflected and target language treebanks) setups. The results show that the method sometimes improves the baselines, but not consistently.
2022.insights-1.7
munoz-ortiz-etal-2022-cross
+ 10.18653/v1/2022.insights-1.7
Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification
@@ -108,6 +115,7 @@
uds-lsv/bert-lnl
AG News
IMDb Movie Reviews
+ 10.18653/v1/2022.insights-1.8
Ancestor-to-Creole Transfer is Not a Walk in the Park
@@ -118,6 +126,7 @@
We aim to learn language models for Creole languages for which large volumes of data are not readily available, and therefore explore the potential transfer from ancestor languages (the ‘Ancestry Transfer Hypothesis’). We find that standard transfer methods do not facilitate ancestry transfer. Surprisingly, different from other non-Creole languages, a very distinct two-phase pattern emerges for Creoles: as our training losses plateau and language models begin to overfit on their source languages, perplexity on the Creoles drops. We explore whether this compression phase can lead to practically useful language models (the ‘Ancestry Bottleneck Hypothesis’), but falsify this as well. Moreover, we show that Creoles exhibit this two-phase pattern even when training on random, unrelated languages. Thus, Creoles seem to be typological outliers, and we speculate whether there is a link between the two observations.
2022.insights-1.9
lent-etal-2022-ancestor
+ 10.18653/v1/2022.insights-1.9
What GPT Knows About Who is Who
@@ -133,6 +142,7 @@
yang-etal-2022-gpt
awesomecoref/prompt-coref
WSC
+ 10.18653/v1/2022.insights-1.10
Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the UMLS Metathesaurus Using Siamese Networks
@@ -148,6 +158,7 @@
Recent work uses a Siamese Network, initialized with BioWordVec embeddings (distributed word embeddings), for predicting synonymy among biomedical terms to automate a part of the UMLS (Unified Medical Language System) Metathesaurus construction process. We evaluate the use of contextualized word embeddings extracted from nine different biomedical BERT-based models for synonym prediction in the UMLS by replacing BioWordVec embeddings with embeddings extracted from each biomedical BERT model using different feature extraction methods. Finally, we conduct a thorough grid search, which prior work lacks, to find the best set of hyperparameters. Surprisingly, we find that Siamese Networks initialized with BioWordVec embeddings still outperform the Siamese Networks initialized with embeddings extracted from the biomedical BERT models.
2022.insights-1.11
bajaj-etal-2022-evaluating
+ 10.18653/v1/2022.insights-1.11
On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing
@@ -160,6 +171,7 @@
2022.insights-1.12
okimura-etal-2022-impact
SST
+ 10.18653/v1/2022.insights-1.12
Can Question Rewriting Help Conversational Question Answering?
@@ -176,6 +188,7 @@
CoQA
QReCC
QuAC
+ 10.18653/v1/2022.insights-1.13
Clustering Examples in Multi-Dataset Benchmarks with Item Response Theory
@@ -191,6 +204,7 @@
MRQA
SST
SuperGLUE
+ 10.18653/v1/2022.insights-1.14
On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
@@ -205,6 +219,7 @@
kim-etal-2022-limits
AI2-THOR
ALFRED
+ 10.18653/v1/2022.insights-1.15
Do Data-based Curricula Work?
@@ -215,6 +230,7 @@
Current state-of-the-art NLP systems use large neural networks that require extensive computational resources for training. Inspired by human knowledge acquisition, researchers have proposed curriculum learning - sequencing tasks (task-based curricula) or ordering and sampling the datasets (data-based curricula) that facilitate training. This work investigates the benefits of data-based curriculum learning for large language models such as BERT and T5. We experiment with various curricula based on complexity measures and different sampling strategies. Extensive experiments on several NLP tasks show that curricula based on various complexity measures rarely have any benefits, while random sampling performs either as well or better than curricula.
2022.insights-1.16
surkov-etal-2022-data
+ 10.18653/v1/2022.insights-1.16
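As a concrete reading of "ordering and sampling the datasets" in the entry above, here is a hypothetical competence-style data-based curriculum: rank examples by a complexity measure and sample only from the easiest fraction, growing that fraction over training. The square-root schedule, the length-based measure, and c0 are illustrative assumptions, not the paper's exact curricula.

import random

def competence_curriculum(examples, measure, steps, c0=0.1):
    # rank examples easy-to-hard, then at each step sample from the
    # easiest `competence` fraction (square-root growth schedule)
    ranked = sorted(examples, key=measure)
    for t in range(steps):
        competence = min(1.0, (c0 ** 2 + (1 - c0 ** 2) * t / steps) ** 0.5)
        pool = ranked[: max(1, int(competence * len(ranked)))]
        yield random.choice(pool)

# e.g. sequence length as the complexity measure
stream = list(competence_curriculum(["a b", "a b c d e", "a b c"], measure=len, steps=5))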
The Document Vectors Using Cosine Similarity Revisited
@@ -226,6 +242,7 @@
bingyu-arefyev-2022-document
bgzh/dv_cosine_revisited
IMDb Movie Reviews
+ 10.18653/v1/2022.insights-1.17
Challenges in including extra-linguistic context in pre-trained language models
@@ -236,6 +253,7 @@
To successfully account for language, computational models need to take into account both the linguistic context (the content of the utterances) and the extra-linguistic context (for instance, the participants in a dialogue). We focus on a referential task that asks models to link entity mentions in a TV show to the corresponding characters, and design an architecture that attempts to account for both kinds of context. In particular, our architecture combines a previously proposed specialized module (an “entity library”) for character representation with transfer learning from a pre-trained language model. We find that, although the model does improve linguistic contextualization, it fails to successfully integrate extra-linguistic information about the participants in the dialogue. Our work shows that it is very challenging to incorporate extra-linguistic information into pre-trained language models.
2022.insights-1.18
sorodoc-etal-2022-challenges
+ 10.18653/v1/2022.insights-1.18
Label Errors in BANKING77
@@ -245,6 +263,7 @@
We investigate potential label errors present in the popular BANKING77 dataset and the associated negative impacts on intent classification methods. Motivated by our own negative results when constructing an intent classifier, we applied two automated approaches to identify potential label errors in the dataset. We found that over 1,400 (14%) of the 10,003 training utterances may have been incorrectly labelled. In a simple experiment, we found that by removing the utterances with potential errors, our intent classifier saw an increase of 4.5% and 8% for the F1-Score and Adjusted Rand Index, respectively, in supervised and unsupervised classification. This paper serves as a warning about the potential for noisy labels in popular NLP datasets. Further study is needed to fully identify the breadth and depth of label errors in BANKING77 and other datasets.
2022.insights-1.19
ying-thomas-2022-label
+ 10.18653/v1/2022.insights-1.19
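Confident learning is one automated way to surface label errors of the kind reported in the entry above; whether it matches either of the paper's two approaches is not stated, so treat this as a hypothetical illustration. The file names are placeholders; pred_probs would come from any cross-validated classifier over the 77 intent classes.

import numpy as np
from cleanlab.filter import find_label_issues

pred_probs = np.load("banking77_cv_pred_probs.npy")  # (n_utterances, 77), placeholder file
labels = np.load("banking77_labels.npy")             # (n_utterances,), placeholder file
suspect_idx = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",  # most suspicious first
)
print(f"{len(suspect_idx)} utterances flagged for manual review")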
Pathologies of Pre-trained Language Models in Few-shot Fine-tuning
@@ -259,6 +278,7 @@
chen-etal-2022-pathologies
IMDb Movie Reviews
SNLI
+ 10.18653/v1/2022.insights-1.20
An Empirical study to understand the Compositional Prowess of Neural Dialog Models
@@ -273,6 +293,7 @@
vinayshekharcmu/ComposionalityOfDialogModels
DailyDialog
MutualFriends
+ 10.18653/v1/2022.insights-1.21
Combining Extraction and Generation for Constructing Belief-Consequence Causal Links
@@ -283,6 +304,7 @@
In this paper, we introduce and justify a new task—causal link extraction based on beliefs—and do a qualitative analysis of the ability of a large language model—InstructGPT-3—to generate implicit consequences of beliefs. With the language model-generated consequences being promising, but not consistent, we propose directions of future work, including data collection, explicit consequence extraction using rule-based and language modeling-based approaches, and using explicitly stated consequences of beliefs to fine-tune or prompt the language model to produce outputs suitable for the task.
2022.insights-1.22
alexeeva-etal-2022-combining
+ 10.18653/v1/2022.insights-1.22
Replicability under Near-Perfect Conditions – A Case-Study from Automatic Summarization
@@ -291,6 +313,7 @@
Replication of research results has become more and more important in Natural Language Processing. Nevertheless, we still rely on results reported in the literature for comparison. Additionally, elements of an experimental setup are not always completely reported. This includes, but is not limited to, reporting the specific parameters used or omitting an implementation detail. In our experiment based on two frequently used data sets from the domain of automatic summarization and the seemingly full disclosure of research artefacts, we examine how well reported results are replicable and what elements influence the success or failure of replication. Our results indicate that publishing research artifacts is far from sufficient, and that publishing all relevant parameters in full detail is crucial.
2022.insights-1.23
mieskes-2022-replicability
+ 10.18653/v1/2022.insights-1.23
BPE beyond Word Boundary: How NOT to use Multi Word Expressions in Neural Machine Translation
@@ -303,6 +326,7 @@
2022.insights-1.24.OptionalSupplementaryData.zip
kumar-thawani-2022-bpe
pegasus-lynx/mwe-bpe
+ 10.18653/v1/2022.insights-1.24
Pre-trained language models evaluating themselves - A comparative study
@@ -314,6 +338,7 @@
2022.insights-1.25
koch-etal-2022-pre
lazerlambda/metricscomparison
+ 10.18653/v1/2022.insights-1.25
diff --git a/data/xml/2022.iwslt.xml b/data/xml/2022.iwslt.xml
index 953f53be1a..ac23672461 100644
--- a/data/xml/2022.iwslt.xml
+++ b/data/xml/2022.iwslt.xml
@@ -25,6 +25,7 @@
This paper addresses the problem of evaluating the quality of automatically generated subtitles, which includes not only the quality of the machine-transcribed or translated speech, but also the quality of line segmentation and subtitle timing. We propose SubER - a single novel metric based on edit distance with shifts that takes all of these subtitle properties into account. We compare it to existing metrics for evaluating transcription, translation, and subtitle quality. A careful human evaluation in a post-editing scenario shows that the new metric has a high correlation with the post-editing effort and direct human assessment scores, outperforming baseline metrics considering only the subtitle text, such as WER and BLEU, and existing methods to integrate segmentation and timing features.
2022.iwslt-1.1
wilken-etal-2022-suber
+ 10.18653/v1/2022.iwslt-1.1
Improving Arabic Diacritization by Learning to Diacritize and Translate
@@ -35,6 +36,7 @@
2022.iwslt-1.2
thompson-alshehri-2022-improving
WikiMatrix
+ 10.18653/v1/2022.iwslt-1.2
Simultaneous Neural Machine Translation with Prefix Alignment
@@ -45,6 +47,7 @@
Simultaneous translation is a task that requires starting translation before the speaker has finished speaking, so we face a trade-off between latency and accuracy. In this work, we focus on prefix-to-prefix translation and propose a method to extract alignment between bilingual prefix pairs. We use the alignment to segment a streaming input and fine-tune a translation model. The proposed method demonstrated higher BLEU than those of baselines in low latency ranges in our experiments on the IWSLT simultaneous translation benchmark.
2022.iwslt-1.3
kano-etal-2022-simultaneous
+ 10.18653/v1/2022.iwslt-1.3
Locality-Sensitive Hashing for Long Context Neural Machine Translation
@@ -56,6 +59,7 @@
After its introduction, the Transformer architecture quickly became the gold standard for the task of neural machine translation. A major advantage of the Transformer compared to previous architectures is the faster training speed achieved by complete parallelization across timesteps due to the use of attention over recurrent layers. However, this also leads to one of the biggest problems of the Transformer, namely the quadratic time and memory complexity with respect to the input length. In this work we adapt the locality-sensitive hashing approach of Kitaev et al. (2020) to self-attention in the Transformer, extend it to cross-attention, and apply this memory-efficient framework to sentence- and document-level machine translation. Our experiments show that the LSH attention scheme for sentence-level translation comes at the cost of slightly reduced translation quality. For document-level NMT we are able to include much bigger context sizes than what is possible with the baseline Transformer. However, more context neither improves translation quality nor scores on targeted test suites.
2022.iwslt-1.4
petrick-etal-2022-locality
+ 10.18653/v1/2022.iwslt-1.4
Anticipation-Free Training for Simultaneous Machine Translation
@@ -67,6 +71,7 @@
2022.iwslt-1.5
chang-etal-2022-anticipation
george0828zhang/sinkhorn-simultrans
+ 10.18653/v1/2022.iwslt-1.5
Who Are We Talking About? Handling Person Names in Speech Translation
@@ -79,6 +84,7 @@
gaido-etal-2022-talking
hlt-mt/fbk-fairseq
Europarl-ST
+ 10.18653/v1/2022.iwslt-1.6
Joint Generation of Captions and Subtitles with Dual Decoding
@@ -93,6 +99,7 @@
xu-etal-2022-joint
jitao-xu/dual-decoding
MuST-Cinema
+ 10.18653/v1/2022.iwslt-1.7
MirrorAlign: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning
@@ -104,6 +111,7 @@
Word alignment is essential for downstream cross-lingual language understanding and generation tasks. Recently, the performance of neural word alignment models has exceeded that of statistical models. However, they heavily rely on sophisticated translation models. In this study, we propose a super lightweight unsupervised word alignment model named MirrorAlign, in which bidirectional symmetric attention trained with a contrastive learning objective is introduced, and an agreement loss is employed to bind the attention maps, such that the alignments follow the mirror-like symmetry hypothesis. Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in word alignment while significantly reducing training and decoding time on average. Further ablation analysis and case studies show the superiority of our proposed MirrorAlign. Notably, we recognize our model as a pioneering attempt to unify bilingual word embeddings and word alignments. Encouragingly, our approach achieves a 16.4x speedup against GIZA++, and 50x parameter compression compared with Transformer-based alignment methods. We release our code to facilitate the community: https://github.com/moore3930/MirrorAlign.
2022.iwslt-1.8
wu-etal-2022-mirroralign
+ 10.18653/v1/2022.iwslt-1.8
On the Impact of Noises in Crowd-Sourced Data for Speech Translation
@@ -116,6 +124,7 @@
ouyang-etal-2022-impact
owaski/must-c-clean
MuST-C
+ 10.18653/v1/2022.iwslt-1.9
Findings of the IWSLT 2022 Evaluation Campaign
@@ -172,6 +181,7 @@
LibriSpeech
MuST-C
VoxPopuli
+ 10.18653/v1/2022.iwslt-1.10
The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task
@@ -185,6 +195,7 @@
MuST-C
OpenSubtitles
VoxPopuli
+ 10.18653/v1/2022.iwslt-1.11
Amazon Alexa AI’s System for IWSLT 2022 Offline Speech Translation Shared Task
@@ -199,6 +210,7 @@
Europarl-ST
LibriSpeech
MuST-C
+ 10.18653/v1/2022.iwslt-1.12
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
@@ -213,6 +225,7 @@
2022.iwslt-1.13
gaido-etal-2022-efficient
hlt-mt/fbk-fairseq
+ 10.18653/v1/2022.iwslt-1.13
Effective combination of pretrained models - KIT@IWSLT2022
@@ -230,6 +243,7 @@
How2
LibriSpeech
MuST-C
+ 10.18653/v1/2022.iwslt-1.14
The USTC-NELSLIP Offline Speech Translation Systems for IWSLT 2022
@@ -250,6 +264,7 @@
This paper describes USTC-NELSLIP’s submissions to the IWSLT 2022 Offline Speech Translation task, including speech translation of talks from English to German, English to Chinese and English to Japanese. We describe both cascaded architectures and end-to-end models which can directly translate source speech into target text. In the cascaded condition, we investigate the effectiveness of different model architectures with robust training and achieve 2.72 BLEU improvements over last year’s optimal system on MuST-C English-German test set. In the end-to-end condition, we build models based on Transformer and Conformer architectures, achieving 2.26 BLEU improvements over last year’s optimal end-to-end system. The end-to-end system has obtained promising results, but it is still lagging behind our cascaded models.
2022.iwslt-1.15
zhang-etal-2022-ustc
+ 10.18653/v1/2022.iwslt-1.15
The AISP-SJTU Simultaneous Translation System for IWSLT 2022
@@ -266,6 +281,7 @@
This paper describes AISP-SJTU’s submissions for the IWSLT 2022 Simultaneous Translation task. We participate in the text-to-text and speech-to-text simultaneous translation from English to Mandarin Chinese. The training of the CAAT model is improved by training across multiple right-context window sizes, which achieves good online performance without fixing a right-context window size prior to training. For the speech-to-text task, the best model we submitted achieves 25.87, 26.21 and 26.45 BLEU in the low, medium and high latency regimes on tst-COMMON, corresponding to 27.94, 28.31 and 28.43 BLEU in the text-to-text task.
2022.iwslt-1.16
zhu-etal-2022-aisp
+ 10.18653/v1/2022.iwslt-1.16
The Xiaomi Text-to-Text Simultaneous Speech Translation System for IWSLT 2022
@@ -282,6 +298,7 @@
This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (noted as SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (noted as T2T) track. Our system is built based on the Transformer model with novel techniques borrowed from our recent research work. For the data filtering, language-model-based and rule-based methods are conducted to filter the data to obtain high-quality bilingual parallel corpora. We also strengthen our system with some dominating techniques related to data augmentation, such as knowledge distillation, tagged back-translation, and iterative back-translation. We also incorporate novel training techniques such as R-drop, deep model, and large batch training which have been shown to be beneficial to the naive Transformer model. In the SST scenario, several variations of wait-k strategies are explored. Furthermore, in terms of robustness, both data-based and model-based ways are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. We finally design some inference algorithms and use the adaptive-ensemble method based on multiple model variants to further improve the performance of the system. Compared with strong baselines, fusing all techniques can improve our system by 2~3 BLEU scores under different latency regimes.
2022.iwslt-1.17
guo-etal-2022-xiaomi
+ 10.18653/v1/2022.iwslt-1.17
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022
@@ -299,6 +316,7 @@
Europarl-ST
LibriSpeech
VoxPopuli
+ 10.18653/v1/2022.iwslt-1.18
The NiuTrans’s Submission to the IWSLT22 English-to-Chinese Offline Speech Translation Task
@@ -314,6 +332,7 @@
This paper describes NiuTrans’s submission to the IWSLT22 English-to-Chinese (En-Zh) offline speech translation task. The end-to-end and bilingual system is built with constrained English and Chinese data and translates English speech to Chinese text without intermediate transcription. Our speech translation models are composed of different pre-trained acoustic models and machine translation models connected by two kinds of adapters. We compare the effect of the standard speech feature (e.g. log Mel-filterbank) and the pre-trained speech feature and try to make them interact. The final submission is an ensemble of three potential speech translation models. Our single best and ensemble models achieve 18.66 BLEU and 19.35 BLEU respectively on the MuST-C En-Zh tst-COMMON set.
2022.iwslt-1.19
zhang-etal-2022-niutranss
+ 10.18653/v1/2022.iwslt-1.19
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation
@@ -335,6 +354,7 @@
wang-etal-2022-hw
LibriSpeech
TED-LIUM 3
+ 10.18653/v1/2022.iwslt-1.20
The HW-TSC’s Simultaneous Speech Translation System for IWSLT 2022 Evaluation
@@ -356,6 +376,7 @@
wang-etal-2022-hw-tscs
LibriSpeech
TED-LIUM 3
+ 10.18653/v1/2022.iwslt-1.21
MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks
@@ -376,6 +397,7 @@
Europarl-ST
MuST-C
OpenSubtitles
+ 10.18653/v1/2022.iwslt-1.22
Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022
@@ -390,6 +412,7 @@
tsiamas-etal-2022-pretrained
Europarl-ST
MuST-C
+ 10.18653/v1/2022.iwslt-1.23
CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
@@ -405,6 +428,7 @@
In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3x faster than offline in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available.
2022.iwslt-1.24
polak-etal-2022-cuni
+ 10.18653/v1/2022.iwslt-1.24
NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022
@@ -421,6 +445,7 @@
2022.iwslt-1.25
fukuda-etal-2022-naist
MuST-C
+ 10.18653/v1/2022.iwslt-1.25
The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation
@@ -442,6 +467,7 @@
guo-etal-2022-hw
LibriSpeech
TED-LIUM 3
+ 10.18653/v1/2022.iwslt-1.26
CMU’s IWSLT 2022 Dialect Speech Translation System
@@ -458,6 +484,7 @@
This paper describes CMU’s submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional paired Modern Standard Arabic data (MSA) to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems. We also augment the paired ASR data with pseudo translations via sequence-level knowledge distillation from an MT model and use these artificial triplet ST data to improve our end-to-end (E2E) systems. Our E2E models are based on the Multi-Decoder architecture with searchable hidden intermediates. We extend the Multi-Decoder by orienting the speech encoder towards the target language by applying ST supervision as a hierarchical connectionist temporal classification (CTC) multi-task objective. During inference, we apply joint decoding of the ST CTC and ST autoregressive decoder branches of our modified Multi-Decoder. Finally, we apply ROVER voting, posterior combination, and minimum Bayes-risk decoding with combined N-best lists to ensemble our various cascaded and E2E systems. Our best systems reached 20.8 and 19.5 BLEU on test2 (blind) and test1 respectively. Without any additional MSA data, we reached 20.4 and 19.2 on the same test sets.
2022.iwslt-1.27
yan-etal-2022-cmus
+ 10.18653/v1/2022.iwslt-1.27
ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks
@@ -476,6 +503,7 @@
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR. Our results show that in our settings pipeline approaches are still very competitive, and that with the use of transfer learning, they can outperform end-to-end models for speech translation (ST). For the Tamasheq-French dataset (low-resource track) our primary submission leverages intermediate representations from a wav2vec 2.0 model trained on 234 hours of Tamasheq audio, while our contrastive model uses a French phonetic transcription of the Tamasheq audio as input in a Conformer speech translation architecture jointly trained on automatic speech recognition, ST and machine translation losses. Our results highlight that self-supervised models trained on smaller sets of target data are more effective for low-resource end-to-end ST fine-tuning than large off-the-shelf models. Results also illustrate that even approximate phonetic transcriptions can improve ST scores.
2022.iwslt-1.28
zanon-boito-etal-2022-trac
+ 10.18653/v1/2022.iwslt-1.28
JHU IWSLT 2022 Dialect Speech Translation System Description
@@ -487,6 +515,7 @@
This paper details the Johns Hopkins speech translation (ST) system used in the IWSLT 2022 dialect speech translation task. Our system uses a cascade of automatic speech recognition (ASR) and machine translation (MT). We use a Conformer model for the ASR system and a Transformer model for machine translation. Surprisingly, we found that while using additional ASR training data resulted in only a negligible change in performance as measured by BLEU or word error rate (WER), aggressive text normalization improved BLEU more significantly. We also describe an approach, similar to back-translation, for improving performance using synthetic dialectal source text produced from source sentences in mismatched dialects.
2022.iwslt-1.29
yang-etal-2022-jhu
+ 10.18653/v1/2022.iwslt-1.29
Controlling Translation Formality Using Pre-trained Multilingual Language Models
@@ -499,6 +528,7 @@
rippeth-etal-2022-controlling
CCMatrix
ParaCrawl
+ 10.18653/v1/2022.iwslt-1.30
Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022
@@ -512,6 +542,7 @@
MuST-C
ParaCrawl
WikiMatrix
+ 10.18653/v1/2022.iwslt-1.31
Improving Machine Translation Formality Control with Weakly-Labelled Data Augmentation and Post Editing Strategies
@@ -524,6 +555,7 @@
This paper describes Amazon Alexa AI’s implementation for the IWSLT 2022 shared task on formality control. We focus on the unconstrained and supervised task for en→hi (Hindi) and en→ja (Japanese) pairs, where very limited formality-annotated data is available. We propose three simple yet effective post-editing strategies, namely T-V conversion, utilizing a verb conjugator, and seq2seq models, in order to rewrite the translated phrases into formal or informal language. Considering the nuances of formality and informality in different languages, our analysis shows that a language-specific post-editing strategy achieves the best performance. To address the unique challenge of limited formality annotations, we further develop a formality classifier to perform weakly labelled data augmentation, which automatically generates synthetic formality labels from a large parallel corpus. Empirical results on the IWSLT formality test set show that the proposed system achieves significant improvements in formality accuracy while retaining a BLEU score on par with the baseline.
2022.iwslt-1.32
zhang-etal-2022-improving-machine
+ 10.18653/v1/2022.iwslt-1.32
HW-TSC’s Participation in the IWSLT 2022 Isometric Spoken Language Translation
@@ -543,6 +575,7 @@
This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task. We participate in all three language pairs (English-German, English-French, English-Spanish) under the constrained setting, and submit an English-German result under the unconstrained setting. We use the standard Transformer model as the baseline and obtain the best performance via one of its variants that shares the decoder input and output embedding. We perform detailed pre-processing and filtering on the provided bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, R-Drop, Average Checkpoint, and Ensemble. We investigate three methods for biasing the output length: i) conditioning the output on a given target-source length-ratio class; ii) enriching the transformer positional embedding with length information; and iii) length-control decoding for non-autoregressive translation. Our submissions achieve 30.7, 41.6 and 36.7 BLEU respectively on the tst-COMMON test sets for the English-German, English-French and English-Spanish tasks, and 100% comply with the length requirements.
2022.iwslt-1.33
li-etal-2022-hw
+ 10.18653/v1/2022.iwslt-1.33
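Method i) above conditions generation on a target-source length-ratio class. The sketch below shows one way such tagging could look; the bucket boundaries are assumptions, not the authors' values.

```python
# Illustrative sketch of length-ratio class conditioning: prepend a
# special class token to the source so the model learns to control
# output length. Bucket boundaries (<0.9, 0.9-1.1, >1.1) are assumptions.
def length_ratio_tag(src: str, tgt: str) -> str:
    ratio = len(tgt.split()) / max(len(src.split()), 1)
    if ratio < 0.9:
        tag = "<short>"
    elif ratio <= 1.1:
        tag = "<normal>"
    else:
        tag = "<long>"
    return f"{tag} {src}"

# Training pairs take the tag from the reference; at inference time the
# desired class (e.g. <normal> for isometric output) is chosen directly.
print(length_ratio_tag("das ist ein kleiner Test", "this is a test"))
```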
AppTek’s Submission to the IWSLT 2022 Isometric Spoken Language Translation Task
@@ -553,6 +586,7 @@
2022.iwslt-1.34
wilken-matusov-2022-appteks
MuST-C
+ 10.18653/v1/2022.iwslt-1.34
Hierarchical Multi-task learning framework for Isometric-Speech Language Translation
@@ -567,6 +601,7 @@
aakash0017/machine-translation-iswlt
MuST-C
PAWS-X
+ 10.18653/v1/2022.iwslt-1.35
diff --git a/data/xml/2022.lchange.xml b/data/xml/2022.lchange.xml
index a9ccee8d26..1853ca666f 100644
--- a/data/xml/2022.lchange.xml
+++ b/data/xml/2022.lchange.xml
@@ -44,6 +44,7 @@
We present a benchmark in six European languages containing manually annotated information about olfactory situations and events following a FrameNet-like approach. The document selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 and 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell descriptions over time.
2022.lchange-1.1
menini-etal-2022-multilingual
+ 10.18653/v1/2022.lchange-1.1
Language Acquisition, Neutral Change, and Diachronic Trends in Noun Classifiers
@@ -54,6 +55,7 @@
2022.lchange-1.2
kali-kodner-2022-language
an-k45/classifier-change
+ 10.18653/v1/2022.lchange-1.2
Deconstructing destruction: A Cognitive Linguistics perspective on a computational analysis of diachronic change
@@ -64,6 +66,7 @@
In this paper, we aim to introduce a Cognitive Linguistics perspective into a computational analysis of near-synonyms. We focus on a single set of Dutch near-synonyms, vernielen and vernietigen, roughly translated as ‘to destroy’, replicating the analysis from Geeraerts (1997) with distributional models. Our analysis, which tracks the meaning of both words in a corpus of 16th-20th century prose data, shows that both lexical items have undergone semantic change, led by differences in their prototypical semantic core.
2022.lchange-1.3
franco-etal-2022-deconstructing
+ 10.18653/v1/2022.lchange-1.3
What is Done is Done: an Incremental Approach to Semantic Shift Detection
@@ -75,6 +78,7 @@
Contextual word embedding techniques for semantic shift detection are receiving more and more attention. In this paper, we present What is Done is Done (WiDiD), an incremental approach to semantic shift detection based on incremental clustering techniques and contextual embedding methods to capture the changes over the meanings of a target word along a diachronic corpus. In WiDiD, the word contexts observed in the past are consolidated as a set of clusters that constitute the “memory” of the word meanings observed so far. Such a memory is exploited as a basis for subsequent word observations, so that the meanings observed in the present are stratified over the past ones.
2022.lchange-1.4
periti-etal-2022-done
+ 10.18653/v1/2022.lchange-1.4
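WiDiD consolidates past word contexts into clusters that serve as a "memory" seeding the clustering of each new time slice. The compact sketch below imitates that incremental idea, with scikit-learn KMeans as a stand-in for the paper's incremental clustering algorithm and random vectors standing in for contextual embeddings.

```python
# Toy sketch of incremental clustering over diachronic slices: centroids
# accumulated from earlier slices initialize the clustering of the next
# slice, so present meanings are stratified over past ones. KMeans and
# the random embeddings are assumptions, not the authors' algorithm.
import numpy as np
from sklearn.cluster import KMeans

def update_memory(memory_centroids, new_embs, k=3):
    if memory_centroids is None:
        km = KMeans(n_clusters=k, n_init=10).fit(new_embs)
    else:
        # Seed with the memory so clusters stay aligned across slices.
        km = KMeans(n_clusters=k, init=memory_centroids, n_init=1).fit(new_embs)
    return km.cluster_centers_

rng = np.random.default_rng(1)
memory = None
for _ in range(3):  # three diachronic corpus slices
    memory = update_memory(memory, rng.normal(size=(30, 16)))
print(memory.shape)  # (3, 16): the word's consolidated sense "memory"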
From qualifiers to quantifiers: semantic shift at the paradigm level
@@ -83,6 +87,7 @@
Language change has often been conceived as a competition between linguistic variants. However, language units may be complex organizations in themselves, e.g. in the case of schematic constructions, featuring a free slot. Such a slot is filled by words forming a set or ‘paradigm’ and engaging in inter-related dynamics within this constructional environment. To tackle this complexity, a simple computational method is offered to automatically characterize their interactions, and visualize them through networks of cooperation and competition. Applying this method to the French paradigm of quantifiers, I show that this method efficiently captures phenomena regarding the evolving organization of constructional paradigms, in particular the constitution of competing clusters of fillers that promote different semantic strategies overall.
2022.lchange-1.5
feltgen-2022-qualifiers
+ 10.18653/v1/2022.lchange-1.5
Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change
@@ -93,6 +98,7 @@
Morphological and syntactic changes in word usage — as captured, e.g., by grammatical profiles — have been shown to be good predictors of a word’s meaning change. In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes. To this end, we first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages, and then combine the two approaches in ensembles to assess their complementarity. Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages. This indicates that language models do not fully cover the fine-grained morphological and syntactic signals that are explicitly represented in grammatical profiles. An interesting exception is the test sets where the time spans under analysis are much longer than the time gap between them (for example, century-long spans with a one-year gap between them). Morphosyntactic change is slow, so grammatical profiles fail to detect it in such cases. In contrast, language models, thanks to their access to lexical information, are able to detect fast topical changes.
2022.lchange-1.6
giulianelli-etal-2022-fire
+ 10.18653/v1/2022.lchange-1.6
Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model
@@ -109,6 +115,7 @@
In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works, on average, with less than 7 years absolute error. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions as given by the Integrated Gradients model explanation method.
2022.lchange-1.7
rastas-etal-2022-explainable
+ 10.18653/v1/2022.lchange-1.7
Using Cross-Lingual Part of Speech Tagging for Partially Reconstructing the Classic Language Family Tree Model
@@ -119,6 +126,7 @@
The tree model is well known for expressing the historic evolution of languages. This model has been considered as a method of describing genetic relationships between languages. Nevertheless, some researchers question the model’s ability to predict the proximity between two languages, since it represents genetic relatedness rather than linguistic resemblance. Defining other language proximity models has been an active research area for many years. In this paper we explore a part-of-speech model for defining proximity between languages using a multilingual language model that was fine-tuned on the task of cross-lingual part-of-speech tagging. We train the model on one language and evaluate it on another; the measured performance is then used to define the proximity between the two languages. By further developing the model, we show that it can reconstruct some parts of the tree model.
2022.lchange-1.8
samohi-etal-2022-using
+ 10.18653/v1/2022.lchange-1.8
A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns
@@ -130,6 +138,7 @@
2022.lchange-1.9
list-etal-2022-new
lingpy/supervised-reconstruction-paper
+ 10.18653/v1/2022.lchange-1.9
Caveats of Measuring Semantic Change of Cognates and Borrowings using Multilingual Word Embeddings
@@ -140,6 +149,7 @@
2022.lchange-1.10
fourrier-montariol-2022-caveats
clefourrier/historical-semantic-change
+ 10.18653/v1/2022.lchange-1.10
Lexicon of Changes: Towards the Evaluation of Diachronic Semantic Shift in Chinese
@@ -150,6 +160,7 @@
Recent research has brought a wave of computational approaches to the classic topic of semantic change, aiming to tackle one of the most challenging issues in the evolution of human language. While several methods for detecting semantic change have been proposed, such studies are limited to a few languages where evaluation datasets are available. This paper presents the first dataset for evaluating Chinese semantic change in contexts preceding and following the Reform and Opening-up, covering a 50-year period in Modern Chinese. Following the DURel framework, we collected 6,000 human judgments for the dataset. We also report the performance of alignment-based word embedding models on this evaluation dataset, achieving high and significant correlation scores.
2022.lchange-1.11
chen-etal-2022-lexicon
+ 10.18653/v1/2022.lchange-1.11
Low Saxon dialect distances at the orthographic and syntactic level
@@ -160,6 +171,7 @@
We compare five Low Saxon dialects from the 19th and 21st century from Germany and the Netherlands with each other as well as with modern Standard Dutch and Standard German. Our comparison is based on character n-grams on the one hand and PoS n-grams on the other and we show that these two lead to different distances. Particularly in the PoS-based distances, one can observe all of the 21st century Low Saxon dialects shifting towards the modern majority languages.
2022.lchange-1.12
siewert-etal-2022-low
+ 10.18653/v1/2022.lchange-1.12
“Vaderland”, “Volk” and “Natie”: Semantic Change Related to Nationalism in Dutch Literature Between 1700 and 1880 Captured with Dynamic Bernoulli Word Embeddings
@@ -170,6 +182,7 @@
Languages can respond to external events in various ways - the creation of new words or named entities, additional senses might develop for already existing words or the valence of words can change. In this work, we explore the semantic shift of the Dutch words “natie” (“nation”), “volk” (“people”) and “vaderland” (“fatherland”) over a period that is known for the rise of nationalism in Europe: 1700-1880. The semantic change is measured by means of Dynamic Bernoulli Word Embeddings which allow for comparison between word embeddings over different time slices. The word embeddings were generated based on Dutch fiction literature divided over different decades. From the analysis of the absolute drifts, it appears that the word “natie” underwent a relatively small drift. However, the drifts of “vaderland” and “volk” show multiple peaks, culminating around the turn of the nineteenth century. To verify whether this semantic change can indeed be attributed to nationalistic movements, a detailed analysis of the nearest neighbours of the target words is provided. From the analysis, it appears that “natie”, “volk” and “vaderland” became more nationalistically-loaded over time.
2022.lchange-1.13
timmermans-etal-2022-vaderland
+ 10.18653/v1/2022.lchange-1.13
Using neural topic models to track context shifts of words: a case study of COVID-related terms before and after the lockdown in April 2020
@@ -179,6 +192,7 @@
This paper explores lexical meaning changes in a new dataset, which includes tweets from before and after the COVID-related lockdown in April 2020. We use this dataset to evaluate traditional and more recent unsupervised approaches to lexical semantic change that make use of contextualized word representations based on the BERT neural language model to obtain representations of word usages. We argue that previous models that encode local representations of words cannot capture global context shifts, such as the context shift of face masks since the pandemic outbreak. We experiment with neural topic models to track context shifts of words. We show that this approach can reveal textual associations of words that go beyond their lexical meaning representation. We discuss future work and how to proceed in capturing the pragmatic aspect of meaning change, as opposed to lexical semantic change.
2022.lchange-1.14
kellert-mahmud-uz-zaman-2022-using
+ 10.18653/v1/2022.lchange-1.14
Roadblocks in Gender Bias Measurement for Diachronic Corpora
@@ -192,6 +206,7 @@
2022.lchange-1.15
alshahrani-etal-2022-roadblocks
clarkson-accountability-transparency/gbiasroadblocks
+ 10.18653/v1/2022.lchange-1.15
LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
@@ -202,6 +217,7 @@
We present the first shared task on semantic change discovery and detection in Spanish. We create the first dataset of Spanish words manually annotated for semantic change using the DURel framework (Schlechtweg et al., 2018). The task is divided into two phases: 1) graded change discovery, and 2) binary change detection. In addition to introducing a new language for this task, the main novelty with respect to previous tasks consists in predicting and evaluating changes for all vocabulary words in the corpus. Six teams participated in phase 1 and seven teams in phase 2 of the shared task, and the best system obtained a Spearman rank correlation of 0.735 for phase 1 and an F1 score of 0.735 for phase 2. We describe the systems developed by the competing teams, highlighting the techniques that were particularly useful.
2022.lchange-1.16
d-zamora-reina-etal-2022-black
+ 10.18653/v1/2022.lchange-1.16
BOS at LSCDiscovery: Lexical Substitution for Interpretable Lexical Semantic Change Detection
@@ -211,6 +227,7 @@
We propose a solution for the LSCDiscovery shared task on Lexical Semantic Change Detection in Spanish. Our approach is based on generating lexical substitutes that describe old and new senses of a given word. This approach achieves the second best result in sense loss and sense gain detection subtasks. By observing those substitutes that are specific for only one time period, one can understand which senses were obtained or lost. This allows providing more detailed information about semantic change to the user and makes our method interpretable.
2022.lchange-1.17
kudisov-arefyev-2022-black
+ 10.18653/v1/2022.lchange-1.17
DeepMistake at LSCDiscovery: Can a Multilingual Word-in-Context Model Replace Human Annotators?
@@ -220,6 +237,7 @@
In this paper we describe our solution for the LSCDiscovery shared task on Lexical Semantic Change Discovery (LSCD) in Spanish. Our solution employs a Word-in-Context (WiC) model, which is trained to determine if a particular word has the same meaning in two given contexts. We essentially try to replicate the annotation of the dataset for the shared task, but replacing human annotators with a neural network. In the graded change discovery subtask, our solution achieved the 2nd best result according to all metrics. In the main binary change detection subtask, our F1-score is 0.655 compared to 0.716 of the best submission, corresponding to the 5th place. However, in the optional sense gain detection subtask we outperformed all other participants. During the post-evaluation experiments we compared different ways to prepare WiC data in Spanish for fine-tuning. We found that it helps to keep only examples annotated as 1 (unrelated senses) or 4 (identical senses), rather than using twice as many examples that include intermediate annotations.
2022.lchange-1.18
homskiy-arefyev-2022-black
+ 10.18653/v1/2022.lchange-1.18
UAlberta at LSCDiscovery: Lexical Semantic Change Detection via Word Sense Disambiguation
@@ -230,6 +248,7 @@
We describe our two systems for the shared task on Lexical Semantic Change Discovery in Spanish. For binary change detection, we frame the task as a word sense disambiguation (WSD) problem. We derive sense frequency distributions for target words in both old and modern corpora. We assume that the word semantics have changed if a sense is observed in only one of the two corpora, or the relative change for any sense exceeds a tuned threshold. For graded change discovery, we follow the design of CIRCE (Pömsl and Lyapin, 2020) by combining both static and contextual embeddings. For contextual embeddings, we use XLM-RoBERTa instead of BERT, and train the model to predict a masked token instead of the time period. Our language-independent methods achieve results that are close to the best-performing systems in the shared task.
2022.lchange-1.19
teodorescu-etal-2022-black
+ 10.18653/v1/2022.lchange-1.19
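The binary-change criterion described above flags a word if a sense appears in only one of the two corpora or the relative frequency of any sense shifts beyond a tuned threshold. A toy sketch of that comparison; the sense labels are hypothetical WSD outputs, not the authors' data.

```python
# Toy sketch of binary change detection from sense frequency
# distributions: flag change if a sense occurs in only one corpus, or
# if any sense's relative frequency shifts by more than a threshold.
from collections import Counter

def changed(old_senses, new_senses, threshold=0.3):
    old, new = Counter(old_senses), Counter(new_senses)
    for s in set(old) | set(new):
        p_old = old[s] / max(sum(old.values()), 1)
        p_new = new[s] / max(sum(new.values()), 1)
        if p_old == 0 or p_new == 0:        # sense seen in only one corpus
            return True
        if abs(p_new - p_old) > threshold:  # large relative shift
            return True
    return False

# Hypothetical WSD labels for the word "cell" in old vs. modern corpora.
print(changed(["cell#prison"] * 9 + ["cell#biology"],
              ["cell#biology"] * 6 + ["cell#phone"] * 4))
```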
CoToHiLi at LSCDiscovery: the Role of Linguistic Features in Predicting Semantic Change
@@ -243,6 +262,7 @@
This paper presents the contributions of the CoToHiLi team for the LSCDiscovery shared task on semantic change in the Spanish language. We participated in both tasks (graded discovery and binary change, including sense gain and sense loss) and proposed models based on word embedding distances combined with hand-crafted linguistic features, including polysemy, number of neological synonyms, and relation to cognates in English. We find that models that include linguistically informed features combined using weights assigned manually by experts lead to promising results.
2022.lchange-1.20
sabina-uban-etal-2022-black
+ 10.18653/v1/2022.lchange-1.20
HSE at LSCDiscovery in Spanish: Clustering and Profiling for Lexical Semantic Change Discovery
@@ -256,6 +276,7 @@
kashleva-etal-2022-black
Various fixes throughout the paper.
+ 10.18653/v1/2022.lchange-1.21
GlossReader at LSCDiscovery: Train to Select a Proper Gloss in English – Discover Lexical Semantic Change in Spanish
@@ -265,6 +286,7 @@
The contextualized embeddings obtained from neural networks pre-trained as Language Models (LM) or Masked Language Models (MLM) are not well suited to solving the Lexical Semantic Change Detection (LSCD) task because they are more sensitive to changes in word forms than in word meaning, a property previously known as the word form bias or orthographic bias. Unlike many other NLP tasks, it is also not obvious how to fine-tune such models for LSCD. In order to conclude whether there are any differences between senses of a particular word in two corpora, a human annotator or a system must analyze many examples containing this word from both corpora. This makes annotation of LSCD datasets very labour-intensive. The existing LSCD datasets contain up to 100 words that are labeled according to their semantic change, which is hardly enough for fine-tuning. To solve these problems we fine-tune the XLM-R MLM as part of a gloss-based WSD system on a large WSD dataset in English. Then we employ the zero-shot cross-lingual transferability of XLM-R to build contextualized embeddings for examples in Spanish. In order to obtain the graded change score for each word, we calculate the average distance between our improved contextualized embeddings of its old and new occurrences. For the binary change detection subtask, we apply thresholding to the same scores. Our solution has shown the best results among all participants in all subtasks except for the optional sense gain detection subtask.
2022.lchange-1.22
rachinskiy-arefyev-2022-black
+ 10.18653/v1/2022.lchange-1.22
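The graded score above is the average distance between contextualized embeddings of a word's old and new occurrences, with thresholding for the binary label. A compact sketch, with random vectors standing in for the real XLM-R embeddings:

```python
# Sketch of the graded-change score: mean cosine distance between a
# word's old- and new-corpus occurrence embeddings. Random vectors are
# placeholders for contextualized embeddings from the fine-tuned model.
import numpy as np

def graded_change(old_embs: np.ndarray, new_embs: np.ndarray) -> float:
    old = old_embs / np.linalg.norm(old_embs, axis=1, keepdims=True)
    new = new_embs / np.linalg.norm(new_embs, axis=1, keepdims=True)
    # Mean cosine distance over all (old, new) occurrence pairs.
    return float(1.0 - (old @ new.T).mean())

rng = np.random.default_rng(0)
score = graded_change(rng.normal(size=(50, 768)), rng.normal(size=(40, 768)))
print(round(score, 3))  # thresholding this score yields the binary label
```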
diff --git a/data/xml/2022.lnls.xml b/data/xml/2022.lnls.xml
index d02d8b1635..da6435ffd7 100644
--- a/data/xml/2022.lnls.xml
+++ b/data/xml/2022.lnls.xml
@@ -27,6 +27,7 @@
2022.lnls-1.1
ri-etal-2022-finding
ALFRED
+ 10.18653/v1/2022.lnls-1.1
GrammarSHAP: An Efficient Model-Agnostic and Structure-Aware NLP Explainer
@@ -41,6 +42,7 @@
mosca-etal-2022-grammarshap
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.lnls-1.2
Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions
@@ -55,6 +57,7 @@
Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence for the generated answer, which is especially problematic when humans cannot readily check the model’s answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations where difficult questions are answered by considering opposing sides (see Irving et al., 2018). For multiple-choice QA examples, we build a dataset of single arguments for both a correct and incorrect answer option in a debate-style set-up as an initial step in training models to produce explanations for two candidate answers. We use long contexts—humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers, and we test if those explanations allow humans who have not read the full context to more accurately determine the correct answer. We do not find that explanations in our set-up improve human accuracy, but a baseline condition shows that providing human-selected text snippets does improve accuracy. We use these findings to suggest ways of improving the debate set-up for future data collection efforts.
2022.lnls-1.3
parrish-etal-2022-single
+ 10.18653/v1/2022.lnls-1.3
When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data
@@ -68,6 +71,7 @@
SNLI
TACRED
e-SNLI
+ 10.18653/v1/2022.lnls-1.4
A survey on improving NLP models with human explanations
@@ -78,6 +82,7 @@
2022.lnls-1.5
hartmann-sonntag-2022-survey
e-SNLI
+ 10.18653/v1/2022.lnls-1.5
diff --git a/data/xml/2022.ltedi.xml b/data/xml/2022.ltedi.xml
index 9ccbdc425b..ba44e72a92 100644
--- a/data/xml/2022.ltedi.xml
+++ b/data/xml/2022.ltedi.xml
@@ -27,6 +27,7 @@
2022.ltedi-1.1
markl-2022-mind
Common Voice
+ 10.18653/v1/2022.ltedi-1.1
Regex in a Time of Deep Learning: The Role of an Old Technology in Age Discrimination Detection in Job Advertisements
@@ -37,6 +38,7 @@
Deep learning holds great promise for detecting discriminatory language in the public sphere. However, for the detection of illegal age discrimination in job advertisements, regex approaches are still strong performers. In this paper, we investigate job advertisements in the Netherlands. We present a qualitative analysis of the benefits of the ‘old’ approach based on regexes and investigate how neural embeddings could address its limitations.
2022.ltedi-1.2
pillar-etal-2022-regex
+ 10.18653/v1/2022.ltedi-1.2
Doing not Being: Concrete Language as a Bridge from Language Technology to Ethnically Inclusive Job Ads
@@ -48,6 +50,7 @@
This paper makes the case for studying concreteness in language as a bridge that will allow language technology to support the understanding and improvement of ethnic inclusivity in job advertisements. We propose an annotation scheme that guides the assignment of sentences in job ads to classes that reflect concrete actions, i.e., what the employer needs people to do, and abstract dispositions, i.e., who the employer expects people to be. Using an annotated dataset of Dutch-language job ads, we demonstrate that machine learning technology is effectively able to distinguish these classes.
2022.ltedi-1.3
adams-etal-2022-concrete
+ 10.18653/v1/2022.ltedi-1.3
Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals
@@ -61,6 +64,7 @@
nozza-etal-2022-measuring
milanlproc/honest
HONEST
+ 10.18653/v1/2022.ltedi-1.4
Using BERT Embeddings to Model Word Importance in Conversational Transcripts for Deaf and Hard of Hearing Users
@@ -72,6 +76,7 @@
Deaf and hard of hearing (DHH) individuals regularly rely on captioning while watching live TV. Live TV captioning is evaluated by regulatory agencies using various caption evaluation metrics. However, caption evaluation metrics are often not informed by the preferences of DHH users or how meaningful the captions are. There is a need to construct caption evaluation metrics that take the relative importance of words in a transcript into account. We conducted a correlation analysis between two types of word embeddings and human-annotated word-importance scores in an existing corpus. We found that normalized contextualized word embeddings generated using BERT correlated better with manually annotated importance scores than word2vec-based word embeddings. We make available a pairing of word embeddings and their human-annotated importance scores. We also provide proof-of-concept utility by training word importance models, achieving an F1-score of 0.57 in the 6-class word importance classification task.
2022.ltedi-1.5
amin-etal-2022-using
+ 10.18653/v1/2022.ltedi-1.5
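A minimal sketch of the extraction step implied above: contextualized BERT token embeddings, L2-normalized, which could then be pooled per word and regressed against the human importance labels. The checkpoint name and the example sentence are assumptions.

```python
# Sketch: extract and normalize contextualized BERT embeddings for each
# token of a transcript sentence. Downstream, word-level vectors would
# be pooled over sub-word pieces and fed to a word-importance regressor.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "the flight to boston is delayed by two hours"
enc = tok(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)

# L2-normalize each token vector, as the abstract's "normalized
# contextualized word embeddings" suggests.
normed = hidden / hidden.norm(dim=-1, keepdim=True)
print(normed.shape)
```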
Detoxifying Language Models with a Toxic Corpus
@@ -82,6 +87,7 @@
2022.ltedi-1.6
park-rudzicz-2022-detoxifying
WebText
+ 10.18653/v1/2022.ltedi-1.6
Inferring Gender: A Scalable Methodology for Gender Detection with Online Lexical Databases
@@ -92,6 +98,7 @@
2022.ltedi-1.7
bartl-leavy-2022-inferring
marionbartl/lexical-gender
+ 10.18653/v1/2022.ltedi-1.7
Debiasing Pre-Trained Language Models via Efficient Fine-Tuning
@@ -106,6 +113,7 @@
CrowS-Pairs
StereoSet
WinoBias
+ 10.18653/v1/2022.ltedi-1.8
Disambiguation of morpho-syntactic features of African American English – the case of habitual be
@@ -117,6 +125,7 @@
Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. These errors are often caused by poor representation of linguistic features unique to African American English (AAE), which is due to the relatively low probability of occurrence for many such features. We present a workflow to overcome this issue in the case of habitual “be”. Habitual “be” is isomorphic, and therefore ambiguous, with other forms of uninflected “be” found in both AAE and General American English (GAE). This creates a clear challenge for bias in NLP technologies. To overcome the scarcity, we employ a combination of rule-based filters and data augmentation that generate a corpus balanced between habitual and non-habitual instances. This balanced corpus trains unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving a .65 F_1 score at classifying habitual “be”.
2022.ltedi-1.9
santiago-etal-2022-disambiguation
+ 10.18653/v1/2022.ltedi-1.9
Behind the Mask: Demographic bias in name detection for PII masking
@@ -128,6 +137,7 @@
2022.ltedi-1.10
mansfield-etal-2022-behind
csmansfield/pii-masking-bias
+ 10.18653/v1/2022.ltedi-1.10
Mapping the Multilingual Margins: Intersectional Biases of Sentiment Analysis Systems in English, Spanish, and Arabic
@@ -140,6 +150,7 @@
As natural language processing systems become more widespread, it is necessary to address fairness issues in their implementation and deployment to ensure that their negative impacts on society are understood and minimized. However, there is limited work that studies fairness using a multilingual and intersectional framework or on downstream tasks. In this paper, we introduce four multilingual Equity Evaluation Corpora, supplementary test sets designed to measure social biases, and a novel statistical framework for studying unisectional and intersectional social biases in natural language processing. We use these tools to measure gender, racial, ethnic, and intersectional social biases across five models trained on emotion regression tasks in English, Spanish, and Arabic. We find that many systems demonstrate statistically significant unisectional and intersectional social biases. We make our code and datasets available for download.
2022.ltedi-1.11
camara-etal-2022-mapping
+ 10.18653/v1/2022.ltedi-1.11
Monte Carlo Tree Search for Interpreting Stress in Natural Language
@@ -151,6 +162,7 @@
2022.ltedi-1.12
swanson-etal-2022-monte
swansonk14/mcts_interpretability
+ 10.18653/v1/2022.ltedi-1.12
IIITSurat@LT-EDI-ACL2022: Hope Speech Detection using Machine Learning
@@ -162,6 +174,7 @@
This paper addresses the issue of Hope Speech detection using machine learning techniques. Designing a robust model that helps in predicting the target class with higher accuracy is a challenging task in machine learning, especially when the distribution of the class labels is highly imbalanced. This study uses and compares the experimental outcomes of different oversampling techniques. Many models are implemented to classify the comments into Hope and Non-Hope speech, and it was found that machine learning algorithms perform better than deep learning models. The English-language dataset used in this research was developed by collecting YouTube comments and is part of the task “ACL-2022: Hope Speech Detection for Equality, Diversity, and Inclusion”. The proposed model achieved a weighted F1-score of 0.55 on the test dataset and secured the first rank among the participating teams.
2022.ltedi-1.13
roy-etal-2022-iiitsurat
+ 10.18653/v1/2022.ltedi-1.13
The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced Kannada
@@ -175,6 +188,7 @@
2022.ltedi-1.14
hande-etal-2022-best
KanHope
+ 10.18653/v1/2022.ltedi-1.14
NYCU_TWD@LT-EDI-ACL2022: Ensemble Models with VADER and Contrastive Learning for Detecting Signs of Depression from Social Media
@@ -186,6 +200,7 @@
This paper presents a state-of-the-art solution to the LT-EDI-ACL 2022 Task 4: Detecting Signs of Depression from Social Media Text. The goal of this task is to detect the severity levels of depression of people from social media posts, where people often share their feelings on a daily basis. To detect the signs of depression, we propose a framework with pre-trained language models using rich information instead of training from scratch, gradient boosting and deep learning models for modeling various aspects, and supervised contrastive learning for the generalization ability. Moreover, ensemble techniques are also employed in consideration of the different advantages of each method. Experiments show that our framework achieves second place with a macro F1-score of 0.552, showing the effectiveness and robustness of our approach.
2022.ltedi-1.15
wang-etal-2022-nycu
+ 10.18653/v1/2022.ltedi-1.15
UMUTeam@LT-EDI-ACL2022: Detecting homophobic and transphobic comments in Tamil
@@ -196,6 +211,7 @@
These working notes describe the participation of the UMUTeam in the LT-EDI shared task concerning the identification of homophobic and transphobic comments on YouTube. These comments are written in English, which has a high availability of machine-learning resources; Tamil, which has fewer resources; and a transliteration from Tamil to Roman script combined with English sentences. To carry out this shared task, we train a neural network that combines several feature sets applying a knowledge integration strategy. These features are linguistic features extracted from a tool developed by our research group, together with contextual and non-contextual sentence embeddings. We ranked 7th for the English subtask (macro F1-score of 45%), 3rd for the Tamil subtask (macro F1-score of 82%), and 2nd for the Tamil-English subtask (macro F1-score of 58%).
2022.ltedi-1.16
garcia-diaz-etal-2022-umuteam-lt
+ 10.18653/v1/2022.ltedi-1.16
UMUTeam@LT-EDI-ACL2022: Detecting Signs of Depression from text
@@ -205,6 +221,7 @@
Depression is a mental condition related to sadness and the lack of interest in common daily tasks. In these working notes, we describe the proposal of the UMUTeam in the LT-EDI shared task (ACL 2022) concerning the identification of signs of depression in social network posts. This task is related to other relevant Natural Language Processing tasks such as Emotion Analysis. In this shared task, the organisers challenged the participants to distinguish between moderate and severe signs of depression (or no signs of depression at all) in a set of social posts written in English. Our proposal is based on the combination of linguistic features and several sentence embeddings using a knowledge integration strategy. It achieved the 6th position, with a macro F1-score of 53.82 on the official leaderboard.
2022.ltedi-1.17
garcia-diaz-valencia-garcia-2022-umuteam
+ 10.18653/v1/2022.ltedi-1.17
bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments
@@ -215,6 +232,7 @@
2022.ltedi-1.18
bhandari-goyal-2022-bitsa
vitthal-bhandari/homophobia-transphobia-detection
+ 10.18653/v1/2022.ltedi-1.18
ABLIMET @LT-EDI-ACL2022: A Roberta based Approach for Homophobia/Transphobia Detection in Social Media
@@ -223,6 +241,7 @@
This paper describes our system that participated in LT-EDI-ACL2022 - Homophobia/Transphobia Detection in Social Media. Sexual minorities face a great deal of unfair treatment and discrimination in our world, which creates enormous stress and many psychological problems. There is a lot of hate speech on the internet, and homophobia/transphobia is hate speech directed against sexual minorities. Identifying homophobia/transphobia with natural language processing technology can improve the efficiency of handling such content and can quickly screen it out on the Internet. The organizer of the LT-EDI-ACL2022 Homophobia/Transphobia Detection in Social Media task constructed a homophobia/transphobia detection dataset based on YouTube comments for English and Tamil. We use a RoBERTa-based approach to conduct homophobia/transphobia detection experiments on the competition dataset and obtain good results.
2022.ltedi-1.19
maimaitituoheti-2022-ablimet
+ 10.18653/v1/2022.ltedi-1.19
MUCIC@LT-EDI-ACL2022: Hope Speech Detection using Data Re-Sampling and 1D Conv-LSTM
@@ -234,6 +253,7 @@
Spreading positive vibes or hope content on social media may help many people to get motivated in their life. To address Hope Speech detection in YouTube comments, this paper presents the description of the models submitted by our team, MUCIC, to the Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI) shared task at the Association for Computational Linguistics (ACL) 2022. This shared task consists of texts in five languages, namely: English, Spanish (in Latin script), and Tamil, Malayalam, and Kannada (in code-mixed native and Roman scripts), with the aim of classifying the YouTube comment into “Hope”, “Not-Hope” or “Not-Intended” categories. The proposed methodology uses a re-sampling technique to deal with imbalanced data in the corpus and obtained 1st rank for the English language with a macro-averaged F1-score of 0.550 and a weighted-averaged F1-score of 0.860. The code to reproduce this work is available on GitHub.
2022.ltedi-1.20
gowda-etal-2022-mucic
+ 10.18653/v1/2022.ltedi-1.20
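A hedged Keras sketch of a 1D Conv + LSTM text classifier of the kind named in the title; vocabulary size, layer sizes, and the dummy data are assumptions, not the authors' configuration. Class re-sampling would be applied to the training set before this step.

```python
# Illustrative 1D Conv-LSTM classifier for three hope-speech classes.
# All hyperparameters here are assumptions for the sketch.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Embedding(20000, 128),             # token ids -> dense vectors
    layers.Conv1D(64, 5, activation="relu"),  # local n-gram features
    layers.MaxPooling1D(2),
    layers.LSTM(64),                          # sequence-level modelling
    layers.Dense(3, activation="softmax"),    # Hope / Not-Hope / Not-Intended
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy batch: 8 padded comment sequences of length 100.
x = np.random.randint(0, 20000, size=(8, 100))
y = np.random.randint(0, 3, size=(8,))
model.fit(x, y, epochs=1, verbose=0)
```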
DeepBlues@LT-EDI-ACL2022: Depression level detection modelling through domain specific BERT and short text Depression classifiers
@@ -245,6 +265,7 @@
We discuss a variety of approaches to build a robust depression level detection model from longer social media posts (i.e., Reddit Depression forum posts) using a mental-health-text pre-trained BERT model. Further, we report our experimental results based on a strategy to select excerpts from long text and then fine-tune the BERT model to combat the issue of memory constraints while processing such texts. We show that, with a domain-specific BERT, we can achieve reasonable accuracy with a fixed text size (in this case 200 tokens) for this task. In addition, we can use short-text classifiers to extract relevant text from the long text and achieve slightly better accuracy, albeit trading off against the processing time for extracting such excerpts.
2022.ltedi-1.21
farruque-etal-2022-deepblues
+ 10.18653/v1/2022.ltedi-1.21
SSN_ARMM@ LT-EDI -ACL2022: Hope Speech Detection for Equality, Diversity, and Inclusion Using ALBERT model
@@ -259,6 +280,7 @@
In recent years social media has become one of the major forums for expressing human views and emotions. With the help of smartphones and high-speed internet, anyone can express their views on social media. However, this can also lead to the spread of hatred and violence in society. Therefore it is necessary to build a method to find and support helpful social media content. In this paper, we studied a Natural Language Processing approach for detecting hope speech in a given sentence. The task was to classify the sentences into ‘Hope speech’ and ‘Non-hope speech’. The dataset was provided by the LT-EDI organizers with text from YouTube comments. Based on the task description, we developed a system using the pre-trained language model BERT to complete this task. Our model achieved 1st rank in the Kannada language with a weighted average F1 score of 0.750, 2nd rank in the Malayalam language with a weighted average F1 score of 0.740, 3rd rank in the Tamil language with a weighted average F1 score of 0.390 and 6th rank in the English language with a weighted average F1 score of 0.880.
2022.ltedi-1.22
vijayakumar-etal-2022-ssn
+ 10.18653/v1/2022.ltedi-1.22
SUH_ASR@LT-EDI-ACL2022: Transformer based Approach for Speech Recognition for Vulnerable Individuals in Tamil
@@ -268,6 +290,7 @@
An Automatic Speech Recognition system is developed for addressing the Tamil conversational speech data of elderly and transgender people. The speech corpus used in this system is collected from people who communicate in Tamil at primary places such as banks, hospitals and vegetable markets. Our ASR system is designed with a pre-trained model which is used to recognize the speech data. WER (Word Error Rate) calculation is used to analyse the performance of the ASR system. This evaluation could help to make a comparison of utterances between the elderly people and others. Similarly, the comparison between transgender and other people is also done. Our proposed ASR system achieves a word error rate of 39.65%.
2022.ltedi-1.23
s-b-2022-suh
+ 10.18653/v1/2022.ltedi-1.23
LPS@LT-EDI-ACL2022:An Ensemble Approach about Hope Speech Detection
@@ -276,6 +299,7 @@
This paper describes our participation in the shared task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI-ACL-2022. The goal of this task is to identify whether a given comment contains hope speech or not; hope is considered significant for the well-being, recuperation and restoration of human life. Our work aims to change the prevalent way of thinking by moving away from a preoccupation with discrimination, loneliness or the worst things in life towards building confidence, support and good qualities based on comments by individuals. In response to the need to detect equality, diversity and inclusion of hope speech in a multilingual environment, we built an integration model that achieved good performance on the multiple datasets provided by the organisers; the specific results can be found in the experimental results section.
2022.ltedi-1.24
zhu-2022-lps
+ 10.18653/v1/2022.ltedi-1.24
CURAJ_IIITDWD@LT-EDI-ACL 2022: Hope Speech Detection in English YouTube Comments using Deep Learning Techniques
@@ -286,6 +310,7 @@
Hope speech comprises positive terms that help to promote or criticise a point of view without hurting the user’s or community’s feelings. Non-hope speech, on the other side, includes expressions that are harsh, ridiculing, or demotivating. The goal of this article is to find the hope speech comments in a YouTube dataset. The datasets were created as part of the “LT-EDI-ACL 2022: Hope Speech Detection for Equality, Diversity, and Inclusion” shared task. The shared task dataset was proposed in the Malayalam, Tamil, English, Spanish, and Kannada languages. In this paper, we worked on English-language YouTube comments. We employed several deep learning based models such as DNN (dense or fully connected neural network), CNN (Convolutional Neural Network), Bi-LSTM (Bidirectional Long Short Term Memory Network), and GRU (Gated Recurrent Unit) to identify the hopeful comments. We also used Stacked LSTM-CNN and Stacked LSTM-LSTM networks to train the model. The best macro-average F1-score of 0.67 on the development dataset was obtained using the DNN model, and a macro-average F1-score of 0.67 was achieved on the test data as well.
2022.ltedi-1.25
jha-etal-2022-curaj
+ 10.18653/v1/2022.ltedi-1.25
SSN_MLRG3 @LT-EDI-ACL2022-Depression Detection System from Social Media Text using Transformer Models
@@ -299,6 +324,7 @@
Depression is a common mental illness that involves sadness and lack of interest in all day-to-day activities. The task is to classify social media text as signs of depression into three labels, namely “not depressed”, “moderately depressed”, and “severely depressed”. We have built a system using deep learning transformer models; the Transformers library provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. The multi-class classification model used in our system is based on the ALBERT model. In the ACL 2022 shared task, our team SSN_MLRG3 obtained a macro F1 score of 0.473.
2022.ltedi-1.26
esackimuthu-etal-2022-ssn
+ 10.18653/v1/2022.ltedi-1.26
BERT 4EVER@LT-EDI-ACL2022-Detecting signs of Depression from Social Media:Detecting Depression in Social Media using Prompt-Learning and Word-Emotion Cluster
@@ -311,6 +337,7 @@
In this paper, we report the solution of the team BERT 4EVER for the LT-EDI-2022 shared task 2: Detecting Signs of Depression from Social Media in ACL 2022, which aims to classify YouTube comments into one of the following categories: no, moderate, or severe depression. We model the problem as a text classification task and a text generation task, and respectively propose two different models for the tasks. To combine the knowledge learned from these two different models, we softly fuse the predicted probabilities of the models above and then select the label with the highest probability as the final output. In addition, multiple augmentation strategies are leveraged to improve the model generalization capability, such as back translation and adversarial training. Experimental results demonstrate the effectiveness of the proposed models and the two augmentation strategies.
2022.ltedi-1.27
lin-etal-2022-bert
+ 10.18653/v1/2022.ltedi-1.27
CIC@LT-EDI-ACL2022: Are transformers the only hope? Hope speech detection for Spanish and English comments
@@ -322,6 +349,7 @@
Hope is an inherent part of human life and essential for improving the quality of life. Hope increases happiness and reduces stress and feelings of helplessness. Hope speech expresses the desire for a better outcome and can be studied using text from various online sources where people express their desires and outcomes. In this paper, we address a deep-learning approach with a combination of linguistic and psycho-linguistic features for hope-speech detection. We report our best results submitted to LT-EDI-2022, which ranked 2nd and 3rd in English and Spanish respectively.
2022.ltedi-1.28
balouchzahi-etal-2022-cic
+ 10.18653/v1/2022.ltedi-1.28
scubeMSEC@LT-EDI-ACL2022: Detection of Depression using Transformer Models
@@ -334,6 +362,7 @@
Social media platforms play a major role in our day-to-day life and are considered a virtual friend by many users, who use social media to share their feelings all day. Many a time, the content which is shared by users on social media replicates their inner life. Nowadays people love to share their daily life incidents, like happy or unhappy moments, and their feelings on social media; it makes them feel complete, and it has become a habit for many users. Social media provides a new chance to identify the feelings of a person through their posts. The aim of the shared task is to develop a model in which the system is capable of analyzing the grammatical markers related to onset and permanent symptoms of depression. We as a team participated in the shared task Detecting Signs of Depression from Social Media Text at LT-EDI 2022 - ACL 2022, and we have proposed a model which predicts depression from English social media posts using the dataset shared for the task. The prediction is done based on the labels Moderate, Severe and Not Depressed. We implemented this using different transformer models, such as DistilBERT, RoBERTa and ALBERT, by which we were able to achieve macro F1 scores of 0.337, 0.457 and 0.387 respectively. Our code is publicly available on GitHub.
2022.ltedi-1.29
s-etal-2022-scubemsec
+ 10.18653/v1/2022.ltedi-1.29
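A condensed sketch of fine-tuning one of the transformers named above (DistilBERT here) for the three-way depression labels; the two toy posts and the label mapping stand in for the shared-task data.

```python
# Minimal fine-tuning step for 3-class depression classification with a
# pretrained transformer. Texts and labels are toy placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)

texts = ["i feel fine today", "nothing matters anymore"]
labels = torch.tensor([0, 2])  # toy: 0=not, 1=moderate, 2=severe
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
out = model(**batch, labels=labels)  # loss is cross-entropy over 3 classes
out.loss.backward()
optim.step()
print(float(out.loss))
```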
SSNCSE_NLP@LT-EDI-ACL2022:Hope Speech Detection for Equality, Diversity and Inclusion using sentence transformers
@@ -346,6 +375,7 @@
In recent times, applications have been developed to regulate and control the spread of negativity and toxicity on online platforms. The world is filled with serious problems like political and religious conflicts, wars and pandemics, and offensive hate speech is the last thing we desire. Our task was to classify a text into ‘Hope Speech’ and ‘Non-Hope Speech’. We searched for datasets acquired from YouTube comments that offer support, reassurance, inspiration, and insight, and the ones that don’t. The datasets were provided to us by the LT-EDI organizers in English, Tamil, Spanish, Kannada, and Malayalam. To successfully identify and classify them, we employed several transformer models such as m-BERT, MLNet, BERT, XLMRoberta, and XLM_MLM. The observed results indicate that BERT and m-BERT obtained the best results among all the other techniques, gaining weighted F1-scores of 0.92, 0.71, 0.76, 0.87, and 0.83 for English, Tamil, Spanish, Kannada, and Malayalam respectively. This paper depicts our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2022.
2022.ltedi-1.30
b-etal-2022-ssncse
+ 10.18653/v1/2022.ltedi-1.30
SOA_NLP@LT-EDI-ACL2022: An Ensemble Model for Hope Speech Detection from YouTube Comments
@@ -356,6 +386,7 @@
Language should be accommodating of equality and diversity as a fundamental aspect of communication. The language of internet users has a big impact on peer users all over the world. On virtual platforms such as Facebook, Twitter, and YouTube, people express their opinions in different languages. People respect others’ accomplishments, pray for their well-being, and cheer them on when they fail. Such motivational remarks are hope speech remarks. Simultaneously, a group of users encourages discrimination against women, people of color, people with disabilities, and other minorities based on gender, race, sexual orientation, and other factors. To recognize hope speech from YouTube comments, the current study offers an ensemble approach that combines support vector machine, logistic regression, and random forest classifiers. Extensive testing was carried out to discover the best features for the aforementioned classifiers. In the support vector machine and logistic regression classifiers, char-level TF-IDF features were used, whereas in the random forest classifier, word-level features were used. The proposed ensemble model performed well on English, Spanish, Tamil, Malayalam, and Kannada YouTube comments.
2022.ltedi-1.31
kumar-etal-2022-soa
+ 10.18653/v1/2022.ltedi-1.31
IIT Dhanbad @LT-EDI-ACL2022- Hope Speech Detection for Equality, Diversity, and Inclusion
@@ -366,6 +397,7 @@
Hope is considered significant for the well-being, recuperation and restoration of human life by health professionals. Hope speech reflects the belief that one can discover pathways to their desired objectives and become roused to utilise those pathways. Hope speech offers support, reassurance, suggestions, inspiration and insight. Hate speech is a prevalent practice that society has to struggle with every day. The freedom of speech and ease of anonymity granted by social media has also resulted in incitement to hatred. In this paper, we work to identify and promote positive and supportive content on these platforms. We work with several machine learning models to classify social media comments as hope speech or non-hope speech in English. This paper portrays our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI-ACL 2022.
2022.ltedi-1.32
gupta-etal-2022-iit
+ 10.18653/v1/2022.ltedi-1.32
IISERB@LT-EDI-ACL2022: A Bag of Words and Document Embeddings Based Framework to Identify Severity of Depression Over Social Media
@@ -374,6 +406,7 @@
The DepSign-LT-EDI-ACL2022 shared task focuses on early prediction of the severity of depression from social media posts. The BioNLP group at the Department of Data Science and Engineering at the Indian Institute of Science Education and Research Bhopal (IISERB) participated in this challenge and submitted three runs based on three different text mining models. The severity of depression was categorized into three classes, viz., no depression, moderate, and severe, and the data to build models were released as part of this shared task. The objective of this work is to identify relevant features from the given social media texts for effective text classification. As part of our investigation, we explored features derived from the text data using a document embeddings technique and a simple bag-of-words model following different weighting schemes. Subsequently, adaptive boosting, logistic regression, random forest and support vector machine (SVM) classifiers were used to identify the scale of depression from the given texts. The experimental analysis on the given validation data shows that the SVM classifier using the bag-of-words model with the term frequency-inverse document frequency weighting scheme outperforms the other models for identifying depression. However, this framework could not achieve a place among the top ten runs of the shared task. This paper describes the potential of the proposed framework as well as the possible reasons behind its mediocre performance on the given data.
2022.ltedi-1.33
basu-2022-iiserb
+ 10.18653/v1/2022.ltedi-1.33
SSNCSE_NLP@LT-EDI-ACL2022: Homophobia/Transphobia Detection in Multiple Languages using SVM Classifiers and BERT-based Transformers
@@ -385,6 +418,7 @@
Over the years, there has been a slow but steady change in the attitude of society towards different kinds of sexuality. However, on social media platforms, where people have the license to be anonymous, toxic comments targeted at homosexuals, transgenders and the LGBTQ+ community are not uncommon. Detection of homophobic comments on social media can be useful in making the internet a safer place for everyone. For this task, we used a combination of word embeddings and SVM Classifiers as well as some BERT-based transformers. We achieved a weighted F1-score of 0.93 on the English dataset, 0.75 on the Tamil dataset and 0.87 on the Tamil-English Code-Mixed dataset.
2022.ltedi-1.34
swaminathan-etal-2022-ssncse
+ 10.18653/v1/2022.ltedi-1.34
KUCST@LT-EDI-ACL2022: Detecting Signs of Depression from Social Media Text
@@ -394,6 +428,7 @@
In this paper we present our approach for detecting signs of depression from social media text. Our model relies on word unigrams, part-of-speech tags, readability measures, the use of first, second or third person, and the number of words. Our best model obtained a macro F1-score of 0.439 and ranked 25th out of 31 teams. We further take advantage of the interpretability of the Logistic Regression model and attempt to interpret the model coefficients, with the hope that these will be useful for further research on the topic.
2022.ltedi-1.35
agirrezabal-amann-2022-kucst
+ 10.18653/v1/2022.ltedi-1.35
E8-IJS@LT-EDI-ACL2022 - BERT, AutoML and Knowledge-graph backed Detection of Depression
@@ -405,6 +440,7 @@
Depression is a mental illness that negatively affects a person’s well-being and can, if left untreated, lead to serious consequences such as suicide. Therefore, it is important to recognize the signs of depression early. In the last decade, social media has become one of the most common places to express one’s feelings. Hence, there is a possibility of processing such text and applying machine learning techniques to detect possible signs of depression. In this paper, we present our approaches to solving the shared task titled Detecting Signs of Depression from Social Media Text. We explore three different approaches: fine-tuning a BERT model; leveraging AutoML for feature construction and classifier selection; and exploring latent spaces derived from the combination of textual and knowledge-based representations. We ranked 9th out of 31 teams in the competition. Our best solution, based on knowledge graph and textual representations, was 4.9% behind the best model in terms of macro F1, and only 1.9% behind in terms of recall.
2022.ltedi-1.36
tavchioski-etal-2022-e8
+ 10.18653/v1/2022.ltedi-1.36
Nozza@LT-EDI-ACL2022: Ensemble Modeling for Homophobia and Transphobia Detection
@@ -413,6 +449,7 @@
In this paper, we describe our approach for the task of homophobia and transphobia detection in English social media comments. The dataset consists of YouTube comments, and it has been released for the shared task on Homophobia/Transphobia Detection in social media comments. Given the high class imbalance, we propose a solution based on data augmentation and ensemble modeling. We fine-tuned different large language models (BERT, RoBERTa, and HateBERT) and used the weighted majority vote on their predictions. Our proposed model obtained 0.48 and 0.94 for macro and weighted F1-score, respectively, ranking in third position.
2022.ltedi-1.37
nozza-2022-nozza
+ 10.18653/v1/2022.ltedi-1.37
KADO@LT-EDI-ACL2022: BERT-based Ensembles for Detecting Signs of Depression from Social Media Text
@@ -423,6 +460,7 @@
Depression is a common and serious mental illness whose early detection can improve the patient’s symptoms and make the depression easier to treat. This paper mainly introduces the relevant content of the task “Detecting Signs of Depression from Social Media Text at DepSign-LT-EDI@ACL-2022”. The goal of DepSign is to classify the signs of depression into three labels, namely “not depressed”, “moderately depressed”, and “severely depressed”, based on social media posts. In this paper, we propose a predictive ensemble model that utilizes fine-tuned contextualized word embeddings from ALBERT, DistilBERT, RoBERTa, and the BERT base model. We show that our model outperforms the baseline models in all considered metrics and achieves an F1 score of 54% and an accuracy of 61%, ranking 5th on the leaderboard for the DepSign task.
2022.ltedi-1.38
janatdoust-etal-2022-kado
+ 10.18653/v1/2022.ltedi-1.38
Sammaan@LT-EDI-ACL2022: Ensembled Transformers Against Homophobia and Transphobia
@@ -434,6 +472,7 @@
2022.ltedi-1.39
upadhyay-etal-2022-sammaan
GLUE
+ 10.18653/v1/2022.ltedi-1.39
OPI@LT-EDI-ACL2022: Detecting Signs of Depression from Social Media Text using RoBERTa Pre-trained Language Models
@@ -443,6 +482,7 @@
This paper presents our winning solution for the Shared Task on Detecting Signs of Depression from Social Media Text at LT-EDI-ACL2022. The task was to create a system that, given social media posts in English, detects the level of depression as ‘not depressed’, ‘moderately depressed’ or ‘severely depressed’. We based our solution on transformer-based language models. We fine-tuned selected models: BERT, RoBERTa, and XLNet, of which RoBERTa gave the best results. Then, using a corpus we prepared, we trained our own language model called DepRoBERTa (RoBERTa for Depression Detection). Fine-tuning this model improved the results. The third solution, ensemble averaging, turned out to be the best, achieving a macro-averaged F1-score of 0.583. The source code of our solution is available at https://github.com/rafalposwiata/depression-detection-lt-edi-2022.
2022.ltedi-1.40
poswiata-perelkiewicz-2022-opi
+ 10.18653/v1/2022.ltedi-1.40
FilipN@LT-EDI-ACL2022-Detecting signs of Depression from Social Media: Examining the use of summarization methods as data augmentation for text classification
@@ -454,6 +494,7 @@
nilsson-kovacs-2022-filipn
flippe3/dsdsm_augmentation
C4
+ 10.18653/v1/2022.ltedi-1.41
NAYEL @LT-EDI-ACL2022: Homophobia/Transphobia Detection for Equality, Diversity, and Inclusion using SVM
@@ -465,6 +506,7 @@
Analysing the contents of social media platforms such as YouTube, Facebook and Twitter has gained interest due to their vast number of users. One of the important tasks is homophobia/transphobia detection. This paper illustrates the system submitted by our team for the shared task on homophobia/transphobia detection in social media comments. A machine learning-based model has been designed, and various classification algorithms have been implemented for the automatic detection of homophobia in YouTube comments. TF/IDF has been used with a bigram model for the vectorization of comments. A Support Vector Machine has been used to develop the proposed model, and our submission reported weighted F1-scores of 0.91, 0.92 and 0.88 for the English, Tamil and Tamil-English datasets, respectively.
2022.ltedi-1.42
ashraf-etal-2022-nayel
+ 10.18653/v1/2022.ltedi-1.42
giniUs @LT-EDI-ACL2022: Aasha: Transformers based Hope-EDI
@@ -475,6 +517,7 @@
2022.ltedi-1.43
surana-chinagundi-2022-ginius
HopeEDI
+ 10.18653/v1/2022.ltedi-1.43
SSN_MLRG1@LT-EDI-ACL2022: Multi-Class Classification using BERT models for Detecting Depression Signs from Social Media Text
@@ -487,6 +530,7 @@
DepSign-LT-EDI@ACL-2022 aims to ascertain the signs of depression of a person from their messages and posts on social media, wherein people share their feelings and emotions. Given social media postings in English, the system should classify the signs of depression into three labels, namely “not depressed”, “moderately depressed”, and “severely depressed”. To achieve this objective, we have adopted a fine-tuned BERT model. This solution from team SSN_MLRG1 achieves 58.5% accuracy on the DepSign-LT-EDI@ACL-2022 test set.
2022.ltedi-1.44
anantharaman-etal-2022-ssn
+ 10.18653/v1/2022.ltedi-1.44
DepressionOne@LT-EDI-ACL2022: Using Machine Learning with SMOTE and Random UnderSampling to Detect Signs of Depression on Social Media Text.
@@ -496,6 +540,7 @@
Depression is a common and serious medical illness that negatively affects how you feel, the way you think, and how you act. Detecting depression is essential, as it must be treated early to avoid painful consequences. Nowadays, people broadcast how they feel via posts and comments. Using social media, we can extract many comments related to depression and use NLP techniques to train models and detect depression. This work presents the submission of the DepressionOne team at LT-EDI-2022 for the shared task on detecting signs of depression from social media text. The depression data is small and unbalanced. Thus, we used oversampling and undersampling methods such as SMOTE and RandomUnderSampler to balance the data. Later, we used machine learning methods to train models and detect the signs of depression.
2022.ltedi-1.45
dowlagar-mamidi-2022-depressionone
+ 10.18653/v1/2022.ltedi-1.45
LeaningTower@LT-EDI-ACL2022: When Hope and Hate Collide
@@ -507,6 +552,7 @@
The 2022 edition of LT-EDI proposed two tasks in various languages. Task Hope Speech Detection required models for the automatic identification of hopeful comments for equality, diversity, and inclusion. Task Homophobia/Transphobia Detection focused on the identification of homophobic and transphobic comments. We targeted both tasks in English by using reinforced BERT-based approaches. Our core strategy aimed at exploiting the data available for each given task to augment the amount of supervised instances in the other. On the basis of an active learning process, we trained a model on the dataset for Task i and applied it to the dataset for Task j to iteratively integrate new silver data for Task i. Our official submissions to the shared task obtained a macro-averaged F_1 score of 0.53 for Hope Speech and 0.46 for Homo/Transphobia, placing our team in the third and fourth positions out of 11 and 12 participating teams, respectively.
2022.ltedi-1.46
muti-etal-2022-leaningtower
+ 10.18653/v1/2022.ltedi-1.46
MUCS@Text-LT-EDI@ACL 2022: Detecting Sign of Depression from Social Media Text using Supervised Learning Approach
@@ -518,6 +564,7 @@
Social media has seen enormous growth in its users recently, and knowingly or unknowingly the behavior of a person will be reflected in the comments she/he posts on social media. Users showing signs of depression may post negative or disturbing content seeking the attention of other users. Hence, social media data can be analysed to check whether users show signs of depression and to help them get through the situation if required. However, as manually analyzing the increasing amount of social media data is laborious and error-prone, automated tools have to be developed for this purpose. To address the issue of detecting signs of depression in social media content, in this paper we - team MUCS - describe an Ensemble of Machine Learning (ML) models and a Transfer Learning (TL) model submitted to the “Detecting Signs of Depression from Social Media Text-LT-EDI@ACL 2022” (DepSign-LT-EDI@ACL-2022) shared task at the Association for Computational Linguistics (ACL) 2022. Both frequency- and text-based features are used to train the Ensemble model, and Bidirectional Encoder Representations from Transformers (BERT) fine-tuned on raw text is used to train the TL model. Among the two models, the TL model performed better, with a macro averaged F-score of 0.479, and placed 18th in the shared task. The code to reproduce the proposed models is available on GitHub.
2022.ltedi-1.47
hegde-etal-2022-mucs-text
+ 10.18653/v1/2022.ltedi-1.47
SSNCSE_NLP@LT-EDI-ACL2022: Speech Recognition for Vulnerable Individuals in Tamil using pre-trained XLSR models
@@ -529,6 +576,7 @@
Automatic speech recognition is a tool used to transform human speech into a written form. It is used in a variety of avenues, such as voice commands, customer service and more, and has emerged as an essential tool in the digitisation of daily life. It has been known to be of vital importance in making the lives of elderly and disabled people much easier. In this paper we describe an automatic speech recognition model, determined by evaluating three pre-trained models fine-tuned from the Facebook XLSR Wav2Vec2 model, which was trained using the Common Voice Dataset. The best model for speech recognition in Tamil is determined by finding the word error rate on the data. This work explains the submission made by SSNCSE_NLP in the shared task organized by LT-EDI at ACL 2022. A word error rate of 39.4512 is achieved.
2022.ltedi-1.48
srinivasan-etal-2022-ssncse
+ 10.18653/v1/2022.ltedi-1.48
IDIAP_TIET@LT-EDI-ACL2022 : Hope Speech Detection in Social Media using Contextualized BERT with Attention Mechanism
@@ -540,6 +588,7 @@
2022.ltedi-1.49
khanna-etal-2022-idiap
deepanshu-beep/hope-speech-attention
+ 10.18653/v1/2022.ltedi-1.49
SSN@LT-EDI-ACL2022: Transfer Learning using BERT for Detecting Signs of Depression from Social Media Texts
@@ -549,6 +598,7 @@
Depression is one of the most common mental issues faced by people. Detecting signs of depression early on can help in the treatment and prevention of extreme outcomes like suicide. Since the advent of the internet, people have felt more comfortable discussing topics like depression online due to the anonymity it provides. This shared task used data scraped from various social media sites and aims to develop models that detect signs and the severity of depression effectively. In this paper, we employ transfer learning by applying an enhanced BERT model trained on a Wikipedia dataset to the social media text and perform text classification. The model gives an F1-score of 63.8%, which was reasonably better than the other competing models.
2022.ltedi-1.50
s-antony-2022-ssn
+ 10.18653/v1/2022.ltedi-1.50
Findings of the Shared Task on Detecting Signs of Depression from Social Media
@@ -560,6 +610,7 @@
Social media is considered a platform where users express themselves. The rise of social media as one of humanity’s most important public communication platforms presents a potential prospect for early identification and management of mental illness. Depression is one such illness that can lead to a variety of emotional and physical problems. It is necessary to measure the level of depression from social media text in order to treat those affected and to avoid negative consequences. Detecting levels of depression is a challenging task since it involves the mindset of people, which can change periodically. The aim of the DepSign-LT-EDI@ACL-2022 shared task is to classify social media text into three levels of depression, namely “Not Depressed”, “Moderately Depressed”, and “Severely Depressed”. This overview presents a description of the task, the dataset, the methodologies used and an analysis of the results of the submissions. The models that were submitted as part of the shared task used a variety of technologies, from traditional machine learning algorithms to deep learning models. It could be observed from the results that the transformer-based models outperformed the other models. Among the 31 teams who submitted their results for the shared task, the best macro F1-score of 0.583 was obtained using a transformer-based model.
2022.ltedi-1.51
s-etal-2022-findings
+ 10.18653/v1/2022.ltedi-1.51
Findings of the Shared Task on Speech Recognition for Vulnerable Individuals in Tamil
@@ -573,6 +624,7 @@
This paper presents an overview of the shared task on automatic speech recognition in the Tamil language. In the shared task, spontaneous Tamil speech data gathered from elderly and transgender people was given for recognition and evaluation. These utterances were collected from people as they communicated in public locations such as hospitals, markets, vegetable shops, etc. The speech corpus includes utterances of male, female, and transgender speakers and was split into training and testing data. The given task was evaluated using WER (Word Error Rate). The participants used transformer-based models for automatic speech recognition. Results obtained using different pre-trained transformer models are discussed in this overview paper.
2022.ltedi-1.52
b-etal-2022-findings-shared
+ 10.18653/v1/2022.ltedi-1.52
DLRG@LT-EDI-ACL2022: Detecting signs of Depression from Social Media using XGBoost Method
@@ -582,6 +634,7 @@
Depression is linked to the development of dementia. Cognitive functions such as thinking and remembering generally deteriorate in dementia patients. Social media usage has increased among people in recent days, and advances in technology help the community to express their views publicly. Analysing the signs of depression from texts has become an important area of research, as it helps to identify this kind of mental disorder among people from their social media posts. As part of the shared task on detecting signs of depression from social media text, a dataset has been provided by the organizers (Sampath et al.). We applied different machine learning techniques such as Support Vector Machine, Random Forest and XGBoost classifiers to classify the signs of depression. Experimental results revealed that the XGBoost model outperformed the other models, with the highest classification accuracy of 0.61 and a macro F1-score of 0.54.
2022.ltedi-1.53
sharen-rajalakshmi-2022-dlrg
+ 10.18653/v1/2022.ltedi-1.53
IDIAP Submission@LT-EDI-ACL2022 : Hope Speech Detection for Equality, Diversity and Inclusion
@@ -593,6 +646,7 @@
singh-motlicek-2022-idiap
muskaan-singh/hate-speech-detection
HopeEDI
+ 10.18653/v1/2022.ltedi-1.54
IDIAP Submission@LT-EDI-ACL2022: Homophobia/Transphobia Detection in social media comments
@@ -603,6 +657,7 @@
2022.ltedi-1.55
singh-motlicek-2022-idiap-submission
muskaan-singh/homophobia-and-transphobia-acl-submission
+ 10.18653/v1/2022.ltedi-1.55
IDIAP Submission@LT-EDI-ACL2022: Detecting Signs of Depression from Social Media Text
@@ -612,6 +667,7 @@
Depression is a common illness involving sadness and a lack of interest in day-to-day activities. It is important to detect depression at an early stage, as treating it early helps avoid serious consequences. In this paper, we present our system submission ARGUABLY for DepSign-LT-EDI@ACL-2022. We aim to detect the signs of depression of a person from their social media postings, wherein people share their feelings and emotions. The proposed system is an ensembled voting model with fine-tuned BERT, RoBERTa, and XLNet. Given social media postings in English, the submitted system classifies the signs of depression into three labels, namely “not depressed,” “moderately depressed,” and “severely depressed.” Our best model is ranked in 3^{rd} position with 0.54 accuracy. We make our codebase accessible here.
2022.ltedi-1.56
singh-motlicek-2022-idiap-submission-lt
+ 10.18653/v1/2022.ltedi-1.56
Overview of The Shared Task on Homophobia and Transphobia Detection in Social Media Comments
@@ -626,6 +682,7 @@
Homophobia and Transphobia Detection is the task of identifying homophobia, transphobia, and non-anti-LGBT+ content from the given corpus. Homophobia and transphobia are both toxic languages directed at LGBTQ+ individuals that are described as hate speech. This paper summarizes our findings on the “Homophobia and Transphobia Detection in social media comments” shared task held at LT-EDI 2022 - ACL 2022. This shared task focused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages. It received 10 systems for Tamil, 13 systems for English, and 11 systems for Tamil-English. The best systems for Tamil, English, and Tamil-English scored 0.570, 0.870, and 0.610, respectively, on average macro F1-score.
2022.ltedi-1.57
chakravarthi-etal-2022-overview
+ 10.18653/v1/2022.ltedi-1.57
Overview of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
@@ -645,6 +702,7 @@
Hope Speech detection is the task of classifying a sentence as hope speech or non-hope speech given a corpus of sentences. Hope speech is any message or content that is positive, encouraging, reassuring, inclusive and supportive, and that inspires and engenders optimism in the minds of people. In contrast to identifying and censoring negative speech patterns, hope speech detection is focussed on recognising and promoting positive speech patterns online. In this paper, we report an overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion (LT-EDI-2022) organised as a part of ACL 2022. The participants were provided with annotated training & development datasets and unlabelled test datasets in all the five languages. The goal of the shared task is to classify the given sentences into one of the two hope speech classes. The performances of the systems submitted by the participants were evaluated in terms of micro-F1 score and weighted-F1 score. The datasets for this challenge are openly available.
2022.ltedi-1.58
chakravarthi-etal-2022-overview-shared
+ 10.18653/v1/2022.ltedi-1.58
diff --git a/data/xml/2022.mml.xml b/data/xml/2022.mml.xml
index e8107572d4..aa74f4da47 100644
--- a/data/xml/2022.mml.xml
+++ b/data/xml/2022.mml.xml
@@ -38,6 +38,7 @@
jung-etal-2022-language
COCO
COCO-CN
+ 10.18653/v1/2022.mml-1.1
diff --git a/data/xml/2022.nlp4convai.xml b/data/xml/2022.nlp4convai.xml
index aa965f8ace..4cc5a1de09 100644
--- a/data/xml/2022.nlp4convai.xml
+++ b/data/xml/2022.nlp4convai.xml
@@ -31,6 +31,7 @@
2022.nlp4convai-1.1
lee-etal-2022-randomized
DailyDialog
+ 10.18653/v1/2022.nlp4convai-1.1
Are Pre-trained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection
@@ -45,6 +46,7 @@
Pre-trained Transformer-based models were reported to be robust in intent classification. In this work, we first point out the importance of in-domain out-of-scope detection in few-shot intent recognition tasks and then illustrate the vulnerability of pre-trained Transformer-based models against samples that are in-domain but out-of-scope (ID-OOS). We construct two new datasets, and empirically show that pre-trained models do not perform well on both ID-OOS examples and general out-of-scope examples, especially on fine-grained few-shot intent detection tasks.
2022.nlp4convai-1.2
zhang-etal-2022-pre-trained
+ 10.18653/v1/2022.nlp4convai-1.2
Conversational AI for Positive-sum Retailing under Falsehood Control
@@ -56,6 +58,7 @@
Retailing combines complicated communication skills and strategies to reach an agreement between buyer and seller with identical or different goals. In each transaction a good seller finds an optimal solution by considering his/her own profits while simultaneously considering whether the buyer’s needs have been met. In this paper, we manage the retailing problem by mixing cooperation and competition. We present a rich dataset of buyer-seller bargaining in a simulated marketplace in which each agent values goods and utility separately. Various attributes (preference, quality, and profit) are initially hidden from one agent with respect to its role; during the conversation, both sides may reveal, fake, or retain the information uncovered to come to a final decision through natural language. Using this dataset, we leverage transfer learning techniques on a pretrained, end-to-end model and enhance its decision-making ability toward the best choice in terms of utility by means of multi-agent reinforcement learning. An automatic evaluation shows that our approach results in more optimal transactions than humans achieve. We also show that our framework controls the falsehoods generated by seller agents.
2022.nlp4convai-1.3
liao-etal-2022-conversational
+ 10.18653/v1/2022.nlp4convai-1.3
D-REX: Dialogue Relation Extraction with Explanations
@@ -70,6 +73,7 @@
albalak-etal-2022-rex
alon-albalak/D-REX
DialogRE
+ 10.18653/v1/2022.nlp4convai-1.4
Data Augmentation for Intent Classification with Off-the-shelf Large Language Models
@@ -85,6 +89,7 @@
sahu-etal-2022-data
elementai/data-augmentation-with-llms
CLINC150
+ 10.18653/v1/2022.nlp4convai-1.5
Extracting and Inferring Personal Attributes from Dialogue
@@ -101,6 +106,7 @@
ConceptNet
PERSONA-CHAT
Universal Dependencies
+ 10.18653/v1/2022.nlp4convai-1.6
From Rewriting to Remembering: Common Ground for Conversational QA Models
@@ -114,6 +120,7 @@
2022.nlp4convai-1.7
tredici-etal-2022-rewriting
QReCC
+ 10.18653/v1/2022.nlp4convai-1.7
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents
@@ -128,6 +135,7 @@
2022.nlp4convai-1.8
smith-etal-2022-human
PERSONA-CHAT
+ 10.18653/v1/2022.nlp4convai-1.8
KG-CRuSE: Recurrent Walks over Knowledge Graph for Explainable Conversation Reasoning using Semantic Embeddings
@@ -140,6 +148,7 @@
sarkar-etal-2022-kg
rajbsk/kg-cruse
OpenDialKG
+ 10.18653/v1/2022.nlp4convai-1.9
Knowledge Distillation Meets Few-Shot Learning: An Approach for Few-Shot Intent Classification Within and Across Domains
@@ -151,6 +160,7 @@
2022.nlp4convai-1.10
sauer-etal-2022-knowledge
ATIS
+ 10.18653/v1/2022.nlp4convai-1.10
MTL-SLT: Multi-Task Learning for Spoken Language Tasks
@@ -169,6 +179,7 @@
LibriSpeech
SLURP
Spoken-SQuAD
+ 10.18653/v1/2022.nlp4convai-1.11
Multimodal Conversational AI: A Survey of Datasets and Approaches
@@ -193,6 +204,7 @@
Visual Question Answering
Visual7W
YouCook2
+ 10.18653/v1/2022.nlp4convai-1.12
Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next
@@ -207,6 +219,7 @@
kann-etal-2022-open
PERSONA-CHAT
Wizard of Wikipedia
+ 10.18653/v1/2022.nlp4convai-1.13
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric
@@ -219,6 +232,7 @@
ikb-a/idk-dialogue-relevance
FED
Topical-Chat
+ 10.18653/v1/2022.nlp4convai-1.14
RetroNLU: Retrieval Augmented Task-Oriented Semantic Parsing
@@ -232,6 +246,7 @@
2022.nlp4convai-1.15
gupta-etal-2022-retronlu
TOPv2
+ 10.18653/v1/2022.nlp4convai-1.15
Stylistic Response Generation by Controlling Personality Traits and Intent
@@ -246,6 +261,7 @@
PANDORA
Topical-Chat
Wizard of Wikipedia
+ 10.18653/v1/2022.nlp4convai-1.16
Toward Knowledge-Enriched Conversational Recommendation Systems
@@ -262,6 +278,7 @@
zhang-etal-2022-toward
ConceptNet
ReDial
+ 10.18653/v1/2022.nlp4convai-1.17
Understanding and Improving the Exemplar-based Generation for Open-domain Conversation
@@ -274,6 +291,7 @@
Exemplar-based generative models for open-domain conversation produce responses based on the exemplars provided by the retriever, taking advantage of both generative models and retrieval models. However, due to the one-to-many problem of open-domain conversation, they often ignore the retrieved exemplars while generating responses or produce responses over-fitted to the retrieved exemplars. To address these drawbacks, we introduce a training method that selects exemplars that are semantically relevant to the gold response but lexically distanced from it. In the training phase, our method first uses the gold response instead of the dialogue context as a query to select exemplars that are semantically relevant to the gold response. Then, it eliminates the exemplars that lexically resemble the gold responses, to alleviate the dependency of the generative models on those exemplars. The remaining exemplars could be irrelevant to the given context since they are searched for depending on the gold response. Thus, our training method further utilizes the relevance scores between the given context and the exemplars to penalize the irrelevant exemplars. Extensive experiments demonstrate that our proposed training method alleviates the drawbacks of the existing exemplar-based generative models and significantly improves performance in terms of appropriateness and informativeness.
2022.nlp4convai-1.18
han-etal-2022-understanding
+ 10.18653/v1/2022.nlp4convai-1.18
diff --git a/data/xml/2022.nlppower.xml b/data/xml/2022.nlppower.xml
index d704c35a00..7aaf0b6864 100644
--- a/data/xml/2022.nlppower.xml
+++ b/data/xml/2022.nlppower.xml
@@ -30,6 +30,7 @@
GLUE
SQuAD
SuperGLUE
+ 10.18653/v1/2022.nlppower-1.1
Towards Stronger Adversarial Baselines Through Human-AI Collaboration
@@ -40,6 +41,7 @@
2022.nlppower-1.2
you-lowd-2022-towards
SST
+ 10.18653/v1/2022.nlppower-1.2
Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model
@@ -54,6 +56,7 @@
naseem-etal-2022-benchmarking
Dreaddit
PUBHEALTH
+ 10.18653/v1/2022.nlppower-1.3
Why only Micro-F1? Class Weighting of Measures for Relation Classification
@@ -67,6 +70,7 @@
harbecke-etal-2022-micro
dfki-nlp/weighting-schemes-report
DocRED
+ 10.18653/v1/2022.nlppower-1.4
Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization
@@ -81,6 +85,7 @@
keleg-etal-2022-automatically
CNN/Daily Mail
NEWSROOM
+ 10.18653/v1/2022.nlppower-1.5
A global analysis of metrics used for measuring performance in natural language processing
@@ -94,6 +99,7 @@
2022.nlppower-1.6
blagec-etal-2022-global
OpenBioLink/ITO
+ 10.18653/v1/2022.nlppower-1.6
Beyond Static models and test sets: Benchmarking the potential of pre-trained models across tasks and languages
@@ -112,6 +118,7 @@
XCOPA
XNLI
XQuAD
+ 10.18653/v1/2022.nlppower-1.7
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
@@ -122,6 +129,7 @@
2022.nlppower-1.8
henrique-luz-de-araujo-roth-2022-checking
peluz/checking-hatecheck-code
+ 10.18653/v1/2022.nlppower-1.8
Language Invariant Properties in Natural Language Processing
@@ -133,6 +141,7 @@
2022.nlppower-1.9
bianchi-etal-2022-language
milanlproc/language-invariant-properties
+ 10.18653/v1/2022.nlppower-1.9
DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
@@ -145,6 +154,7 @@
2022.nlppower-1.10
eyzaguirre-etal-2022-dact
GLUE
+ 10.18653/v1/2022.nlppower-1.10
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection
@@ -157,6 +167,7 @@
2022.nlppower-1.11
attanasio-etal-2022-benchmarking
milanlproc/benchmarking-xai-misogyny
+ 10.18653/v1/2022.nlppower-1.11
Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models
@@ -172,6 +183,7 @@
LRA
QASPER
SCROLLS
+ 10.18653/v1/2022.nlppower-1.12
diff --git a/data/xml/2022.repl4nlp.xml b/data/xml/2022.repl4nlp.xml
index 3091fb389c..96a47c208a 100644
--- a/data/xml/2022.repl4nlp.xml
+++ b/data/xml/2022.repl4nlp.xml
@@ -38,6 +38,7 @@
2022.repl4nlp-1.1
valerio-miceli-barone-etal-2022-distributionally
MTNT
+ 10.18653/v1/2022.repl4nlp-1.1
Q-Learning Scheduler for Multi Task Learning Through the use of Histogram of Task Uncertainty
@@ -50,6 +51,7 @@
meshgi-etal-2022-q
IMDb Movie Reviews
Penn Treebank
+ 10.18653/v1/2022.repl4nlp-1.2
When does CLIP generalize better than unimodal models? When judging human-centric concepts
@@ -62,6 +64,7 @@
2022.repl4nlp-1.4
bielawski-etal-2022-clip
Book Cover Dataset
+ 10.18653/v1/2022.repl4nlp-1.4
From Hyperbolic Geometry Back to Word Embeddings
@@ -74,6 +77,7 @@
2022.repl4nlp-1.5
assylbekov-etal-2022-hyperbolic
soltustik/rhg
+ 10.18653/v1/2022.repl4nlp-1.5
A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition
@@ -90,6 +94,7 @@
CoNLL-2003
Few-NERD
WNUT 2017
+ 10.18653/v1/2022.repl4nlp-1.6
Clozer”:" Adaptable Data Augmentation for Cloze-style Reading Comprehension
@@ -105,6 +110,7 @@
2022.repl4nlp-1.7
lovenia-etal-2022-clozer
ReCAM
+ 10.18653/v1/2022.repl4nlp-1.7
Analyzing Gender Representation in Multilingual Models
@@ -115,6 +121,7 @@
Multilingual language models were shown to allow for nontrivial transfer across scripts and languages. In this work, we study the structure of the internal representations that enable this transfer. We focus on the representations of gender distinctions as a practical case study, and examine the extent to which the gender concept is encoded in shared subspaces across different languages. Our analysis shows that gender representations consist of several prominent components that are shared across languages, alongside language-specific components. The existence of language-independent and language-specific components provides an explanation for an intriguing empirical observation we make: while gender classification transfers well across languages, interventions for gender removal trained on a single language do not transfer easily to others.
2022.repl4nlp-1.8
gonen-etal-2022-analyzing
+ 10.18653/v1/2022.repl4nlp-1.8
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations
@@ -130,6 +137,7 @@
MultiNLI
WikiText-103
WikiText-2
+ 10.18653/v1/2022.repl4nlp-1.9
A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning
@@ -143,6 +151,7 @@
Subword tokenization is a commonly used input pre-processing step in most recent NLP models. However, it limits the models’ ability to leverage end-to-end task learning. Its frequency-based vocabulary creation compromises tokenization in low-resource languages, leading models to produce suboptimal representations. Additionally, the dependency on a fixed vocabulary limits the subword models’ adaptability across languages and domains. In this work, we propose a vocabulary-free neural tokenizer by distilling segmentation information from heuristic-based subword tokenization. We pre-train our character-based tokenizer by processing unique words from a multilingual corpus, thereby extensively increasing word diversity across languages. Unlike the predefined and fixed vocabularies in subword methods, our tokenizer allows end-to-end task learning, resulting in optimal task-specific tokenization. The experimental results show that replacing the subword tokenizer with our neural tokenizer consistently improves performance on multilingual (NLI) and code-switching (sentiment analysis) tasks, with larger gains in low-resource languages. Additionally, our neural tokenizer exhibits robust performance on downstream tasks when adversarial noise is present (typos and misspellings), further increasing the initial improvements over statistical subword tokenizers.
2022.repl4nlp-1.10
mofijul-islam-etal-2022-vocabulary
+ 10.18653/v1/2022.repl4nlp-1.10
Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
@@ -160,6 +169,7 @@
QNLI
SNLI
SST
+ 10.18653/v1/2022.repl4nlp-1.11
Temporal Knowledge Graph Reasoning with Low-rank and Model-agnostic Representations
@@ -172,6 +182,7 @@
dikeoulias-etal-2022-temporal
iodike/chronokge
ICEWS
+ 10.18653/v1/2022.repl4nlp-1.12
ANNA”:" Enhanced Language Representation for Question Answering
@@ -189,6 +200,7 @@
C4
GLUE
SQuAD
+ 10.18653/v1/2022.repl4nlp-1.13
Video Language Co-Attention with Multimodal Fast-Learning Feature Fusion for VideoQA
@@ -201,6 +213,7 @@
abdessaied-etal-2022-video
MSR-VTT
MSVD
+ 10.18653/v1/2022.repl4nlp-1.15
Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations
@@ -215,6 +228,7 @@
AG News
IMDb Movie Reviews
SST
+ 10.18653/v1/2022.repl4nlp-1.16
Binary Encoded Word Mover’s Distance
@@ -223,6 +237,7 @@
Word Mover’s Distance is a textual distance metric which calculates the minimum transport cost between two sets of word embeddings. This metric achieves impressive results on semantic similarity tasks, but is slow and difficult to scale due to the large number of floating point calculations. This paper demonstrates that by combining pre-existing lower bounds with binary encoded word vectors, the metric can be rendered highly efficient in terms of computation time and memory while still maintaining accuracy on several textual similarity tasks.
2022.repl4nlp-1.17
johnson-2022-binary
+ 10.18653/v1/2022.repl4nlp-1.17
Unsupervised Geometric and Topological Approaches for Cross-Lingual Sentence Representation and Comparison
@@ -232,6 +247,7 @@
We propose novel structural-based approaches for the generation and comparison of cross-lingual sentence representations. We do so by applying geometric and topological methods to analyze the structure of sentences, as captured by their word embeddings. The key properties of our methods are: (a) They are designed to be isometric invariant, in order to provide language-agnostic representations. (b) They are fully unsupervised, and use no cross-lingual signal. The quality of our representations, and their preservation across languages, are evaluated in similarity comparison tasks, achieving competitive results. Furthermore, we show that our structural-based representations can be combined with existing methods for improved results.
2022.repl4nlp-1.18
haim-meirom-bobrowski-2022-unsupervised
+ 10.18653/v1/2022.repl4nlp-1.18
A Study on Entity Linking Across Domains: Which Data is Best for Fine-Tuning?
@@ -244,6 +260,7 @@
Entity linking disambiguates mentions by mapping them to entities in a knowledge graph (KG). One important question in today’s research is how to extend neural entity linking systems to new domains. In this paper, we aim at a system that enables linking mentions to entities from a general-domain KG and a domain-specific KG at the same time. In particular, we represent the entities of different KGs in a joint vector space and address the questions of which data is best suited for creating and fine-tuning that space, and whether fine-tuning harms performance on the general domain. We find that a combination of data from both the general and the special domain is most helpful. The first is especially necessary for avoiding performance loss on the general domain. While additional supervision on entities that appear in both KGs performs best in an intrinsic evaluation of the vector space, it has less impact on the downstream task of entity linking.
2022.repl4nlp-1.19
soliman-etal-2022-study
+ 10.18653/v1/2022.repl4nlp-1.19
TRAttack”:" Text Rewriting Attack Against Text Retrieval
@@ -256,6 +273,7 @@
Text retrieval has been widely used in many online applications to help users find relevant information from a text collection. In this paper, we study a new attack scenario against text retrieval to evaluate its robustness to adversarial attacks under the black-box setting, in which attackers want their own texts to always get high relevance scores with different users’ input queries and thus be retrieved frequently, receiving large amounts of impressions for profit. Considering that most current attack methods simply follow certain fixed optimization rules, we propose a novel text rewriting attack (TRAttack) method with learning ability from the multi-armed bandit mechanism. Extensive experiments conducted on simulated victim environments demonstrate that TRAttack can yield texts that have higher relevance scores with different given users’ queries than those generated by current state-of-the-art attack methods. We also evaluate TRAttack on Tencent Cloud’s and Baidu Cloud’s commercially-available text retrieval APIs, and the rewritten adversarial texts successfully get high relevance scores with different user queries, which shows the practical potential of our method and the risk of text retrieval systems.
2022.repl4nlp-1.20
song-etal-2022-trattack
+ 10.18653/v1/2022.repl4nlp-1.20
On the Geometry of Concreteness
@@ -264,6 +282,7 @@
In this paper we investigate how concreteness and abstractness are represented in word embedding spaces. We use data for English and German, and show that concreteness and abstractness can be determined independently and turn out to be completely opposite directions in the embedding space. Various methods can be used to determine the direction of concreteness, always resulting in roughly the same vector. Though concreteness is a central aspect of the meaning of words and can be detected clearly in embedding spaces, it seems not as easy to subtract or add concreteness to words to obtain other words or word senses, as can be done, e.g., with a semantic property like gender.
2022.repl4nlp-1.21
wartena-2022-geometry
+ 10.18653/v1/2022.repl4nlp-1.21
Towards Improving Selective Prediction Ability of NLP Systems
@@ -276,6 +295,7 @@
varshney-etal-2022-towards
MRPC
SNLI
+ 10.18653/v1/2022.repl4nlp-1.23
On Target Representation in Continuous-output Neural Machine Translation
@@ -285,6 +305,7 @@
Continuous generative models proved their usefulness in high-dimensional data, such as image and audio generation. However, continuous models for text generation have received limited attention from the community. In this work, we study continuous text generation using Transformers for neural machine translation (NMT). We argue that the choice of embeddings is crucial for such models, so we aim to focus on one particular aspect: target representation via embeddings. We explore pretrained embeddings and also introduce knowledge transfer from the discrete Transformer model using embeddings in Euclidean and non-Euclidean spaces. Our results on the WMT Romanian-English and English-Turkish benchmarks show such transfer leads to the best-performing continuous model.
2022.repl4nlp-1.24
tokarchuk-niculae-2022-target
+ 10.18653/v1/2022.repl4nlp-1.24
Zero-shot Cross-lingual Transfer is Under-specified Optimization
@@ -296,6 +317,7 @@
2022.repl4nlp-1.25
wu-etal-2022-zero
XNLI
+ 10.18653/v1/2022.repl4nlp-1.25
Same Author or Just Same Topic? Towards Content-Independent Style Representations
@@ -306,6 +328,7 @@
Linguistic style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The assumption underlying the AV training task (same author approximates same writing style) enables self-supervised and, thus, extensive training. However, a good performance on the AV task does not ensure good “general-purpose” style representations. For example, as the same author might typically write about certain topics, representations trained on AV might also encode content information instead of style alone. We introduce a variation of the AV training task that controls for content using conversation or domain labels. We evaluate whether known style dimensions are represented and preferred over content information through an original variation to the recently proposed STEL framework. We find that representations trained by controlling for conversation are better than representations trained with domain or no content control at representing style independent from content.
2022.repl4nlp-1.26
wegmann-etal-2022-author
+ 10.18653/v1/2022.repl4nlp-1.26
WeaNF”:" Weak Supervision with Normalizing Flows
@@ -316,6 +339,7 @@
2022.repl4nlp-1.27
stephan-roth-2022-weanf
IMDb Movie Reviews
+ 10.18653/v1/2022.repl4nlp-1.27
diff --git a/data/xml/2022.slpat.xml b/data/xml/2022.slpat.xml
index d1ddf15bae..cdf2b36f1d 100644
--- a/data/xml/2022.slpat.xml
+++ b/data/xml/2022.slpat.xml
@@ -24,6 +24,7 @@
We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library. The intent of the library is to allow the ensembling of multiple diverse language models without requiring the clients (user interface designers, system users or speech-language pathologists) to attend to the formats of the models. Issues around privacy, security, dynamic versus static models, and methods of model combination are explored and specific design choices motivated. Some simulation experiments demonstrating the benefits of personalized language model ensembling via the library are presented.
2022.slpat-1.1
roark-gutkin-2022-design
+ 10.18653/v1/2022.slpat-1.1
ColorCode: A Bayesian Approach to Augmentative and Alternative Communication with Two Buttons
@@ -33,6 +34,7 @@
2022.slpat-1.2
daly-2022-colorcode
mrdaly/colorcode
+ 10.18653/v1/2022.slpat-1.2
A glimpse of assistive technology in daily life
@@ -48,6 +50,7 @@
Robitaille (2010) wrote ‘if all technology companies have accessibility in their mind then people with disabilities won’t be left behind.’ Current technology has come a long way from where it stood decades ago; however, researchers and manufacturers often do not include people with disabilities in the design process and tend to accommodate them after the fact. In this paper we share feedback from four assistive technology users who rely on one or more assistive technology devices in their everyday lives. We believe end users should be part of the design process and that by bringing together experts and users, we can bridge the research/practice gap.
2022.slpat-1.3
vaidyanathan-etal-2022-glimpse
+ 10.18653/v1/2022.slpat-1.3
A comparison study on patient-psychologist voice diarization
@@ -64,6 +67,7 @@
Conversations between a clinician and a patient, in natural conditions, are valuable sources of information for medical follow-up. The automatic analysis of these dialogues could help extract new language markers and speed up the clinicians’ reports. Yet, it is not clear which model is the most efficient to detect and identify the speaker turns, especially for individuals with speech disorders. Here, we proposed a split of the data that allows conducting a comparative evaluation of different diarization methods. We designed and trained end-to-end neural network architectures to directly tackle this task from the raw signal and evaluated each approach under the same metric. We also studied the effect of fine-tuning models to find the best performance. Experimental results are reported on naturalistic clinical conversations between psychologists and interviewees, at different stages of Huntington’s disease, displaying a large panel of speech disorders. We found that our best end-to-end model achieved 19.5% IER on the test set, compared to 23.6% achieved by fine-tuning the X-vector architecture. Finally, we observed that we could extract clinical markers directly from the automatic systems, highlighting the clinical relevance of our methods.
2022.slpat-1.4
riad-etal-2022-comparison
+ 10.18653/v1/2022.slpat-1.4
Producing Standard German Subtitles for Swiss German TV Content
@@ -74,6 +78,7 @@
In this study we compare two approaches (neural machine translation and edit-based) and the use of synthetic data for the task of translating normalised Swiss German ASR output into correct written Standard German for subtitles, with a special focus on syntactic differences. Results suggest that NMT is better suited to this task and that relatively simple rule-based generation of training data could be a valuable approach for cases where little training data is available and transformations are simple.
2022.slpat-1.5
gerlach-etal-2022-producing
+ 10.18653/v1/2022.slpat-1.5
Investigating the Medical Coverage of a Translation System into Pictographs for Patients with an Intellectual Disability
@@ -85,6 +90,7 @@
Communication between physicians and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one of the solutions that could help to overcome this issue. In this preliminary study, we present the French version of a translation system using the Arasaac pictographs and we investigate the strategies used by speech therapists to translate into pictographs. We also evaluate the medical coverage of this tool for translating physician questions and patient instructions.
2022.slpat-1.6
norre-etal-2022-investigating
+ 10.18653/v1/2022.slpat-1.6
On the Ethical Considerations of Text Simplification
@@ -94,6 +100,7 @@
2022.slpat-1.7
gooding-2022-ethical
Newsela
+ 10.18653/v1/2022.slpat-1.7
Applying the Stereotype Content Model to assess disability bias in popular pre-trained NLP models underlying AI-based assistive technologies
@@ -104,6 +111,7 @@
Stereotypes are positive or negative, generalized, and often widely shared beliefs about the attributes of certain groups of people, such as people with sensory disabilities. If stereotypes manifest in assistive technologies used by deaf or blind people, they can harm the user in a number of ways, especially considering the vulnerable nature of the target population. AI models underlying assistive technologies have been shown to contain biased stereotypes, including racial, gender, and disability biases. We build on this work to present a psychology-based stereotype assessment of the representation of disability, deafness, and blindness in BERT using the Stereotype Content Model. We show that BERT contains disability bias, and that this bias differs along established stereotype dimensions.
2022.slpat-1.8
herold-etal-2022-applying
+ 10.18653/v1/2022.slpat-1.8
CueBot: Cue-Controlled Response Generation for Assistive Interaction Usages
@@ -119,6 +127,7 @@
2022.slpat-1.9
h-kumar-etal-2022-cuebot
DailyDialog
+ 10.18653/v1/2022.slpat-1.9
Challenges in assistive technology development for an endangered language: an Irish (Gaelic) perspective
@@ -133,6 +142,7 @@
This paper describes three areas of assistive technology development which deploy the resources and speech technology for Irish (Gaelic), newly emerging from the ABAIR initiative. These include (i) a screenreading facility for visually impaired people, (ii) an application to help develop phonological awareness and early literacy for dyslexic people, and (iii) a speech-enabled AAC system for non-speaking people. Each of these is at a different stage of development and poses unique challenges: these are discussed along with the approaches adopted to address them. Three guiding principles underlie development. Firstly, the sociolinguistic context and the needs of the community are essential considerations in setting priorities. Secondly, development needs to be language sensitive. The need for skilled researchers with a deep knowledge of Irish structure is illustrated in the case of (ii) and (iii), where aspects of Irish linguistic structure (phonological, morphological and grammatical) and the striking differences from English pose challenges for systems aimed at bilingual Irish-English users. Thirdly, and most importantly, the users and their support networks are central – not as passive recipients of ready-made technologies, but as active partners at every stage of development, from design to implementation, evaluation and dissemination.
2022.slpat-1.10
ni-chasaide-etal-2022-challenges
+ 10.18653/v1/2022.slpat-1.10
diff --git a/data/xml/2022.spanlp.xml b/data/xml/2022.spanlp.xml
index 14fe7903a5..9f1d51557c 100644
--- a/data/xml/2022.spanlp.xml
+++ b/data/xml/2022.spanlp.xml
@@ -30,6 +30,7 @@
tran-etal-2022-improving
FewRel
Wiki-ZSL
+ 10.18653/v1/2022.spanlp-1.1
Choose Your QA Model Wisely: A Systematic Study of Generative and Extractive Readers for Question Answering
@@ -46,6 +47,7 @@
MRQA
Natural Questions
SQuAD
+ 10.18653/v1/2022.spanlp-1.2
Efficient Machine Translation Domain Adaptation
@@ -57,6 +59,7 @@
2022.spanlp-1.3
martins-etal-2022-efficient
deep-spin/efficient_knn_mt
+ 10.18653/v1/2022.spanlp-1.3
Field Extraction from Forms with Unlabeled Data
@@ -71,6 +74,7 @@
2022.spanlp-1.4
gao-etal-2022-field
salesforce/inv-cdip
+ 10.18653/v1/2022.spanlp-1.4
Knowledge Base Index Compression via Dimensionality and Precision Reduction
@@ -84,6 +88,7 @@
zouhar-etal-2022-knowledge
HotpotQA
Natural Questions
+ 10.18653/v1/2022.spanlp-1.5
diff --git a/data/xml/2022.spnlp.xml b/data/xml/2022.spnlp.xml
index f5ef4c8df4..c71a8f4e7d 100644
--- a/data/xml/2022.spnlp.xml
+++ b/data/xml/2022.spnlp.xml
@@ -29,6 +29,7 @@
kando-etal-2022-multilingual
CLAMS
Universal Dependencies
+ 10.18653/v1/2022.spnlp-1.1
Joint Entity and Relation Extraction Based on Table Labeling Using Convolutional Neural Networks
@@ -40,6 +41,7 @@
2022.spnlp-1.2
ma-etal-2022-joint
youmima/tablert-cnn
+ 10.18653/v1/2022.spnlp-1.2
TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion
@@ -57,6 +59,7 @@
fu-etal-2022-tempcaps
fuguigui/tempcaps
ICEWS
+ 10.18653/v1/2022.spnlp-1.3
SlotGAN: Detecting Mentions in Text via Adversarial Distant Learning
@@ -68,6 +71,7 @@
2022.spnlp-1.4
daza-etal-2022-slotgan
CoNLL-2003
+ 10.18653/v1/2022.spnlp-1.4
A Joint Learning Approach for Semi-supervised Neural Topic Modeling
@@ -80,6 +84,7 @@
Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
2022.spnlp-1.5
chiu-etal-2022-joint
+ 10.18653/v1/2022.spnlp-1.5
Neural String Edit Distance
@@ -90,6 +95,7 @@
2022.spnlp-1.6
libovicky-fraser-2022-neural
jlibovicky/neural-string-edit-distance
+ 10.18653/v1/2022.spnlp-1.6
Predicting Attention Sparsity in Transformers
@@ -104,6 +110,7 @@
treviso-etal-2022-predicting
WikiText-103
WikiText-2
+ 10.18653/v1/2022.spnlp-1.7
diff --git a/data/xml/2022.wassa.xml b/data/xml/2022.wassa.xml
index 803f7975ab..781cc1cf20 100644
--- a/data/xml/2022.wassa.xml
+++ b/data/xml/2022.wassa.xml
@@ -30,6 +30,7 @@
Authors of posts in social media communicate their emotions and what causes them with text and images. While there is work on emotion and stimulus detection for each modality separately, it is yet unknown if the modalities contain complementary emotion information in social media. We aim at filling this research gap and contribute a novel, annotated corpus of English multimodal Reddit posts. On this resource, we develop models to automatically detect the relation between image and text, an emotion stimulus category and the emotion class. We evaluate if these tasks require both modalities and find, for the image–text relations, that text alone is sufficient for most categories (complementary, illustrative, opposing): the information in the text allows one to predict if an image is required for emotion understanding. The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise. Stimuli depicted by objects, animals, food, or a person are best predicted by image-only models, while multimodal models are most effective on art, events, memes, places, or screenshots.
2022.wassa-1.1
khlyzova-etal-2022-complementarity
+ 10.18653/v1/2022.wassa-1.1
Multiplex Anti-Asian Sentiment before and during the Pandemic: Introducing New Datasets from Twitter Mining
@@ -42,6 +43,7 @@
COVID-19 has disproportionately threatened minority communities in the U.S., not only in health but also in societal impact. However, social scientists and policymakers lack critical data to capture the dynamics of the anti-Asian hate trend and to evaluate its scale and scope. We introduce new datasets from Twitter related to anti-Asian hate sentiment before and during the pandemic. Relying on Twitter’s academic API, we retrieve hateful and counter-hate tweets from the Twitter Historical Database. To build contextual understanding and collect related racial cues, we also collect instances of heated arguments, often political, but not necessarily hateful, discussing Chinese issues. We then use state-of-the-art hate speech classifiers to discern whether these tweets express hatred. These datasets can be used to study hate speech, general anti-Asian or Chinese sentiment, and hate linguistics by social scientists, as well as to evaluate and build hate speech or sentiment analysis classifiers by computational scholars.
2022.wassa-1.2
lin-etal-2022-multiplex
+ 10.18653/v1/2022.wassa-1.2
Domain-Aware Contrastive Knowledge Transfer for Multi-domain Imbalanced Data
@@ -53,6 +55,7 @@
2022.wassa-1.3
ke-etal-2022-domain
LIAR
+ 10.18653/v1/2022.wassa-1.3
“splink” is happy and “phrouth” is scary: Emotion Intensity Analysis for Nonsense Words
@@ -64,6 +67,7 @@
People associate affective meanings to words - “death” is scary and sad while “party” is connotated with surprise and joy. This raises the question if the association is purely a product of the learned affective imports inherent to semantic meanings, or is also an effect of other features of words, e.g., morphological and phonological patterns. We approach this question with an annotation-based analysis leveraging nonsense words. Specifically, we conduct a best-worst scaling crowdsourcing study in which participants assign intensity scores for joy, sadness, anger, disgust, fear, and surprise to 272 nonsense words and, for comparison of the results to previous work, to 68 real words. Based on this resource, we develop character-level and phonology-based intensity regressors. We evaluate them on both nonsense words and real words (making use of the NRC emotion intensity lexicon of 7493 words), across six emotion categories. The analysis of our data reveals that some phonetic patterns show clear differences between emotion intensities. For instance, s as a first phoneme contributes to joy, sh to surprise, and p as a last phoneme more to disgust than to anger and fear. In the modelling experiments, a regressor trained on real words from the NRC emotion intensity lexicon shows a higher performance (r = 0.17) than regressors that aim at learning the emotion connotation purely from nonsense words. We conclude that humans do associate affective meaning to words based on surface patterns, but also based on similarities to existing words (“juy” to “joy”, or “flike” to “like”).
2022.wassa-1.4
sabbatino-etal-2022-splink
+ 10.18653/v1/2022.wassa-1.4
SentEMO: A Multilingual Adaptive Platform for Aspect-based Sentiment and Emotion Analysis
@@ -78,6 +82,7 @@
In this paper, we present the SentEMO platform, a tool that provides aspect-based sentiment analysis and emotion detection of unstructured text data such as reviews, emails and customer care conversations. Currently, models have been trained for five domains and one general domain and are implemented in a pipeline approach, where the output of one model serves as the input for the next. The results are presented in three dashboards, allowing companies to gain more insights into what stakeholders think of their products and services. The SentEMO platform is available at https://sentemo.ugent.be
2022.wassa-1.5
de-geyndt-etal-2022-sentemo
+ 10.18653/v1/2022.wassa-1.5
Can Emotion Carriers Explain Automatic Sentiment Prediction? A Study on Personal Narratives
@@ -91,6 +96,7 @@
2022.wassa-1.6
mousavi-etal-2022-emotion
sislab/pns_val-ec_annotation
+ 10.18653/v1/2022.wassa-1.6
Infusing Knowledge from Wikipedia to Enhance Stance Detection
@@ -102,6 +108,7 @@
2022.wassa-1.7
he-etal-2022-infusing
zihaohe123/wiki-enhanced-stance-detection
+ 10.18653/v1/2022.wassa-1.7
Uncertainty Regularized Multi-Task Learning
@@ -113,6 +120,7 @@
2022.wassa-1.8
meshgi-etal-2022-uncertainty
IMDb Movie Reviews
+ 10.18653/v1/2022.wassa-1.8
Evaluating Contextual Embeddings and their Extraction Layers for Depression Assessment
@@ -123,6 +131,7 @@
Many recent works in natural language processing have demonstrated the ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP, but little is known empirically about how to best apply them for mental health assessment. Using degree of depression as a case study, we do an empirical analysis of which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level NLP tasks. Notably, we find RoBERTa most effective and, despite the standard in past work suggesting the second-to-last layer or concatenation of the last 4 layers, we find layer 19 (sixth-to-last) is at least as good as layer 23 when using 1 layer. Further, when using multiple layers, distributing them across the second half (i.e. layers 12+), rather than the last 4, of the 24 layers yielded the most accurate results.
2022.wassa-1.9
matero-etal-2022-understanding
+ 10.18653/v1/2022.wassa-1.9
Emotion Analysis of Writers and Readers of Japanese Tweets on Vaccinations
@@ -136,6 +145,7 @@
2022.wassa-1.10
ramos-etal-2022-emotion
patrickjohnramos/bert-japan-vaccination
+ 10.18653/v1/2022.wassa-1.10
Opinion-based Relational Pivoting for Cross-domain Aspect Term Extraction
@@ -149,6 +159,7 @@
Domain adaptation methods often exploit domain-transferable input features, a.k.a. pivots. The task of Aspect and Opinion Term Extraction presents a special challenge for domain transfer: while opinion terms largely transfer across domains, aspects change drastically from one domain to another (e.g. from restaurants to laptops). In this paper, we investigate and establish empirically a prior conjecture, which suggests that the linguistic relations connecting opinion terms to their aspects transfer well across domains and therefore can be leveraged for cross-domain aspect term extraction. We present several analyses supporting this conjecture, via experiments with four linguistic dependency formalisms to represent relation patterns. Subsequently, we present an aspect term extraction method that drives models to consider opinion–aspect relations via explicit multitask objectives. This method provides significant performance gains, even on top of a prior state-of-the-art linguistically-informed model, which are shown in analysis to stem from the relational pivoting signal.
2022.wassa-1.11
klein-etal-2022-opinion
+ 10.18653/v1/2022.wassa-1.11
English-Malay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network
@@ -158,6 +169,7 @@
The main challenge in English-Malay cross-lingual emotion classification is that there are no Malay training emotion corpora. Given that machine translation can fall short on contextually complex tweets, we limited machine translation to the word level. In this paper, we bridge the language gap between English and Malay through cross-lingual word embeddings constructed using singular value decomposition. We pre-trained our hierarchical attention model using English tweets and fine-tuned it using a set of gold-standard Malay tweets. Our model uses significantly fewer computational resources than the language models. Experimental results show that our model outperforms mBERT by 2.4% in zero-shot learning and Malay BERT by 0.8% when a limited number of Malay tweets is available. In exchange for 6–7 times less computational time, our model lags behind mBERT and XLM-RoBERTa by a margin of only 0.9–4.3% in few-shot learning. Also, the word-level attention transfers accurately to the Malay tweets using the cross-lingual word embeddings.
2022.wassa-1.12
lim-liew-2022-english-malay
+ 10.18653/v1/2022.wassa-1.12
Assessment of Massively Multilingual Sentiment Classifiers
@@ -171,6 +183,7 @@
Models are increasing in size and complexity in the hunt for SOTA. But what if that 2% increase in performance does not make a difference in a production use case? Perhaps the benefits of a smaller, faster model outweigh those slight performance gains. Likewise, in multilingual tasks, equally good performance across languages is more important than SOTA results on a single one. We present the largest unified multilingual collection of sentiment analysis datasets. We use it to assess 11 models and 80 high-quality sentiment datasets (out of 342 raw datasets collected) in 27 languages, and include results on internally annotated datasets. We evaluate multiple setups in depth, including fine-tuning transformer-based models, to measure performance. We compare results across numerous dimensions, addressing the imbalance in both language coverage and dataset sizes. Finally, we present some best practices for working with such a massive collection of datasets and models from a multilingual perspective.
2022.wassa-1.13
rajda-etal-2022-assessment
+ 10.18653/v1/2022.wassa-1.13
Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning
@@ -180,6 +193,7 @@
Masked language models (MLMs) are pre-trained with a denoising objective that mismatches the objective of downstream fine-tuning. We propose pragmatic masking and surrogate fine-tuning as two complementary strategies that exploit social cues to drive pre-trained representations toward a broad set of concepts useful for a wide class of social meaning tasks. We test our models on 15 different Twitter datasets for social meaning detection. Our methods achieve a 2.34% F_1 gain over a competitive baseline, while outperforming domain-specific language models pre-trained on large datasets. Our methods also excel in few-shot learning: with only 5% of the training data (severely few-shot), our methods achieve an impressive 68.54% average F_1. The methods are also language agnostic, as we show in a zero-shot setting involving six datasets from three different languages.
2022.wassa-1.14
zhang-abdul-mageed-2022-improving
+ 10.18653/v1/2022.wassa-1.14
Distinguishing In-Groups and Onlookers by Language Use
@@ -195,6 +209,7 @@
Inferring group membership of social media users is of high interest in many domains. Group membership is typically inferred via network interactions with other members, or by the usage of in-group language. However, network information is incomplete when users or groups move between platforms, and in-group keywords lose significance as public discussion about a group increases. Similarly, using keywords to filter content and users can fail to distinguish between the various groups that discuss a topic—perhaps confounding research on public opinion and narrative trends. We present a classifier intended to distinguish members of groups from users discussing a group based on contextual usage of keywords. We demonstrate the classifier on a sample of community pairs from Reddit and focus on results related to the COVID-19 pandemic.
2022.wassa-1.15
minot-etal-2022-distinguishing
+ 10.18653/v1/2022.wassa-1.15
Irony Detection for Dutch: a Venture into the Implicit
@@ -206,6 +221,7 @@
This paper presents the results of a replication experiment for automatic irony detection in Dutch social media text, investigating both a feature-based SVM classifier, as was done by Van Hee et al. (2017), and a transformer-based approach. In addition to building a baseline model, an important goal of this research is to explore the implementation of common-sense knowledge in the form of implicit sentiment, as we strongly believe that common-sense and connotative knowledge are essential to the identification of irony and implicit meaning in tweets. We show promising results, and the presented approach can provide a solid baseline and serve as a staging ground to build on in future experiments on irony detection in Dutch.
2022.wassa-1.16
maladry-etal-2022-irony
+ 10.18653/v1/2022.wassa-1.16
Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features
@@ -217,6 +233,7 @@
Research at the intersection of personality psychology, computer science, and linguistics has recently focused increasingly on modeling and predicting personality from language use. We report two major improvements in predicting personality traits from text data: (1) to our knowledge, the most comprehensive set of theory-based psycholinguistic features, and (2) hybrid models that integrate the pre-trained transformer language model BERT with Bidirectional Long Short-Term Memory (BLSTM) networks trained on within-text distributions (‘text contours’) of psycholinguistic features. We experiment with BLSTM models (with and without attention) and with two techniques for applying pre-trained language representations from the transformer model: ‘feature-based’ and ‘fine-tuning’. We evaluate the performance of our models on two benchmark datasets that target the two dominant theoretical models of personality: the Big Five Essay dataset (Pennebaker and King, 1999) and the MBTI Kaggle dataset (Li et al., 2018). Our results are encouraging, as our models outperform existing work on the same datasets: they improve classification accuracy by 2.9% on the Essay dataset and 8.28% on the Kaggle MBTI dataset. In addition, we perform ablation experiments to quantify the impact of different categories of psycholinguistic features in the respective personality prediction models.
2022.wassa-1.17
kerz-etal-2022-pushing
+ 10.18653/v1/2022.wassa-1.17
XLM-EMO: Multilingual Emotion Prediction in Social Media Text
@@ -228,6 +245,7 @@
2022.wassa-1.18
bianchi-etal-2022-xlm
milanlproc/xlm-emo
+ 10.18653/v1/2022.wassa-1.18
Evaluating Content Features and Classification Methods for Helpfulness Prediction of Online Reviews: Establishing a Benchmark for Portuguese
@@ -237,6 +255,7 @@
Over the years, the review helpfulness prediction task has been the subject of several works, but it remains a challenging issue in Natural Language Processing, as results vary considerably depending on the domain, the adopted features, and the chosen classification strategy. This paper evaluates the impact of content features and classification methods for two different domains. In particular, we run our experiments for a low-resource language, Portuguese, aiming to establish a benchmark for this language. We show that simple features and classical classification methods are powerful for the task of helpfulness prediction, but are largely outperformed by a convolutional neural network-based solution.
2022.wassa-1.19
sousa-pardo-2022-evaluating
+ 10.18653/v1/2022.wassa-1.19
WASSA 2022 Shared Task: Predicting Empathy, Emotion and Personality in Reaction to News Stories
@@ -249,6 +268,7 @@
2022.wassa-1.20
barriere-etal-2022-wassa
GoEmotions
+ 10.18653/v1/2022.wassa-1.20
IUCL at WASSA 2022 Shared Task: A Text-only Approach to Empathy and Emotion Detection
@@ -259,6 +279,7 @@
Our system, IUCL, participated in the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification. Our main goal in building this system was to investigate how the use of demographic attributes influences performance. Our (official) results show that our text-only systems perform very competitively, ranking first in the empathy detection task, reaching an average Pearson correlation of 0.54, and second in the emotion classification task, reaching a Macro-F of 0.572. Our systems that use both text and demographic data are less competitive.
2022.wassa-1.21
chen-etal-2022-iucl
+ 10.18653/v1/2022.wassa-1.21
Continuing Pre-trained Model with Multiple Training Strategies for Emotional Classification
@@ -271,6 +292,7 @@
Emotion is an essential attribute of human beings, and perceiving and understanding emotions in a human-like manner is central to developing emotional intelligence. This paper describes the LingJing team’s contribution to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Emotion Classification, in which participants are required to predict seven emotions from empathic responses to news stories that caused harm to individuals, groups, or others. We describe a continual pre-training method with the masked language model (MLM) objective to enhance the DeBERTa pre-trained language model, and design several training strategies to further improve the final downstream performance, including data augmentation with supervised transfer, child-tuning training, and a late fusion method. Extensive experiments on the emotion classification dataset show that the proposed method outperforms other state-of-the-art methods, demonstrating its effectiveness. Moreover, our submission ranked first on all metrics in the evaluation phase of the Emotion Classification task.
2022.wassa-1.22
li-etal-2022-continuing
+ 10.18653/v1/2022.wassa-1.22
Empathy and Distress Prediction using Transformer Multi-output Regression and Emotion Analysis with an Ensemble of Supervised and Zero-Shot Learning Models
@@ -283,6 +305,7 @@
2022.wassa-1.23
del-arco-etal-2022-empathy
CARER
+ 10.18653/v1/2022.wassa-1.23
Leveraging Emotion-Specific features to improve Transformer performance for Emotion Classification
@@ -295,6 +318,7 @@
This paper describes team PVG’s AI Club’s approach to the Emotion Classification shared task held at WASSA 2022. This Track 2 sub-task focuses on building models that can predict a multi-class emotion label based on essays from news articles where a person, group, or another entity is affected. Baseline transformer models have demonstrated good results on sequence classification tasks; we aim to improve on this performance with ensembling techniques and by leveraging two variants of emotion-specific representations. We observe better results than our baseline models, achieving an accuracy of 0.619 and a macro F1 score of 0.520 on the emotion classification task.
2022.wassa-1.24
desai-etal-2022-leveraging
+ 10.18653/v1/2022.wassa-1.24
Transformer based ensemble for emotion detection
@@ -307,6 +331,7 @@
2022.wassa-1.25
kane-etal-2022-transformer
GoEmotions
+ 10.18653/v1/2022.wassa-1.25
Team IITP-AINLPML at WASSA 2022: Empathy Detection, Emotion Classification and Personality Detection
@@ -318,6 +343,7 @@
Computationally comprehending and identifying emotional components in language has been critical to enhancing human-computer connection in recent years. The WASSA 2022 Shared Task introduced four tracks and released a dataset of news stories: Track 1 for Empathy and Distress Prediction, Track 2 for Emotion Classification, Track 3 for Personality Prediction, and Track 4 for Interpersonal Reactivity Index Prediction at the essay level. This paper describes our participation in the WASSA 2022 shared task on the tracks mentioned above. We developed multi-task deep learning methods to address Tracks 1 and 2 and machine learning models for Tracks 3 and 4. Our systems achieved average Pearson scores of 0.483, 0.05, and 0.08 for Tracks 1, 3, and 4, respectively, and a macro F1 score of 0.524 for Track 2 on the test set. We ranked 8th, 11th, 2nd, and 2nd for Tracks 1, 2, 3, and 4, respectively.
2022.wassa-1.26
ghosh-etal-2022-team
+ 10.18653/v1/2022.wassa-1.26
Transformer-based Architecture for Empathy Prediction and Emotion Classification
@@ -329,6 +355,7 @@
This paper describes the contribution of team PHG to the WASSA 2022 shared task on Empathy Prediction and Emotion Classification. The broad goal of this task was to model an empathy score, a distress score, and the type of emotion associated with the person who wrote an essay in reaction to a newspaper article. We use the RoBERTa model for training, on top of which a few layers are added to fine-tune the transformer. We also use a few machine learning techniques to augment and upsample the data. Our system achieves a Pearson Correlation Coefficient of 0.488 on Task 1 (Empathy: 0.470 and Distress: 0.506) and a Macro F1-score of 0.531 on Task 2.
2022.wassa-1.27
vasava-etal-2022-transformer
+ 10.18653/v1/2022.wassa-1.27
Prompt-based Pre-trained Model for Personality and Interpersonal Reactivity Prediction
@@ -342,6 +369,7 @@
This paper describes the LingJing team’s method for the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared tasks on Personality Prediction (PER) and Reactivity Index Prediction (IRI). We adopt a prompt-based method with a pre-trained language model to accomplish these tasks. Specifically, the prompt is designed to provide the pre-trained model with knowledge of the extra personalized information. Data augmentation and model ensembling are adopted to obtain better results. Extensive experiments are performed, which show the effectiveness of the proposed method. On the final submission, our system achieves a Pearson Correlation Coefficient of 0.2301 on Track 3 and 0.2546 on Track 4. We ranked 1st on both sub-tasks.
2022.wassa-1.28
li-etal-2022-prompt-based
+ 10.18653/v1/2022.wassa-1.28
SURREY-CTS-NLP at WASSA2022: An Experiment of Discourse and Sentiment Analysis for the Prediction of Empathy, Distress and Emotion
@@ -355,6 +383,7 @@
2022.wassa-1.29
qian-etal-2022-surrey
GoEmotions
+ 10.18653/v1/2022.wassa-1.29
An Ensemble Approach to Detect Emotions at an Essay Level
@@ -366,6 +395,7 @@
maheshwari-varma-2022-ensemble
him-mah10/an-ensemble-approach-to-detect-emotions-at-an-essay-level
GoEmotions
+ 10.18653/v1/2022.wassa-1.30
CAISA at WASSA 2022: Adapter-Tuning for Empathy Prediction
@@ -378,6 +408,7 @@
lahnala-etal-2022-caisa
caisa-lab/wassa-empathy-adapters
CARER
+ 10.18653/v1/2022.wassa-1.31
NLPOP: a Dataset for Popularity Prediction of Promoted NLP Research on Twitter
@@ -389,6 +420,7 @@
2022.wassa-1.32
obadic-etal-2022-nlpop
lobadic/nlpop
+ 10.18653/v1/2022.wassa-1.32
Tagging Without Rewriting: A Probabilistic Model for Unpaired Sentiment and Style Transfer
@@ -399,6 +431,7 @@
shuo-2022-tagging
GYAFC
IMDb Movie Reviews
+ 10.18653/v1/2022.wassa-1.33
Polite Task-oriented Dialog Agents: To Generate or to Rewrite?
@@ -410,6 +443,7 @@
2022.wassa-1.34
silva-etal-2022-polite
MMD
+ 10.18653/v1/2022.wassa-1.34
Items from Psychometric Tests as Training Data for Personality Profiling Models of Twitter Users
@@ -420,6 +454,7 @@
Machine-learned models for author profiling in social media often rely on data acquired via self-reporting-based psychometric tests (questionnaires) filled out by social media users. This is an expensive but accurate data collection strategy. Another, less costly alternative, which leads to potentially more noisy and biased data, is to rely on labels inferred from publicly available information in the profiles of the users, for instance self-reported diagnoses or test results. In this paper, we explore a third strategy, namely to directly use a corpus of items from validated psychometric tests as training data. Items from psychometric tests often consist of sentences written from an I-perspective (e.g., ‘I make friends easily.’). Such corpora of test items constitute ‘small data’, but their availability for many concepts is a rich resource. We investigate this approach for personality profiling: we evaluate BERT classifiers fine-tuned on such psychometric test items for the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) and analyze various augmentation strategies regarding their potential to address the challenges that come with such a small corpus. Our evaluation on a publicly available Twitter corpus shows performance comparable to in-domain training for 4 of the 5 personality traits with T5-based data augmentation.
2022.wassa-1.35
kreuter-etal-2022-items
+ 10.18653/v1/2022.wassa-1.35
diff --git a/data/xml/2022.wit.xml b/data/xml/2022.wit.xml
index b3d18dbec0..d3ae7690df 100644
--- a/data/xml/2022.wit.xml
+++ b/data/xml/2022.wit.xml
@@ -27,6 +27,7 @@
2022.wit-1.1
park-lee-2022-unsupervised
seongminp/graph-dialogue-summary
+ 10.18653/v1/2022.wit-1.1
An Interactive Analysis of User-reported Long COVID Symptoms using Twitter Data
@@ -37,6 +38,7 @@
With millions of documented recoveries from COVID-19 worldwide, various long-term sequelae have been observed in a large group of survivors. This paper systematically analyzes user-generated conversations on Twitter that are related to long-term COVID symptoms, for a better understanding of the Long COVID health consequences. Using an interactive information extraction tool built especially for this purpose, we extracted key information from the relevant tweets and analyzed the user-reported Long COVID symptoms with respect to their demographic and geographical characteristics. The results of our analysis are expected to improve public awareness of long-term COVID-19 sequelae and provide important insights to public health authorities.
2022.wit-1.2
miao-etal-2022-interactive
+ 10.18653/v1/2022.wit-1.2
Bi-Directional Recurrent Neural Ordinary Differential Equations for Social Media Text Classification
@@ -47,6 +49,7 @@
Classification of posts in social media such as Twitter is difficult due to the noisy and short nature of the texts. Sequence classification models based on recurrent neural networks (RNNs) are popular for classifying posts that are sequential in nature. RNNs assume that the hidden representation dynamics evolve in a discrete manner and do not consider the exact time of the posting. In this work, we propose to use recurrent neural ordinary differential equations (RNODE) for social media post classification, which consider the time of posting and allow the hidden representation to evolve in a time-sensitive, continuous manner. In addition, we propose a novel model, Bi-directional RNODE (Bi-RNODE), which considers the information flow in both the forward and backward directions of posting times to predict the post label. Our experiments demonstrate that RNODE and Bi-RNODE are effective for the problem of stance classification of rumours in social media.
2022.wit-1.3
tamire-etal-2022-bi
+ 10.18653/v1/2022.wit-1.3