New submissions for Tue, 30 May 23 #364
Labels
abstract meaning representation
argument mining
citation context analysis
computational social science
contrastive
cross-language information retrieval
cross-lingual information retrieval
data augmentation
extreme multi-label
knowledge discovery
knowledge graph
legal text
legal
mixup
multi-task
paraphrase
passage generation
plagiarism
robustness
scholarly document processing
scholarly
semantic similarity
similarity measure
simplification
summarization
text generation
Keyword: abstract meaning representation
Slide, Constrain, Parse, Repeat: Synchronous SlidingWindows for Document AMR Parsing
Authors: Sadhana Kumaravel, Tahira Naseem, Ramon Fernandez Astudillo, Radu Florian, Salim Roukos
Arxiv: https://arxiv.org/abs/2305.17273
TLDR: The sliding window approach provides an elegant way to handle contexts of sizes larger than the Transformer's input window, for tasks like language modeling. Here we extend this approach to the sequence-to-sequence task of document parsing. For this, we exploit recent progress in transition-based parsing to implement a parser with synchronous sliding windows over source and target. We develop an oracle and a parser for document-level AMR by expanding on Structured-BART such that it lever
Repo: None
Keyword: computational social science
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Authors: Julia Mendelsohn, Ronan Le Bras, Yejin Choi, Maarten Sap
Arxiv: https://arxiv.org/abs/2305.17174
TLDR: Dogwhistles are coded expressions that simultaneously convey one meaning to a broad audience and a second one, often hateful or provocative, to a narrow in-group; they are deployed to evade both political repercussions and algorithmic content moderation. For example, in the sentence 'we need to end the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to many, but secretly means 'Jewish' to a select few. We present the first large-scale computational investigation
Repo: None
Keyword: contrastive
MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations
Authors: Calum Heggan, Tim Hospedales, Sam Budgett, Mehrdad Yaghoobi
Arxiv: https://arxiv.org/abs/2305.17191
TLDR: Contrastive self-supervised learning has gained attention for its ability to create high-quality representations from large unlabelled data sets. A key reason that these powerful features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariances preferred is not known a priori, and varies across different downstream tasks. We therefore propose a multi-task self-supervised
Repo: None
Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities
Authors: Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens
Arxiv: https://arxiv.org/abs/2305.17214
TLDR: Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an f
Repo: None
CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation
Authors: Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
Arxiv: https://arxiv.org/abs/2305.17267
TLDR: Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive
Repo: None
Kernel-SSL: Kernel KL Divergence for Self-supervised Learning
Authors: Yifan Zhang, Zhiquan Tan, Jingqin Yang, Yang Yuan
Arxiv: https://arxiv.org/abs/2305.17326
TLDR: Contrastive learning usually compares one positive anchor sample with many negative samples to perform Self-Supervised Learning (SSL). Alternatively, non-contrastive learning, as exemplified by methods like BYOL, SimSiam, and Barlow Twins, accomplishes SSL without the explicit use of negative samples. Inspired by the existing analysis for contrastive learning, we provide a reproducing kernel Hilbert space (RKHS) understanding of many existing non-contrastive learning methods
Repo: None
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Authors: Yung-Hsuan Lai, Yen-Chun Chen, Yu-Chiang Frank Wang
Arxiv: https://arxiv.org/abs/2305.17343
TLDR: Audio-visual learning has been a major pillar of multi-modal machine learning, where the community has mostly focused on its modality-aligned setting, i.e., the audio and visual modalities are both assumed to signal the prediction target. With the Look, Listen, and Parse dataset (LLP), we investigate the under-explored unaligned setting. This is the case where the goal is to recognize audio and visual events in a video with only weak labels observed.
Repo: None
Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning
Authors: Zhenrui Yue, Huimin Zeng, Mengfei Lan, Heng Ji, Dong Wang
Arxiv: https://arxiv.org/abs/2305.17373
TLDR: With emerging online topics as a source for numerous new events, detecting unseen / rare event types presents an elusive challenge for existing event detection methods, where only limited data access is provided for training. To address the data scarcity problem in event detection, we propose MetaEvent, a meta learning-based framework for zero- and few-shot event detection. Specifically, we sample training tasks from existing event types and perform meta training to search for optimal parameters that quickly adapt to unseen tasks. In our
Repo: None
GIMM: InfoMin-Max for Automated Graph Contrastive Learning
Authors: Xin Xiong (1), Furao Shen (1), Xiangyu Wang (1), Jian Zhao (2) ((1) School of Artificial Intelligence, Nanjing University, (2) School of Electronic Science and Engineering, Nanjing University)
Arxiv: https://arxiv.org/abs/2305.17437
TLDR: Graph contrastive learning (GCL) shows great potential in unsupervised graph representation learning. Data augmentation plays a vital role in GCL, and its optimal choice heavily depends on the downstream task. Many GCL methods with automated data augmentation face the risk of insufficient information as they fail to preserve the essential information necessary for the downstream task. To solve this problem, we propose InfoMin-Max for automated graph contrastive learning (GIMM), which prevents GCL from
Repo: None
Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery
Authors: Yutao Mou, Xiaoshuai Song, Keqing He, Chen Zeng, Pei Wang, Jingang Wang, Yunsen Xian, Weiran Xu
Arxiv: https://arxiv.org/abs/2305.17699
TLDR: Generalized intent discovery aims to extend a closed-set in-domain intent classifier to an open-world intent set including in-domain and out-of-domain intents. The key challenges lie in pseudo label disambiguation and representation learning. Previous methods suffer from a coupling of pseudo label disambiguation and representation learning, that is, the reliability of pseudo labels relies on representation learning, and representation learning is restricted by pseudo labels in turn. In this paper,
Repo: None
RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring
Authors: Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji
Arxiv: https://arxiv.org/abs/2305.17708
TLDR: Refactoring is an indispensable practice of improving the quality and maintainability of source code in software evolution. Rename refactoring, or renaming, is the most frequently performed refactoring that suggests a new name for an identifier to enhance readability when the identifier is poorly named. However, most existing works only identify renaming activities between two versions of source code, while few works express concern about how to suggest a new name. In this paper, we study automatic rename ref
Repo: None
Whitening-based Contrastive Learning of Sentence Embeddings
Authors: Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang
Arxiv: https://arxiv.org/abs/2305.17746
TLDR: This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE), which combines contrastive learning with a novel shuffled group whitening. Generally, contrastive training pulls distortions of a single sample (i.e., positive samples) close and pushes negative samples far away, correspondingly facilitating the alignment and uniformity in the feature space. A popular alternative to the "pushing" operation is whitening the feature space, which scatters
Repo: None
Point-PC: Point Cloud Completion Guided by Prior Knowledge via Causal Inference
Authors: Weizhi Nie, Chuanqi Jiao, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe, Anan Liu
Arxiv: https://arxiv.org/abs/2305.17770
TLDR: Point cloud completion aims to recover raw point clouds captured by scanners from partial observations caused by occlusion and limited view angles. Many approaches utilize a partial-complete paradigm in which missing parts are directly predicted by a global feature learned from partial inputs. This makes it hard to recover details because the global feature is unlikely to capture the full details of all missing parts. In this paper, we propose a novel approach to point cloud completion called Point-PC, which uses a memory network to retrieve
Repo: None
Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
Authors: Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Arxiv: https://arxiv.org/abs/2305.17861
TLDR: Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training. Without instance-level annotations, most existing methods follow the Segment-based Multiple Instance Learning (S-MIL) framework, where the predictions of segments are supervised by the labels of videos. However, the objective for acquiring segment-level scores during training is not consistent with the target for acquiring proposal-level scores during testing,
Repo: https://github.com/RenHuan1999/CVPR2023_P-MIL
ContrastNER: Contrastive-based Prompt Tuning for Few-shot NER
Authors: Amirhossein Layegh, Amir H. Payberah, Ahmet Soylu, Dumitru Roman, Mihhail Matskin
Arxiv: https://arxiv.org/abs/2305.17951
TLDR: Prompt-based language models have produced encouraging results in numerous applications, including Named Entity Recognition (NER) tasks. NER aims to identify entities in a sentence and provide their types. However, the strong performance of most available NER approaches is heavily dependent on the design of discrete prompts and a verbalizer to map the model-predicted outputs to entity categories, which are complicated undertakings. To address these challenges, we present ContrastNER, a prompt-based NER framework
Repo: None
Multi-Modal Face Stylization with a Generative Prior
Authors: Mengtian Li, Yi Dong, Minxuan Lin, Haibin Huang, Pengfei Wan, Chongyang Ma
Arxiv: https://arxiv.org/abs/2305.18009
TLDR: In this work, we introduce a new approach for artistic face stylization. Despite existing methods achieving impressive results in this task, there is still room for improvement in generating high-quality stylized faces with diverse styles and accurate facial reconstruction. Our proposed framework, MMFS, supports multi-modal face stylization by leveraging the strengths of StyleGAN and integrates it into an encoder-decoder architecture. Specifically, we use the mid-resolution and high-resolution layers of StyleG
Repo: None
Abstractive Summarization as Augmentation for Document-Level Event Detection
Authors: Janko Vidaković, Filip Karlo Došilović, Domagoj Pluščec
Arxiv: https://arxiv.org/abs/2305.18023
TLDR: Transformer-based models have consistently produced substantial performance gains across a variety of NLP tasks, compared to shallow models. However, deep models are orders of magnitude more computationally expensive than shallow models, especially on tasks with large sequence lengths, such as document-level event detection. In this work, we attempt to bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method. We augment the DocEE dataset
Repo: None
Semantic Role Labeling Guided Out-of-distribution Detection
Authors: Jinan Zou, Maihao Guo, Yu Tian, Yuhao Lin, Haiyao Cao, Lingqiao Liu, Ehsan Abbasnejad, Javen Qinfeng Shi
Arxiv: https://arxiv.org/abs/2305.18026
TLDR: Identifying unexpected domain-shifted instances in natural language processing is crucial in real-world applications. Previous works identify the OOD instance by leveraging a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge current OOD methods face is learning effective low-dimensional sentence representations to identify the hard OOD instances that are semantically similar to the ID data. In this paper, we propose a new unsupervised OOD detection method
Repo: None
Contrastive Learning Based Recursive Dynamic Multi-Scale Network for Image Deraining
Authors: Zhiying Jiang, Risheng Liu, Shuzhou Yang, Zengxi Zhang, Xin Fan
Arxiv: https://arxiv.org/abs/2305.18092
TLDR: Rain streaks significantly decrease the visibility of captured images and are also a stumbling block that restricts the performance of subsequent computer vision applications. The existing deep learning-based image deraining methods employ manually crafted networks and learn a straightforward projection from rainy images to clear images. In pursuit of better deraining performance, they focus on elaborating a more complicated architecture rather than exploiting the intrinsic properties of the positive and negative information. In this paper, we propose a contrastive learning-based image deraining method
Repo: None
Reason to explain: Interactive contrastive explanations (REASONX)
Authors: Laura State, Salvatore Ruggieri, Franco Turini
Arxiv: https://arxiv.org/abs/2305.18143
TLDR: Many high-performing machine learning models are not interpretable. As they are increasingly used in decision scenarios that can critically affect individuals, it is necessary to develop tools to better understand their outputs. Popular explanation methods include contrastive explanations. However, they suffer several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of interactivity. While (dialogue-like) interactivity is important to better communicate an explanation, background knowledge has the potential to significantly improve their quality,
Repo: None
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
Authors: Amirhossein Abaskohi, Sascha Rothe, Yadollah Yaghoobzadeh
Arxiv: https://arxiv.org/abs/2305.18169
TLDR: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common way, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective as it helps the model generate embeddings that are more distinguishable between classes
Repo: None
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
Authors: Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, Tanishq Mathew Abraham
Arxiv: https://arxiv.org/abs/2305.18274
TLDR: We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach
Repo: None
Keyword: data augmentation
Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning
Authors: Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia
Arxiv: https://arxiv.org/abs/2305.17297
TLDR: Studying the generalization abilities of linear models with real data is a central question in statistical learning. While there exist a limited number of important prior works (Loureiro et al. (2021A, 2021B), Wei et al., and Xu 2020) that do validate theoretical work with real data, these works have limitations due to technical assumptions. These assumptions include having a well-conditioned covariance matrix and having independent and identically distributed data. These assumptions are not necessarily
Repo: None
Disambiguated Lexically Constrained Neural Machine Translation
Authors: Jinpeng Zhang, Nini Xiao, Ke Wang, Chuanqi Dong, Xiangyu Duan, Yuqi Zhang, Min Zhang
Arxiv: https://arxiv.org/abs/2305.17351
TLDR: Lexically constrained neural machine translation (LCNMT), which controls the translation generation with pre-specified constraints, is important in many practical applications. Current approaches to LCNMT typically assume that the pre-defined lexical constraints are contextually appropriate. This assumption limits their application to real-world scenarios where a source lexicon may have multiple target constraints, and disambiguation is needed to select the most suitable one. In this paper, we propose disambiguated LC
Repo: None
GIMM: InfoMin-Max for Automated Graph Contrastive Learning
Authors: Xin Xiong (1), Furao Shen (1), Xiangyu Wang (1), Jian Zhao (2) ((1) School of Artificial Intelligence, Nanjing University, (2) School of Electronic Science and Engineering, Nanjing University)
Arxiv: https://arxiv.org/abs/2305.17437
TLDR: Graph contrastive learning (GCL) shows great potential in unsupervised graph representation learning. Data augmentation plays a vital role in GCL, and its optimal choice heavily depends on the downstream task. Many GCL methods with automated data augmentation face the risk of insufficient information as they fail to preserve the essential information necessary for the downstream task. To solve this problem, we propose InfoMin-Max for automated graph contrastive learning (GIMM), which prevents GCL from
Repo: None
Toward Understanding Generative Data Augmentation
Authors: Chenyu Zheng, Guoqiang Wu, Chongxuan Li
Arxiv: https://arxiv.org/abs/2305.17476
TLDR: Generative data augmentation, which scales datasets by obtaining fake labeled examples from a trained conditional generative model, boosts classification performance in various learning tasks including (semi-)supervised learning, few-shot learning, and adversarially robust learning. However, little work has theoretically investigated the effect of generative data augmentation. To fill this gap, we establish a general stability bound in this not independently and identically distributed (non-i.i.d.) setting, where
Repo: None
Spot keywords from very noisy and mixed speech
Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
Arxiv: https://arxiv.org/abs/2305.17706
TLDR: Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude), and even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B
Repo: None
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Authors: Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma
Arxiv: https://arxiv.org/abs/2305.17732
TLDR: Direct speech-to-speech translation (S2ST) has gradually become popular as it has many advantages compared with cascade S2ST. However, current research mainly focuses on the accuracy of semantic translation and ignores the speech style transfer from a source language to a target language. The lack of high-fidelity expressive parallel data makes such style transfer challenging, especially in more practical zero-shot scenarios. To solve this problem, we first build a parallel corpus using a multi-lingual
Repo: None
Targeted Data Generation: Finding and Fixing Model Weaknesses
Authors: Zexue He, Marco Tulio Ribeiro, Fereshte Khani
Arxiv: https://arxiv.org/abs/2305.17804
TLDR: Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users, and underrepresented in the existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups, and generates new data for those subgroups using
Repo: None
Data Augmentation for Low-Resource Keyphrase Generation
Authors: Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea
Arxiv: https://arxiv.org/abs/2305.17968
TLDR: Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases). Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire. Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations. In this paper, we
Repo: None
Extrinsic Factors Affecting the Accuracy of Biomedical NER
Authors: Zhiyi Li, Shengjie Zhang, Yujie Song, Jungyeul Park
Arxiv: https://arxiv.org/abs/2305.18152
TLDR: Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex, technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications including the healthcare system. However, NER in the biomedical domain is challenging due to limited data availability, as the high expertise, time, and expenses are required
Repo: None
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
Authors: Amirhossein Abaskohi, Sascha Rothe, Yadollah Yaghoobzadeh
Arxiv: https://arxiv.org/abs/2305.18169
TLDR: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common way, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective as it helps the model generate embeddings that are more distinguishable between classes
Repo: None
Improved Probabilistic Image-Text Representations
Authors: Sanghyuk Chun
Arxiv: https://arxiv.org/abs/2305.18171
TLDR: The Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic functions are not sufficiently powerful to capture ambiguity, prompting the exploration of probabilistic embeddings to tackle the challenge. However, the existing probabilistic ITM approach encounters two key shortcomings: the burden of heavy computations due to the Monte Carlo approximation, and the loss saturation issue in the face of
Repo: None
Rethinking Counterfactual Data Augmentation Under Confounding
Authors: Abbavaram Gowtham Reddy, Saketh Bachu, Saloni Dash, Charchit Sharma, Amit Sharma, Vineeth N Balasubramanian
Arxiv: https://arxiv.org/abs/2305.18183
TLDR: Counterfactual data augmentation has recently emerged as a method to mitigate confounding biases in the training data for a machine learning model. These biases, such as spurious correlations, arise due to various observed and unobserved confounding variables in the data generation process. In this paper, we formally analyze how confounding biases impact downstream classifiers and present a causal viewpoint to the solutions based on counterfactual data augmentation. We explore how removing confounding biases serves as a means to learn invariant features,
Repo: None
Keyword: knowledge graph
A Categorical Representation Language and Computational System for Knowledge-Based Planning
Authors: Angeline Aguinaldo, Evan Patterson, James Fairbanks, Jaime Ruiz
Arxiv: https://arxiv.org/abs/2305.17208
TLDR: Classical planning representation languages based on first-order logic have been extensively used to model and solve planning problems, but they struggle to capture implicit preconditions and effects that arise in complex planning scenarios. To address this problem, we propose an alternative approach to representing and transforming world states during planning. Based on the category-theoretic concepts of
Repo: None
Choose your Data Wisely: A Framework for Semantic Counterfactuals
Authors: Edmund Dervakos, Konstantinos Thomas, Giorgos Filandrianos, Giorgos Stamou
Arxiv: https://arxiv.org/abs/2305.17667
TLDR: Counterfactual explanations have been argued to be one of the most intuitive forms of explanation. They are typically defined as a minimal set of edits on a given data sample that, when applied, changes the output of a model on that sample. However, a minimal set of edits is not always clear and understandable to an end-user, as it could, for instance, constitute an adversarial example (which is indistinguishable from the original data sample to an end-user). Instead, there
Repo: None
Sequential Condition Evolved Interaction Knowledge Graph for Traditional Chinese Medicine Recommendation
Authors: Jingjin Liu, Hankz Hankui Zhuo, Kebing Jin, Jiamin Yuan, Zhimin Yang, Zhengan Yao
Arxiv: https://arxiv.org/abs/2305.17866
TLDR: Traditional Chinese Medicine (TCM) has a rich history of utilizing natural herbs to treat a diversity of illnesses. In practice, TCM diagnosis and treatment are highly personalized and organically holistic, requiring comprehensive consideration of the patient's state and symptoms over time. However, existing TCM recommendation approaches overlook the changes in patient status and only explore potential patterns between symptoms and prescriptions. In this paper, we propose a novel Sequential Condition Evolved Interaction Knowledge Graph (SCEIKG),
Repo: None
Representation Learning on Hyper-Relational and Numeric Knowledge Graphs with Transformers
Authors: Chanyoung Chung, Jaejun Lee, Joyce Jiyoung Whang
Arxiv: https://arxiv.org/abs/2305.18256
TLDR: A hyper-relational knowledge graph has been recently studied where a triplet is associated with a set of qualifiers; a qualifier is composed of a relation and an entity, providing auxiliary information for a triplet. While existing hyper-relational knowledge graph embedding methods assume that the entities are discrete objects, some information should be represented using numeric values, e.g., (J.R.R., was born in, 1892). Also, a triplet (J.-R
Repo: None
Keyword: legal
NaturalFinger: Generating Natural Fingerprint with Generative Adversarial Networks
Authors: Kang Yang, Kunhao Lai
Arxiv: https://arxiv.org/abs/2305.17868
TLDR: Deep neural network (DNN) models have become a critical asset of the model owner as training them requires a large amount of resources (i.e., labeled data). Therefore, many fingerprinting schemes have been proposed to safeguard the intellectual property (IP) of the model owner against model extraction and illegal redistribution. However, previous schemes adopt unnatural images as the fingerprint, such as adversarial examples and noisy images, which can be easily perceived and rejected by the adversary. In this paper,
Repo: None
Keyword: mixup
Spot keywords from very noisy and mixed speech
Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
Arxiv: https://arxiv.org/abs/2305.17706
TLDR: Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude), and even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B
Repo: None
Conditional Score Guidance for Text-Driven Image-to-Image Translation
Authors: Hyunsoo Lee, Minsoo Kang, Bohyung Han
Arxiv: https://arxiv.org/abs/2305.18007
TLDR: We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, defined by a modifying text, while preserving the remaining parts. In contrast to existing techniques that solely rely on a target prompt, we introduce a new score function, which considers both a source prompt and a source image, tailored to address specific translation tasks
Repo: None
Keyword: multi-task
MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations
Authors: Calum Heggan, Tim Hospedales, Sam Budgett, Mehrdad Yaghoobi
Arxiv: https://arxiv.org/abs/2305.17191
TLDR: Contrastive self-supervised learning has gained attention for its ability to create high-quality representations from large unlabelled data sets. A key reason that these powerful features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariances preferred is not known a priori, and varies across different downstream tasks. We therefore propose a multi-task self-supervised
Repo: None
Coping with low data availability for social media crisis message categorisation
Authors: Congcong Wang
Arxiv: https://arxiv.org/abs/2305.17211
TLDR: During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an
Repo: None
DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning
Authors: Elahe Rahimian, Golara Javadi, Frederick Tung, Gabriel OliveiraArxiv: https://arxiv.org/abs/2305.17305
TLDR: Multi-task networks rely on effective parameter sharing to achieve robust generalization across tasks. In this paper, we present a novel parameter sharing method for multi-task learning that conditions parameter sharing on both the task and the intermediate feature representations at inference time. In contrast to traditional parameter sharing approaches, which fix or learn a deterministic sharing pattern during training and apply the same pattern to all examples during inference, we propose to dynamically decide which parts of the network to activate
Repo: None
Understanding Emotion Valence is a Joint Deep Learning Task
Authors: Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe RiccardiArxiv: https://arxiv.org/abs/2305.17422
TLDR: The valence analysis of speakers' utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) for single-task
Repo: None
A Match Made in Heaven: A Multi-task Framework for Hyperbole and Metaphor Detection
Authors: Naveen Badathala (1), Abisek Rajakumar Kalarani (1), Tejpalsingh Siledar (1), Pushpak Bhattacharyya (1), ((1) Indian Institute of Technology Bombay)Arxiv: https://arxiv.org/abs/2305.17480
TLDR: Hyperbole and metaphor are common in day-to-day communication (e.g., "I am in deep trouble": how does trouble have depth?), which makes their detection important, especially in a conversational AI setting. Existing approaches to automatically detect metaphor and hyperbole have studied these language phenomena independently, but their relationship has hardly, if ever, been explored computationally. In this paper, we propose a multi-task deep learning framework to detect hyperbole and metaphor simultaneously.
Repo: None
Towards computing low-makespan solutions for multi-arm multi-task planning problems
Authors: Hartmann Valentin N., Toussaint MarcArxiv: https://arxiv.org/abs/2305.17527
TLDR: We propose an approach to find low-makespan solutions to multi-robot multi-task planning problems in environments where robots block each other from completing tasks simultaneously. We introduce a formulation of the problem that allows for an approach based on greedy descent with random restarts for generation of the task assignment and task sequence. We then use a multi-agent path planner to evaluate the makespan of a given assignment and sequence. The planner decomposes the problem into multiple simple subproblems that
Repo: None
AIMS: All-Inclusive Multi-Level Segmentation
Authors: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan YangArxiv: https://arxiv.org/abs/2305.17768
TLDR: Despite the progress of image segmentation for accurate visual entity segmentation, meeting the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved. In this paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation (two entities with some semantic relationships). We also build a unified AIMS model through multi-dataset
Repo: None
Keyword: robustness
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Authors: Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng DaiArxiv: https://arxiv.org/abs/2305.17144
TLDR: The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, the current research landscape predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the currently leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of Reinforcement
Repo: None
GVdoc: Graph-based Visual Document Classification
Authors: Fnu Mohbat, Mohammed J. Zaki, Catherine Finegan-Dollak, Ashish VermaArxiv: https://arxiv.org/abs/2305.17219
TLDR: The robustness of a model for real-world deployment is decided by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they tend to have a hard time correctly classifying and differentiating out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token
Repo: None
A Reissner-Mindlin plate formulation using symmetric Hu-Zhang elements via polytopal transformations
Authors: Adam Sky, Michael Neunteufel, Jack S. Hale, Andreas ZilianArxiv: https://arxiv.org/abs/2305.17249
TLDR: In this work we develop new finite element discretisations of the shear-deformable Reissner-Mindlin plate problem based on the Hellinger-Reissner principle of symmetric stresses. Specifically, we use conforming Hu-Zhang elements to discretise the bending moments in the space of symmetric square integrable fields with a square integrable divergence $\boldsymbol{M} \in \mathcal{HZ} \subset
Repo: None
Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning
Authors: Ruixiang Tang, Dehan Kong, Longtao Huang, Hui XueArxiv: https://arxiv.org/abs/2305.17256
TLDR: Large language models (LLMs) have recently shown great potential for in-context learning, where LLMs learn a new task simply by conditioning on a few input-label pairs (prompts). Despite their potential, our understanding of the factors influencing end-task performance and the robustness of in-context learning remains limited. This paper aims to bridge this knowledge gap by investigating the reliance of LLMs on shortcuts or spurious correlations within prompts. Through comprehensive experiments on classification and extraction tasks
Repo: None
CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation
Authors: Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios AnastasopoulosArxiv: https://arxiv.org/abs/2305.17267
TLDR: Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive
Repo: None
Fourier-DeepONet: Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness
Authors: Min Zhu, Shihang Feng, Youzuo Lin, Lu LuArxiv: https://arxiv.org/abs/2305.17289
TLDR: Full waveform inversion (FWI) infers the subsurface structure information from seismic waveform data by solving a non-convex optimization problem. Data-driven FWI has been increasingly studied with various neural network architectures to improve accuracy and computational efficiency. Nevertheless, the applicability of pre-trained neural networks is severely restricted by potential discrepancies between the source function used in the field survey and the one utilized during training. Here, we develop a Fourier-enhanced
Repo: None
Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence
Authors: Erik C. Johnson, Brian S. Robinson, Gautam K. Vallabha, Justin Joyce, Jordan K. Matelsky, Raphael Norman-Tenazas, Isaac Western, Marisel Villafañe-Delgado, Martha Cervantes, Michael S. Robinette, Arun V. Reddy, Lindsey Kitchell, Patricia K. Rivlin, Elizabeth P. Reilly, Nathan Drenkow, Matthew J. Roos, I-Jeng Wang, Brock A. Wester, William R. Gray-Roncal, Joan A. HoffmannArxiv: https://arxiv.org/abs/2305.17300
TLDR: Despite the progress in deep learning networks, efficient learning at the edge (enabling adaptable, low-complexity machine learning solutions) remains a critical need for defense and commercial applications. We envision a pipeline to utilize large neuroimaging datasets, including maps of the brain which capture neuron and synapse connectivity, to improve machine learning approaches. We have pursued different approaches within this pipeline structure. First, as a demonstration of data-driven discovery, the team has developed a technique for discovery
Repo: None
An Image Based Visual Servo Method for Probe-and-Drogue Autonomous Aerial Refueling
Authors: Quan Quan, Runxiao Liu, Hao Liu, Zeqing Ma, Jinrui RenArxiv: https://arxiv.org/abs/2305.17414
TLDR: With the recent focus on autonomous aerial refueling (AAR), it has become increasingly urgent to design efficient methods or algorithms to solve AAR problems in complicated aerial environments. Apart from the complex aerodynamic disturbance, another problem is the pose estimation error caused by the camera calibration error, installation error, or 3D object modeling error, which may prevent the highly accurate docking that is required. The main objective of the effort described in this paper is the implementation of an image-based visual servo control method, which
Repo: None
Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems
Authors: Smitha Milli, Emma Pierson, Nikhil GargArxiv: https://arxiv.org/abs/2305.17428
TLDR: Many recommender systems are based on optimizing a linear weighting of different user behaviors, such as clicks, likes, and shares. Though the choice of weights can have a significant impact, there is little formal study or guidance on how to choose them. We analyze the optimal choice of weights from the perspectives of both users and content producers who strategically respond to the weights. We consider three aspects of user behavior: value-faithfulness (how well a behavior indicates whether the user values the
Repo: None
On the Importance of Backbone to the Adversarial Robustness of Object Detectors
Authors: Xiao Li, Hang Chen, Xiaolin HuArxiv: https://arxiv.org/abs/2305.17438
TLDR: Object detection is a critical component of various security-sensitive applications, such as autonomous driving and video surveillance. However, existing deep learning-based object detectors are vulnerable to adversarial attacks, which poses a significant challenge to their reliability and safety. Through experiments, we found that existing works on improving the adversarial robustness of object detectors have given a false sense of security. We argue that using adversarially pre-trained backbone networks is essential for enhancing the adversarial robustness of object detectors. We
Repo: None
A Diffusion Model for Event Skeleton Generation
Authors: Fangqi Zhu, Lin Zhang, Jun Gao, Bing Qin, Ruifeng Xu, Haiqin YangArxiv: https://arxiv.org/abs/2305.17458
TLDR: Event skeleton generation, aiming to induce an event schema skeleton graph with abstracted event nodes and their temporal relations from a set of event instance graphs, is a critical step in the temporal complex event schema induction task. Existing methods effectively address this task from a graph generation perspective but suffer from noise sensitivity and error accumulation, e.g., the inability to correct errors while generating schema. We, therefore, propose a novel Diffusion Event Graph Model (DEGM) to address
Repo: None
Keep it Upright: Model Predictive Control for Nonprehensile Object Transportation with Obstacle Avoidance on a Mobile Manipulator
Authors: Adam Heins, Angela P. SchoelligArxiv: https://arxiv.org/abs/2305.17484
TLDR: We consider a nonprehensile manipulation task in which a mobile manipulator must balance objects on its end effector without grasping them -- known as the waiter's problem -- and move to a desired location while avoiding static and dynamic obstacles. In contrast to existing approaches, our focus is on fast online planning in response to new and changing environments. Our main contribution is a whole-body constrained model predictive controller (MPC) for a mobile manipulator that balances objects and avoids collisions.
Repo: None
Online Nonstochastic Model-Free Reinforcement Learning
Authors: Udaya Ghai, Arushi Gupta, Wenhan Xia, Karan Singh, Elad HazanArxiv: https://arxiv.org/abs/2305.17552
TLDR: In this work, we explore robust model-free reinforcement learning algorithms for environments that may be dynamic or even adversarial. Conventional state-based policies fail to accommodate the challenge imposed by the presence of unmodeled disturbances in such settings. Additionally, optimizing linear state-based policies poses an obstacle to efficient optimization, leading to nonconvex objectives even in benign environments like linear dynamical systems. Drawing inspiration from recent advancements in model-based control, we introduce a novel class of policies
Repo: None
Online Causation Monitoring of Signal Temporal Logic
Authors: Zhenya Zhang, Jie An, Paolo Arcaini, Ichiro HasuoArxiv: https://arxiv.org/abs/2305.17754
TLDR: Online monitoring is an effective validation approach for hybrid systems that, at runtime, checks whether the (partial) signals of a system satisfy a specification in, e.g., Signal Temporal Logic (STL). The classic STL monitoring is performed by computing a robustness interval that specifies, at each instant, how far the monitored signals are from violating and satisfying the specification. However, since a robustness interval monotonically shrinks during monitoring, classic online monitors may fail in reporting
Repo: None
NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
Authors: Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing MaArxiv: https://arxiv.org/abs/2305.17826
TLDR: Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability and effectiveness of backdoor attacks. In this work, we propose transferable backdoor attacks for prompt-based models, called NOTABLE, which is independent of
Repo: None
Speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Authors: Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang ZhangArxiv: https://arxiv.org/abs/2305.17860
TLDR: In recent years, the joint training of speech enhancement front-end and automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods only use enhanced speech as input for the backend. However, it is difficult for speech enhancement systems to directly separate speech from input due to the diverse types of noise with different intensities. Furthermore, speech distortion and residual noise are often observed in enhanced speech, and the distortion of
Repo: None
Universal Mechanical Polycomputation in Granular Matter
Authors: Atoosa Parsa, Sven Witthaus, Nidhi Pashine, Corey S. O'Hern, Rebecca Kramer-Bottiglio, Josh BongardArxiv: https://arxiv.org/abs/2305.17872
TLDR: Unconventional computing devices are increasingly of interest as they can operate in environments hostile to silicon-based electronics, or compute in ways that traditional electronics cannot. Mechanical computers, wherein information processing is a material property emerging from the interaction of components with the environment, are one such class of devices. This information processing can be manifested in various physical substrates, one of which is granular matter. In a granular assembly, vibration can be treated as the information-bearing mode. This can
Repo: None
Maximizing Safety and Efficiency for Cooperative Lane-Changing: A Minimally Disruptive Approach
Authors: Andres S. Chavez Armijos, Anni Li, Christos G. CassandrasArxiv: https://arxiv.org/abs/2305.17883
TLDR: This paper addresses cooperative lane-changing maneuvers in mixed traffic, aiming to minimize traffic flow disruptions while accounting for uncooperative vehicles. The proposed approach adopts controllers combining Optimal control with Control Barrier Functions (OCBF controllers) which guarantee spatio-temporal constraints through the use of fixed-time convergence. Additionally, we introduce robustness to disturbances by deriving a method for handling worst-case disturbances using the dual of a linear programming problem. We present a near-optimal
Repo: None
Deeply Coupled Cross-Modal Prompt Learning
Authors: Xuejing Liu, Wei Tang, Jinghui Lu, Rui Zhao, Zhaojun Guo, Fei TanArxiv: https://arxiv.org/abs/2305.17903
TLDR: Recent advancements in multimodal foundation models (e.g., CLIP) have excelled in zero-shot generalization. Prompt tuning involved in the knowledge transfer from foundation models to downstream tasks has gained significant attention recently. Existing prompt-tuning methods in cross-modal learning, however, either solely focus on the language branch, or learn vision-language interaction in a shallow mechanism. In this context, we propose a Deeply coupled Cross-Modal Prompt learning (D
Repo: None
Fourier Analysis on Robustness of Graph Convolutional Neural Networks for Skeleton-based Action Recognition
Authors: Nariki Tanaka, Hiroshi Kera, Kazuhiko KawamotoArxiv: https://arxiv.org/abs/2305.17939
TLDR: Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental results with the NTU RGB+D dataset reveal that
Repo: None
Improving the Generalizability of Trajectory Prediction Models with Frenet-Based Domain Normalization
Authors: Luyao Ye, Zikang Zhou, Jianping WangArxiv: https://arxiv.org/abs/2305.17965
TLDR: Predicting the future trajectories of nearby objects plays a pivotal role in robotics and automation applications such as autonomous driving. While learning-based trajectory prediction methods have achieved remarkable performance on public benchmarks, the generalization ability of these approaches remains questionable. The poor generalizability on unseen domains, a well-recognized defect of data-driven approaches, can potentially harm the real-world performance of trajectory prediction models. We are thus motivated to improve the generalization ability of models instead of merely
Repo: None
TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition
Authors: Tiago Barros, Luís Garrote, Martin Aleksandrov, Cristiano Premebida, Urbano J. NunesArxiv: https://arxiv.org/abs/2305.18013
TLDR: Autonomous driving systems often require reliable loop closure detection to guarantee reduced localization drift. Recently, 3D LiDAR-based localization methods have used retrieval-based place recognition to find revisited places efficiently. However, when deployed in challenging real-world scenarios, the place recognition models become more complex, which comes at the cost of high computational demand. This work tackles this problem from an information-retrieval perspective, adopting a first-retrieve-then-re-ranking paradigm
Repo: None
Game of Tones: Faculty detection of GPT-4 generated content in university assessments
Authors: Mike Perkins (1), Jasper Roe (2), Darius Postma (1), James McGaughran (1), Don Hickerson (1) ((1) British University Vietnam, Vietnam, (2) James Cook University Singapore, Singapore)Arxiv: https://arxiv.org/abs/2305.18081
TLDR: This study explores the robustness of university assessments against the use of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool. The research involved twenty-two GPT-4 generated submissions being created and included in the assessment process to be marked by fifteen different faculty members. The study reveals that although the detection tool identified
Repo: None
Improved Probabilistic Image-Text Representations
Authors: Sanghyuk ChunArxiv: https://arxiv.org/abs/2305.18171
TLDR: The Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic functions are not sufficiently powerful to capture ambiguity, prompting the exploration of probabilistic embeddings to tackle the challenge. However, the existing probabilistic ITM approach encounters two key shortcomings: the burden of heavy computations due to the Monte Carlo approximation, and the loss saturation issue in the face of
Repo: None
Contextual Knowledge Learning For Dialogue Generation
Authors: Wen Zheng, Natasa Milic-Frayling, Ke ZhouArxiv: https://arxiv.org/abs/2305.18200
TLDR: Incorporating conversational context and knowledge into dialogue generation models has been essential for improving the quality of the generated responses. The context, comprising utterances from previous dialogue exchanges, is used as a source of content for response generation and as a means of selecting external knowledge. However, to avoid introducing irrelevant content, it is key to enable fine-grained scoring of context and knowledge. In this paper, we present a novel approach to context-knowledge weighting as an integral part
Repo: None
Online Dynamic Acknowledgement with Learned Predictions
Authors: Sungjin Im, Benjamin Moseley, Chenyang Xu, Ruilong ZhangArxiv: https://arxiv.org/abs/2305.18227
TLDR: We revisit the online dynamic acknowledgement problem. In the problem, a sequence of requests arrives over time to be acknowledged, and all outstanding requests can be satisfied simultaneously by one acknowledgement. The goal of the problem is to minimize the total request delay plus acknowledgement cost. This elegant model studies the trade-off between acknowledgement cost and waiting experienced by requests. The problem has been well studied and the tight competitive ratios have been determined. For this well-studied problem, we focus on how to effectively
Repo: None
Multi-behavior Self-supervised Learning for Recommendation
Authors: Jingcao Xu, Chaokun Wang, Cheng Wu, Yang Song, Kai Zheng, Xiaowei Wang, Changping Wang, Guorui Zhou, Kun GaiArxiv: https://arxiv.org/abs/2305.18238
TLDR: Modern recommender systems often deal with a variety of user interactions, e.g., click, forward, purchase, etc., which requires the underlying recommender engines to fully understand and leverage multi-behavior data from users. Despite recent efforts towards making use of heterogeneous data, multi-behavior recommendation still faces great challenges. Firstly, sparse target signals and noisy auxiliary interactions remain an issue. Secondly, existing methods utilizing self-supervised learning (SSL) to tackle the data sparsity
Repo: None
Keyword: scholarly
Multiscale Positive-Unlabeled Detection of AI-Generated Texts
Authors: Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe WangArxiv: https://arxiv.org/abs/2305.18149
TLDR: Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are astonishing at generating human-like texts, but they may get misused for fake scholarly texts, fake news, fake tweets, et cetera. Previous works have proposed methods to detect these multiscale AI-generated texts, including simple ML classifiers, pretrained-model-based training-agnostic methods, and finetuned language classification models. However, mainstream detectors are formulated
Repo: None
Keyword: semantic similarity
Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making
Authors: Xuanjie Fang, Sijie Cheng, Yang Liu, Wei WangArxiv: https://arxiv.org/abs/2305.17440
TLDR: Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, the adversarial attack task has found that PLMs are vulnerable to small perturbations. Mainstream methods adopt a detached two-stage framework to attack without considering the subsequent influence of substitution at each step. In this paper, we formally model the multifaceted attack task on PLMs as a sequential decision-making problem, where the whole attack process is sequential with two decision-
Repo: None
Keyword: summarization
An Investigation of Evaluation Metrics for Automated Medical Note Generation
Authors: Asma Ben Abacha, Wen-wai Yim, George Michalopoulos, Thomas LinArxiv: https://arxiv.org/abs/2305.17364
TLDR: Recent studies on automatic note generation have shown that doctors can save significant amounts of time when using automatic clinical note generation (Knoll et al., 2022). Summarization models have been used for this task to generate clinical notes as summaries of doctor-patient conversations (Krishna et al., 2021; Cai et al., 2022). However, assessing which model would best serve clinicians in their daily practice is still a challenging task due to the large set of possible correct summaries,
Repo: None
MeetingBank: A Benchmark Dataset for Meeting Summarization
Authors: Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei LiuArxiv: https://arxiv.org/abs/2305.17529
TLDR: As the number of recorded meetings increases, it becomes increasingly important to utilize summarization technology to create useful summaries of these recordings. However, there is a crucial lack of annotated meeting corpora for developing this technology, as it can be hard to collect meetings, especially when the topics discussed are confidential. Furthermore, meeting summaries written by experienced writers are scarce, making it hard for abstractive summarizers to produce sensible output without a reliable reference. This lack of annotated corpora
Repo: None
Abstractive Summarization as Augmentation for Document-Level Event Detection
Authors: Janko Vidaković, Filip Karlo Došilović, Domagoj PluščecArxiv: https://arxiv.org/abs/2305.18023
TLDR: Transformer-based models have consistently produced substantial performance gains across a variety of NLP tasks, compared to shallow models. However, deep models are orders of magnitude more computationally expensive than shallow models, especially on tasks with large sequence lengths, such as document-level event detection. In this work, we attempt to bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method. We augment the DocEE dataset
Repo: None
Assess and Summarize: Improve Outage Understanding with Large Language Models
Authors: Pengxiang Jin, Shenglin Zhang, Minghua Ma, Haozhe Li, Yu Kang, Liqun Li, Yudong Liu, Bo Qiao, Chaoyun Zhang, Pu Zhao, Shilin He, Federica Sarro, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei ZhangArxiv: https://arxiv.org/abs/2305.18084
TLDR: Cloud systems have become increasingly popular in recent years due to their flexibility and scalability. Each time cloud computing applications and services hosted on the cloud are affected by a cloud outage, users can experience slow response times, connection issues or total service disruption, resulting in a significant negative business impact. Outages are usually comprised of several concurring events/source causes, and therefore understanding the context of outages is a very challenging yet crucial first step toward mitigating and resolving outages. In current practice
Repo: None
The Utility of Large Language Models and Generative AI for Education Research
Authors: Andrew Katz, Umair Shakir, Ben ChambersArxiv: https://arxiv.org/abs/2305.18125
TLDR: The use of natural language processing (NLP) techniques in engineering education can provide valuable insights into the underlying processes involved in generating text. While accessing these insights can be labor-intensive if done manually, recent advances in NLP and large language models have made it a realistic option for individuals. This study explores and evaluates a combination of clustering, summarization, and prompting techniques to analyze over 1,000 student essays in which students discussed their career interests. The specific assignment prompted students to
Repo: None
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea FinnArxiv: https://arxiv.org/abs/2305.18290
TLDR: While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often
Repo: None
Keyword: text generation
KoSBI: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim, Jung-Woo HaArxiv: https://arxiv.org/abs/2305.17701
TLDR: Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we
Repo: None
RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring
Authors: Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong JiArxiv: https://arxiv.org/abs/2305.17708
TLDR: Refactoring is an indispensable practice of improving the quality and maintainability of source code in software evolution. Rename refactoring, or renaming, is the most frequently performed refactoring that suggests a new name for an identifier to enhance readability when the identifier is poorly named. However, most existing works only identify renaming activities between two versions of source code, while few works express concern about how to suggest a new name. In this paper, we study automatic rename ref
Repo: None
Abstractive Summarization as Augmentation for Document-Level Event Detection
Authors: Janko Vidaković, Filip Karlo Došilović, Domagoj PluščecArxiv: https://arxiv.org/abs/2305.18023
TLDR: Transformer-based models have consistently produced substantial performance gains across a variety of NLP tasks, compared to shallow models. However, deep models are orders of magnitude more computationally expensive than shallow models, especially on tasks with large sequence lengths, such as document-level event detection. In this work, we attempt to bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method. We augment the DocEE dataset
Repo: None
GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking
Authors: Jiaqi Bai, Hongcheng Guo, Jiaheng Liu, Jian Yang, Xinnian Liang, Zhao Yan, Zhoujun Li
Arxiv: https://arxiv.org/abs/2305.18144
TLDR: Retrieval-enhanced text generation, which aims to leverage passages retrieved from a large passage corpus for delivering a proper answer given the input query, has shown remarkable progress on knowledge-intensive language tasks such as open-domain question answering and knowledge-enhancing dialogue generation. However, the retrieved passages are not ideal for guiding answer generation because of the discrepancy between retrieval and generation, i.e., the candidate passages are all treated equally during the retrieval procedure without considering their potential to generate
Repo: None
A Critical Evaluation of Evaluations for Long-form Question Answering
Authors: Fangyuan Xu, Yixiao Song, Mohit Iyyer, Eunsol Choi
Arxiv: https://arxiv.org/abs/2305.18201
TLDR: Long-form question answering (LFQA) enables answering a wide range of questions, but its flexibility poses enormous challenges for evaluation. We perform the first targeted study of the evaluation of long-form answers, covering both human and automatic evaluation practices. We hire domain experts in seven areas to provide preference judgments over pairs of answers, along with free-form justifications for their choices. We present a careful analysis of experts' evaluation, which focuses on new aspects such as the comprehens
Repo: None
HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis
Authors: Christoforos Vasilatos, Manaar Alam, Talal Rahwan, Yasir Zaki, Michail Maniatakos
Arxiv: https://arxiv.org/abs/2305.18226
TLDR: As the use of Large Language Models (LLMs) in text generation tasks proliferates, concerns arise over their potential to compromise academic integrity. The education sector currently tussles with distinguishing student-authored homework assignments from AI-generated ones. This paper addresses the challenge by introducing HowkGPT, designed to identify homework assignments generated by AI. HowkGPT is built upon a dataset of academic assignments and accompanying metadata [17] and employs a pretrained LLM to compute perplex
Repo: None
GlyphControl: Glyph Conditional Control for Visual Text Generation
Authors: Yukang Yang, Dongnan Gui, Yuhui Yuan, Haisong Ding, Han Hu, Kai Chen
Arxiv: https://arxiv.org/abs/2305.18259
TLDR: Recently, there has been a growing interest in developing diffusion-based text-to-image generative models capable of generating coherent and well-formed visual text. In this paper, we propose a novel and efficient approach called GlyphControl to address this task. Unlike existing methods that rely on character-aware text encoders like ByT5 and require retraining of text-to-image models, our approach leverages additional glyph conditional information to enhance the performance of the off-
Repo: None
Transformer Language Models Handle Word Frequency in Prediction Head
Authors: Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
Arxiv: https://arxiv.org/abs/2305.18294
TLDR: Prediction head is a crucial component of Transformer language models. Despite its direct impact on prediction, this component has often been overlooked in analyzing Transformers. In this study, we investigate the inner workings of the prediction head, specifically focusing on bias parameters. Our experiments with BERT and GPT-2 models reveal that the biases in their word prediction heads play a significant role in the models' ability to reflect word frequency in a corpus, aligning with the logit adjustment method commonly used
Repo: None