New submissions for Tue, 30 May 23 #364

Open
e-tornike opened this issue May 30, 2023 · 0 comments

Keyword: abstract meaning representation

Slide, Constrain, Parse, Repeat: Synchronous Sliding Windows for Document AMR Parsing

Authors: Sadhana Kumaravel, Tahira Naseem, Ramon Fernandez Astudillo, Radu Florian, Salim Roukos
Arxiv: https://arxiv.org/abs/2305.17273
TLDR: The sliding window approach provides an elegant way to handle contexts of sizes larger than the Transformer's input window, for tasks like language modeling. Here we extend this approach to the sequence-to-sequence task of document parsing. For this, we exploit recent progress in transition-based parsing to implement a parser with synchronous sliding windows over source and target. We develop an oracle and a parser for document-level AMR by expanding on Structured-BART such that it lever
Repo: None

Keyword: computational social science

From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models

Authors: Julia Mendelsohn, Ronan Le Bras, Yejin Choi, Maarten Sap
Arxiv: https://arxiv.org/abs/2305.17174
TLDR: Dogwhistles are coded expressions that simultaneously convey one meaning to a broad audience and a second one, often hateful or provocative, to a narrow in-group; they are deployed to evade both political repercussions and algorithmic content moderation. For example, in the sentence 'we need to end the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to many, but secretly means 'Jewish' to a select few. We present the first large-scale computational investigation
Repo: None

Keyword: contrastive

MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations

Authors: Calum Heggan, Tim Hospedales, Sam Budgett, Mehrdad Yaghoobi
Arxiv: https://arxiv.org/abs/2305.17191
TLDR: Contrastive self-supervised learning has gained attention for its ability to create high-quality representations from large unlabelled data sets. A key reason that these powerful features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariances preferred is not known a priori, and varies across different downstream tasks. We therefore propose a multi-task self-supervised
Repo: None

Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Authors: Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens
Arxiv: https://arxiv.org/abs/2305.17214
TLDR: Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an f
Repo: None

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

Authors: Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
Arxiv: https://arxiv.org/abs/2305.17267
TLDR: Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive
Repo: None

Kernel-SSL: Kernel KL Divergence for Self-supervised Learning

Authors: Yifan Zhang, Zhiquan Tan, Jingqin Yang, Yang Yuan
Arxiv: https://arxiv.org/abs/2305.17326
TLDR: Contrastive learning usually compares one positive anchor sample with lots of negative samples to perform Self-Supervised Learning (SSL). Alternatively, non-contrastive learning, as exemplified by methods like BYOL, SimSiam, and Barlow Twins, accomplishes SSL without the explicit use of negative samples. Inspired by the existing analysis for contrastive learning, we provide a reproducing kernel Hilbert space (RKHS) understanding of many existing non-contrastive learning methods
Repo: None

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

Authors: Yung-Hsuan Lai, Yen-Chun Chen, Yu-Chiang Frank Wang
Arxiv: https://arxiv.org/abs/2305.17343
TLDR: Audio-visual learning has been a major pillar of multi-modal machine learning, where the community mostly focused on its modality-aligned setting, i.e., the audio and visual modality are both assumed to signal the prediction target. With the Look, Listen, and Parse dataset (LLP), we investigate the under-explored unaligned setting. This is the case where the goal is to recognize audio and visual events in a video with only weak labels observed.
Repo: None

Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning

Authors: Zhenrui Yue, Huimin Zeng, Mengfei Lan, Heng Ji, Dong Wang
Arxiv: https://arxiv.org/abs/2305.17373
TLDR: With emerging online topics as a source for numerous new events, detecting unseen / rare event types presents an elusive challenge for existing event detection methods, where only limited data access is provided for training. To address the data scarcity problem in event detection, we propose MetaEvent, a meta learning-based framework for zero- and few-shot event detection. Specifically, we sample training tasks from existing event types and perform meta training to search for optimal parameters that quickly adapt to unseen tasks. In our
Repo: None

GIMM: InfoMin-Max for Automated Graph Contrastive Learning

Authors: Xin Xiong (1), Furao Shen (1), Xiangyu Wang (1), Jian Zhao (2) ((1) School of Artificial Intelligence, Nanjing University, (2) School of Electronic Science and Engineering, Nanjing University)
Arxiv: https://arxiv.org/abs/2305.17437
TLDR: Graph contrastive learning (GCL) shows great potential in unsupervised graph representation learning. Data augmentation plays a vital role in GCL, and its optimal choice heavily depends on the downstream task. Many GCL methods with automated data augmentation face the risk of insufficient information as they fail to preserve the essential information necessary for the downstream task. To solve this problem, we propose InfoMin-Max for Automated Graph Contrastive Learning (GIMM), which prevents GCL from
Repo: None

Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery

Authors: Yutao Mou, Xiaoshuai Song, Keqing He, Chen Zeng, Pei Wang, Jingang Wang, Yunsen Xian, Weiran Xu
Arxiv: https://arxiv.org/abs/2305.17699
TLDR: Generalized intent discovery aims to extend a closed-set in-domain intent classifier to an open-world intent set including in-domain and out-of-domain intents. The key challenges lie in pseudo label disambiguation and representation learning. Previous methods suffer from a coupling of pseudo label disambiguation and representation learning, that is, the reliability of pseudo labels relies on representation learning, and representation learning is restricted by pseudo labels in turn. In this paper,
Repo: None

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring

Authors: Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji
Arxiv: https://arxiv.org/abs/2305.17708
TLDR: Refactoring is an indispensable practice of improving the quality and maintainability of source code in software evolution. Rename refactoring, or renaming, is the most frequently performed refactoring that suggests a new name for an identifier to enhance readability when the identifier is poorly named. However, most existing works only identify renaming activities between two versions of source software, while few works express concern about how to suggest a new name. In this paper, we study automatic rename ref
Repo: None

Whitening-based Contrastive Learning of Sentence Embeddings

Authors: Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang
Arxiv: https://arxiv.org/abs/2305.17746
TLDR: This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE), which combines contrastive learning with a novel shuffled group whitening. Generally, contrastive training pulls distortions of a single sample (i.e., positive samples) close and pushes negative samples far away, correspondingly facilitating the alignment and uniformity in the feature space. A popular alternative to the "pushing" operation is whitening the feature space, which scatters
Repo: None
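
Note: the pull/push behaviour described in the TLDR above can be illustrated with a generic InfoNCE-style loss. The following is a minimal sketch, not the WhitenedCSE method (it contains no whitening step); the function names, the toy 2-D vectors, and the temperature value of 0.1 are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax score of the positive pair.

    The temperature default is an illustrative assumption, not a value
    taken from any paper in this digest.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy data (hypothetical): a positive near the anchor versus one far from it.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_close = info_nce(anchor, [0.9, 0.1], negatives)
loss_far = info_nce(anchor, [0.1, 0.9], negatives)
```

In this toy setup the loss is lower when the positive lies close to the anchor, which is the "pull" effect; each negative contributes to the denominator, which is the "push" effect.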

Point-PC: Point Cloud Completion Guided by Prior Knowledge via Causal Inference

Authors: Weizhi Nie, Chuanqi Jiao, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe, Anan Liu
Arxiv: https://arxiv.org/abs/2305.17770
TLDR: Point cloud completion aims to recover raw point clouds captured by scanners from partial observations caused by occlusion and limited view angles. Many approaches utilize a partial-complete paradigm in which missing parts are directly predicted by a global feature learned from partial inputs. This makes it hard to recover details because the global feature is unlikely to capture the full details of all missing parts. In this paper, we propose a novel approach to point cloud completion called Point-PC, which uses a memory network to retrieve
Repo: None

Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization

Authors: Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Arxiv: https://arxiv.org/abs/2305.17861
TLDR: Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training. Without instance-level annotations, most existing methods follow the Segment-based Multiple Instance Learning (S-MIL) framework, where the predictions of segments are supervised by the labels of videos. However, the objective for acquiring segment-level scores during training is not consistent with the target for acquiring proposal-level scores during testing,
Repo: https://github.com/RenHuan1999/CVPR2023_P-MIL

ContrastNER: Contrastive-based Prompt Tuning for Few-shot NER

Authors: Amirhossein Layegh, Amir H. Payberah, Ahmet Soylu, Dumitru Roman, Mihhail Matskin
Arxiv: https://arxiv.org/abs/2305.17951
TLDR: Prompt-based language models have produced encouraging results in numerous applications, including Named Entity Recognition (NER) tasks. NER aims to identify entities in a sentence and provide their types. However, the strong performance of most available NER approaches is heavily dependent on the design of discrete prompts and a verbalizer to map the model-predicted outputs to entity categories, which are complicated undertakings. To address these challenges, we present ContrastNER, a prompt-based NER framework
Repo: None

Multi-Modal Face Stylization with a Generative Prior

Authors: Mengtian Li, Yi Dong, Minxuan Lin, Haibin Huang, Pengfei Wan, Chongyang Ma
Arxiv: https://arxiv.org/abs/2305.18009
TLDR: In this work, we introduce a new approach for artistic face stylization. Despite existing methods achieving impressive results in this task, there is still room for improvement in generating high-quality stylized faces with diverse styles and accurate facial reconstruction. Our proposed framework, MMFS, supports multi-modal face stylization by leveraging the strengths of StyleGAN and integrating it into an encoder-decoder architecture. Specifically, we use the mid-resolution and high-resolution layers of StyleG
Repo: None

Abstractive Summarization as Augmentation for Document-Level Event Detection

Authors: Janko Vidaković, Filip Karlo Došilović, Domagoj Pluščec
Arxiv: https://arxiv.org/abs/2305.18023
TLDR: Transformer-based models have consistently produced substantial performance gains across a variety of NLP tasks, compared to shallow models. However, deep models are orders of magnitude more computationally expensive than shallow models, especially on tasks with large sequence lengths, such as document-level event detection. In this work, we attempt to bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method. We augment the DocEE dataset
Repo: None

Semantic Role Labeling Guided Out-of-distribution Detection

Authors: Jinan Zou, Maihao Guo, Yu Tian, Yuhao Lin, Haiyao Cao, Lingqiao Liu, Ehsan Abbasnejad, Javen Qinfeng Shi
Arxiv: https://arxiv.org/abs/2305.18026
TLDR: Identifying unexpected domain-shifted instances in natural language processing is crucial in real-world applications. Previous works identify the OOD instance by leveraging a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge current OOD methods face is learning effective low-dimensional sentence representations to identify the hard OOD instances that are semantically similar to the ID data. In this paper, we propose a new unsupervised OOD detection method
Repo: None

Contrastive Learning Based Recursive Dynamic Multi-Scale Network for Image Deraining

Authors: Zhiying Jiang, Risheng Liu, Shuzhou Yang, Zengxi Zhang, Xin Fan
Arxiv: https://arxiv.org/abs/2305.18092
TLDR: Rain streaks significantly decrease the visibility of captured images and are also a stumbling block that restricts the performance of subsequent computer vision applications. The existing deep learning-based image deraining methods employ manually crafted networks and learn a straightforward projection from rainy images to clear images. In pursuit of better deraining performance, they focus on elaborating a more complicated architecture rather than exploiting the intrinsic properties of the positive and negative information. In this paper, we propose a contrastive learning-based image deraining method
Repo: None

Reason to explain: Interactive contrastive explanations (REASONX)

Authors: Laura State, Salvatore Ruggieri, Franco Turini
Arxiv: https://arxiv.org/abs/2305.18143
TLDR: Many high-performing machine learning models are not interpretable. As they are increasingly used in decision scenarios that can critically affect individuals, it is necessary to develop tools to better understand their outputs. Popular explanation methods include contrastive explanations. However, they suffer several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of interactivity. While (dialogue-like) interactivity is important to better communicate an explanation, background knowledge has the potential to significantly improve their quality,
Repo: None

LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

Authors: Amirhossein Abaskohi, Sascha Rothe, Yadollah Yaghoobzadeh
Arxiv: https://arxiv.org/abs/2305.18169
TLDR: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common way, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective as it helps the model generate embeddings that are more distinguishable between classes
Repo: None

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Authors: Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, Tanishq Mathew Abraham
Arxiv: https://arxiv.org/abs/2305.18274
TLDR: We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach
Repo: None

Keyword: data augmentation

Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning

Authors: Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia
Arxiv: https://arxiv.org/abs/2305.17297
TLDR: Studying the generalization abilities of linear models with real data is a central question in statistical learning. While there exist a limited number of prior important works (Loureiro et al. (2021A, 2021B), Wei et al., and Xu 2020) that do validate theoretical work with real data, these works have limitations due to technical assumptions. These assumptions include having a well-conditioned covariance matrix and having independent and identically distributed data. These assumptions are not necessarily
Repo: None

Disambiguated Lexically Constrained Neural Machine Translation

Authors: Jinpeng Zhang, Nini Xiao, Ke Wang, Chuanqi Dong, Xiangyu Duan, Yuqi Zhang, Min Zhang
Arxiv: https://arxiv.org/abs/2305.17351
TLDR: Lexically constrained neural machine translation (LCNMT), which controls the translation generation with pre-specified constraints, is important in many practical applications. Current approaches to LCNMT typically assume that the pre-defined lexical constraints are contextually appropriate. This assumption limits their application to real-world scenarios where a source lexicon may have multiple target constraints, and disambiguation is needed to select the most suitable one. In this paper, we propose disambiguated LC
Repo: None

GIMM: InfoMin-Max for Automated Graph Contrastive Learning

Authors: Xin Xiong (1), Furao Shen (1), Xiangyu Wang (1), Jian Zhao (2) ((1) School of Artificial Intelligence, Nanjing University, (2) School of Electronic Science and Engineering, Nanjing University)
Arxiv: https://arxiv.org/abs/2305.17437
TLDR: Graph contrastive learning (GCL) shows great potential in unsupervised graph representation learning. Data augmentation plays a vital role in GCL, and its optimal choice heavily depends on the downstream task. Many GCL methods with automated data augmentation face the risk of insufficient information as they fail to preserve the essential information necessary for the downstream task. To solve this problem, we propose InfoMin-Max for Automated Graph Contrastive Learning (GIMM), which prevents GCL from
Repo: None

Toward Understanding Generative Data Augmentation

Authors: Chenyu Zheng, Guoqiang Wu, Chongxuan Li
Arxiv: https://arxiv.org/abs/2305.17476
TLDR: Generative data augmentation, which scales datasets by obtaining fake labeled examples from a trained conditional generative model, boosts classification performance in various learning tasks including (semi-)supervised learning, few-shot learning, and adversarially robust learning. However, little work has theoretically investigated the effect of generative data augmentation. To fill this gap, we establish a general stability bound in this not independently and identically distributed (non-i.i.d.) setting, where
Repo: None

Spot keywords from very noisy and mixed speech

Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
Arxiv: https://arxiv.org/abs/2305.17706
TLDR: Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude), and even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B
Repo: None

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Authors: Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma
Arxiv: https://arxiv.org/abs/2305.17732
TLDR: Direct speech-to-speech translation (S2ST) has gradually become popular as it has many advantages compared with cascade S2ST. However, current research mainly focuses on the accuracy of semantic translation and ignores the speech style transfer from a source language to a target language. The lack of high-fidelity expressive parallel data makes such style transfer challenging, especially in more practical zero-shot scenarios. To solve this problem, we first build a parallel corpus using a multi-lingual
Repo: None

Targeted Data Generation: Finding and Fixing Model Weaknesses

Authors: Zexue He, Marco Tulio Ribeiro, Fereshte Khani
Arxiv: https://arxiv.org/abs/2305.17804
TLDR: Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users, and underrepresented in the existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups, and generates new data for those subgroups using
Repo: None

Data Augmentation for Low-Resource Keyphrase Generation

Authors: Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea
Arxiv: https://arxiv.org/abs/2305.17968
TLDR: Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases). Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire. Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations. In this paper, we
Repo: None

Extrinsic Factors Affecting the Accuracy of Biomedical NER

Authors: Zhiyi Li, Shengjie Zhang, Yujie Song, Jungyeul Park
Arxiv: https://arxiv.org/abs/2305.18152
TLDR: Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex, technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications including the healthcare system. However, NER in the biomedical domain is challenging due to limited data availability, as the high expertise, time, and expenses are required
Repo: None

LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

Authors: Amirhossein Abaskohi, Sascha Rothe, Yadollah Yaghoobzadeh
Arxiv: https://arxiv.org/abs/2305.18169
TLDR: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common way, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective as it helps the model generate embeddings that are more distinguishable between classes
Repo: None

Improved Probabilistic Image-Text Representations

Authors: Sanghyuk Chun
Arxiv: https://arxiv.org/abs/2305.18171
TLDR: The Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic functions are not sufficiently powerful to capture ambiguity, prompting the exploration of probabilistic embeddings to tackle the challenge. However, the existing probabilistic ITM approach encounters two key shortcomings: the burden of heavy computations due to the Monte Carlo approximation, and the loss saturation issue in the face of
Repo: None

Rethinking Counterfactual Data Augmentation Under Confounding

Authors: Abbavaram Gowtham Reddy, Saketh Bachu, Saloni Dash, Charchit Sharma, Amit Sharma, Vineeth N Balasubramanian
Arxiv: https://arxiv.org/abs/2305.18183
TLDR: Counterfactual data augmentation has recently emerged as a method to mitigate confounding biases in the training data for a machine learning model. These biases, such as spurious correlations, arise due to various observed and unobserved confounding variables in the data generation process. In this paper, we formally analyze how confounding biases impact downstream classifiers and present a causal viewpoint to the solutions based on counterfactual data augmentation. We explore how removing confounding biases serves as a means to learn invariant features,
Repo: None

Keyword: knowledge graph

A Categorical Representation Language and Computational System for Knowledge-Based Planning

Authors: Angeline Aguinaldo, Evan Patterson, James Fairbanks, Jaime Ruiz
Arxiv: https://arxiv.org/abs/2305.17208
TLDR: Classical planning representation languages based on first-order logic have been extensively used to model and solve planning problems, but they struggle to capture implicit preconditions and effects that arise in complex planning scenarios. To address this problem, we propose an alternative approach to representing and transforming world states during planning. Based on the category-theoretic concepts of $\mathsf{C}$-sets and double-pushout rewriting (DPO), our proposed representation can effectively handle structured knowledge about
Repo: None

Choose your Data Wisely: A Framework for Semantic Counterfactuals

Authors: Edmund Dervakos, Konstantinos Thomas, Giorgos Filandrianos, Giorgos Stamou
Arxiv: https://arxiv.org/abs/2305.17667
TLDR: Counterfactual explanations have been argued to be one of the most intuitive forms of explanation. They are typically defined as a minimal set of edits on a given data sample that, when applied, changes the output of a model on that sample. However, a minimal set of edits is not always clear and understandable to an end-user, as it could, for instance, constitute an adversarial example (which is indistinguishable from the original data sample to an end-user). Instead, there
Repo: None

Sequential Condition Evolved Interaction Knowledge Graph for Traditional Chinese Medicine Recommendation

Authors: Jingjin Liu, Hankz Hankui Zhuo, Kebing Jin, Jiamin Yuan, Zhimin Yang, Zhengan Yao
Arxiv: https://arxiv.org/abs/2305.17866
TLDR: Traditional Chinese Medicine (TCM) has a rich history of utilizing natural herbs to treat a diversity of illnesses. In practice, TCM diagnosis and treatment are highly personalized and organically holistic, requiring comprehensive consideration of the patient's state and symptoms over time. However, existing TCM recommendation approaches overlook the changes in patient status and only explore potential patterns between symptoms and prescriptions. In this paper, we propose a novel Sequential Condition Evolved Interaction Knowledge Graph (SCEIKG),
Repo: None

Representation Learning on Hyper-Relational and Numeric Knowledge Graphs with Transformers

Authors: Chanyoung Chung, Jaejun Lee, Joyce Jiyoung Whang
Arxiv: https://arxiv.org/abs/2305.18256
TLDR: A hyper-relational knowledge graph has been recently studied where a triplet is associated with a set of qualifiers; a qualifier is composed of a relation and an entity, providing auxiliary information for a triplet. While existing hyper-relational knowledge graph embedding methods assume that the entities are discrete objects, some information should be represented using numeric values, e.g., (J.R.R., was born in, 1892). Also, a triplet (J.-R
Repo: None
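
Note: the data structure described in the TLDR above (a triplet plus a set of qualifiers, where a tail may be numeric) can be pictured roughly as follows. This is only an illustrative sketch: the class and field names are hypothetical and not taken from the paper, and the qualifier attached at the end is a labelled placeholder rather than real data.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A value is either a discrete entity (str) or a numeric literal.
Value = Union[str, int, float]


@dataclass(frozen=True)
class Qualifier:
    """Auxiliary (relation, value) pair attached to a triplet."""
    relation: str
    value: Value


@dataclass
class HyperRelationalFact:
    """A primary triplet together with its set of qualifiers."""
    head: str
    relation: str
    tail: Value
    qualifiers: List[Qualifier] = field(default_factory=list)

    def has_numeric_tail(self) -> bool:
        # True when the tail is a number rather than a discrete entity.
        return isinstance(self.tail, (int, float))


# The numeric example from the TLDR: (J.R.R., was born in, 1892).
fact = HyperRelationalFact("J.R.R.", "was born in", 1892)

# Placeholder qualifier, purely hypothetical, to show the shape of the data.
fact.qualifiers.append(Qualifier("placeholder relation", "placeholder entity"))
```

Keeping the tail as a union type lets one container hold both entity-valued and number-valued facts, which is the distinction the TLDR draws.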

Keyword: legal

NaturalFinger: Generating Natural Fingerprint with Generative Adversarial Networks

Authors: Kang Yang, Kunhao Lai
Arxiv: https://arxiv.org/abs/2305.17868
TLDR: Deep neural network (DNN) models have become a critical asset of the model owner as training them requires a large amount of resources (i.e. labeled data). Therefore, many fingerprinting schemes have been proposed to safeguard the intellectual property (IP) of the model owner against model extraction and illegal redistribution. However, previous schemes adopt unnatural images as the fingerprint, such as adversarial examples and noisy images, which can be easily perceived and rejected by the adversary. In this paper,
Repo: None

Keyword: mixup

Spot keywords from very noisy and mixed speech

Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
Arxiv: https://arxiv.org/abs/2305.17706
TLDR: Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude), and even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B
Repo: None

Conditional Score Guidance for Text-Driven Image-to-Image Translation

Authors: Hyunsoo Lee, Minsoo Kang, Bohyung Han
Arxiv: https://arxiv.org/abs/2305.18007
TLDR: We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, defined by a modifying text, while preserving the remaining parts. In contrast to existing techniques that solely rely on a target prompt, we introduce a new score function, which considers both a source prompt and a source image, tailored to address specific translation tasks
Repo: None

Keyword: multi-task

MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations

Authors: Calum Heggan, Tim Hospedales, Sam Budgett, Mehrdad Yaghoobi
Arxiv: https://arxiv.org/abs/2305.17191
TLDR: Contrastive self-supervised learning has gained attention for its ability to create high-quality representations from large unlabelled data sets. A key reason that these powerful features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariances preferred is not known a priori, and varies across different downstream tasks. We therefore propose a multi-task self-supervised
Repo: None

Coping with low data availability for social media crisis message categorisation

Authors: Congcong Wang
Arxiv: https://arxiv.org/abs/2305.17211
TLDR: During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an
Repo: None

DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning

Authors: Elahe Rahimian, Golara Javadi, Frederick Tung, Gabriel Oliveira
Arxiv: https://arxiv.org/abs/2305.17305
TLDR: Multi-task networks rely on effective parameter sharing to achieve robust generalization across tasks. In this paper, we present a novel parameter sharing method for multi-task learning that conditions parameter sharing on both the task and the intermediate feature representations at inference time. In contrast to traditional parameter sharing approaches, which fix or learn a deterministic sharing pattern during training and apply the same pattern to all examples during inference, we propose to dynamically decide which parts of the network to activate
Repo: None

Understanding Emotion Valence is a Joint Deep Learning Task

Authors: Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi
Arxiv: https://arxiv.org/abs/2305.17422
TLDR: The valence analysis of speakers' utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) for single-task
Repo: None

A Match Made in Heaven: A Multi-task Framework for Hyperbole and Metaphor Detection

Authors: Naveen Badathala (1), Abisek Rajakumar Kalarani (1), Tejpalsingh Siledar (1), Pushpak Bhattacharyya (1), ((1) Indian Institute of Technology Bombay)
Arxiv: https://arxiv.org/abs/2305.17480
TLDR: Hyperbole and metaphor are common in day-to-day communication (e.g., "I am in deep trouble": how does trouble have depth?), which makes their detection important, especially in a conversational AI setting. Existing approaches to automatically detect metaphor and hyperbole have studied these language phenomena independently, but their relationship has hardly, if ever, been explored computationally. In this paper, we propose a multi-task deep learning framework to detect hyperbole and metaphor simultaneously.
Repo: None

Towards computing low-makespan solutions for multi-arm multi-task planning problems

Authors: Valentin N. Hartmann, Marc Toussaint
Arxiv: https://arxiv.org/abs/2305.17527
TLDR: We propose an approach to find low-makespan solutions to multi-robot multi-task planning problems in environments where robots block each other from completing tasks simultaneously. We introduce a formulation of the problem that allows for an approach based on greedy descent with random restarts for generation of the task assignment and task sequence. We then use a multi-agent path planner to evaluate the makespan of a given assignment and sequence. The planner decomposes the problem into multiple simple subproblems that
Repo: None

AIMS: All-Inclusive Multi-Level Segmentation

Authors: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang
Arxiv: https://arxiv.org/abs/2305.17768
TLDR: Despite the progress of image segmentation for accurate visual entity segmentation, completing the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved. In this paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation (two entities with some semantic relationships). We also build a unified AIMS model through multi-dataset
Repo: None

Keyword: robustness

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Authors: Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai
Arxiv: https://arxiv.org/abs/2305.17144
TLDR: The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, the current research landscape predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the currently leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of Reinforcement
Repo: None

GVdoc: Graph-based Visual Document Classification

Authors: Fnu Mohbat, Mohammed J. Zaki, Catherine Finegan-Dollak, Ashish Verma
Arxiv: https://arxiv.org/abs/2305.17219
TLDR: The robustness of a model for real-world deployment is decided by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they tend to have a hard time correctly classifying and differentiating out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token
Repo: None

A Reissner-Mindlin plate formulation using symmetric Hu-Zhang elements via polytopal transformations

Authors: Adam Sky, Michael Neunteufel, Jack S. Hale, Andreas Zilian
Arxiv: https://arxiv.org/abs/2305.17249
TLDR: In this work we develop new finite element discretisations of the shear-deformable Reissner-Mindlin plate problem based on the Hellinger-Reissner principle of symmetric stresses. Specifically, we use conforming Hu-Zhang elements to discretise the bending moments in the space of symmetric square integrable fields with a square integrable divergence $\boldsymbol{M} \in \mathcal{HZ} \subset
Repo: None

Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning

Authors: Ruixiang Tang, Dehan Kong, Longtao Huang, Hui Xue
Arxiv: https://arxiv.org/abs/2305.17256
TLDR: Large language models (LLMs) have recently shown great potential for in-context learning, where LLMs learn a new task simply by conditioning on a few input-label pairs (prompts). Despite their potential, our understanding of the factors influencing end-task performance and the robustness of in-context learning remains limited. This paper aims to bridge this knowledge gap by investigating the reliance of LLMs on shortcuts or spurious correlations within prompts. Through comprehensive experiments on classification and extraction tasks
Repo: None

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

Authors: Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
Arxiv: https://arxiv.org/abs/2305.17267
TLDR: Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive
Repo: None

Fourier-DeepONet: Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness

Authors: Min Zhu, Shihang Feng, Youzuo Lin, Lu Lu
Arxiv: https://arxiv.org/abs/2305.17289
TLDR: Full waveform inversion (FWI) infers the subsurface structure information from seismic waveform data by solving a non-convex optimization problem. Data-driven FWI has been increasingly studied with various neural network architectures to improve accuracy and computational efficiency. Nevertheless, the applicability of pre-trained neural networks is severely restricted by potential discrepancies between the source function used in the field survey and the one utilized during training. Here, we develop a Fourier-enhanced
Repo: None

Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence

Authors: Erik C. Johnson, Brian S. Robinson, Gautam K. Vallabha, Justin Joyce, Jordan K. Matelsky, Raphael Norman-Tenazas, Isaac Western, Marisel Villafañe-Delgado, Martha Cervantes, Michael S. Robinette, Arun V. Reddy, Lindsey Kitchell, Patricia K. Rivlin, Elizabeth P. Reilly, Nathan Drenkow, Matthew J. Roos, I-Jeng Wang, Brock A. Wester, William R. Gray-Roncal, Joan A. Hoffmann
Arxiv: https://arxiv.org/abs/2305.17300
TLDR: Despite the progress in deep learning networks, efficient learning at the edge (enabling adaptable, low-complexity machine learning solutions) remains a critical need for defense and commercial applications. We envision a pipeline to utilize large neuroimaging datasets, including maps of the brain which capture neuron and synapse connectivity, to improve machine learning approaches. We have pursued different approaches within this pipeline structure. First, as a demonstration of data-driven discovery, the team has developed a technique for discovery
Repo: None

An Image Based Visual Servo Method for Probe-and-Drogue Autonomous Aerial Refueling

Authors: Quan Quan, Runxiao Liu, Hao Liu, Zeqing Ma, Jinrui Ren
Arxiv: https://arxiv.org/abs/2305.17414
TLDR: With the high focus on autonomous aerial refueling recently, it becomes increasingly urgent to design efficient methods or algorithms to solve AAR problems in complicated aerial environments. Apart from the complex aerodynamic disturbance, another problem is the pose estimation error caused by the camera calibration error, installation error, or 3D object modeling error, which may not satisfy the highly accurate docking. The main objective of the effort described in this paper is the implementation of an image-based visual servo control method, which
Repo: None

Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems

Authors: Smitha Milli, Emma Pierson, Nikhil Garg
Arxiv: https://arxiv.org/abs/2305.17428
TLDR: Many recommender systems are based on optimizing a linear weighting of different user behaviors, such as clicks, likes, shares, etc. Though the choice of weights can have a significant impact, there is little formal study or guidance on how to choose them. We analyze the optimal choice of weight from the perspectives of both users and content producers who strategically respond to the weights. We consider three aspects of user behavior: value-faithfulness (how well a behavior indicates whether the user values the
Repo: None

On the Importance of Backbone to the Adversarial Robustness of Object Detectors

Authors: Xiao Li, Hang Chen, Xiaolin Hu
Arxiv: https://arxiv.org/abs/2305.17438
TLDR: Object detection is a critical component of various security-sensitive applications, such as autonomous driving and video surveillance. However, existing deep learning-based object detectors are vulnerable to adversarial attacks, which poses a significant challenge to their reliability and safety. Through experiments, we found that existing works on improving the adversarial robustness of object detectors have given a false sense of security. We argue that using adversarially pre-trained backbone networks is essential for enhancing the adversarial robustness of object detectors. We
Repo: None

A Diffusion Model for Event Skeleton Generation

Authors: Fangqi Zhu, Lin Zhang, Jun Gao, Bing Qin, Ruifeng Xu, Haiqin Yang
Arxiv: https://arxiv.org/abs/2305.17458
TLDR: Event skeleton generation, aiming to induce an event schema skeleton graph with abstracted event nodes and their temporal relations from a set of event instance graphs, is a critical step in the temporal complex event schema induction task. Existing methods effectively address this task from a graph generation perspective but suffer from noise sensitivity and error accumulation, e.g., the inability to correct errors while generating schema. We, therefore, propose a novel Diffusion Event Graph Model (DEGM) to address
Repo: None

Keep it Upright: Model Predictive Control for Nonprehensile Object Transportation with Obstacle Avoidance on a Mobile Manipulator

Authors: Adam Heins, Angela P. Schoellig
Arxiv: https://arxiv.org/abs/2305.17484
TLDR: We consider a nonprehensile manipulation task in which a mobile manipulator must balance objects on its end effector without grasping them -- known as the waiter's problem -- and move to a desired location while avoiding static and dynamic obstacles. In contrast to existing approaches, our focus is on fast online planning in response to new and changing environments. Our main contribution is a whole-body constrained model predictive controller (MPC) for a mobile manipulator that balances objects and avoids collisions.
Repo: None

Online Nonstochastic Model-Free Reinforcement Learning

Authors: Udaya Ghai, Arushi Gupta, Wenhan Xia, Karan Singh, Elad Hazan
Arxiv: https://arxiv.org/abs/2305.17552
TLDR: In this work, we explore robust model-free reinforcement learning algorithms for environments that may be dynamic or even adversarial. Conventional state-based policies fail to accommodate the challenge imposed by the presence of unmodeled disturbances in such settings. Additionally, optimizing linear state-based policies poses an obstacle to efficient optimization, leading to nonconvex objectives even in benign environments like linear dynamical systems. Drawing inspiration from recent advancements in model-based control, we introduce a novel class of policies
Repo: None

Online Causation Monitoring of Signal Temporal Logic

Authors: Zhenya Zhang, Jie An, Paolo Arcaini, Ichiro Hasuo
Arxiv: https://arxiv.org/abs/2305.17754
TLDR: Online monitoring is an effective validation approach for hybrid systems that, at runtime, checks whether the (partial) signals of a system satisfy a specification in, e.g., Signal Temporal Logic (STL). The classic STL monitoring is performed by computing a robustness interval that specifies, at each instant, how far the monitored signals are from violating and satisfying the specification. However, since a robustness interval monotonically shrinks during monitoring, classic online monitors may fail in reporting
Repo: None

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

Authors: Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma
Arxiv: https://arxiv.org/abs/2305.17826
TLDR: Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose transferable backdoor attacks for prompt-based models, called NOTABLE, which is independent of
Repo: None

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

Authors: Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang
Arxiv: https://arxiv.org/abs/2305.17860
TLDR: In recent years, the joint training of speech enhancement front-end and automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods only use enhanced speech as input for the backend. However, it is difficult for speech enhancement systems to directly separate speech from input due to the diverse types of noise with different intensities. Furthermore, speech distortion and residual noise are often observed in enhanced speech, and the distortion of
Repo: None

Universal Mechanical Polycomputation in Granular Matter

Authors: Atoosa Parsa, Sven Witthaus, Nidhi Pashine, Corey S. O'Hern, Rebecca Kramer-Bottiglio, Josh Bongard
Arxiv: https://arxiv.org/abs/2305.17872
TLDR: Unconventional computing devices are increasingly of interest as they can operate in environments hostile to silicon-based electronics, or compute in ways that traditional electronics cannot. Mechanical computers, wherein information processing is a material property emerging from the interaction of components with the environment, are one such class of devices. This information processing can be manifested in various physical substrates, one of which is granular matter. In a granular assembly, vibration can be treated as the information-bearing mode. This can
Repo: None

Maximizing Safety and Efficiency for Cooperative Lane-Changing: A Minimally Disruptive Approach

Authors: Andres S. Chavez Armijos, Anni Li, Christos G. Cassandras
Arxiv: https://arxiv.org/abs/2305.17883
TLDR: This paper addresses cooperative lane-changing maneuvers in mixed traffic, aiming to minimize traffic flow disruptions while accounting for uncooperative vehicles. The proposed approach adopts controllers combining Optimal control with Control Barrier Functions (OCBF controllers) which guarantee spatio-temporal constraints through the use of fixed-time convergence. Additionally, we introduce robustness to disturbances by deriving a method for handling worst-case disturbances using the dual of a linear programming problem. We present a near-optimal
Repo: None

Deeply Coupled Cross-Modal Prompt Learning

Authors: Xuejing Liu, Wei Tang, Jinghui Lu, Rui Zhao, Zhaojun Guo, Fei Tan
Arxiv: https://arxiv.org/abs/2305.17903
TLDR: Recent advancements in multimodal foundation models (e.g., CLIP) have excelled in zero-shot generalization. Prompt tuning involved in the knowledge transfer from foundation models to downstream tasks has gained significant attention recently. Existing prompt-tuning methods in cross-modal learning, however, either solely focus on the language branch, or learn vision-language interaction in a shallow mechanism. In this context, we propose a Deeply coupled Cross-Modal Prompt learning (D
Repo: None

Fourier Analysis on Robustness of Graph Convolutional Neural Networks for Skeleton-based Action Recognition

Authors: Nariki Tanaka, Hiroshi Kera, Kazuhiko Kawamoto
Arxiv: https://arxiv.org/abs/2305.17939
TLDR: Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental results with the NTU RGB+D dataset reveal that
Repo: None

Improving the Generalizability of Trajectory Prediction Models with Frenet-Based Domain Normalization

Authors: Luyao Ye, Zikang Zhou, Jianping Wang
Arxiv: https://arxiv.org/abs/2305.17965
TLDR: Predicting the future trajectories of nearby objects plays a pivotal role in Robotics and Automation such as autonomous driving. While learning-based trajectory prediction methods have achieved remarkable performance on public benchmarks, the generalization ability of these approaches remains questionable. The poor generalizability on unseen domains, a well-recognized defect of data-driven approaches, can potentially harm the real-world performance of trajectory prediction models. We are thus motivated to improve the generalization ability of models instead of merely
Repo: None

TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition

Authors: Tiago Barros, Luís Garrote, Martin Aleksandrov, Cristiano Premebida, Urbano J. Nunes
Arxiv: https://arxiv.org/abs/2305.18013
TLDR: Autonomous driving systems often require reliable loop closure detection to guarantee reduced localization drift. Recently, 3D LiDAR-based localization methods have used retrieval-based place recognition to find revisited places efficiently. However, when deployed in challenging real-world scenarios, the place recognition models become more complex, which comes at the cost of high computational demand. This work tackles this problem from an information-retrieval perspective, adopting a first-retrieve-then-re-ranking paradigm
Repo: None

Game of Tones: Faculty detection of GPT-4 generated content in university assessments

Authors: Mike Perkins (1), Jasper Roe (2), Darius Postma (1), James McGaughran (1), Don Hickerson (1) ((1) British University Vietnam, Vietnam, (2) James Cook University Singapore, Singapore)
Arxiv: https://arxiv.org/abs/2305.18081
TLDR: This study explores the robustness of university assessments against the use of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool. The research involved twenty-two GPT-4 generated submissions being created and included in the assessment process to be marked by fifteen different faculty members. The study reveals that although the detection tool identified
Repo: None

Improved Probabilistic Image-Text Representations

Authors: Sanghyuk Chun
Arxiv: https://arxiv.org/abs/2305.18171
TLDR: The Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic functions are not sufficiently powerful to capture ambiguity, prompting the exploration of probabilistic embeddings to tackle the challenge. However, the existing probabilistic ITM approach encounters two key shortcomings: the burden of heavy computations due to the Monte Carlo approximation, and the loss saturation issue in the face of
Repo: None

Contextual Knowledge Learning For Dialogue Generation

Authors: Wen Zheng, Natasa Milic-Frayling, Ke Zhou
Arxiv: https://arxiv.org/abs/2305.18200
TLDR: Incorporating conversational context and knowledge into dialogue generation models has been essential for improving the quality of the generated responses. The context, comprising utterances from previous dialogue exchanges, is used as a source of content for response generation and as a means of selecting external knowledge. However, to avoid introducing irrelevant content, it is key to enable fine-grained scoring of context and knowledge. In this paper, we present a novel approach to context-knowledge weighting as an integral part
Repo: None

Online Dynamic Acknowledgement with Learned Predictions

Authors: Sungjin Im, Benjamin Moseley, Chenyang Xu, Ruilong Zhang
Arxiv: https://arxiv.org/abs/2305.18227
TLDR: We revisit the online dynamic acknowledgement problem. In the problem, a sequence of requests arrives over time to be acknowledged, and all outstanding requests can be satisfied simultaneously by one acknowledgement. The goal of the problem is to minimize the total request delay plus acknowledgement cost. This elegant model studies the trade-off between acknowledgement cost and waiting experienced by requests. The problem has been well studied and the tight competitive ratios have been determined. For this well-studied problem, we focus on how to effectively
Repo: None

Multi-behavior Self-supervised Learning for Recommendation

Authors: Jingcao Xu, Chaokun Wang, Cheng Wu, Yang Song, Kai Zheng, Xiaowei Wang, Changping Wang, Guorui Zhou, Kun Gai
Arxiv: https://arxiv.org/abs/2305.18238
TLDR: Modern recommender systems often deal with a variety of user interactions, e.g., click, forward, purchase, etc., which requires the underlying recommender engines to fully understand and leverage multi-behavior data from users. Despite recent efforts towards making use of heterogeneous data, multi-behavior recommendation still faces great challenges. Firstly, sparse target signals and noisy auxiliary interactions remain an issue. Secondly, existing methods utilizing self-supervised learning (SSL) to tackle the data sparsity
Repo: None

Keyword: scholarly

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Authors: Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe Wang
Arxiv: https://arxiv.org/abs/2305.18149
TLDR: Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are astonishing at generating human-like texts, but they may get misused for fake scholarly texts, fake news, fake tweets, et cetera. Previous works have proposed methods to detect these multiscale AI-generated texts, including simple ML classifiers, pretrained-model-based training-agnostic methods, and finetuned language classification models. However, mainstream detectors are formulated
Repo: None

Keyword: semantic similarity

Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making

Authors: Xuanjie Fang, Sijie Cheng, Yang Liu, Wei Wang
Arxiv: https://arxiv.org/abs/2305.17440
TLDR: Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, the adversarial attack task has found that PLMs are vulnerable to small perturbations. Mainstream methods adopt a detached two-stage framework to attack without considering the subsequent influence of substitution at each step. In this paper, we formally model the multifaceted attack task on PLMs as a sequential decision-making problem, where the whole attack process is sequential with two decision-
Repo: None

Keyword: summarization

An Investigation of Evaluation Metrics for Automated Medical Note Generation

Authors: Asma Ben Abacha, Wen-wai Yim, George Michalopoulos, Thomas Lin
Arxiv: https://arxiv.org/abs/2305.17364
TLDR: Recent studies on automatic note generation have shown that doctors can save significant amounts of time when using automatic clinical note generation (Knoll et al., 2022). Summarization models have been used for this task to generate clinical notes as summaries of doctor-patient conversations (Krishna et al., 2021; Cai et al., 2022). However, assessing which model would best serve clinicians in their daily practice is still a challenging task due to the large set of possible correct summaries,
Repo: None

MeetingBank: A Benchmark Dataset for Meeting Summarization

Authors: Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu
Arxiv: https://arxiv.org/abs/2305.17529
TLDR: As the number of recorded meetings increases, it becomes increasingly important to utilize summarization technology to create useful summaries of these recordings. However, there is a crucial lack of annotated meeting corpora for developing this technology, as it can be hard to collect meetings, especially when the topics discussed are confidential. Furthermore, meeting summaries written by experienced writers are scarce, making it hard for abstractive summarizers to produce sensible output without a reliable reference. This lack of annotated corpora
Repo: None

Abstractive Summarization as Augmentation for Document-Level Event Detection

Authors: Janko Vidaković, Filip Karlo Došilović, Domagoj Pluščec
Arxiv: https://arxiv.org/abs/2305.18023
TLDR: Transformer-based models have consistently produced substantial performance gains across a variety of NLP tasks, compared to shallow models. However, deep models are orders of magnitude more computationally expensive than shallow models, especially on tasks with large sequence lengths, such as document-level event detection. In this work, we attempt to bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method. We augment the DocEE dataset
Repo: None

Assess and Summarize: Improve Outage Understanding with Large Language Models

Authors: Pengxiang Jin, Shenglin Zhang, Minghua Ma, Haozhe Li, Yu Kang, Liqun Li, Yudong Liu, Bo Qiao, Chaoyun Zhang, Pu Zhao, Shilin He, Federica Sarro, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
Arxiv: https://arxiv.org/abs/2305.18084
TLDR: Cloud systems have become increasingly popular in recent years due to their flexibility and scalability. Each time cloud computing applications and services hosted on the cloud are affected by a cloud outage, users can experience slow response times, connection issues or total service disruption, resulting in a significant negative business impact. Outages are usually comprised of several concurring events/source causes, and therefore understanding the context of outages is a very challenging yet crucial first step toward mitigating and resolving outages. In current practice
Repo: None

The Utility of Large Language Models and Generative AI for Education Research

Authors: Andrew Katz, Umair Shakir, Ben Chambers
Arxiv: https://arxiv.org/abs/2305.18125
TLDR: The use of natural language processing (NLP) techniques in engineering education can provide valuable insights into the underlying processes involved in generating text. While accessing these insights can be labor-intensive if done manually, recent advances in NLP and large language models have made it a realistic option for individuals. This study explores and evaluates a combination of clustering, summarization, and prompting techniques to analyze over 1,000 student essays in which students discussed their career interests. The specific assignment prompted students to
Repo: None

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Arxiv: https://arxiv.org/abs/2305.18290
TLDR: While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often
Repo: None

Keyword: text generation

KoSBI: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim, Jung-Woo Ha
Arxiv: https://arxiv.org/abs/2305.17701
TLDR: Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we
Repo: None

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring

Authors: Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji
Arxiv: https://arxiv.org/abs/2305.17708
TLDR: Refactoring is an indispensable practice of improving the quality and maintainability of source code in software evolution. Rename refactoring, or renaming, is the most frequently performed refactoring that suggests a new name for an identifier to enhance readability when the identifier is poorly named. However, most existing works only identify renaming activities between two versions of source software, while few works express concern about how to suggest a new name. In this paper, we study automatic rename ref
Repo: None

Abstractive Summarization as Augmentation for Document-Level Event Detection

Authors: Janko Vidaković, Filip Karlo Došilović, Domagoj Pluščec
Arxiv: https://arxiv.org/abs/2305.18023
TLDR: Transformer-based models have consistently produced substantial performance gains across a variety of NLP tasks, compared to shallow models. However, deep models are orders of magnitude more computationally expensive than shallow models, especially on tasks with large sequence lengths, such as document-level event detection. In this work, we attempt to bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method. We augment the DocEE dataset
Repo: None

GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking

Authors: Jiaqi Bai, Hongcheng Guo, Jiaheng Liu, Jian Yang, Xinnian Liang, Zhao Yan, Zhoujun Li
Arxiv: https://arxiv.org/abs/2305.18144
TLDR: Retrieval-enhanced text generation, which aims to leverage passages retrieved from a large passage corpus for delivering a proper answer given the input query, has shown remarkable progress on knowledge-intensive language tasks such as open-domain question answering and knowledge-enhancing dialogue generation. However, the retrieved passages are not ideal for guiding answer generation because of the discrepancy between retrieval and generation, i.e., the candidate passages are all treated equally during the retrieval procedure without considering their potential to generate
Repo: None

A Critical Evaluation of Evaluations for Long-form Question Answering

Authors: Fangyuan Xu, Yixiao Song, Mohit Iyyer, Eunsol Choi
Arxiv: https://arxiv.org/abs/2305.18201
TLDR: Long-form question answering (LFQA) enables answering a wide range of questions, but its flexibility poses enormous challenges for evaluation. We perform the first targeted study of the evaluation of long-form answers, covering both human and automatic evaluation practices. We hire domain experts in seven areas to provide preference judgments over pairs of answers, along with free-form justifications for their choices. We present a careful analysis of experts' evaluation, which focuses on new aspects such as the comprehens
Repo: None

HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis

Authors: Christoforos Vasilatos, Manaar Alam, Talal Rahwan, Yasir Zaki, Michail Maniatakos
Arxiv: https://arxiv.org/abs/2305.18226
TLDR: As the use of Large Language Models (LLMs) in text generation tasks proliferates, concerns arise over their potential to compromise academic integrity. The education sector currently tussles with distinguishing student-authored homework assignments from AI-generated ones. This paper addresses the challenge by introducing HowkGPT, designed to identify homework assignments generated by AI. HowkGPT is built upon a dataset of academic assignments and accompanying metadata [17] and employs a pretrained LLM to compute perplex
Repo: None

GlyphControl: Glyph Conditional Control for Visual Text Generation

Authors: Yukang Yang, Dongnan Gui, Yuhui Yuan, Haisong Ding, Han Hu, Kai Chen
Arxiv: https://arxiv.org/abs/2305.18259
TLDR: Recently, there has been a growing interest in developing diffusion-based text-to-image generative models capable of generating coherent and well-formed visual text. In this paper, we propose a novel and efficient approach called GlyphControl to address this task. Unlike existing methods that rely on character-aware text encoders like ByT5 and require retraining of text-to-image models, our approach leverages additional glyph conditional information to enhance the performance of the off-
Repo: None

Transformer Language Models Handle Word Frequency in Prediction Head

Authors: Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
Arxiv: https://arxiv.org/abs/2305.18294
TLDR: Prediction head is a crucial component of Transformer language models. Despite its direct impact on prediction, this component has often been overlooked in analyzing Transformers. In this study, we investigate the inner workings of the prediction head, specifically focusing on bias parameters. Our experiments with BERT and GPT-2 models reveal that the biases in their word prediction heads play a significant role in the models' ability to reflect word frequency in a corpus, aligning with the logit adjustment method commonly used
Repo: None
@e-tornike e-tornike self-assigned this May 30, 2023