New submissions for Wed, 14 Jun 23 #374

Open
e-tornike opened this issue Jun 14, 2023 · 0 comments

Keyword: contrastive

Heterophily-aware Social Bot Detection with Supervised Contrastive Learning

Authors: Qi Wu, Yingguan Yang, Buyun He, Hao Liu, Xiang Wang, Yong Liao, Renyu Yang, Pengyuan Zhou
Arxiv: https://arxiv.org/abs/2306.07478
TLDR: Detecting ever-evolving social bots has become increasingly challenging. Advanced bots tend to interact more with humans as a camouflage to evade detection. While graph-based detection methods can exploit various relations in social networks to model node behaviors, the aggregated information from neighbors largely ignores the inherent heterophily, i.e., the connections between different classes of accounts. The message passing mechanism on heterophilic edges can lead to feature mixture between bots and normal users, resulting in more false negatives.
Repo: None

Learning Unnormalized Statistical Models via Compositional Optimization

Authors: Wei Jiang, Jiayu Qin, Lingyu Wu, Changyou Chen, Tianbao Yang, Lijun Zhang
Arxiv: https://arxiv.org/abs/2306.07485
TLDR: Learning unnormalized statistical models (e.g., energy-based models) is computationally challenging due to the complexity of handling the partition function. To eschew this complexity, noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. However, as found in previous works, NCE may perform poorly in many tasks due to its flat loss landscape and slow convergence. In this
Repo: None
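
As a rough illustration of the NCE objective mentioned in the summary above (the logistic loss of real data against artificial noise), here is a minimal sketch. It shows the classic NCE loss only, not the compositional-optimization method the paper proposes. The names `nce_loss`, `log_model`, and `log_noise` are hypothetical, and the unnormalized model is assumed to carry its own learned log-normalizer inside `log_model`.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def nce_loss(data, noise, log_model, log_noise):
    """Classic NCE loss: classify real samples against noise samples.

    log_model(x): unnormalized model log-density (assumed to include a
                  learned log-normalizer term).
    log_noise(x): log-density of the known noise distribution.
    """
    nu = len(noise) / len(data)  # noise-to-data sample ratio

    def logit(x):
        # Log-odds that x came from the model rather than the noise.
        return log_model(x) - math.log(nu) - log_noise(x)

    # Real data should be classified as "data" ...
    data_term = -sum(log_sigmoid(logit(x)) for x in data) / len(data)
    # ... and noise as "noise"; dividing by len(data) applies the nu weight.
    noise_term = -sum(log_sigmoid(-logit(x)) for x in noise) / len(data)
    return data_term + noise_term


def std_normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


if __name__ == "__main__":
    data = [0.0, 1.0]
    noise = [0.5, -0.5]
    # When model and noise coincide the classifier cannot tell them apart.
    print(nce_loss(data, noise, std_normal_logpdf, std_normal_logpdf))
```

When the model equals the noise distribution and the sample counts match, every logit is zero and the loss reduces to 2·log 2, which is a convenient sanity check.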

Enhanced Multimodal Representation Learning with Cross-modal KD

Authors: Mengxi Chen, Linyu Xing, Yu Wang, Ya Zhang
Arxiv: https://arxiv.org/abs/2306.07646
TLDR: This paper explores the tasks of leveraging auxiliary modalities which are only available at training to enhance multimodal representation learning through cross-modal Knowledge Distillation (KD). The widely adopted mutual information maximization-based objective leads to a short-cut solution of the weak teacher, i.e., achieving the maximum mutual information by simply making the teacher model as weak as the student model. To prevent such a weak solution, we introduce an additional objective to minimize the conditional entropy of
Repo: None

Time-aware Graph Structure Learning via Sequence Prediction on Temporal Graphs

Authors: Haozhen Zhang, Xueting Han, Xi Xiao, Jing Bai
Arxiv: https://arxiv.org/abs/2306.07699
TLDR: Temporal Graph Learning, which aims to model the time-evolving nature of graphs, has gained increasing attention and achieved remarkable performance recently. However, in reality, graph structures are often incomplete and noisy, which hinders temporal graph networks (TGNs) from learning informative representations. Graph contrastive learning uses data augmentation to generate plausible variations of existing data and learn robust representations. However, rule-based augmentation approaches may be suboptimal as they lack learnability and fail
Repo: None

Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages

Authors: Simon Durand, Daniel Stoller, Sebastian Ewert
Arxiv: https://arxiv.org/abs/2306.07744
TLDR: Lyrics alignment gained considerable attention in recent years. State-of-the-art systems either re-use established speech recognition toolkits, or design end-to-end solutions involving a Connectionist Temporal Classification (CTC) loss. However, both approaches suffer from specific weaknesses: Toolkits are known for their complexity, and CTC systems use a loss designed for transcription which can limit alignment accuracy. In this paper, we use instead a contrastive learning procedure that
Repo: None

Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Authors: Ming Y. Lu, Bowen Chen, Andrew Zhang, Drew F.K. Williamson, Richard J. Chen, Tong Ding, Long Phi Le, Yung-Sung Chuang, Faisal Mahmood
Arxiv: https://arxiv.org/abs/2306.07831
TLDR: Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small to medium-sized images, neither of which are applicable to the emerging field of computational pathology where there are limited publicly available paired image-text datasets and each image can
Repo: None

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition

Authors: Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Wen Fei, Lei Ma, Heng Lu
Arxiv: https://arxiv.org/abs/2306.07848
TLDR: Contrastive Language-Audio Pretraining (CLAP) has recently exhibited impressive success in diverse fields. In this paper, we propose GEmo-CLAP, a kind of efficient gender-attribute-enhanced CLAP model for speech emotion recognition (SER). Specifically, we first build an effective emotion CLAP model termed Emo-CLAP for SER, utilizing various self-supervised learning based pre-trained models. Then, considering the importance of the gender attribute in speech
Repo: None

Image Captioners Are Scalable Vision Learners Too

Authors: Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer
Arxiv: https://arxiv.org/abs/2306.07915
TLDR: Contrastive pretraining on image-text pairs from the web is one of the most popular large-scale pretraining strategies for vision backbones, especially in the context of large multimodal models. At the same time, image captioning on this type of data is commonly considered an inferior pretraining strategy. In this paper, we perform a fair comparison of these two pretraining techniques, carefully matching training data, compute, and model capacity. Using a standard encoder-dec
Repo: None

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Authors: Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
Arxiv: https://arxiv.org/abs/2306.07952
TLDR: We present MOFI, a new vision foundation model designed to learn image representations from noisy entity annotated images. MOFI differs from previous work in two key aspects: (i) pre-training data, and (ii) training recipe. Regarding data, we introduce a new approach to automatically assign entity labels to images from noisy image-text pairs. Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model
Repo: None

Supervised-Contrastive Loss Learns Orthogonal Frames and Batching Matters

Authors: Ganesh Ramachandra Kini, Vala Vakilian, Tina Behnia, Jaidev Gill, Christos Thrampoulidis
Arxiv: https://arxiv.org/abs/2306.07960
TLDR: Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy (CE) loss for classification. In this paper we ask: what differences in the learning process occur when the two different loss functions are being optimized? To answer this question, our main finding is that the geometry of embeddings learned by SCL forms an orthogonal frame (OF) regardless of the number of training examples per class. This is in contrast to the
Repo: None

Keyword: data augmentation

Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification

Authors: Shouvon Sarker, Lijun Qian, Xishuang Dong
Arxiv: https://arxiv.org/abs/2306.07297
TLDR: The identification of key factors such as medications, diseases, and relationships within electronic health records and clinical notes has a wide range of applications in the clinical field. In the N2C2 2022 competitions, various tasks were presented to promote the identification of key factors in electronic health records (EHRs) using the Contextualized Medication Event Dataset (CMED). Pretrained large language models (LLMs) demonstrated exceptional performance in these tasks. This study aims to explore the
Repo: None

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

Authors: Catherine Gitau, Vukosi Marivate
Arxiv: https://arxiv.org/abs/2306.07414
TLDR: In this work we investigate the impact of applying textual data augmentation tasks to low resource machine translation. There has been recent interest in investigating approaches for training systems for languages with limited resources and one popular approach is the use of data augmentation techniques. Data augmentation aims to increase the quantity of data that is available to train the system. In machine translation, the majority of language pairs around the world are considered low resource because they have little parallel data available and the quality of neural machine translation
Repo: None

Gender-Inclusive Grammatical Error Correction through Augmentation

Authors: Gunnar Lund, Kostiantyn Omelianchuk, Igor Samokhin
Arxiv: https://arxiv.org/abs/2306.07415
TLDR: In this paper we show that GEC systems display gender bias related to the use of masculine and feminine terms and the gender-neutral singular "they". We develop parallel datasets of texts with masculine and feminine terms and singular "they" and use them to quantify gender bias in three competitive GEC systems. We contribute a novel data augmentation technique for singular "they" leveraging linguistic insights about its distribution relative to plural "they". We demonstrate that both this data aug
Repo: None

Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Authors: Ricong Huang, Peiwen Lai, Yipeng Qin, Guanbin Li
Arxiv: https://arxiv.org/abs/2306.07579
TLDR: Audio-driven facial reenactment is a crucial technique that has a range of applications in film-making, virtual avatars and video conferences. Existing works either employ explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), thus suffering from the trade-offs between interpretability and expressive power, hence between controllability and quality of the results. In this work, we
Repo: None

Rethinking Adversarial Training with A Simple Baseline

Authors: Hong Liu, Shin'ichi Satoh
Arxiv: https://arxiv.org/abs/2306.07613
TLDR: We report competitive results on RobustBench for CIFAR and SVHN using a simple yet effective baseline approach. Our approach involves a training protocol that integrates rescaled square loss, cyclic learning rates, and erasing-based data augmentation. The outcomes we have achieved are comparable to those of the model trained with state-of-the-art techniques, which is currently the predominant choice for adversarial training. Our baseline, referred to as SimpleAT, yields three novel
Repo: None

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Authors: Zhengxiang Shi, Aldo Lipani
Arxiv: https://arxiv.org/abs/2306.07664
TLDR: In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types,
Repo: None

Lookaround Optimizer: $k$ steps around, 1 step average

Authors: Jiangtao Zhang, Shunyu Liu, Jie Song, Tongtian Zhu, Zhengqi Xu, Mingli Song
Arxiv: https://arxiv.org/abs/2306.07684
TLDR: Weight Average (WA) is an active research topic due to its simplicity in ensembling deep networks and the effectiveness in promoting generalization. Existing weight average approaches, however, are often carried out along only one training trajectory in a post-hoc manner (i.e., the weights are averaged after the entire training process is finished), which significantly degrades the diversity between networks and thus impairs the effectiveness. In this paper, inspired by weight average, we propose Lookaround
Repo: None
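
To make the "post-hoc" weight averaging described in the summary above concrete, the following sketch averages checkpoints saved along a single training trajectory. This is the baseline the abstract contrasts against, not the Lookaround optimizer itself. The function name `average_weights` and the dict-of-lists checkpoint layout are assumptions made for illustration.

```python
def average_weights(checkpoints):
    """Average parameters element-wise across checkpoints.

    checkpoints: list of dicts mapping a parameter name to a flat list of
                 floats. All checkpoints are assumed to share the same
                 names and shapes (hypothetical layout for this sketch).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name, first in checkpoints[0].items():
        # Sum position i of this parameter over every checkpoint, then divide.
        averaged[name] = [
            sum(ckpt[name][i] for ckpt in checkpoints) / n
            for i in range(len(first))
        ]
    return averaged


if __name__ == "__main__":
    # Two checkpoints taken after training has finished.
    ckpts = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [1.0]},
    ]
    print(average_weights(ckpts))
```

Because all checkpoints come from one trajectory, the averaged networks tend to be similar to each other, which is the diversity limitation the summary points out.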

Time-aware Graph Structure Learning via Sequence Prediction on Temporal Graphs

Authors: Haozhen Zhang, Xueting Han, Xi Xiao, Jing Bai
Arxiv: https://arxiv.org/abs/2306.07699
TLDR: Temporal Graph Learning, which aims to model the time-evolving nature of graphs, has gained increasing attention and achieved remarkable performance recently. However, in reality, graph structures are often incomplete and noisy, which hinders temporal graph networks (TGNs) from learning informative representations. Graph contrastive learning uses data augmentation to generate plausible variations of existing data and learn robust representations. However, rule-based augmentation approaches may be suboptimal as they lack learnability and fail
Repo: None

Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study

Authors: Alexander Windmann, Henrik Steude, Oliver Niggemann
Arxiv: https://arxiv.org/abs/2306.07737
TLDR: Deep learning (DL) models have seen increased attention for time series forecasting, yet the application on cyber-physical systems (CPS) is hindered by the lack of robustness of these methods. Thus, this study evaluates the robustness and generalization performance of DL architectures on multivariate time series data from CPS. Our investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise, and assesses their impact on overall performance. Furthermore,
Repo: None

Generated Graph Detection

Authors: Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang
Arxiv: https://arxiv.org/abs/2306.07758
TLDR: Graph generative models have become increasingly effective for data distribution approximation and data augmentation. However, they have aroused public concerns about malicious misuse or misinformation broadcasts, just as Deepfake visual and auditory media has. Hence it is essential to regulate the prevalence of generated graphs. To tackle this problem, we pioneer the formulation of the generated graph detection problem to distinguish generated graphs from real ones. We propose the first framework to systematically investigate a set of sophisticated models and their performance
Repo: None

Keyword: knowledge discovery

Few-shot Multi-domain Knowledge Rearming for Context-aware Defence against Advanced Persistent Threats

Authors: Gaolei Li, Yuanyuan Zhao, Wenqi Wei, Yuchen Liu
Arxiv: https://arxiv.org/abs/2306.07685
TLDR: Advanced persistent threats (APTs) have novel features such as multi-stage penetration, highly-tailored intention, and evasive tactics. APT defense requires fusing multi-dimensional threat intelligence data to identify attack intentions and conducting efficient knowledge discovery strategies by data-driven machine learning to recognize entity relationships. However, data-based machine learning lacks generalization ability on fresh or unknown samples, reducing the accuracy and practicality of the defense model. Besides, the private deployment of these AP
Repo: None

Keyword: knowledge graph

Noisy Positive-Unlabeled Learning with Self-Training for Speculative Knowledge Graph Reasoning

Authors: Ruijie Wang, Baoyu Li, Yichen Lu, Dachun Sun, Jinning Li, Yuchen Yan, Shengzhong Liu, Hanghang Tong, Tarek F. Abdelzaher
Arxiv: https://arxiv.org/abs/2306.07512
TLDR: This paper studies the speculative reasoning task on real-world knowledge graphs (KG) that contain both a false negative issue (i.e., potential true facts being excluded) and a false positive issue (i.e., unreliable or outdated facts being included). State-of-the-art methods fall short in the speculative reasoning ability, as they assume the correctness of a fact is solely determined by its presence in KG, making them vulnerable to
Repo: None

Contextual Dictionary Lookup for Knowledge Graph Completion

Authors: Jining Wang, Delai Qiu, YouMing Liu, Yining Wang, Chuan Chen, Zibin Zheng, Yuren Zhou
Arxiv: https://arxiv.org/abs/2306.07719
TLDR: Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples. Numerous knowledge graph embedding (KGE) models have been proposed to perform KGC by learning embeddings. Nevertheless, most existing embedding models map each relation into a unique vector, overlooking its specific fine-grained semantics under different entities, resulting in limited performance and applicability. Additionally, the few available
Repo: None

Keyword: legal

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

Authors: Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, Jimmy Lin
Arxiv: https://arxiv.org/abs/2306.07471
TLDR: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of a representation learning approach to building retrieval models, typically using pretrained transformers in a supervised setting. This naturally begs the question: How effective are these models when presented with queries and documents that differ from the training data? Examples include searching in different domains (e.g., medical or legal text) and with different
Repo: None

Discrimination through Image Selection by Job Advertisers on Facebook

Authors: Varun Nagaraj Rao, Aleksandra Korolova
Arxiv: https://arxiv.org/abs/2306.07527
TLDR: Targeted advertising platforms are widely used by job advertisers to reach potential employees; thus issues of discrimination due to targeting that have surfaced have received widespread attention. Advertisers could misuse targeting tools to exclude people based on gender, race, location and other protected attributes from seeing their job ads. In response to legal actions, Facebook disabled the ability for explicit targeting based on many attributes for some ad categories, including employment. Although this is a step in the right direction, prior work has shown that
Repo: None

Keyword: legal text

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

Authors: Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, Jimmy Lin
Arxiv: https://arxiv.org/abs/2306.07471
TLDR: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of a representation learning approach to building retrieval models, typically using pretrained transformers in a supervised setting. This naturally begs the question: How effective are these models when presented with queries and documents that differ from the training data? Examples include searching in different domains (e.g., medical or legal text) and with different
Repo: None

Keyword: multi-task

Semi-supervised learning made simple with self-supervised clustering

Authors: Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci
Arxiv: https://arxiv.org/abs/2306.07483
TLDR: Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, limiting their usefulness. This has led to a recent line of work on semi-supervised methods inspired by self-supervised principles. In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods such as SwAV or DINO into semi-
Repo: None

KuaiSAR: A Unified Search And Recommendation Dataset

Authors: Zhongxiang Sun, Zihua Si, Xiaoxue Zang, Dewei Leng, Yanan Niu, Yang Song, Xiao Zhang, Jun Xu
Arxiv: https://arxiv.org/abs/2306.07705
TLDR: The confluence of Search and Recommendation (S&R) services is a vital aspect of online content platforms like Kuaishou and TikTok. The integration of S&R modeling is a highly intuitive approach adopted by industry practitioners. However, there is a noticeable lack of research conducted in this area within academia, primarily due to the absence of publicly available datasets. Consequently, a substantial gap has emerged between academia and industry regarding research endeavors in this field. To bridge this gap, we introduce the first
Repo: None

Synapse: Leveraging Few-Shot Exemplars for Human-Level Computer Control

Authors: Longtao Zheng, Rundong Wang, Bo An
Arxiv: https://arxiv.org/abs/2306.07863
TLDR: This paper investigates the design of few-shot exemplars for computer automation through prompting large language models (LLMs). While previous prompting approaches focus on self-correction, we find that well-structured exemplars alone are sufficient for human-level performance. We present Synapse, an in-context computer control agent demonstrating human-level performance on the MiniWob++ benchmark. Synapse consists of three main components: 1) state-conditional decomposition, which divides demonstrations
Repo: None

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Authors: Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
Arxiv: https://arxiv.org/abs/2306.07952
TLDR: We present MOFI, a new vision foundation model designed to learn image representations from noisy entity annotated images. MOFI differs from previous work in two key aspects: (i) pre-training data, and (ii) training recipe. Regarding data, we introduce a new approach to automatically assign entity labels to images from noisy image-text pairs. Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model
Repo: None

Keyword: plagiarism

Ethical Aspects of ChatGPT in Software Engineering Research

Authors: Muhammad Azeem Akbar, Arif Ali Khan, Peng Liang
Arxiv: https://arxiv.org/abs/2306.07557
TLDR: ChatGPT can improve Software Engineering (SE) research practices by offering efficient, accessible information analysis and synthesis based on natural language interactions. However, ChatGPT could bring ethical challenges, encompassing plagiarism, privacy, data security, and the risk of generating biased or potentially detrimental data. This research aims to fill the given gap by elaborating on the key elements: motivators, demotivators, and ethical principles of using ChatGPT in SE research. To achieve this
Repo: None

Keyword: robustness

H-SLAM: Hybrid Direct-Indirect Visual SLAM

Authors: Georges Younes, Douaa Khalil, John Zelek, Daniel Asmar
Arxiv: https://arxiv.org/abs/2306.07363
TLDR: The recent success of hybrid methods in monocular odometry has led to many attempts to generalize the performance gains to hybrid monocular SLAM. However, most attempts fall short in several respects, with the most prominent issue being the need for two different map representations (local and global maps), with each requiring different, computationally expensive, and often redundant processes to maintain. Moreover, these maps tend to drift with respect to each other, resulting in contradicting pose and scene estimates, and
Repo: None

Composing Efficient, Robust Tests for Policy Selection

Authors: Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone
Arxiv: https://arxiv.org/abs/2306.07372
TLDR: Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $
Repo: None

Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation

Authors: Ju He, Jieneng Chen, Ming-Xian Lin, Qihang Yu, Alan Yuille
Arxiv: https://arxiv.org/abs/2306.07404
TLDR: In this work, we present a robust approach for joint part and object segmentation. Specifically, we reformulate object and part segmentation as an optimization problem and build a hierarchical feature representation including pixel, part, and object-level embeddings to solve it in a bottom-up clustering manner. Pixels are grouped into several clusters where the part-level embeddings serve as cluster centers. Afterwards, object masks are obtained by compositing the part proposals. This bottom-
Repo: None

Robust Reinforcement Learning through Efficient Adversarial Herding

Authors: Juncheng Dong, Hao-Lun Hsu, Qitong Gao, Vahid Tarokh, Miroslav Pajic
Arxiv: https://arxiv.org/abs/2306.07408
TLDR: Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has been proven effective in enhancing the robustness of RL agents. In this work, we extend the two-player game by introducing an adversarial herd, which involves a group of adversaries, in order to
Repo: None

On the Robustness of Removal-Based Feature Attributions

Authors: Chris Lin, Ian Covert, Su-In Lee
Arxiv: https://arxiv.org/abs/2306.07462
TLDR: To explain complex models based on their inputs, many feature attribution methods have been developed that assign importance scores to input features. However, some recent work challenges the robustness of feature attributions by showing that these methods are sensitive to input and model perturbations, while other work addresses this robustness issue by proposing robust attribution methods and model modifications. Nevertheless, previous work on attribution robustness has focused primarily on gradient-based feature attribution and removal-based attribution methods are not comprehensively well
Repo: None

Reviving Shift Equivariance in Vision Transformers

Authors: Peijian Ding, Davit Soselia, Thomas Armstrong, Jiahao Su, Furong Huang
Arxiv: https://arxiv.org/abs/2306.07470
TLDR: Shift equivariance is a fundamental principle that governs how we perceive the world - our recognition of an object remains invariant with respect to shifts. Transformers have gained immense popularity due to their effectiveness in both language and vision tasks. While the self-attention operator in vision transformers (ViT) is permutation-equivariant and thus shift-equivariant, patch embedding, positional encoding, and subsampled attention in ViT variants can disrupt this property, resulting
Repo: None

PaVa: a novel Path-based Valley-seeking clustering algorithm

Authors: Lin Ma, Conan Liu, Tiefeng Ma, Shuangzhe Liu
Arxiv: https://arxiv.org/abs/2306.07503
TLDR: Clustering methods are being applied to a wider range of scenarios involving more complex datasets, where the shapes of clusters tend to be arbitrary. In this paper, we propose a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters. This work aims to seek the valleys among clusters and then individually extract clusters. Three vital techniques are used in this algorithm. First, path distance (minmax distance) is employed to transform the irregular boundaries among clusters, that is density valleys
Repo: None
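
The path distance (minmax distance) named in the summary above can be illustrated as follows: the distance between two points is the smallest achievable "largest hop" over all paths connecting them. This sketch uses a Floyd-Warshall style relaxation on Euclidean distances. It illustrates the general minmax distance only and is not taken from the PaVa paper; the function name `minmax_distances` is hypothetical.

```python
import math


def minmax_distances(points):
    """All-pairs minmax (bottleneck) path distances.

    points: list of equal-length coordinate tuples.
    Returns a matrix d where d[i][j] is the minimum, over all paths from
    i to j through the given points, of the longest single hop on the path.
    """
    n = len(points)
    # Start from direct Euclidean distances.
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going through k costs the larger of the two sub-paths.
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


if __name__ == "__main__":
    # Three closely spaced points and one far away, all on a line.
    pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
    for row in minmax_distances(pts):
        print(row)
```

Points inside a dense, arbitrarily shaped cluster end up close under this distance because they are linked by chains of short hops, while crossing a sparse gap between clusters forces one long hop.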

Noisy Positive-Unlabeled Learning with Self-Training for Speculative Knowledge Graph Reasoning

Authors: Ruijie Wang, Baoyu Li, Yichen Lu, Dachun Sun, Jinning Li, Yuchen Yan, Shengzhong Liu, Hanghang Tong, Tarek F. Abdelzaher
Arxiv: https://arxiv.org/abs/2306.07512
TLDR: This paper studies the speculative reasoning task on real-world knowledge graphs (KG) that contain both a false negative issue (i.e., potential true facts being excluded) and a false positive issue (i.e., unreliable or outdated facts being included). State-of-the-art methods fall short in the speculative reasoning ability, as they assume the correctness of a fact is solely determined by its presence in KG, making them vulnerable to
Repo: None

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Authors: Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang
Arxiv: https://arxiv.org/abs/2306.07528
TLDR: Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning to rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically under different click models. In this paper, we show that offline RL algorithms can be used to optimize the ranking process under general stochastic click models as a Markov
Repo: None

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

Authors: Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu
Arxiv: https://arxiv.org/abs/2306.07547
TLDR: The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens, has been proven superior to traditional acoustic feature mel-spectrograms in terms of naturalness and robustness for text-to-speech (TTS) synthesis. Recent popular models, such as VALL-E and SPEAR-TTS, allow zero-shot speaker adaptation through auto-regressive (AR) continuation of acoustic tokens extracted from a short speech prompt. However, these AR models are restricted
Repo: None

Marking anything: application of point cloud in extracting video target features

Authors: Xiangchun Xu
Arxiv: https://arxiv.org/abs/2306.07559
TLDR: Extracting retrievable features from video is of great significance for structured video database construction, video copyright protection and fake video rumor refutation. Inspired by point cloud data processing, this paper proposes a method for marking anything (MA) in the video, which can extract the contour features of any target in a video and convert them into a retrievable feature vector of length 256. The algorithm uses the YOLO-v8 algorithm, multi-object tracking algorithm
Repo: None

I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models

Authors: Raz Lapid, Moshe Sipper
Arxiv: https://arxiv.org/abs/2306.07591
TLDR: Modern image-to-text systems typically adopt the encoder-decoder framework, which comprises two main components: an image encoder, responsible for extracting image features, and a transformer-based decoder, used for generating captions. Taking inspiration from the analysis of neural networks' robustness against adversarial perturbations, we propose a novel gray-box algorithm for creating adversarial examples in image-to-text models. Unlike image classification tasks that have a finite set of
Repo: None

Non-Asymptotic State and Disturbance Estimation for a Class of Triangular Nonlinear Systems using Modulating Functions

Authors: Yasmine Marani, Ibrahima N'Doye, Taous-Meriem Laleg-Kirati
Arxiv: https://arxiv.org/abs/2306.07620
TLDR: Dynamical models are often corrupted by model uncertainties, external disturbances, and measurement noise. These factors affect the performance of model-based observers and, as a result, the closed-loop performance. Therefore, it is critical to develop robust model-based estimators that reconstruct both the states and the model disturbances while mitigating the effect of measurement noise in order to ensure good system monitoring and closed-loop performance when designing controllers. In this article, a robust step by step non-asy
Repo: None

Multiple-Step Quantized Triplet STDP Implemented with Memristive Synapse

Authors: Y. Liu, D. Wang, Z. Dong, W. Zhao
Arxiv: https://arxiv.org/abs/2306.07712
TLDR: As an extension of the pairwise spike-timing-dependent plasticity (STDP) learning rule, the triplet STDP is provided with greater capability in characterizing the synaptic changes in the biological neural cell. In this work, a novel mixed-signal circuit scheme, called multiple-step quantized triplet STDP, is designed to provide a precise and flexible implementation of coactivation triplet-STDP learning rule in memristive synapse spiking neural network. The
Repo: None

Robustness of SAM: Segment Anything Under Corruptions and Beyond

Authors: Yu Qiao, Chaoning Zhang, Taegoo Kang, Donghun Kim, Shehbaz Tariq, Chenshuang Zhang, Choong Seon Hong
Arxiv: https://arxiv.org/abs/2306.07713
TLDR: Segment anything model (SAM), as the name suggests, is claimed to be capable of cutting out any object. SAM is a vision foundation model which demonstrates impressive zero-shot transfer performance with the guidance of a prompt. However, there is currently a lack of comprehensive evaluation of its robustness performance under various types of corruptions. Prior works show that SAM is biased towards texture (style) rather than shape, motivated by which we start by investigating SAM's robustness against style transfer,
Repo: None

Theoretical Foundations of Adversarially Robust Learning

Authors: Omar Montasser
Arxiv: https://arxiv.org/abs/2306.07723
TLDR: Despite extraordinary progress, current machine learning systems have been shown to be brittle against adversarial examples: seemingly innocuous but carefully crafted perturbations of test examples that cause machine learning predictors to misclassify. Can we learn predictors robust to adversarial examples? And how? There has been much empirical interest in this contemporary challenge in machine learning, and in this thesis, we address it from a theoretical perspective. In this thesis we explore what robustness properties can we hope to guarantee against
Repo: None

BeliefPPG: Uncertainty-aware Heart Rate Estimation from PPG signals via Belief Propagation

Authors: Valentin Bieri, Paul Streli, Berken Utku Demirel, Christian Holz
Arxiv: https://arxiv.org/abs/2306.07730
TLDR: We present a novel learning-based method that achieves state-of-the-art performance on several heart rate estimation benchmarks extracted from photoplethysmography signals (PPG). We consider the evolution of the heart rate in the context of a discrete-time stochastic process that we represent as a hidden Markov model. We derive a distribution over possible heart rate values for a given PPG signal window through a trained neural network. Using belief propagation, we incorporate the statistical
Repo: None
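
The hidden Markov model view in the summary above can be illustrated with a minimal forward-filtering sketch. This is not the authors' implementation. It assumes the per-window network outputs can be treated as emission likelihoods over a small set of discretized heart rate bins. The function name `forward_filter`, the three-bin setup, and all numbers are hypothetical.

```python
import numpy as np


def forward_filter(emission_probs, transition, prior):
    """Forward (filtering) pass of a discrete HMM.

    emission_probs: (T, K) array, one distribution over K heart rate bins per
        signal window (assumed here to come from a trained network).
    transition: (K, K) array, transition[i, j] = P(bin j at t | bin i at t-1).
    prior: (K,) array, initial distribution over bins.
    Returns a (T, K) array of normalized beliefs.
    """
    num_steps, num_bins = emission_probs.shape
    beliefs = np.zeros((num_steps, num_bins))

    # First window: combine the prior with the first observation.
    belief = prior * emission_probs[0]
    belief = belief / belief.sum()
    beliefs[0] = belief

    for t in range(1, num_steps):
        # Predict: propagate the previous belief through the transition model.
        predicted = transition.T @ belief
        # Update: weight the prediction by the current observation.
        belief = predicted * emission_probs[t]
        belief = belief / belief.sum()
        beliefs[t] = belief
    return beliefs


# Hypothetical example with three heart rate bins (low, medium, high).
transition = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])
prior = np.full(3, 1.0 / 3.0)
emissions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],  # an implausible jump from low to high
    [0.6, 0.3, 0.1],
])
beliefs = forward_filter(emissions, transition, prior)
print(beliefs.round(3))
```

The transition matrix penalizes large jumps between consecutive windows, so a single outlier window is weighted against what the neighboring windows suggest.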

Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study

Authors: Alexander Windmann, Henrik Steude, Oliver Niggemann
Arxiv: https://arxiv.org/abs/2306.07737
TLDR: Deep learning (DL) models have seen increased attention for time series forecasting, yet the application on cyber-physical systems (CPS) is hindered by the lacking robustness of these methods. Thus, this study evaluates the robustness and generalization performance of DL architectures on multivariate time series data from CPS. Our investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise, and assesses their impact on overall performance. Furthermore,
Repo: None

Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis

Authors: Yihan Ma, Zhengyu Zhao, Xinlei He, Zheng Li, Michael Backes, Yang Zhang
Arxiv: https://arxiv.org/abs/2306.07754
TLDR: Large text-to-image models have shown remarkable performance in synthesizing high-quality images. In particular, the subject-driven model makes it possible to personalize the image synthesis for a specific subject, e.g., a human face or an artistic style, by fine-tuning the generic text-to-image model with a few images from that subject. Nevertheless, misuse of subject-based image synthesis may violate the authority of subject owners. For example, malicious users may
Repo: None

Low-Resource White-Box Semantic Segmentation of Supporting Towers on 3D Point Clouds via Signature Shape Identification

Authors: Diogo Lavado, Cláudia Soares, Alessandra Micheletti, Giovanni Bocchi, Alex Coronati, Manuel Silva, Patrizio Frosini
Arxiv: https://arxiv.org/abs/2306.07809
TLDR: Research in 3D semantic segmentation has been increasing performance metrics, like the IoU, by scaling model complexity and computational resources, leaving behind researchers and practitioners that (1) cannot access the necessary resources and (2) do need transparency on the model decision mechanisms. In this paper, we propose SCENE-Net, a low-resource white-box model for 3D point cloud semantic segmentation. SCENE-Net identifies signature shapes on the point cloud via group equivariant
Repo: None

Adversarial Capsule Networks for Romanian Satire Detection and Sentiment Analysis

Authors: Sebastian-Vasile Echim, Răzvan-Alexandru Smădu, Andrei-Marius Avram, Dumitru-Clementin Cercel, Florin Pop
Arxiv: https://arxiv.org/abs/2306.07845
TLDR: Satire detection and sentiment analysis are intensively explored natural language processing (NLP) tasks that study the identification of the satirical tone from texts and extracting sentiments in relationship with their targets. In languages with fewer research resources, an alternative is to produce artificial examples based on character-level adversarial processes to overcome dataset size limitations. Such samples are proven to act as a regularization method, thus improving the robustness of models. In this work, we improve the well-known NLP
Repo: None

Temporal Gradient Inversion Attacks with Robust Optimization

Authors: Bowen Li, Hanlin Gu, Ruoxin Chen, Jie Li, Chentao Wu, Na Ruan, Xueming Si, Lixin Fan
Arxiv: https://arxiv.org/abs/2306.07883
TLDR: Federated Learning (FL) has emerged as a promising approach for collaborative model training without sharing private data. However, privacy concerns regarding information exchanged during FL have received significant research attention. Gradient Inversion Attacks (GIAs) have been proposed to reconstruct the private data retained by local clients from the exchanged gradients. While recovering private data, the data dimensions and the model complexity increase, which thwart data reconstruction by GIAs. Existing methods adopt prior knowledge about private data to overcome
Repo: None

Best-Case Retrieval Evaluation: Improving the Sensitivity of Reciprocal Rank with Lexicographic Precision

Authors: Fernando Diaz
Arxiv: https://arxiv.org/abs/2306.07908
TLDR: Across a variety of ranking tasks, researchers use reciprocal rank to measure the effectiveness for users interested in exactly one relevant item. Despite its widespread use, evidence suggests that reciprocal rank is brittle when discriminating between systems. This brittleness, in turn, is compounded in modern evaluation settings where current, high-precision systems may be difficult to distinguish. We address the lack of sensitivity of reciprocal rank by introducing and connecting it to the concept of best-case retrieval, an evaluation method focusing on
Repo: https://github.com/diazf/pref_eval
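
As a small illustration of the sensitivity problem described above, the sketch below computes the standard reciprocal rank for two hypothetical rankings. It does not implement the paper's lexicographic precision. The function name `reciprocal_rank` and the example rankings are assumptions made for illustration.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


# Two hypothetical systems: both place their first relevant item at rank 2,
# but system A also retrieves relevant items at ranks 3 and 4.
system_a = [False, True, True, True]
system_b = [False, True, False, False]

rr_a = reciprocal_rank(system_a)
rr_b = reciprocal_rank(system_b)
print(rr_a, rr_b)
```

Both rankings receive the same score because reciprocal rank looks only at the first relevant item, so it cannot separate these two systems.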

Large Language Model Is Semi-Parametric Reinforcement Learning Agent

Authors: Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu
Arxiv: https://arxiv.org/abs/2306.07929
TLDR: Inspired by the insights in cognitive science with respect to human memory and reasoning mechanism, a novel evolvable LLM-based (Large Language Model) agent framework is proposed as REMEMBERER. By equipping the LLM with a long-term experience memory, REMEMBERER is capable of exploiting the experiences from the past episodes even for different task goals, which is the same as the previous SOTA. In this way, the system can learn from the experiences of the
Repo: None

Reducing Exposure to Harmful Content via Graph Rewiring

Authors: Corinna Coupette, Stefan Neumann, Aristides Gionis
Arxiv: https://arxiv.org/abs/2306.07930
TLDR: Most media content consumed today is provided by digital platforms that aggregate input from diverse sources, where access to information is mediated by recommendation algorithms. One principal challenge in this context is dealing with content that is considered harmful. Striking a balance between competing stakeholder interests, rather than block harmful content altogether, one approach is to minimize the exposure to such content that is induced specifically by algorithmic recommendations. Hence, modeling media items and recommendations as a directed graph, we study the problem of reducing the
Repo: None

Keyword: summarization

TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills

Authors: Qiushi Sun, Nuo Chen, Jianing Wang, Xiang Li, Ming Gao
Arxiv: https://arxiv.org/abs/2306.07285
TLDR: Code pre-trained models (CodePTMs) have recently demonstrated a solid capacity to process various software intelligence tasks, e.g., code clone detection, code translation, and code summarization. The current mainstream method that deploys these models to downstream tasks is to fine-tune them on individual tasks, which is generally costly and needs sufficient data for large models. To tackle the issue, in this paper, we present TransCoder, a unified Transferable fine-tuning
Repo: None

Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West
Arxiv: https://arxiv.org/abs/2306.07899
TLDR: Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLM usage, as crowd workers have financial incentives to use LL
Repo: None

Keyword: text generation

Adding guardrails to advanced chatbots

Authors: Yanchen Wang, Lisa Singh
Arxiv: https://arxiv.org/abs/2306.07500
TLDR: Generative AI models continue to become more powerful. The launch of ChatGPT in November 2022 has ushered in a new era of AI. ChatGPT and other similar chatbots have a range of capabilities, from answering student homework questions to creating music and art. There are already concerns that humans may be replaced by chatbots for a variety of jobs. Because of the wide spectrum of data chatbots are built on, we know that they will have human errors and human biases built into
Repo: None