New submissions for Tue, 27 Jun 23 #383

Open
e-tornike opened this issue Jun 27, 2023 · 0 comments

Keyword: computational social science

Machine Learning and Consumer Data

Authors: Hannah H. Chang, Anirban Mukherjee
Arxiv: https://arxiv.org/abs/2306.14118
TLDR: The digital revolution has led to the digitization of human behavior, creating unprecedented opportunities to understand observable actions on an unmatched scale. Emerging phenomena such as crowdfunding and crowdsourcing have further illuminated consumer behavior while also introducing new behavioral patterns. However, the sheer volume and complexity of this data present significant challenges for marketing researchers and practitioners. Traditional methods used to analyze consumer data fall short in handling the breadth, precision, and scale of emerging data sources. To address this, computational methods have been developed to
Repo: None

Keyword: contrastive

Similarity Preserving Adversarial Graph Contrastive Learning

Authors: Yeonjun In, Kanghoon Yoon, Chanyoung Park
Arxiv: https://arxiv.org/abs/2306.13854
TLDR: Recent works demonstrate that GNN models are vulnerable to adversarial attacks, which refer to imperceptible perturbations on the graph structure and node features. Among various GNN models, graph contrastive learning (GCL) based methods specifically suffer from adversarial attacks due to their inherent design that highly depends on the self-supervision signals derived from the original graph, which however already contains noise when the graph is attacked. To achieve adversarial robustness against such attacks, existing methods
Repo: None

Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning

Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka
Arxiv: https://arxiv.org/abs/2306.13924
TLDR: Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding additional geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima force augmentations on input
Repo: None

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

Authors: Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han
Arxiv: https://arxiv.org/abs/2306.14003
TLDR: Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords). Existing studies on weakly supervised paper classification are less concerned with two challenges: (1) Papers should be classified into not only coarse-grained research topics but also fine-grained themes, and potentially into multiple themes, given a large and fine-Gr
Repo: None

Scribble-supervised Cell Segmentation Using Multiscale Contrastive Regularization

Authors: Hyun-Jic Oh, Kanggeun Lee, Won-Ki Jeong
Arxiv: https://arxiv.org/abs/2306.14136
TLDR: Current state-of-the-art supervised deep learning-based segmentation approaches have demonstrated superior performance in medical image segmentation tasks. However, such supervised approaches require fully annotated pixel-level ground-truth labels, which are labor-intensive and time-consuming to acquire. Recently, Scribble2Label (S2L) demonstrated that using only a handful of scribbles with self-supervised learning can generate accurate segmentation results without full annotation. However, owing to the
Repo: None

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

Authors: Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen
Arxiv: https://arxiv.org/abs/2306.14259
TLDR: Distinctive Image Captioning (DIC) -- generating distinctive captions that describe the unique details of a target image -- has received considerable attention over the last few years. A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantic-similar reference images, i.e., reference-based DIC (Ref-DIC). It aims to force the generated captions to distinguish between the target images and the reference image. To ensure
Repo: None

Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation

Authors: Qianying Liu, Xiao Gu, Paul Henderson, Fani Deligianni
Arxiv: https://arxiv.org/abs/2306.14293
TLDR: Semi-supervised learning has demonstrated great potential in medical image segmentation by utilizing knowledge from unlabeled data. However, most existing approaches do not explicitly capture high-level semantic relations between distant regions, which limits their performance. In this paper, we focus on representation learning for semi-supervised learning, by developing a novel Multi-Scale Cross Supervised Contrastive Learning (MCSC) framework, to segment structures in medical images. We jointly train CNN and Transformer models
Repo: None

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer

Authors: Jiaxin Deng, Dong Shen, Shiyao Wang, Xiangyu Wu, Fan Yang, Guorui Zhou, Gaofeng Meng
Arxiv: https://arxiv.org/abs/2306.14392
TLDR: In recent years, live streaming platforms have gained immense popularity as they allow users to broadcast their videos and interact in real-time with hosts and peers. Due to the dynamic changes of live content, accurate recommendation models are crucial for enhancing user experience. However, most previous works treat the live as a whole item and explore the Click-through-Rate (CTR) prediction framework on item-level, neglecting the dynamic changes that occur even within the same live room. In this
Repo: None

Contrastive Multi-view Framework for Customer Lifetime Value Prediction

Authors: Chuhan Wu, Jingjie Li, Qinglin Jia, Hong Zhu, Yuan Fang, Ruiming Tang
Arxiv: https://arxiv.org/abs/2306.14400
TLDR: Accurate customer lifetime value (LTV) prediction can help service providers optimize their marketing policies in customer-centric applications. However, the heavy sparsity of consumption events and the interference of data variance and noise obstruct LTV estimation. Many existing LTV prediction methods directly train a single-view LTV predictor on consumption samples, which may yield inaccurate and even biased knowledge extraction. In this paper, we propose a contrastive multi-view framework for LTV prediction, which is a plug
Repo: None

A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction

Authors: Chengliang Liu, Binhua Huang, Yiwen Liu, Yuanzhe Su, Ke Mai, Yupo Zhang, Zhengkun Yi, Xinyu Wu
Arxiv: https://arxiv.org/abs/2306.14437
TLDR: In this paper, we investigate the effectiveness of contrastive learning methods for predicting grasp outcomes in an unsupervised manner. By utilizing a publicly available dataset, we demonstrate that contrastive learning methods perform well on the task of grasp outcomes prediction. Specifically, the dynamic-dictionary-based method with the momentum updating technique achieves a satisfactory accuracy of 81.83% using data from one single tactile sensor, outperforming other unsupervised methods. Our results reveal the potential of contrastive learning
Repo: None

Histopathology Image Classification using Deep Manifold Contrastive Learning

Authors: Jing Wei Tan, Won-Ki Jeong
Arxiv: https://arxiv.org/abs/2306.14459
TLDR: Contrastive learning has gained popularity due to its robustness with good feature representation performance. However, cosine distance, the commonly used similarity metric in contrastive learning, is not well suited to represent the distance between two data points, especially on a nonlinear feature manifold. Inspired by manifold learning, we propose a novel extension of contrastive training that leverages geodesic distance between features as a similarity metric for histopathology whole slide image classification. To reduce the computational overhead in
Repo: None

Hard Sample Mining Enabled Contrastive Feature Learning for Wind Turbine Pitch System Fault Diagnosis

Authors: Zixuan Wang, Bo Qin, Mengxuan Li, Mark D. Butala, Haibo Wang, Peng Peng, Hongwei Wang
Arxiv: https://arxiv.org/abs/2306.14701
TLDR: The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple fault types in the pitch system poses challenges in accurately classifying these faults. This paper proposes a novel method based on hard sample mining-enabled contrastive feature learning (HSMCFL) to address this problem. The proposed method employs cosine similarity to identify hard samples and subsequently leverages contrastive feature learning to
Repo: None

Keyword: data augmentation

Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning

Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka
Arxiv: https://arxiv.org/abs/2306.13924
TLDR: Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding additional geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima force augmentations on input
Repo: None

An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing

Authors: Lester Phillip Violeta, Tomoki Toda
Arxiv: https://arxiv.org/abs/2306.13953
TLDR: Deaf or hard-of-hearing (DHH) speakers typically have atypical speech caused by deafness. With the growing support of speech-based devices and software applications, more work needs to be done to make these devices inclusive to everyone. To do so, we analyze the use of openly-available automatic speech recognition (ASR) tools with a DHH Japanese speaker dataset. As these out-of-the-box ASR models typically do not perform well
Repo: None

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

Authors: Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu
Arxiv: https://arxiv.org/abs/2306.13971
TLDR: While state-of-the-art NLP models have demonstrated excellent performance for aspect based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness. This is especially manifested as significant degradation in performance when faced with out-of-distribution data. Recent solutions that rely on counterfactually augmented datasets show promising results, but they are inherently limited because of the lack of access to explicit causal structure. In this paper, we present an alternative approach that relies
Repo: None

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks

Authors: Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun
Arxiv: https://arxiv.org/abs/2306.14040
TLDR: Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data, yet understanding and analyzing their behaviours remains a significant challenge. To this end, many efforts have been made to extract finite automata from RNNs, which are more amenable for analysis and explanation. However, existing approaches like exact learning and compositional approaches for model extraction have limitations in either scalability or precision. In this paper, we propose a novel framework of Weighted Finite Automata
Repo: None

Semi-supervised Object Detection: A Survey on Recent Research and Progress

Authors: Yanyang Wang, Zhaoxiang Liu, Shiguo Lian
Arxiv: https://arxiv.org/abs/2306.14106
TLDR: In recent years, deep learning has matured in the field of object detection, and most algorithms rely on supervised learning. However, labeling large amounts of data is costly in human resources, which leads to low efficiency and limits applicability. Semi-supervised object detection (SSOD) has received increasing attention due to its high research value and practicability. It is designed to learn information by using small amounts of labeled and large amounts of unl
Repo: None

A Web-based Mpox Skin Lesion Detection System Using State-of-the-art Deep Learning Models Considering Racial Diversity

Authors: Shams Nafisa Ali, Md. Tazuddin Ahmed, Tasnim Jahan, Joydip Paul, S. M. Sakeef Sani, Nawsabah Noor, Anzirun Nahar Asma, Taufiq Hasan
Arxiv: https://arxiv.org/abs/2306.14169
TLDR: The recent 'Mpox' outbreak, formerly known as 'Monkeypox', has become a significant public health concern and has spread to over 110 countries globally. The challenge of clinically diagnosing mpox early on is due, in part, to its similarity to other types of rashes. Computer-aided screening tools have been proven valuable in cases where Polymerase Chain Reaction (PCR) based diagnosis is not immediately available. Deep learning methods are powerful in learning complex data representations
Repo: None

Few-Shot Continual Learning via Flat-to-Wide Approaches

Authors: Muhammad Anwar Ma'sum, Mahardhika Pratama, Lin Liu, Edwin Lughofer, Habibullah, Ryszard Kowalczyk
Arxiv: https://arxiv.org/abs/2306.14369
TLDR: Existing approaches on continual learning call for a lot of samples in their training processes. Such approaches are impractical for many real-world problems having limited samples because of the overfitting problem. This paper proposes a few-shot continual learning approach, termed FLat-tO-WidE AppRoach (FLOWER), where a flat-to-wide learning process finding the flat-wide minima is proposed to address the catastrophic forgetting problem. The issue of data scarcity is overcome
Repo: https://github.com/anwarmaxsum/flower

Pseudo-Trilateral Adversarial Training for Domain Adaptive Traversability Prediction

Authors: Zheng Chen, Durgakant Pushp, Jason M. Gregory, Lantao Liu
Arxiv: https://arxiv.org/abs/2306.14370
TLDR: Traversability prediction is a fundamental perception capability for autonomous navigation. Deep neural networks (DNNs) have been widely used to predict traversability during the last decade. The performance of DNNs is significantly boosted by exploiting a large amount of data. However, the diversity of data in different domains imposes significant gaps in the prediction performance. In this work, we make efforts to reduce the gaps by proposing a novel pseudo-trilateral adversarial model that adopts a coarse-to
Repo: None

Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

Authors: Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung
Arxiv: https://arxiv.org/abs/2306.14517
TLDR: Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese; and 2 different age groups--adults and the elderly. To conduct the experiment, we develop an English-Mandarin
Repo: None

Accelerating Molecular Graph Neural Networks via Knowledge Distillation

Authors: Filip Ekström Kelvinius, Dimitar Georgiev, Artur Petrov Toshev, Johannes Gasteiger
Arxiv: https://arxiv.org/abs/2306.14818
TLDR: Recent advances in graph neural networks (GNNs) have allowed molecular simulations with accuracy on par with conventional gold-standard methods at a fraction of the computational cost. Nonetheless, as the field has been progressing to bigger and more complex architectures, state-of-the-art GNNs have become largely prohibitive for many large-scale applications. In this paper, we, for the first time, explore the utility of knowledge distillation (KD) for accelerating molecular GNN
Repo: None

Keyword: knowledge graph

DEKGCI: A double-sided recommendation model for integrating knowledge graph and user-item interaction graph

Authors: Yajing Yang, Zeyu Zeng, Mao Chen, Ruirui Shang
Arxiv: https://arxiv.org/abs/2306.13837
TLDR: Both knowledge graphs and user-item interaction graphs are frequently used in recommender systems due to their ability to provide rich information for modeling users and items. However, existing studies often focused on one of these sources (either the knowledge graph or the user-item interaction graph), resulting in underutilization of the benefits that can be obtained by integrating both sources of information. In this paper, we propose DEKGCI, a novel double-sided recommendation model. In DEKGCI
Repo: None

IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations

Authors: Yuxin Zi, Kaushik Roy, Vignesh Narayanan, Manas Gaur, Amit Sheth
Arxiv: https://arxiv.org/abs/2306.13865
TLDR: Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Distributed semantics capture common statistical patterns among language tokens (words, phrases, and sentences) from large amounts of data. LLMs perform exceedingly well across General Language Understanding Evaluation (GLUE) tasks designed to test a model's understanding of the meanings of the input tokens. However, recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs when processing inputs that
Repo: None

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

Authors: Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
Arxiv: https://arxiv.org/abs/2306.14470
TLDR: Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding. By utilizing an existing dataset such as Korean CommonGen, language generation models can learn commonsense arguments specific to the Korean language. However, language models often fail to consider the relationships between concepts and the deep knowledge inherent to concepts. To address these limitations, we propose a method to utilize the Korean knowledge graph data for text generation. Our experimental result shows that the proposed method
Repo: None

TransERR: Translation-based Knowledge Graph Completion via Efficient Relation Rotation

Authors: Jiang Li, Xiangdong Su
Arxiv: https://arxiv.org/abs/2306.14580
TLDR: This paper presents a translation-based knowledge graph completion method via efficient relation rotation (TransERR), a straightforward yet effective alternative to traditional translation-based learning methods. TransERR encodes knowledge graphs in the hypercomplex-valued space, thus enabling it to possess a higher degree of translation freedom in mining latent information between the head and tail entities. To further minimize the translation distance, TransERR adaptively rotates the head entity and the tail entity with their corresponding unit qu
Repo: https://github.com/dellixx/transerr

Keyword: legal

Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise?

Authors: Jaromir Savelka, Kevin D. Ashley, Morgan A Gray, Hannes Westermann, Huihui Xu
Arxiv: https://arxiv.org/abs/2306.13906
TLDR: We evaluated the capability of generative pre-trained transformers (GPT-4) in analysis of textual data in tasks that require highly specialized domain expertise. Specifically, we focused on the task of analyzing court opinions to interpret legal concepts. We found that GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators. We observed that, with a relatively minor decrease in performance, GPT-4 can perform batch predictions leading to significant
Repo: None

On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions

Authors: Reza Fayyazi, Shanchieh Jay Yang
Arxiv: https://arxiv.org/abs/2306.14062
TLDR: The volume, variety, and velocity of change in vulnerabilities and exploits have made incident threat analysis challenging with human expertise and experience alone. The MITRE ATT&CK framework employs Tactics, Techniques, and Procedures (TTPs) to describe how and why attackers exploit vulnerabilities. However, a TTP description written by one security professional can be interpreted very differently by another, leading to confusion in cybersecurity operations or even business, policy, and legal decisions. Meanwhile, advancements in AI have led
Repo: None

LiResolver: License Incompatibility Resolution for Open Source Software

Authors: Sihan Xu, Ya Gao, Lingling Fan, Linyu Li, Xiangrui Cai, Zheli Liu
Arxiv: https://arxiv.org/abs/2306.14675
TLDR: Open source software (OSS) licenses regulate the conditions under which OSS can be legally reused, distributed, and modified. However, a common issue arises when incorporating third-party OSS accompanied with licenses, i.e., license incompatibility, which occurs when multiple licenses exist in one project and there are conflicts between them. Despite being problematic, fixing license incompatibility issues requires substantial effort due to the lack of license understanding and complex package dependency. In this paper, we propose LiRes
Repo: None

Keyword: mixup

Pseudo-Trilateral Adversarial Training for Domain Adaptive Traversability Prediction

Authors: Zheng Chen, Durgakant Pushp, Jason M. Gregory, Lantao Liu
Arxiv: https://arxiv.org/abs/2306.14370
TLDR: Traversability prediction is a fundamental perception capability for autonomous navigation. Deep neural networks (DNNs) have been widely used to predict traversability during the last decade. The performance of DNNs is significantly boosted by exploiting a large amount of data. However, the diversity of data in different domains imposes significant gaps in the prediction performance. In this work, we make efforts to reduce the gaps by proposing a novel pseudo-trilateral adversarial model that adopts a coarse-to
Repo: None

A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling

Authors: Ye Wang, Huazheng Pan, Tao Zhang, Wen Wu, Wenxin Hu
Arxiv: https://arxiv.org/abs/2306.14806
TLDR: The goal of document-level relation extraction (RE) is to identify relations between entities that span multiple sentences. Recently, incomplete labeling in document-level RE has received increasing attention, and some studies have used methods such as positive-unlabeled learning to tackle this issue, but there is still a lot of room for improvement. Motivated by this, we propose a positive-augmentation and positive-mixup positive-unlabeled metric learning framework (P3M).
Repo: None

Keyword: multi-task

MIRACLE: Multi-task Learning based Interpretable Regulation of Autoimmune Diseases through Common Latent Epigenetics

Authors: Pengcheng Xu, Jinpu Cai, Yulin Gao, Ziqi Rong, Hongyi Xin
Arxiv: https://arxiv.org/abs/2306.13866
TLDR: DNA methylation is a crucial regulator of gene transcription and has been linked to various diseases, including autoimmune diseases and cancers. However, diagnostics based on DNA methylation face challenges due to large feature sets and small sample sizes, resulting in overfitting and suboptimal performance. To address these issues, we propose MIRACLE, a novel interpretable neural network that leverages autoencoder-based multi-task learning to integrate multiple datasets and jointly identify common patterns in DNA
Repo: None

Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

Authors: Xiao Ma, Swaroop Mishra, Ahmad Beirami, Alex Beutel, Jilin Chen
Arxiv: https://arxiv.org/abs/2306.14308
TLDR: Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Multi-task Language Understanding) is among the worst performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and
Repo: None

Multi-task Item-attribute Graph Pre-training for Strict Cold-start Item Recommendation

Authors: Yuwei Cao, Liangwei Yang, Chen Wang, Zhiwei Liu, Hao Peng, Chenyu You, Philip S. Yu
Arxiv: https://arxiv.org/abs/2306.14462
TLDR: Recommendation systems suffer in the strict cold-start (SCS) scenario, where the user-item interactions are entirely unavailable. The ID-based approaches completely fail to work. Cold-start recommenders, on the other hand, leverage item contents to map the new items to the existing ones. However, the existing SCS recommenders explore item contents in coarse-grained manners that introduce noise or information loss. Moreover, informative data sources other than item contents, such as users
Repo: https://github.com/yuweicao-uic/coldgpt

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer

Authors: Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, Ping Luo
Arxiv: https://arxiv.org/abs/2306.14744
TLDR: Placement is a critical step in modern chip design, aiming to determine the positions of circuit modules on the chip canvas. Recent works have shown that reinforcement learning (RL) can improve human performance in chip placement. However, such an RL-based approach suffers from long training time and low transfer ability in unseen chip circuits. To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.
Repo: None

Composing Parameter-Efficient Modules with Arithmetic Operations

Authors: Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He
Arxiv: https://arxiv.org/abs/2306.14870
TLDR: As an efficient alternative to conventional full finetuning, parameter-efficient finetuning (PEFT) is becoming the prevailing method to adapt pretrained language models. In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module
Repo: https://github.com/sjtu-lit/pem_composition

Keyword: paraphrase

Math Word Problem Solving by Generating Linguistic Variants of Problem Statements

Authors: Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan
Arxiv: https://arxiv.org/abs/2306.13899
TLDR: The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP), a crucial stride towards general AI. These existing models are susceptible to dependency on shallow heuristics and spurious correlations to derive the solution expressions. In order to ameliorate this issue, in this paper, we propose a framework for MWP solvers
Repo: None

Keyword: plagiarism

FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection

Authors: Chensen Huang, Guibo Zhu, Guojing Ge, Taihao Li, Jinqiao Wang
Arxiv: https://arxiv.org/abs/2306.14168
TLDR: Binary code similarity detection (BCSD) has various applications, including but not limited to vulnerability detection, plagiarism detection, and malware detection. Previous research efforts mainly focus on transforming binary code to assembly code strings using reverse compilation and then using pre-trained deep learning models with large parameters to obtain feature representation vectors of binary code. While these models have proven to be effective in representing binary code, their large parameter size leads to considerable computational expenses during both training and inference. In this paper
Repo: None

Keyword: robustness

Improving Panoptic Segmentation for Nighttime or Low-Illumination Urban Driving Scenes

Authors: Ankur Chrungoo
Arxiv: https://arxiv.org/abs/2306.13725
TLDR: Autonomous vehicles and driving systems use scene parsing as an essential tool to understand the surrounding environment. Panoptic segmentation is a state-of-the-art technique which proves to be pivotal in this use case. Deep learning-based architectures have been utilized for effective and efficient panoptic segmentation in recent times. However, when it comes to adverse conditions like dark scenes with poor illumination or nighttime images, existing methods perform poorly in comparison to daytime images. One of the
Repo: None

CeBed: A Benchmark for Deep Data-Driven OFDM Channel Estimation

Authors: Amal Feriani, Di Wu, Steve Liu, Greg Dudek
Arxiv: https://arxiv.org/abs/2306.13761
TLDR: Deep learning has been extensively used in wireless communication problems, including channel estimation. Although several data-driven approaches exist, a fair and realistic comparison between them is difficult due to inconsistencies in the experimental conditions and the lack of a standardized experimental design. In addition, the performance of data-driven approaches is often compared based on empirical analysis. The lack of reproducibility and availability of standardized evaluation tools (e.g., datasets, codebases) hinder the development
Repo: None

Similarity Preserving Adversarial Graph Contrastive Learning

Authors: Yeonjun In, Kanghoon Yoon, Chanyoung Park
Arxiv: https://arxiv.org/abs/2306.13854
TLDR: Recent works demonstrate that GNN models are vulnerable to adversarial attacks, which refer to imperceptible perturbations on the graph structure and node features. Among various GNN models, graph contrastive learning (GCL) based methods specifically suffer from adversarial attacks due to their inherent design that highly depends on the self-supervision signals derived from the original graph, which however already contains noise when the graph is attacked. To achieve adversarial robustness against such attacks, existing methods
Repo: None

Math Word Problem Solving by Generating Linguistic Variants of Problem Statements

Authors: Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan
Arxiv: https://arxiv.org/abs/2306.13899
TLDR: The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP), a crucial stride towards general AI. These existing models are susceptible to dependency on shallow heuristics and spurious correlations to derive the solution expressions. In order to ameliorate this issue, in this paper, we propose a framework for MWP solvers
Repo: None

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

Authors: Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu
Arxiv: https://arxiv.org/abs/2306.13971
TLDR: While state-of-the-art NLP models have demonstrated excellent performance for aspect-based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness. This is especially manifested as significant degradation in performance when faced with out-of-distribution data. Recent solutions that rely on counterfactually augmented datasets show promising results, but they are inherently limited because of the lack of access to explicit causal structure. In this paper, we present an alternative approach that relies
Repo: None

Individualized Dosing Dynamics via Neural Eigen Decomposition

Authors: Stav Belogolovsky, Ido Greenberg, Danny Eytan, Shie Mannor
Arxiv: https://arxiv.org/abs/2306.14020
TLDR: Dosing models often use differential equations to model biological dynamics. Neural differential equations in particular can learn to predict the derivative of a process, which permits predictions at irregular points of time. However, this temporal flexibility often comes with a high sensitivity to noise, whereas medical problems often present high noise and limited data. Moreover, medical dosing models must generalize reliably over individual patients and changing treatment policies. To address these challenges, we introduce the Neural Eigen Stochastic Differential Equation
Repo: None

Machine Learning needs its own Randomness Standard: Randomised Smoothing and PRNG-based attacks

Authors: Pranav Dahiya, Ilia Shumailov, Ross Anderson
Arxiv: https://arxiv.org/abs/2306.14043
TLDR: Randomness supports many critical functions in the field of machine learning (ML) including optimisation, data selection, privacy, and security. ML systems outsource the task of generating or harvesting randomness to the compiler, the cloud service provider or elsewhere in the toolchain. Yet there is a long history of attackers exploiting poor randomness, or even creating it -- as when the NSA put backdoors in random number generators to break cryptography. In this paper we consider whether attackers can compromise an
Repo: None

SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning

Authors: Pu Ren, N. Benjamin Erichson, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney
Arxiv: https://arxiv.org/abs/2306.14070
TLDR: Super-Resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details, and improving the overall quality and fidelity of the data representation. There is growing interest in applying SR methods to complex spatiotemporal systems within the Scientific Machine Learning (SciML) community, with the hope of accelerating numerical simulations and/or improving forecasts in weather, climate, and related areas. However, the lack of standardized benchmark datasets for comparing and validating SR methods hind
Repo: None

Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training

Authors: Fan Liu, Weijia Zhang, Hao Liu
Arxiv: https://arxiv.org/abs/2306.14126
TLDR: Machine learning-based forecasting models are commonly used in Intelligent Transportation Systems (ITS) to predict traffic patterns and provide city-wide services. However, most of the existing models are susceptible to adversarial attacks, which can lead to inaccurate predictions and negative consequences such as congestion and delays. Therefore, improving the adversarial robustness of these models is crucial for ITS. In this paper, we propose a novel framework for incorporating adversarial training into spatiotemporal traffic forecasting tasks. We demonstrate
Repo: None

Provably Convergent Policy Optimization via Metric-aware Trust Region Methods

Authors: Jun Song, Niao He, Lijun Ding, Chaoyue Zhao
Arxiv: https://arxiv.org/abs/2306.14133
TLDR: Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning. In this paper, we exploit more flexible metrics and examine two natural extensions of policy optimization with Wasserstein and Sinkhorn trust regions, namely Wasserstein policy optimization (WPO) and Sinkhorn policy optimization (SPO). Instead of restricting the policy to a parametric distribution class, we directly optimize the policy distribution and derive their closed
Repo: None

BotanicGarden: A high-quality and large-scale robot navigation dataset in challenging natural environments

Authors: Yuanzhi Liu, Yujia Fu, Minghui Qin, Yufeng Xu, Baoxin Xu, Fengdong Chen, Bart Goossens, Hongwei Yu, Chun Liu, Long Chen, Wei Tao, Hui Zhao
Arxiv: https://arxiv.org/abs/2306.14137
TLDR: The rapid developments of mobile robotics and autonomous navigation over the years are largely empowered by public datasets for testing and upgrading, such as SLAM and localization tasks. Impressive demos and benchmark results have arisen, indicating the establishment of a mature technical framework. However, from the viewpoint of real-world deployments, there are still critical defects of robustness in challenging environments, especially in large-scale, GNSS-denied, textural-monotonous, and unstructured scenarios
Repo: None

BiFF: Bi-level Future Fusion with Polyline-based Coordinate for Interactive Trajectory Prediction

Authors: Yiyao Zhu, Di Luan, Shaojie Shen
Arxiv: https://arxiv.org/abs/2306.14161
TLDR: Predicting future trajectories of surrounding agents is essential for safety-critical autonomous driving. Most existing work focuses on predicting marginal trajectories for each agent independently. However, predicting joint trajectories has rarely been explored. In this work, we propose Bi-level Future Fusion (BiFF) to explicitly capture future interactions between interactive agents. Concretely, BiFF fuses the high-level future intentions followed by low-level behaviors. Then the polyline-based coordinate
Repo: None

On Evaluating the Adversarial Robustness of Semantic Segmentation Models

Authors: Levente Halmosi, Mark Jelasity
Arxiv: https://arxiv.org/abs/2306.14217
TLDR: Achieving robustness against adversarial input perturbation is an important and intriguing problem in machine learning. In the area of semantic image segmentation, a number of adversarial training approaches have been proposed as a defense against adversarial perturbations, but the methodology of evaluating the robustness of the models is still lacking, compared to image classification. Here, we demonstrate that, just like in image classification, it is important to evaluate the models over several different and hard attacks
Repo: None

A Spectral Perspective towards Understanding and Improving Adversarial Robustness

Authors: Binxiao Huang, Rui Lin, Chaofan Tao, Ngai Wong
Arxiv: https://arxiv.org/abs/2306.14262
TLDR: Deep neural networks (DNNs) are incredibly vulnerable to crafted, imperceptible adversarial perturbations. While adversarial training (AT) has proven to be an effective defense approach, the AT mechanism for robustness improvement is not fully understood. This work investigates AT from a spectral perspective, adding new insights to the design of effective defenses. In particular, we show that AT induces the deep model to focus more on the low-frequency region, which retains the shape-biased
Repo: None

Enhancing Adversarial Training via Reweighting Optimization Trajectory

Authors: Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy
Arxiv: https://arxiv.org/abs/2306.14275
TLDR: Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial learning suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In
Repo: None

Adaptive Sharpness-Aware Pruning for Robust Sparse Networks

Authors: Anna Bair, Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose Alvarez
Arxiv: https://arxiv.org/abs/2306.14306
TLDR: Robustness and compactness are two essential components of deep learning models that are deployed in the real world. The seemingly conflicting aims of (i) generalization across domains as in robustness, and (ii) specificity to one domain as in compression, are why the overall design goal of achieving robust compact models, despite being highly important, is still a challenging open problem. We introduce Adaptive Sharpness-Aware Pruning, or AdaSAP, a method that yields robust
Repo: None

Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

Authors: Jungbae Park, Seungtaek Choi
Arxiv: https://arxiv.org/abs/2306.14310
TLDR: Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With self-supervised learning advancements, end-to-end speech scoring approaches have exhibited promising results. However, this study highlights the significant decrease in the performance of speech scoring systems in new question contexts, thereby identifying this as a cold start problem in terms of items. With the finding of cold-start phenomena, this paper seeks to alleviate the problem by the following methods: 1)
Repo: None

A Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing

Authors: Chih-Jung Chang, Yaw-Chern Lee, Shih-Hsuan Yao, Min-Hung Chen, Chien-Yi Wang, Shang-Hong Lai, Trista Pei-Chun Chen
Arxiv: https://arxiv.org/abs/2306.14313
TLDR: Face anti-spoofing (FAS) is indispensable for a face recognition system. Many texture-driven countermeasures were developed against presentation attacks (PAs), but the performance against unseen domains or unseen spoofing types is still unsatisfactory. Instead of exhaustively collecting all the spoofing variations and making binary decisions of live/spoof, we offer a new perspective on the FAS task to distinguish between normal and abnormal movements of live and spoof presentations. We propose Geometry-
Repo: None

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

Authors: Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev
Arxiv: https://arxiv.org/abs/2306.14321
TLDR: Despite significant progress having been made in question answering on tabular data (Table QA), it's unclear whether, and to what extent, existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. We propose a benchmark called RobuT, which builds upon existing Table QA datasets (WTQ, WikiSQL-Weak, and SQA) and includes a set of human-annotated adversarial
Repo: https://github.com/yilunzhao/robut

Computational Asymmetries in Robust Classification

Authors: Samuele Marro, Michele Lombardi
Arxiv: https://arxiv.org/abs/2306.14326
TLDR: In the context of adversarial robustness, we make three strongly related contributions. First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma^2_P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not
Repo: https://github.com/samuelemarro/counter-attack

Contrastive Multi-view Framework for Customer Lifetime Value Prediction

Authors: Chuhan Wu, Jingjie Li, Qinglin Jia, Hong Zhu, Yuan Fang, Ruiming Tang
Arxiv: https://arxiv.org/abs/2306.14400
TLDR: Accurate customer lifetime value (LTV) prediction can help service providers optimize their marketing policies in customer-centric applications. However, the heavy sparsity of consumption events and the interference of data variance and noise obstruct LTV estimation. Many existing LTV prediction methods directly train a single-view LTV predictor on consumption samples, which may yield inaccurate and even biased knowledge extraction. In this paper, we propose a contrastive multi-view framework for LTV prediction, which is a plug
Repo: None

RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools

Authors: Haochen Shi, Huazhe Xu, Samuel Clarke, Yunzhu Li, Jiajun Wu
Arxiv: https://arxiv.org/abs/2306.14447
TLDR: Humans excel in complex long-horizon soft body manipulation tasks via flexible tool use: bread baking requires a knife to slice the dough and a rolling pin to flatten it. Often regarded as a hallmark of human cognition, tool use in autonomous robots remains limited due to challenges in understanding tool-object interactions. Here we develop an intelligent robotic system, RoboCook, which perceives, models, and manipulates elasto-plastic objects with various tools. RoboCook uses point
Repo: None

Histopathology Image Classification using Deep Manifold Contrastive Learning

Authors: Jing Wei Tan, Won-Ki Jeong
Arxiv: https://arxiv.org/abs/2306.14459
TLDR: Contrastive learning has gained popularity due to its robustness with good feature representation performance. However, cosine distance, the commonly used similarity metric in contrastive learning, is not well suited to represent the distance between two data points, especially on a nonlinear feature manifold. Inspired by manifold learning, we propose a novel extension of contrastive training that leverages geodesic distance between features as a similarity metric for histopathology whole slide image classification. To reduce the computational overhead in
Repo: None

Exploring the Robustness of Large Language Models for Solving Programming Problems

Authors: Atsushi Shirafuji, Yutaka Watanobe, Takumi Ito, Makoto Morishita, Yuki Nakamura, Yusuke Oda, Jun Suzuki
Arxiv: https://arxiv.org/abs/2306.14583
TLDR: Using large language models (LLMs) for source code has recently gained attention. LLMs, such as Transformer-based models like Codex and ChatGPT, have been shown to be highly capable of solving a wide range of programming problems. However, the extent to which LLMs understand problem descriptions and generate programs accordingly or just retrieve source code from the most relevant problem in training data based on superficial cues has not been discovered yet. To explore this research question, we conduct experiments to
Repo: None

A Closed-Loop Bin Picking System for Entangled Wire Harnesses using Bimanual and Dynamic Manipulation

Authors: Xinyi Zhang, Yukiyasu Domae, Weiwei Wan, Kensuke Harada
Arxiv: https://arxiv.org/abs/2306.14595
TLDR: This paper addresses the challenge of industrial bin picking using entangled wire harnesses. Wire harnesses are essential in manufacturing but pose challenges in automation due to their complex geometries and propensity for entanglement. Our previous work tackled this issue by proposing a quasi-static pulling motion to separate the entangled wire strands. However, it still lacks sufficiency and generalization to various shapes and structures. In this paper, we deploy a dual-arm robot that can grasp, extract and disent
Repo: None

A structure and asymptotic preserving scheme for the Vlasov-Poisson-Fokker-Planck model

Authors: Alain Blaustein (UT3), Francis Filbet (UT3)
Arxiv: https://arxiv.org/abs/2306.14605
TLDR: We propose a numerical method for the Vlasov-Poisson-Fokker-Planck model written as a hyperbolic system thanks to a spectral decomposition in the basis of Hermite functions with respect to the velocity variable and a structure preserving finite volume scheme for the space variable. On the one hand, we show that this scheme naturally preserves both stationary solutions and linearized free-energy estimates for the nonlinearity of the linearized space variable, and that it preserves
Repo: None

The race to robustness: exploiting fragile models for urban camouflage and the imperative for machine learning security

Authors: Harriet Farlow, Matthew Garratt, Gavin Mount, Tim Lynar
Arxiv: https://arxiv.org/abs/2306.14609
TLDR: Adversarial Machine Learning (AML) represents the ability to disrupt Machine Learning (ML) algorithms through a range of methods that broadly exploit the architecture of deep learning optimisation. This paper presents Distributed Adversarial Regions (DAR), a novel method that implements distributed instantiations of computer vision-based AML attack methods that may be used to disguise objects from image recognition in both white and black box settings. We consider the context of object detection models used in urban environments,
Repo: None

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

Authors: Mohit Vaishnav
Arxiv: https://arxiv.org/abs/2306.14650
TLDR: We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying a synthetic visual reasoning test (SVRT), we refine the taxonomy of reasoning tasks. With ResNet50, we enhance feature maps using feature-based and spatial attention, achieving efficient solving of challenging visual reasoning tasks in SVRT. Our findings contribute to understanding the attentional needs of SVRTs, as well as the
Repo: None

A denoised Mean Teacher for domain adaptive point cloud registration

Authors: Alexander Bigalke, Mattias P. Heinrich
Arxiv: https://arxiv.org/abs/2306.14749
TLDR: Point cloud-based medical registration promises increased computational efficiency, robustness to intensity shifts, and anonymity preservation but is limited by the inefficacy of unsupervised learning with similarity metrics. Supervised training on synthetic deformations is an alternative but, in turn, suffers from the domain gap to the real domain. In this work, we aim to tackle this gap through domain adaptation. Self-training with the Mean Teacher is an established approach to this problem but is impaired by the inherent noise
Repo: https://github.com/multimodallearning/denoised_mt_pcd_reg

A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling

Authors: Ye Wang, Huazheng Pan, Tao Zhang, Wen Wu, Wenxin Hu
Arxiv: https://arxiv.org/abs/2306.14806
TLDR: The goal of document-level relation extraction (RE) is to identify relations between entities that span multiple sentences. Recently, incomplete labeling in document-level RE has received increasing attention, and some studies have used methods such as positive-unlabeled learning to tackle this issue, but there is still a lot of room for improvement. Motivated by this, we propose a positive-augmentation and positive-mixup positive-unlabeled metric learning framework (P3M).
Repo: None

Keyword: semantic similarity

Full Automation of Goal-driven LLM Dialog Threads with And-Or Recursors and Refiner Oracles

Authors: Paul Tarau
Arxiv: https://arxiv.org/abs/2306.14077
TLDR: We automate deep step-by-step reasoning in an LLM dialog thread by recursively exploring alternatives (OR-nodes) and expanding details (AND-nodes) up to a given depth. Starting from a single succinct task-specific initiator we steer the automated dialog thread to stay focussed on the task by synthesizing a prompt that summarizes the depth-first steps taken so far. Our algorithm is derived from a simple recursive descent implementation of a Horn Clause interpreter, except
Repo: None

Keyword: simplification

logLTN: Differentiable Fuzzy Logic in the Logarithm Space

Authors: Samy Badreddine, Luciano Serafini, Michael Spranger
Arxiv: https://arxiv.org/abs/2306.14546
TLDR: The AI community is increasingly focused on merging logic with deep learning to create Neuro-Symbolic (NeSy) paradigms and assist neural approaches with symbolic knowledge. A significant trend in the literature involves integrating axioms and facts in loss functions by grounding logical symbols with neural networks and operators with fuzzy semantics. Logic Tensor Networks (LTN) is one of the main representatives in this category, known for its simplicity, efficiency, and versatility. However, it has been previously
Repo: https://github.com/sbadredd/logltn-experiments

Keyword: summarization

Fusing Multimodal Signals on Hyper-complex Space for Extreme Abstractive Text Summarization (TL;DR) of Scientific Contents

Authors: Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty
Arxiv: https://arxiv.org/abs/2306.13968
TLDR: The realm of scientific text summarization has experienced remarkable progress due to the availability of annotated brief summaries and ample data. However, the utilization of multiple input modalities, such as videos and audio, has yet to be thoroughly explored. At present, scientific multimodal-input-based text summarization systems tend to employ longer target summaries like abstracts, leading to an underwhelming performance in the task of text summarization. In this paper, we deal with a novel
Repo: None

Vietnamese multi-document summary using subgraph selection approach -- VLSP 2022 AbMuSu Shared Task

Authors: Huu-Thin Nguyen, Tam Doan Thanh, Cam-Van Thi Nguyen
Arxiv: https://arxiv.org/abs/2306.14827
TLDR: Document summarization is a task to generate a fluent, condensed summary for a document, and a cluster summary for the same document, which contains important information. A cluster of documents serves as the input for multi-document summarization (MDS), while the cluster summary serves as an output. In this paper, we focus on transforming the extractive MDS problem into subgraph selection. Approaching the problem in the form of graphs helps to capture simultaneously the relationship between sentences in the same
Repo: None

Keyword: text generation

Large Language Models as Sous Chefs: Revising Recipes with GPT-3

Authors: Alyssa Hwang, Bryan Li, Zhaoyi Hou, Dan Roth
Arxiv: https://arxiv.org/abs/2306.13986
TLDR: With their remarkably improved text generation and prompting capabilities, large language models can adapt existing written information into forms that are easier to use and understand. In our work, we focus on recipes as an example of complex, diverse, and widely used instructions. We develop a prompt grounded in the original recipe and ingredients list that breaks recipes down into simpler steps. We apply this prompt to recipes from various world cuisines, and experiment with several large language models (LLMs), finding best results with
Repo: None

Weakly Supervised Scene Text Generation for Low-resource Languages

Authors: Yangchen Xie, Xinyuan Chen, Hongjian Zhan, Palaiahankote Shivakum
Arxiv: https://arxiv.org/abs/2306.14269
TLDR: A large number of annotated training images is crucial for training successful scene text recognition models. However, collecting sufficient datasets can be a labor-intensive and costly process, particularly for low-resource languages. To address this challenge, auto-generating text data has shown promise in alleviating the problem. Unfortunately, existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages. In this paper, we propose a novel weak
Repo: None

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

Authors: Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
Arxiv: https://arxiv.org/abs/2306.14470
TLDR: Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding. By utilizing an existing dataset such as Korean CommonGen, language generation models can learn commonsense arguments specific to the Korean language. However, language models often fail to consider the relationships between concepts and the deep knowledge inherent to concepts. To address these limitations, we propose a method to utilize the Korean knowledge graph data for text generation. Our experimental result shows that the proposed method
Repo: None

FunQA: Towards Surprising Video Comprehension

Authors: Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu
Arxiv: https://arxiv.org/abs/2306.14899
TLDR: Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations depicted in these videos. We introduce FunQA, a challenging video question answering (QA) dataset specifically designed to evaluate and enhance the depth of video reasoning based on counter-intuitive and fun videos. Unlike most video
Repo: https://github.com/jingkang50/funqa