New submissions for Tue, 27 Jun 23 #383
Labels
abstract meaning representation
argument mining
citation context analysis
computational social science
contrastive
cross-language information retrieval
cross-lingual information retrieval
data augmentation
extreme multi-label
knowledge discovery
knowledge graph
legal text
legal
mixup
multi-task
paraphrase
passage generation
plagiarism
robustness
scholarly document processing
scholarly
semantic similarity
similarity measure
simplification
summarization
text generation
Keyword: computational social science
Machine Learning and Consumer Data
Authors: Hannah H. Chang, Anirban Mukherjee
Arxiv: https://arxiv.org/abs/2306.14118
TLDR: The digital revolution has led to the digitization of human behavior, creating unprecedented opportunities to understand observable actions on an unmatched scale. Emerging phenomena such as crowdfunding and crowdsourcing have further illuminated consumer behavior while also introducing new behavioral patterns. However, the sheer volume and complexity of this data present significant challenges for marketing researchers and practitioners. Traditional methods used to analyze consumer data fall short in handling the breadth, precision, and scale of emerging data sources. To address this, computational methods have been developed to
Repo: None
Keyword: contrastive
Similarity Preserving Adversarial Graph Contrastive Learning
Authors: Yeonjun In, Kanghoon Yoon, Chanyoung Park
Arxiv: https://arxiv.org/abs/2306.13854
TLDR: Recent works demonstrate that GNN models are vulnerable to adversarial attacks, which refer to imperceptible perturbations on the graph structure and node features. Among various GNN models, graph contrastive learning (GCL) based methods specifically suffer from adversarial attacks due to their inherent design, which depends heavily on the self-supervision signals derived from the original graph; that graph, however, already contains noise when it is attacked. To achieve adversarial robustness against such attacks, existing methods
Repo: None
Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning
Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka
Arxiv: https://arxiv.org/abs/2306.13924
TLDR: Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima force augmentations on input
Repo: None
Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers
Authors: Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han
Arxiv: https://arxiv.org/abs/2306.14003
TLDR: Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords). Existing studies on weakly supervised paper classification are less concerned with two challenges: (1) Papers should be classified into not only coarse-grained research topics but also fine-grained themes, and potentially into multiple themes, given a large and fine-gr
Repo: None
Scribble-supervised Cell Segmentation Using Multiscale Contrastive Regularization
Authors: Hyun-Jic Oh, Kanggeun Lee, Won-Ki Jeong
Arxiv: https://arxiv.org/abs/2306.14136
TLDR: Current state-of-the-art supervised deep learning-based segmentation approaches have demonstrated superior performance in medical image segmentation tasks. However, such supervised approaches require fully annotated pixel-level ground-truth labels, which are labor-intensive and time-consuming to acquire. Recently, Scribble2Label (S2L) demonstrated that using only a handful of scribbles with self-supervised learning can generate accurate segmentation results without full annotation. However, owing to the
Repo: None
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Authors: Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen
Arxiv: https://arxiv.org/abs/2306.14259
TLDR: Distinctive Image Captioning (DIC) -- generating distinctive captions that describe the unique details of a target image -- has received considerable attention over the last few years. A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC). It aims to force the generated captions to distinguish between the target images and the reference image. To ensure
Repo: None
Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation
Authors: Qianying Liu, Xiao Gu, Paul Henderson, Fani Deligianni
Arxiv: https://arxiv.org/abs/2306.14293
TLDR: Semi-supervised learning has demonstrated great potential in medical image segmentation by utilizing knowledge from unlabeled data. However, most existing approaches do not explicitly capture high-level semantic relations between distant regions, which limits their performance. In this paper, we focus on representation learning for semi-supervised learning, by developing a novel Multi-Scale Cross Supervised Contrastive Learning (MCSC) framework, to segment structures in medical images. We jointly train CNN and Transformer models
Repo: None
ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer
Authors: Jiaxin Deng, Dong Shen, Shiyao Wang, Xiangyu Wu, Fan Yang, Guorui Zhou, Gaofeng Meng
Arxiv: https://arxiv.org/abs/2306.14392
TLDR: In recent years, live streaming platforms have gained immense popularity as they allow users to broadcast their videos and interact in real-time with hosts and peers. Due to the dynamic changes of live content, accurate recommendation models are crucial for enhancing user experience. However, most previous works treat the live stream as a whole item and explore the Click-Through Rate (CTR) prediction framework at the item level, neglecting the dynamic changes that occur even within the same live room. In this
Repo: None
Contrastive Multi-view Framework for Customer Lifetime Value Prediction
Authors: Chuhan Wu, Jingjie Li, Qinglin Jia, Hong Zhu, Yuan Fang, Ruiming Tang
Arxiv: https://arxiv.org/abs/2306.14400
TLDR: Accurate customer lifetime value (LTV) prediction can help service providers optimize their marketing policies in customer-centric applications. However, the heavy sparsity of consumption events and the interference of data variance and noise obstruct LTV estimation. Many existing LTV prediction methods directly train a single-view LTV predictor on consumption samples, which may yield inaccurate and even biased knowledge extraction. In this paper, we propose a contrastive multi-view framework for LTV prediction, which is a plug
Repo: None
A Self-supervised Contrastive Learning Method for Grasp Outcomes Prediction
Authors: Chengliang Liu, Binhua Huang, Yiwen Liu, Yuanzhe Su, Ke Mai, Yupo Zhang, Zhengkun Yi, Xinyu Wu
Arxiv: https://arxiv.org/abs/2306.14437
TLDR: In this paper, we investigate the effectiveness of contrastive learning methods for predicting grasp outcomes in an unsupervised manner. By utilizing a publicly available dataset, we demonstrate that contrastive learning methods perform well on the task of grasp outcomes prediction. Specifically, the dynamic-dictionary-based method with the momentum updating technique achieves a satisfactory accuracy of 81.83% using data from one single tactile sensor, outperforming other unsupervised methods. Our results reveal the potential of contrastive learning
Repo: None
Histopathology Image Classification using Deep Manifold Contrastive Learning
Authors: Jing Wei Tan, Won-Ki Jeong
Arxiv: https://arxiv.org/abs/2306.14459
TLDR: Contrastive learning has gained popularity due to its robustness with good feature representation performance. However, cosine distance, the commonly used similarity metric in contrastive learning, is not well suited to represent the distance between two data points, especially on a nonlinear feature manifold. Inspired by manifold learning, we propose a novel extension of contrastive training that leverages geodesic distance between features as a similarity metric for histopathology whole slide image classification. To reduce the computational overhead in
Repo: None
Hard Sample Mining Enabled Contrastive Feature Learning for Wind Turbine Pitch System Fault Diagnosis
Authors: Zixuan Wang, Bo Qin, Mengxuan Li, Mark D. Butala, Haibo Wang, Peng Peng, Hongwei Wang
Arxiv: https://arxiv.org/abs/2306.14701
TLDR: The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple fault types in the pitch system poses challenges in accurately classifying these faults. This paper proposes a novel method based on hard sample mining-enabled contrastive feature learning (HSMCFL) to address this problem. The proposed method employs cosine similarity to identify hard samples and subsequently leverages contrastive feature learning to
Repo: None
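Editor's sketch for the entry above: the TLDR says only that cosine similarity is used to identify hard samples, so the selection rule below (treating the different-class sample most similar to an anchor as its hardest negative) is an assumption about what that could look like, not the authors' HSMCFL procedure. The function names and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hardest_negative(anchor_idx, features, labels):
    """Index of the different-label sample most similar to the anchor.

    Assumption (not from the paper): a "hard" sample is a negative
    that lies close to the anchor in cosine similarity.
    """
    anchor = features[anchor_idx]
    candidates = [i for i, lab in enumerate(labels) if lab != labels[anchor_idx]]
    return max(candidates, key=lambda i: cosine_similarity(anchor, features[i]))


# Toy data: sample 1 belongs to another class but points almost the same way.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = [0, 1, 1]
hard_idx = hardest_negative(0, features, labels)
```

With the toy data, the sample at index 1 is selected, since it is nearly parallel to the anchor while carrying a different label.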
Keyword: data augmentation
Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning
Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka
Arxiv: https://arxiv.org/abs/2306.13924
TLDR: Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima force augmentations on input
Repo: None
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing
Authors: Lester Phillip Violeta, Tomoki Toda
Arxiv: https://arxiv.org/abs/2306.13953
TLDR: Deaf or hard-of-hearing (DHH) speakers typically have atypical speech caused by deafness. With the growing support of speech-based devices and software applications, more work needs to be done to make these devices inclusive to everyone. To do so, we analyze the use of openly-available automatic speech recognition (ASR) tools with a DHH Japanese speaker dataset. As these out-of-the-box ASR models typically do not perform well
Repo: None
Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations
Authors: Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu
Arxiv: https://arxiv.org/abs/2306.13971
TLDR: While state-of-the-art NLP models have demonstrated excellent performance for aspect-based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness. This is especially manifested as significant degradation in performance when faced with out-of-distribution data. Recent solutions that rely on counterfactually augmented datasets show promising results, but they are inherently limited because of the lack of access to explicit causal structure. In this paper, we present an alternative approach that relies
Repo: None
Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks
Authors: Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun
Arxiv: https://arxiv.org/abs/2306.14040
TLDR: Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data, yet understanding and analyzing their behaviours remains a significant challenge. To this end, many efforts have been made to extract finite automata from RNNs, which are more amenable for analysis and explanation. However, existing approaches like exact learning and compositional approaches for model extraction have limitations in either scalability or precision. In this paper, we propose a novel framework of Weighted Finite Automata
Repo: None
Semi-supervised Object Detection: A Survey on Recent Research and Progress
Authors: Yanyang Wang, Zhaoxiang Liu, Shiguo Lian
Arxiv: https://arxiv.org/abs/2306.14106
TLDR: In recent years, deep learning technology has matured in the field of object detection, and most algorithms rely on supervised learning. However, obtaining a large amount of labeled data is costly in human resources, which brings about low efficiency and limitations. Semi-supervised object detection (SSOD) has received increasing attention due to its high research value and practicability. It is designed to learn information by using small amounts of labeled and large amounts of unl
Repo: None
A Web-based Mpox Skin Lesion Detection System Using State-of-the-art Deep Learning Models Considering Racial Diversity
Authors: Shams Nafisa Ali, Md. Tazuddin Ahmed, Tasnim Jahan, Joydip Paul, S. M. Sakeef Sani, Nawsabah Noor, Anzirun Nahar Asma, Taufiq Hasan
Arxiv: https://arxiv.org/abs/2306.14169
TLDR: The recent 'Mpox' outbreak, formerly known as 'Monkeypox', has become a significant public health concern and has spread to over 110 countries globally. The challenge of clinically diagnosing mpox early on is due, in part, to its similarity to other types of rashes. Computer-aided screening tools have been proven valuable in cases where Polymerase Chain Reaction (PCR) based diagnosis is not immediately available. Deep learning methods are powerful in learning complex data representations
Repo: None
Few-Shot Continual Learning via Flat-to-Wide Approaches
Authors: Muhammad Anwar Ma'sum, Mahardhika Pratama, Lin Liu, Edwin Lughofer, Habibullah, Ryszard Kowalczyk
Arxiv: https://arxiv.org/abs/2306.14369
TLDR: Existing approaches on continual learning call for a lot of samples in their training processes. Such approaches are impractical for many real-world problems having limited samples because of the overfitting problem. This paper proposes a few-shot continual learning approach, termed FLat-tO-WidE AppRoach (FLOWER), where a flat-to-wide learning process finding the flat-wide minima is proposed to address the catastrophic forgetting problem. The issue of data scarcity is overcome
Repo: https://github.com/anwarmaxsum/flower
Pseudo-Trilateral Adversarial Training for Domain Adaptive Traversability Prediction
Authors: Zheng Chen, Durgakant Pushp, Jason M. Gregory, Lantao Liu
Arxiv: https://arxiv.org/abs/2306.14370
TLDR: Traversability prediction is a fundamental perception capability for autonomous navigation. Deep neural networks (DNNs) have been widely used to predict traversability during the last decade. The performance of DNNs is significantly boosted by exploiting a large amount of data. However, the diversity of data in different domains imposes significant gaps in the prediction performance. In this work, we make efforts to reduce the gaps by proposing a novel pseudo-trilateral adversarial model that adopts a coarse-to
Repo: None
Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
Authors: Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung
Arxiv: https://arxiv.org/abs/2306.14517
TLDR: Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese; and 2 different age groups--adults and the elderly. To conduct the experiment, we develop an English-Mandarin
Repo: None
Accelerating Molecular Graph Neural Networks via Knowledge Distillation
Authors: Filip Ekström Kelvinius, Dimitar Georgiev, Artur Petrov Toshev, Johannes Gasteiger
Arxiv: https://arxiv.org/abs/2306.14818
TLDR: Recent advances in graph neural networks (GNNs) have allowed molecular simulations with accuracy on par with conventional gold-standard methods at a fraction of the computational cost. Nonetheless, as the field has been progressing to bigger and more complex architectures, state-of-the-art GNNs have become largely prohibitive for many large-scale applications. In this paper, we, for the first time, explore the utility of knowledge distillation (KD) for accelerating molecular GNN
Repo: None
Keyword: knowledge graph
DEKGCI: A double-sided recommendation model for integrating knowledge graph and user-item interaction graph
Authors: Yajing Yang, Zeyu Zeng, Mao Chen, Ruirui Shang
Arxiv: https://arxiv.org/abs/2306.13837
TLDR: Both knowledge graphs and user-item interaction graphs are frequently used in recommender systems due to their ability to provide rich information for modeling users and items. However, existing studies often focused on one of these sources (either the knowledge graph or the user-item interaction graph), resulting in underutilization of the benefits that can be obtained by integrating both sources of information. In this paper, we propose DEKGCI, a novel double-sided recommendation model. In DEKGCI
Repo: None
IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations
Authors: Yuxin Zi, Kaushik Roy, Vignesh Narayanan, Manas Gaur, Amit Sheth
Arxiv: https://arxiv.org/abs/2306.13865
TLDR: Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Distributed semantics capture common statistical patterns among language tokens (words, phrases, and sentences) from large amounts of data. LLMs perform exceedingly well across General Language Understanding Evaluation (GLUE) tasks designed to test a model's understanding of the meanings of the input tokens. However, recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs when processing inputs that
Repo: None
Knowledge Graph-Augmented Korean Generative Commonsense Reasoning
Authors: Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
Arxiv: https://arxiv.org/abs/2306.14470
TLDR: Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding. By utilizing an existing dataset such as Korean CommonGen, language generation models can learn commonsense arguments specific to the Korean language. However, language models often fail to consider the relationships between concepts and the deep knowledge inherent to concepts. To address these limitations, we propose a method to utilize the Korean knowledge graph data for text generation. Our experimental result shows that the proposed method
Repo: None
TransERR: Translation-based Knowledge Graph Completion via Efficient Relation Rotation
Authors: Jiang Li, Xiangdong Su
Arxiv: https://arxiv.org/abs/2306.14580
TLDR: This paper presents a translation-based knowledge graph completion method via efficient relation rotation (TransERR), a straightforward yet effective alternative to traditional translation-solutional learning methods. TransERR encodes knowledge graphs in the hypercomplex-valued space, thus enabling it to possess a higher degree of translation freedom in mining latent information between the head and tail entities. To further minimize the translation distance, TransERR adaptively rotates the head entity and the tail entity with their corresponding unit qu
Repo: https://github.com/dellixx/transerr
Keyword: legal
Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise?
Authors: Jaromir Savelka, Kevin D. Ashley, Morgan A Gray, Hannes Westermann, Huihui Xu
Arxiv: https://arxiv.org/abs/2306.13906
TLDR: We evaluated the capability of generative pre-trained transformers (GPT-4) in analysis of textual data in tasks that require highly specialized domain expertise. Specifically, we focused on the task of analyzing court opinions to interpret legal concepts. We found that GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators. We observed that, with a relatively minor decrease in performance, GPT-4 can perform batch predictions leading to significant
Repo: None
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions
Authors: Reza Fayyazi, Shanchieh Jay Yang
Arxiv: https://arxiv.org/abs/2306.14062
TLDR: The volume, variety, and velocity of change in vulnerabilities and exploits have made incident threat analysis challenging with human expertise and experience alone. The MITRE ATT&CK framework employs Tactics, Techniques, and Procedures (TTPs) to describe how and why attackers exploit vulnerabilities. However, a TTP description written by one security professional can be interpreted very differently by another, leading to confusion in cybersecurity operations or even business, policy, and legal decisions. Meanwhile, advancements in AI have led
Repo: None
LiResolver: License Incompatibility Resolution for Open Source Software
Authors: Sihan Xu, Ya Gao, Lingling Fan, Linyu Li, Xiangrui Cai, Zheli Liu
Arxiv: https://arxiv.org/abs/2306.14675
TLDR: Open source software (OSS) licenses regulate the conditions under which OSS can be legally reused, distributed, and modified. However, a common issue arises when incorporating third-party OSS accompanied with licenses, i.e., license incompatibility, which occurs when multiple licenses exist in one project and there are conflicts between them. Despite being problematic, fixing license incompatibility issues requires substantial effort due to the lack of license understanding and complex package dependency. In this paper, we propose LiRes
Repo: None
Keyword: mixup
Pseudo-Trilateral Adversarial Training for Domain Adaptive Traversability Prediction
Authors: Zheng Chen, Durgakant Pushp, Jason M. Gregory, Lantao Liu
Arxiv: https://arxiv.org/abs/2306.14370
TLDR: Traversability prediction is a fundamental perception capability for autonomous navigation. Deep neural networks (DNNs) have been widely used to predict traversability during the last decade. The performance of DNNs is significantly boosted by exploiting a large amount of data. However, the diversity of data in different domains imposes significant gaps in the prediction performance. In this work, we make efforts to reduce the gaps by proposing a novel pseudo-trilateral adversarial model that adopts a coarse-to
Repo: None
A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling
Authors: Ye Wang, Huazheng Pan, Tao Zhang, Wen Wu, Wenxin Hu
Arxiv: https://arxiv.org/abs/2306.14806
TLDR: The goal of document-level relation extraction (RE) is to identify relations between entities that span multiple sentences. Recently, incomplete labeling in document-level RE has received increasing attention, and some studies have used methods such as positive-unlabeled learning to tackle this issue, but there is still a lot of room for improvement. Motivated by this, we propose a positive-augmentation and positive-mixup positive-unlabeled metric learning framework (P3M).
Repo: None
Keyword: multi-task
MIRACLE: Multi-task Learning based Interpretable Regulation of Autoimmune Diseases through Common Latent Epigenetics
Authors: Pengcheng Xu, Jinpu Cai, Yulin Gao, Ziqi Rong, Hongyi Xin
Arxiv: https://arxiv.org/abs/2306.13866
TLDR: DNA methylation is a crucial regulator of gene transcription and has been linked to various diseases, including autoimmune diseases and cancers. However, diagnostics based on DNA methylation face challenges due to large feature sets and small sample sizes, resulting in overfitting and suboptimal performance. To address these issues, we propose MIRACLE, a novel interpretable neural network that leverages autoencoder-based multi-task learning to integrate multiple datasets and jointly identify common patterns in DNA
Repo: None
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Authors: Xiao Ma, Swaroop Mishra, Ahmad Beirami, Alex Beutel, Jilin Chen
Arxiv: https://arxiv.org/abs/2306.14308
TLDR: Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Multi-task Language Understanding) is among the worst performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and
Repo: None
Multi-task Item-attribute Graph Pre-training for Strict Cold-start Item Recommendation
Authors: Yuwei Cao, Liangwei Yang, Chen Wang, Zhiwei Liu, Hao Peng, Chenyu You, Philip S. Yu
Arxiv: https://arxiv.org/abs/2306.14462
TLDR: Recommendation systems suffer in the strict cold-start (SCS) scenario, where the user-item interactions are entirely unavailable. The ID-based approaches completely fail to work. Cold-start recommenders, on the other hand, leverage item contents to map the new items to the existing ones. However, the existing SCS recommenders explore item contents in coarse-grained manners that introduce noise or information loss. Moreover, informative data sources other than item contents, such as users
Repo: https://github.com/yuweicao-uic/coldgpt
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
Authors: Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, Ping Luo
Arxiv: https://arxiv.org/abs/2306.14744
TLDR: Placement is a critical step in modern chip design, aiming to determine the positions of circuit modules on the chip canvas. Recent works have shown that reinforcement learning (RL) can improve human performance in chip placement. However, such an RL-based approach suffers from long training time and low transfer ability in unseen chip circuits. To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.
Repo: None
Composing Parameter-Efficient Modules with Arithmetic Operations
Authors: Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He
Arxiv: https://arxiv.org/abs/2306.14870
TLDR: As an efficient alternative to conventional full finetuning, parameter-efficient finetuning (PEFT) is becoming the prevailing method to adapt pretrained language models. In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module
Repo: https://github.com/sjtu-lit/pem_composition
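Editor's sketch for the entry above: the TLDR describes composing modules "through linear arithmetic operations in the weight space". The snippet below illustrates that idea in its simplest form, as a weighted sum of same-shaped parameter lists. The `compose` helper, the dictionary layout, and the choice of coefficients are hypothetical and do not reflect the linked repository's API.

```python
def compose(modules, weights):
    """Linearly combine modules in weight space.

    modules: list of dicts mapping a parameter name to a flat list of floats
             (all modules are assumed to share names and shapes).
    weights: one scalar coefficient per module; a coefficient of -1.0
             negates a module, and coefficients summing to 1 interpolate.
    """
    reference = modules[0]
    return {
        name: [
            sum(w * module[name][i] for module, w in zip(modules, weights))
            for i in range(len(values))
        ]
        for name, values in reference.items()
    }


# Two toy modules with a single parameter vector each.
module_a = {"w": [1.0, 2.0]}
module_b = {"w": [3.0, 4.0]}

merged = compose([module_a, module_b], [0.5, 0.5])   # equal-weight average
negated = compose([module_a], [-1.0])                # negation of one module
```

Because the combination is purely elementwise, the composed module has the same size as each input module, which is what makes this kind of merging cheap relative to retraining.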
Keyword: paraphrase
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements
Authors: Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan
Arxiv: https://arxiv.org/abs/2306.13899
TLDR: The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP)
Repo: None
Keyword: plagiarism
FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection
Authors: Chensen Huang, Guibo Zhu, Guojing Ge, Taihao Li, Jinqiao Wang
Arxiv: https://arxiv.org/abs/2306.14168
TLDR: Binary code similarity detection (BCSD) has various applications, including but not limited to vulnerability detection, plagiarism detection, and malware detection. Previous research efforts mainly focus on transforming binary code to assembly code strings using reverse compilation and then using pre-trained deep learning models with large parameters to obtain feature representation vector of binary code. While these models have proven to be effective in representing binary code, their large parameter size leads to considerable computational expenses during both training and inference. In this paper
Repo: None
Keyword: robustness
Improving Panoptic Segmentation for Nighttime or Low-Illumination Urban Driving Scenes
Authors: Ankur Chrungoo
Arxiv: https://arxiv.org/abs/2306.13725
TLDR: Autonomous vehicles and driving systems use scene parsing as an essential tool to understand the surrounding environment. Panoptic segmentation is a state-of-the-art technique which proves to be pivotal in this use case. Deep learning-based architectures have been utilized for effective and efficient panoptic segmentation in recent times. However, when it comes to adverse conditions like dark scenes with poor illumination or nighttime images, existing methods perform poorly in comparison to daytime images. One of the
Repo: None
CeBed: A Benchmark for Deep Data-Driven OFDM Channel Estimation
Authors: Amal Feriani, Di Wu, Steve Liu, Greg Dudek
Arxiv: https://arxiv.org/abs/2306.13761
TLDR: Deep learning has been extensively used in wireless communication problems, including channel estimation. Although several data-driven approaches exist, a fair and realistic comparison between them is difficult due to inconsistencies in the experimental conditions and the lack of a standardized experimental design. In addition, the performance of data-driven approaches is often compared based on empirical analysis. The lack of reproducibility and availability of standardized evaluation tools (e.g., datasets, codebases) hinders the development
Repo: None
Similarity Preserving Adversarial Graph Contrastive Learning
Authors: Yeonjun In, Kanghoon Yoon, Chanyoung Park
Arxiv: https://arxiv.org/abs/2306.13854
TLDR: Recent works demonstrate that GNN models are vulnerable to adversarial attacks, which refer to imperceptible perturbations on the graph structure and node features. Among various GNN models, graph contrastive learning (GCL) based methods specifically suffer from adversarial attacks due to their inherent design, which depends heavily on the self-supervision signals derived from the original graph; that graph, however, already contains noise when it is attacked. To achieve adversarial robustness against such attacks, existing methods
Repo: None
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements
Authors: Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan
Arxiv: https://arxiv.org/abs/2306.13899
TLDR: The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP)
Repo: None
Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations
Authors: Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu
Arxiv: https://arxiv.org/abs/2306.13971
TLDR: While state-of-the-art NLP models have demonstrated excellent performance for aspect-based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness. This is especially manifested as significant degradation in performance when faced with out-of-distribution data. Recent solutions that rely on counterfactually augmented datasets show promising results, but they are inherently limited because of the lack of access to explicit causal structure. In this paper, we present an alternative approach that relies
Repo: None
Individualized Dosing Dynamics via Neural Eigen Decomposition
Authors: Stav Belogolovsky, Ido Greenberg, Danny Eytan, Shie Mannor
Arxiv: https://arxiv.org/abs/2306.14020
TLDR: Dosing models often use differential equations to model biological dynamics. Neural differential equations in particular can learn to predict the derivative of a process, which permits predictions at irregular points of time. However, this temporal flexibility often comes with a high sensitivity to noise, whereas medical problems often present high noise and limited data. Moreover, medical dosing models must generalize reliably over individual patients and changing treatment policies. To address these challenges, we introduce the Neural Eigen Stochastic Differential Equation
Repo: None
Machine Learning needs its own Randomness Standard: Randomised Smoothing and PRNG-based attacks
Authors: Pranav Dahiya, Ilia Shumailov, Ross Anderson
Arxiv: https://arxiv.org/abs/2306.14043
TLDR: Randomness supports many critical functions in the field of machine learning (ML) including optimisation, data selection, privacy, and security. ML systems outsource the task of generating or harvesting randomness to the compiler, the cloud service provider or elsewhere in the toolchain. Yet there is a long history of attackers exploiting poor randomness, or even creating it -- as when the NSA put backdoors in random number generators to break cryptography. In this paper we consider whether attackers can compromise an
Repo: None
SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning
Authors: Pu Ren, N. Benjamin Erichson, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney
Arxiv: https://arxiv.org/abs/2306.14070
TLDR: Super-Resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details, and improving the overall quality and fidelity of the data representation. There is growing interest in applying SR methods to complex spatiotemporal systems within the Scientific Machine Learning (SciML) community, with the hope of accelerating numerical simulations and/or improving forecasts in weather, climate, and related areas. However, the lack of standardized benchmark datasets for comparing and validating SR methods hind
Repo: None
Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training
Authors: Fan Liu, Weijia Zhang, Hao Liu
Arxiv: https://arxiv.org/abs/2306.14126
TLDR: Machine learning-based forecasting models are commonly used in Intelligent Transportation Systems (ITS) to predict traffic patterns and provide city-wide services. However, most of the existing models are susceptible to adversarial attacks, which can lead to inaccurate predictions and negative consequences such as congestion and delays. Therefore, improving the adversarial robustness of these models is crucial for ITS. In this paper, we propose a novel framework for incorporating adversarial training into spatiotemporal traffic forecasting tasks. We demonstrate
Repo: None
Provably Convergent Policy Optimization via Metric-aware Trust Region Methods
Authors: Jun Song, Niao He, Lijun Ding, Chaoyue Zhao
Arxiv: https://arxiv.org/abs/2306.14133
TLDR: Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning. In this paper, we exploit more flexible metrics and examine two natural extensions of policy optimization with Wasserstein and Sinkhorn trust regions, namely Wasserstein policy optimization (WPO) and Sinkhorn policy optimization (SPO). Instead of restricting the policy to a parametric distribution class, we directly optimize the policy distribution and derive their closed
Repo: None
BotanicGarden: A high-quality and large-scale robot navigation dataset in challenging natural environments
Authors: Yuanzhi Liu, Yujia Fu, Minghui Qin, Yufeng Xu, Baoxin Xu, Fengdong Chen, Bart Goossens, Hongwei Yu, Chun Liu, Long Chen, Wei Tao, Hui Zhao
Arxiv: https://arxiv.org/abs/2306.14137
TLDR: The rapid developments of mobile robotics and autonomous navigation over the years are largely empowered by public datasets for testing and upgrading, such as SLAM and localization tasks. Impressive demos and benchmark results have arisen, indicating the establishment of a mature technical framework. However, from the viewpoint of real-world deployments, there are still critical defects of robustness in challenging environments, especially in large-scale, GNSS-denied, textural-monotonous, and unstructured scenarios
Repo: None
BiFF: Bi-level Future Fusion with Polyline-based Coordinate for Interactive Trajectory Prediction
Authors: Yiyao Zhu, Di Luan, Shaojie Shen
Arxiv: https://arxiv.org/abs/2306.14161
TLDR: Predicting future trajectories of surrounding agents is essential for safety-critical autonomous driving. Most existing work focuses on predicting marginal trajectories for each agent independently. However, predicting joint trajectories has rarely been explored. In this work, we propose Bi-level Future Fusion (BiFF) to explicitly capture future interactions between interactive agents. Concretely, BiFF fuses the high-level future intentions followed by low-level behaviors. Then the polyline-based coordinate
Repo: None
On Evaluating the Adversarial Robustness of Semantic Segmentation Models
Authors: Levente Halmosi, Mark Jelasity
Arxiv: https://arxiv.org/abs/2306.14217
TLDR: Achieving robustness against adversarial input perturbation is an important and intriguing problem in machine learning. In the area of semantic image segmentation, a number of adversarial training approaches have been proposed as a defense against adversarial perturbations, but the methodology of evaluating the robustness of the models is still lacking, compared to image classification. Here, we demonstrate that, just like in image classification, it is important to evaluate the models over several different and hard attacks
Repo: None
A Spectral Perspective towards Understanding and Improving Adversarial Robustness
Authors: Binxiao Huang, Rui Lin, Chaofan Tao, Ngai Wong
Arxiv: https://arxiv.org/abs/2306.14262
TLDR: Deep neural networks (DNNs) are incredibly vulnerable to crafted, imperceptible adversarial perturbations. While adversarial training (AT) has proven to be an effective defense approach, the AT mechanism for robustness improvement is not fully understood. This work investigates AT from a spectral perspective, adding new insights to the design of effective defenses. In particular, we show that AT induces the deep model to focus more on the low-frequency region, which retains the shape-biased
Repo: None
Enhancing Adversarial Training via Reweighting Optimization Trajectory
Authors: Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy
Arxiv: https://arxiv.org/abs/2306.14275
TLDR: Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial learning suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In
Repo: None
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Authors: Anna Bair, Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose Alvarez
Arxiv: https://arxiv.org/abs/2306.14306
TLDR: Robustness and compactness are two essential components of deep learning models that are deployed in the real world. The seemingly conflicting aims of (i) generalization across domains as in robustness, and (ii) specificity to one domain as in compression, are why the overall design goal of achieving robust compact models, despite being highly important, is still a challenging open problem. We introduce Adaptive Sharpness-Aware Pruning, or AdaSAP, a method that yields robust
Repo: None
Addressing Cold Start Problem for End-to-end Automatic Speech Scoring
Authors: Jungbae Park, Seungtaek Choi
Arxiv: https://arxiv.org/abs/2306.14310
TLDR: Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With self-supervised learning advancements, end-to-end speech scoring approaches have exhibited promising results. However, this study highlights the significant decrease in the performance of speech scoring systems in new question contexts, thereby identifying this as a cold start problem in terms of items. With the finding of cold-start phenomena, this paper seeks to alleviate the problem by the following methods: 1)
Repo: None
A Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing
Authors: Chih-Jung Chang, Yaw-Chern Lee, Shih-Hsuan Yao, Min-Hung Chen, Chien-Yi Wang, Shang-Hong Lai, Trista Pei-Chun Chen
Arxiv: https://arxiv.org/abs/2306.14313
TLDR: Face anti-spoofing (FAS) is indispensable for a face recognition system. Many texture-driven countermeasures were developed against presentation attacks (PAs), but the performance against unseen domains or unseen spoofing types is still unsatisfactory. Instead of exhaustively collecting all the spoofing variations and making binary decisions of live/spoof, we offer a new perspective on the FAS task to distinguish between normal and abnormal movements of live and spoof presentations. We propose Geometry-
Repo: None
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations
Authors: Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev
Arxiv: https://arxiv.org/abs/2306.14321
TLDR: Despite significant progress having been made in question answering on tabular data (Table QA), it's unclear whether, and to what extent, existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. We propose a benchmark called RobuT, which builds upon existing Table QA datasets (WTQ, WikiSQL-Weak, and SQA) and includes a set of human-annotated adversarial
Repo: https://github.com/yilunzhao/robut
Computational Asymmetries in Robust Classification
Authors: Samuele Marro, Michele Lombardi
Arxiv: https://arxiv.org/abs/2306.14326
TLDR: In the context of adversarial robustness, we make three strongly related contributions. First, we prove that while attacking ReLU classifiers is
Repo: https://github.com/samuelemarro/counter-attack
Contrastive Multi-view Framework for Customer Lifetime Value Prediction
Authors: Chuhan Wu, Jingjie Li, Qinglin Jia, Hong Zhu, Yuan Fang, Ruiming Tang
Arxiv: https://arxiv.org/abs/2306.14400
TLDR: Accurate customer lifetime value (LTV) prediction can help service providers optimize their marketing policies in customer-centric applications. However, the heavy sparsity of consumption events and the interference of data variance and noise obstruct LTV estimation. Many existing LTV prediction methods directly train a single-view LTV predictor on consumption samples, which may yield inaccurate and even biased knowledge extraction. In this paper, we propose a contrastive multi-view framework for LTV prediction, which is a plug
Repo: None
RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools
Authors: Haochen Shi, Huazhe Xu, Samuel Clarke, Yunzhu Li, Jiajun Wu
Arxiv: https://arxiv.org/abs/2306.14447
TLDR: Humans excel in complex long-horizon soft body manipulation tasks via flexible tool use: bread baking requires a knife to slice the dough and a rolling pin to flatten it. Often regarded as a hallmark of human cognition, tool use in autonomous robots remains limited due to challenges in understanding tool-object interactions. Here we develop an intelligent robotic system, RoboCook, which perceives, models, and manipulates elasto-plastic objects with various tools. RoboCook uses point
Repo: None
Histopathology Image Classification using Deep Manifold Contrastive Learning
Authors: Jing Wei Tan, Won-Ki Jeong
Arxiv: https://arxiv.org/abs/2306.14459
TLDR: Contrastive learning has gained popularity due to its robustness with good feature representation performance. However, cosine distance, the commonly used similarity metric in contrastive learning, is not well suited to represent the distance between two data points, especially on a nonlinear feature manifold. Inspired by manifold learning, we propose a novel extension of contrastive training that leverages geodesic distance between features as a similarity metric for histopathology whole slide image classification. To reduce the computational overhead in
Repo: None
Exploring the Robustness of Large Language Models for Solving Programming Problems
Authors: Atsushi Shirafuji, Yutaka Watanobe, Takumi Ito, Makoto Morishita, Yuki Nakamura, Yusuke Oda, Jun Suzuki
Arxiv: https://arxiv.org/abs/2306.14583
TLDR: Using large language models (LLMs) for source code has recently gained attention. LLMs, such as Transformer-based models like Codex and ChatGPT, have been shown to be highly capable of solving a wide range of programming problems. However, the extent to which LLMs understand problem descriptions and generate programs accordingly or just retrieve source code from the most relevant problem in training data based on superficial cues has not been discovered yet. To explore this research question, we conduct experiments to
Repo: None
A Closed-Loop Bin Picking System for Entangled Wire Harnesses using Bimanual and Dynamic Manipulation
Authors: Xinyi Zhang, Yukiyasu Domae, Weiwei Wan, Kensuke Harada
Arxiv: https://arxiv.org/abs/2306.14595
TLDR: This paper addresses the challenge of industrial bin picking using entangled wire harnesses. Wire harnesses are essential in manufacturing but pose challenges in automation due to their complex geometries and propensity for entanglement. Our previous work tackled this issue by proposing a quasi-static pulling motion to separate the entangled wire strands. However, it still lacks sufficiency and generalization to various shapes and structures. In this paper, we deploy a dual-arm robot that can grasp, extract and disent
Repo: None
A structure and asymptotic preserving scheme for the Vlasov-Poisson-Fokker-Planck model
Authors: Alain Blaustein (UT3), Francis Filbet (UT3)
Arxiv: https://arxiv.org/abs/2306.14605
TLDR: We propose a numerical method for the Vlasov-Poisson-Fokker-Planck model written as a hyperbolic system thanks to a spectral decomposition in the basis of Hermite functions with respect to the velocity variable and a structure preserving finite volume scheme for the space variable. On the one hand, we show that this scheme naturally preserves both stationary solutions and linearized free-energy estimates for the nonlinearity of the linearized space variable, and that it preserves
Repo: None
The race to robustness: exploiting fragile models for urban camouflage and the imperative for machine learning security
Authors: Harriet Farlow, Matthew Garratt, Gavin Mount, Tim Lynar
Arxiv: https://arxiv.org/abs/2306.14609
TLDR: Adversarial Machine Learning (AML) represents the ability to disrupt Machine Learning (ML) algorithms through a range of methods that broadly exploit the architecture of deep learning optimisation. This paper presents Distributed Adversarial Regions (DAR), a novel method that implements distributed instantiations of computer vision-based AML attack methods that may be used to disguise objects from image recognition in both white and black box settings. We consider the context of object detection models used in urban environments,
Repo: None
PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture
Authors: Mohit Vaishnav
Arxiv: https://arxiv.org/abs/2306.14650
TLDR: We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying a synthetic visual reasoning test, we refine the taxonomy of reasoning tasks with ResNet50, we enhance feature maps using feature-based and spatial attention, achieving efficient solving of challenging visual reasoning tasks in SVRT tasks. Our findings contribute to understanding the attentional needs of SVRTs, as well as the
Repo: None
A denoised Mean Teacher for domain adaptive point cloud registration
Authors: Alexander Bigalke, Mattias P. Heinrich
Arxiv: https://arxiv.org/abs/2306.14749
TLDR: Point cloud-based medical registration promises increased computational efficiency, robustness to intensity shifts, and anonymity preservation but is limited by the inefficacy of unsupervised learning with similarity metrics. Supervised training on synthetic deformations is an alternative but, in turn, suffers from the domain gap to the real domain. In this work, we aim to tackle this gap through domain adaptation. Self-training with the Mean Teacher is an established approach to this problem but is impaired by the inherent noise
Repo: https://github.com/multimodallearning/denoised_mt_pcd_reg
A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling
Authors: Ye Wang, Huazheng Pan, Tao Zhang, Wen Wu, Wenxin Hu
Arxiv: https://arxiv.org/abs/2306.14806
TLDR: The goal of document-level relation extraction (RE) is to identify relations between entities that span multiple sentences. Recently, incomplete labeling in document-level RE has received increasing attention, and some studies have used methods such as positive-unlabeled learning to tackle this issue, but there is still a lot of room for improvement. Motivated by this, we propose a positive-augmentation and positive-mixup positive-unlabeled metric learning framework (P3M).
Repo: None
Keyword: semantic similarity
Full Automation of Goal-driven LLM Dialog Threads with And-Or Recursors and Refiner Oracles
Authors: Paul Tarau
Arxiv: https://arxiv.org/abs/2306.14077
TLDR: We automate deep step-by-step reasoning in an LLM dialog thread by recursively exploring alternatives (OR-nodes) and expanding details (AND-nodes) up to a given depth. Starting from a single succinct task-specific initiator we steer the automated dialog thread to stay focussed on the task by synthesizing a prompt that summarizes the depth-first steps taken so far. Our algorithm is derived from a simple recursive descent implementation of a Horn Clause interpreter, except
Repo: None
Keyword: simplification
logLTN: Differentiable Fuzzy Logic in the Logarithm Space
Authors: Samy Badreddine, Luciano Serafini, Michael Spranger
Arxiv: https://arxiv.org/abs/2306.14546
TLDR: The AI community is increasingly focused on merging logic with deep learning to create Neuro-Symbolic (NeSy) paradigms and assist neural approaches with symbolic knowledge. A significant trend in the literature involves integrating axioms and facts in loss functions by grounding logical symbols with neural networks and operators with fuzzy semantics. Logic Tensor Networks (LTN) is one of the main representatives in this category, known for its simplicity, efficiency, and versatility. However, it has been previously
Repo: https://github.com/sbadredd/logltn-experiments
Keyword: summarization
Fusing Multimodal Signals on Hyper-complex Space for Extreme Abstractive Text Summarization (TL;DR) of Scientific Contents
Authors: Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty
Arxiv: https://arxiv.org/abs/2306.13968
TLDR: The realm of scientific text summarization has experienced remarkable progress due to the availability of annotated brief summaries and ample data. However, the utilization of multiple input modalities, such as videos and audio, has yet to be thoroughly explored. At present, scientific multimodal-input-based text summarization systems tend to employ longer target summaries like abstracts, leading to an underwhelming performance in the task of text summarization. In this paper, we deal with a novel
Repo: None
Vietnamese multi-document summary using subgraph selection approach -- VLSP 2022 AbMuSu Shared Task
Authors: Huu-Thin Nguyen, Tam Doan Thanh, Cam-Van Thi Nguyen
Arxiv: https://arxiv.org/abs/2306.14827
TLDR: Document summarization is a task to generate a fluent, condensed summary for a document, and a cluster summary for the same document, which contains important information. A cluster of documents serves as the input for multi-document summarization (MDS), while the cluster summary serves as an output. In this paper, we focus on transforming the extractive MDS problem into subgraph selection. Approaching the problem in the form of graphs helps to capture simultaneously the relationship between sentences in the same
Repo: None
Keyword: text generation
Large Language Models as Sous Chefs: Revising Recipes with GPT-3
Authors: Alyssa Hwang, Bryan Li, Zhaoyi Hou, Dan Roth
Arxiv: https://arxiv.org/abs/2306.13986
TLDR: With their remarkably improved text generation and prompting capabilities, large language models can adapt existing written information into forms that are easier to use and understand. In our work, we focus on recipes as an example of complex, diverse, and widely used instructions. We develop a prompt grounded in the original recipe and ingredients list that breaks recipes down into simpler steps. We apply this prompt to recipes from various world cuisines, and experiment with several large language models (LLMs), finding best results with
Repo: None
Weakly Supervised Scene Text Generation for Low-resource Languages
Authors: Yangchen Xie, Xinyuan Chen, Hongjian Zhan, Palaiahankote Shivakum
Arxiv: https://arxiv.org/abs/2306.14269
TLDR: A large number of annotated training images is crucial for training successful scene text recognition models. However, collecting sufficient datasets can be a labor-intensive and costly process, particularly for low-resource languages. To address this challenge, auto-generating text data has shown promise in alleviating the problem. Unfortunately, existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages. In this paper, we propose a novel weak
Repo: None
Knowledge Graph-Augmented Korean Generative Commonsense Reasoning
Authors: Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
Arxiv: https://arxiv.org/abs/2306.14470
TLDR: Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding. By utilizing an existing dataset such as Korean CommonGen, language generation models can learn commonsense arguments specific to the Korean language. However, language models often fail to consider the relationships between concepts and the deep knowledge inherent to concepts. To address these limitations, we propose a method to utilize the Korean knowledge graph data for text generation. Our experimental result shows that the proposed method
Repo: None
FunQA: Towards Surprising Video Comprehension
Authors: Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu
Arxiv: https://arxiv.org/abs/2306.14899
TLDR: Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations depicted in these videos. We introduce FunQA, a challenging video question answering (QA) dataset specifically designed to evaluate and enhance the depth of video reasoning based on counter-intuitive and fun videos. Unlike most video
Repo: https://github.com/jingkang50/funqa