Commit

Added new papers
asteroidhouse committed Nov 28, 2024
1 parent a3b50db commit abaa6cc
Showing 2 changed files with 247 additions and 0 deletions.
30 changes: 30 additions & 0 deletions index.html
@@ -493,6 +493,36 @@
</div>
</div>

<div class="displaycards touchup-date" id="event-IJlbuSrXmk">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/IJlbuSrXmk.html">Audio-Visual Dataset Distillation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Saksham Singh Kushwaha &middot; Siva Sai Nagender Vasireddy &middot; Kai Wang &middot; Yapeng Tian</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-IJlbuSrXmk"></div>

<a href="paper_pages/IJlbuSrXmk.html">
<img src="https://img.youtube.com/vi/SfXLu8D_K6o/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>

<div class="abstract-section">
<div>
<a id="abstract-link-IJlbuSrXmk" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-IJlbuSrXmk" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-IJlbuSrXmk">
Abstract <i id="caret-IJlbuSrXmk" class="fas fa-caret-right"></i>
</a>
</div>
</div>

<div class="collapse" id="collapse-event-abstract-IJlbuSrXmk">
<div class="abstract-display">
<p>In this article, we introduce <em>audio-visual dataset distillation</em>, a task to construct a smaller yet representative synthetic audio-visual dataset that maintains the cross-modal semantic association between audio and visual modalities. Dataset distillation techniques have primarily focused on image classification. However, with the growing capabilities of audio-visual models and the vast datasets required for their training, it is necessary to explore distillation methods beyond the visual modality. Our approach builds upon the foundation of Distribution Matching (DM), extending it to handle the unique challenges of audio-visual data. A key challenge is to jointly learn synthetic data that distills both the modality-wise information and natural alignment from real audio-visual data. We introduce a vanilla audio-visual distribution matching framework that separately trains visual-only and audio-only DM components, enabling us to investigate the effectiveness of audio-visual integration and various multimodal fusion methods. To address the limitations of unimodal distillation, we propose two novel matching losses: implicit cross-matching and cross-modal gap matching. These losses work in conjunction with the vanilla unimodal distribution matching loss to enforce cross-modal alignment and enhance the audio-visual dataset distillation process. Extensive audio-visual classification and retrieval experiments on four audio-visual datasets, AVE, MUSIC-21, VGGSound, and VGGSound-10K, demonstrate the effectiveness of our proposed matching approaches and validate the benefits of audio-visual integration with condensed data. This work establishes a new frontier in audio-visual dataset distillation, paving the way for further advancements in this exciting field. <em>Our source code and pre-trained models will be released</em>.</p>
</div>
</div>
</div>

<div class="displaycards touchup-date" id="event-hbtG6s6e7r">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/hbtG6s6e7r.html">Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally</a>
217 changes: 217 additions & 0 deletions paper_pages/IJlbuSrXmk.html
@@ -0,0 +1,217 @@
<!DOCTYPE html>
<html lang="en" style="scroll-padding-top: 70px;">

<head>

<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, shrink-to-fit=no">
<link rel="stylesheet"
href="https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Lora:400,700,400italic,700italic">
<link href="https://fonts.googleapis.com/css2?family=Exo:wght@400;700&family=Lato:wght@400;700&display=swap" rel="stylesheet">

<link rel="stylesheet" href="/static/expo/fonts/font-awesome.min.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.1/css/all.css" integrity="sha384-50oBUHEmvpQ+1lW4y57PTFmhCaXp0ML5d60M1M7uH2+nqUivzIebhndOJK28anvf" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap-select.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" integrity="sha384-xOolHFLEh07PJGoPkLv1IbcEPTNtaed2xpHsD9ESMhqIYd0nLMwNLD69Npy4HI+N" crossorigin="anonymous">


<script src="https://code.jquery.com/jquery-3.6.1.min.js"
integrity="sha256-o88AwQnZB+VDvE9tvIXrMQaPlFFSUTR+nldQm1LuPXQ=" crossorigin="anonymous"></script>

<script>
if (typeof jQuery === 'undefined') {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = "/static/core/js/jquery-3.6.1.min.js";
document.head.appendChild(script);
}
</script>

<script src="https://d3js.org/d3.v5.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/umd/popper.min.js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js" integrity="sha384-Fy6S3B9q64WdZWQUiU+q4/2Lc9npb8tCaSX9FK7E8HnRr0Jz8D6OP9dO5Vg3Q9ct" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap-select.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/corejs-typeahead/1.3.1/typeahead.bundle.min.js" integrity="sha512-lEb9Vp/rkl9g2E/LdHIMFTqz21+LA79f84gqP75fbimHqVTu6483JG1AwJlWLLQ8ezTehty78fObKupq3HSHPQ==" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/min/moment.min.js"
integrity="sha256-4iQZ6BVL4qNKlQ27TExEhBN1HFPvAvAMbFavKKosSWQ="
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/js-cookie@2/src/js.cookie.min.js"></script>
<script src="/static/core/js/ajax-csrf-snippet.js" type="text/javascript"></script>
<script src="/static/virtual/js/virtual.js"></script>


<link rel="stylesheet" href="../virtual.css">
<link rel="stylesheet" href="/static/virtual/css/calendar.css">
<link rel="stylesheet" href="/static/virtual/css/calendar-ICML.css">
<link rel="stylesheet" href="/static/virtual/css/calendar-ICML2023.css">
<script src='https://slideslive.com/embed_presentation.js'></script>

</head>

<body>
<!-- NAV -->

<!--<nav class="navbar sticky-top navbar-expand-lg navbar-light bg-light mr-auto" id="main-nav">-->
<nav class="navbar sticky-top navbar-expand-lg mr-auto navbar-light" id="main-nav">
<div class="container-fluid">
<a class="navbar-brand" href="../index.html">
<img src="../tmlr_logo.jpeg" height="40px" alt="TMLR logo">
Transactions on Machine Learning Research
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav"
aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse text-right flex-grow-1" id="navbarNav">
<ul class="navbar-nav ml-auto">

<!--
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button"
data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Main Conference
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/events/oral">Orals</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/events/spotlight">Spotlights</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/papers.html">Papers</a>
</div>
</li>
-->

<!--
<li class="nav-item">
<a class="nav-link" href="../index_file.html">All Papers</a>
</li>
-->

<!--
<li class="nav-item">
<a class="nav-link" href="../">Papers with Videos</a>
</li>
-->

<!--
<li class="nav-item">
<a class="nav-link" href="../features_papers.html">Featured Papers</a>
</li>
-->

<!--
<li class="nav-item ">
<a class="nav-link" href="/virtual/2023/search"><i class="fas fa-search"></i> &nbsp;</a>
</li>
-->
</ul>
</div>
</div>
</nav>


<div class="container">
<!-- Title -->
<div class="pp-card m-3" style="" id="bookmark-here">
<div class="card-header">
<!-- <h3 class="text-center">Spotlight</h3> -->

<h2 class="card-title main-title text-center" style="">
Audio-Visual Dataset Distillation
</h2>

<h3 class="card-subtitle mb-2 text-muted text-center">
Saksham Singh Kushwaha &middot; Siva Sai Nagender Vasireddy &middot; Kai Wang &middot; Yapeng Tian
</h3>

<div class="text-center p-3">

<!--
<a class="card-link" data-toggle="collapse" role="button" href="#details">
Abstract
</a>
-->

<div class="schedule-html-detail"></div>

<div>
<span class="nowrap" style="white-space:nowrap">
<a href="https://openreview.net/forum?id=IJlbuSrXmk" class="btn btn-default" title="OpenReview">
<img src="../message-logo2.svg" width="30px" alt="Discussion Logo"/> OpenReview
</a>
</span>

<span class="nowrap" style="white-space:nowrap">
<a href="https://openreview.net/pdf?id=IJlbuSrXmk" class="btn btn-default href_PDF" title="Paper PDF">
<img src="../pdf-logo.svg" width="30px" alt="PDF Logo"/> Paper PDF
</a>
</span>

<span class="nowrap" style="white-space:nowrap">
<a href="https://github.com/sakshamsingh1/AVDD" class="btn btn-default href_PDF" title="Code">
<img src="../github-logo.svg" width="30px" alt="Github Logo"/> Code
</a>
</span>
</div>
</div>
<div class=" text-center text-muted text-monospace ">
<div>
</div>
</div>
</div>
</div>


<!-- YouTube Embed -->
<div class="text-center">
<h4 class="text-center">Video</h4>
<iframe width="896" height="504" src="https://www.youtube.com/embed/SfXLu8D_K6o" title="Embedded Video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>

<div class="m-3 text-center">
<h4 class="text-center">Paper PDF</h4>
<a href="https://openreview.net/pdf?id=IJlbuSrXmk"><img src="../paper_thumbnails/IJlbuSrXmk.pdf.jpg" class="border border-dark rounded" alt="Thumbnail of paper pages" /></a>
</div>

<div id="details" class="pp-card m-3">
<div class="card-body">
<p class="card-text">
<div id="abstractExample">
<h4 class="text-center">Abstract</h4>
<p>
In this article, we introduce <em>audio-visual dataset distillation</em>, a task to construct a smaller yet representative synthetic audio-visual dataset that maintains the cross-modal semantic association between audio and visual modalities. Dataset distillation techniques have primarily focused on image classification. However, with the growing capabilities of audio-visual models and the vast datasets required for their training, it is necessary to explore distillation methods beyond the visual modality. Our approach builds upon the foundation of Distribution Matching (DM), extending it to handle the unique challenges of audio-visual data. A key challenge is to jointly learn synthetic data that distills both the modality-wise information and natural alignment from real audio-visual data. We introduce a vanilla audio-visual distribution matching framework that separately trains visual-only and audio-only DM components, enabling us to investigate the effectiveness of audio-visual integration and various multimodal fusion methods. To address the limitations of unimodal distillation, we propose two novel matching losses: implicit cross-matching and cross-modal gap matching. These losses work in conjunction with the vanilla unimodal distribution matching loss to enforce cross-modal alignment and enhance the audio-visual dataset distillation process. Extensive audio-visual classification and retrieval experiments on four audio-visual datasets, AVE, MUSIC-21, VGGSound, and VGGSound-10K, demonstrate the effectiveness of our proposed matching approaches and validate the benefits of audio-visual integration with condensed data. This work establishes a new frontier in audio-visual dataset distillation, paving the way for further advancements in this exciting field. <em>Our source code and pre-trained models will be released</em>.
</p>
</div>
</p>

</div>
</div>
</div>


<script>
var show_abstract = false;
</script>


<script type="text/x-mathjax-config">
MathJax.Hub.Config({
"tex2jax": {
"inlineMath": [["$","$"], ["\\(","\\)"]],
"displayMath": [["\\[","\\]"]],
"processEscapes": true
}
}
);
var jq2 = $;
</script>

<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML">
</script>

</body>
</html>
