<!DOCTYPE html>
<html lang="en" style="scroll-padding-top: 70px;">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, shrink-to-fit=no">
<link rel="stylesheet"
href="https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Lora:400,700,400italic,700italic">
<link href="https://fonts.googleapis.com/css2?family=Exo:wght@400;700&family=Lato:wght@400;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="/static/expo/fonts/font-awesome.min.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.1/css/all.css" integrity="sha384-50oBUHEmvpQ+1lW4y57PTFmhCaXp0ML5d60M1M7uH2+nqUivzIebhndOJK28anvf" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap-select.min.css">
<link rel="stylesheet" href="cards.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.1/css/all.css" integrity="sha384-50oBUHEmvpQ+1lW4y57PTFmhCaXp0ML5d60M1M7uH2+nqUivzIebhndOJK28anvf" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" integrity="sha384-xOolHFLEh07PJGoPkLv1IbcEPTNtaed2xpHsD9ESMhqIYd0nLMwNLD69Npy4HI+N" crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.6.1.min.js"
integrity="sha256-o88AwQnZB+VDvE9tvIXrMQaPlFFSUTR+nldQm1LuPXQ=" crossorigin="anonymous"></script>
<script>
if (typeof jQuery === 'undefined') {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = "/static/core/js/jquery-3.6.1.min.js";
document.head.appendChild(script);
}
</script>
<script src="https://d3js.org/d3.v5.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/umd/popper.min.js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js" integrity="sha384-Fy6S3B9q64WdZWQUiU+q4/2Lc9npb8tCaSX9FK7E8HnRr0Jz8D6OP9dO5Vg3Q9ct" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap-select.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/corejs-typeahead/1.3.1/typeahead.bundle.min.js" integrity="sha512-lEb9Vp/rkl9g2E/LdHIMFTqz21+LA79f84gqP75fbimHqVTu6483JG1AwJlWLLQ8ezTehty78fObKupq3HSHPQ==" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/min/moment.min.js"
integrity="sha256-4iQZ6BVL4qNKlQ27TExEhBN1HFPvAvAMbFavKKosSWQ="
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/js-cookie@2/src/js.cookie.min.js"></script>
<script src="/static/core/js/ajax-csrf-snippet.js" type="text/javascript"></script>
<script src="/static/virtual/js/virtual.js"></script>
<link rel="stylesheet" href="virtual.css">
<style>
body {
background: #f6f6f6;
}
</style>
</head>
<body>
<!-- NAV -->
<!--
<nav class="navbar sticky-top navbar-expand-lg navbar-light mr-auto" id="main-nav">
<div class="container-fluid">
<a class="navbar-brand" href="/">
<img src="/static/core/img/ICML-logo.svg" height="40px">
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav"
aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse text-right flex-grow-1" id="navbarNav">
<ul class="navbar-nav ml-auto">
<li class="nav-item ">
<a class="nav-link" href="/virtual/2022/events/tutorial">Tutorials</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button"
data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Main Conference
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
<a class="dropdown-item" href="/virtual/2022/events/oral">Orals</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2022/papers.html">Papers</a>
</div>
</li>
</ul>
</div>
</div>
</nav>
-->
<!-- NAV -->
<nav class="navbar sticky-top navbar-expand-lg navbar-light mr-auto" id="main-nav">
<div class="container-fluid">
<a class="navbar-brand" href="">
<img src="tmlr_logo.jpeg" height="40px">
Transactions on Machine Learning Research
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav"
aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse text-right flex-grow-1" id="navbarNav">
<ul class="navbar-nav ml-auto">
<!--
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button"
data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Main Conference
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/events/oral">Orals</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/events/spotlight">Spotlights</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="/virtual/2023/papers.html">Papers</a>
</div>
</li>
-->
<!--
<li class="nav-item">
<a class="nav-link" href="../">All Papers</a>
</li>
-->
<!--
<li class="nav-item">
<a class="nav-link" href="">Papers with Videos</a>
</li>
-->
<!--
<li class="nav-item">
<a class="nav-link" href="../featured_papers.html">Featured Papers</a>
</li>
-->
<!--
<li class="nav-item ">
<a class="nav-link" href="/virtual/2023/search"><i class="fas fa-search"></i> </a>
</li>
-->
</ul>
</div>
</div>
</nav>
<div class="container">
<br />
<div class="row">
<div class="col-md-12"></div>
<div class="title-centered" style="text-align:center">TMLR Infinite Conference</div>
</div>
</div>
<div class="row">
<div class="col-sm-12">
<div style="max-width: 1500px; margin:auto; border">
<div class="grid-displaycards">
<div class="displaycards touchup-date" id="event-ZRXwHRXm8i">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/ZRXwHRXm8i.html">CREW: Facilitating Human-AI Teaming Research</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Lingyu Zhang · Zhengran Ji · Boyuan Chen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-ZRXwHRXm8i"></div>
<a href="paper_pages/ZRXwHRXm8i.html">
<img src="http://img.youtube.com/vi/RINSo3uI0dI/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-ZRXwHRXm8i" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-ZRXwHRXm8i" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-ZRXwHRXm8i">
Abstract <i id="caret-ZRXwHRXm8i" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-ZRXwHRXm8i">
<div class="abstract-display">
<p>With the increasing deployment of artificial intelligence (AI) technologies, the potential of humans working with AI agents has been growing at a great speed. Human-AI teaming is an important paradigm for studying various aspects when humans and AI agents work together. The unique aspect of Human-AI teaming research is the need to jointly study humans and AI agents, demanding multidisciplinary research efforts from machine learning to human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited, often supporting oversimplified scenarios and a single task, or specifically focusing on either human-teaming research or multi-agent AI algorithms. We introduce \textbf{CREW}, a platform to facilitate Human-AI teaming research in real-time decision-making scenarios and engage collaborations from multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming with expandable potentials from our modular design. Following conventional cognitive neuroscience research, CREW also supports multimodal human physiological signal recording for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-QezxDgd5hf">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/QezxDgd5hf.html">Mind the truncation gap: challenges of learning on dynamic graphs with recurrent architectures</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">João Bravo · Jacopo Bono · Hugo Ferreira · Pedro Saleiro · Pedro Bizarro</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-QezxDgd5hf"></div>
<a href="paper_pages/QezxDgd5hf.html">
<img src="http://img.youtube.com/vi/IdteTB8IzP8/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-QezxDgd5hf" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-QezxDgd5hf" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-QezxDgd5hf">
Abstract <i id="caret-QezxDgd5hf" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-QezxDgd5hf">
<div class="abstract-display">
<p>Systems characterized by evolving interactions, prevalent in social, financial, and biological domains, are effectively modeled as continuous-time dynamic graphs (CTDGs). To manage the scale and complexity of these graph datasets, machine learning (ML) approaches have become essential. However, CTDGs pose challenges for ML because traditional static graph methods fail to account for event timings naturally. Newer approaches, such as graph recurrent neural networks (GRNNs), are inherently time-aware and offer advantages over static methods for CTDGs. Yet, GRNNs face another issue: the short truncation of backpropagation-through-time (BPTT) whose impact has never been properly examined until now. In this work, we demonstrate that this truncation can limit the learning of dependencies more than a hop away, resulting in reduced performance. Through experiments on a novel synthetic task as well as real-world datasets, we reveal that there exists a performance gap between full backpropagation-through-time (F-BPTT) and the truncated backpropagation-through-time (T-BPTT) commonly used to train GRNN models. We term this gap the "truncation gap" and argue that understanding and addressing it is essential as the importance of CTDGs grows, discussing potential future directions of research for this type of models.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-BsMMc4MEGS">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/BsMMc4MEGS.html">CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Zachary S Siegel · Sayash Kapoor · Nitya Nadgir · Benedikt Stroebl · Arvind Narayanan</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-BsMMc4MEGS"></div>
<a href="paper_pages/BsMMc4MEGS.html">
<img src="http://img.youtube.com/vi/Nrml8ta3PFc/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-BsMMc4MEGS" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-BsMMc4MEGS" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-BsMMc4MEGS">
Abstract <i id="caret-BsMMc4MEGS" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-BsMMc4MEGS">
<div class="abstract-display">
<p>AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 19% on the hardest level of tasks, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step toward building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-MHJlFCqXdA">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/MHJlFCqXdA.html">Is Value Functions Estimation with Classification Plug-and- play for Offline Reinforcement Learning?</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Denis Tarasov · Kirill Brilliantov · Dmitrii Kharlapenko</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-MHJlFCqXdA"></div>
<a href="paper_pages/MHJlFCqXdA.html">
<img src="http://img.youtube.com/vi/xwfQ2Oa6ycs/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-MHJlFCqXdA" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-MHJlFCqXdA" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-MHJlFCqXdA">
Abstract <i id="caret-MHJlFCqXdA" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-MHJlFCqXdA">
<div class="abstract-display">
<p>In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across various domains, as the primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks, without delving into in-depth analysis. Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup and analyze the effects of different aspects on performance. Through large-scale experiments conducted across a diverse range of tasks using different algorithms, we aim to gain deeper insights into the implications of this approach. Our results reveal that incorporating this change can lead to superior performance over state-of-the-art solutions for some algorithms in certain tasks, while maintaining comparable performance levels in other tasks; however, for other algorithms this modification might lead to a dramatic performance drop. These findings are crucial for the further application of the classification approach in research and practical tasks.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-QlTLkH6xRC">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/QlTLkH6xRC.html">TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Sabera J Talukder · Yisong Yue · Georgia Gkioxari</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-QlTLkH6xRC"></div>
<a href="paper_pages/QlTLkH6xRC.html">
<img src="http://img.youtube.com/vi/OqrCpdb6MJk/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-QlTLkH6xRC" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-QlTLkH6xRC" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-QlTLkH6xRC">
Abstract <i id="caret-QlTLkH6xRC" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-QlTLkH6xRC">
<div class="abstract-display">
<p>This work studies the problem of time series analysis with generalist (or foundation) models, which are models trained across many data domains. Drawing inspiration from the widespread success of large language models, we consider the simple strategy of discretely tokenizing time series data drawn from a myriad of datasets via self-supervision, then using the fixed tokenization to solve a variety of tasks across many data domains. Canonically, time series models are either trained on a single dataset or built in a task-specific manner (e.g., a forecasting-only model), where many use patches of time as inputs to the model. As such, performant generalist, discrete representation time series models explored across many tasks are of value. Our method, TOkenized Time Series EMbeddings (TOTEM), produces such generalist time series models with minimal or no fine-tuning while exhibiting strong zero-shot performance. We evaluate TOTEM extensively over nearly 500 experiments on three commonly-studied time series tasks with real-world data: imputation (17 baselines, 12 datasets), anomaly detection (19 baselines, 25 datasets), and forecasting (14 baselines, 12 datasets). We conclude that TOTEM matches or outperforms existing state-of-the-art models in both the canonical specialist setting (i.e., training one model on one domain) as well as the generalist setting (i.e., training a single model on many domains), which demonstrates the efficacy of tokenization for general time series analysis. The open-source implementation is available here: https://github.com/SaberaTalukder/TOTEM; a video summary is available here: https://www.youtube.com/watch?v=OqrCpdb6MJk.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-lIy0TEUou7">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/lIy0TEUou7.html">Modular Quantization-Aware Training for 6D Object Pose Estimation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Saqib Javed · Chengkun Li · Andrew Lawrence Price · Yinlin Hu · Mathieu Salzmann</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-lIy0TEUou7"></div>
<a href="paper_pages/lIy0TEUou7.html">
<img src="http://img.youtube.com/vi/EBNr0qNem8U/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-lIy0TEUou7" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-lIy0TEUou7" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-lIy0TEUou7">
Abstract <i id="caret-lIy0TEUou7" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-lIy0TEUou7">
<div class="abstract-display">
<p>Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D object pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance. To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D object pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques. Our experiments showcase the generality of MQAT across datasets, architectures, and quantization algorithms. Additionally, we observe that MQAT quantized models can achieve an accuracy boost (>7% ADI-0.1d) over the baseline full-precision network while reducing model size by a factor of 4x or more.
Project Page: https://saqibjaved1.github.io/MQAT_</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-q7YXEbFOAt">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/q7YXEbFOAt.html">$\clubsuit$ CLOVER $\clubsuit$: Probabilistic Forecasting with Coherent Learning Objective Reparameterization</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Kin G. Olivares · Geoffrey Négiar · Ruijun Ma · Oinam Nganba Meetei · Mengfei Cao · Michael W. Mahoney</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-q7YXEbFOAt"></div>
<a href="paper_pages/q7YXEbFOAt.html">
<img src="https://drive.google.com/thumbnail?id=1-xkmuYSB7YQDXOEaBKeFa-pAa1Gq1-i8" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-q7YXEbFOAt" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-q7YXEbFOAt" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-q7YXEbFOAt">
Abstract <i id="caret-q7YXEbFOAt" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-q7YXEbFOAt">
<div class="abstract-display">
<p>Obtaining accurate probabilistic forecasts is an operational challenge in many applications, such as energy management, climate forecasting, supply chain planning, and resource allocation.
Many of these applications present a natural hierarchical structure over the forecasted quantities; and forecasting systems that adhere to this hierarchical structure are said to be coherent.
Furthermore, operational planning benefits from the accuracy at all levels of the aggregation hierarchy. However, building accurate and coherent forecasting systems is challenging: classic multivariate time series tools and neural network methods are still being adapted for this purpose. In this paper, we augment an MQForecaster neural network architecture with a modified multivariate Gaussian factor model that achieves coherence by construction. The factor model samples can be differentiated with respect to the model parameters, allowing optimization on arbitrary differentiable learning objectives that align with the forecasting system's goals, including quantile loss and the scaled Continuous Ranked Probability Score (CRPS). We call our method the Coherent Learning Objective Reparametrization Neural Network (CLOVER). In comparison to state-of-the-art coherent forecasting methods,
CLOVER achieves significant improvements in scaled CRPS forecast accuracy, with average gains of 15%, as measured on six publicly-available datasets.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-tYxRyNT0TC">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/tYxRyNT0TC.html">Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Pingcheng Jian · Easop Lee · Zachary I. Bell · Michael M. Zavlanos · Boyuan Chen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-tYxRyNT0TC"></div>
<a href="paper_pages/tYxRyNT0TC.html">
<img src="http://img.youtube.com/vi/H6SD9Tcvhrg/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-tYxRyNT0TC" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-tYxRyNT0TC" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-tYxRyNT0TC">
Abstract <i id="caret-tYxRyNT0TC" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-tYxRyNT0TC">
<div class="abstract-display">
<p>Vision-based imitation learning has shown promising capabilities of endowing robots with various motion skills given visual observation. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching that enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our key idea is to enforce modularity of visual encoders by aligning the latent visual features among different visuomotor policies. Our method disentangles the perceptual knowledge with the downstream motion skills and allows the reuse of the visual encoders by directly stitching them to a policy network trained with partially different visual conditions. We evaluate our method in various simulated and real-world manipulation tasks. While baseline methods failed at all attempts, our method could achieve zero-shot success in real-world visuomotor tasks. Our quantitative and qualitative analysis of the learned features of the policy network provides more insights into the high performance of our proposed method.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-4c9UzDhg49">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/4c9UzDhg49.html">On the theoretical limit of gradient descent for Simple Recurrent Neural Networks with finite precision</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Volodimir Mitarchuk · Rémi Emonet · Remi Eyraud · Amaury Habrard</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-4c9UzDhg49"></div>
<a href="paper_pages/4c9UzDhg49.html">
<img src="http://img.youtube.com/vi/ap6LOok_Vtk/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-4c9UzDhg49" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-4c9UzDhg49" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-4c9UzDhg49">
Abstract <i id="caret-4c9UzDhg49" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-4c9UzDhg49">
<div class="abstract-display">
<p>Despite their great practical successes, the understanding of neural network behavior is still
a topical research issue. In particular, the class of functions learnable in the context of a
finite precision configuration is an open question. In this paper, we propose to study the
limits of gradient descent when such a configuration is set for the class of Simple Recurrent
Networks (SRN). We exhibit conditions under which the gradient descend will provably fail.
We also design a class of SRN based on Deterministic finite State Automata (DFA) that
fulfills the failure requirements. The definition of this class is constructive: we propose an
algorithm that, from any DFA, constructs a SRN that computes exactly the same function,
a result of interest by its own.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-SP8DLl6jgb">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/SP8DLl6jgb.html">Feature Distillation Improves Zero-Shot Transfer from Synthetic Images</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Niclas Popp · Jan Hendrik Metzen · Matthias Hein</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-SP8DLl6jgb"></div>
<a href="paper_pages/SP8DLl6jgb.html">
<img src="http://img.youtube.com/vi/KbdacNWGiAM/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-SP8DLl6jgb" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-SP8DLl6jgb" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-SP8DLl6jgb">
Abstract <i id="caret-SP8DLl6jgb" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-SP8DLl6jgb">
<div class="abstract-display">
<p>Vision-language foundation models such as CLIP have showcased impressive zero-shot capabilities. However, their applicability in resource-constrained environments is limited due to their size and the resulting latency. Knowledge distillation allows to mitigate these challenges by distilling small image encoders that can replace the large CLIP image encoder. In a zero-shot setting, where only the class names are known, no real domain images can be used for this process. Instead, we investigate the use of synthetic images for this purpose. Unlike existing works that focus on improving the quality of synthetic images to bridge the performance gap compared to training on natural images, we find the choice of loss to be a crucial factor. Specifically, minimizing only the distance between the student and teacher image features, without incorporating image captions in the loss function, increases the robustness to spurious features and data corruptions. As a result, this feature distillation approach greatly improves the transfer performance from synthetic to real images. Leveraging these insights, we are able to train domain-specific students that achieve zero-shot performance comparable to a ViT-B/32 teacher on six fine-grained classification datasets while using up to 92% fewer parameters.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-QdGtwjDgub">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/QdGtwjDgub.html">Contaminated Online Convex Optimization</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Tomoya Kamijima · Shinji Ito</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-QdGtwjDgub"></div>
<a href="paper_pages/QdGtwjDgub.html">
<img src="https://drive.google.com/thumbnail?id=1EwrCZxiGUj_iw5_i787d88x3Nqy-Kcij" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-QdGtwjDgub" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-QdGtwjDgub" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-QdGtwjDgub">
Abstract <i id="caret-QdGtwjDgub" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-QdGtwjDgub">
<div class="abstract-display">
<p>In online convex optimization, some efficient algorithms have been designed for each of the individual classes of objective functions, e.g., convex, strongly convex, and exp-concave. However, existing regret analyses, including those of universal algorithms, are limited to cases in which the objective functions in all rounds belong to the same class and cannot be applied to cases in which the property of objective functions may change in each time step. This paper introduces a novel approach to address such cases, proposing a new regime we term as \textit{contaminated} online convex optimization. For the contaminated case, we demonstrate that the regret is lower bounded by $\Omega(\log T + \sqrt{k})$. Here, $k$ signifies the level of contamination in the objective functions. We also demonstrate that the regret is bounded by $O(\log T+\sqrt{k\log T})$ when universal algorithms are used. When our proposed algorithms with additional information are employed, the regret is bounded by $O(\log T+\sqrt{k})$, which matches the lower bound. These are intermediate bounds between a convex case and a strongly convex or exp-concave case.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-t9c3pfrR1X">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/t9c3pfrR1X.html">OmniPred: Language Models as Universal Regressors</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Xingyou Song · Oscar Li · Chansoo Lee · Bangding Yang · Daiyi Peng · Sagi Perel · Yutian Chen</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-t9c3pfrR1X"></div>
<a href="paper_pages/t9c3pfrR1X.html">
<img src="http://img.youtube.com/vi/fv-cK9LgQmk/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-t9c3pfrR1X" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-t9c3pfrR1X" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-t9c3pfrR1X">
Abstract <i id="caret-t9c3pfrR1X" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-t9c3pfrR1X">
<div class="abstract-display">
<p>Regression is a powerful tool to accurately predict the outcome metric of a system given a set of parameters, but has traditionally been restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over (x,y) data from arbitrary formats. Using data sourced from Google Vizier, one of the largest proprietary blackbox optimization databases in the world, our extensive experiments demonstrate that language models are capable of very precise numerical regression using only textual representations of mathematical parameters and values, and if given the opportunity to train at scale over multiple tasks, can significantly outperform traditional regression models.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-lh6vOAHuvo">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/lh6vOAHuvo.html">AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Subhojeet Pramanik · Esraa Elelimy · Marlos C. Machado · Adam White</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-lh6vOAHuvo"></div>
<a href="paper_pages/lh6vOAHuvo.html">
<img src="http://img.youtube.com/vi/-bTe48JIUds/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-lh6vOAHuvo" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-lh6vOAHuvo" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-lh6vOAHuvo">
Abstract <i id="caret-lh6vOAHuvo" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-lh6vOAHuvo">
<div class="abstract-display">
<p>In this paper we investigate transformer architectures designed for partially observable online reinforcement learning. The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their applicability in online reinforcement learning: (1) in order to remember all past information, the self-attention mechanism requires access to the whole history to be provided as context. (2) The inference cost in transformers is expensive. In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer context-independent inference cost, leverage long-range dependencies effectively, and perform well in online reinforcement learning tasks. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments (e.g. T-Maze, Mystery Path, Craftax, and Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%. Our approach either performs similarly to or better than GTrXL, improving more than 37% upon GTrXL performance in harder tasks.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-0uwe0z2Hqm">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/0uwe0z2Hqm.html">Deep-Graph-Sprints: Accelerated Representation Learning in Continuous-Time Dynamic Graphs</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Ahmad Naser Eddin · Jacopo Bono · David Oliveira Aparicio · Hugo Ferreira · Pedro Manuel Pinto Ribeiro · Pedro Bizarro</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-0uwe0z2Hqm"></div>
<a href="paper_pages/0uwe0z2Hqm.html">
<img src="http://img.youtube.com/vi/LU0324z6mHo/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-0uwe0z2Hqm" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-0uwe0z2Hqm" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-0uwe0z2Hqm">
Abstract <i id="caret-0uwe0z2Hqm" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-0uwe0z2Hqm">
<div class="abstract-display">
<p>Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Traditional methods for extracting knowledge from these graphs often depend on feature engineering or deep learning. Feature engineering is limited by the manual and time-intensive nature of crafting features, while deep learning approaches suffer from high inference latency, making them impractical for real-time applications. This paper introduces Deep-Graph-Sprints (DGS), a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements. We benchmark DGS against state-of-the-art (SOTA) feature engineering and graph neural network methods using five diverse datasets. The results indicate that DGS achieves competitive performance while inference speed improves between 4x and 12x compared to other deep learning approaches on our benchmark datasets. Our method effectively bridges the gap between deep representation learning and low-latency application requirements for CTDGs.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-15tjpSHI15">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/15tjpSHI15.html">Teacher-Guided Graph Contrastive Learning</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Jay Nandy · Arnab Kumar Mondal · Manohar Kaul · Prathosh AP</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-15tjpSHI15"></div>
<a href="paper_pages/15tjpSHI15.html">
<img src="http://img.youtube.com/vi/WAG3W63m8g8/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-15tjpSHI15" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-15tjpSHI15" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-15tjpSHI15">
Abstract <i id="caret-15tjpSHI15" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-15tjpSHI15">
<div class="abstract-display">
<p>State-of-the-art self-supervised representation learning methods for Graphs are typically based on contrastive learning (CL) principles.
These CL objective functions can be posed as a supervised discriminative task using *'hard'* labels that consider any minor augmented pairs of graphs as 'equally positive'. However, such a notion of 'equal' pairs is incorrect for graphs as even a smaller 'discrete' perturbation may lead to large semantic changes that should be carefully encapsulated within the learned representations. This paper proposes a novel CL framework for GNNs, called *Teacher-guided Graph Contrastive Learning (TGCL)*, that incorporates 'soft' pseudo-labels to facilitate a more regularized discrimination. In particular, we propose a teacher-student framework where the student learns the representation by distilling the teacher's perception. Our TGCL framework can be adapted to existing CL methods to enhance their performance. Our empirical findings validate these claims on both inductive and transductive settings across diverse downstream tasks, including molecular graphs and social networks. Our experiments on benchmark datasets demonstrate that our framework consistently improves the average AUROC scores for molecules' property prediction and social network link prediction. Our code is available at: https://github.com/jayjaynandy/TGCL.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-fJEsas1z8J">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/fJEsas1z8J.html">MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Kemal Oksuz · Selim Kuzucu · Tom Joy · Puneet K. Dokania</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-fJEsas1z8J"></div>
<a href="paper_pages/fJEsas1z8J.html">
<img src="http://img.youtube.com/vi/PzyXl5VBrqE/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-fJEsas1z8J" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-fJEsas1z8J" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-fJEsas1z8J">
Abstract <i id="caret-fJEsas1z8J" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-fJEsas1z8J">
<div class="abstract-display">
<p>Combining the strengths of many existing predictors to obtain a Mixture of Experts which is superior to its individual components is an effective way to improve the performance without having to develop new architectures or train a model from scratch. However, surprisingly, we find that naively combining off-the-shelf object detectors in a similar way to Deep Ensembles, can often lead to degraded performance. We identify that the primary cause of this issue is that the predictions of the experts do not match their performance, a term referred to as miscalibration. Consequently, the most confident detector dominates the final predictions, preventing the mixture from leveraging all the predictions from the experts appropriately. To address this, when constructing the Mixture of Experts for object detection, we propose to combine their predictions in a manner which reflects the individual performance of the experts; an objective we achieve by first calibrating the predictions before filtering and refining them. We term this approach the Mixture of Calibrated Experts (MoCaE) and demonstrate its effectiveness through extensive experiments on 5 different detection tasks, showing that it: (i) improves object detectors on COCO and instance segmentation methods on LVIS by up to $\sim 2.5$ AP; (ii) reaches state-of-the-art on COCO test-dev with $65.1$ AP and on DOTA with $82.62$ $\mathrm{AP_{50}}$; (iii) outperforms single models consistently on recent detection tasks such as Open Vocabulary Object Detection. Code is available at: https://github.com/fiveai/MoCaE</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-6j5M75iK3a">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/6j5M75iK3a.html">Continual Learning in Open-vocabulary Classification with Complementary Memory Systems</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Zhen Zhu · Weijie Lyu · Yao Xiao · Derek Hoiem</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-6j5M75iK3a"></div>
<a href="paper_pages/6j5M75iK3a.html">
<img src="http://img.youtube.com/vi/RkmeJupwNkY/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-6j5M75iK3a" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-6j5M75iK3a" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-6j5M75iK3a">
Abstract <i id="caret-6j5M75iK3a" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-6j5M75iK3a">
<div class="abstract-display">
<p>We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. Specifically, we propose to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample's class is within the exemplar classes. We also propose a ``tree probe'' method, an adaption of lazy learning principles, which enables fast learning from new examples with competitive accuracy to batch-trained linear models. We test in data incremental, class incremental, and task incremental settings, as well as ability to perform flexible inference on varying subsets of zero-shot and learned categories. Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness. Code is available at https://github.com/jessemelpolio/TreeProbe.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-YcnjgKbZQS">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/YcnjgKbZQS.html">Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Vaidehi Patil · Yi-Lin Sung · Peter Hase · Jie Peng · Tianlong Chen · Mohit Bansal</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-YcnjgKbZQS"></div>
<a href="paper_pages/YcnjgKbZQS.html">
<img src="https://drive.google.com/thumbnail?id=13SWGsoj_rGGLawFD9zYzQUBUfuGIvTc3" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-YcnjgKbZQS" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-YcnjgKbZQS" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-YcnjgKbZQS">
Abstract <i id="caret-YcnjgKbZQS" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-YcnjgKbZQS">
<div class="abstract-display">
<p>Large Language Models (LLMs) trained on massive datasets may inadvertently acquire sensitive information such as personal details and potentially harmful content. This risk is further heightened in multimodal LLMs (aka MLLMs) as they integrate information from multiple modalities (image and text). Adversaries can exploit this stored knowledge by crafting inputs across modalities to extract sensitive details. Evaluating how effectively MLLMs can forget such information (targeted unlearning) necessitates the creation of high-quality, well-annotated image-text pairs. While significant research has addressed the creation of datasets for unlearning within LLMs, it has primarily concentrated on text modality. Creation of analogous datasets for multimodal data and models remain an understudied area. To address this gap, we first introduce a multimodal unlearning benchmark, UnLOK-VQA (Unlearning Outside Knowledge VQA), as well as an “attack and-defense” framework to evaluate methods for deleting specific multimodal knowledge from MLLMs. Our dataset generation process involves an automated pipeline to create samples of varied proximity levels to the target data point for evaluation of generalization and specificity, followed by manual filtering to retain only the high-quality data points. We use this process to extend a visual question-answering dataset for evaluating multimodal information deletion. Next, we present a comprehensive unlearning evaluation involving an attack-and-defense framework consisting of four white box and three blackbox attacks against six unlearning defense objectives. We also design a whitebox attack based on the interpretability of hidden states in LLMs motivated by past work. Our experimental results demonstrate that multimodal extraction attacks (with an attack success rate of 45.5%) are more successful than either image-only (32%) or text-only attacks (39%). The best overall defense mechanism, which removes answer information from internal model hidden states, reduces the success rate of multimodal attack to 15.7%. Furthermore, our findings suggest that larger models exhibit greater resilience to attacks, implying that model scaling could be a valuable strategy for enhancing robustness and developing safer models. UnLOK-VQA thus facilitates a comprehensive evaluation of unlearning in MLLMs and serves as a challenging benchmark for future research in unlearning.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-oG65SjZNIF">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/oG65SjZNIF.html">Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Simon Zhang · Cheng Xin · Tamal K. Dey</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-oG65SjZNIF"></div>
<a href="paper_pages/oG65SjZNIF.html">
<img src="http://img.youtube.com/vi/ZRiaYiN6BNw/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-oG65SjZNIF" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-oG65SjZNIF" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-oG65SjZNIF">
Abstract <i id="caret-oG65SjZNIF" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-oG65SjZNIF">
<div class="abstract-display">
<p>A hypergraph consists of a set of nodes along with a collection of subsets of the nodes called hyperedges. Higher order link prediction is the task of predicting the existence of a missing hyperedge in a hypergraph. A hyperedge representation learned for higher order link prediction is fully expressive when it does not lose distinguishing power up to an isomorphism. Many existing hypergraph representation learners are bounded in expressive power by the Generalized Weisfeiler Lehman-1 (GWL-1) algorithm, a generalization of the Weisfeiler Lehman-1 (WL-1) algorithm. The WL-1 algorithm can approximately decide whether two graphs are isomorphic. However, GWL-1 has limited expressive power. In fact, GWL-1 can only view the hypergraph as a collection of trees rooted at each of the nodes in the hypergraph. Furthermore, message passing on hypergraphs can already be computationally expensive, particularly with limited GPU device memory. To address these limitations, we devise a preprocessing algorithm that can identify certain regular subhypergraphs exhibiting symmetry with respect to GWL-1. Our preprocessing algorithm runs once with the time complexity linear in the size of the input hypergraph. During training, we randomly drop the hyperedges of the subhypergraphs identified by the algorithm and add covering hyperedges to break symmetry. We show that our method improves the expressivity of GWL-1. Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-mDGvrH7lju">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/mDGvrH7lju.html">CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoder</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Hee-Jun Jung · Jaehyoung Jeong · Kangil Kim</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-mDGvrH7lju"></div>
<a href="paper_pages/mDGvrH7lju.html">
<img src="http://img.youtube.com/vi/R03AoD3SRZ8/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-mDGvrH7lju" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-mDGvrH7lju" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-mDGvrH7lju">
Abstract <i id="caret-mDGvrH7lju" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-mDGvrH7lju">
<div class="abstract-display">
<p>Symmetries of input and latent vectors have provided valuable insights for disentanglement learning in VAEs. However, only a few works have been proposed as unsupervised methods, and even these require known factor information in the training data. We propose a novel method, Composite Factor-Aligned Symmetry Learning (CFASL), which is integrated into VAEs for learning symmetry-based disentanglement in an unsupervised manner, without any knowledge of the dataset's factor information. CFASL incorporates three novel features for learning symmetry-based disentanglement: 1) injecting inductive bias to align latent vector dimensions with factor-aligned symmetries within an explicit, learnable symmetry codebook; 2) learning a composite symmetry that expresses the change in unknown factors between two random samples by composing factor-aligned symmetries from the codebook; and 3) inducing a group-equivariant encoder and decoder when training VAEs under these two conditions. In addition, we propose an extended evaluation metric for multi-factor changes, complementing existing disentanglement evaluation in VAEs. In quantitative and in-depth qualitative analyses, CFASL demonstrates a significant improvement in disentanglement under both single-factor and multi-factor change conditions compared to state-of-the-art methods.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-IJlbuSrXmk">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/IJlbuSrXmk.html">Audio-Visual Dataset Distillation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Saksham Singh Kushwaha · Siva Sai Nagender Vasireddy · Kai Wang · Yapeng Tian</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-IJlbuSrXmk"></div>
<a href="paper_pages/IJlbuSrXmk.html">
<img src="http://img.youtube.com/vi/SfXLu8D_K6o/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-IJlbuSrXmk" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-IJlbuSrXmk" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-IJlbuSrXmk">
Abstract <i id="caret-IJlbuSrXmk" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-IJlbuSrXmk">
<div class="abstract-display">
<p>In this article, we introduce <em>audio-visual dataset distillation</em>, a task to construct a smaller yet representative synthetic audio-visual dataset that maintains the cross-modal semantic association between audio and visual modalities. Dataset distillation techniques have primarily focused on image classification. However, with the growing capabilities of audio-visual models and the vast datasets required for their training, it is necessary to explore distillation methods beyond the visual modality. Our approach builds upon the foundation of Distribution Matching (DM), extending it to handle the unique challenges of audio-visual data. A key challenge is to jointly learn synthetic data that distills both the modality-wise information and natural alignment from real audio-visual data. We introduce a vanilla audio-visual distribution matching framework that separately trains visual-only and audio-only DM components, enabling us to investigate the effectiveness of audio-visual integration and various multimodal fusion methods. To address the limitations of unimodal distillation, we propose two novel matching losses: implicit cross-matching and cross-modal gap matching. These losses work in conjunction with the vanilla unimodal distribution matching loss to enforce cross-modal alignment and enhance the audio-visual dataset distillation process. Extensive audio-visual classification and retrieval experiments on four audio-visual datasets, AVE, MUSIC-21, VGGSound, and VGGSound-10K, demonstrate the effectiveness of our proposed matching approaches and validate the benefits of audio-visual integration with condensed data. This work establishes a new frontier in audio-visual dataset distillation, paving the way for further advancements in this exciting field. <em>Our source code and pre-trained models will be released</em>.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-3YlOr7BHkx">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/3YlOr7BHkx.html">Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Thomas George · Pierre Nodet · Alexis Bondu · Vincent Lemaire</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-3YlOr7BHkx"></div>
<a href="paper_pages/3YlOr7BHkx.html">
<img src="http://img.youtube.com/vi/fT9VZXs0nh8/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-3YlOr7BHkx" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-3YlOr7BHkx" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-3YlOr7BHkx">
Abstract <i id="caret-3YlOr7BHkx" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-3YlOr7BHkx">
<div class="abstract-display">
<p>Mislabeled examples are ubiquitous in real-world machine learning datasets, motivating the development of techniques for their automatic detection. We show that most mislabeled example detection methods can be viewed as probing trained machine learning models using a few core principles. We formalize a modular framework that encompasses these methods, parameterized by only four building blocks, as well as a Python library demonstrating that these principles can actually be implemented. The focus is on classifier-agnostic concepts, with an emphasis on adapting methods developed for deep learning models to non-deep classifiers for tabular data. We benchmark existing methods on (artificial) Noisy Completely At Random (NCAR) as well as (realistic) Noisy Not At Random (NNAR) labeling noise from a variety of tasks with imperfect labeling rules. This benchmark provides new insights into, as well as limitations of, existing methods in this setup.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-2noXK5KBbx">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/2noXK5KBbx.html">Graph Structure Learning with Interpretable Bayesian Neural Networks</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Max Wasserman · Gonzalo Mateos</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-2noXK5KBbx"></div>
<a href="paper_pages/2noXK5KBbx.html">
<img src="http://img.youtube.com/vi/zcYD-r8DlUI/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-2noXK5KBbx" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-2noXK5KBbx" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-2noXK5KBbx">
Abstract <i id="caret-2noXK5KBbx" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-2noXK5KBbx">
<div class="abstract-display">
<p>Graphs serve as generic tools to encode the underlying relational structure of data. Often this graph is not given, and so the task of inferring it from nodal observations becomes important. Traditional approaches formulate a convex inverse problem with a smoothness-promoting objective and rely on iterative methods to obtain a solution. In supervised settings where graph labels are available, one can unroll and truncate these iterations into a deep network that is trained end-to-end. Such a network is parameter efficient and inherits inductive bias from the optimization formulation, an appealing aspect for data-constrained settings in, e.g., medicine, finance, and the natural sciences. But typically such settings care equally about <em>uncertainty</em> over edge predictions, not just point estimates. Here we introduce novel iterations with <em>independently interpretable parameters</em>, i.e., parameters whose values, independently of other parameters' settings, proportionally influence characteristics of the estimated graph, such as edge sparsity. After unrolling these iterations, prior knowledge over such graph characteristics shapes <em>prior distributions</em> over these independently interpretable network parameters to yield a Bayesian neural network (BNN) capable of graph structure learning (GSL) from smooth signal observations. Fast execution and parameter efficiency allow for high-fidelity posterior approximation via Markov Chain Monte Carlo (MCMC) and thus uncertainty quantification on edge predictions. Informative priors unlock modeling tools from Bayesian statistics like prior predictive checks. Synthetic and real data experiments corroborate this model's ability to provide well-calibrated estimates of uncertainty, in test cases that include unveiling economic sector modular structure from S&amp;P 500 data and recovering pairwise digit similarities from MNIST images. Overall, this framework enables GSL in modest-scale applications where uncertainty on the data structure is paramount.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-hrKHkmLUFk">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/hrKHkmLUFk.html">Multi-intention Inverse Q-learning for Interpretable Behavior Representation</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Hao Zhu · Brice De La Crompe · Gabriel Kalweit · Artur Schneider · Maria Kalweit · Ilka Diester · Joschka Boedecker</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-hrKHkmLUFk"></div>
<a href="paper_pages/hrKHkmLUFk.html">
<img src="http://img.youtube.com/vi/0u-fboAO6-I/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-hrKHkmLUFk" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-hrKHkmLUFk" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-hrKHkmLUFk">
Abstract <i id="caret-hrKHkmLUFk" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-hrKHkmLUFk">
<div class="abstract-display">
<p>In advancing the understanding of natural decision-making processes, inverse reinforcement learning (IRL) methods have proven instrumental in reconstructing animals' intentions underlying complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying rewards with IRL. To address this challenge, we introduce the class of hierarchical inverse Q-learning (HIQL) algorithms. Through an unsupervised learning process, HIQL divides expert trajectories into multiple intention segments and solves the IRL problem independently for each. Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions. Our results suggest that the intention transition dynamics underlying complex decision-making behavior are better modeled by a step function than by a smoothly varying function. This advancement holds promise for neuroscience and cognitive science, contributing to a deeper understanding of decision-making and uncovering underlying brain mechanisms.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-kUuPUIPvJ6">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/kUuPUIPvJ6.html">Support-Set Context Matters for Bongard Problems</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Nikhil Raghuraman · Adam W Harley · Leonidas Guibas</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-kUuPUIPvJ6"></div>
<a href="paper_pages/kUuPUIPvJ6.html">
<img src="http://img.youtube.com/vi/JO00GQHp0mQ/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-kUuPUIPvJ6" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-kUuPUIPvJ6" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-kUuPUIPvJ6">
Abstract <i id="caret-kUuPUIPvJ6" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-kUuPUIPvJ6">
<div class="abstract-display">
<p>Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract “concept” from a set of positive and negative “support” images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, most existing methods have reached at best 69% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets’ lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not adapt image features given information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because the “key concept” in a typical Bongard problem can often only be distinguished using multiple positives and multiple negatives. We explore simple methods to incorporate this context and show substantial gains over prior works, leading to new state-of-the-art accuracy on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) compared to methods with equivalent vision backbone architectures and strong performance on the original Bongard problem set (60.8%). Code is available at https://github.com/nraghuraman/bongard-context.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-FNBv2vweBI">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/FNBv2vweBI.html">Constraining Generative Models for Engineering Design with Negative Data</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Lyle Regenwetter · Giorgio Giannone · Akash Srivastava · Dan Gutfreund · Faez Ahmed</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-FNBv2vweBI"></div>
<a href="paper_pages/FNBv2vweBI.html">
<img src="http://img.youtube.com/vi/dFwRUL2qB3o/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-FNBv2vweBI" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-FNBv2vweBI" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-FNBv2vweBI">
Abstract <i id="caret-FNBv2vweBI" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-FNBv2vweBI">
<div class="abstract-display">
<p>Generative models have recently achieved remarkable success and widespread adoption in society, yet they still often struggle to generate realistic and accurate outputs. This challenge extends beyond language and vision into fields like engineering design, where safety-critical engineering standards and non-negotiable physical laws tightly constrain what outputs are considered acceptable. In this work, we introduce two approaches to guide models toward constraint-satisfying outputs using “negative data” -- examples of what to avoid. Our negative data generative models (NDGMs) outperform state-of-the-art NDGMs by 4x in constraint satisfaction and easily outperform classic generative models while using 8x less data on certain problems. To demonstrate this, we rigorously benchmark our NDGMs against 14 baseline models across numerous synthetic and real engineering problems, such as ship hulls with hydrodynamic constraints and vehicle designs with impact safety constraints. Our benchmarks showcase both the best-in-class performance of our new NDGMs and the widespread dominance of NDGMs over classic generative models in general. In doing so, we advocate for the more widespread use of NDGMs in engineering design tasks.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-llQXLfbGOq">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/llQXLfbGOq.html">Attention Normalization Impacts Cardinality Generalization in Slot Attention</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Markus Krimmel · Jan Achterhold · Joerg Stueckler</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-llQXLfbGOq"></div>
<a href="paper_pages/llQXLfbGOq.html">
<img src="http://img.youtube.com/vi/dnTrHyZgyCY/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>
<div class="abstract-section">
<div>
<a id="abstract-link-llQXLfbGOq" class="abstract-link" data-toggle="collapse"
href="#collapse-event-abstract-llQXLfbGOq" role="button"
aria-expanded="false" aria-controls="collapse-event-abstract-llQXLfbGOq">
Abstract <i id="caret-llQXLfbGOq" class="fas fa-caret-right"></i>
</a>
</div>
</div>
<div class="collapse" id="collapse-event-abstract-llQXLfbGOq">
<div class="abstract-display">
<p>Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image segmentation and object tracking in videos, is a deep learning component which performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture, in which latent slot vectors, which hold compressed information on objects, attend to localized perceptual features from the input image. In this paper, we demonstrate that design decisions on normalizing the aggregated values in the attention architecture have considerable impact on the ability of Slot Attention to generalize to a higher number of slots and objects than seen during training. We propose and investigate alternatives to the original normalization scheme which increase the generalization capabilities of Slot Attention to varying slot and object counts, resulting in performance gains on the task of unsupervised image segmentation. The newly proposed normalizations represent minimal and easy-to-implement modifications of the usual Slot Attention module, changing the value aggregation mechanism from a weighted mean operation to a scaled weighted sum operation.</p>
</div>
</div>
</div>
<div class="displaycards touchup-date" id="event-2D36otXvBE">
<div style="width:80%;margin:auto;">
<a class="small-title" href="paper_pages/2D36otXvBE.html">Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis</a>
</div>
<div class="type_display_name_minus_type"></div>
<div class="author-str">Chiu-Chou Lin · Yu-Wei Shih · Kuei-Ting Kuo · Yu-Cheng Chen · Chien-Hua Chen · Wei-Chen Chiu · I-Chen Wu</div>
<div class="author-str higher"></div>
<div class="text-muted touchup-date-div" id="touchup-date-event-2D36otXvBE"></div>
<a href="paper_pages/2D36otXvBE.html">
<img src="http://img.youtube.com/vi/CmgoUtRqmI8/0.jpg" class="social-img-thumb rounded" alt="thumbnail"/>
</a>