Commit b9bc65e (0 parents)
Showing 25 changed files with 802 additions and 0 deletions.
@@ -0,0 +1,15 @@
source "https://rubygems.org"

git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

gem 'jekyll'

group :jekyll_plugins do
  gem 'github-pages'
  gem 'jekyll-remote-theme'
  gem 'jekyll-include-cache'
  gem 'webrick'
end

# gem "rails"
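With this Gemfile in place, the site can typically be previewed locally through Bundler (a usage sketch, not part of this commit; it assumes Ruby and Bundler are already installed):

```shell
# Install the gems declared in the Gemfile, then build and serve the site.
bundle install
bundle exec jekyll serve
# Jekyll's default local address is http://127.0.0.1:4000; with this site's
# baseurl the proceedings would appear under the /v239 path.
```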
@@ -0,0 +1,33 @@
# PMLR 239

To suggest fixes to this volume, please make a pull request containing the requested changes and a justification for them.

To edit the details of this conference, edit the [_config.yml](./_config.yml) file and submit a pull request.

To change the details of an individual paper, edit the associated paper file in the [./_posts](./_posts) subdirectory.

For details of how to publish in PMLR, please check https://proceedings.mlr.press/faq.html

For details of what is required to submit a proceedings, please check https://proceedings.mlr.press/spec.html

Published as Volume 239 by the Proceedings of Machine Learning Research on 24 April 2023.

Volume Edited by:
* Javier Antorán
* Arno Blaas
* Kelly Buchanan
* Fan Feng
* Vincent Fortuin
* Sahra Ghalebikesabi
* Andreas Kriegler
* Ian Mason
* David Rohde
* Francisco J. R. Ruiz
* Uelwer Tobias
* Yubin Xie
* Rui Yang

Series Editors:
* Neil D. Lawrence
@@ -0,0 +1,116 @@
---
booktitle: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes in the
  Age of Foundation Models" at NeurIPS 2022 Workshops'
year: '2023'
shortname: ICBINB 23
volume: '239'
start: 2023-12-16
end: 2023-12-16
published: 2023-04-24
conference_number: '4'
layout: proceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: icbinb-2023
month: 0
cycles: false
bibtex_editor: Antor\'an, Javier and Blaas, Arno and Buchanan, Kelly and Feng, Fan
  and Fortuin, Vincent and Ghalebikesabi, Sahra and Kriegler, Andreas and Mason, Ian
  and Rohde, David and Ruiz, Francisco J. R. and Tobias, Uelwer and Xie, Yubin and
  Yang, Rui
editor:
- given: Javier
  family: Antorán
- given: Arno
  family: Blaas
- given: Kelly
  family: Buchanan
- given: Fan
  family: Feng
- given: Vincent
  family: Fortuin
- given: Sahra
  family: Ghalebikesabi
- given: Andreas
  family: Kriegler
- given: Ian
  family: Mason
- given: David
  family: Rohde
- given: Francisco J. R.
  family: Ruiz
- given: Uelwer
  family: Tobias
- given: Yubin
  family: Xie
- given: Rui
  family: Yang
title: Proceedings of Machine Learning Research
description: |
  Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2022 Workshops
  Held in New Orleans, Louisiana, USA on 16 December 2023
  Published as Volume 239 by the Proceedings of Machine Learning Research on 24 April 2023.
  Volume Edited by:
  Javier Antorán
  Arno Blaas
  Kelly Buchanan
  Fan Feng
  Vincent Fortuin
  Sahra Ghalebikesabi
  Andreas Kriegler
  Ian Mason
  David Rohde
  Francisco J. R. Ruiz
  Uelwer Tobias
  Yubin Xie
  Rui Yang
  Series Editors:
  Neil D. Lawrence
date_str: 16 Dec
url: https://proceedings.mlr.press
author:
  name: PMLR
baseurl: "/v239"
twitter_username: MLResearchPress
github_username: mlresearch
markdown: kramdown
exclude:
- README.md
- Gemfile
- ".gitignore"
plugins:
- jekyll-feed
- jekyll-seo-tag
- jekyll-remote-theme
remote_theme: mlresearch/jekyll-theme
style: pmlr
permalink: "/:title.html"
ghub:
  edit: true
  repository: v239
display:
  copy_button:
    bibtex: true
    endnote: true
    apa: true
comments: false
volume_type: Volume
volume_dir: v239
email: ''
conference:
  name: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes in the
    Age of Foundation Models" at NeurIPS 2022 Workshops'
  url: https://sites.google.com/view/icbinb-2023/
  location: New Orleans, Louisiana, USA
  dates:
  - 2023-12-16
analytics:
  google:
    tracking_id: UA-92432422-1
orig_bibfile: "/Users/neil/mlresearch/v239/icbinb23.bib"
# Site settings
# Original source: /Users/neil/mlresearch/v239/icbinb23.bib
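The `editor` list and the flat `bibtex_editor` string in `_config.yml` encode the same people in two shapes. A minimal Ruby sketch (not repository tooling; only the first two editors are shown, for brevity) of how the BibTeX form can be derived from the structured list:

```ruby
require 'yaml'

# First two entries of the `editor` list from _config.yml (sample only).
editors = YAML.safe_load(<<~EDITORS)
  - given: Javier
    family: Antorán
  - given: Arno
    family: Blaas
EDITORS

# Rebuild the "Family, Given and Family, Given ..." form used by bibtex_editor.
bibtex_editor = editors.map { |e| "#{e['family']}, #{e['given']}" }.join(' and ')
puts bibtex_editor  # Antorán, Javier and Blaas, Arno
```

Keeping the structured list authoritative and deriving the flat string avoids the two fields drifting apart.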
@@ -0,0 +1,54 @@
---
title: How (not) to ensemble LVLMs for VQA
abstract: 'This paper studies ensembling in the era of Large Vision-Language Models
  (LVLMs). Ensembling is a classical method to combine different models to get increased
  performance. In the recent work on Encyclopedic-VQA the authors examine a wide variety
  of models to solve their task: from vanilla LVLMs, to models including the caption
  as extra context, to models augmented with Lens-based retrieval of Wikipedia pages.
  Intuitively these models are highly complementary, which should make them ideal
  for ensembling. Indeed, an oracle experiment (Fig. 1) shows potential gains from
  48.8% accuracy (the best single model) all the way up to 67% (best possible ensemble).
  So it is a trivial exercise to create an ensemble with substantial real gains. Or
  is it?'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: alazraki23a
month: 0
tex_title: How (not) to ensemble LVLMs for VQA
firstpage: 1
lastpage: 20
page: 1-20
order: 1
cycles: false
bibtex_author: Alazraki, Lisa and Castrejon, Lluis and Dehghani, Mostafa and Huot,
  Fantine and Uijlings, Jasper and Mensink, Thomas
author:
- given: Lisa
  family: Alazraki
- given: Lluis
  family: Castrejon
- given: Mostafa
  family: Dehghani
- given: Fantine
  family: Huot
- given: Jasper
  family: Uijlings
- given: Thomas
  family: Mensink
date: 2023-04-24
address:
container-title: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes
  in the Age of Foundation Models" at NeurIPS 2022 Workshops'
volume: '239'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf: https://proceedings.mlr.press/v239/alazraki23a/alazraki23a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
@@ -0,0 +1,49 @@
---
title: Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?
abstract: When humans reason about complex text-based questions, we leverage diagrammatic
  abstractions drawn on a visual scratchpad. In this paper, we introduce and explore
  the capabilities of Visual-Scratchpad, a method that augments a large language foundation
  model (LLM) with diagrammatic execution and readout. We enable the LLM to generate
  drawing commands and to read out abstractions from the resulting picture. The visual
  readout operation uses a visual foundation model, optionally finetuned with expert
  iteration. Here, we show that although Visual-Scratchpad outperforms an inference-only
  LLM, it surprisingly yields worse performance compared to a single finetuned LLM.
  Through experiments, we propose that this gap is due to the failure mode of vision
  foundation models in understanding abstractions in diagrams.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: hsu23a
month: 0
tex_title: Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?
firstpage: 21
lastpage: 28
page: 21-28
order: 21
cycles: false
bibtex_author: Hsu, Joy and Poesia, Gabriel and Wu, Jiajun and Goodman, Noah
author:
- given: Joy
  family: Hsu
- given: Gabriel
  family: Poesia
- given: Jiajun
  family: Wu
- given: Noah
  family: Goodman
date: 2023-04-24
address:
container-title: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes
  in the Age of Foundation Models" at NeurIPS 2022 Workshops'
volume: '239'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf: https://proceedings.mlr.press/v239/hsu23a/hsu23a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
@@ -0,0 +1,54 @@
---
title: Filter bubbles and affective polarization in user-personalized large language
  model outputs
abstract: Echoing the history of search engines and social media content rankings,
  the advent of large language models (LLMs) has led to a push for increased personalization
  of model outputs to individual users. In the past, personalized recommendations
  and ranking systems have been linked to the development of filter bubbles (serving
  content that may confirm a user’s existing biases) and affective polarization (strong
  negative sentiment towards those with differing views). In this work, we explore
  how prompting a leading large language model, ChatGPT-3.5, with a user’s political
  affiliation prior to asking factual questions about public figures and organizations
  leads to differing results. We observe that left-leaning users tend to receive more
  positive statements about left-leaning political figures and media outlets, while
  right-leaning users see more positive statements about right-leaning entities. This
  pattern holds across presidential candidates, members of the U.S. Senate, and media
  organizations with ratings from AllSides. When qualitatively evaluating some of
  these outputs, there is evidence that particular facts are included or excluded
  based on the user’s political affiliation. These results illustrate that personalizing
  LLMs based on user demographics carries the same risks of affective polarization and
  filter bubbles that have been seen in other personalized internet technologies.
  This "failure mode" should be monitored closely as there are more attempts to monetize
  and personalize these models.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: lazovich23a
month: 0
tex_title: Filter bubbles and affective polarization in user-personalized large language
  model outputs
firstpage: 29
lastpage: 37
page: 29-37
order: 29
cycles: false
bibtex_author: Lazovich, Tomo
author:
- given: Tomo
  family: Lazovich
date: 2023-04-24
address:
container-title: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes
  in the Age of Foundation Models" at NeurIPS 2022 Workshops'
volume: '239'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf: https://proceedings.mlr.press/v239/lazovich23a/lazovich23a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
@@ -0,0 +1,56 @@
---
title: Are large language models good annotators?
abstract: Numerous Natural Language Processing (NLP) tasks require precisely labeled
  data to ensure effective model training and achieve optimal performance. However,
  data annotation is marked by substantial costs and time requirements, especially
  when requiring specialized domain expertise or annotating a large number of samples.
  In this study, we investigate the feasibility of employing large language models
  (LLMs) as replacements for human annotators. We assess the zero-shot performance
  of various LLMs of different sizes to determine their viability as substitutes.
  Furthermore, recognizing that human annotators have access to diverse modalities,
  we introduce an image-based modality using the BLIP-2 architecture to evaluate LLM
  annotation performance. Among the tested LLMs, Vicuna-13b demonstrates competitive
  performance across diverse tasks. To assess the potential for LLMs to replace human
  annotators, we train a supervised model using labels generated by LLMs and compare
  its performance with models trained using human-generated labels. However, our findings
  reveal that models trained with human labels consistently outperform those trained
  with LLM-generated labels. We also highlight the challenges faced by LLMs in multilingual
  settings, where their performance significantly diminishes for tasks in languages
  other than English.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: mohta23a
month: 0
tex_title: Are large language models good annotators?
firstpage: 38
lastpage: 48
page: 38-48
order: 38
cycles: false
bibtex_author: Mohta, Jay and Ak, Kenan and Xu, Yan and Shen, Mingwei
author:
- given: Jay
  family: Mohta
- given: Kenan
  family: Ak
- given: Yan
  family: Xu
- given: Mingwei
  family: Shen
date: 2023-04-24
address:
container-title: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes
  in the Age of Foundation Models" at NeurIPS 2022 Workshops'
volume: '239'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf: https://proceedings.mlr.press/v239/mohta23a/mohta23a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
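The four paper files above carry `firstpage`/`lastpage` pairs that should tile the volume without gaps (note that `order` mirrors `firstpage` in these files). A small Ruby sketch of that invariant (an illustration, not repository tooling):

```ruby
# (firstpage, lastpage) pairs from the four _posts files in this commit.
pages = { 'alazraki23a' => [1, 20],
          'hsu23a'      => [21, 28],
          'lazovich23a' => [29, 37],
          'mohta23a'    => [38, 48] }

# Each paper should begin on the page after the previous one ends.
pages.values.each_cons(2) do |(_, prev_last), (first, _)|
  raise 'gap or overlap in page ranges' unless first == prev_last + 1
end
puts 'page ranges are contiguous'
```

A check like this is handy when reviewing pull requests that edit paper details, since a stray page number breaks the printed pagination of the whole volume.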