Commit a97ddfa (initial commit, 0 parents)
Showing 51 changed files with 3,242 additions and 0 deletions.
Gemfile
@@ -0,0 +1,15 @@
source "https://rubygems.org" | ||
|
||
git_source(:github) {|repo_name| "https://github.com/#{repo_name}" } | ||
|
||
gem 'jekyll' | ||
|
||
group :jekyll_plugins do | ||
gem 'github-pages' | ||
gem 'jekyll-remote-theme' | ||
gem 'jekyll-include-cache' | ||
gem 'webrick' | ||
end | ||
|
||
# gem "rails" | ||
|
README.md
@@ -0,0 +1,28 @@
# PMLR 262

To suggest fixes to this volume, please make a pull request containing the requested changes and a justification for the changes.

To edit the details of this conference, edit the [_config.yml](./_config.yml) file and submit a pull request.

To make changes to the individual paper details, edit the associated paper file in the [./_posts](./_posts) subdirectory.

For details of how to publish in PMLR, please check https://proceedings.mlr.press/faq.html

For details of what is required to submit a proceedings, please check https://proceedings.mlr.press/spec.html

Published as Volume 262 by the Proceedings of Machine Learning Research on 10 December 2024.

Volume Edited by:
* Mehdi Rezagholizadeh
* Peyman Passban
* Soheila Samiee
* Vahid Partovi Nia
* Yu Cheng
* Yue Deng
* Qun Liu
* Boxing Chen

Series Editors:
* Neil D. Lawrence
_config.yml
@@ -0,0 +1,110 @@
---
booktitle: Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing
  Workshop
shortname: ENLSP-IV 2024
sections:
- name: Training
  title: Training
- name: Model Design \& Architecture
  title: Model Design \& Architecture
- name: Model Efficiency \& Compression
  title: Model Efficiency \& Compression
- name: Inference
  title: Inference
- name: " Benchmark \\& Evaluation"
  title: " Benchmark \\& Evaluation"
- name: 'Applications '
  title: 'Applications '
volume: '262'
year: '2024'
start: &1 2024-12-14
end: 2024-12-14
published: 2024-12-10
layout: proceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: ENLSP-2024
month: 0
cycles: false
bibtex_editor: Rezagholizadeh, Mehdi and Passban, Peyman and Samiee, Soheila and Partovi
  Nia, Vahid and Cheng, Yu and Deng, Yue and Liu, Qun and Chen, Boxing
editor:
- given: Mehdi
  family: Rezagholizadeh
- given: Peyman
  family: Passban
- given: Soheila
  family: Samiee
- given: Vahid
  family: Partovi Nia
- given: Yu
  family: Cheng
- given: Yue
  family: Deng
- given: Qun
  family: Liu
- given: Boxing
  family: Chen
title: Proceedings of Machine Learning Research
description: |
  Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop
  Held in Vancouver, British Columbia, Canada on 14 December 2024
  Published as Volume 262 by the Proceedings of Machine Learning Research on 10 December 2024.
  Volume Edited by:
  Mehdi Rezagholizadeh
  Peyman Passban
  Soheila Samiee
  Vahid Partovi Nia
  Yu Cheng
  Yue Deng
  Qun Liu
  Boxing Chen
  Series Editors:
  Neil D. Lawrence
date_str: 14 Dec
url: https://proceedings.mlr.press
author:
  name: PMLR
baseurl: "/v262"
twitter_username: MLResearchPress
github_username: mlresearch
markdown: kramdown
exclude:
- README.md
- Gemfile
- ".gitignore"
plugins:
- jekyll-feed
- jekyll-seo-tag
- jekyll-remote-theme
remote_theme: mlresearch/jekyll-theme
style: pmlr
permalink: "/:title.html"
ghub:
  edit: true
  repository: v262
display:
  copy_button:
    bibtex: true
    endnote: true
    apa: true
  comments: false
volume_type: Volume
volume_dir: v262
email: ''
conference:
  name: NeurIPS Efficient Natural Language and Speech Processing Workshop
  url: https://neurips2024-enlsp.github.io/
  location: Vancouver, British Columbia, Canada
  dates:
  - *1
analytics:
  google:
    tracking_id: UA-92432422-1
orig_bibfile: "/Users/neil/mlresearch/v262/enlsp24.bib"
# Site settings
# Original source: /Users/neil/mlresearch/v262/enlsp24.bib
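A note on how the files in this commit fit together: each paper file in ./_posts declares a `section` value (for example `section: Inference`), and that value is expected to match one of the `name` entries in the `sections` list of _config.yml. A small, hypothetical consistency check is sketched below; it is not part of this commit, and it assumes PyYAML is installed and that the script is run from the repository root.

```python
# Hypothetical helper, not part of this repository: checks that every paper's
# `section` front-matter value matches a section `name` defined in _config.yml.
import glob

import yaml  # PyYAML

with open("_config.yml") as f:
    config = yaml.safe_load(f)
known_sections = {s["name"].strip() for s in config.get("sections", [])}

for path in sorted(glob.glob("_posts/*.md")):
    with open(path) as f:
        # The paper files consist only of YAML front matter between two '---' markers.
        front_matter = yaml.safe_load(f.read().split("---")[1])
    section = (front_matter.get("section") or "").strip()
    if section not in known_sections:
        print(f"{path}: unknown section {section!r}")
```

The `.strip()` calls account for the fact that a couple of the section names above carry leading or trailing spaces.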
_posts/2024-12-10-agrawal24a.md
@@ -0,0 +1,60 @@
---
title: 'AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models
  via an Entropy-based Lower Bound on Token Acceptance Probability'
section: Inference
abstract: 'Speculative decoding is a powerful technique that attempts to circumvent
  the autoregressive constraint of modern Large Language Models (LLMs). The aim of
  speculative decoding techniques is to improve the average inference time of a large
  target model without sacrificing its accuracy, by using a more efficient draft model
  to propose draft tokens which are then verified in parallel. The number of draft
  tokens produced in each drafting round is referred to as the draft length and is
  often a static hyperparameter chosen based on the acceptance rate statistics of
  the draft tokens. However, setting a static draft length can negatively impact performance,
  especially in scenarios where drafting is expensive and there is a high variance
  in the number of tokens accepted. Adaptive Entropy-based Draft Length (AdaEDL) is
  a simple, training- and parameter-free criterion which allows for early stopping of
  the token drafting process by approximating a lower bound on the expected acceptance
  probability of the drafted token based on the currently observed entropy of the
  drafted logits. We show that AdaEDL consistently outperforms static draft-length
  speculative decoding by 10%-57% as well as other training-free draft-stopping techniques
  by up to 10% in a variety of settings and datasets. At the same time, we show that
  AdaEDL is more robust than these techniques and preserves performance in high-sampling-temperature
  scenarios. Since it is training-free, in contrast to techniques that rely on the
  training of dataset-specific draft-stopping predictors, AdaEDL can seamlessly be
  integrated into a variety of pre-existing LLM systems. '
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: agrawal24a
month: 0
tex_title: "{AdaEDL}: Early Draft Stopping for Speculative Decoding of Large Language
  Models via an Entropy-based Lower Bound on Token Acceptance Probability"
firstpage: 355
lastpage: 369
page: 355-369
order: 355
cycles: false
bibtex_author: Agrawal, Sudhanshu and Jeon, Wonseok and Lee, Mingu
author:
- given: Sudhanshu
  family: Agrawal
- given: Wonseok
  family: Jeon
- given: Mingu
  family: Lee
date: 2024-12-10
address:
container-title: Proceedings of The 4th NeurIPS Efficient Natural Language and Speech
  Processing Workshop
volume: '262'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 12
  - 10
pdf: https://raw.githubusercontent.com/mlresearch/v262/main/assets/agrawal24a/agrawal24a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
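The abstract above describes a concrete control loop: keep drafting tokens while an entropy-based lower bound on their acceptance probability stays high, and stop the drafting round early once it drops. The sketch below only illustrates that idea; the `exp(-entropy)` surrogate, the `0.35` threshold, the greedy draft step, and the `draft_model(ids)` interface (a callable returning next-token logits of shape `[batch, seq, vocab]` for a batch of one) are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def entropy_based_acceptance_bound(draft_logits: torch.Tensor) -> torch.Tensor:
    """Illustrative surrogate for a lower bound on the acceptance probability
    of a drafted token, computed from the entropy of the draft distribution."""
    log_probs = F.log_softmax(draft_logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # H(p_draft), shape [batch]
    # Higher draft entropy -> less confident draft -> lower expected acceptance.
    return torch.exp(-entropy)


def draft_adaptively(draft_model, prefix_ids, max_draft_len=8, threshold=0.35):
    """Draft up to `max_draft_len` tokens, stopping early when the estimated
    acceptance bound falls below `threshold` (assumes batch size 1)."""
    drafted, ids = [], prefix_ids
    for _ in range(max_draft_len):
        logits = draft_model(ids)[:, -1, :]              # next-token logits
        if entropy_based_acceptance_bound(logits).item() < threshold:
            break                                        # stop this drafting round early
        next_id = logits.argmax(dim=-1, keepdim=True)
        drafted.append(next_id)
        ids = torch.cat([ids, next_id], dim=-1)
    return drafted                                       # tokens to verify in parallel
```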
_posts/2024-12-10-ali-sadraei-javaheri24a.md
@@ -0,0 +1,57 @@
---
title: 'SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition
  of Multi Token Embeddings'
section: Training
abstract: 'Soft prompt tuning techniques have recently gained traction as an effective
  strategy for the parameter-efficient tuning of pre-trained language models, particularly
  minimizing the required adjustment of model parameters. Despite their growing use,
  achieving optimal tuning with soft prompts, especially with smaller datasets, remains
  a substantial challenge. This study makes two contributions in this domain: (i)
  we introduce SuperPos-Prompt, a new reparameterization technique employing the superposition
  of multiple pre-trained vocabulary embeddings to improve the learning of soft prompts.
  Our experiments across several GLUE and SuperGLUE benchmarks consistently highlight
  SuperPos-Prompt’s superiority over Residual Prompt tuning, exhibiting an average
  score increase of +6.4 in T5-Small and +5.0 in T5-Base along with a faster convergence.
  Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning methods.
  (ii) Additionally, we demonstrate enhanced performance and rapid convergence by
  omitting dropouts from the frozen network, yielding consistent improvements across
  various scenarios and tuning methods.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: ali-sadraei-javaheri24a
month: 0
tex_title: "{SuperPos-Prompt}: Enhancing Soft Prompt Tuning of Language Models with
  Superposition of Multi Token Embeddings"
firstpage: 34
lastpage: 46
page: 34-46
order: 34
cycles: false
bibtex_author: Ali Sadraei Javaheri, Mohammad and Asgari, Ehsaneddin and C. McHardy,
  Alice and R. Rabiee, Hamid
author:
- given: Mohammad
  family: Ali Sadraei Javaheri
- given: Ehsaneddin
  family: Asgari
- given: Alice
  family: C. McHardy
- given: Hamid
  family: R. Rabiee
date: 2024-12-10
address:
container-title: Proceedings of The 4th NeurIPS Efficient Natural Language and Speech
  Processing Workshop
volume: '262'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 12
  - 10
pdf: https://raw.githubusercontent.com/mlresearch/v262/main/assets/ali-sadraei-javaheri24a/ali-sadraei-javaheri24a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
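To make the reparameterization described in this abstract concrete, here is a rough sketch of a soft prompt parameterized as a superposition of frozen vocabulary embeddings. The basis size of 128, the random choice of basis tokens, and the softmax mixing are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn


class SuperpositionPrompt(nn.Module):
    """Each soft-prompt vector is a learned mixture ("superposition") of frozen,
    pre-trained vocabulary embeddings; only the mixing weights are trained."""

    def __init__(self, vocab_embeddings: torch.Tensor, prompt_len: int = 10, n_basis: int = 128):
        super().__init__()
        vocab_size, d_model = vocab_embeddings.shape
        # Fix a random basis of vocabulary embeddings and keep it frozen.
        idx = torch.randperm(vocab_size)[:n_basis]
        self.register_buffer("basis", vocab_embeddings[idx].detach())  # (n_basis, d_model)
        self.weights = nn.Parameter(torch.zeros(prompt_len, n_basis))  # trainable mixing weights

    def forward(self) -> torch.Tensor:
        # (prompt_len, n_basis) @ (n_basis, d_model) -> (prompt_len, d_model)
        return torch.softmax(self.weights, dim=-1) @ self.basis
```

The resulting `(prompt_len, d_model)` tensor would be prepended to the input embeddings of the frozen language model, so gradients flow only into `weights`.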
_posts/2024-12-10-alizadeh-vahid24a.md
@@ -0,0 +1,73 @@
---
title: 'Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models'
section: Inference
abstract: 'Large Language Models (LLMs) typically generate outputs token by token
  using a fixed compute budget, leading to inefficient resource utilization. To address
  this shortcoming, recent advancements in mixture of experts (MoE) models, speculative
  decoding, and early exit strategies leverage the insight that computational demands
  can vary significantly based on the complexity and nature of the input. However,
  identifying optimal routing patterns for dynamic execution remains an open challenge,
  limiting the full potential of these adaptive methods. To address this need, we
  study adaptive computation in LLMs more systematically. We propose a novel framework
  that integrates smaller auxiliary modules within each Feed-Forward Network layer
  of the LLM. This design enables dynamic routing of tokens based on task complexity:
  tokens can be processed by either the small or big modules at each layer, or even
  bypass certain layers entirely. This allows us to introduce a novel notion of a
  token’s difficulty, defined by its potential to benefit from additional computational
  resources. Importantly, by employing oracles to identify optimal patterns of adaptive
  computations, we gain valuable insights into the internal workings of LLMs and the
  routing processes in a simplified heterogeneous MoE setup. We show that trained
  routers operate differently from oracles and often yield suboptimal solutions. Notably,
  activating a large module in just one layer outperforms models that use large modules
  across all layers, underscoring the gap between practical implementations of routing
  in MoE models and theoretical optima for adaptive computation.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: alizadeh-vahid24a
month: 0
tex_title: "{Duo-LLM}: A Framework for Studying Adaptive Computation in Large Language
  Models"
firstpage: 443
lastpage: 455
page: 443-455
order: 443
cycles: false
bibtex_author: Alizadeh-Vahid, Keivan and Iman Mirzadeh, Seyed and Shahrkokhi, Hooman
  and Belenko, Dmitry and Sun, Frank and Cho, Minsik and Hossein Sekhavat, Mohammad
  and Nabi, Moin and Farajtabar, Mehrdad
author:
- given: Keivan
  family: Alizadeh-Vahid
- given: Seyed
  family: Iman Mirzadeh
- given: Hooman
  family: Shahrkokhi
- given: Dmitry
  family: Belenko
- given: Frank
  family: Sun
- given: Minsik
  family: Cho
- given: Mohammad
  family: Hossein Sekhavat
- given: Moin
  family: Nabi
- given: Mehrdad
  family: Farajtabar
date: 2024-12-10
address:
container-title: Proceedings of The 4th NeurIPS Efficient Natural Language and Speech
  Processing Workshop
volume: '262'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 12
  - 10
pdf: https://raw.githubusercontent.com/mlresearch/v262/main/assets/alizadeh-vahid24a/alizadeh-vahid24a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
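The framework described in this abstract pairs each feed-forward layer with a smaller auxiliary module and routes tokens between them, or around the layer entirely. A toy version of such a layer is sketched below; the module widths, the three-way router, the hard argmax routing, and the residual connections are assumptions made for illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class AdaptiveFFNLayer(nn.Module):
    """Toy layer with a small and a big feed-forward module plus a per-token
    router choosing between {bypass, small, big}."""

    def __init__(self, d_model: int, d_small: int = 256, d_big: int = 2048):
        super().__init__()
        self.small = nn.Sequential(nn.Linear(d_model, d_small), nn.GELU(), nn.Linear(d_small, d_model))
        self.big = nn.Sequential(nn.Linear(d_model, d_big), nn.GELU(), nn.Linear(d_big, d_model))
        self.router = nn.Linear(d_model, 3)  # logits for bypass / small / big

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); hard routing is used here purely for illustration.
        choice = self.router(x).argmax(dim=-1)   # (batch, seq)
        out = x.clone()                          # choice 0: token bypasses the layer
        small_mask, big_mask = choice == 1, choice == 2
        out[small_mask] = x[small_mask] + self.small(x[small_mask])
        out[big_mask] = x[big_mask] + self.big(x[big_mask])
        return out
```

An oracle router, in the spirit of the analysis the abstract mentions, would replace the learned `argmax` decision with whichever choice most benefits each token.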
_posts/2024-12-10-ardestani24a.md
@@ -0,0 +1,45 @@
---
title: Text Summarization With Graph Attention Networks
section: Applications
abstract: This study aimed to leverage graph information, particularly Rhetorical
  Structure Theory (RST) and Co-reference (Coref) graphs, to enhance the performance
  of our baseline summarization models. Specifically, we experimented with a Graph
  Attention Network architecture to incorporate graph information. However, this architecture
  did not enhance the performance. Subsequently, we used a simple Multi-layer Perceptron
  architecture, which improved the results in our proposed model on our primary dataset,
  CNN/DM. Additionally, we annotated the XSum dataset with RST graph information, establishing
  a benchmark for future graph-based summarization models. This secondary dataset
  posed multiple challenges, revealing both the merits and limitations of our models.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: ardestani24a
month: 0
tex_title: Text Summarization With Graph Attention Networks
firstpage: 540
lastpage: 553
page: 540-553
order: 540
cycles: false
bibtex_author: Ardestani, Mohammadreza and Chali, Yllias
author:
- given: Mohammadreza
  family: Ardestani
- given: Yllias
  family: Chali
date: 2024-12-10
address:
container-title: Proceedings of The 4th NeurIPS Efficient Natural Language and Speech
  Processing Workshop
volume: '262'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 12
  - 10
pdf: https://raw.githubusercontent.com/mlresearch/v262/main/assets/ardestani24a/ardestani24a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
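For context on the graph component mentioned in this abstract, a generic single-head graph-attention layer over sentence nodes (with edges taken from, for example, an RST or coreference graph) is sketched below. This is a standard GAT-style layer shown for illustration only, not the authors' architecture; the self-loop trick and the LeakyReLU scoring are generic choices, and the study reports that this kind of layer did not improve over their baseline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceGraphAttention(nn.Module):
    """Single-head graph attention over sentence representations connected by
    discourse (e.g. RST) or coreference edges."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.score = nn.Linear(2 * d_model, 1, bias=False)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (n_sent, d_model); adj: (n_sent, n_sent), nonzero where an edge exists.
        n = nodes.size(0)
        adj = adj + torch.eye(n, dtype=adj.dtype, device=adj.device)  # self-loops avoid empty rows
        h = self.proj(nodes)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.score(pairs).squeeze(-1))          # (n_sent, n_sent)
        scores = scores.masked_fill(adj == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ h  # graph-contextualized sentence representations
```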