Update CPT documentation #2229
base: main
@@ -9,6 +9,8 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
> Reviewer comment: Remove?
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
@@ -21,6 +23,9 @@ The abstract from the paper is:

*Traditional fine-tuning is effective but computationally intensive, as it requires updating billions of parameters. CPT, inspired by ICL, PT, and adversarial attacks, refines context embeddings in a parameter-efficient manner. By optimizing context tokens and applying a controlled gradient descent, CPT achieves superior accuracy across various few-shot classification tasks, showing significant improvement over existing methods such as LoRA, PT, and ICL.*

Take a look at [Example](../../../examples/cpt_finetuning/README.md) for a step-by-step guide on how to train a model with CPT.
> Reviewer comment: Same argument about the link.
## CPTConfig

[[autodoc]] tuners.cpt.config.CPTConfig
@@ -0,0 +1,68 @@
# Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
## Introduction ([Paper](https://arxiv.org/abs/2410.17222), [Code](https://github.com/tsachiblau/Context-aware-Prompt-Tuning-Advancing-In-Context-Learning-with-Adversarial-Methods), [Notebook](cpt_train_and_inference.ipynb), [Colab](https://colab.research.google.com/drive/1UhQDVhZ9bDlSk1551SuJV8tIUmlIayta?usp=sharing))
Large Language Models (LLMs) can perform few-shot learning using either optimization-based approaches or In-Context Learning (ICL). Optimization-based methods often suffer from overfitting, as they require updating a large number of parameters with limited data. In contrast, ICL avoids overfitting but typically underperforms compared to optimization-based methods and is highly sensitive to the selection, order, and format of demonstration examples.

To overcome these challenges, we introduce Context-aware Prompt Tuning (CPT), a method inspired by ICL, Prompt Tuning (PT), and adversarial attacks.
CPT builds on the ICL strategy of concatenating examples before the input, extending it by incorporating PT-like learning to refine the context embedding through iterative optimization, extracting deeper insights from the training examples. Our approach carefully modifies specific context tokens, considering the unique structure of the examples within the context.

In addition to updating the context with PT-like optimization, CPT draws inspiration from adversarial attacks, adjusting the input based on the labels present in the context while preserving the inherent value of the user-provided data.
To ensure robustness and stability during optimization, we employ a projected gradient descent algorithm, constraining token embeddings to remain close to their original values and safeguarding the quality of the context.
Our method has demonstrated superior accuracy across multiple classification tasks using various LLM models, outperforming existing baselines and effectively addressing the overfitting challenge in few-shot learning.
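As an illustrative formulation (assuming an L2 ball; the paper's exact norm and radii may differ), the projection step described above can be written as clipping each context-token update back toward its original embedding:

```latex
% e_i: embedding of context token i after a gradient step
% e_i^0: its original (pre-optimization) value
% \epsilon: projection radius keeping the update close to the original embedding
e_i \leftarrow e_i^{0} + \Pi_{\epsilon}\!\left(e_i - e_i^{0}\right),
\qquad
\Pi_{\epsilon}(\delta) = \delta \cdot \min\!\left(1, \frac{\epsilon}{\lVert \delta \rVert_2}\right)
```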
> Reviewer comment on lines +6 to +11: In this section, you use a lot of "we" and "our". Let's try to word it in a more neutral way, as for the reader it could appear like "we" refers to the PEFT maintainers :) So use "The approach" instead of "Our approach" etc.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/cpt.png"/>
</div>
<small>CPT optimizing only specific token embeddings while keeping the rest of the model frozen <a href="https://huggingface.co/papers/2410.17222">(image source)</a>.</small>

---
## Dataset Creation and Collation for CPT

This document explains how to prepare datasets for **Context-Aware Prompt Tuning (CPT)** and how these steps align with the CPT paper.

---
### Template-Based Tokenization

#### Purpose
Templates define the structure of the input-output pairs, enabling the model to interpret the task within a unified context.

- **Input Templates**:
  Templates such as `"input: {sentence}"` format the raw input sentences. The `{sentence}` placeholder is replaced with the actual input text.

- **Output Templates**:
  Similarly, templates like `"output: {label}"` format the labels (`positive`, `negative`, etc.).

- **Separator Tokens**:
  Separators are used to distinguish between different parts of the input (e.g., input text and labels) and between examples.
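The template mechanics above can be sketched in plain Python. The template strings and separator here are illustrative stand-ins, not the exact ones used by the CPT implementation:

```python
# Illustrative sketch of template-based formatting for CPT-style few-shot
# contexts. Template strings and the separator are placeholders.
INPUT_TEMPLATE = "input: {sentence}"
OUTPUT_TEMPLATE = "output: {label}"
SEPARATOR = "\n"

def format_example(sentence, label):
    """Render one demonstration as 'input: ...' followed by 'output: ...'."""
    return (INPUT_TEMPLATE.format(sentence=sentence)
            + SEPARATOR
            + OUTPUT_TEMPLATE.format(label=label))

def build_context(examples, query):
    """Concatenate formatted demonstrations, then the query with an empty label slot."""
    parts = [format_example(s, l) for s, l in examples]
    parts.append(INPUT_TEMPLATE.format(sentence=query) + SEPARATOR + "output:")
    return SEPARATOR.join(parts)

print(build_context([("great movie", "positive"), ("dull plot", "negative")], "loved it"))
```

The query is appended with its label slot left empty, mirroring the ICL-style concatenation of examples before the input.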
#### Paper Reference
- Refer to **Section 3.1** of the paper, where template-based tokenization is described as a critical step in structuring inputs for CPT.

#### How it Helps
Templates provide context-aware structure, ensuring the model does not overfit by utilizing structured input-output formats. Using `cpt_tokens_type_mask`, we gain fine-grained information about the roles of different tokens in the input-output structure. This enables the model to:
1. Refrain from Updating Label Tokens: Prevent overfitting to label tokens by excluding their gradients during training.
2. Apply Different Projection Norms: Use type-specific projections for different parts of the input during Projected Gradient Descent (PGD), enhancing robustness and generalization.
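A minimal sketch of both ideas, assuming a toy three-way type mask (1 = input, 2 = template, 3 = label) and L2 projection radii; the actual implementation's type encoding and norms may differ:

```python
# Toy sketch of a type-aware PGD step on context-token embedding offsets.
# Type ids (1=input, 2=template, 3=label) and epsilon values are illustrative.
import math

def project(delta, eps):
    """Clip an offset vector back into an L2 ball of radius eps."""
    norm = math.sqrt(sum(x * x for x in delta))
    if norm <= eps:
        return list(delta)
    return [x * eps / norm for x in delta]

def pgd_step(offsets, grads, type_mask, lr=0.1,
             eps_by_type={1: 0.5, 2: 0.1}, frozen_types={3}):
    """One update: label tokens are left untouched (their gradients are
    excluded); other tokens take a gradient step plus a type-specific projection."""
    new_offsets = []
    for delta, grad, t in zip(offsets, grads, type_mask):
        if t in frozen_types:
            new_offsets.append(list(delta))       # 1. no update for label tokens
            continue
        stepped = [d - lr * g for d, g in zip(delta, grad)]
        new_offsets.append(project(stepped, eps_by_type[t]))  # 2. per-type norm
    return new_offsets

offsets = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
grads = [[-10.0, 0.0], [-10.0, 0.0], [-10.0, 0.0]]
out = pgd_step(offsets, grads, type_mask=[1, 2, 3])
# input token clipped to radius 0.5, template token to 0.1, label token frozen
```

Here the same large gradient produces a differently bounded update per token type, while the label token's embedding never moves.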
#### Paper Reference

These steps are directly informed by the principles outlined in the CPT paper, particularly in Sections **3.1**, **3.2**, and **3.3**.
|
||||||
|
||||||
|
||||||
|
||||||
> Reviewer comment on lines +55 to +58: Remove.
## Citation
```bib
@article{blau2025cpt,
  title={Context-Aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods},
  author={Tsachi Blau and Moshe Kimhi and Yonatan Belinkov and Alexander Bronstein and Chaim Baskin},
  journal={arXiv preprint arXiv:2410.17222},
  year={2025}
}
```
> Reviewer comment: Hmm, I'm not sure if this link is going to work from the built docs. It's better if you link directly to the README, i.e. https://github.com/huggingface/peft/blob/main/examples/cpt_finetuning/README.md (of course, the link won't point anywhere right now, but after merging it will be valid).