Abstract
While the Vision Transformer has facilitated remarkable advancements in computer vision, it also requires vast amounts of training data and iterations.
- To overcome these constraints, transfer learning utilizes knowledge from networks pretrained with substantial resources.
- However, privacy attacks such as Model Inversion may pose a threat by potentially exposing training data through shared network weights, which is particularly critical in the medical field where data privacy must be taken into account.
- Departing from the conventional weight transfer scheme sharing entire weight parameters under risk, we introduce the innovative and straightforward transfer strategy, called \textit{Prompt Distillation}.
- Prompt distillation compresses knowledge of the pretrained network into prompt embeddings and shares these embeddings instead of network weights.
- This process is conducted by effectively leveraging the attention mechanism to reveal relationships in knowledge.
- In our experiments, prompt distillation outperformed training from scratch and achieved close performance to full weights transfer learning, while reducing the parameter scale up to 90 times lighter than full weights.
- Moreover, it demonstrates the ability to transfer knowledge between already-trained networks without additional modules or complex procedures.
- By merely inserting prompts and training a few additional iterations, it showed improved results compared to inference with the already-trained network alone.
- Its applications were validated through medical image classification tasks across three domains, chest X-ray, pathology, and retinography, distinct in degrees of the distribution shift.
+ Transfer learning overcomes these constraints by utilizing knowledge from pre-trained networks, but sharing entire network weights is difficult in the medical field, where data privacy must be taken into account.
+ We introduce an innovative transfer strategy, called \textit{Prompt Distillation}, which shares prompts instead of network weights.
+ Prompt distillation compresses the knowledge of the pre-trained network into prompts by effectively leveraging the attention mechanism.
+ In our experiments, it outperformed training from scratch and approached the performance of full-weight transfer learning while requiring up to 90 times fewer parameters than the full weights.
+ Moreover, it can transfer knowledge between already-trained networks by merely inserting prompts.
+ Its applications were validated on medical image classification tasks across three domains (chest X-ray, pathology, and retinography) that differ in the degree of distribution shift.
@@ -983,13 +980,12 @@ Pipeline
- The pipeline of Prompt Distillation based transfer learning.
+ The pipeline of Prompt Distillation-based transfer learning.
In the pretraining step, a network is trained on a large-scale dataset and acquires generalized knowledge.
- In the prompt distillation step, we inject prompts in the pretrained network's embedding space and compress knowledge from the pretrained networks into prompts.
+ In the prompt distillation step, we inject prompts into the pre-trained network's embedding space and compress knowledge from the pre-trained network into the prompts.
Compression is conducted by training the network under supervision for simplicity; nested dropout and knowledge distillation techniques can also be applied.
- Prompt distillation is divided into two categories based on where prompts are projected.
- Prompt Learning projects prompts onto query, key, and value vectors, and Query Learning projects onto query vectors only.
- Prompt learning updates prompts to adapt the network toward a deep understanding of training data, while query learning updates prompts to summarize.
+ Prompt distillation projects prompts onto the query, key, and value vectors and updates them to adapt the network toward a deep understanding of the training data.
Learned prompts are shared instead of the pretrained network weights.
In the transfer learning step, target users with smaller, task-specific datasets attach the learned prompts to their networks and leverage the generalized knowledge in the prompts to optimize their particular objectives.
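To make the pipeline above concrete, here is a minimal, self-contained sketch of the prompt distillation step, assuming a PyTorch implementation; the tiny encoder, the class names (`TinyViT`, `PromptedViT`), and the hyperparameters are illustrative stand-ins rather than the authors' code. Learnable prompts are inserted between the class token and the patch tokens, the pretrained backbone is frozen so that supervised training pushes knowledge into the prompts through self-attention, and only the prompt tensor is saved for sharing.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Toy stand-in for a pretrained ViT encoder (positional embeddings omitted for brevity)."""
    def __init__(self, dim=192, depth=4, heads=3, patch=16, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def tokenize(self, images):
        x = self.patch_embed(images).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)            # (B, 1, dim)
        return torch.cat([cls, x], dim=1)                         # (B, 1+N, dim)

class PromptedViT(nn.Module):
    """Inject learnable prompts between the class token and patch tokens (VPT-style).

    The prompts attend together with the frozen tokens, i.e. they are projected
    onto the query, key, and value vectors inside every attention block, so
    supervised training compresses the backbone's knowledge into the prompts alone.
    """
    def __init__(self, backbone: TinyViT, num_prompts=16, num_classes=2):
        super().__init__()
        self.backbone = backbone
        dim = backbone.cls_token.shape[-1]
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)
        for p in self.backbone.parameters():          # freeze the pretrained weights
            p.requires_grad_(False)

    def forward(self, images):
        tokens = self.backbone.tokenize(images)                   # (B, 1+N, dim)
        prompts = self.prompts.expand(tokens.size(0), -1, -1)     # (B, P, dim)
        tokens = torch.cat([tokens[:, :1], prompts, tokens[:, 1:]], dim=1)
        tokens = self.backbone.blocks(tokens)
        return self.head(tokens[:, 0])                            # classify from the class token

# Only the prompts (plus a small head) are optimized; the prompt tensor is what
# would be shared with target sites instead of the full backbone weights.
model = PromptedViT(TinyViT())
optimizer = torch.optim.AdamW([model.prompts, *model.head.parameters()], lr=1e-3)
logits = model(torch.randn(2, 3, 224, 224))                       # smoke test
torch.save({"prompts": model.prompts.detach().cpu()}, "distilled_prompts.pt")
```

On the target side, the shared prompt tensor would be loaded into the same prompt slots of the target network before fine-tuning on the smaller, task-specific dataset.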
@@ -1000,10 +996,8 @@ Knowledge Compression Strategies
- An illustration of the knowledge compression strategies for distilling prompts.
- (a) Visual Prompt Tuning injects prompts between a class embeddings and patch embeddings and learns relationship through supervised learning.
- (b) Ordered Representation learns different degrees of importance across dimensions by stochastically masks nested subsets of hidden units.
- (c) Knowledge Distillation compresses knowledge in the cumbersome network (Teacher) to the same network injected prompts (Student).
+ An illustration of knowledge compression for distilling prompts.
+ Visual Prompt Tuning injects prompts between the class embedding and patch embeddings and learns their relationships through supervised learning.
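The revised caption keeps only the supervised Visual Prompt Tuning route, while the pipeline text above still mentions nested dropout and knowledge distillation as optional compression strategies. As a hedged sketch of the knowledge-distillation variant, not the authors' implementation, the loss below blends supervised cross-entropy with a softened KL term against the frozen teacher; the function name and default hyperparameters are assumptions, and gradients would reach only the student's prompts.

```python
import torch
import torch.nn.functional as F

def prompt_kd_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   labels: torch.Tensor,
                   T: float = 4.0,
                   alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical distillation objective for compressing a teacher into prompts.

    Teacher: the pretrained network without prompts (frozen, evaluated under no_grad).
    Student: the same frozen network with prompts injected, so only the prompts
    receive gradients from this loss.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale gradients softened by the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```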
@@ -1027,30 +1021,21 @@ Transfer Learning via Prompt Distillati
-
- Knowledge Compression
-
-

-
- Comparing distinct knowledge compression strategies.
-
-
-
Knowledge Enhancement
-

+
Analyzing the ability to enhance already-trained networks.
- Verifying Prompt Distillation
+ Knowledge Compression
-

+
- Validation of prompt distillation against other parameter transfer strategies.
+ Comparing distinct knowledge compression strategies.
diff --git a/promptdistill2024/static/images_promptdistill/mainfigure.png b/promptdistill2024/static/images_promptdistill/mainfigure.png
index 484df35..3cfb2d3 100644
Binary files a/promptdistill2024/static/images_promptdistill/mainfigure.png and b/promptdistill2024/static/images_promptdistill/mainfigure.png differ
diff --git a/promptdistill2024/static/images_promptdistill/pipeline.png b/promptdistill2024/static/images_promptdistill/pipeline.png
new file mode 100644
index 0000000..1a38f13
Binary files /dev/null and b/promptdistill2024/static/images_promptdistill/pipeline.png differ
diff --git a/promptdistill2024/static/images_promptdistill/result.jpg b/promptdistill2024/static/images_promptdistill/result.jpg
new file mode 100644
index 0000000..f6b5b99
Binary files /dev/null and b/promptdistill2024/static/images_promptdistill/result.jpg differ
diff --git a/promptdistill2024/static/images_promptdistill/tab1.png b/promptdistill2024/static/images_promptdistill/tab1.png
index c2da713..611f9aa 100644
Binary files a/promptdistill2024/static/images_promptdistill/tab1.png and b/promptdistill2024/static/images_promptdistill/tab1.png differ
diff --git a/promptdistill2024/static/images_promptdistill/tab2.png b/promptdistill2024/static/images_promptdistill/tab2.png
new file mode 100644
index 0000000..4fe6a9e
Binary files /dev/null and b/promptdistill2024/static/images_promptdistill/tab2.png differ
diff --git a/promptdistill2024/static/images_promptdistill/tab2_a.png b/promptdistill2024/static/images_promptdistill/tab2_a.png
deleted file mode 100644
index 24a4fc3..0000000
Binary files a/promptdistill2024/static/images_promptdistill/tab2_a.png and /dev/null differ
diff --git a/promptdistill2024/static/images_promptdistill/tab2_b.png b/promptdistill2024/static/images_promptdistill/tab2_b.png
deleted file mode 100644
index 3984b27..0000000
Binary files a/promptdistill2024/static/images_promptdistill/tab2_b.png and /dev/null differ
diff --git a/promptdistill2024/static/images_promptdistill/tab3.png b/promptdistill2024/static/images_promptdistill/tab3.png
index b5fa8f9..dde2453 100644
Binary files a/promptdistill2024/static/images_promptdistill/tab3.png and b/promptdistill2024/static/images_promptdistill/tab3.png differ