📌 Issue: Confusion About Learnable Prompts in Prompt Learning
Description:
While working with the PromptLearner class for text-based prompt learning, I encountered confusion regarding how the learnable prompts are inserted into the embeddings.
From the code, it appears that the learnable prompt (self.ctx) is inserted into the token embeddings by directly replacing the first 16 tokens. However, I expected that the learnable prompt would replace the initial descriptive tokens in the text, such as "A whole slide image of" in the example description "A whole slide image of lung adenocarcinoma...".
The issue is that this descriptive part, when tokenized, does not have a fixed length of 16 tokens, which raises the following question:
🤔 My Confusion:
Why does the code directly insert the learnable prompts into the first 16 positions?
💡 Expected Behavior:
I expected the learnable prompts to replace the initial descriptive tokens like "A whole slide image of", but given that tokenizing this phrase doesn't yield exactly 16 tokens, the current implementation seems counterintuitive.
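As a quick sanity check (assuming the repo uses CLIP's BPE tokenizer, which is an assumption on my part), counting the tokens of the descriptive prefix shows it is much shorter than 16:

```python
from clip.simple_tokenizer import SimpleTokenizer  # OpenAI CLIP's BPE tokenizer

tokenizer = SimpleTokenizer()

# Count the BPE tokens of the descriptive prefix alone
# (clip.tokenize would additionally add SOS/EOS and pad to length 77).
phrase = "A whole slide image of"
token_ids = tokenizer.encode(phrase)
print(len(token_ids))  # noticeably fewer than 16 for this phrase
```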
🙏 Additional Context:
If this behavior is intentional, I would appreciate any clarification on the underlying design choice for the fixed token length.
Thank you for your time and assistance!
Thank you for your interest in our work. The learnable prompts (ctx) do not replace the descriptive tokens like "A whole slide image of". They are instead inserted between the prefix and suffix, serving as additional learnable embeddings to enrich the prompt's representation. The fixed length of ctx ensures uniformity and avoids issues caused by variable tokenization.
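In code, this assembly looks roughly like the following sketch (identifiers such as token_prefix, token_suffix, and assemble_prompts are illustrative, not necessarily the exact names used in this repo):

```python
import torch

def assemble_prompts(token_prefix, ctx, token_suffix):
    """Sketch of CoOp-style prompt assembly with illustrative names.

    token_prefix: (n_cls, 1, dim)      embedding of the SOS token
    ctx:          (n_ctx, dim)         learnable context vectors
    token_suffix: (n_cls, rest, dim)   embeddings of the remaining tokens
    """
    n_cls = token_prefix.shape[0]
    ctx = ctx.unsqueeze(0).expand(n_cls, -1, -1)   # (n_cls, n_ctx, dim)
    # ctx sits between the prefix and the suffix; the n_ctx token positions
    # it occupies hold the learnable vectors rather than the embeddings of
    # whatever text originally stood there.
    return torch.cat([token_prefix, ctx, token_suffix], dim=1)
```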
Thanks for your quick response! I appreciate your explanation. However, I’ve noticed a potential issue:
I reviewed the CoOp code, where I found a clear explanation of the approach:
In CoOp, they explicitly prepend ‘X’*n_ctx to the class name. However, in your implementation, I didn’t see a similar step. Instead, part of the class name is directly replaced with learnable prompts, which seems counterintuitive to me.
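Concretely, the relevant logic in CoOp looks roughly like this (paraphrased and simplified from its PromptLearner; the class names below are only placeholders for illustration):

```python
import torch
import clip  # OpenAI CLIP (https://github.com/openai/CLIP)

# Paraphrased sketch of CoOp's prompt construction (simplified):
n_ctx = 16
classnames = ["lung adenocarcinoma", "lung squamous cell carcinoma"]  # illustrative

prompt_prefix = " ".join(["X"] * n_ctx)                  # "X X ... X" placeholders
prompts = [prompt_prefix + " " + name + "." for name in classnames]

tokenized_prompts = torch.cat([clip.tokenize(p) for p in prompts])  # (n_cls, 77)

# After embedding `tokenized_prompts`, CoOp keeps
#   token_prefix = embedding[:, :1, :]           # SOS token
#   token_suffix = embedding[:, 1 + n_ctx:, :]   # class tokens, ".", EOS, padding
# and at forward time concatenates [prefix, ctx, suffix], so the learnable
# vectors only ever occupy the "X" placeholder slots, never class-name tokens.
```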