
Confusion About Learnable Prompts in Prompt Learning #17

Open
exoticism4869 opened this issue Feb 14, 2025 · 2 comments

@exoticism4869

📌 Issue: Confusion About Learnable Prompts in Prompt Learning

Description:
While working with the PromptLearner class for text-based prompt learning, I ran into confusion about how the learnable prompts are inserted into the embeddings.

From the code, it appears that the learnable prompt (self.ctx) is inserted into the token embeddings by directly replacing the 16 token positions that follow the start token. However, I expected the learnable prompt to replace the initial descriptive tokens in the text, such as "A whole slide image of" in the example description "A whole slide image of lung adenocarcinoma...".

The issue is that this descriptive part, when tokenized, does not have a fixed length of 16 tokens, which raises the questions below.

📖 Code Reference:

class PromptLearner(nn.Module):
    def __init__(self, classnames, clip_model):
        ...
        self.ctx = nn.Parameter(ctx_vectors)  # learnable context vectors, shape (n_ctx, ctx_dim)
        classnames = [name.replace("_", " ") for name in classnames]
        name_lens = [len(_tokenizer.encode(name)) for name in classnames]
        prompts = [name for name in classnames]  # class descriptions only; no placeholders prepended
        print('prompts:', prompts)

        tokenized_prompts = torch.cat([clip.tokenize(p) for p in prompts])
        with torch.no_grad():
            embedding = clip_model.token_embedding(tokenized_prompts).type(dtype)

        self.register_buffer("token_prefix", embedding[:, :1, :])           # SOS token only
        self.register_buffer("token_suffix", embedding[:, 1 + n_ctx :, :])  # drops positions 1 .. n_ctx

    def forward(self):
        # abbreviated: prefix = self.token_prefix, ctx = self.ctx (expanded per class), suffix = self.token_suffix
        if self.class_token_position == "end":
            prompts = torch.cat([
                prefix,  # 1 token for the start (SOS)
                ctx,     # 16 learnable tokens
                suffix,  # remaining tokens from the class description
            ], dim=1)

🤔 My Confusion:

Why does the code insert the learnable prompts directly into the 16 token positions after the start token, regardless of what originally occupied them?

💡 Expected Behavior:

I expected the learnable prompts to replace the initial descriptive tokens like "A whole slide image of", but given that tokenizing this phrase doesn't yield exactly 16 tokens, the current implementation seems counterintuitive.
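For example, the mismatch is easy to check with CLIP's BPE tokenizer (a quick illustrative check using SimpleTokenizer from the OpenAI CLIP package, not code from this repository):

from clip.simple_tokenizer import SimpleTokenizer

_tokenizer = SimpleTokenizer()
# The descriptive phrase encodes to only a handful of BPE tokens, well short of 16:
print(len(_tokenizer.encode("A whole slide image of")))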

🙏 Additional Context:

If this behavior is intentional, I would appreciate any clarification on the underlying design choice for the fixed token length.

Thank you for your time and assistance!

@Jiangbo-Shi
Owner

Thank you for your interest in our work. The learnable prompts (ctx) do not replace the descriptive tokens like "A whole slide image of". They are instead inserted between the prefix and suffix, serving as additional learnable embeddings to enrich the prompt's representation. The fixed length of ctx ensures uniformity and avoids issues caused by variable tokenization.
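To illustrate the point with hypothetical sizes (a minimal sketch, not code from the repository; n_cls, n_ctx, ctx_dim, and seq_len are made-up values):

import torch

n_cls, n_ctx, ctx_dim, seq_len = 3, 16, 512, 77  # hypothetical CLIP-style sizes

prefix = torch.randn(n_cls, 1, ctx_dim)                               # SOS embedding per class
ctx = torch.randn(n_ctx, ctx_dim).unsqueeze(0).expand(n_cls, -1, -1)  # shared learnable context
suffix = torch.randn(n_cls, seq_len - 1 - n_ctx, ctx_dim)             # class-name/EOS/padding embeddings

prompts = torch.cat([prefix, ctx, suffix], dim=1)
print(prompts.shape)  # torch.Size([3, 77, 512]): the same fixed length for every class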

@exoticism4869
Author

Thanks for your quick response! I appreciate your explanation. However, I’ve noticed a potential issue:

I reviewed the CoOp code, where I found a clear explanation of the approach:

[Image: screenshot of the CoOp prompt-construction code]

In CoOp, they explicitly prepend n_ctx placeholder tokens ("X") to each class name, so the learnable vectors later overwrite only those placeholders. However, in your implementation, I didn't see a similar step. Instead, part of the class description's own embedding is directly replaced with the learnable prompts, which seems counterintuitive to me (see the sketch below).
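Roughly, the relevant CoOp construction is the following (paraphrased from the public CoOp code; assumes clip, clip_model, n_ctx, and classnames as in the snippet above):

prompt_prefix = " ".join(["X"] * n_ctx)  # n_ctx placeholder tokens
prompts = [prompt_prefix + " " + name + "." for name in classnames]
tokenized_prompts = torch.cat([clip.tokenize(p) for p in prompts])
with torch.no_grad():
    embedding = clip_model.token_embedding(tokenized_prompts)
# Positions 1 .. n_ctx of `embedding` now hold the "X" placeholders, so overwriting
# them with the learnable ctx leaves the class-name tokens in the suffix untouched.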
