
Use of causal models for generation #82

Open · dipankarsrirag opened this issue Jun 26, 2024 · 3 comments

dipankarsrirag commented Jun 26, 2024

This is amazing work. I have been working on something that requires me to evaluate the generated outputs of models like Mistral, using a prompt like:
"Fill the [MASK] token in the sentence. Generate a single output."

Earlier, I would simply instruction fine-tune a Mistral model, but I would like to explore the possibility of using these models with bi-directional attention.

I see that the library allows me to access the backbone model underneath, but it is not clear to me whether this model has bi-directional attention. Can you please clarify? If it does, I could simply use the backbone.generate() function for my purpose.

Thanks in advance!

@SeanLee97 (Owner)

Hi @dipankarsrirag, thanks for your kind words. AnglE supports bi-directional LLMs.

If you want to train AnglE embeddings with bi-directional LLMs, you can refer to this documentation, under Examples/b. LLM-based.

If you just want to test the prompt with a bi-directional LLM, you can directly use our BiLLM toolkit: https://github.com/WhereIsAI/BiLLM. It is compatible with Hugging Face transformers.
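A minimal sketch of that route, assuming BiLLM exposes drop-in replacements for the Hugging Face model classes (the class name MistralForCausalLM matches the billm_model_class string used later in this thread) and a BiLLM_START_INDEX environment variable; check the BiLLM README for the exact interface:

```python
import os

# Assumed BiLLM convention: the layer index from which attention becomes
# bi-directional (0 = all layers). Must be set before importing billm.
os.environ['BiLLM_START_INDEX'] = '0'

from billm import MistralForCausalLM  # drop-in for transformers' class
from transformers import AutoTokenizer

model_id = 'mistralai/Mistral-7B-Instruct-v0.2'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MistralForCausalLM.from_pretrained(model_id)

# Illustrative mask-filling prompt in the style described above.
prompt = ('Fill the [MASK] token in the sentence. Generate a single output.\n'
          'The capital of France is [MASK].')
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```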

@dipankarsrirag (Author)

Hi @SeanLee97, thanks for the quick reply. I have been working with AnglE for the past few hours now and just need some clarification:

1. When I initialise a bi-directional LLM with AnglE like this:

   ```python
   angle = AnglE.from_pretrained(
       'mistralai/Mistral-7b-Instruct-v0.2',
       is_llm=True,
       apply_billm=True,
       billm_model_class='MistralForCausalLM',
       load_kbit=4,
       torch_dtype=torch.bfloat16,
       pooling_strategy='last',
       trust_remote_code=True,
   )
   ```

   would the model returned by model = angle.backbone have its attention changed to bi-directional?

2. I have a mask-filling task in which each input is a <masked_sentence, target_word> pair, which according to the documentation is in the Prompts.C format. But when I use the angle.fit() method for fine-tuning, I get an error saying that only the Prompts.A format is supported. This made me use the SFTTrainer with model instead. Is this correct? If not, how would I do it otherwise?

SeanLee97 (Owner) commented Jun 27, 2024

Hi @dipankarsrirag, here are the answers to your questions:

1. Yes. When you set is_llm=True and apply_billm=True, the backbone will be bi-directional.

2. The Prompts setting only works for the inference phase. If you use angle-trainer and want to apply a prompt to all text columns in the training stage, you can specify the prompt via --prompt_template "Here is your custom prompt {text}". If you use custom code, you can assign a prompt to prompt_template in AngleDataTokenizer; see this documentation and the sketch below. For other situations, for example applying a prompt to only one specific text column, please set the prompt manually, i.e., do it in preprocessing.
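A minimal sketch of the custom-code path, following the angle_emb documentation; the dataset file, column contents, and training hyperparameters are placeholders:

```python
from datasets import load_dataset
from angle_emb import AnglE, AngleDataTokenizer

# `angle` initialised as earlier in this thread
# (quantisation arguments omitted for brevity).
angle = AnglE.from_pretrained(
    'mistralai/Mistral-7b-Instruct-v0.2',
    is_llm=True,
    apply_billm=True,
    billm_model_class='MistralForCausalLM',
    pooling_strategy='last',
)

# Placeholder dataset; rows should carry the text columns expected by
# the angle_emb data formats (e.g. text1 / text2 / label).
ds = load_dataset('json', data_files='train.jsonl')['train']

# prompt_template is applied to the text columns during tokenisation.
tokenize = AngleDataTokenizer(
    angle.tokenizer,
    angle.max_length,
    prompt_template='Fill the [MASK] token in the sentence: {text}',
)
train_ds = ds.map(tokenize, num_proc=8)

angle.fit(train_ds=train_ds, output_dir='ckpts', batch_size=32, epochs=1)
```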
