
Fix GEMMA_2, LLAMA_3, and PHI_3 MessagesFormatter #73

Merged

merged 3 commits into Maximilian-Winter:master from woheller69:patch-1 on Dec 10, 2024

Conversation

woheller69
Contributor

Gemma needs another \n in the prompt template to avoid slow processing of follow-up prompts, see #54.

[Screenshot from 2024-07-07 19-16-28: prompt processing before vs. after the fix]
As can be seen in the attached screenshot, prompt processing is much faster.
Before: the new user prompt plus the model's previous answer are processed (42 tokens).
After: only the new user prompt is processed (18 tokens).

The longer the model's previous answer, the more relevant this becomes.

There are probably similar issues in the other prompt templates as well.
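
For illustration only (not code from this repo), here is a minimal sketch of why the exact prefix matters: llama.cpp-style backends reuse cached tokens only for the longest common prefix between the previously evaluated prompt and the new one, so a single missing \n invalidates everything after it.

def shared_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    """Count how many leading tokens the new prompt shares with the cached one."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# If the rebuilt history differs from what the model actually generated
# (e.g. a dropped "\n"), the shared prefix ends there and every later token
# has to be evaluated again, which is the slowdown shown in the screenshot.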

Gemma needs another \n to avoid slow processing of follow up prompts, see Maximilian-Winter#54
Add another \n to speed up follow up prompts
@woheller69 woheller69 changed the title Fix GEMMA MessagesFormatter Fix GEMMA_2 and LLAMA_3 MessagesFormatter Jul 7, 2024
@woheller69 woheller69 changed the title Fix GEMMA_2 and LLAMA_3 MessagesFormatter Fix GEMMA_2, LLAMA_3, and PHI_3 MessagesFormatter Jul 7, 2024
@pabl-o-ce
Collaborator

Great patch. I would check all of them to see if any are missing a typo or a newline.

Thank you

@pabl-o-ce
Collaborator

pabl-o-ce commented Jul 7, 2024

It seems some newlines are still missing in there. Do you want to add the Llama 3 fix in this PR? The correct version would be like this:

llama_3_prompt_markers = {
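    # The "\n\n" after <|end_header_id|> matches the Llama 3 chat template, so the
    # stored history lines up with the text the model actually generated.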
    Roles.system: PromptMarkers("""<|start_header_id|>system<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.user: PromptMarkers("""<|start_header_id|>user<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.assistant: PromptMarkers("""<|start_header_id|>assistant<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.tool: PromptMarkers("""<|start_header_id|>function_calling_results<|end_header_id|>\n""", """<|eot_id|>"""),
}

I don't know if the tool role would need another \n newline; I'm not sure.

On Gemma I'm not sure a newline is required; I have to test it.

And on Phi you are correct :)

@pabl-o-ce
Collaborator

I saw the issue you referenced for Gemma, thanks.

If you want to add the missing Llama 3 fix too, that would be awesome.

Thank you for the contribution.

@woheller69
Contributor Author

I don't use tools, so I don't know if another \n is missing there. But Llama3 is also fixed in this PR.

But I think a better long-term fix in general would be my proposal in #54, i.e. to keep the leading and trailing newlines (and spaces?) produced by the model and store them in history. I don't know the code (and I am a Python noob), but it feels like they are being stripped.

@woheller69
Contributor Author

It seems some newlines are still missing in there. Do you want to add the Llama 3 fix in this PR? The correct version would be like this:

llama_3_prompt_markers = {
    Roles.system: PromptMarkers("""<|start_header_id|>system<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.user: PromptMarkers("""<|start_header_id|>user<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.assistant: PromptMarkers("""<|start_header_id|>assistant<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.tool: PromptMarkers("""<|start_header_id|>function_calling_results<|end_header_id|>\n""", """<|eot_id|>"""),
}

We only need one more \n in Roles.assistant, not in Roles.user (there it should not matter).
We only need to make sure that we append the latest model answer exactly as the model produced it, so that the tokens already processed are found in memory.
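
As a purely hypothetical illustration (marker strings taken from the Llama 3 block above), this is the one-character difference that breaks prefix reuse:

# What the model saw while generating its answer:
seen_by_model = "<|start_header_id|>assistant<|end_header_id|>\n\nHello!<|eot_id|>"
# What gets rebuilt from history if the assistant start marker has only one "\n":
rebuilt_from_history = "<|start_header_id|>assistant<|end_header_id|>\nHello!<|eot_id|>"
assert seen_by_model != rebuilt_from_history  # cached tokens after the mismatch must be re-processed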

@woheller69
Contributor Author

Is there any reason not to merge this?

@Maximilian-Winter
Owner

@woheller69 Sorry, I was busy with other stuff; I will merge this after checking it in the next few days.

@woheller69
Contributor Author

Mistral Small 22B needs an additional space for faster multi-turn conversation: Roles.assistant: PromptMarkers(""" """, """""")
Unfortunately, other Mistral-based models like Mistral-Nemo do not need it...

@woheller69
Contributor Author

What is the reason for stripping the prompt? For Mistral Small it can also be fixed like this, by not stripping the assistant reply; I think stripping alters the history the LLM knows:

def _format_message_content(self, content: str, role: Roles) -> str:
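    # Keep the assistant's reply exactly as the model produced it, so the rebuilt
    # prompt matches the tokens already in the cache on the next turn.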
    if self.strip_prompt and role != Roles.assistant:
        return content.strip()
    else:
        return content

Can that be the root cause?

@woheller69
Contributor Author

def _format_response(
        self,
        formatted_messages: str,
        last_role: Roles,
        response_role: Literal[Roles.user, Roles.assistant] | None = None,
) -> Tuple[str, Roles]:
    if response_role is None:
        response_role = Roles.assistant if last_role != Roles.assistant else Roles.user

    prompt_start = self.prompt_markers[response_role].start.strip() if self.strip_prompt else self.prompt_markers[
        response_role].start
    return formatted_messages + prompt_start, response_role

Also _format_response contains strip() and seems to be part of the problem.

If I set strip_prompt = False, Llama 3, Mixtral, and Gemma work fine without the modifications I proposed above.
So essentially: what is the advantage of strip_prompt = True, which is set by default?
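
To make the question concrete, here is a small sketch (plain Python, not the repo's actual code path) of what strip() does to the assistant start marker from the Llama 3 block above:

start = "<|start_header_id|>assistant<|end_header_id|>\n\n"
print(repr(start.strip()))  # trailing "\n\n" removed, no longer matches what the model saw
print(repr(start))          # unchanged, matches the text from the previous turn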

@woheller69
Contributor Author

Is this project still maintained?

@pabl-o-ce
Collaborator

Hello @woheller69, I'm here to help / maintain the repo.

Collaborator

@pabl-o-ce pabl-o-ce left a comment


Looks good, thank you.

@pabl-o-ce pabl-o-ce merged commit 40b1eb8 into Maximilian-Winter:master Dec 10, 2024
@woheller69 woheller69 deleted the patch-1 branch December 10, 2024 17:39
@woheller69
Contributor Author

Do you know the answer to my question above, "what is the advantage of strip_prompt = True, which is set by default?"
This seems to be the root cause.

@pabl-o-ce
Collaborator

No idea... but I would ask @Maximilian-Winter whenever he has time. Also @woheller69, if you have Discord we can talk there; I'm looking forward to having people help us with the repo.
