ollama support? #1001

Open
txhno opened this issue Aug 26, 2024 · 6 comments · May be fixed by #1036

Comments

@txhno commented Aug 26, 2024

Is your feature request related to a problem? Please describe.
I would like to reuse the models I have already downloaded with Ollama.

Describe the solution you'd like
Being able to call models.ollama(model_name_or_path).

Describe alternatives you've considered
llama.cpp works as of now, but Ollama would make this app a lot more user friendly, since it automates downloads and stores models centrally.

Additional context
None.

@guidance-ai guidance-ai deleted a comment from txhno Aug 26, 2024
@Harsha-Nori (Collaborator) commented Aug 26, 2024

@txhno Sorry about that random weird comment... I removed your reply too, since it had a quote of the link in it; hope that's OK!

On topic -- exploring Ollama support is a really good idea. My understanding is that they just use llama.cpp under the hood and manage GGUF files, right? If we can figure out where the GGUFs live on the local file system, we can use our llama.cpp infrastructure to make it easy to load Ollama models in guidance.

We can put this on our backlog to investigate, but if you (or anyone reading this!) have some knowledge about how Ollama works, I'd be happy to tag-team and support a PR here.

@riedgar-ms @nking-1 for awareness
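In the meantime, for anyone who wants to experiment, here is a minimal sketch of that idea. It assumes Ollama's default storage layout under ~/.ollama/models (the manifest format is undocumented and may change between versions) and reuses guidance's existing models.LlamaCpp loader; find_ollama_gguf is a hypothetical helper, not an official API:

```python
import json
from pathlib import Path

from guidance import models


def find_ollama_gguf(model: str, tag: str = "latest") -> Path:
    """Locate the GGUF blob for a locally pulled Ollama model.

    Assumes Ollama's default storage layout (~/.ollama/models);
    the manifest format is undocumented and may change.
    """
    root = Path.home() / ".ollama" / "models"
    manifest_file = root / "manifests" / "registry.ollama.ai" / "library" / model / tag
    manifest = json.loads(manifest_file.read_text())

    # The layer with this media type holds the GGUF weights.
    for layer in manifest["layers"]:
        if layer["mediaType"] == "application/vnd.ollama.image.model":
            # Blob filenames replace the ':' in the digest with '-'.
            return root / "blobs" / layer["digest"].replace(":", "-")
    raise FileNotFoundError(f"no model layer found for {model}:{tag}")


# Reuse guidance's llama.cpp infrastructure to load the blob.
lm = models.LlamaCpp(str(find_ollama_gguf("phi3")))
```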

@microdev1 commented

Hi! I've implemented a thin wrapper for Ollama support in my fork. Can you give it a shot before I submit a PR? Thanks!

@microdev1 microdev1 linked a pull request Sep 25, 2024 that will close this issue
@leechen2024 commented

When will Ollama support be available?
I'm trying it with the forked Ollama, but I'm getting {G|Number|G} in the output, as if it's not being handled correctly.
The Ollama initialization seems to work fine, but guidance doesn't seem to produce the right result.

@xruifan (Contributor) commented Nov 11, 2024

> When will Ollama support be available? I'm trying it with the forked Ollama, but I'm getting {G|Number|G} in the output, as if it's not being handled correctly. The Ollama initialization seems to work fine, but guidance doesn't seem to produce the right result.

This is likely because the model's chat template did not load; see the comment here.

@xruifan (Contributor) commented Nov 11, 2024

From what I know, for a model to work with guidance, it needs to provide guidance with role start and role end tags, e.g., <|user|>\n and <|assistant|>\n for Phi-3 Small and Medium; see here.

Currently, guidance uses a model's chat template string as a key to look up the corresponding chat template class, and falls back to the predefined ChatMLTemplate when no class is implemented for the model in use. However, ChatMLTemplate's default tags may not match the model's, in which case generation is effectively unconstrained and produces unexpected output.
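To make that lookup concrete, here is a rough sketch of what a template class and its registration look like, following the conventions of guidance/chat.py (Phi3OllamaChatTemplate is a hypothetical name, and CHAT_TEMPLATE_CACHE may differ between guidance versions):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatTemplate

# The Go-style template string Ollama reports for phi3 acts as the lookup key.
phi3_ollama_template = (
    "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}"
    "{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}"
    "<|assistant|>\n{{ .Response }}<|end|>"
)


class Phi3OllamaChatTemplate(ChatTemplate):
    # guidance matches a model's template string against this attribute.
    template_str = phi3_ollama_template

    def get_role_start(self, role_name):
        return f"<|{role_name}|>\n"

    def get_role_end(self, role_name=None):
        return "<|end|>\n"


# Register the class so guidance can resolve this template automatically.
CHAT_TEMPLATE_CACHE[phi3_ollama_template] = Phi3OllamaChatTemplate
```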

Ollama uses llama.cpp as its backend, and the models Ollama serves include a template and a modelfile; see the output of Ollama's /api/show API (e.g., curl http://localhost:11434/api/show -d '{"name": "phi3"}'):

{
  "license": "Microsoft.\nCopyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.",
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM phi3:latest\n\nFROM D:\\ollama_models\\blobs\\sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf\nTEMPLATE \"{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}<|assistant|>\n{{ .Response }}<|end|>\"\nPARAMETER stop <|end|>\nPARAMETER stop <|user|>\nPARAMETER stop <|assistant|>\nLICENSE \"\"\"Microsoft.\nCopyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\"\"\"\n",
  "parameters": "stop                           \"<|end|>\"\nstop                           \"<|user|>\"\nstop                           \"<|assistant|>\"",
  "template": "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}<|assistant|>\n{{ .Response }}<|end|>",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "phi3",
    "families": [
      "phi3"
    ],
    "parameter_size": "3.8B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "phi3",
    "general.basename": "Phi-3",
    "general.file_type": 2,
    "general.finetune": "128k-instruct",
    "general.languages": [
      "en"
    ],
    "general.license": "mit",
    "general.license.link": "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE",
    "general.parameter_count": 3821079648,
    "general.quantization_version": 2,
    "general.size_label": "mini",
    "general.tags": [
      "nlp",
      "code",
      "text-generation"
    ],
    "general.type": "model",
    "phi3.attention.head_count": 32,
    "phi3.attention.head_count_kv": 32,
    "phi3.attention.layer_norm_rms_epsilon": 0.00001,
    "phi3.attention.sliding_window": 262144,
    "phi3.block_count": 32,
    "phi3.context_length": 131072,
    "phi3.embedding_length": 3072,
    "phi3.feed_forward_length": 8192,
    "phi3.rope.dimension_count": 96,
    "phi3.rope.freq_base": 10000,
    "phi3.rope.scaling.attn_factor": 1.1902381,
    "phi3.rope.scaling.original_context_length": 4096,
    "tokenizer.ggml.add_bos_token": false,
    "tokenizer.ggml.add_eos_token": false,
    "tokenizer.ggml.bos_token_id": 1,
    "tokenizer.ggml.eos_token_id": 32000,
    "tokenizer.ggml.model": "llama",
    "tokenizer.ggml.padding_token_id": 32000,
    "tokenizer.ggml.pre": "default",
    "tokenizer.ggml.scores": null,
    "tokenizer.ggml.token_type": null,
    "tokenizer.ggml.tokens": null,
    "tokenizer.ggml.unknown_token_id": 0
  },
  "modified_at": "2024-11-06T12:04:14+08:00"
}
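For completeness, the same metadata can be fetched programmatically; a minimal sketch using only the standard library, assuming a local Ollama server on its default port 11434:

```python
import json
from urllib.request import Request, urlopen

# POST to Ollama's /api/show endpoint for the model's metadata.
req = Request(
    "http://localhost:11434/api/show",
    data=json.dumps({"name": "phi3"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    info = json.loads(resp.read())

print(info["template"])  # the Go-style chat template shown above
```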

Presumably, if a model Ollama serves has a chat template and the corresponding chat template class is implemented in guidance, guidance will work fine. But for all Ollama models to work, there needs to be a way to locate each model's role tags. One approach is to implement chat template classes for all Ollama models in guidance/chat.py, but this is somewhat cumbersome and labour intensive.

I am not sure if there are any other ways to automatically retrieve the role tags based on the model information provided by Ollama; a rough sketch of one possibility follows. If I have misunderstood anything, please correct me.
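For instance, one could heuristically pull the literal text around each placeholder out of the Go-style template. This hypothetical sketch only handles simple templates shaped like the phi3 one above:

```python
import re


def role_tags_from_ollama_template(template: str) -> dict:
    """Heuristically extract role start/end tags from an Ollama template.

    Only handles simple templates where each role section looks like
    '<start>{{ .Field }}<end>'; conditionals with nested text would
    need a real Go-template parser.
    """
    fields = {"System": "system", "Prompt": "user", "Response": "assistant"}
    tags = {}
    for field, role in fields.items():
        # Capture the literal text directly before and after the placeholder.
        m = re.search(r"([^{}]*)\{\{\s*\." + field + r"\s*\}\}([^{}]*)", template)
        if m:
            tags[role] = {"start": m.group(1), "end": m.group(2)}
    return tags


template = (
    "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}"
    "{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}"
    "<|assistant|>\n{{ .Response }}<|end|>"
)
print(role_tags_from_ollama_template(template))
# {'system': {'start': '<|system|>\n', 'end': '<|end|>\n'}, ...}
```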

Fan

@microdev1 commented

> I am not sure if there are any other ways to automatically retrieve the role tags based on the model information provided by Ollama. If I have misunderstood anything, please correct me.

You are spot on. There is #947, which attempts to extract a ChatTemplate from a HuggingFace transformers tokenizer.
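For reference, the template string that approach starts from is already exposed by transformers (a minimal illustration; the model ID is just an example):

```python
from transformers import AutoTokenizer

# The Jinja2 chat template ships with the tokenizer config; #947 builds
# a guidance ChatTemplate from this string.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
print(tok.chat_template)
```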
