AmazonBedrockChatGenerator Mistral models cause credentials issue #732
Comments
I'm not sure why it's needed for Bedrock, but the error is related to accessing a HF model without a token. You need to sign up for a Hugging Face account and agree to the terms of the specific model. Try something like the snippet below and see if you can run the generator again.
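A minimal sketch of that step, assuming the token is exposed via the standard `HF_TOKEN` environment variable (the token value is a placeholder):

```python
import os

# Assumption: a Hugging Face access token created at
# https://huggingface.co/settings/tokens, after accepting the Mistral
# model's terms on its model page. The value below is a placeholder.
# (Equivalently, run `huggingface-cli login` once in a shell.)
os.environ["HF_TOKEN"] = "hf_xxxxxxxxxxxxxxxx"
```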
Thanks for the suggestion @lbux, I will try that out to see if it fixes the issue; however, it doesn't seem like a workable solution.
The `AutoTokenizer` is used to ensure the model doesn't exceed its prompt length, and the tokenizer hosted on HF is used for that. From my experience with Bedrock, there isn't native support for anything similar. The issue is that Mistral decided that users must agree to their terms on HF before having access to the model, so unless there is a way to do tokenization without using HF, a valid HF token will be needed. It's not ideal in your case. I believe you can bypass the need for the HF token by downloading the model and passing the model location to `from_pretrained()` in `chat/adapters.py` (see the sketch below), but that is even less ideal, as the file is quite large and any updates to the library's AmazonBedrock component may break the "patch".
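A sketch of that workaround, assuming the tokenizer files were downloaded ahead of time; the local path is illustrative, and `chat/adapters.py` would still need to be patched to pass it through:

```python
from transformers import AutoTokenizer

# Illustrative local path; assumes the gated Mistral tokenizer files
# (tokenizer.json, tokenizer_config.json, ...) were downloaded once,
# e.g. with `huggingface-cli download`, while authenticated.
LOCAL_TOKENIZER_DIR = "/opt/models/mistral-large-tokenizer"

# from_pretrained() with a local directory never contacts huggingface.co,
# so no HF token is needed at runtime.
tokenizer = AutoTokenizer.from_pretrained(LOCAL_TOKENIZER_DIR)
print(len(tokenizer.encode("What is the capital of France?")))
```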
Just following up on this. I'm able to call Mistral models with Bedrock via LlamaIndex without creating a Hugging Face account or setting Hugging Face credentials, so this seems like a Haystack issue more than a Bedrock/Mistral issue.

```python
import os

from llama_index.llms.bedrock import Bedrock

llm = Bedrock(
    model="mistral.mistral-large-2402-v1:0",
    profile_name=os.getenv("AWS_PROFILE"),
)

resp = llm.complete("What is the capital of France? Tell me a fun fact about French people.")
print(resp)
```
Yes, I took a look at their code and they are not using a tokenizer to count the input. I don't think they're using anything else to count the input either, besides allowing you to set a context_size. Unless they have other mechanisms to check, it is possible to exceed the context size, and your input will either be truncated or rejected after the call is made. Haystack takes an alternative approach: it counts your input before sending it off and returns an error if it's too long. I don't think either solution is ideal. Mistral does provide their tokenizer on GitHub, with 3 different tokenizer versions depending on which Mistral model you use, but this would probably slow down the process, and it still requires you to have the model saved somewhere.
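For reference, a sketch of counting input tokens with Mistral's published tokenizer via the `mistral-common` package rather than the HF-hosted files; the choice of the v3 tokenizer here is illustrative and depends on the target model:

```python
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Illustrative choice: v1/v2/v3 constructors correspond to the three
# tokenizer versions Mistral publishes; pick the one your model uses.
tokenizer = MistralTokenizer.v3()

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[UserMessage(content="What is the capital of France?")]
    )
)
# Count tokens before calling Bedrock, mirroring Haystack's pre-send check.
print(len(tokenized.tokens))
```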
It doesn't need to download a model. As per the docs, the tokenizer version can be identified based on the model's name. That should solve both problems: downloading a model and using the HF token.
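A minimal sketch of that idea, assuming `mistral-common` is installed; `MistralTokenizer.from_model()` resolves a tokenizer version from a Mistral model name, and the Bedrock-ID parsing helper here is hypothetical:

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

def tokenizer_for_bedrock_id(model_id: str) -> MistralTokenizer:
    """Hypothetical helper: map a Bedrock model ID to a Mistral tokenizer."""
    # A Bedrock ID like "mistral.mistral-large-2402-v1:0" embeds the
    # Mistral model name; strip the provider prefix and revision suffix.
    name = model_id.split(".", 1)[1].split(":", 1)[0]  # "mistral-large-2402-v1"
    name = name.rsplit("-v", 1)[0]                     # "mistral-large-2402"
    return MistralTokenizer.from_model(name)

tokenizer = tokenizer_for_bedrock_id("mistral.mistral-large-2402-v1:0")
```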
**Describe the bug**

When attempting to use Mistral models with the `AmazonBedrockChatGenerator`, a 401 unauthenticated error response is returned from huggingface.co as shown below.

**To Reproduce**

Set the required AWS credentials env vars in the environment, then attempt to instantiate the `AmazonBedrockChatGenerator` using a Mistral model, as in the sketch below.
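A minimal reproduction sketch, using the model ID from the thread and the integration's import path:

```python
from haystack_integrations.components.generators.amazon_bedrock import (
    AmazonBedrockChatGenerator,
)

# AWS credentials are read from the environment (AWS_ACCESS_KEY_ID, etc.).
# Instantiation alone fetches the Mistral tokenizer from huggingface.co,
# which is where the 401 is raised when no HF token is configured.
generator = AmazonBedrockChatGenerator(model="mistral.mistral-large-2402-v1:0")
```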
**Describe your environment (please complete the following information):**