When trying to create a Task Bundle using a TFLite file, I'm not allowed to enter the stop token of the model #5715

Open
Arya-Hari opened this issue Nov 3, 2024 · 8 comments
Assignees
Labels
platform:python MediaPipe Python issues task:LLM inference Issues related to MediaPipe LLM Inference Gen AI setup type:bug Bug in the Source Code of MediaPipe Solution

Comments

@Arya-Hari

Arya-Hari commented Nov 3, 2024

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Linux Ubuntu 16.04

Mobile device if the issue happens on mobile device

No response

Browser and version if the issue happens on browser

No response

Programming Language and version

Python

MediaPipe version

No response

Bazel version

No response

Solution

LLM Inference

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

I created a .tflite file for the Llama 3.2 1B model using ai-edge-torch and am now trying to deploy it for inference on the edge. When creating the task bundle, I am asked for the stop token. When I provide "<|end_of_text|>", the bundler is not able to resolve it. I had previously converted the tokenizer to the SentencePiece format using the code given in the ai-edge-torch repository.

Describe the expected behaviour

The task bundle should be created without errors.

Standalone code/steps you may have used to try to get what you need

I manually checked the tokens the model can identify by inspecting its vocabulary, and "<|end_of_text|>" is a token in its vocab.

I also tried changing the stop token, and the task bundle was created. However, when using that bundle for deployment, I got a "Failed to initialize engine : modelError building tflite model" error. Also, just as a side question: can the .task file that is created be used interchangeably with the .bin file that is given as the model path in the repository examples?

Other info / Complete Logs

No response

@Arya-Hari Arya-Hari added the type:bug Bug in the Source Code of MediaPipe Solution label Nov 3, 2024
@kuaashish kuaashish added os:linux-non-arm Issues on linux distributions which run on x86-64 architecture. DOES NOT include ARM devices. python Pull requests that update Python code task:LLM inference Issues related to MediaPipe LLM Inference Gen AI setup and removed os:linux-non-arm Issues on linux distributions which run on x86-64 architecture. DOES NOT include ARM devices. labels Nov 4, 2024
@kuaashish
Collaborator

Hi @Arya-Hari,

Could you please share the complete example you are using from our documentation? Additionally, if you have any error logs, sharing them would help us better understand the issue.

Thank you!!

@kuaashish kuaashish added the stat:awaiting response Waiting for user response label Nov 4, 2024
@Arya-Hari
Author

Hello @kuaashish,

So I converted the tokenizer to the SentencePiece-compatible format using the code given in the ai-edge-torch repository. This generated a llama3.spm.model file.

Then I ran this script:

import sentencepiece as spm

# Load the SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load("/content/llama3.spm.model")

# Check special tokens or tokens that might indicate sequence ends
print("End token ID:", sp.eos_id())  # Check if the model has a predefined EOS token ID
print("Start token ID:", sp.bos_id())  # BOS may also indicate a start-of-sequence token

vocab_size = sp.get_piece_size()
for i in range(vocab_size):
    print(f"ID {i}: {sp.id_to_piece(i)}")

This printed 128255 tokens along with their IDs. The token with ID 128001 was <|end_of_text|>. According to the official config files for Llama 3.2 1B, this is the stop token, and <|begin_of_text|> is the start token.

When running this piece of code, as given in llm_bundling.ipynb:

from mediapipe.tasks.python.genai import bundler  # provides BundleConfig and create_bundle used below

tflite_model="/content/gemma_2b_quantized.tflite" # @param {type:"string"}
tokenizer_model="/content/llama3.spm.model" # @param {type:"string"}
start_token="<|begin_of_text|>" # @param {type:"string"}
stop_token="<|end_of_text|>" # @param {type:"string"}
output_filename="/content/llama.task" # @param {type:"string"}
enable_bytes_to_unicode_mapping=False # @param ["False", "True"] {type:"raw"}

config = bundler.BundleConfig(
    tflite_model=tflite_model,
    tokenizer_model=tokenizer_model,
    start_token=start_token,
    stop_tokens=[stop_token],
    output_filename=output_filename,
    enable_bytes_to_unicode_mapping=enable_bytes_to_unicode_mapping,
)
bundler.create_bundle(config)

I get this error: ValueError: Failed to encode stop token <|end_of_text|> with tokenizer. When I try any other valid token from the list of 128255 tokens, the code executes properly and generates a .task file. This is the first issue.

Secondly, when pushing the model onto the device, the documentation requires that a .bin file be pushed, and I did not understand how to generate the .bin file after creating the Task Bundle.

Your help is much appreciated. Thank you!


@Arya-Hari Arya-Hari reopened this Nov 5, 2024
@Arya-Hari
Author

@kuaashish Hello, is there a way to resolve this?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Waiting for user response label Nov 7, 2024
@talumbau
Contributor

Thanks for all of the detail provided. Two quick items:

  1. re: .task vs. .bin: Yes, you can use the .task file wherever you would use a .bin file. The .task extension indicates that the file is a "converted TF Lite model + metadata/tokenizer" (a quick way to peek inside the bundle is sketched at the end of this comment).
  2. I noticed in your provided script that you have this line:
tflite_model="/content/gemma_2b_quantized.tflite" # @param {type:"string"}

Is this just a copy/paste error? I assumed you would have something like llama3_1_1b_quantized.tflite, not a Gemma model.
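
On the first point, if you want to see what the bundle contains, a .task file can typically be opened as an ordinary zip archive. The snippet below is only an illustrative sketch: it assumes the bundle is zip-packaged, reuses the output_filename from the bundling script above, and the exact entry names may differ.

import zipfile

# Assumption: the .task bundle is a plain zip archive; listing its entries
# shows the packed TF Lite model, tokenizer, and metadata. The path is the
# output_filename used in the bundling script earlier in this thread.
with zipfile.ZipFile("/content/llama.task") as task_bundle:
    for entry in task_bundle.infolist():
        print(entry.filename, entry.file_size)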

@Arya-Hari
Author

Hi @talumbau. To clarify, I used the quantization script provided in the ai-edge-torch repository to quantize the model and convert it to the TFLite format. By default, that script saves the output file under the name gemma_2b_quantized.tflite, and I forgot to change it before using it. I changed everything else in the script to work for Llama 3.2 1B instead. Sorry for the confusion.

@hheydary
Contributor

Hello @Arya-Hari,
Thank you for reporting this issue. The task bundler code has been updated at HEAD to allow the end-of-text token to be the same as the unknown token. Please pull the latest changes and create the task bundle again.
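
For anyone hitting the same error, a quick way to check whether this is what is happening (the stop token resolving to the unknown-token ID) is a short SentencePiece probe like the sketch below. It is illustrative only: it reuses the llama3.spm.model path from the earlier comment, and the exact IDs will depend on your tokenizer export.

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("/content/llama3.spm.model")  # path from the earlier comment

stop_token = "<|end_of_text|>"

# If the piece (or its encoding) resolves to the unknown-token ID, this appears
# to be the case the bundler previously rejected with
# "Failed to encode stop token ... with tokenizer".
print("unk id:      ", sp.unk_id())
print("piece_to_id: ", sp.piece_to_id(stop_token))
print("encoded ids: ", sp.encode(stop_token, out_type=int))
print("encoded str: ", sp.encode(stop_token, out_type=str))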

@Arya-Hari
Author

Okay thank you

@kuaashish kuaashish added platform:python MediaPipe Python issues and removed python Pull requests that update Python code labels Nov 20, 2024