Update TensorRT-LLM to v0.9 with the latest API #46
base: release/0.1
Conversation
Update requirements; final code cleanup.
This is cool, but the branch you targeted was an unstable dev one; I think the 0.8.0 release branch is the stable one :) The naming of the branches is a bit off, if you ask me.
@suede299 Check your JSON/YAML config: has the structure changed?
Is it the .json under RAG\trt-llm-rag-windows-main\config? I went back through the TensorRT-LLM docs and couldn't find anything config-related.
@suede299 NVIDIA likes to change the config file format of the TRT engine. When you generate the TRT engine files, you get a config.json and a rank0.engine.
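
For reference, a minimal sketch of how you could inspect that generated config.json to see whether its structure changed between releases. The `engine_dir` path is a placeholder, and the `version` / `pretrained_config` / `build_config` key names are assumptions based on v0.9-era build output, not something fixed across versions:

```python
import json
from pathlib import Path

# Placeholder: wherever trtllm-build wrote its output.
engine_dir = Path("./engine_output")

# Building an engine produces a config.json next to rank0.engine; the
# top-level keys of config.json are what tends to move between releases.
with open(engine_dir / "config.json") as f:
    config = json.load(f)

print("top-level keys:", sorted(config.keys()))
# v0.9-era layouts typically record a "version" field plus nested
# "pretrained_config" / "build_config" sections (assumption; varies by release).
print("recorded version:", config.get("version", "<not recorded>"))
```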
@suede299 This is why version compatibility is so hard. The NVIDIA group loves to change the config file format every day.
Thanks for the reply.
@suede299 TensorRT is a better fit for Docker servers or edge devices than for consumer clients. A TensorRT engine has to match the version of the TensorRT library that runs it; the model format changes with every version, even between 9.2.0 and 9.2.1. TensorRT checks a magic number in the engine file header to verify which version generated the engine. Sometimes the engine has to be regenerated when your card or driver SDK changes. The engine file is very fragile and needs an environment where neither hardware nor software changes. Consumers would probably prefer compatibility with old versions, but the TensorRT group changes the format every release.
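
A rough pre-flight check along those lines (a sketch, not the library's own API: the `"version"` key is an assumption based on v0.9-era config.json layouts, and the real compatibility gate is the engine-header magic-number check the runtime performs itself):

```python
import json
from pathlib import Path

import tensorrt as trt
import tensorrt_llm

def engine_matches_runtime(engine_dir: str) -> bool:
    """Compare the version recorded at build time against the installed
    libraries, since engines are not portable across TensorRT /
    TensorRT-LLM versions."""
    config = json.loads((Path(engine_dir) / "config.json").read_text())
    # "version" is where v0.9-era builds record the TensorRT-LLM version;
    # the field name is an assumption and may differ in other releases.
    built_with = config.get("version")
    print(f"engine built with TensorRT-LLM {built_with}; "
          f"runtime has tensorrt_llm {tensorrt_llm.__version__}, "
          f"tensorrt {trt.__version__}")
    return built_with == tensorrt_llm.__version__

if not engine_matches_runtime("./engine_output"):
    print("version mismatch: rebuild the engine on this machine/driver")
```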
Yes, I gave up. Quantizing the Gemma model kept coming out wrong; I found a change to the Gemma script on GitHub and tried to update to it, but it required version 0.10.dev, and there was no wheel available for the Windows platform at all.
Update to TensorRT-LLM v0.9 with the latest API (ModelRunner/ModelRunnerCpp).
Tested successfully in Linux Docker with Llama-2-13b-chat-hf.
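
For reference, a minimal sketch of the `ModelRunner` path this PR moves to, following the v0.9 example scripts; the engine path is a placeholder, and the `generate()` argument names may shift between releases (`ModelRunnerCpp` is the C++-session variant with a near-identical interface):

```python
import torch
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

# Placeholder paths for a Llama-2-13b-chat-hf engine built with v0.9.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
runner = ModelRunner.from_dir(engine_dir="./llama-13b-chat-engine", rank=0)

prompt = "What is TensorRT-LLM?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(torch.int32)

# generate() takes a list of per-request id tensors; Llama has no pad
# token, so eos is reused as pad here (a common workaround).
outputs = runner.generate(
    batch_input_ids=[input_ids[0]],
    max_new_tokens=128,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,
)

# Output shape is [batch, beams, seq] and includes the prompt tokens.
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```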