# Install prerequisites, fetch jetson-containers, and download the Llama-2 model.
# -y answers the install prompt so the step works non-interactively.
sudo apt-get update && sudo apt-get install -y git python3-pip
git clone --depth=1 https://github.com/dusty-nv/jetson-containers
# FIX: the original line ran `cd jetson-containers pip3 install -r requirements.txt`,
# which passes "pip3 install ..." as extra arguments to `cd` — the Python
# requirements were never installed. Chain the two commands with `&&`.
cd jetson-containers && pip3 install -r requirements.txt
# Clone the demo scripts into the container's mounted data directory.
cd ./data && git clone https://github.com/LJ-Hao/MLC-LLM-on-Jetson.git && cd ..
# Download the gated Llama-2-7b-chat-hf weights inside the MLC container and
# symlink them where mlc_llm.build expects them. Requires a Hugging Face token
# with access to the meta-llama repo.
./run.sh --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN> $(./autotag mlc) /bin/bash -c 'ln -s $(huggingface-downloader meta-llama/Llama-2-7b-chat-hf) /data/models/mlc/dist/models/Llama-2-7b-chat-hf'
Run `sudo docker images`
to check whether the image was installed correctly.
# Compile and quantize the model with MLC inside the container:
#   --quantization q4f16_ft : 4-bit weights / fp16 activations (FasterTransformer layout)
#   --artifact-path         : output directory for the compiled model library
#   --max-seq-len 4096      : maximum context length baked into the artifact
#   --target cuda           : build for the Jetson's CUDA GPU
#   --use-cuda-graph / --use-flash-attn-mqa : inference-speed optimizations
# NOTE(review): flag semantics taken from mlc_llm.build conventions — confirm
# against the mlc_llm version shipped in the container.
./run.sh $(./autotag mlc) \
python3 -m mlc_llm.build \
--model Llama-2-7b-chat-hf \
--quantization q4f16_ft \
--artifact-path /data/models/mlc/dist \
--max-seq-len 4096 \
--target cuda \
--use-cuda-graph \
--use-flash-attn-mqa
# Start the MLC container image; substitute the tag reported earlier by
# `sudo docker images`.
./run.sh <YOUR IMAGE NAME>
# For example: dustynv/mlc:51fb0f4-builder-r35.4.1 — check the output of the first step.
# Inside the container: run the demo with the original (un-quantized) weights.
cd /data/MLC-LLM-on-Jetson && python3 Llama-2-7b-chat-hf.py
Here is the result: https://github.com/dusty-nv/jetson-containers — you can see that without quantizing with MLC, the Jetson Nano 16GB can load the model but cannot run it.
cd /data/MLC-LLM-on-Jetson && python3 Llama-2-7b-chat-hf-q4f16_ft.py
Here is the result: you can see that after quantizing with MLC, the Jetson Nano 16GB can run the model.