[Experimental] Script to export 🤗 models #4723

guangy10 · 2024-08-15T01:36:33Z

[Done] ~~Requires the PR "Make StaticCache configurable at model construct time" in order to export, lower, and run the 🤗 model OOTB.~~
[Done] ~~Require huggingface/transformers#33303 or huggingface/transformers#33287 to be merged to 🤗 transformers to resolve the export issue introduced by huggingface/transformers#32543~~
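Roughly why a construct-time StaticCache matters for export: a static cache allocates all of its key/value storage once, when the model is built, so buffer shapes stay fixed across decoding steps and the model can be exported with static dimensions. The framework-free sketch below illustrates only that idea. The class name `StaticKVCache` and its methods are hypothetical and are not the 🤗 transformers `StaticCache` API.

```python
from typing import List, Tuple


class StaticKVCache:
    """Hypothetical illustration of a static KV cache (not the transformers API).

    Storage is preallocated at construction time and never resized, so the
    cache shape is identical at every decoding step.
    """

    def __init__(self, max_seq_len: int, head_dim: int) -> None:
        self.max_seq_len = max_seq_len
        self.head_dim = head_dim
        # Preallocate fixed-size key/value buffers, zero-filled.
        self.keys = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.values = [[0.0] * head_dim for _ in range(max_seq_len)]

    def update(self, pos: int, key: List[float], value: List[float]) -> None:
        # Write in place at `pos`; positions past max_seq_len are rejected
        # instead of growing the buffers.
        if not 0 <= pos < self.max_seq_len:
            raise IndexError(
                f"position {pos} exceeds static cache size {self.max_seq_len}"
            )
        if len(key) != self.head_dim or len(value) != self.head_dim:
            raise ValueError("key and value must have length head_dim")
        self.keys[pos] = list(key)
        self.values[pos] = list(value)

    def shape(self) -> Tuple[int, int]:
        # Always (max_seq_len, head_dim), regardless of how many slots are filled.
        return (self.max_seq_len, self.head_dim)
```

The trade-off of this design is that the maximum sequence length must be chosen up front; in exchange, every step sees the same shapes, which is what a static-shape export relies on.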

With these dependencies merged, we can use the integration point in 🤗 transformers to lower compatible models to ExecuTorch out of the box (OOTB).

This PR adds a simple export script with an XNNPACK recipe.
This PR also creates a secret, EXECUTORCH_HT_TOKEN, to allow downloading checkpoints in CI.
This PR connects the 🤗 "Export to ExecuTorch" e2e workflow to ExecuTorch CI.

Instructions to run the demo:

Run export_hf_model.py to lower gemma-2b to ExecuTorch:

python -m extension.export_util.export_hf_model -hfm "google/gemma-2b" # The model is exported with static dims and a static KV cache

Run tokenizer.py to convert the tokenizer into the binary format used by the ExecuTorch runtime:

python -m extension.llm.tokenizer.tokenizer -t <path_to_downloaded_gemma_checkpoint_dir>/tokenizer.model -o tokenizer.bin

Build the LLM runner by following step 4 of this guide.
Run the lowered model:

cmake-out/examples/models/llama2/llama_main --model_path=gemma.pte --tokenizer_path=tokenizer.bin --prompt="My name is"
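The command-line steps above can be chained in a small driver. This is a hypothetical sketch: it only reuses the module paths and flags shown in this PR description, assumes the export step writes `gemma.pte` to the working directory (inferred from the `--model_path` used in the run step), and leaves building `llama_main` (step 4 of the guide) to you. The helper names `demo_commands` and `run_demo` are illustrative. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def demo_commands(model_id: str, tokenizer_model: str, pte_path: str,
                  tokenizer_bin: str, prompt: str) -> List[List[str]]:
    """Return the demo steps as argv lists (flags copied from the PR description)."""
    # Step 1: export and lower the 🤗 model to an ExecuTorch .pte file.
    export_cmd = ["python", "-m", "extension.export_util.export_hf_model",
                  "-hfm", model_id]
    # Step 2: convert tokenizer.model into the runtime's binary format.
    tokenizer_cmd = ["python", "-m", "extension.llm.tokenizer.tokenizer",
                     "-t", tokenizer_model, "-o", tokenizer_bin]
    # Step 3: run the lowered model with the prebuilt llama_main runner.
    run_cmd = ["cmake-out/examples/models/llama2/llama_main",
               f"--model_path={pte_path}",
               f"--tokenizer_path={tokenizer_bin}",
               f"--prompt={prompt}"]
    return [export_cmd, tokenizer_cmd, run_cmd]


def run_demo(cmds: List[List[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        # Print a shell-quoted version of each step.
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the demo at the first failing step.
            subprocess.run(cmd, check=True)
```

For example, `run_demo(demo_commands("google/gemma-2b", "<path_to_downloaded_gemma_checkpoint_dir>/tokenizer.model", "gemma.pte", "tokenizer.bin", "My name is"))` prints the three commands; pass `dry_run=False` to execute them.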

OOTB output and performance:

I 00:00:00.003110 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.003360 executorch:cpuinfo_utils.cpp:78] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.003380 executorch:cpuinfo_utils.cpp:158] Number of efficient cores 4
I 00:00:00.003384 executorch:main.cpp:65] Resetting threadpool with num threads = 6
I 00:00:00.014716 executorch:runner.cpp:51] Creating LLaMa runner: model_path=gemma.pte, tokenizer_path=tokenizer_gemma.bin
I 00:00:03.065359 executorch:runner.cpp:66] Reading metadata from model
I 00:00:03.065391 executorch:metadata_util.h:43] get_n_bos: 1
I 00:00:03.065396 executorch:metadata_util.h:43] get_n_eos: 1
I 00:00:03.065399 executorch:metadata_util.h:43] get_max_seq_len: 123
I 00:00:03.065402 executorch:metadata_util.h:43] use_kv_cache: 1
I 00:00:03.065404 executorch:metadata_util.h:41] The model does not contain use_sdpa_with_kv_cache method, using default value 0
I 00:00:03.065405 executorch:metadata_util.h:43] use_sdpa_with_kv_cache: 0
I 00:00:03.065407 executorch:metadata_util.h:41] The model does not contain append_eos_to_prompt method, using default value 0
I 00:00:03.065409 executorch:metadata_util.h:43] append_eos_to_prompt: 0
I 00:00:03.065411 executorch:metadata_util.h:41] The model does not contain enable_dynamic_shape method, using default value 0
I 00:00:03.065412 executorch:metadata_util.h:43] enable_dynamic_shape: 0
I 00:00:03.130388 executorch:metadata_util.h:43] get_vocab_size: 256000
I 00:00:03.130405 executorch:metadata_util.h:43] get_bos_id: 2
I 00:00:03.130408 executorch:metadata_util.h:43] get_eos_id: 1
My name is Melle. I am a 20 year old girl from Belgium. I am living in the southern part of Belgium. I am 165 cm tall and I weigh 45kg. I like to play sports like swimming, running and playing tennis. I am very interested in music and I like to listen to classical music. I like to sing and I can play the piano. I would like to go to the USA because I like to travel a lot. I am looking for a boy from the USA who is between 18 and 25 years old. I
PyTorchObserver {"prompt_tokens":4,"generated_tokens":118,"model_load_start_ms":1723685715497,"model_load_end_ms":1723685718612,"inference_start_ms":1723685718612,"inference_end_ms":1723685732965,"prompt_eval_end_ms":1723685719087,"first_token_ms":1723685719087,"aggregate_sampling_time_ms":182,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:17.482472 executorch:stats.h:70] 	Prompt Tokens: 4    Generated Tokens: 118
I 00:00:17.482475 executorch:stats.h:76] 	Model Load Time:		3.115000 (seconds)
I 00:00:17.482481 executorch:stats.h:86] 	Total inference time:		14.353000 (seconds)		 Rate: 	8.221278 (tokens/second)
I 00:00:17.482483 executorch:stats.h:94] 		Prompt evaluation:	0.475000 (seconds)		 Rate: 	8.421053 (tokens/second)
I 00:00:17.482485 executorch:stats.h:105] 		Generated 118 tokens:	13.878000 (seconds)		 Rate: 	8.502666 (tokens/second)
I 00:00:17.482486 executorch:stats.h:113] 	Time to first generated token:	0.475000 (seconds)
I 00:00:17.482488 executorch:stats.h:120] 	Sampling time over 122 tokens:	0.182000 (seconds)
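The summary printed by stats.h can be recomputed from the PyTorchObserver JSON line. The sketch below is an assumption about how the numbers are derived (millisecond timestamps divided by SCALING_FACTOR_UNITS_PER_SECOND; each rate is a token count over the matching interval), but it reproduces the figures in this run: 3.115 s model load, about 8.22 tokens/s overall, and about 8.50 tokens/s during generation. The function name `summarize` is hypothetical.

```python
import json

OBSERVER_PREFIX = "PyTorchObserver "


def summarize(line: str) -> dict:
    """Recompute runner stats from a PyTorchObserver log line (hypothetical helper)."""
    if not line.startswith(OBSERVER_PREFIX):
        raise ValueError("not a PyTorchObserver line")
    s = json.loads(line[len(OBSERVER_PREFIX):])
    units = s["SCALING_FACTOR_UNITS_PER_SECOND"]  # converts ms to seconds
    load_s = (s["model_load_end_ms"] - s["model_load_start_ms"]) / units
    total_s = (s["inference_end_ms"] - s["inference_start_ms"]) / units
    prompt_s = (s["prompt_eval_end_ms"] - s["inference_start_ms"]) / units
    # Generation is assumed to start when prompt evaluation ends.
    gen_s = (s["inference_end_ms"] - s["prompt_eval_end_ms"]) / units
    ttft_s = (s["first_token_ms"] - s["inference_start_ms"]) / units
    return {
        "model_load_s": load_s,
        "total_inference_s": total_s,
        "total_rate_tok_s": s["generated_tokens"] / total_s,
        "prompt_eval_s": prompt_s,
        "prompt_rate_tok_s": s["prompt_tokens"] / prompt_s,
        "generation_s": gen_s,
        "generation_rate_tok_s": s["generated_tokens"] / gen_s,
        "time_to_first_token_s": ttft_s,
        "sampling_s": s["aggregate_sampling_time_ms"] / units,
    }
```

To use it, pass the PyTorchObserver line from the runner's stdout to `summarize()`; this is handy for comparing runs without re-parsing the human-readable stats.h output.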

pytorch-bot · 2024-08-15T01:36:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4723


✅ No Failures

As of commit c707e4c with merge base bfce743:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangy10 · 2024-08-15T01:39:12Z

Do not merge until the dependency PRs are merged into 🤗 transformers and included in a release. Then we can bump the transformers version and merge this PR with CI running it.

.github/workflows/trunk.yml

guangy10 · 2024-09-11T22:49:30Z

The failure is expected because the required patch (huggingface/transformers#33303 or huggingface/transformers#33287) has not been merged to transformers yet.

guangy10 · 2024-09-12T00:45:44Z

Once this PR is unblocked and merged, we will connect the same workflow to the benchmarking infra.

facebook-github-bot · 2024-09-12T00:49:23Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-12T00:51:20Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

.github/workflows/trunk.yml

huydhn

The workflow and the script overall LGTM!

guangy10 · 2024-09-13T19:14:56Z

test-huggingface-transformers (google/gemma-2b) is now working end to end. We can start merging this PR.

facebook-github-bot · 2024-09-13T19:15:22Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-13T20:29:25Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-13T23:02:12Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-14T00:21:22Z

@guangy10 merged this pull request in 67be84b.

facebook-github-bot added the CLA Signed label Aug 15, 2024

guangy10 mentioned this pull request Aug 15, 2024 in "Make StaticCache configurable at model construct time" (huggingface/transformers#32830, Merged)

guangy10 force-pushed the gemma_executorch branch from 3c52b5e to 4e752f7 August 28, 2024 03:58

guangy10 force-pushed the gemma_executorch branch 2 times, most recently from 232fed9 to cba4ffa September 10, 2024 22:47

guangy10 changed the title ~~[Not To Merge][Experimental] Script to export 🤗 models~~ [Experimental] Script to export 🤗 models Sep 10, 2024

guangy10 added the ciflow/trunk label Sep 10, 2024

guangy10 force-pushed the gemma_executorch branch from cba4ffa to 9d7e16f September 10, 2024 22:51

guangy10 marked this pull request as draft September 10, 2024 22:52

guangy10 force-pushed the gemma_executorch branch from 9d7e16f to 9766c69 September 10, 2024 23:02

guangy10 removed the ciflow/trunk label Sep 10, 2024

guangy10 force-pushed the gemma_executorch branch from 9766c69 to 8a96833 September 10, 2024 23:15

guangy10 added the ciflow/trunk label Sep 10, 2024

guangy10 force-pushed the gemma_executorch branch 3 times, most recently from de3430d to fb5672c September 10, 2024 23:52

guangy10 commented on .github/workflows/trunk.yml Sep 11, 2024 (outdated, resolved)

guangy10 requested a review from huydhn September 11, 2024 00:22

guangy10 force-pushed the gemma_executorch branch 10 times, most recently from 3e9acfe to 106883e September 11, 2024 20:36

guangy10 force-pushed the gemma_executorch branch 2 times, most recently from 6333278 to 422102f September 11, 2024 22:36

guangy10 force-pushed the gemma_executorch branch 5 times, most recently from d525d58 to 04b5ed2 September 12, 2024 00:17

guangy10 marked this pull request as ready for review September 12, 2024 00:45

guangy10 requested a review from mergennachin September 12, 2024 00:46

guangy10 force-pushed the gemma_executorch branch from 04b5ed2 to f9df7be September 12, 2024 00:51

huydhn reviewed .github/workflows/trunk.yml Sep 12, 2024 (outdated, resolved)

huydhn approved these changes Sep 12, 2024

guangy10 force-pushed the gemma_executorch branch 3 times, most recently from aefff2e to b3eefd7 September 13, 2024 19:02

guangy10 force-pushed the gemma_executorch branch from b3eefd7 to 751ce14 September 13, 2024 20:28

Script to export HF models (c707e4c)

guangy10 force-pushed the gemma_executorch branch from 751ce14 to c707e4c September 13, 2024 23:01

facebook-github-bot closed this in 67be84b Sep 14, 2024

facebook-github-bot added the Merged label Sep 14, 2024
