Example llama3 on inf2 (#3133)

* add llama3 support
* delete model config yaml
* update model config
* fix typo

---------

Co-authored-by: Matthias Reso <[email protected]>
pytorch · May 8, 2024 · 0b4539f
1 parent 239f91e
commit 0b4539f

Showing 13 changed files with 475 additions and 36 deletions.
diff --git a/...large_models/inferentia2/llama2/Readme.md b/.../large_models/inferentia2/llama/Readme.md
@@ -1,6 +1,6 @@
 # Large model inference on Inferentia2
 
-This folder briefs on serving the [Llama 2](https://huggingface.co/meta-llama) model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
+This folder briefs on serving the [Llama 2 and Llama 3](https://huggingface.co/meta-llama) model on an [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
 
 * demo1: [micro batching](https://github.com/pytorch/serve/tree/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/examples/micro_batching) and [streaming response](https://github.com/pytorch/serve/blob/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/docs/inference_api.md#curl-example-1) support in folder [streamer](streamer).
 * demo2: continuous batching support in folder [continuous_batching](continuous_batching)
diff --git a/...tia2/llama2/continuous_batching/Readme.md b/...ntia2/llama/continuous_batching/Readme.md