Commit

Example llama3 on inf2 (#3133)
* add llama3 support

* delete model config yaml

* update model config

* fix typo

---------

Co-authored-by: Matthias Reso <[email protected]>
lxning and mreso authored May 8, 2024
1 parent 239f91e commit 0b4539f
Showing 13 changed files with 475 additions and 36 deletions.
@@ -1,6 +1,6 @@
# Large model inference on Inferentia2

-This folder briefs on serving the [Llama 2](https://huggingface.co/meta-llama) model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
+This folder briefs on serving the [Llama 2 and Llama 3](https://huggingface.co/meta-llama) model on an [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:

* demo1: [micro batching](https://github.com/pytorch/serve/tree/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/examples/micro_batching) and [streaming response](https://github.com/pytorch/serve/blob/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/docs/inference_api.md#curl-example-1) support in folder [streamer](streamer).
* demo2: continuous batching support in folder [continuous_batching](continuous_batching)