
comments on compilation documentation (#726)
Added some comments on the parameters and a link to the documentation to avoid confusion for anyone changing the settings.
jimburtoft authored Oct 28, 2024
1 parent 6a6bb03 commit 5652735
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion benchmark/text-generation-inference/performance/README.md
@@ -94,7 +94,10 @@ See the [llama3-70b-trn1.32xlarge](llama3-70b-trn1.32xlarge) as an example.

It is best to compile the model with the software in the container you will be using to ensure all library versions match.

- As an example, you can compile with the following command. **If you make changes, make sure your batch size, sequence length, and num_cores for compilation match the same settings in the .env file**
+ As an example, you can compile with the following command.
+
+ **If you make changes, make sure your batch size, sequence length, and num_cores for compilation match the MAX_BATCH_SIZE and MAX_TOTAL_TOKENS settings in the .env file and the HF_NUM_CORES setting in the docker-compose file. MAX_INPUT_LENGTH needs to be less than sequence_length/MAX_TOTAL_TOKENS. The directory at the end of the compile command needs to match the MODEL_ID in the .env file.**
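As a concrete illustration of how these settings line up, a model compiled with batch size 4, sequence length 4096, and 32 cores would need entries like the following. All values here are hypothetical, chosen only to show the correspondence between the compile-time arguments and the serving configuration:

```
# .env — hypothetical values, for illustration only
MAX_BATCH_SIZE=4        # must match the batch size used at compile time
MAX_TOTAL_TOKENS=4096   # must match the sequence length used at compile time
MAX_INPUT_LENGTH=3072   # must be less than MAX_TOTAL_TOKENS

# docker-compose file (environment section) — hypothetical
# HF_NUM_CORES=32       # must match the num_cores used at compile time
```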

```
docker run -p 8080:80 \
# ... (lines collapsed in the diff view) ...
export neuron --model NousResearch/Meta-Llama-3-70B-Instruct \
  --num_cores 32 \
  /data/exportedmodel/
```
+ See the [Hugging Face documentation](https://huggingface.co/docs/optimum-neuron/en/guides/export_model#exporting-a-model-to-neuron-using-the-cli) for more information on compilation.

Note that the .env file has a path for MODEL_ID to load the model from the /data directory.
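For example, if the compile command above wrote its output to /data/exportedmodel/, the corresponding .env entry might look like this (a sketch, assuming /data is the volume mounted into the container):

```
# .env — MODEL_ID is a local path inside the container, not a Hub repo id
MODEL_ID=/data/exportedmodel/
```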

