Commit

update wording
tybalex committed Jul 5, 2024
1 parent 5790f61 commit 992dee7
Showing 2 changed files with 2 additions and 2 deletions.
README.md (2 changes: 1 addition & 1 deletion)

@@ -35,7 +35,7 @@ We extend the following inferencing tools to run Rubra models in an OpenAI-compa
 - [llama.cpp](https://github.com/rubra-ai/tools.cpp)
 - [vLLM](https://github.com/rubra-ai/vllm)
 
-**Note**: It is a known issue that Llama3 models (including 8B and 70B) are more prone to damage from quantization. We recommend serving them with either vLLM or using the fp16 quantization.
+**Note**: Llama3 models, including the 8B and 70B variants, are known to experience increased perplexity and a consequent degradation in function-calling performance when quantized. We recommend serving them with vLLM or using the fp16 quantization.
 
 ## Benchmark
docs/docs/README.md (2 changes: 1 addition & 1 deletion)

@@ -41,7 +41,7 @@ We extend the following inferencing tools to run Rubra models in an OpenAI-compa
 - [llama.cpp](https://github.com/rubra-ai/tools.cpp)
 - [vLLM](https://github.com/rubra-ai/vllm)
 
-**Note**: It is a known issue that Llama3 models (including 8B and 70B) are more prone to damage from quantization. We recommend serving them with either vLLM or using the fp16 quantization.
+**Note**: Llama3 models, including the 8B and 70B variants, are known to experience increased perplexity and a consequent degradation in function-calling performance when quantized. We recommend serving them with vLLM or using the fp16 quantization.
 
 ## Contributing
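For context on the serving recommendation in the note above, here is a minimal sketch of how a Rubra Llama3 model exposed through an OpenAI-compatible endpoint (for example via the Rubra vLLM fork linked in the diff) might be queried with a tool definition. The base URL, port, model identifier, and the `get_weather` tool are illustrative assumptions, not values taken from the repository.

```python
# Minimal sketch: query a locally served Rubra Llama3 model through an
# OpenAI-compatible endpoint. The base_url, port, and model name are
# assumptions for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local vLLM endpoint
    api_key="not-needed-for-local-serving",
)

# One tool definition so the function-calling path is exercised.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="rubra-ai/Meta-Llama-3-8B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the structured call appears here.
print(response.choices[0].message.tool_calls)
```

In principle the same request shape works against any server that exposes the OpenAI-compatible chat completions route, which is why the note's recommendation to serve with vLLM or fp16 weights does not change the client-side code.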
