---
title: Exllama
emoji: 😽
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
header: mini
fullWidth: true
license: apache-2.0
short_description: 'Chat: exllama v2'
---
# Exllama Chat

A Gradio-based chat interface for ExLlamaV2, featuring the Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. Experience high-performance inference on consumer GPUs with Flash Attention support.

## Features
- 🚀 Powered by ExLlamaV2 inference library
- 💨 Flash Attention support for optimized performance
- 🎯 Supports multiple instruction-tuned models:
  - Mistral-7B-Instruct-v0.3
  - Meta's Llama-3-70B-Instruct
- ⚡ Dynamic text generation with adjustable parameters
- 🎨 Clean, modern UI with dark mode support
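
For reference, loading an ExLlamaV2 model with the dynamic generator typically looks like the sketch below. This is an illustrative sketch, not this Space's actual `app.py`; the model path is hypothetical, and the dynamic generator can additionally use paged attention when flash-attn is installed.

```python
# Minimal ExLlamaV2 loading sketch (illustrative; not this Space's exact app.py).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/exl2-model")  # hypothetical local model directory
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)         # cache is allocated as layers load
model.load_autosplit(cache)                      # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello!", max_new_tokens=64))
```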
## Parameters

Customize your chat experience with these adjustable parameters (a sampler sketch follows the list):
- System Message: Set the AI assistant's behavior and context
- Max Tokens: Control response length (1-4096)
- Temperature: Adjust response creativity (0.1-4.0)
- Top-p: Fine-tune response diversity (0.1-1.0)
- Top-k: Control vocabulary sampling (0-100)
- Repetition Penalty: Prevent repetitive text (0.0-2.0)
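
As a rough sketch, these controls map onto ExLlamaV2's sampler settings along the following lines. The values shown are illustrative, not the Space's defaults:

```python
from exllamav2.generator import ExLlamaV2Sampler

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7               # response creativity
settings.top_k = 50                      # vocabulary sampling cutoff
settings.top_p = 0.9                     # nucleus sampling threshold
settings.token_repetition_penalty = 1.1  # discourage repetitive text

# Applied per request, e.g.:
# generator.generate(prompt=..., max_new_tokens=512, gen_settings=settings)
```

The system message is not a sampler setting; it is prepended to the prompt in the model's chat template before generation.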
## Tech Stack

- Framework: Gradio 5.5.0
- Models: ExLlamaV2-compatible models
- UI: Custom-themed interface built on Gradio's Soft theme
- Optimization: Flash Attention for improved performance
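
A minimal Gradio wiring for this kind of chat app might look like the sketch below, assuming a `respond` generator function that streams text (the function and slider defaults here are illustrative, not taken from this Space):

```python
import gradio as gr

def respond(message, history, system_message, max_tokens, temperature, top_p):
    # Illustrative stub: a real app would stream tokens from the ExLlamaV2 generator.
    yield f"(echo) {message}"

demo = gr.ChatInterface(
    respond,
    theme=gr.themes.Soft(),
    additional_inputs=[
        gr.Textbox(value="You are a helpful assistant.", label="System Message"),
        gr.Slider(1, 4096, value=512, step=1, label="Max Tokens"),
        gr.Slider(0.1, 4.0, value=0.7, step=0.1, label="Temperature"),
        gr.Slider(0.1, 1.0, value=0.9, step=0.05, label="Top-p"),
    ],
)

if __name__ == "__main__":
    demo.launch()
```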
## License

This project is licensed under the Apache 2.0 License; see the LICENSE file for details.
## Acknowledgments

- ExLlamaV2 for the core inference library
- Hugging Face for hosting and model distribution
- Gradio for the web interface framework
Made with ❤️ using ExLlamaV2 and Gradio